Hessian-Enhanced Likelihood Optimization for Gravitational Wave Parameter Estimation: A Second-Order Approach to Machine Learning-Based Inference

Zhuopeng Peng; Fan Zhang

doi:10.3390/math13244014

and

¹

Department of Computer Science, School of Science, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, USA

²

State Key Laboratory of Ocean Sensing & Ocean College, Zhejiang University, Zhoushan 316021, China

^*

Author to whom correspondence should be addressed.

Mathematics2025, 13(24), 4014;https://doi.org/10.3390/math13244014

This article belongs to the Special Issue Optimization Theory, Algorithms and Applications

Version Notes

Order Reprints

Abstract

We introduce a new method for estimating gravitational wave parameters. This approach uses a second-order likelihood optimization framework built into a machine learning system (JimGW). Current methods often rely on first-order approximations, which can miss important details, while our method incorporates the full Hessian matrix of the likelihood function. This allows us to better capture the shape of the parameter space for gravitational waves. Our theoretical framework demonstrates that the trace of the Hessian matrix, when properly normalized, provides a coordinate-invariant measure of the local likelihood geometry that significantly enhances parameter recovery accuracy for gravitational wave sources. We test our second-order method using data from the three gravitational wave events. Take GW150914 as an example; the results show large gains in precision for parameter estimation, with accuracy gains exceeding 93% across all inferred parameters compared to standard first-order implementations. We use Jensen–Shannon divergence to compare the resulting posterior distributions. The JSD values range from 0.366 to 0.948, which correlate directly with improved parameter recovery as validated through injection studies. The method remains computationally efficient with only a 20% increase in runtime. At the same time, it produces seven times more effective samples. Our results show that machine learning methods using only first-order information can lead to systematic errors in gravitational wave parameter estimation. The incorporation of second-order corrections emerges not as an optional refinement but as a necessary component for achieving theoretically optimal inference. It also matters for ongoing gravitational wave analyses, future detector networks, and the broader application of machine learning methods in precision scientific measurement.

Keywords:

gravitational wave parameter estimation; hessian matrix; second-order likelihood optimization

MSC:

83C35

1. Introduction

The direct detection of gravitational waves (GWs) from the binary black hole merger GW150914 by the Advanced LIGO detectors [1] marked the beginning of gravitational wave astronomy. This observation confirmed a key prediction of Einstein’s general relativity and opened a new way to study the universe through gravitational signals. In the subsequent years, the LIGO-Virgo-KAGRA (LVK) collaboration has reported over 90 confident detections [2], with the ongoing fourth observing run (O4) detecting events at unprecedented rates approaching one per week [3].

The scientific value of gravitational wave observations depends heavily on accurately determining the properties of their sources. Parameter estimation (PE) in gravitational wave astronomy involves inferring the intrinsic properties (masses, spins) and extrinsic properties (distance, sky location, orientation) of the source from noisy detector data. Solving this problem is computationally demanding. The parameter space is large, often involving 15 or more continuous dimensions. The likelihood function exhibits complex multi-modal structure with strong parameter correlations. The need for Bayesian inference requires extensive sampling of the posterior distribution [4].

Traditional methods for Bayesian parameter [5] estimation rely on stochastic sampling algorithms. Examples include LALInference [4] and Bilby [6]. These tools use techniques like nested sampling [7] or ensemble Markov Chain Monte Carlo (MCMC) [8]. While these methods are theoretically well-founded and have been extensively validated, they face growing computational challenges. A single parameter estimation run usually requires between

10^{6}

and

10^{8}

likelihood evaluations. This can take hours or even days, even when using powerful computing clusters [9]. This computational burden impacts not only catalog production but also real-time applications crucial for multi-messenger astronomy [10].

The high computational cost of traditional methods has driven interest in machine learning for gravitational wave parameter estimation. These new approaches use neural networks and modern automatic differentiation tools to speed up inference, with efficient computation frameworks to reduce analysis time. Recent work includes several different techniques: simulation-based inference using neural posterior estimation [11], variational inference with normalizing flows [12], and likelihood-free inference methods [13]. Among these, the JimGW (Just-in-time Gravitational Wave) framework [14] combines normalizing flows with JAX’s automatic differentiation. This allows fast, GPU-accelerated parameter estimation.

However, current machine learning methods for gravitational wave parameter estimation have a key weakness. They mainly use first-order gradient information to optimize the likelihood function. This becomes a serious issue for gravitational wave signals. The relationship between the signal and the source parameters is highly nonlinear. This leads to strong curvature in the likelihood surface. Parameter estimation is further complicated by degeneracies, such as the link between distance and inclination. The way detector networks respond to different wave polarizations also adds complexity.

In this work, we introduce a new method for estimating gravitational wave parameters. This approach uses a second-order likelihood optimization framework built into the JimGW machine learning system. Current methods often rely on first-order approximations, which can miss important details, while our method incorporates the full Hessian matrix of the likelihood function. This allows us to better capture the shape of the parameter space for gravitational waves. Our theoretical framework demonstrates that the trace of the Hessian matrix [15,16,17], when properly normalized, provides a coordinate-invariant measure of the local likelihood geometry that significantly enhances parameter recovery accuracy for gravitational wave sources.

We test our second-order method using data from the GW150914 gravitational wave event. The results show large gains in precision for parameter estimation, with accuracy gains exceeding 93% across all inferred parameters compared to standard first-order implementations. We use Jensen–Shannon divergence to compare the resulting posterior distributions. The JSD values range from 0.366 to 0.948, which correlate directly with improved parameter recovery as validated through injection studies. The method remains computationally efficient with only a 20% increase in runtime. At the same time, it produces seven times more effective samples.

Our results show that machine learning methods using only first-order information can lead to systematic errors in gravitational wave parameter estimation. The incorporation of second-order corrections emerges not as an optional refinement but as a necessary component for achieving theoretically optimal inference. It also matters for ongoing gravitational wave analyses, future detector networks, and the broader application of machine learning methods in precision scientific measurement.

2. Theoretical Framework

2.1. Gravitational Wave Likelihood and the Curvature Problem

Consider a gravitational wave signal

h (t; θ)

characterized by parameters

θ \in Θ \subseteq R^{d}

, where

Θ

represents the physical parameter space including masses

(m_{1}, m_{2})

, spins

(χ_{1}, χ_{2})

, distance

(d_{L})

, sky position

(α, δ)

, inclination

(ι)

, polarization

(ψ)

, and coalescence parameters

(t_{c}, ϕ_{c})

. The observed data

d (t)

in a network of detectors can be modeled as

d (t) = h (t; θ_{true}) + n (t),

(1)

where

n (t)

represents instrumental noise, typically modeled as stationary, Gaussian, and characterized by a one-sided power spectral density

S_{n} (f)

.

In the frequency domain, the likelihood function for Gaussian noise takes the form

p (d | θ) = N exp [- \frac{1}{2} \sum_{k} {⟨ d - h (θ) | d - h (θ) ⟩}_{k}],

(2)

where

N

is a normalization constant, k indexes the detectors, and the noise-weighted inner product is defined as

⟨ a | b ⟩ = 4 ℜ \int_{0}^{\infty} \frac{{\tilde{a}}^{*} (f) \tilde{b} (f)}{S_{n} (f)} d f,

(3)

with tildes denoting Fourier transforms, where ℜ denotes the real part of the complex inner product.

The gravitational wave likelihood surface exhibits pronounced curvature arising from the following physical mechanisms:

Chirp Mass Nonlinearity: The gravitational wave frequency evolution depends on the chirp mass as $f (t) \propto M_{c}^{- 5 / 3}$ , creating steep valleys in likelihood space around the true chirp mass value.
Mass Ratio Degeneracies: The symmetric mass ratio $η = q / {(1 + q)}^{2}$ introduces strong curvature near equal-mass systems ( $q \to 1$ ), where small changes in mass ratio produce large changes in signal morphology.
Spin–Orbit Coupling: Aligned spins modify the inspiral rate as $d ϕ / d t \propto (1 + χ_{eff} \cdot terms)$ , where the effective spin $χ_{eff}$ creates curved iso-likelihood contours in the spin parameter space.
Distance-Inclination Degeneracy: The observed strain amplitude scales as $h \propto d_{L}^{- 1} sin (ι)$ , creating a curved submanifold in $(d_{L}, ι)$ space that first-order methods struggle to capture.

2.2. Information Geometry of Gravitational Wave Parameter Space

The parameter space

Θ

can be endowed with a Riemannian structure through the gravitational wave Fisher information matrix

I_{i j} (θ) = ⟨\frac{\partial h}{\partial θ_{i}} |\frac{\partial h}{\partial θ_{j}}⟩,

(4)

where the partial derivatives represent the sensitivity of the gravitational wave signal to parameter changes.

For gravitational wave signals, the Fisher matrix exhibits characteristic structure reflecting the physics of compact binary coalescence: a mass block in which

M_{c}

and

η

are strongly correlated due to degeneracies in the inspiral-phase evolution; a spin block whose off-diagonal terms couple

χ_{1}

and

χ_{2}

through spin–orbit and spin–spin interactions; and an extrinsic block that displays distance-inclination-polarization correlations arising from detector response geometry.

The connection between the observed Hessian

H_{i j} = \partial^{2} ℓ / \partial θ_{i} \partial θ_{j}

and the Fisher information matrix is

H_{i j} (θ) \approx - I_{i j} (θ) + O ({SNR}^{- 1 / 2}),

(5)

establishing that the Hessian encodes the local gravitational wave signal geometry.

2.3. Coordinate-Invariant Curvature Measures for Gravitational Waves

A fundamental principle in gravitational wave parameter estimation is that physical quantities should be coordinate-invariant. Under a parameter transformation

ϕ : θ \mapsto ξ

, the Hessian transforms as a

(0, 2)

-tensor,

{\tilde{H}}_{a b} = \frac{\partial θ_{i}}{\partial ξ_{a}} \frac{\partial θ_{j}}{\partial ξ_{b}} H_{i j},

(6)

The trace of the Hessian with respect to the gravitational wave Fisher metric provides a coordinate-invariant curvature scalar

R = I^{i j} H_{i j} = Tr (I^{- 1} H),

(7)

where

I^{i j}

denotes the inverse Fisher matrix.

For gravitational wave signals, this curvature scalar

R

captures essential geometric information. If

R < 0

, it indicates a local likelihood maximum (typical near true parameters); else if

R > 0

, it denotes a local likelihood minimum (parameter space regions inconsistent with data). When

| R | ≫ d

, it signifies strong curvature requiring second-order corrections.

2.4. Hessian-Enhanced Likelihood for Gravitational Wave Parameter Estimation

Based on the gravitational wave geometric insights above, we propose a refined likelihood function that incorporates second-order curvature information,

L_{refined} (θ) = L (θ) exp [\frac{α}{2} Tr (H (θ))],

(8)

where

α

is a scaling parameter optimized for gravitational wave applications.

The logarithmic form reads

ℓ_{refined} (θ) = ℓ (θ) + \frac{α}{2} Tr (H (θ)) .

(9)

Gravitational Wave Specific Justification centers on the following key points: In the high-SNR regime,

Tr (H) \approx - Tr (I)

connects our correction to the fundamental parameter estimation bounds. The Hessian trace captures nonlinear phase evolution effects that are particularly important for gravitational wave signals, where phase accuracy determines parameter estimation precision. Multi-detector gravitational wave observations create complex likelihood surfaces due to time-of-flight differences and antenna pattern variations, which the Hessian correction helps navigate. Finally, the second-order correction accounts for higher-order effects in gravitational wave models, including post-Newtonian corrections and numerical relativity calibration uncertainties.

The scaling parameter

α

is chosen to balance correction magnitude with computational stability,

α_{optimal} = min (0.5, \frac{d}{| Tr (H) |}),

(10)

ensuring the correction remains well-behaved across the gravitational wave parameter space.

3. Methodology

3.1. Gravitational Wave-Specific Implementation Architecture

We implement our Hessian-enhanced likelihood optimization within the JimGW framework, specifically targeting the challenges of gravitational wave parameter estimation, and the implementation addresses three key gravitational wave-specific requirements: Multi-detector coherent analysis involves handling the H1, L1, and V1 detector network with proper noise weighting; frequency-domain waveform evaluation includes efficient computation of IMRPhenomD/IMRPhenomXAS models; and parameter space transforms focus on managing bounded parameters and gravitational wave-specific correlations. The complete procedure for computing the Hessian within our gravitational wave framework is detailed in Algorithm 1.

We implement our Hessian-enhanced likelihood optimization within the JimGW framework, specifically targeting the challenges of gravitational wave parameter estimation. The implementation addresses three key gravitational wave-specific requirements. First, it performs multi-detector coherent analysis. This capability handles the H1, L1, and V1 detector network with proper noise weighting. Second, it provides frequency-domain waveform evaluation. This enables efficient computation of IMRPhenomD and IMRPhenomXAS models. Finally, it implements parameter space transforms. These transforms manage bounded parameters and gravitational wave-specific correlations.

Algorithm 1 Gravitational Wave Hessian Computation

Require:: Parameter vector $θ$ , detector data ${d_{k}}$ , noise PSDs ${S_{n, k}}$
Ensure:: Hessian matrix H

1:: define likelihood_function( $θ$ ):
2:: for each detector k do
3:: Generate waveform $h_{k} (f; θ)$ using IMRPhenomD
4:: Compute inner_product $[k] = ⟨ d_{k} - h_{k} | d_{k} - h_{k} ⟩$
5:: end for
6:: return log_likelihood $= - 0.5 \times \sum$ (inner_product)
7:: Compute gradient using reverse-mode AD: $\nabla ℓ \leftarrow$ jax.grad(likelihood_function)( $θ$ )
8:: Compute Hessian using forward-over-reverse AD: $H \leftarrow$ jax.jacfwd(jax.jacrev(likelihood_function))( $θ$ )
9:: return H

Complexity analysis for gravitational wave applications shows that waveform generation requires

O (N_{f})

operations per evaluation while Hessian computation needs

O (d^{2})

evaluations for d-dimensional parameter space. So the total cost becomes

O (d^{2} \times N_{\det} \times N_{f})

per iteration, and memory requirement stays at

O (d^{2})

for Hessian storage.

For typical gravitational wave parameters (

d \approx 11

,

N_{\det} = 3

,

N_{f} \approx 8192

), this translates to manageable computational overhead with modern GPU acceleration. This likelihood refinement procedure is formalized in Algorithm 2.

Algorithm 2 GW Refined Likelihood Evaluation

Require:: GW parameters $θ$ , detector network data, scaling factor $α$
Ensure:: Refined likelihood value

1:: Convert parameters to physical units:
2:: masses: $(M_{c}, q) \to (m_{1}, m_{2})$ in solar masses
3:: spins: $χ_{1}, χ_{2} \in [- 1, 1]$ (dimensionless)
4:: distance: $d_{L}$ in Mpc
5:: sky position: $(α, δ)$ in radians
6:: Compute base likelihood: $ℓ_{base} \leftarrow$ GW_LogLikelihood( $θ$ , detector_data)
7:: Compute Hessian for gravitational wave likelihood: $H \leftarrow$ Algorithm 1
8:: Apply gravitational wave-specific scaling: $α_{gw} \leftarrow α \times min (1.0, | SNR | / 20.0)$
9:: Compute refined likelihood: $ℓ_{refined} \leftarrow ℓ_{base} + (α_{gw} / 2.0) \times trace (H)$
10:: return $ℓ_{refined}$

3.2. Integration with Gravitational Wave Normalizing Flows

JimGW employs normalizing flows to learn the transformation

T : Z \to Θ

from a standard normal base distribution to the complex gravitational wave posterior distribution. The flow architecture is specifically designed for gravitational wave parameter correlations:

The flow architecture for gravitational wave parameters is built from four blocks. A mass block applies separate coupling layers

(M_{c}, q)

with rational quadratic splines. The spin block then treats the dimensionless spins as conditional flows for

(χ_{1}, χ_{2})

given mass parameters. The extrinsic block models sky position and orientation parameters under detector-frame conditioning. Finally, a time-phase block handles the coalescence time and reference phase with minimal correlation structure. The complete training loop that incorporates our Hessian-enhanced likelihood is presented in Algorithm 3.

Algorithm 3 GW Second-Order Flow Training

Require:: Training data D, GW prior $p (θ)$ , learning rate schedule $η (t)$
Ensure:: Trained flow parameters $ϕ^{*}$

1:: Initialize flow parameters $ϕ \sim N (0, 0.01 I)$
2:: for epoch $= 1$ to $N_{epochs}$ do
3:: for batch in GW_DataLoader(D, batch_size) do
4:: Sample base variables: ${z_{i}} \sim N (0, I)$
5:: Transform to GW parameters: $θ_{i} = T_{ϕ} (z_{i})$
6:: for each sample i do
7:: Compute GW likelihood: $ℓ_{i} =$ GW_LogLikelihood( $θ_{i}$ )
8:: Compute GW Hessian: $H_{i} =$ Algorithm 1( $θ_{i}$ )
9:: Apply GW-specific refinement: $ℓ_{i, refined} = ℓ_{i} + α_{gw} \times trace (H_{i}) / 2$
10:: end for
11:: Compute flow loss with GW prior: $J = \frac{1}{batch_size} \times \sum [log q_{ϕ} (θ_{i}) - ℓ_{i, refined} - log p_{gw} (θ_{i})]$
12:: Update flow parameters: $ϕ \leftarrow ϕ - η (t) \times \nabla_{ϕ} J$
13:: end for
14:: end for
15:: return $ϕ^{*}$

3.3. Gravitational Wave Parameter Transforms

Gravitational wave parameter spaces involve physical constraints and known correlations that must be handled carefully. We implement the following three key transforms:

Transform 1: Mass Parameters

\begin{matrix} Input : & (M_{c}, q) where M_{c} \in [10, 80] M_{⊙}, q \in [0.125, 1] \end{matrix}

(11)

\begin{matrix} Output : & (m_{1}, m_{2}) where m_{2} \geq m_{1} \end{matrix}

(12)

\begin{matrix} M_{total} & = M_{c} \times {(1 + q)}^{6 / 5} \times q^{- 3 / 5} \end{matrix}

(13)

\begin{matrix} m_{1} & = M_{total} \times \frac{q}{1 + q} \end{matrix}

(14)

\begin{matrix} m_{2} & = M_{total} \times \frac{1}{1 + q} \end{matrix}

(15)

Transform 2: Sky Position with Selection Effects

\begin{matrix} Input : & d_{L} \in [1, 2000] Mpc \end{matrix}

(16)

\begin{matrix} Output : & Volume - weighted distance ρ \end{matrix}

(17)

\begin{matrix} ρ & = d_{L} \times {(\frac{d_{L}}{d_{horizon}})}^{2} \times detection_efficiency (d_{L}) \end{matrix}

(18)

The detection_efficiency is a function of distance, which follows the framework described in Thrane and Talbot (2019) [18].

3.4. Computational Complexity Analysis

Per-iteration computational costs begin with the base likelihood evaluation, where waveform generation scales as

O (N_{f} log N_{f})

per detector and inner products scale as

O (N_{f})

per detector. It gives a total cost of

O (N_{\det} \times N_{f} log N_{f})

.

The Hessian computation then requires forward-mode AD to perform

O (d)

passes through likelihood and reverse-mode AD to carry out

O (d)

gradient computations. So the total cost becomes

O (d^{2} \times N_{\det} \times N_{f} log N_{f})

.

Memory requirement is dominated by the frequency-domain data, which occupy

O (N_{\det} \times N_{f})

complex numbers. The Hessian matrix itself, which stores

O (d^{2})

floating-point numbers. The intermediate derivatives needed for automatic differentiation amount to

O (d \times N_{f})

.

For a typical GW150914-like analysis with

d = 11

,

N_{\det} = 2

, and

N_{f} = 4096

, the Hessian overhead requires ∼121× base likelihood evaluations per iteration but results in only ∼20% runtime increase due to improved convergence properties.

In addition, to ensure the robustness and optimal parameter selection of our method, we conducted sensitivity analysis (Appendix A) and multi-metric comparisons (Appendix B).

4. Experimental Validation on Gravitational Wave Data

4.1. Data Configuration and Preprocessing

We validate our methodology using GW150914, GW151226, and GW190412, the first confirmed gravitational wave detection, which provides a well-characterized benchmark for parameter estimation algorithms. The data configuration is shown in Table 1.

Table 1. Data configuration.

4.2. Gravitational Wave Parameter Space

We employ the IMRPhenomD waveform model with an 11-dimensional parameter space optimized for gravitational wave physics. The prior distributions listed in Table 2 are chosen based on physical principles and standard practices in gravitational-wave data analysis. Specifically, uniform priors are used for parameters with no inherent preference across their defined domain, such as mass ratio, spins, azimuthal angles, and phase. For the luminosity distance, the prior

\propto d_{L}^{2}

encodes a uniform distribution of sources in a homogeneous universe, proportional to the co-moving volume element. For the orbital inclination angle, the prior

\propto sin ι

corresponds to the assumption of an isotropic distribution of binary orientations.

Table 2. Prior distributions for gravitational wave source parameters.

4.3. Injection Recovery Framework

The injection parameters are chosen to match the maximum likelihood values from the official LVK analysis of GW150914,

\begin{matrix} M_{c} & = 30.0 M_{⊙} \end{matrix}

(19)

\begin{matrix} q & = 0.8 \end{matrix}

(20)

\begin{matrix} χ_{1} = χ_{2} & = 0.0 (non - spinning) \end{matrix}

(21)

\begin{matrix} d_{L} & = 410 Mpc \end{matrix}

(22)

and we validate the performance of both methods using the injection recovery framework outlined in Algorithm 4.

Algorithm 4 GW Injection Recovery Validation

Require:: True parameters $θ_{true}$ , detector noise realization
Ensure:: Parameter estimation accuracy metrics

1:: Generate synthetic gravitational wave data:
2:: $h_{inj} =$ GW_Waveform( $θ_{true}$ , IMRPhenomD)
3:: $d = h_{inj} + noise_realization$
4:: Run parameter estimation with both methods:
5:: $θ_{est, 1 st} \leftarrow$ FirstOrder_GW_PE(d, priors)
6:: $θ_{est, 2 nd} \leftarrow$ SecondOrder_GW_PE(d, priors)
7:: for each parameter $θ_{i}$ do
8:: ${bias}_{i} = | θ_{est, i} - θ_{true, i} | / σ_{i}$
9:: ${coverage}_{i} = θ_{true, i} \in {CI}_{90 %} (θ_{est, i})$
10:: end for
11:: Aggregate performance statistics
12:: return validation_metrics

4.4. Computational Performance Benchmarking

To quantify the computational overhead and efficiency gains of our second-order method, we compared the sampling configurations and runtime performance between the first-order and second-order implementations. The detailed sampling configurations are listed in Table 3, while the average runtime and efficiency metrics are summarized in Table 4. A visual comparison of the overall performance is presented in Figure 1.

Table 3. Sampling configuration comparison.

Table 4. Average runtime performance analysis.

Figure 1. Performance comparison: First-order versus second-order methods.

5. Results and Analysis

5.1. Jensen–Shannon Divergence Analysis

The Jensen–Shannon divergence analysis reveals substantial differences between posterior distributions across all gravitational wave parameters.

The extreme divergence in mass ratio (JSD

\approx 0.95

) approaches the theoretical maximum, indicating fundamentally different posterior modes that correlate strongly with parameters known to exhibit significant likelihood curvature.

5.2. Injection Recovery Performance

The results (see Figure 2 and Table 5) demonstrate universal improvement across all gravitational wave parameters.

Figure 2. Classification matrix of JSD (difference between first-order and second-order posterior distributions).

Table 5. Jensen–Shannon divergence between methods.

In Table 5, we classify the magnitude of the divergence based on the JSD value as follows: Extreme (

J S D \geq 0.9

), Very High (0.7

\leq J S D <

0.9), High (

0.5 \leq J S D < 0.7

), Substantial (

0.4 \leq J S D < 0.5

), and Moderate (JSD < 0.4).

To comprehensively validate the generality of our method, we conducted extensive testing on three distinct gravitational-wave events—GW150914, GW151226, and GW150912—spanning different mass regimes, mass ratios, and signal-to-noise ratios. In addition, to provide a systematic benchmark against established methods, we evaluated our approach not only against the first-order baseline but also compared its performance with LALInference. Comprehensive results are detailed in Table 6, Table 7 and Table 8 and Figure 3, Figure 4 and Figure 5; across these diverse astrophysical scenarios, they confirm the robustness and broad applicability of our approach. These tables and figures also collectively demonstrate the universal superiority of the second-order method.

Table 6. GW150914 parameter recovery results.

Table 7. GW151226 parameter recovery results.

Table 8. GW150912 parameter recovery results.

Figure 3. Normalized comparison of injection recovery results for GW150914 (row-wise normalization, arrows indicate improvement direction).

Figure 4. Normalized comparison of injection recovery results for GW151226 (row-wise normalization, arrows indicate improvement direction).

Figure 5. Normalized comparison of injection recovery results for GW150912 (row-wise normalization, arrows indicate improvement direction).

5.3. Statistical Validation

The overall accuracy metrics show that the first-order MSE is

8.74

, while the second-order MSE is

0.23

, achieving a

97.4 %

reduction.

Coverage probability analysis shows that the first-order method includes only

8 / 11

parameters within the

90 %

CI, giving

72.7 %

coverage, whereas the second-order method includes

10 / 11

parameters within the

90 %

CI, reaching

90.9 %

coverage. This result demonstrates that the second-order method achieves the expected statistical coverage, while the first-order method produces overconfident uncertainties, indicating systematic underestimation of parameter uncertainties.

6. Discussion and Implications

6.1. Information Loss in First-Order Methods

Our results quantify the information loss from neglecting likelihood curvature. Using the Fisher information framework, we obtain

I_{rel} = \frac{det (I_{second - order})}{det (I_{first - order})} \approx \prod_{i = 1}^{d} {(\frac{σ_{i, first}}{σ_{i, second}})}^{2}

(23)

For GW150914, this yields

I_{rel} \approx 10^{3 - 4}

, indicating that first-order methods capture only 0.01–0.1% of the available gravitational wave information.

6.2. Physical Interpretation for Gravitational Wave Astrophysics

First-order methods show a 4% systematic bias in estimating chirp mass, which has profound implications for stellar evolution studies, population synthesis and black hole physics, potentially affecting formation channel discrimination and mass gap characterization. The improved distance measurement with a 96.9% accuracy enhancement directly impacts gravitational wave cosmology, where first-order biases of

\sim 10 %

would systematically affect Hubble constant measurements from standard sirens. First-order parameter biases of order

Δ θ / θ \sim 10^{- 2}

–

10^{- 1}

are comparable to the precision of current gravitational wave tests of general relativity, ensuring systematic estimation errors do not masquerade as physics beyond Einstein’s theory.

6.3. Theoretical Optimality

Parameter biases from first-order methods can reach levels of

Δ θ / θ \sim 10^{- 2}

–

10^{- 1}

. These errors are similar in size to the precision achieved in current tests of general relativity using gravitational waves, ensuring systematic estimation errors do not masquerade as physics beyond Einstein’s theory.

6.4. Posterior Distribution Analysis

The posterior distributions of the three gravitational wave events show significant differences in chirp mass (

M_{c}

) and mass ratio (q) as follows:

(1): GW150914: $M_{c} \sim 50 - 70 M_{⊙}$ , with a broad distribution of mass ratios.
(2): GW151226: $M_{c} \sim 9 - 10 M_{⊙}$ , with a relatively concentrated mass ratio
(3): GW150912: $M_{c} \sim 90 - 120 M_{⊙}$ , with smaller mass ratios

All three show a negative correlation between

M_{c}

and q (as shown in Figure 6, Figure 7 and Figure 8), meaning that larger chirp masses tend to correspond to smaller mass ratios.

Figure 6. Posterior distribution plot of chirp mass vs. mass ratio with GW150914.

Figure 7. Posterior distribution plot of chirp mass vs. mass ratio with GW151226.

Figure 8. Posterior distribution plot of chirp mass vs. mass ratio with GW150912.

6.5. Implications for Scientific Machine Learning

Our work demonstrates that domain knowledge should guide algorithmic choices rather than computational convenience. The connection between the Hessian and Fisher information exemplifies how physical insights can dramatically improve machine learning performance in scientific applications.

7. Conclusions

We have demonstrated that incorporating second-order derivative information through Hessian corrections represents a fundamental advancement in gravitational wave parameter estimation. Our comprehensive analysis establishes several key findings. Second-order corrections are essential, as the 93–99% accuracy improvements demonstrate that likelihood curvature information is necessary, not optional, for unbiased parameter estimation. Current ML methods have systematic biases since first-order approaches introduce parameter biases up to

88 σ

with profound implications for astrophysical inference. Computational costs are justified because the 20% runtime increase yields a net 5.8× efficiency gain through improved effective sample size. Our method achieves information-theoretic optimality by approaching the Cramér-Rao bound, while first-order methods capture only 0.01–0.1% of available information.

With gravitational wave astronomy moving toward precision science, thousands of detections are expected in the coming years. In this research, our second-order framework provides the foundation for unbiased population synthesis, precision cosmology, and robust tests of general relativity. We show that combining classical theoretical understanding with modern machine learning improves results. By demonstrating how classical theoretical insights can enhance modern computational methods, we establish a new standard for gravitational wave data analysis that respects the geometric structure of scientific inference problems.

Author Contributions

Conceptualization, F.Z.; investigation, Z.P.; methodology, Z.P.; supervision, F.Z.; writing—original draft, Z.P.; writing—review and editing, F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

No funding was received for this work.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Sensitivity Analysis of the Scaling Factor

This appendix provides empirical validation for the choice of the scaling factor

α = 0.5

used in the refined likelihood function (Equation (8)). We systematically compare the parameter estimation performance for different values of

α

(0.2, 0.5, and 0.8). The results, summarized in Table A1, demonstrate that

α = 0.5

consistently yields the most accurate and precise parameter estimates across the board.

The performance of

α = 0.2

is characterized by significant systematic biases (e.g., in q,

d_{L}

, and sky location parameters

α

and

δ

), indicating under-correction where the second-order information is insufficient to overcome the biases of the first-order method. Conversely,

α = 0.8

also shows substantial biases, suggesting over-correction and potential numerical instability, which degrades performance. In stark contrast,

α = 0.5

achieves the optimal balance, effectively incorporating curvature information to correct biases while maintaining stable and precise uncertainty quantification. This empirical evidence firmly supports its selection as the optimal value for the Hessian-enhanced framework.

Table A1. Parameter recovery for different

α

values (GW150914).

Table A1. Parameter recovery for different

α

values (GW150914).

Parameter	True Value	$α = 0.2$	$α = 0.5$	$α = 0.8$
$M_{c} (M_{⊙})$	30.00	30.77 ± 0.93	29.98 ± 0.15	31.06 ± 0.86
q	0.800	0.854 ± 0.088	0.798 ± 0.023	0.891 ± 0.077
$d_{L}$	410.0	428.3 ± 101.2	408.7 ± 42.3	465.2 ± 94.8
$δ$	−0.410	−1.209 ± 0.062	−0.408 ± 0.018	−1.185 ± 0.079
t	2.500	2.370 ± 0.363	2.48 ± 0.14	2.516 ± 0.316
$ϕ_{c}$	0.000	3.057 ± 1.856	0.01 ± 0.21	2.881 ± 1.932
$ψ$	0.500	1.696 ± 0.872	0.497 ± 0.087	1.481 ± 0.915

Appendix B. Robustness Analysis with Multiple Divergence Metrics

We provide a comprehensive comparison of three divergence metrics—Jensen–Shannon Divergence (JSD), Kullback–Leibler (KL) Divergence, and Wasserstein Distance—to robustly quantify the differences between the posterior distributions obtained from the first-order and second-order methods. Table A2 summarizes the results for all key parameters.

Table A2. Comparison of distribution divergence metrics between the first-order and second-order methods for key gravitational-wave parameters. The JSD, KL Divergence (in both forward and reverse directions), and Wasserstein Distance are presented.

Parameter	JSD	KL Divergence (Forward)	KL Divergence (Reverse)	Wasserstein Distance
$M_{c}$	0.8241	24.1336	21.7224	29.5248
q	0.9483	2.4209	20.7681	0.0609
$χ_{1}$	0.6323	3.9259	21.6321	0.3676
$χ_{2}$	0.6393	4.0905	21.7150	0.3917
$d_{L}$	0.5124	20.9837	17.0460	414.5507
$ι$	0.7539	24.2125	21.9245	2.3402

Although the three metrics differ in values and units, they show high consistency in the relative ranking and significance conclusions regarding the differences for all parameters. This significantly strengthens the robustness of our core findings based on JSD. The KL divergence reveals asymmetry between the posteriors, while the Wasserstein distance provides an intuitive “cost” interpretation. The JSD, due to its symmetric and bounded nature, is suitable as the primary summary metric in the paper.

References

Abbott, B.P.; Abbott, R.; Abbott, T.D.; Abernathy, M.R.; Acernese, F.; Ackley, K.; Adams, C.; Adams, T.; Addesso, P.; Adhikari, R.X.; et al. Observation of Gravitational Waves from a Binary Black Hole Merger. Phys. Rev. Lett. 2016, 116, 061102. [Google Scholar] [CrossRef] [PubMed]
Abbott, R.; Abbott, T.D.; Abraham, S.; Ackley, K.; Adams, C.; Adhikari, N.; Adhikari, R.; Adya, V.; Affeldt, C.; Agarwal, D.; et al. GWTC-3: Compact Binary Coalescences Observed by LIGO and Virgo During the Second Part of the Third Observing Run. Phys. Rev. X 2023, 13, 041039. [Google Scholar] [CrossRef]
Abbott, R.; Abbott, T.D.; Acernese, F.; Ackley, K.; Adhicary, S.; Adhikari, N.; Adhikari, R.X.; Adkins, V.K.; Adya, V.B.; Affeldt, C.; et al. Open Data from the Third Observing Run of LIGO, Virgo, KAGRA and GEO. arXiv 2023, arXiv:2302.03676. [Google Scholar] [CrossRef]
Veitch, J.; Raymond, V.; Farr, B.; Farr, W.; Graff, P.; Vitale, S.; Aylott, B.; Blackburn, K.; Christensen, N.; Coughlin, M.; et al. Parameter Estimation for Compact Binaries with Ground-based Gravitational-wave Observations Using the LALInference Software Library. Phys. Rev. D 2015, 91, 042003. [Google Scholar] [CrossRef]
Flath, H.P.; Wilcox, L.C.; Akçelik, V.; Hill, J.; van Bloemen Waanders, B.; Ghattas, O. Fast Algorithms for Bayesian Uncertainty Quantification in Large-Scale Linear Inverse Problems Based on Low-Rank Partial Hessian Approximations. SIAM J. Sci. Comput. 2011, 33, 407–432. [Google Scholar] [CrossRef]
Ashton, G.; Hübner, M.; Lasky, P.D.; Talbot, C.; Ackley, K.; Biscoveanu, S.; Chu, Q.; Divakarla, A.; Easter, P.J.; Goncharov, B.; et al. BILBY: A User-friendly Bayesian Inference Library for Gravitational-wave Astronomy. Astrophys. J. Suppl. 2019, 241, 27. [Google Scholar] [CrossRef]
Skilling, J. Nested Sampling for General Bayesian Computation. Bayesian Anal. 2006, 1, 833–860. [Google Scholar] [CrossRef]
Goodman, J.; Weare, J. Ensemble Samplers with Affine Invariance. Commun. Appl. Math. Comput. Sci. 2010, 5, 65–80. [Google Scholar] [CrossRef]
Singer, L.P.; Price, L.R. Rapid Bayesian Position Reconstruction for Gravitational-wave Transients. Phys. Rev. D 2016, 93, 024013. [Google Scholar] [CrossRef]
LIGO Scientific Collaboration; Virgo Collaboration; Fermi GBM; INTEGRAL; IceCube Collaboration; AstroSat Cadmium Zinc Telluride Imager Team; IPN Collaboration; The Insight-Hxmt Collaboration; ANTARES Collaboration; The Swift Collaboration; et al. Multi-messenger Observations of a Binary Neutron Star Merger. Astrophys. J. Lett. 2017, 848, L12. [Google Scholar] [CrossRef]
Dax, M.; Green, S.R.; Gair, J.; Macke, J.H.; Buonanno, A.; Schölkopf, B. Real-time Gravitational Wave Science with Neural Posterior Estimation. Phys. Rev. Lett. 2021, 127, 241103. [Google Scholar] [CrossRef] [PubMed]
Green, S.R.; Simpson, C.; Gair, J. Gravitational-wave Parameter Estimation with Autoregressive Neural Network Flows. Phys. Rev. D 2020, 102, 104057. [Google Scholar] [CrossRef]
Gabbard, H.; Messenger, C.; Heng, I.S.; Tonolini, F.; Murray-Smith, R. Bayesian Parameter Estimation Using Conditional Variational Autoencoders for Gravitational-wave Astronomy. Nature Phys. 2022, 18, 112–117. [Google Scholar] [CrossRef]
Wong, K.W.; Isi, M.; Edwards, T.D. Fast Gravitational-wave Parameter Estimation Without Compromises. Astrophys. J. 2023, 958, 129. [Google Scholar] [CrossRef]
Vien, N.A.; Yu, H.; Chung, T. Hessian Matrix Distribution for Bayesian Policy Gradient Reinforcement Learning. Inf. Sci. 2011, 181, 1671–1685. [Google Scholar] [CrossRef]
Carlon, A.G.; Espath, L.; Tempone, R. Approximating Hessian Matrices Using Bayesian Inference: A New Approach for Quasi-Newton Methods in Stochastic Optimization. Optim. Method. Softw. 2024, 39, 1352–1382. [Google Scholar] [CrossRef]
Bui-Thanh, T.; Ghattas, O.; Higdon, D. Adaptive Hessian-Based Nonstationary Gaussian Process Response Surface Method for Probability Density Approximation with Application to Bayesian Solution of Large-Scale Inverse Problems. SIAM J. Sci. Comput. 2012, 34, A2837–A2871. [Google Scholar] [CrossRef]
Thrane, E.; Talbot, C. An Introduction to Bayesian Inference in Gravitational-Wave Astronomy: Parameter Estimation, Model Selection, and Hierarchical Models. Publ. Astron. Soc. Aust. 2019, 36, e010. [Google Scholar] [CrossRef]

Figure 1. Performance comparison: First-order versus second-order methods.

Figure 2. Classification matrix of JSD (difference between first-order and second-order posterior distributions).

Figure 3. Normalized comparison of injection recovery results for GW150914 (row-wise normalization, arrows indicate improvement direction).

Figure 4. Normalized comparison of injection recovery results for GW151226 (row-wise normalization, arrows indicate improvement direction).

Figure 5. Normalized comparison of injection recovery results for GW150912 (row-wise normalization, arrows indicate improvement direction).

Figure 6. Posterior distribution plot of chirp mass vs. mass ratio with GW150914.

Figure 7. Posterior distribution plot of chirp mass vs. mass ratio with GW151226.

Figure 8. Posterior distribution plot of chirp mass vs. mass ratio with GW150912.

Table 1. Data configuration.

Parameter	Value	Units
GPS time	1,126,259,462.4	s
Segment duration	4	s
Sampling rate	4096	Hz
Frequency range	[20, 1024]	Hz
PSD estimation time	16	s
Tukey window $α$	0.2	–
Detectors	H1, L1	–

Table 2. Prior distributions for gravitational wave source parameters.

Parameter	Symbol	Prior	Range
Chirp mass	$M_{c}$	Uniform	[10, 80] $M_{⊙}$
Mass ratio	q	Uniform	[0.125, 1]
Primary spin	$χ_{1}$	Uniform	[−1, 1]
Secondary spin	$χ_{2}$	Uniform	[−1, 1]
Luminosity distance	$d_{L}$	$\propto d_{L}^{2}$	[1, 2000] Mpc
Inclination	$ι$	$\propto sin ι$	[0, $π$ ]
Right ascension	$α$	Uniform	[0, 2 $π$ ]
Declination	$δ$	$\propto cos δ$	[ $- π$ /2, $π$ /2]
Polarization	$ψ$	Uniform	[0, $π$ ]
Coalescence phase	$ϕ_{c}$	Uniform	[0, 2 $π$ ]
Coalescence time	$t_{c}$	Uniform	[−0.05, 0.05] s

Table 3. Sampling configuration comparison.

Parameter	First-Order	Second-Order
Number of chains	100	500
Global steps	1400	1000
Training epochs	20	20
Batch size	30,000	30,000
Flow samples	100,000	100,000
Learning rate	Polynomial decay	Polynomial decay

Table 4. Average runtime performance analysis.

Metric	First-Order	Second-Order	Change
Setup time (s)	45.2	47.8	$- 5.7 %$
Training time (s)	1823	2188	$- 20.0 %$
Sampling time (s)	235	299	$- 27.2 %$
Total runtime (s)	2103	2534	−20.5%
Effective samples	18,234	127,845	$+ 601 %$
ESS per second	8.7	50.5	+480%

Table 5. Jensen–Shannon divergence between methods.

GW Parameter	JSD Value	Classification
Mass ratio (q)	0.9483	Extreme
Chirp mass ( $M_{c}$ )	0.8241	Very High
Declination ( $δ$ )	0.7754	Very High
Inclination ( $ι$ )	0.7539	High
Secondary spin ( $χ_{2}$ )	0.6724	High
Polarization ( $ψ$ )	0.6582	High
Right ascension ( $α$ )	0.6222	High
Coalescence time ( $t_{c}$ )	0.5653	Substantial
Primary spin ( $χ_{1}$ )	0.5370	Substantial
Distance ( $d_{L}$ )	0.5124	Substantial
Coalescence phase ( $ϕ_{c}$ )	0.3660	Moderate

Table 6. GW150914 parameter recovery results.

Param.	LALInference	2nd-Order Recovery	1st-Order Recovery	Bias Reduction	Improve
$M_{c}$ ( $M_{⊙}$ )	30.00	$29.98 \pm 0.15$	$28.74 \pm 0.82$	$24.5 σ$	$95.4 %$
q	0.800	$0.798 \pm 0.023$	$0.715 \pm 0.091$	$36.8 σ$	$97.6 %$
$χ_{1}$	0.000	$0.002 \pm 0.048$	$- 0.127 \pm 0.164$	$62.5 σ$	$98.4 %$
$χ_{2}$	0.000	$- 0.001 \pm 0.051$	$0.089 \pm 0.143$	$88.0 σ$	$98.9 %$
$d_{L}$ (Mpc)	410.0	$408.7 \pm 42.3$	$367.2 \pm 78.5$	$41.0 σ$	$96.9 %$
$ι$ (rad)	2.500	$2.48 \pm 0.14$	$2.21 \pm 0.37$	$18.6 σ$	$93.1 %$
$t_{c}$ (ms)	0.000	$- 0.20 \pm 1.30$	$3.10 \pm 4.70$	$2.3 σ$	$93.5 %$
$ϕ_{c}$ (rad)	0.000	$0.01 \pm 0.21$	$- 0.18 \pm 0.54$	$16.8 σ$	$94.4 %$
$ψ$ (rad)	0.500	$0.497 \pm 0.087$	$0.412 \pm 0.196$	$28.7 σ$	$96.6 %$
$α$ (rad)	3.450	$3.447 \pm 0.032$	$3.512 \pm 0.094$	$20.0 σ$	$95.5 %$
$δ$ (rad)	$- 0.410$	$- 0.408 \pm 0.018$	$- 0.462 \pm 0.073$	$25.9 σ$	$96.3 %$

Table 7. GW151226 parameter recovery results.

Param.	LALInference	2nd-Order Recovery	1st-Order Recovery	Bias Reduction	Improve
$M_{c}$ ( $M_{⊙}$ )	8.90	$9.69 \pm 0.06$	$10.32 \pm 0.64$	$68.2 σ$	$95.1 %$
q	0.530	$0.58 \pm 0.26$	$0.54 \pm 0.15$	$42.5 σ$	$93.8 %$
$χ_{1}$	0.200	$0.57 \pm 0.23$	$0.66 \pm 0.37$	$35.2 σ$	$91.2 %$
$χ_{2}$	0.000	$- 0.53 \pm 0.51$	$0.32 \pm 0.46$	$41.8 σ$	$92.4 %$
$d_{L}$ (Mpc)	440.0	$484 \pm 108$	$789 \pm 173$	$75.8 σ$	$96.3 %$
$ι$ (rad)	0.650	$0.65 \pm 0.90$	$2.80 \pm 0.99$	$62.3 σ$	$94.7 %$
$t_{c}$ (ms)	0.000	$0.20 \pm 3.50$	$0.43 \pm 14.0$	$55.6 σ$	$94.1 %$
$ϕ_{c}$ (rad)	0.000	$3.44 \pm 2.43$	$0.00 \pm 3.14$	$45.1 σ$	$92.9 %$
$ψ$ (rad)	0.500	$2.08 \pm 0.60$	$1.57 \pm 1.57$	$38.7 σ$	$91.5 %$
$α$ (rad)	3.570	$3.57 \pm 0.17$	$3.23 \pm 0.85$	$52.9 σ$	$93.2 %$
$δ$ (rad)	$- 0.410$	$- 0.41 \pm 0.31$	$- 0.004 \pm 0.32$	$48.3 σ$	$92.6 %$

Table 8. GW150912 parameter recovery results.

Param.	LALInference	2nd-Order Recovery	1st-Order Recovery	Bias Reduction	Improve
$M_{c}$ ( $M_{⊙}$ )	114.45	$114.45 \pm 6.52$	$118.32 \pm 8.76$	$28.5 σ$	$94.2 %$
q	0.832	$0.832 \pm 0.132$	$0.785 \pm 0.186$	$32.7 σ$	$95.8 %$
$χ_{1}$	0.000	$0.000 \pm 0.120$	$- 0.085 \pm 0.214$	$45.3 σ$	$97.1 %$
$χ_{2}$	0.000	$0.000 \pm 0.115$	$0.072 \pm 0.198$	$52.8 σ$	$97.8 %$
$d_{L}$ (Mpc)	3268.1	$3268.1 \pm 1099.9$	$2985.4 \pm 1567.3$	$36.9 σ$	$96.3 %$
$ι$ (rad)	2.196	$2.196 \pm 0.780$	$1.854 \pm 1.125$	$22.4 σ$	$92.7 %$
$t_{c}$ (ms)	0.000	$0.033 \pm 10.66$	$2.847 \pm 18.45$	$18.3 σ$	$91.5 %$
$ϕ_{c}$ (rad)	0.000	$3.111 \pm 1.844$	$2.674 \pm 2.893$	$26.1 σ$	$93.8 %$
$ψ$ (rad)	0.500	$1.516 \pm 0.911$	$1.228 \pm 1.467$	$31.5 σ$	$95.2 %$
$α$ (rad)	5.503	$5.503 \pm 0.836$	$5.218 \pm 1.245$	$24.8 σ$	$93.4 %$
$δ$ (rad)	$- 0.198$	$- 0.198 \pm 0.435$	$- 0.324 \pm 0.682$	$29.7 σ$	$94.6 %$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Article metric data becomes available approximately 24 hours after publication online.