Machine Learning-Driven Conservative-to-Primitive Conversion in Hybrid Piecewise Polytropic and Tabulated Equations of State

Kacmaz, Semih; Haas, Roland; Huerta, E. A.

doi:10.3390/sym17091409

by Semih Kacmaz^{1,2,*}, Roland Haas^{1,2,3} and E. A. Huerta^{1,4,5}

¹ Department of Physics, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA

² National Center for Supercomputing Applications, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA

³ Department of Physics and Astronomy, University of British Columbia, Vancouver, BC V6T 1Z1, Canada

⁴ Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA

⁵ Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(9), 1409; https://doi.org/10.3390/sym17091409

Submission received: 10 July 2025 / Revised: 17 August 2025 / Accepted: 25 August 2025 / Published: 29 August 2025

(This article belongs to the Special Issue Symmetry in Gravitational Wave Physics)

Abstract

We present a novel machine learning (ML)-based method to accelerate conservative-to-primitive inversion, focusing on hybrid piecewise polytropic and tabulated equations of state. Traditional root-finding techniques are computationally expensive, particularly for large-scale relativistic hydrodynamics simulations. To address this, we employ feedforward neural networks (NNC2PS and NNC2PL), trained in PyTorch (2.0+) and optimized for GPU inference using NVIDIA TensorRT (8.4.1), achieving significant speedups with minimal accuracy loss. The NNC2PS model achieves $L_{1}$ and $L_{\infty}$ errors of $4.54 \times 10^{-7}$ and $3.44 \times 10^{-6}$, respectively, while the NNC2PL model exhibits even lower error values. TensorRT optimization with mixed-precision deployment substantially accelerates performance compared to traditional root-finding methods. Specifically, the mixed-precision TensorRT engine for NNC2PS achieves inference speeds approximately 400 times faster than a traditional single-threaded CPU implementation for a dataset size of 1,000,000 points. Ideal parallelization across an entire compute node in the Delta supercomputer (dual AMD 64-core 2.45 GHz Milan processors and 8 NVIDIA A100 GPUs with 40 GB HBM2 RAM and NVLink) predicts a 25-fold speedup for TensorRT over an optimally parallelized numerical method when processing 8 million data points. Moreover, the ML method exhibits sub-linear scaling with increasing dataset sizes. We release the scientific software developed, enabling further validation and extension of our findings. By exploiting the underlying symmetries within the equation of state, these findings highlight the potential of ML, combined with GPU optimization and model quantization, to accelerate conservative-to-primitive inversion in relativistic hydrodynamics simulations.

Keywords:

machine learning; relativistic hydrodynamics; conservative-to-primitive conversion

1. Introduction

In numerical relativity, accurately modeling astrophysical systems such as neutron star mergers [1,2,3,4,5,6,7,8,9,10,11,12,13,14] relies on solving the equations of relativistic hydrodynamics, which involve the inversion of conservative-to-primitive (C2P) variable relations [15,16,17]. This process typically requires computationally expensive root-finding algorithms, such as Newton-Raphson methods, and interpolation of complex, multi-dimensional equations of state (EOS) tables [18,19]. These methods, while robust, incur significant computational costs and can lead to inefficiencies, particularly in large-scale simulations, where up to billions of C2P calls may be required per time step. The inherent complexity of this mapping, however, often conceals underlying symmetries and lower-dimensional relationships that a machine learning model can be trained to recognize and exploit.

In view of these considerations, and taking into account the advent of GPU-based exascale supercomputers such as Aurora and Frontier and ongoing efforts to port relativistic hydrodynamics software into GPUs [20,21,22], this work explores the use of machine learning (ML) algorithms that leverage GPU-accelerated computing for C2P conversion. CPU-based algorithms for C2P conversion typically involve an iterative non-linear root finder, for which the number of iterations required to achieve a given target accuracy depends on the input data, resulting in different runtimes for different points of the numerical grid. This limits the potential to use SIMD (for CPUs) or SIMT (for GPUs) parallelism, reducing the effective rate of conversion achievable using these schemes. An ML approach with its more predictable runtime and regular memory access pattern may help alleviate these issues. Indeed, this work is motivated by recent studies that have explored the potential of ML to replace traditional root-finding approaches for C2P inversion [23]. Specifically, neural networks have shown promise in accelerating the C2P inversion process while maintaining high accuracy [23]. Building on this, the present work introduces a novel approach that leverages ML to accelerate the recovery of primitive variables from conserved variables in relativistic hydrodynamics simulations, with particular focus on hybrid piecewise polytropic and tabulated EOS. These EOS models provide more realistic descriptions of the dense interior of neutron stars, yet their complexity makes the traditional C2P procedure very computationally expensive.

To help address these computational challenges, we present a suite of feedforward neural networks trained to directly map conserved variables to primitive variables, bypassing the need for traditional iterative solvers. In particular, we employ a hybrid approach, utilizing the flexibility of neural networks to handle the challenges posed by complex EOS models. Our models are implemented using modern deep learning tools, such as PyTorch, and optimized for GPU inference with NVIDIA TensorRT [24]. Through comprehensive performance benchmarking, we demonstrate that our approach significantly outperforms traditional numerical methods in terms of speed, particularly when using mixed-precision deployment on modern hardware accelerators like NVIDIA A100 GPUs in the Delta supercomputer.

We evaluate the scalability of our ML models by comparing their inference performance against a single-threaded CPU implementation of a traditional numerical method from the RePrimAnd library [25]. The benchmark was conducted on a Delta supercomputer compute node, featuring dual AMD 64-core 2.45 GHz Milan processors, 8 NVIDIA A100 GPUs (40 GB HBM2 RAM), and NVLink. For dataset sizes ranging from 25,000 to 1,000,000 points, the numerical method exhibited linear scaling of inference time. In contrast, TensorRT-optimized and TorchScript-based neural networks achieved substantially faster inference, typically demonstrating sub-linear scaling. We investigate two feedforward neural network architectures: a smaller network (NNC2PS) and a larger one (NNC2PL). Notably, mixed-precision TensorRT engines delivered impressive performance, with the NNC2PS engine processing 1,000,000 points in 8.54 ms, compared to 3490 ms for the numerical method. Ideal parallelization across the entire node (64 CPU cores that support up to 128 threads and 8 GPUs) suggests a 25-fold speedup for TensorRT over the optimally parallelized numerical method when processing 8 million points. These results demonstrate the scalability and efficiency of our ML-based methods, offering significant improvements for high-throughput numerical relativistic hydrodynamics simulations.
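The two headline speedups can be checked from the timings quoted above. The short sketch below does so under one plausible reading of "ideal parallelization", which is our assumption rather than a statement of the authors' exact procedure: the CPU cost scales linearly with the number of points and divides evenly over 128 threads, while each of the 8 GPUs processes 1,000,000 points concurrently.

```python
# Timings quoted in the text for 1,000,000 points.
t_numerical_ms = 3490.0   # single-threaded CPU root finder
t_tensorrt_ms = 8.54      # mixed-precision TensorRT engine for NNC2PS, one GPU

# Single GPU versus single CPU thread.
single_speedup = t_numerical_ms / t_tensorrt_ms

# Idealized node-level estimate for 8,000,000 points (assumption: perfect scaling).
n_points = 8_000_000
cpu_threads = 128
gpus = 8

# CPU: linear scaling in the number of points, shared evenly among all threads.
t_cpu_node_ms = t_numerical_ms * (n_points / 1_000_000) / cpu_threads

# GPU: the points are split evenly, so each GPU handles 1,000,000 of them.
points_per_gpu = n_points / gpus
t_gpu_node_ms = t_tensorrt_ms * (points_per_gpu / 1_000_000)

node_speedup = t_cpu_node_ms / t_gpu_node_ms

print(f"single GPU vs. single thread: {single_speedup:.0f}x")
print(f"full node, ideal scaling:     {node_speedup:.1f}x")
```

Under these assumptions the first ratio comes out slightly above 400 and the second slightly above 25, in line with the figures stated in the text.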

This article is structured as follows. Section 2 introduces the EOS considered in this study, along with the methodologies employed for designing, training, validating, and testing the ML models. In Section 3, we present our key results, including an assessment of the accuracy of the ML models across different model types and quantization schemes. Additionally, we provide a comparison of the computational performance of the ML models relative to traditional root-finding methods. Finally, Section 4 offers a summary of the findings and outlines potential avenues for future research.

2. Methods

We present an ML-based model with the potential to accelerate the recovery of primitive variables from conserved variables in general relativistic hydrodynamics (GRHD) simulations, specifically focusing on scenarios employing hybrid piecewise polytropic EOS and tabulated EOS. As in traditional approaches, this conversion requires inverting the conservative-to-primitive map, a process often reliant on computationally expensive root-finding algorithms. While previous work has demonstrated the success of machine learning for this task with the $\Gamma$-law EOS [23], here, we investigate its application to hybrid piecewise polytropic EOS, which offers a more realistic representation of neutron star interiors, as well as the tabulated EOS, which incorporates the current nuclear physics model of neutron matter. To evaluate the performance of our neural network, we use a traditional CPU-based root-finding algorithm (provided by the RePrimAnd library) as a baseline for comparison. Our aim is to demonstrate the speed advantages of the neural network approach for conservative-to-primitive variable conversion. Our network is implemented using PyTorch (2.0+), and the inference speed tests are performed using libtorch and the C++ API of NVIDIA TensorRT (8.4.1). While our numerical experiments are conducted in flat spacetime for simplicity, the C2P inversion is a local operation. Therefore, our method is directly applicable to general relativistic hydrodynamics simulations without loss of generality, as one can always perform the inversion in a local inertial frame.

In general relativity, the equations of relativistic hydrodynamics can be expressed in a conservation form suitable for numerical implementation. Specifically, in a flat spacetime, they constitute the following first-order, flux-conservative hyperbolic system:

$$\frac{1}{\sqrt{-g}} \left( \frac{\partial \sqrt{\gamma}\, u}{\partial x^{0}} + \frac{\partial \sqrt{-g}\, F^{i}(u)}{\partial x^{i}} \right) = 0, \tag{1}$$

where $g = \det(g_{\mu\nu})$ is the metric determinant, and $\gamma = \det(\gamma_{ij})$ is the determinant of the three-metric induced on each spacelike hypersurface. The state vector of the conserved variables is $u = (D, S_{j}, \tau)$, and the flux vector is given by

$$F^{i} = \left( D \left( v^{i} - \frac{\beta^{i}}{\alpha} \right),\; S_{j} \left( v^{i} - \frac{\beta^{i}}{\alpha} \right) + p\, \delta_{j}^{i},\; \tau \left( v^{i} - \frac{\beta^{i}}{\alpha} \right) + p\, v^{i} \right), \tag{2}$$

where $\alpha$ is the lapse function and $\beta^{i}$ the spacelike shift vector: two kinematic variables describing the evolution of spacelike foliations in spacetime as in a typical $3 + 1$ (ADM) formulation.

The five quantities satisfying Equation (1), all measured by an Eulerian observer sitting at a spacelike hypersurface, are the relativistic rest-mass density, $D$, the three components of the momentum density, $S_{j}$, and the energy density relative to the rest-mass density, $\tau = E - D$. These are related to the primitive variables (rest-mass density, $\rho$, three-velocity, $v_{i}$, specific internal energy, $\epsilon$, and pressure, $p$) through

$$\begin{aligned} D &= \rho W, \\ S_{j} &= \rho h W^{2} v_{j}, \\ \tau &= \rho h W^{2} - p - D, \end{aligned} \tag{3}$$

where $W = 1 / \sqrt{1 - \gamma_{ij} v^{i} v^{j}}$ is the Lorentz factor, and $h = 1 + \epsilon + p / \rho$ is the specific enthalpy.

Incorporating the EOS into the picture provides the thermodynamical information linking the pressure to the fluid’s rest-mass density and internal energy, which, combined with the definitions above, closes the system of equations given in Equation (1) [26,27,28].
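To make the asymmetry between the two directions concrete, the sketch below implements the explicit forward map of Equation (3) and a simple iterative inverse. It is only an illustration under stated assumptions: the EOS is an ideal $\Gamma$-law gas, $p = (\Gamma - 1) \rho \epsilon$, the velocity is one-dimensional in flat spacetime, and the root finder is plain bisection on the pressure. The function names are ours, and this is not the algorithm used by the RePrimAnd library.

```python
import math

GAMMA = 5.0 / 3.0  # assumption: ideal Gamma-law EOS, p = (GAMMA - 1) * rho * eps


def prim_to_cons(rho, v, eps, p):
    """Forward map, Equation (3), in flat spacetime with velocity along x only."""
    W = 1.0 / math.sqrt(1.0 - v * v)      # Lorentz factor
    h = 1.0 + eps + p / rho               # specific enthalpy
    D = rho * W
    S = rho * h * W * W * v
    tau = rho * h * W * W - p - D
    return D, S, tau


def primitives_for_pressure(p, D, S, tau):
    """Primitive variables implied by the conserved state and a trial pressure p."""
    Q = tau + D + p                       # equals rho * h * W^2
    v = S / Q
    W = 1.0 / math.sqrt(1.0 - v * v)
    rho = D / W
    eps = (tau + D * (1.0 - W) + p * (1.0 - W * W)) / (D * W)
    return rho, v, eps


def cons_to_prim(D, S, tau, tol=1e-12, max_iter=200):
    """Inverse map by bisection on f(p) = p_EOS(rho(p), eps(p)) - p."""
    def f(p):
        rho, _, eps = primitives_for_pressure(p, D, S, tau)
        return (GAMMA - 1.0) * rho * eps - p

    # Lower bound keeps |v| < 1; upper bound follows from rho * eps <= tau.
    lo = max(0.0, abs(S) - tau - D) * (1.0 + 1e-8)
    hi = (GAMMA - 1.0) * tau * (1.0 + 1e-8)
    if f(lo) < 0.0 or f(hi) > 0.0:
        raise ValueError("root not bracketed; the conserved state may be unphysical")

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:   # f decreases with p, so the root lies above mid
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break

    p = 0.5 * (lo + hi)
    rho, v, eps = primitives_for_pressure(p, D, S, tau)
    return rho, v, eps, p


# Round trip: primitives -> conserved -> primitives.
rho0, v0, eps0 = 1.0e-3, 0.5, 0.1
p0 = (GAMMA - 1.0) * rho0 * eps0
D, S, tau = prim_to_cons(rho0, v0, eps0, p0)
rho1, v1, eps1, p1 = cons_to_prim(D, S, tau)
```

The number of bisection steps depends on the input state and the requested tolerance, which is the data-dependent runtime that a fixed-size neural network avoids.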

We will first focus on the hybrid piecewise polytropic EOS, which was introduced for simplified simulations of stellar collapse to model the stiffening of the nuclear EOS at nuclear density and include thermal pressure during the postbounce phase [29]. In gravitational-wave science, it is more commonly used as described in Read et al. [30], where it enables gravitational-wave parameter estimation and waveform modeling by effectively capturing macroscopic neutron star observables with minimal parameters. The structure of this EOS consists of multiple cold polytropes, defined by parameters $K_{0}, K_{1}, \dots, K_{\mathrm{nsegments} - 1}$ and $\Gamma_{0}, \Gamma_{1}, \dots, \Gamma_{\mathrm{nsegments} - 1}$, where nsegments denotes the total number of segments. Additionally, it includes a thermal $\Gamma$-law component characterized by $\Gamma_{\mathrm{th}}$. Continuity of pressure and internal energy across segments, in accordance with the first law of thermodynamics, is ensured after appropriately setting initial values for the polytropic indices, density breakpoints (denoted $\rho_{\mathrm{breaks}}$), and other relevant parameters. For this EOS, the polytropic indices ($\Gamma_{i}$), the density breakpoints ($\rho_{\mathrm{breaks}}$), and the first segment's polytropic constant ($K_{0}$) are treated as free parameters. Subsequent constants ($K_{i}$ for $i > 0$ and all $a_{i}$) are then determined by enforcing continuity of pressure and internal energy across the breakpoints. In this context, pressure and specific internal energy components in each density interval are given by

$$\begin{aligned} p_{\mathrm{cold}} &= K_{i} \rho^{\Gamma_{i}}, \\ \epsilon_{\mathrm{cold}} &= a_{i} + \frac{K_{i}}{\Gamma_{i} - 1} \rho^{\Gamma_{i} - 1}, \\ p_{\mathrm{th}} &= (\Gamma_{\mathrm{th}} - 1)\, \rho\, (\epsilon - \epsilon_{\mathrm{cold}}), \\ p &= p_{\mathrm{th}} + p_{\mathrm{cold}}, \end{aligned} \tag{4}$$

where $a_{i}$ is the segment-specific constant, and the rest-mass density, $\rho$, is assumed to fall into one of the segments delimited by the $\rho_{\mathrm{breaks}}$. These equations apply to segment $i$, where the rest-mass density $\rho$ is in the range $\rho_{\mathrm{break}, i - 1} < \rho < \rho_{\mathrm{break}, i}$.
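A minimal NumPy sketch of Equation (4) may help here. It derives the dependent constants $K_{i}$ and $a_{i}$ from the continuity conditions and then evaluates the pressure. The numerical parameters are the SLy values quoted in Section 2.1.1; the function names and the use of `np.searchsorted` to pick the segment are our own choices, not taken from the released code.

```python
import numpy as np

# SLy four-segment parameters quoted in Section 2.1.1 (geometrized units).
GAMMAS = np.array([1.3569, 3.0050, 2.9880, 2.8510])
RHO_BREAKS = np.array([2.3674e-4, 8.1147e-4, 1.6191e-3])
K0 = 8.9493e-2
GAMMA_TH = 5.0 / 3.0


def continuity_constants(gammas, rho_breaks, k0, a0=0.0):
    """Fix K_i and a_i (i > 0) so that p_cold and eps_cold are continuous."""
    K, a = [k0], [a0]
    for i, rb in enumerate(rho_breaks, start=1):
        g_prev, g = gammas[i - 1], gammas[i]
        # Pressure continuity: K_{i-1} rb^g_prev = K_i rb^g.
        K.append(K[-1] * rb ** (g_prev - g))
        # Internal-energy continuity at the same breakpoint.
        eps_left = a[-1] + K[-2] / (g_prev - 1.0) * rb ** (g_prev - 1.0)
        a.append(eps_left - K[-1] / (g - 1.0) * rb ** (g - 1.0))
    return np.array(K), np.array(a)


K, A = continuity_constants(GAMMAS, RHO_BREAKS, K0)


def cold_parts(rho):
    """Cold pressure and cold specific internal energy for density rho."""
    rho = np.asarray(rho, dtype=float)
    i = np.searchsorted(RHO_BREAKS, rho)          # segment index for each density
    g = GAMMAS[i]
    p_cold = K[i] * rho ** g
    eps_cold = A[i] + K[i] / (g - 1.0) * rho ** (g - 1.0)
    return p_cold, eps_cold


def pressure(rho, eps):
    """Total pressure of Equation (4): cold polytrope plus thermal Gamma-law part."""
    p_cold, eps_cold = cold_parts(rho)
    p_th = (GAMMA_TH - 1.0) * np.asarray(rho, dtype=float) * (eps - eps_cold)
    return p_cold + p_th
```

With the constants fixed this way, evaluating `pressure` just below and just above a breakpoint gives matching values, which is the property the continuity conditions are meant to guarantee.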

In addition to the hybrid piecewise polytropic EOS-based model, we will train a separate network to infer the conservative-to-primitive transformation utilizing the tabulated EOS data. Specifically, we will use the Lattimer-Swesty EOS with a compressibility parameter $K = 220$ MeV (hereafter referred to as LS220 EOS), due to its prevalence and historical significance. Our training dataset is based on a modern, updated version of LS220 EOS constructed and made available by Schneider, Roberts, and Ott in a more recent study [31].

Below, we outline the dataset preparation, model architecture, training process, and methods used in inference speed testing with libtorch and NVIDIA TensorRT to evaluate computational efficiency.

2.1. Data

2.1.1. Piecewise Polytropic EOS-Based Model Data

We generate a dataset of 500,000 samples using geometrized units where $G = c = M_{\odot} = 1$. Without loss of generality, we furthermore use a Minkowski metric $g_{\mu\nu} = \mathrm{diag}(-1, +1, +1, +1)$. The rest-mass density, $\rho$, is sampled uniformly from $[2 \times 10^{-5}, 2 \times 10^{-3}]$, and the fluid's three-velocity is assumed one-dimensional along the x-axis, sampled uniformly from $v_{x} \in (0, 0.721)$. These ranges are chosen to be representative of the conditions found in binary neutron star mergers and to facilitate a direct comparison with the previous work in [23]. Following Ref. [30], we use an SLy four-segment piecewise polytropic EOS with segment-wise polytropic indices $\Gamma = [1.3569, 3.0050, 2.9880, 2.8510]$. The first segment's polytropic constant, $K_{0}$, is set to $8.9493 \times 10^{-2}$. Subsequent polytropic constants, $K_{i}$, are determined by enforcing pressure continuity. Similarly, the first segment's constant, $a_{0}$, is set to zero, while subsequent $a_{i}$ values ensure continuity of internal energy. The density breaks for the segments are specified at $\rho = 2.3674 \times 10^{-4}$, $8.1147 \times 10^{-4}$, and $1.6191 \times 10^{-3}$. The thermal component has an adiabatic index of $\Gamma_{\mathrm{th}} = 5/3$. Additionally, the thermal component of the specific internal energy, $\epsilon_{\mathrm{th}}$, is sampled uniformly from $[0, 2]$ (where $\epsilon_{\mathrm{th}} = \epsilon - \epsilon_{\mathrm{cold}}$). A structured dataset is then constructed by converting the primitive variables to conserved variables using the standard relativistic hydrodynamic relations given in Equation (3). In this dataset, conserved variables serve as input features, and the pressure is the target variable. The resulting dataset is then split into training, validation, and test sets, with each set fully standardized to zero mean and unit variance to ensure equal contribution of all features during neural network training (Figure 1).
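The sampling procedure described above can be sketched in a few lines of NumPy. Several details are assumptions on our part and are marked as such in the comments: the reduced sample count, the 80/10/10 split ratios, and the choice to standardize all three splits with statistics computed on the training split.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000  # the text uses 500,000 samples; reduced here so the sketch runs quickly

# SLy parameters quoted in the text (geometrized units).
GAMMAS = np.array([1.3569, 3.0050, 2.9880, 2.8510])
BREAKS = np.array([2.3674e-4, 8.1147e-4, 1.6191e-3])
K0, GAMMA_TH = 8.9493e-2, 5.0 / 3.0

# Dependent constants from continuity of pressure and internal energy, with a_0 = 0.
K, A = [K0], [0.0]
for i, rb in enumerate(BREAKS, start=1):
    g_prev, g = GAMMAS[i - 1], GAMMAS[i]
    K.append(K[-1] * rb ** (g_prev - g))
    A.append(A[-1] + K[-2] / (g_prev - 1) * rb ** (g_prev - 1)
             - K[-1] / (g - 1) * rb ** (g - 1))
K, A = np.array(K), np.array(A)

# Sample the primitive variables uniformly over the quoted ranges.
rho = rng.uniform(2e-5, 2e-3, N)
v_x = rng.uniform(0.0, 0.721, N)
eps_th = rng.uniform(0.0, 2.0, N)

# Hybrid piecewise polytropic EOS, Equation (4).
seg = np.searchsorted(BREAKS, rho)
g = GAMMAS[seg]
eps = A[seg] + K[seg] / (g - 1) * rho ** (g - 1) + eps_th
p = K[seg] * rho ** g + (GAMMA_TH - 1) * rho * eps_th

# Primitive-to-conserved conversion, Equation (3), in flat spacetime.
W = 1.0 / np.sqrt(1.0 - v_x ** 2)
h = 1.0 + eps + p / rho
D = rho * W
S_x = rho * h * W ** 2 * v_x
tau = rho * h * W ** 2 - p - D

X = np.column_stack([D, S_x, tau])   # input features
y = p[:, None]                       # target

# Assumption: 80/10/10 split; the ratios are not stated in the text.
idx = rng.permutation(N)
n_train, n_val = int(0.8 * N), int(0.1 * N)
train, val, test = np.split(idx, [n_train, n_train + n_val])

# Assumption: training-split statistics are reused for validation and test.
x_mean, x_std = X[train].mean(axis=0), X[train].std(axis=0)
y_mean, y_std = y[train].mean(axis=0), y[train].std(axis=0)
X_train, y_train = (X[train] - x_mean) / x_std, (y[train] - y_mean) / y_std
X_val, y_val = (X[val] - x_mean) / x_std, (y[val] - y_mean) / y_std
X_test, y_test = (X[test] - x_mean) / x_std, (y[test] - y_mean) / y_std
```

Keeping `y_mean` and `y_std` around is useful later, since the inverse normalization in the loss function needs them to map network outputs back to physical pressures.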

2.1.2. Tabulated EOS-Based Model Data

To generate the training data for the tabulated EOS-based model, we sample from a provided EOS table and follow a procedure similar to the one described in Section 2.1.1. We begin by reading in the EOS table, which contains the variables electron fraction ($Y_{e}$), temperature ($T$), rest-mass density ($\rho$), specific internal energy ($\epsilon$), and pressure ($p$). These quantities are stored in logarithmic form in the table and are extracted accordingly. For each data point, a random one-dimensional three-velocity, $v_{x}$, is sampled uniformly on a linear scale from the interval $(0, 0.721)$. Values for electron fraction and temperature are also sampled uniformly on a linear scale from their respective ranges in the table. The rest-mass density is chosen by randomly selecting one of the grid points from the table, which are logarithmically spaced. For this study, the corresponding values of $p$ and $\epsilon$ are fetched directly from the table without interpolation to ensure the training data perfectly represents the tabulated EOS. The primitive variables are then converted into conserved variables using standard relativistic hydrodynamics relations given in Equation (3). A total of 1,000,000 data points are generated using this process [32]. Similarly to the hybrid piecewise polytropic EOS-based model, the data is split into training, validation, and test sets, with each set fully standardized to zero mean and unit variance before being used for neural network training.

2.2. Model Architecture

2.2.1. Piecewise Polytropic EOS-Based Model

For the hybrid piecewise polytropic EOS-based model, we tested two feedforward neural networks of varying complexity to represent the conservative-to-primitive variable transformation. Each network takes as input the three conserved variables $(D, S_{x}, \tau)$ (Equation (3)) and outputs the pressure $p$ (Equation (4)), assuming the remaining momentum density components are zero for simplicity. This architecture is designed to effectively learn the hidden symmetries in the relationship between the conserved and primitive variables, approximating the intricate C2P transformation without explicit root-finding. After experimenting with multiple multi-layer perceptron (MLP) architectures, as detailed in Appendix A, we identified two models that offered an optimal balance between accuracy, speed, and trainability. The smaller model, NNC2PS, features two hidden layers with 600 and 200 neurons, while the larger model, NNC2PL, contains five hidden layers with 1024, 512, 256, 128, and 64 neurons (Figure 2).

ReLU activation functions were applied to the hidden layers to introduce nonlinearity, with the output layer kept linear. We found these models strike an effective balance between complexity and performance, making them well-suited for our task.
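The two architectures are small enough to write down directly. The following PyTorch sketch builds them from the layer widths given above; the helper name `make_mlp` is ours, and the released code may organize the layers differently.

```python
import torch
from torch import nn


def make_mlp(in_features, hidden, out_features=1):
    """Fully connected network with ReLU on hidden layers and a linear output."""
    layers, width = [], in_features
    for h in hidden:
        layers += [nn.Linear(width, h), nn.ReLU()]
        width = h
    layers.append(nn.Linear(width, out_features))
    return nn.Sequential(*layers)


def n_params(model):
    """Total number of trainable weights and biases."""
    return sum(p.numel() for p in model.parameters())


# Inputs are (D, S_x, tau); the single output is the pressure.
nnc2ps = make_mlp(3, [600, 200])
nnc2pl = make_mlp(3, [1024, 512, 256, 128, 64])

batch = torch.rand(5, 3)
out_small = nnc2ps(batch)
out_large = nnc2pl(batch)
```

Counting weights and biases layer by layer gives 122,801 parameters for the smaller network and 701,441 for the larger one, which gives a sense of the difference in inference cost between the two.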

2.2.2. Tabulated EOS-Based Model

For the tabulated EOS-based model, we use a single feedforward neural network, NNC2P_Tabulated, to perform an essentially equivalent task with minor differences. This model takes as input the log-scaled variables $(\log D, \log S_{x}, \log \tau, \log Y_{e})$ and outputs the log-scaled pressure, $\log p$ (Equation (4)), assuming $S_{y}$ and $S_{z}$ are zero for simplicity as before. Using log-scaled inputs and outputs aligns with the format of the tabulated EOS values, which are also stored in logarithmic form to accommodate the typically large values of these physical quantities. This approach reduces the range of feature magnitudes, facilitating more stable learning dynamics and better alignment with the source data.

We explored several MLP architectures, varying in parameters, layers, and training strategies, to identify an optimal design for our task. Among these, an architecture identical to NNC2PL, featuring five hidden layers with 1024, 512, 256, 128, and 64 neurons, respectively, detailed in Section 2.2.1 above, emerged as a robust choice. This architecture effectively balanced capacity and efficiency, enabling accurate learning of log-scaled pressure from tabulated EOS data (Figure 2).

2.3. Training Approach

We use a similar procedure to optimize all neural networks: NNC2PS, NNC2PL, and the tabulated baseline model, NNC2P_Tabulated, with minor tweaks. Training was performed on a single NVIDIA A100 GPU on the Delta cluster. For the hybrid piecewise polytropic EOS-based models (NNC2PS and NNC2PL), we employed a custom, physics-informed loss function that penalizes negative pressure predictions. This loss function is a modified mean-squared error:

$$L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_{i}(\theta) - y_{i} \right)^{2} + q \cdot \sum_{i=1}^{n} \mathrm{ReLU}\left( -N^{-1}\left( \hat{y}_{i}(\theta) \right) \right), \tag{5}$$

where $\hat{y}_{i}(\theta)$ represents the network's prediction for sample $i$, $y_{i}$ is the corresponding target value, ReLU is the familiar rectified linear unit defined by $\mathrm{ReLU}(x) = \max(0, x)$, and $N^{-1}(\cdot)$ represents an inverse normalization procedure based on the training data statistics. The penalty factor, $q$, was optimized for each model, with $q = 150$ for NNC2PS and $q = 350$ for NNC2PL. These values consistently suppressed negative pressure predictions on the test set. For the tabulated EOS model (NNC2P_Tabulated), the structure of the data precluded negative predictions, so a standard mean-squared error loss function was used.
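Equation (5) translates almost line by line into PyTorch. In the sketch below, the inverse normalization is assumed to undo a zero-mean, unit-variance standardization of the target, so it multiplies by the training standard deviation and adds the training mean. The function name and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F


def penalized_mse(pred, target, q, y_mean, y_std):
    """Mean-squared error plus a penalty on negative physical pressures, Eq. (5)."""
    mse = torch.mean((pred - target) ** 2)
    # Assumed form of N^{-1}: undo the standardization of the target variable.
    physical = pred * y_std + y_mean
    # ReLU(-p) is zero for p >= 0 and grows linearly for negative pressures.
    penalty = F.relu(-physical).sum()
    return mse + q * penalty


# Small worked example: the second prediction maps to a negative pressure.
pred = torch.tensor([0.0, -2.0])
target = torch.tensor([0.0, 0.0])
loss = penalized_mse(pred, target, q=150.0, y_mean=0.5, y_std=1.0)
```

In this example the mean-squared error is 2, the de-normalized predictions are 0.5 and -1.5, and the penalty term is 150 times 1.5, so the total loss is 227. Because the penalty is summed rather than averaged, a single negative prediction in a batch is weighted heavily relative to the error term.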

All models were trained using the Adam optimizer with an initial learning rate of $3 \times 10^{-4}$. A learning rate scheduler reduced the learning rate by a factor of 0.5 if the validation loss failed to improve for five consecutive epochs. NNC2PS and NNC2PL were trained for 85 epochs, while NNC2P_Tabulated required 250 epochs. These epoch counts were determined empirically by monitoring the validation loss, with training stopped once the loss had clearly converged. The use of a learning rate scheduler, which reduces the learning rate when the validation loss plateaus, also serves as a form of early stopping. For each epoch, the model was set to training mode, and data was loaded in batches of 32 onto the GPU. This batch size was chosen based on experimentation to balance the number of epochs and overall time to convergence. While training with larger batches and multiple GPUs (using PyTorch's DataParallel module or other approaches) is possible, we found no significant advantage regarding the total time to convergence and ultimately opted for this simpler, more portable approach. For each batch, optimizer gradients were reset before generating predictions, and the loss was computed using respective loss functions. Backpropagation was then performed to update the model parameters.

After completing the training phase for each epoch, the model's performance was evaluated on the validation dataset, accumulating the validation loss in the same way as the training loss. Both losses were normalized by the size of the respective datasets and stored for further analysis, specifically for signs of potential overtraining.
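The training and validation procedure described in this section can be condensed into the following sketch. It runs on synthetic stand-in data with a plain mean-squared error and only three epochs so that it finishes in seconds; those choices, and the mapping of "five consecutive epochs" to `patience=5`, are assumptions made for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in data (assumption): 3 standardized inputs, 1 target.
X = torch.randn(512, 3)
y = X.sum(dim=1, keepdim=True)
train_set, val_set = random_split(TensorDataset(X, y), [448, 64])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

# Same layer widths as NNC2PS.
model = nn.Sequential(
    nn.Linear(3, 600), nn.ReLU(),
    nn.Linear(600, 200), nn.ReLU(),
    nn.Linear(200, 1),
).to(device)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
# Halve the learning rate when the validation loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)

history = []
for epoch in range(3):  # the text reports 85 or 250 epochs
    model.train()
    train_loss = 0.0
    for xb, yb in train_loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()              # reset gradients before the forward pass
        loss = criterion(model(xb), yb)
        loss.backward()                    # backpropagation
        optimizer.step()
        train_loss += loss.item() * xb.size(0)

    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for xb, yb in val_loader:
            xb, yb = xb.to(device), yb.to(device)
            val_loss += criterion(model(xb), yb).item() * xb.size(0)

    # Normalize the accumulated losses by the dataset sizes.
    train_loss /= len(train_set)
    val_loss /= len(val_set)
    scheduler.step(val_loss)
    history.append((train_loss, val_loss))
```

Comparing the two entries of each `history` tuple over the epochs is the simplest way to spot overtraining: a validation loss that rises while the training loss keeps falling is the usual warning sign.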

2.4. Inference Speed Tests

In our inference speed tests, we evaluated two main approaches for efficient deployment: a TorchScript model and NVIDIA’s TensorRT optimized engines. These tests were conducted to measure and compare inference speed under typical deployment conditions, aiming to take advantage of the A100 GPU on Delta.

2.4.1. TorchScript Deployment

To prepare models for inference with TorchScript, we first saved a scripted version of the model, which is compatible with PyTorch’s JIT compiler, optimizing runtime execution without modifying the model’s core structure. TorchScript’s scripting provides some degree of optimization, enabling faster model execution than standard PyTorch models but without the hardware-level optimizations that TensorRT offers.
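As a rough illustration of this step, the sketch below scripts a network with the NNC2PS layer widths, saves it, and reloads it to confirm that the outputs are unchanged. The file name and the temporary directory are placeholders; the saved file is the kind of artifact that libtorch can load from C++.

```python
import os
import tempfile

import torch
from torch import nn

# Untrained network with the NNC2PS layer widths, in evaluation mode.
model = nn.Sequential(
    nn.Linear(3, 600), nn.ReLU(),
    nn.Linear(600, 200), nn.ReLU(),
    nn.Linear(200, 1),
).eval()

# Compile the model with the TorchScript JIT and serialize it to disk.
scripted = torch.jit.script(model)
path = os.path.join(tempfile.mkdtemp(), "nnc2ps_scripted.pt")  # placeholder name
scripted.save(path)

# Reload the scripted module and check that it reproduces the original outputs.
reloaded = torch.jit.load(path)
x = torch.rand(4, 3)
with torch.no_grad():
    same = torch.allclose(model(x), reloaded(x))
```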

2.4.2. TensorRT Deployment

For TensorRT, we explored both FP32 (unquantized) and FP16-quantized engines, ultimately deciding not to pursue INT8 quantization due to accuracy degradation observed in initial tests. After extensive testing, we opted for dynamic engine building with a batch size determined by the total size of the expected dataset, as this approach provided the best balance between performance and flexibility for our hardware and model structure. Note that constructing an optimal engine in TensorRT is a nuanced process, influenced by multiple factors including model architecture, hardware specifications, intended batch sizes during inference, and input data. Therefore, achieving the best results often involves iterative tuning and profiling to adapt the engine to the specific deployment environment and workload requirements. Below, we summarize the engine-building process we followed:

Model Export to ONNX: First, we exported the PyTorch model to the ONNX format. This conversion enables interoperability with TensorRT, which uses ONNX as its primary model input format.
TensorRT Engine Building: Using TensorRT’s Python API, we constructed both FP32 and FP16 engines. A logger was initialized for verbose logging to capture potential issues during engine building. With the TensorRT Builder, we created a network definition with explicit batch handling, which is essential for dynamic batching configurations.
Parsing and Validating the ONNX Model: We loaded the ONNX model into TensorRT, where the OnnxParser validated and parsed the model. Parsing errors, if any, were logged for troubleshooting, ensuring a valid model structure before optimization.
Configuration and Optimization Profiles: The BuilderConfig was set with a 40 GB workspace memory limit, providing more than enough headroom for dynamic batch sizes while maintaining stable performance. We set up a dynamic optimization profile specifying minimum, optimal, and maximum batch sizes within a 10 percent margin of our typical usage, granting flexibility to handle both smaller and larger input volumes efficiently.
Engine Serialization: Finally, we serialized and saved the engine, creating a portable and optimized binary that can be loaded for deployment. This step encapsulates the model’s architecture, weights, and optimizations, ensuring it is ready for fast inference.
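The steps above can be sketched with the TensorRT 8.x Python API as follows. This is a hedged outline rather than the authors' build script: the function names are ours, the ONNX file is assumed to have been exported beforehand with a dynamic batch axis, and the 10 percent margin is taken from the description of the optimization profile. TensorRT is imported inside the function so that the shape helper can be used on machines without a TensorRT installation.

```python
def profile_shapes(n_points, n_features=3, margin=0.10):
    """Min/opt/max input shapes within +/- margin of the target batch size."""
    lo = int(round(n_points * (1.0 - margin)))
    hi = int(round(n_points * (1.0 + margin)))
    return (lo, n_features), (n_points, n_features), (hi, n_features)


def build_engine(onnx_path, engine_path, n_points, n_features=3,
                 fp16=False, workspace_gb=40):
    """Parse an ONNX model and serialize a TensorRT engine (sketch)."""
    import tensorrt as trt  # lazy import; needs a TensorRT 8.4+ installation

    # Verbose logger and a network definition with explicit batch handling.
    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)

    # Parse and validate the ONNX model, reporting any parser errors.
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parsing failed:\n" + "\n".join(errors))

    # Builder configuration: workspace limit and optional FP16 kernels.
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_gb << 30)
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    # Dynamic optimization profile centered on the expected dataset size.
    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name
    profile.set_shape(input_name, *profile_shapes(n_points, n_features))
    config.add_optimization_profile(profile)

    # Build and save the serialized engine.
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("TensorRT engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)


# Shapes for an engine targeting 1,000,000 points with three input features.
shapes = profile_shapes(1_000_000)
```

A call such as `build_engine("nnc2ps.onnx", "nnc2ps_fp16.engine", 1_000_000, fp16=True)` would then produce one engine per model and dataset size; the file names here are placeholders.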

To ensure we measure the maximum possible performance for each point in our benchmark, we build a specialized, yet flexible, TensorRT engine for each combination of model and dataset size. The dynamic optimization profile for each of these engines is configured with a tight margin around its target dataset size (N), as detailed in Table 1.

Overall, the process of optimizing and saving models using both TorchScript and TensorRT gave us insight into balancing flexibility, accuracy, and performance. For larger batch sizes and greater computational demands, TensorRT’s dynamic engine approach in FP16 is often more effective, even for models as simple as ours, while TorchScript remains a reliable fallback and simpler alternative.

For the actual inference speed test procedure, we implemented two distinct workflows on a single GPU for both approaches. The TorchScript-based approach allowed for a straightforward configuration, primarily requiring the definition of batch sizes and the pre-loading of data onto the GPU. It then used libtorch for efficient GPU deployment and batch execution.

In contrast, the TensorRT-based approach demanded several additional configurations. The model, after being converted into an optimized engine, was loaded using TensorRT’s C++ API. This included the manual pre-loading of input data into GPU memory before execution and was followed by manual setup of input and output buffers for TensorRT’s executeV2 function and careful management of CUDA resources. While this setup was more involved, it leveraged hardware-specific optimizations to deliver substantial gains in inference speed.

3. Results

3.1. Accuracy

We evaluate the model accuracy using two standard metrics for regression problems: the $L_{1}$ error (mean absolute error) and the $L_{\infty}$ error (maximum absolute error), both calculated over the entire test dataset. Table 2 summarizes the accuracy results based on $L_{1}$ and $L_{\infty}$ error metrics for each model variant (NNC2PS, NNC2PL, and NNC2P_Tabulated), including both the unquantized and quantized TensorRT engines built from them.
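For reference, the two metrics as defined above amount to the following NumPy expressions. The function names and the toy arrays are ours and serve only to show the computation.

```python
import numpy as np


def l1_error(pred, true):
    """Mean absolute error over all test points."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))


def linf_error(pred, true):
    """Maximum absolute error over all test points."""
    return float(np.max(np.abs(np.asarray(pred) - np.asarray(true))))


# Toy example: absolute errors are 0.1, 0.0, 0.3, and 0.0.
true = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 2.0, 2.7, 4.0])
l1 = l1_error(pred, true)
linf = linf_error(pred, true)
```

Here the mean absolute error is 0.1 and the maximum absolute error is 0.3, up to floating-point rounding. The $L_{\infty}$ error is controlled by the single worst point, which is why it is typically the more sensitive of the two to reduced precision.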

The NNC2PS model trained in PyTorch achieves very high accuracy with an $L_{1}$ error of $4.54 \times 10^{-7}$ and an $L_{\infty}$ error of $3.44 \times 10^{-6}$. When the model is converted to a TensorRT engine, the accuracy remains nearly identical, with an $L_{1}$ error of $4.54 \times 10^{-7}$ and an $L_{\infty}$ error of $3.43 \times 10^{-6}$, indicating minimal loss in precision due to TensorRT optimization. However, when FP16 quantization is applied, the errors increase to an $L_{1}$ error of $6.39 \times 10^{-7}$ and an $L_{\infty}$ error of $8.98 \times 10^{-6}$, revealing the expected side effect of reduced precision. This highlights the classic trade-off between computational performance and numerical precision, a critical consideration for selecting the appropriate model for a given scientific application where the tolerance for numerical error may vary.

The larger NNC2PL model, as expected, achieves lower $L_{1}$ and $L_{\infty}$ errors than NNC2PS, with an $L_{1}$ error of $2.75 \times 10^{-7}$ and an $L_{\infty}$ error of $2.61 \times 10^{-6}$. The corresponding TensorRT engine preserves this high level of accuracy, showing only a slight increase to an $L_{1}$ error of $2.88 \times 10^{-7}$ and an $L_{\infty}$ error of $2.69 \times 10^{-6}$. The FP16 quantized version, however, sees a notable rise in error metrics, with an $L_{1}$ error of $5.32 \times 10^{-7}$ and an $L_{\infty}$ error of $9.84 \times 10^{-6}$.

The NNC2P_Tabulated model exhibits an $L_{1}$ error of $8.02 \times 10^{-3}$ and an $L_{\infty}$ error of $3.54 \times 10^{-1}$. It is important to clarify that this larger error does not indicate a failure of the ML model but is a direct consequence of the model learning from a completely different dataset constructed from the LS220 EOS table to estimate the logarithmic pressure values. The TensorRT engine version also shows only a slight increase in $L_{1}$ error to $8.16 \times 10^{-3}$. With FP16 quantization, the $L_{1}$ error rises again, more noticeably, to $1.38 \times 10^{-2}$.

Additionally, we examined the relative accuracy of the NNC2P_Tabulated model for parameters $W = 1.02$, $1.1$, $1.25$, and $1.4$ with $Y_{e} \approx 0.1$ (see Figure 3). The relative error, defined as the absolute error divided by the true value for each point in a specific parameter set, was not uniform across the parameter space. Larger relative errors were observed in the lowest density and temperature regions of the EOS table, while slightly smaller errors occurred in the high-temperature regions. This accuracy trend was consistent across all tested Lorentz factor ($W$) values and was even more pronounced for the FP16 precision TensorRT engine. The LS220 EOS, as provided by [31], transitions from detailed treatment at high densities to simplified approximations at lower densities, which may contribute to these disparities. Low-density regions are inherently challenging due to the dominance of thermal effects, non-uniform phase transitions, and the treatment of nuclear matter surfaces, which can exacerbate modeling errors [31,33]. These characteristics likely explain the reduced accuracy in these regions, where variations in the nuclear matter's phase state are more pronounced.

The overall results show that TensorRT’s optimizations maintain accuracy across models when using full precision. FP16 quantization, while accelerating inference (as discussed below), introduces larger errors, most visibly for NNC2PL (Table 2). This trade-off between inference speed and precision can be especially important in relativistic hydrodynamics simulations, where the accuracy of small-scale structures and wave propagation can critically impact the fidelity of predictions. For such simulations, even slight deviations due to quantization can influence results, making full-precision TensorRT inference particularly valuable when accuracy is paramount. Conversely, FP16 quantization may be suitable for faster, lower-fidelity simulations where minor accuracy trade-offs are acceptable.
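
To put a number on the FP16 penalty, the short sketch below takes the $L_{1}$ errors reported in Table 2 and computes how much each grows when moving from the PyTorch baseline to the FP16 TensorRT engine. The dictionary layout and the helper name `fp16_error_ratio` are illustrative choices, not part of the released software.

```python
# L1 errors copied from Table 2: (PyTorch baseline, TensorRT FP16 engine).
l1_errors = {
    "NNC2PS": (4.54e-7, 6.39e-7),
    "NNC2PL": (2.75e-7, 5.32e-7),
    "NNC2P_Tabulated": (8.02e-3, 1.38e-2),
}


def fp16_error_ratio(baseline, fp16):
    """Factor by which the error grows under FP16 quantization."""
    return fp16 / baseline


ratios = {name: fp16_error_ratio(*pair) for name, pair in l1_errors.items()}
for name, ratio in ratios.items():
    print(f"{name}: L1 error grows by a factor of {ratio:.2f} under FP16")
```

With these inputs the $L_{1}$ error grows by a factor of roughly 1.4 to 1.9, depending on the model.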

3.2. Inference Speed Analysis

The inference performance of various methods was evaluated using a single NVIDIA A100 GPU for neural network models and a single-threaded CPU implementation of the traditional numerical method from the RePrimAnd library. The CPUs used in this study were dual AMD 64-core 2.45 GHz Milan processors on the Delta cluster, which can support up to 128 threads. Each configuration was tested across five dataset sizes, ranging from 25,000 to 1,000,000 data points, with ten inference runs conducted per configuration to ensure result stability and consistency. For the RePrimAnd numerical solver, we set the target accuracy for the relative error in the root-finding algorithm to $10^{-8}$. This is a standard, high-precision value used in production codes. We chose to compare our ML models against this robust baseline rather than tuning the numerical solver’s accuracy to match that of the NNs, ensuring a conservative performance comparison.
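
The measurement protocol above (several dataset sizes, ten repeated runs each, mean and spread reported) can be sketched with a small timing harness. This is a hedged illustration using only the Python standard library: `benchmark` and `dummy_c2p` are hypothetical names, and the stand-in workload is not a real C2P solver.

```python
import statistics
import time


def benchmark(fn, batch, n_runs=10):
    """Time fn(batch) n_runs times; return (mean, std) in milliseconds."""
    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn(batch)
        timings_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


def dummy_c2p(batch):
    # Hypothetical stand-in for a C2P solver or a neural-network forward pass.
    return [d + s + tau for d, s, tau in batch]


# Small sizes keep the sketch fast; the study used 25,000 to 1,000,000 points.
results = {}
for n in (1_000, 2_000, 4_000):
    batch = [(1.0, 0.1, 0.5)] * n
    results[n] = benchmark(dummy_c2p, batch)
```

For GPU inference the timer should only be read after the device has finished its work (in PyTorch, for example, by calling `torch.cuda.synchronize()`), because kernel launches are asynchronous.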

The numerical method exhibited linear scaling of inference time with respect to the dataset size. In contrast, both TensorRT and TorchScript models generally maintained relatively stable inference times across the dataset sizes. Notably, the full-precision TensorRT engine for the smaller network, NNC2PS, showed a faster-than-expected processing time at certain intermediate dataset sizes, as observed in Figure 4a. This behavior may be attributed to favorable thread block utilization and the kernel selection mechanism of TensorRT for this particular network size. A more detailed profiling study is needed to fully elucidate the underlying cause. The accuracy characteristics of these models remained consistent, as indicated in Table 2.

The numerical method required significantly more time than the neural network-based approaches. On average, it took 103.8 ms to process 25,000 data points, with runtime scaling almost linearly to 3490 ms for 1,000,000 data points. In contrast, the neural network models were substantially faster. The mixed-precision TensorRT engine built from NNC2PS required 7.92 ms for 25,000 data points and 8.54 ms for 1,000,000 data points. Its full-precision counterpart was roughly two to three times slower but similarly insensitive to dataset size, with runtimes of 25.17 ms for 25,000 data points and 21.06 ms for 1,000,000 data points. The TorchScript variant was slower still but maintained sub-linear scaling, averaging 72.79 ms for 25,000 points and 101.74 ms for 1,000,000 points.

A similar trend was observed for the NNC2PL models, with TensorRT engines consistently outperforming their TorchScript counterparts. The mixed-precision TensorRT engine for NNC2PL processed 25,000 data points in 8.32 ms and 1,000,000 points in 14.35 ms. In comparison, the full-precision TensorRT engine required 25.85 ms for 25,000 points and 23.87 ms for 1,000,000 points. The TorchScript model averaged 73.18 ms for 25,000 points and 102.04 ms for 1,000,000 points.

Figure 4 presents a theoretical performance benchmark based on ideal scaling under the assumption of perfect parallelization. This scenario assumes optimal workload distribution, minimal communication overhead, and negligible synchronization delays, representing the upper bound of scalability. For the numerical method, the figure reflects the full computational capacity of a single CPU node on the Delta cluster, utilizing 128 threads. For the neural networks, it represents the use of 8 A100 GPUs within a single GPU node. Under these ideal conditions, the processing time of the numerical method per data point is projected to decrease by a factor of 128, allowing for the processing of 8 million points in approximately 218 ms (Figure 4b). Similarly, all neural network methods are expected to achieve linear inference scaling with similar per-GPU efficiency. Under this scenario, TensorRT-based methods—particularly the mixed-precision engine for NNC2PS—show a 25-fold reduction in processing time for 8 million points compared to the numerical method running at full capacity on the CPU node. Furthermore, the scaling trend strongly favors TensorRT for even larger datasets.
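
The ideal-scaling projection can be reproduced from the single-device timings quoted earlier in this section. The sketch below is one way to arrive at the quoted figures, assuming the CPU time grows linearly with the number of points and divides evenly across threads, and that each GPU processes an equal share of the data concurrently; the variable names are illustrative.

```python
# Measured single-device runtimes for 1,000,000 points, from the text above.
CPU_MS_PER_MILLION = 3490.0   # RePrimAnd, one CPU thread
GPU_MS_PER_MILLION = 8.54     # NNC2PS mixed-precision TensorRT, one A100

N_THREADS = 128               # CPU threads per Delta node
N_GPUS = 8                    # A100 GPUs per Delta node
N_POINTS = 8_000_000

# Single-device speedup at 1,000,000 points.
single_device_speedup = CPU_MS_PER_MILLION / GPU_MS_PER_MILLION

# Ideal CPU scaling: linear in the number of points, split evenly over threads.
cpu_node_ms = CPU_MS_PER_MILLION * (N_POINTS / 1_000_000) / N_THREADS

# Ideal GPU scaling: each GPU handles an equal share at the same time.
points_per_gpu = N_POINTS // N_GPUS
gpu_node_ms = GPU_MS_PER_MILLION * (points_per_gpu / 1_000_000)

node_speedup = cpu_node_ms / gpu_node_ms
print(f"{single_device_speedup:.0f}x, {cpu_node_ms:.0f} ms, {node_speedup:.1f}x")
# prints "409x, 218 ms, 25.5x"
```

The node-level ratio is much smaller than the single-device ratio because the CPU side is credited with 128-way parallelism while the GPU side gains only a factor of 8.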

The results presented above underscore the substantial performance gains achievable through the use of TensorRT-optimized neural networks, particularly in the context of conservative-to-primitive inversion in relativistic hydrodynamics simulations. By leveraging the parallel processing power of modern GPUs, these methods offer significant speedups compared to traditional CPU-based numerical approaches, even in large-scale simulations involving millions of data points. As demonstrated, TensorRT optimizations enable more efficient and scalable solutions, with the potential to dramatically reduce the computational cost of C2P operations. This work highlights the clear advantage of integrating ML-driven methods with GPU acceleration to address the computational challenges of high-throughput simulations. Moving forward, the next step is to incorporate these optimized approaches into full-scale hydrodynamics simulations, where their impact on both performance and scalability can be fully realized.

It is important to contextualize the comparison between the fully utilized CPU component (128 threads) and the fully utilized GPU component (8 GPUs) of a single compute node. This ‘node-to-node’ benchmark is designed to answer the practical question of how to best utilize the co-located and often cost-equivalent hardware resources of a modern heterogeneous compute node. While a formal cost-normalized analysis is complex, this approach compares the optimal-use scenario for each hardware type available to a researcher on a typical allocation. The resulting 25-fold speedup is therefore a combination of the algorithmic shift (from iterative root-finding to direct-mapping) and the architectural advantage of GPUs for the massively parallel workload presented by the neural network.

4. Conclusions

This work introduces a novel ML-driven method for accelerating C2P inversions in relativistic hydrodynamics simulations, with a focus on hybrid piecewise polytropic and tabulated equations of state. By employing feedforward neural networks optimized with TensorRT, we achieve substantial performance improvements over traditional CPU solvers, offering a compelling alternative to computationally expensive iterative methods while maintaining high accuracy. Our results demonstrate that the TensorRT-optimized neural networks can process large datasets significantly faster: roughly 400 times faster than a single-threaded CPU solver for 1,000,000 points, and a projected 25 times faster than an ideally parallelized CPU solver at the node level. The success of this approach is rooted in the neural network’s ability to efficiently learn and represent the inherent symmetries and complex functional relationships within the EOS, effectively creating a direct mapping that bypasses iterative numerical solvers.

Future work will explore several key directions to refine and expand this approach. First, adapting the models to handle a broader range of equations of state will improve the versatility of this method across different simulation contexts. Second, exploring alternative network architectures, such as those incorporating physics-informed layers or adaptive activation functions to better handle physical discontinuities like phase transitions, could further enhance both accuracy and inference speed. Third, the models must be extended to handle full three-dimensional velocities to be fully integrated into production-level GRMHD codes. Additionally, continued optimization of TensorRT, including advanced parallelization strategies and scaling across multiple GPUs, and careful exploration of lower-precision formats like INT8, potentially with quantization-aware training, promises even greater reductions in computational time, enabling simulations of larger and more complex astrophysical systems. These improvements will be critical for advancing high-resolution simulations in numerical relativistic hydrodynamics.

We believe that ML-driven methods, particularly those incorporating TensorRT optimization, will play an essential role in advancing the field of general relativistic hydrodynamics and numerical relativity more broadly. To facilitate further validation and extension of these findings, we have made the software developed for this study publicly available at: https://github.com/semihkacmaz/C2PNets (accessed on 27 August 2025).

Author Contributions

Conceptualization, R.H. and E.A.H.; Methodology, S.K., R.H. and E.A.H.; Software, S.K.; Validation, S.K.; Formal analysis, S.K.; Writing—original draft, S.K. and E.A.H.; Writing—review & editing, S.K., R.H. and E.A.H.; Visualization, S.K.; Supervision, R.H. and E.A.H.; Project administration, R.H. and E.A.H.; Funding acquisition, R.H. and E.A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Science Foundation grant numbers OAC-1931561, OAC-2209892, OAC-2103680, OAC-2004879, OAC-2310548, OAC-2005572, OAC-2411068, and OAC-2320345 and by ACCESS-CI [34] grant number PHY160053. The APC was funded by National Science Foundation grant number OAC-2310548.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

This research used the Delta advanced computing and data resource. Delta is a joint effort of the University of Illinois Urbana-Champaign and its National Center for Supercomputing Applications. This research used the DeltaAI advanced computing and data resource. DeltaAI is a joint effort of the University of Illinois Urbana-Champaign and its National Center for Supercomputing Applications. We further acknowledge the use of Matplotlib [35] and Seaborn [36] for the generation of figures in this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Model Architecture Exploration and Training History

In this study, we explored a wide range of multi-layer perceptron (MLP) architectures to identify models that offer an optimal balance between predictive accuracy and inference speed. The models presented in the main text—NNC2PS, NNC2PL, and NNC2P_Tabulated—were the result of this systematic exploration.

Our findings, summarized in Table A1, demonstrate a clear “sweet spot” for model complexity. Architectures smaller than our chosen models (e.g., NNC2PS-Tiny) offered lower parameter counts but with a notable drop in accuracy. Conversely, models that were significantly wider or deeper than our selections provided only marginal accuracy gains for a substantial increase in parameter count and computational cost (e.g., NNC2PL-Wide, NNC2PL-Deep-7). This trend of diminishing returns is evident across both EOS types.

Notably, excessively deep architectures (e.g., the 10- and 13-layer models) consistently exhibited training instability or yielded worse performance, reinforcing our choice of moderately sized networks as the most effective and efficient solution for this regression task.
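
As a rough cross-check of the parameter counts in Table A1, the sketch below counts the weights and biases of a fully connected network. It assumes three inputs ($D$, $S_{x}$, $\tau$), one output ($p$), and a bias term on every layer; the helper name `mlp_parameter_count` is hypothetical. Under these assumptions the two-hidden-layer rows of the table are reproduced; other rows are not checked here and may reflect rounding or conventions the table does not state.

```python
def mlp_parameter_count(n_inputs, hidden_layers, n_outputs=1):
    """Count weights and biases of a fully connected network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    # Each layer contributes fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


# Assumption: three inputs (D, S_x, tau) and one output (p), with biases.
counts = {
    "NNC2PS-Tiny": mlp_parameter_count(3, [300, 100]),
    "NNC2PS": mlp_parameter_count(3, [600, 200]),
    "NNC2PS-Wide": mlp_parameter_count(3, [800, 400]),
}
```

These evaluate to 31,401, 122,801, and 324,001 parameters, consistent with the ~31 k, ~123 k, and ~324 k entries in the table.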

To demonstrate the stability of our training procedure, Figure A1 and Figure A2 show the training and validation loss curves for the final three models used in this work. The curves illustrate smooth convergence to a low loss value with no signs of significant overfitting, validating our training methodology.

Table A1. Explored architectures and validation accuracy ($L_{1}$ error) for both EOS models. The selected models are shown in bold. The validation error measures the model’s performance on unseen data.

Model Name	Hidden Layers (Neurons per Layer)	Total Parameters	Validation $L_{1}$ Error
Piecewise Polytropic EOS
`NNC2PS-Tiny`	[300, 100]	~31 k	$5.8 \times 10^{- 7}$
`NNC2PS-Shallow`	[800]	~3 k	$6.5 \times 10^{- 7}$
`NNC2PS`	[600, 200]	~123 k	$4.5 \times 10^{- 7}$
`NNC2PS-Wide`	[800, 400]	~324 k	$4.1 \times 10^{- 7}$
`NNC2PL-Small`	[512, 256, 128, 64]	~180 k	$3.9 \times 10^{- 7}$
`NNC2PL-Medium`	[1024, 512, 256, 128]	~690 k	$3.2 \times 10^{- 7}$
`NNC2PL`	[1024, 512, 256, 128, 64]	~707 k	$2.8 \times 10^{- 7}$
`NNC2PL-Wide`	[2048, 1024, 512, 256, 128]	~2.8 M	$2.5 \times 10^{- 7}$
`NNC2PL-Deep-7`	[1024, 1024, 512, 512, 256, 128, 64]	~2.4 M	$2.9 \times 10^{- 7}$
`NNC2PL-Deep-10`	[1024, 1024, 512, 512, 256, 256, 128, 128, 64, 64]	~3.5 M	$3.1 \times 10^{- 7}$
`NNC2PL-SuperDeep`	13 Layers	~5 M	Failed to Converge
Tabulated EOS (LS220)
`NNC2P_Tab-Tiny`	[512, 256, 128]	~165 k	$9.5 \times 10^{- 3}$
`NNC2P_Tab-Small`	[1024, 512, 256, 128]	~690 k	$8.8 \times 10^{- 3}$
`NNC2P_Tabulated`	[1024, 512, 256, 128, 64]	~707 k	$8.0 \times 10^{- 3}$
`NNC2P_Tab-Wide`	[2048, 1024, 512, 256, 128]	~2.8 M	$7.7 \times 10^{- 3}$
`NNC2P_Tab-Deep-7`	[1024, 1024, 512, 512, 256, 128, 64]	~2.4 M	$8.2 \times 10^{- 3}$
`NNC2P_Tab-Deep-10`	[1024, 1024, 512, 512, 256, 256, 128, 128, 64, 64]	~3.5 M	$8.5 \times 10^{- 3}$
`NNC2P_Tab-SuperDeep`	13 Layers	~5 M	Failed to Converge

Figure A1. Training and validation loss curves for the piecewise polytropic EOS models. The smooth convergence demonstrates a stable training process for (a) NNC2PS and (b) NNC2PL.

Figure A2. Training and validation loss curves for the tabulated EOS model, NNC2P_Tabulated. The smooth convergence demonstrates a stable training process without significant overfitting.

References

1. Radice, D.; Bernuzzi, S.; Perego, A. The Dynamics of Binary Neutron Star Mergers and GW170817. Annu. Rev. Nucl. Part. Sci. 2020, 70, 95–119.
2. Ciolfi, R.; Kastaun, W.; Giacomazzo, B.; Endrizzi, A.; Siegel, D.M.; Perna, R. General relativistic magnetohydrodynamic simulations of binary neutron star mergers forming a long-lived neutron star. Phys. Rev. D 2017, 95, 063016.
3. Kiuchi, K. General relativistic magnetohydrodynamics simulations for binary neutron star mergers. arXiv 2024, arXiv:2405.10081.
4. Siegel, D.M.; Metzger, B.D. Three-Dimensional General-Relativistic Magnetohydrodynamic Simulations of Remnant Accretion Disks from Neutron Star Mergers: Outflows and r-Process Nucleosynthesis. Phys. Rev. Lett. 2017, 119, 231102.
5. Sun, L.; Ruiz, M.; Shapiro, S.L.; Tsokaros, A. Jet launching from binary neutron star mergers: Incorporating neutrino transport and magnetic fields. Phys. Rev. D 2022, 105, 104028.
6. Tsokaros, A.; Ruiz, M.; Shapiro, S.L.; Uryū, K. Magnetohydrodynamic Simulations of Self-Consistent Rotating Neutron Stars with Mixed Poloidal and Toroidal Magnetic Fields. Phys. Rev. Lett. 2022, 128, 061101.
7. Fernández, R.; Tchekhovskoy, A.; Quataert, E.; Foucart, F.; Kasen, D. Long-term GRMHD simulations of neutron star merger accretion discs: Implications for electromagnetic counterparts. Mon. Not. R. Astron. Soc. 2019, 482, 3373–3393.
8. Foucart, F.; Haas, R.; Duez, M.D.; O’Connor, E.; Ott, C.D.; Roberts, L.; Kidder, L.E.; Lippuner, J.; Pfeiffer, H.P.; Scheel, M.A. Low mass binary neutron star mergers: Gravitational waves and neutrino emission. Phys. Rev. D 2016, 93, 044019.
9. Camilletti, A.; Chiesa, L.; Ricigliano, G.; Perego, A.; Lippold, L.C.; Padamata, S.; Bernuzzi, S.; Radice, D.; Logoteta, D.; Guercilena, F.M. Numerical relativity simulations of the neutron star merger GW190425: Microphysics and mass ratio effects. Mon. Not. R. Astron. Soc. 2022, 516, 4760–4781.
10. Dietrich, T.; Hinderer, T.; Samajdar, A. Interpreting Binary Neutron Star Mergers: Describing the Binary Neutron Star Dynamics, Modelling Gravitational Waveforms, and Analyzing Detections. Gen. Rel. Grav. 2021, 53, 27.
11. Agathos, M.; Meidam, J.; Del Pozzo, W.; Li, T.G.F.; Tompitak, M.; Veitch, J.; Vitale, S.; Van Den Broeck, C. Constraining the neutron star equation of state with gravitational wave signals from coalescing binary neutron stars. Phys. Rev. D 2015, 92, 023012.
12. Bauswein, A.; Baumgarte, T.W.; Janka, H.T. Prompt Merger Collapse and the Maximum Mass of Neutron Stars. Phys. Rev. Lett. 2013, 111, 131101.
13. Oertel, M.; Hempel, M.; Klähn, T.; Typel, S. Equations of state for supernovae and compact stars. Rev. Mod. Phys. 2017, 89, 015007.
14. Alford, M.G.; Schmitt, A.; Rajagopal, K.; Schäfer, T. Color superconductivity in dense quark matter. Rev. Mod. Phys. 2008, 80, 1455–1515.
15. Noble, S.C.; Gammie, C.F.; McKinney, J.C.; Del Zanna, L. Primitive Variable Solvers for Conservative General Relativistic Magnetohydrodynamics. Astrophys. J. 2006, 641, 626–637.
16. Faber, J.A.; Rasio, F.A. Binary neutron star mergers. Living Rev. Relativ. 2012, 15, 1–83.
17. Duez, M.D.; Liu, Y.T.; Shapiro, S.L.; Stephens, B.C. Relativistic magnetohydrodynamics in dynamical spacetimes: Numerical methods and tests. Phys. Rev. D 2005, 72, 024028.
18. Font, J.A. Numerical Hydrodynamics in General Relativity. Living Rev. Relativ. 2000, 3, 2.
19. Chang, P.; Etienne, Z. General relativistic hydrodynamics on a moving-mesh I: Static space–times. Mon. Not. R. Astron. Soc. 2020, 496, 206–214.
20. Kalinani, J.V.; Ji, L.; Ennoggi, L.; Lopez Armengol, F.G.; Sanches, L.T.; Tsao, B.-J.; Brandt, S.R.; Campanelli, M.; Ciolfi, R.; Giacomazzo, B. AsterX: A new open-source GPU-accelerated GRMHD code for dynamical spacetimes. Class. Quant. Grav. 2025, 42, 025016.
21. Zhu, H.; Fields, J.; Zappa, F.; Radice, D.; Stone, J.; Rashti, A.; Cook, W.; Bernuzzi, S.; Daszuta, B. Performance-Portable Numerical Relativity with AthenaK. arXiv 2024, arXiv:2409.10383.
22. Liebling, S.L.; Palenzuela, C.; Lehner, L. Toward fidelity and scalability in non-vacuum mergers. Class. Quant. Grav. 2020, 37, 135006.
23. Dieselhorst, T.; Cook, W.; Bernuzzi, S.; Radice, D. Machine Learning for Conservative-to-Primitive in Relativistic Hydrodynamics. Symmetry 2021, 13, 2157.
24. Ansel, J.; Yang, E.; He, H.; Gimelshein, N.; Jain, A.; Voznesensky, M.; Bao, B.; Bell, P.; Berard, D.; Burovski, E.; et al. PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’24), ACM, La Jolla, CA, USA, 27 April–1 May 2024.
25. Kastaun, W.; Kalinani, J.V.; Ciolfi, R. Robust Recovery of Primitive Variables in Relativistic Ideal Magnetohydrodynamics. Phys. Rev. D 2021, 103, 023018.
26. Banyuls, F.; Font, J.A.; Ibáñez, J.M.; Martí, J.M.; Miralles, J.A. Numerical 3 + 1 General Relativistic Hydrodynamics: A Local Characteristic Approach. Astrophys. J. 1997, 476, 221.
27. Martí, J.M.; Müller, E. Numerical Hydrodynamics in Special Relativity. Living Rev. Relativ. 2003, 6, 7.
28. Font, J.A. Numerical Hydrodynamics and Magnetohydrodynamics in General Relativity. Living Rev. Relativ. 2008, 11, 7.
29. Janka, H.T.; Zwerger, T.; Moenchmeyer, R. Does artificial viscosity destroy prompt type-II supernova explosions? Astron. Astrophys. 1993, 268, 360–368.
30. Read, J.S.; Lackey, B.D.; Owen, B.J.; Friedman, J.L. Constraints on a Phenomenologically Parametrized Neutron-Star Equation of State. Phys. Rev. D 2009, 79, 124032.
31. Schneider, A.S.; Roberts, L.F.; Ott, C.D. Open-Source Nuclear Equation of State Framework Based on the Liquid-Drop Model with Skyrme Interaction. Phys. Rev. C 2017, 96, 065802.
32. Wouters, T. Machine Learning Algorithms for the Conservative-to-Primitive Conversion in Relativistic Hydrodynamics. Master’s Thesis, KU Leuven, Leuven, Belgium, 2024.
33. Bernuzzi, S.; Breschi, M.; Daszuta, B.; Endrizzi, A.; Logoteta, D.; Nedora, V.; Perego, A.; Schianchi, F.; Radice, D.; Zappa, F.; et al. Accretion-Induced Prompt Black Hole Formation in Asymmetric Neutron Star Mergers, Dynamical Ejecta and Kilonova Signals. Mon. Not. R. Astron. Soc. 2020, 497, 1488–1507.
34. Boerner, T.J.; Deems, S.; Furlani, T.R.; Knuth, S.L.; Towns, J. ACCESS: Advancing Innovation: NSF’s Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support. In Proceedings of the Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good, New York, NY, USA, 23–27 July 2023; pp. 173–176.
35. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95.
36. Waskom, M.L. seaborn: Statistical data visualization. J. Open Source Softw. 2021, 6, 3021.

Figure 1. Visualization of the thermodynamic relations based on the complete training data generated for the four-segment piecewise polytropic EOS-based model. From left to right: pressure (p) vs. rest-mass density (ρ), specific internal energy (ϵ) vs. rest-mass density (ρ), and specific enthalpy (h) vs. rest-mass density (ρ). All quantities are plotted on a logarithmic scale. The distinct segments of the piecewise polytropic EOS are delineated by the red vertical lines.

Figure 2. Architectures of the neural networks used for conservative-to-primitive variable mapping. Top: The NNC2PS network takes conserved variables D, $S_{x}$, and τ as input and outputs the pressure p. Bottom: The NNC2P_Tabulated network uses the logarithm of conserved variables $\log D$, $\log S_{x}$, and $\log \tau$, along with the electron fraction $Y_{e}$, as input, outputting the logarithm of pressure $\log p$. The NNC2PL network shares an identical architecture with NNC2P_Tabulated, but with the input/output structure of NNC2PS.

Figure 3. Relative error of the NNC2P_Tabulated model for various Lorentz factors ($W$) with $Y_{e} \approx 0.1$. The plots highlight the accuracy trends across different regions of the LS220 EOS table, showing larger relative errors in low-density and low-temperature regions, reflecting the inherent complexities of the EOS in this region. This behavior is consistent across the tested $W$ values of 1.02, 1.1, 1.25, and 1.4 and is more pronounced for the FP16 precision TensorRT engine.

Figure 4. Ideal scaling comparison of various C2P inversion methods under the assumption of perfect parallelization. (a) Projected inference time as a function of dataset size for a traditional numerical solver (RePrimAnd utilizing 128 CPU threads on a single node of the Delta cluster) and two neural network models (NNC2PS and NNC2PL) using TensorRT (FP32 and FP16 precision) and TorchScript across 8 NVIDIA A100 GPUs. (b) Projected inference speed comparison for a dataset of 8 million points, highlighting the significant scalability and efficiency gains achieved by TensorRT engines, particularly with FP16 optimization. The mixed-precision TensorRT engine for NNC2PS achieves approximately a 25-fold reduction in processing time compared to the numerical method, showcasing the potential for TensorRT-based methods to convincingly outperform traditional numerical solvers at scale. The width of the lines in panel (a) represents the standard deviation over ten independent runs, with the wider band for the numerical method indicating higher runtime variability.

Table 1. Dynamic optimization profiles used for building specialized TensorRT engines for each benchmarked dataset size (N). The profile for each engine is configured with a tight margin around its target optimal size.

Target Dataset Size (N)	Min Batch Size (0.95 N)	Optimal Batch Size (N)	Max Batch Size (1.05 N)
25,000	23,750	25,000	26,250
50,000	47,500	50,000	52,500
100,000	95,000	100,000	105,000
500,000	475,000	500,000	525,000
1,000,000	950,000	1,000,000	1,050,000
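
The batch-size bounds in Table 1 follow directly from a 5% margin around each target size. The sketch below regenerates them with integer arithmetic; the helper name `dynamic_profile` and the `margin_percent` parameter are hypothetical and not taken from the released code.

```python
def dynamic_profile(n, margin_percent=5):
    """Return (min, opt, max) batch sizes with a +/- margin around n."""
    low = n * (100 - margin_percent) // 100
    high = n * (100 + margin_percent) // 100
    return low, n, high


DATASET_SIZES = (25_000, 50_000, 100_000, 500_000, 1_000_000)
profiles = {n: dynamic_profile(n) for n in DATASET_SIZES}
for n, (low, opt, high) in profiles.items():
    print(f"{n:>9,}  min={low:,}  opt={opt:,}  max={high:,}")
```

In TensorRT's Python API, such a triple would typically be combined with the input's feature dimension and passed to an optimization profile through `set_shape`; that step is left out here because it needs a GPU build environment.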

Table 2. Accuracy results for all models.

Model	$L_{1}$ Error	$L_{\infty}$ Error
`NNC2PS` (PyTorch)	$4.54 \times 10^{- 7}$	$3.44 \times 10^{- 6}$
`NNC2PS` (TensorRT)	$4.54 \times 10^{- 7}$	$3.43 \times 10^{- 6}$
`NNC2PS` (TensorRT–`FP16`)	$6.39 \times 10^{- 7}$	$8.98 \times 10^{- 6}$
`NNC2PL` (PyTorch)	$2.75 \times 10^{- 7}$	$2.61 \times 10^{- 6}$
`NNC2PL` (TensorRT)	$2.88 \times 10^{- 7}$	$2.69 \times 10^{- 6}$
`NNC2PL` (TensorRT–`FP16`)	$5.32 \times 10^{- 7}$	$9.84 \times 10^{- 6}$
`NNC2P_Tabulated` (PyTorch)	$8.02 \times 10^{- 3}$	$3.54 \times 10^{- 1}$
`NNC2P_Tabulated` (TensorRT)	$8.16 \times 10^{- 3}$	$3.45 \times 10^{- 1}$
`NNC2P_Tabulated` (TensorRT–`FP16`)	$1.38 \times 10^{- 2}$	$7.44 \times 10^{- 1}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
