A new mass model for nuclear astrophysics: crossing 200 keV accuracy

By using a machine learning algorithms, we present an improved nuclear mass table with a root mean square deviation of less than $200$\.keV. The model is equipped with statistical error bars in order to compare with available experimental data. We use the resulting model to predict the composition of the outer crust of a neutron star. By means of simple Monte Carlo methods, we propagate the statistical uncertainties of the mass model to the equation of state of the system.


I. INTRODUCTION
Neutron stars (NS) are fascinating objects: with a typical mass of M ≈ 1.5 M and radius R ≈ 12 km [1], they represent the ideal laboratory to study properties of nuclear matter under extreme conditions. The strong pressure gradient forces the matter within the NS to arrange into layers with different properties [2]. Going from the most external regions of the star to its centre the matter density ρ spans several orders of magnitude from ≈ 10 −8 ρ 0 to ≈ 3-10 ρ 0 , where ρ 0 = 0.16 fm −3 ≈ 2.7 × 10 14 g cm −3 is the typical value of the density at the centre of an atomic nucleus [3].
The external region of a cold non-accreting NS is named the outer crust. It consists of a Coulomb lattice comprising fully-ionized atoms with Z protons and N neutrons. As discussed in Ref. [4], at β-equilibrium the composition of each layer of the crust at a given pressure P is obtained by minimising the Gibbs free energy per nucleon. The latter is the sum of three main contributions: the nuclear, electronic and lattice. Since a large fraction of nuclei present in the outer crust are extremely neutron rich, their binding energy is not known experimentally, and consequently one has to rely on a nuclear mass model. Several models are available within the scientific literature with a typical accuracy, i.e., the root mean square (RMS) deviation of the residuals, of 500 keV [5]. The most accurate are those of Wang and Liu [6], having a typical RMS of ≈ 200-300 keV. Although such an RMS is remarkably low compared to the typical binding energy of a nucleus, the discrepancies between various models are still important, especially when used to predict the composition of the outer crust of a NS [7].
Analysis of the residuals of various mass models shows that they do not show chaotic behaviour [8], thus it should be possible to further improve their accuracy, at least up to the level of Garvey-Kelson relations [9], by adding additional terms to account for the missing physics. This may be a very complex task, but machine learning methods can provide major support in achieving this goal.
In recent years, several authors have tried to reduce the discrepancy between theory and experiment by supplementing various mass models with neural networks (NNs) [10][11][12][13][14][15], where the NN learns the behaviour of the residuals. NNs are excellent interpolators [16], but they should be used with great care for extrapolation. The major problem is the presence of an unwanted trend related to the particular choice of the activation function. See Refs. [17,18] for a more detailed discussion on the topic.
A possible alternative to NNs has been discussed in Ref. [14], and it is based on Gaussian processes (GPs) [19][20][21]. This GP method assumes that the residuals originate from some multivariate Gaussian distribution, whose covariance matrix contains some parameters to be adjusted in order to maximise the likelihood for the GP's fit to the residuals. The main advantage of a GP over a NN is that its predictions do not contain unwanted trends in extrapolation, but instead will always return to 0 after a predictable extrapolation distance. Moreover, GP predictions come equipped naturally with error bars. This is not the case for a standard NN (only Bayesian Neural Networks are equipped with posterior distributions that can be interpreted as error bars [22]), and a more involved procedure is required to obtain an estimate [18].
In the current article, we present a new mass table, made by combining the predictions of a Duflo-Zucker [23] mass model with a GP, in order to further reduce the RMS of the residuals. We use the resulting model to analyse the composition of the outer crust of a NS. As previously done in Ref. [15], we perform a full error analysis of the mass model and we use a Monte Carlo procedure to propagate these statistical uncertainties through to the final equation of state (EoS).
The article is organised as follows: in Sec. II we briefly introduce the concept of Gaussian processes and their use for regression, and in Sec. III we discuss the nuclear mass model and the improvement provided by the GP. In Sec.IV we illustrate our results concerning the outer crust, and finally we present our conclusions in Sec.V.

II. GAUSSIAN PROCESS REGRESSION
We now introduces Gaussian processes, and their use as a regression tool. A Jupyter notebook is available as Supplementary Material; it was used to create Fig. 1, and contains additional plots which give a step-by-step introduction.
A Gaussian process (GP) is an infinite-dimensional Gaussian distribution. Similar to how a one dimensional (1D) Gaussian distribution has a mean µ and variance σ 2 , a GP has a mean function µ(x), and a covariance function k(x, x ), also known as the kernel. In principle, x can be a vector of length d representing a point in a d-dimensional input space, but we will just consider the case d = 1 for now, i.e., where x is a single number. Just as we can draw random samples (numbers) from a 1D Gaussian distribution, we can also draw random samples from a GP, which are functions f (x). The kernel k(x, x ) tells us the typical correlation between the value of f at any 2 inputs x and x , and entirely determines the behaviour of the GP (relative to the mean function). For simplicity, we use here a constant mean function of 0.
GPs can be used for regression of data if the underlying process generating the data is smooth and continuous. See Ref. [24] for a thorough introduction to GPs for regression and machine learning. Many software packages are available for GP regression; in the current article we use the Python package GPy [25]. For a set of data Y(x) = {y 1 (x 1 ), y 2 (x 2 ), . . . y n (x n )}, instead of assuming a fixed functional form for the interpolating function, we treat the data as originating from a Gaussian process GP: We make no parametric assumption on the shape of the interpolating function, making GPs a very flexible tool. We adopt the commonly used RBF (radial basis function) kernel, also known as the squared exponential or Gaussian, which yields very smooth samples f (x), and has the form where η 2 , are parameters to be optimised for a given Y. Both have easily interpretable meanings: η gives the typical magnitude of the oscillations of f (x), and the typical correlation length in x. When |x − x | is small, the correlation is large, and we expect f (x) and f (x ) to have similar values. As |x − x | grows beyond a few correlation lengths , the correlation between f (x) and f (x ) drops rapidly to 0. In Fig. 1 we show a simple demonstration of GP regression, where the underlying true function generating the data (dotted line) is simply y = sin(x). The GP mean (solid line) here represents average of all possible samples passing through the data Y (crosses), i.e. the mean prediction. The GP mean is smooth, and interpolates exactly all data points. Outside the input domain, it approaches 0. As we would expect, the quality of the GP regressions is greatest where there is more data available, in this case 0 ≤ x ≤ 4.
Also shown in Fig. 1 are confidence intervals, here representing 2σ (≈ 95%). The confidence intervals are 0 at each data point, and grow in between data points, more rapidly so when data are further apart. At the edges of the input domain, they also grow rapidly, representing the uncertainty in extrapolation, until reaching a maximum of 2η.
Clearly some sets of kernel parameters lead to better regression. For example, if is smaller than the typical data spacing, the GP mean will approach 0 in between data points, making it useless for interpolation; if η 2 is too large, the size of the confidence intervals will be overestimated. These parameters can be optimised by maximising the likelihood -see Ref. [26] for more details -as has been done in Fig. 1.

III. NUCLEAR MASSES
Nuclear mass models are used to reproduce the nuclear binding energies of all known nuclei, ≈ 2300 [27]. Within the mass database we distinguish two types of data: nuclear masses that have been directly measured (≈ 2400) and the extrapolated ones (≈ 750). The latter are obtained by indirect mass-measurements and we will use them to benchmark our extrapolations.
In the current article, we use the Duflo-Zucker mass-model [23]; it consist of 10 terms (DZ10 model), and is able to reproduce all known masses with a root mean square deviation of σ RM S ≈ 0.6 MeV [15]. We refer the reader to Refs. [28,29] for a detailed discussion on the different terms comprising the models.
The parameters of the DZ10 model have been adjusted in Ref. [15] using the Block-Bootstrap (BB) method [30]. The reason for using BB is that it provides robust error bars on the parameters that take into account correlations between them [31,32].
The assumption used to fit DZ10, as with any other mass model, is that the experimental binding energies B exp (N, Z) are equal to the theoretical ones B th (N, Z|a 0 ) up to a Gaussian error ε(N, Z) as where B th (N, Z) is the binding energy calculated using DZ10. In Fig. 2, we illustrate the residuals for DZ10 as a function of the nucleon number A = N + Z. One clearly sees that these residuals show structures, thus indicating the presence of some missing physics that is not properly described by the model. In Fig.3, we plot the same residuals as a histogram (labelled 'DZ10'). On the same figure we also draw a Gaussian with mean 0 and width fixed to the RMS of the residuals. The height of the Gaussian is fitted on the residuals. We observe that the residuals display a Gaussian distribution. A more detailed statistical test can be performed on these residuals to verify that they do not follow a regular Gaussian distribution -see for example Refs. [15,33] for more details -but for the current discussion a qualitative analysis is sufficient.
Having identified that there is room to improve the accuracy of the model, the most natural option to take is to add new terms [29]. For example, a version of the Duflo-Zucker model with 33 parameters is available. Although the RMS reduces to ≈ 300 keV, the extra terms appear poorly constrained [29], and therefore the model is unsuitable for extrapolation. We refer the reader to Ref. [34] for a detailed discussion on poorly constrained parameters.
Instead of explicitly creating new terms for a given mass model, we can take advantage of machine learning methods. For example, in Refs. [13,15], the authors have adjusted a neural network on the residuals of the DZ10 model in order to reduce the discrepancy between theory and experiment. The NN is able to reduce this discrepancy to a typical RMS of ≈ 350 keV [15].
NN are often very complex models, with several hundred free parameters. As discussed in [14], a Gaussian process represents a valid alternative to a NN; the main advantages are the very small number of adjustable parameters, as discussed in Sec.II, and the superior performance on the database of nuclear masses when compared with a NN [14].  Having introduced the GP in Sec.II, we now apply it to the case of nuclear masses. As done in Ref. [14], we consider the same kernel given in Eq.1, but now in the 2-dimensional case, meaning there are now three adjustable parameters. We also use a fourth parameter σ n , named the nugget. The use of the nugget carries several advantages, including numerical stability [35], and improved predictions [36]. The kernel we use is then given by where in the present case x = (N, Z), and η 2 , ρ Z , ρ N are the adjustable parameters. Following Ref. [14], ρ N and ρ Z are interpreted as correlation lengths in the neutron and proton directions, while η 2 gives the strength of the correlation between neighbouring nuclei.
The addition of the nugget term means that the GP mean does not necessarily pass directly through each data point, and that the confidence intervals only shrink to a minimum of σ n . After a preliminary investigation, we fixed σ n to 0.2.
As discussed previously, we adjust the parameters of the GP on the residuals of the DZ10 model (shown in the upper panel of Fig. 2). The parameters η, ρ N , ρ Z are determined through maximising the likelihood for the GP. See Ref. [26] for details. In Fig.4, we illustrate the posterior distribution of the parameters in form of a corner plot. The distributions were obtained with Markov Chain Monte Carlo (MCMC) sampling [37]. The plot illustrates the shapes of the distributions around the optimal parameter set, and it provides us with the error bars for the parameters and information about their correlations. In this case we see that all parameters are very well determined by the residuals data, and a weak correlation is observed between η and ρ N , and between η and ρ Z .
A very interesting result is that the two coherence lengths ρ N,Z are as large as, or greater than, 2. This means that, if we know the residual for a nucleus with mass number A, we can infer properties of the nucleus with A ± 2. This result is in agreement with the analysis done in Ref. [15], which was based on the auto-correlation coefficients.
We now construct our new model for B th (appearing in Eq. 3) as B th = B DZ10 − GP , which we name DZ10-GP. In the lower panel of Fig. 2, we illustrate the residuals obtained from the DZ10-GP model. We clearly see that the GP has been able to capture the missing physics of the DZ10 model, in particular the spikes observed in the upper panel of Fig. 2.
The total RMS of this combined model is σ = 178 keV, which at the moment is probably among the lowest values ever obtained using a mass model fitted on all the available masses, with a total of 10 + 4 = 14 adjustable parameters. We observe that the maximum discrepancy between theory and experiment is now always lower than 1 MeV, and the structure observed in Fig. 2 (upper panel) has now disappeared (lower panel), with the new residuals approaching a true white noise. The presence or not of white noise in the model may represent a lower bound on the accuracy one can achieve with a theoretical model, as discussed in Ref. [8]; we leave such an interesting analysis for a future investigation.
Having created the DZ10-GP model, we now benchmark its extrapolations on the set of ≈ 750 nuclear masses obtained via indirect measurements [27]. The results are presented in Fig.5. The original DZ10 model gives an RMS  It is worth noting that some outliers are still present. We have checked that the 6 nuclei with a residual larger than 6 MeV are all in the region of super-heavy nuclei with Z ≥ 108.
Since the main goal of this article is the study of the outer crust of a neutron star, in Fig. 6 we illustrate in great detail the evolution of the residuals for two isotopic chains -copper and nickel -that play a very important role in determining the composition of the outer crust [7].
We observe that the original DZ10 model reproduces fairly well the data in the middle of the isotopic chains, and that it tends to give large discrepancies at the edges. Even the inclusion of the statistical error bars of DZ10 are not enough to explain such a discrepancy. On the contrary, the use of the GP helps to flatten out the discrepancies, and produces predictions very close to the data in the extrapolated region. By considering the experimental and the theoretical error bars, we observe that our DZ10-GP model reproduces these data reasonably well.
In Fig. 7, we show the evolution, along two isotopic chains, of the GP's contribution to binding energy. We see that these contributions drop to zero as the neutron-rich region is approached. On the same figure, we also report the evolution of a 1-σ error bar provided by the GP. As discussed previously, we notice that the error bars grow towards the neutron drip-line, where we have little or no available data to constrain the GP.
This behaviour can be understood from the value of the GP's correlation length for neutrons, ρ N = 2.67: by construction the GP predictions tend to the mean of the data, in this case zero, after ≈ 2-3 times ρ N . This means that the GP will be effective in describing extrapolated neutron-rich nuclei with at most ≈ 8-10 neutrons more than the last nucleus in our training set. This is clearly only a rule of thumb, but it is enough to safely cover the extrapolated nuclei that are present in the outer crust [38] of a neutron star.

IV. OUTER CRUST
To determine the chemical composition of the outer crust, we minimise the Gibbs free energy per particle, which is defined as [39] where ρ b is the baryonic density. The three terms E nuc , E e , E l are the nuclear, electronic and lattice energies per nucleon respectively [40]. The pressure P arises only from lattice and electron contributions as P = P L + P e . For more details, we refer to Ref. [39], where the entire formalism has been discussed in great detail.
The novelty of the current approach is in the treatment of the nuclear term, which takes the form where m p(n) is the mass of the proton (neutron) and B is the nuclear binding energy given by the mass model. In the current article, we use the mass model DZ10-GP as discussed in Sec.II. The composition given by the current mass model is given in Tab. IV. By comparing the DZ10-GP results with those obtained using only the DZ10 model, we observe some discrepancies in the extrapolated region at low P . In particular we notice that the improved mass model (DZ10-GP) predicts the existence of 80 Zr, that is not considered in the original DZ10 model. At higher P , the two mass models tend to give very similar results. This is simple to understand since, as discussed in Sec. II, the GP correction tends to zero for large extrapolations, as seen in Fig. 7.  I: Composition of the outer crust of a NS using the DZ10 and DZ10-GP mass models. In the first and fourth columns we report the maximum value of pressure at which the nucleus is found using the minimisation procedure. The horizontal line separates the measured and extrapolated masses reported in AME2016 [27].
Since our goal is to obtain the statistical uncertainties of the equation of state, we perform a simple Monte Carlo sampling of the error bars of our DZ10-GP model (under a Gaussian assumption). We generate 10 4 new mass tables, and we use them to calculate the composition of the outer crust.
Using a frequentist approach [41], we define the existence probability of each nucleus as the ratio of the number of times a given nucleus appears in the various EoS at a given pressure, divided by the total number of mass tables. See Ref. [15] for more details.
In Fig.8, we show the evolution of the existence probability for each nucleus in the outer crust as a function of the pressure of the star. We notice that, as confirmed by other authors [38], the favourable configurations are those close to the neutron shell closures at N= 50 and N= 82. However, due to the large error bars, there is a non-negligible probability for several nuclei to be present within the outer crust. For example in the transition region from N=50 to N=82 at P ≈ 10 −4 MeV fm −3 , we observe that other nuclei may also be present as Ruthenium, Niobium or Zirconium. We finally notice that given the structure of the error bars of the model as illustrated in Fig.6, the existence probability becomes smaller thus reflecting the uncertainty of the mass model used. To have a better picture of the outer crust, a further reduction of the error bars and thus an increase of the model accuracy is thus required.
Using the same data set, we also define a statistical uncertainty for the EoS: by counting the 10 4 EoS built before, we define the 68%, 95%, and 99% quantiles of the counts, i.e., 1-σ, 2-σ and 3-σ deviations, under the assumption that the errors follow a Gaussian distribution. The results are presented in Fig.9. We observe that the largest uncertainties are located close to the transition from N=50 to N=82 at P ≈ 1.2 × 10 −4 [MeV fm −3 ] and approaching the transition to the inner crust at P ≈ 5 × 10 −4 [MeV fm −3 ].

V. CONCLUSIONS
By using a Gaussian process fitted to the residuals of the Duflo-Zucker mass model, we have been able to create a mass model with a global RMS of less than 200 keV. The resulting DZ10-GP model has the major advantage of having a very limited amount of parameters (10 in the original DZ model plus 4 for the GP), but it is also one of the very few mass models equipped with error bars [29,42]. The values of the mass table are available in the Supplementary Material.
Apart form its simplicity and reduced computational cost, GP has also the advantage of providing automatically error bars [43]. This is not the case for other machine learning methods as for example decision trees [33] or neural networks [18]. As discussed in an Editorial in Physical Review A [44]; the use of theoretical error bars is fundamental in order to make a reasonable comparison with experimental data, especially when using the mass model to perform extrapolations: this is not common practice among different mass models, apart form few exceptions [29,42,45]. In this article, we have also applied the GP-DZ10 mass model to study the composition of the outer crust of a neutron star, paying particular attention to the role of statistical errors and how they propagate to the final EoS. Following the methodology presented in Ref. [15], we have defined an existence probability of a nucleus within the crust. Such a quantity help us identifying the possible accuracy problems related with our model and it may help prioritising future experimental proposal to further improve our knowledge of the crust of a neutron star.