Next Article in Journal
The Information Geometry of Space-Time
Previous Article in Journal
An Entropic Dynamics Approach to Geometrodynamics
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Proceeding Paper

The Nested_fit Data Analysis Program †

by
Martino Trassinelli
Institut des NanoSciences de Paris, CNRS, Sorbonne Université, 4 Place Jussieu, 75005 Paris, France
Presented at the 39th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Garching, Germany, 30 June–5 July 2019.
Proceedings 2019, 33(1), 14; https://doi.org/10.3390/proceedings2019033014
Published: 28 November 2019

Abstract

:
We present here Nested_fit, a Bayesian data analysis code developed for investigations of atomic spectra and other physical data. It is based on the nested sampling algorithm with the implementation of an upgraded lawn mower robot method for finding new live points. For a given data set and a chosen model, the program provides the Bayesian evidence, for the comparison of different hypotheses/models, and the different parameter probability distributions. A large database of spectral profiles is already available (Gaussian, Lorentz, Voigt, Log-normal, etc.) and additional ones can easily added. It is written in Fortran, for an optimized parallel computation, and it is accompanied by a Python library for the results visualization.

1. Introduction

Nested_fit is a general purpose parallelized data analysis code for the evaluation of Bayesian evidence and parameter probability distributions for given data sets and modeling function. The computation of the Bayesian evidence is based on the nested sampling algorithm [1,2,3], for the integration of the likelihood function over the parameter space. This integration is obtained reducing the J-dimensional volume (where J is the number of parameters) in a one-dimensional integral by a clever exploration of the parameter space. In Nested_fit, this exploration is obtained with a search algorithm for new parameter values called lawn mower robot, which has been initially developed by L. Simons [4] and modified here for a better exploration of multimodal problems.
Nested_fit has been developed over the past years to analyze several sets of experimental data from, mainly, atomic physics experiments. For this reason, it has some special feature well adapted to the analysis of atomic spectra as specific line profiles, possibility to study correlated spectra at the same time, eg. background and signal-plus-background spectra, and with a likelihood function built considering a Poisson statistics per each channel, well adapted to low-statistics data.
In the next section we will describe the general structure and feautres of Nested_fit. In Section 3 we shortly introduce the basic concepts of Bayesian model comparison and the nested sampling method. The specific algorithm for the parameter space exploration for the nested sampling is presented in details in Section 4. An example of application of Nested_fit is presented in Section 5 for the analysis of single two-body electron capture ion decay. A conclusive section will end the article, where recent application of Nested_fit to different atomic physics analysis are mentioned.

2. General Structure of the Program

The general structure of the program is represented in Figure 1. The main input files are two: the file nf_input.dat, where all computation input parameters are included, and the data file, which name is indicated in the parameter input file. The function name in the input file indicates the model to be used for the calculation of the likelihood function. Several functions are already defined in the function library for modelling spectral lines: Gaussian, Log-normal, Lorentzian, Voigt (Gaussian and Lorentzian convolution), Gaussian convoluted with an exponential (for asymmetric peaks), etc. Additional functions can be easily defined by the users in the dedicated routine (USERFNC). Differently from the version presented in Ref. [5] (V. 0.7), in the new version discussed here (V. 2.2) non-analytical or simulated profile models can be implemented. In this case, one or more additional files have to be provided by the users. These external data, which can have some noise like the case of simulated data, are interpolated by B-splines using FITPACK routines [6]. The B-spline parameters are stored and used as profile/model with the total amplitude and a possible offset as free parameters. An additional feature of this new program version, is the possibility to analyze data with error bars. This option has to be indicated in the input file.
Several data sets can be analyzed at the same time by selecting the option “set of data: YES”. This is particularly important for the correct study of physically correlated spectra at the same time, e.g., background and signal-plus-background spectra. This is done using a global user-defined function with common parameters of specific models for each spectrum. In the case of multiple data files, the program read an additional input parameter file nf_input_set.dat for the additional datafile names to analyze and data ranges to consider.
The exploration of the parameter space and the corresponding evaluation of the likelihood function is done implementing the nested sampling algorithm [1,2,3]. If the data are in the format ( c h a n n e l , c o u n t s ) , a Poisson distribution for each channel is assumed for the likelihood function. If the data has error bars ( c h a n n e l , y , δ y ) , a Gaussian distribution is assumed (new feature in V. 2.2).
The main analysis results are summarized in the output file nf_output_res.dat . Here the details of the computation (n. of live points, n. of trials, n. of total iteration) can be found as well as the final evidence value and its uncertainty E ± δ E , the parameter values a ^ corresponding to the maximum of the likelihood function, the mean, the median, the standard deviation and the confidence intervals (68%, 95% and 99%) of the posterior probability distribution of each parameter. The information gain H and the Bayesian complexity C are also provided in the output.
Data for plots and for further analyses are provided in the files nf_output_data_*.dat. These files contain the original input data together with the model function values corresponding to the parameters with the highest likelihood function value (nf_output_data_max.dat) or the parameter mean value (nf_output_data_mean.dat) or median value (nf_output_data_median.dat) with the corresponding residuals and error bars. Additional nf_output_fit_*.dat files contain a model evaluation with higher density than the original data for graphical presentation purpose.
The step-by-step details of the nested sampling exploration are provided in the file nf_output_points.dat that contains the live points used during the parameter space exploration, their associated likelihood values and posterior probabilities. From this file, the different parameter probability distributions and joint probabilities can be built from the marginalization of the unretained parameters. For this purpose, a special dedicated Phython library Nested_res has been developed. Additional informations can be found in Ref. [5].

3. Implementation of the Nested Sampling for the Evidence Calculation

For a given data set(s) { x i , y i } and model(s) M , the Bayesian evidence P ( { x i , y i } | M , I ) is extracted for the evaluation of the probability to the different models them-selves:
P ( M | { x i , y i } , I ) P ( { x i , y i } | M , I ) × P ( M | I ) ,
where P ( M | I ) is the prior probability of each model (assumed constant if not specific preferences for the model is present) and I indicates the background information. The Bayesian evidence is the integral value of the likelihood function over the entire parameter space defined by the priors P ( a | M , I ) :
E ( M ) P ( { x i , y i } | M , I ) = P ( { x i , y i } | a , M , I ) P ( a | M , I ) d J a = L M ( a ) P ( a | M , I ) d J a ,
where J is the number of the parameters of the considered model, and where we explicitly show the dependency of likelihood function L M ( a ) on the model M .
The calculation of the Bayesian evidence is made with the nested sampling, similarly to other available codes [2,7,8,9,10]. Nested sampling allows for reducing the above integral in the one-dimensional integral
E ( M ) = 0 1 L ( X ) d X ,
where X is defined by the relation
X ( L ) = L ( a ) > L P ( a | I ) d J a .
Equation (3) can be numerically calculated using the rectangle integration method subdividing the [ 0 , 1 ] interval in M + 1 segments with an ensemble { X m } of M ordered points 0 < X M < . . . < X 2 < X 1 < X 0 = 1 . We have then
E ( M ) m L m Δ X m ,
where L m = L ( X m ) and Δ X m = X m X m + 1 . The evaluation of L m is obtained by the exploration of the likelihood function with a Monte Carlo sampling via a subsequence of steps. For this, we use a collection of K parameter values { a k } that we call live points. More details on the nested sampling algorithm and its implementation can be found in Refs. [1,2,3,7,8,9,10]. The specific implementation of nested sampling in Nested_fit is presented in details in Ref. [5].
The bottleneck of the nested sampling algorithm is the search of new points within the J-dimensional volume defined by L > L m . Different methods are commonly employed to accomplish this difficult task. One efficient method is the ellipsoidal nested sampling [7]. It is based on the approximation of the iso-likelihood contour defined by L = L m by a J-dimensional ellipsoid calculated from the covariance matrix of the live points. The new point is then selected within the ellipsoidal volume (with an enlargement factor selected by the user). This method, well adapted for unimodal posterior distribution has also been extended to multimodal problems [8,9], i.e., with the presence of distinguished regions of the parameter space with high values of the likelihood function. Other search algorithms are based on Markov chain Monte Carlo (MCMC) methods [10] and the recent Galilean Monte Carlo [11,12], particularly adapted to explore the regions close to the boundary of V L > L m volumes. Nested_fit program is based on an improved version of the lawn mower robot method, originally developed by L. Simons [4] and presented in details in the next section.

4. The Lawn Mower Robot Search Algorithm

A schematic view of the improved lawn mower robot algorithm is represented in Figure 2. To cancel the correlation between the starting point and the final point, a series of N jumps are made in this volume. The different stages of the algorithm are
1.
Choose randomly a starting point a n = 0 = a 0 from the available live points { a m , k } as starting point of the Markov chain where n is the number of the jump. The number of tries n t (see below) is set to zero.
2.
From the values a n 1 , find a new parameter sets a n where each j th parameter is calculated by ( a n ) j = ( a n 1 ) j + f r j σ j , where σ j is the standard deviation of the live points of the nested sampling computation step relative to the j th parameter, r j [ 1 , 1 ] is a sorted random number and f is a factor defined by the user.
(a)
If L ( a n ) > L m and n < N , go to the beginning of step 2 with an increment of the jump number n = n + 1 .
(b)
If L ( a n ) > L m and n = N , a n = N is new live point to be included in the new set { a m + 1 , k } .
(c)
If L ( a n ) < L m and n < N and the number of tries n t is less than the maximum allowed number N t , go back to beginning of step 2 with an increment of the number of tries n t = n t + 1 .
(d)
If L ( a n ) < L m and n < N and n t = N t a new parameter set a 0 has to be selected. Instead than choosing one of the existing live points, a 0 is built from distinct j th components from different live points: ( a 0 ) j = ( a m , k ) j where k is randomly chosen between 1 and K for each j. Then a n = 0 = a 0 and go to the beginning of step 2.
Step 2c, the main improvement of the original lawn mower robot algorithm, makes the algorithm well adapted to problems with multimodal parameter distributions allowing easy jump between high-likelihood regions. The value of N t is fixed in the code ( N t = 10,000 in the present version). The other parameters can be provided by the input file.

5. An Application to Low-Statistics Data

To show the capabilities of Nested_fit, we present in this section its implementation on a particular critical case corresponding to a debated experiment. In 2008 it was observed an unexpected modulation in the two-body electron capture decay of single H-like 61 142 Pm ions to the stable 60 142 Nd bare nucleus, with a monochromatic electron-neutrino emission [13]. The same modulation frequency, but with much smaller amplitude, was found in 2010 data [14] but not in the latest campaign in 2014 [15] where much more events have been recorded.
The unstable ions are produced by collision with a solid target and then injected in a storage ring where they are cooled down. In the storage ring, the decay time of single ions is measured from changes of the Schottky noise frequency induced by the ion revolution. The H-like 61 142 Pm ion and 60 142 Nd bare nucleus masses correspond in fact to different revolution frequencies. From the accumulated data of single decay events, the decay probability per unit of time can be measured. An example of the data collected in 2010 is presented in Figure 3 (up).
The observed modulation of the expected exponential decay has not yet a clear explanation. A possible connection with neutrino masses differences is speculated in the literature. The determination of the presence or not of a modulation is a perfect case for implementing Bayesian model comparison with Nested_fit.
When a possible modulation of the exponential decay is assumed, the likelihood function corresponding to 2010 data presents several maxima. This reflect the periodicity nature of the considered function, which can manifest itself via different harmonics, and the low number of available counts per channel. The difficulties to deal with these multiple likelihood maxima pushed in fact the creation of the improved lawn mower robot algorithm.
In Figure 3 (top) we present the collected data together with the exponential and modulated exponential functions corresponding to the most probable parameter set. The output result from Nested_fit are presented in Table 1 where model 1 and 2 represent the absence of presence of modulation. For each model, values of the evidence, Bayesian complexity and extracted information are provided, as well as model probabilities. The uncertainty of the probabilities is related to the uncertainty of the evidence. As example of probability distribution, we present in Figure 3 (bottom) the joint probability of the amplitude a and pulsation ω of the modulation in model 2. The 2D histogram (obtained with Python nested_res.py library that accompany Nested_fit program) is constructed by marginalization on the other parameters. As it can be seen, different maxima are visible, which make difficult the convergence of the nested sampling method. The improved lawn mower robot algorithm can deal with this kind of situation, even if the computation time is sometime long (several days in a single CPU).
As it can be observed, the assigned probability to each model are similar and the confidential intervals for the parameter relative to the modulation model are very large. These two aspects reflect the difficulty to treat this problem where the acquired data are not sufficient to provide a marked preference for one model with or without modulation (see Ref. [15] for a more extended discussion). Even if apparently unsatisfying, this result avoid however possible over-interpretation of the data commonly encountered when classical methods are employed, as recently discussed in Ref. [16] in the context of nuclear physics.

6. Conclusions

We presented here the program Nested_fit, a general purpose parallelized data analysis code for the evaluation of Bayesian evidence and other statistically relevant outputs. It uses the nested sampling method with the implementation of the improved lawn mower robot algorithm for the evaluation of the Bayesian evidence. Nested_fit has been developed over the past years for the analysis of several sets of atomic experimental data that strongly contribute to the code evolution. We would like to mention in particular the analysis of low-statistics X-ray spectra of He-like uranium [5,17], X-ray spectra of pionic atoms [18,19], electron photoemission spectra from nano-particles [20,21], single-ion decay spectra [15] and response function of crystal X-ray spectrometers (in progress).
Compared to the version reported in Ref. [5], the presented version (V. 2.2) shows additional important features: i) the possibility to interpolate and use computed or simulated external profiles and ii) the implementation of Gaussian likelihood function for data with error bars.
Future developments of Nested_fit will be focussed on the implementation of new exploration methods for the live point evolution of the nested sampling [8,9,11,12]. More precisely, the main goal is the improvement the efficiency for the exploration of the parameter space where the likelihood function presents several local maxima.

Funding

This research received no external funding.

Acknowledgments

The author thanks the Alexander von Humboldt Foundation that provide the financing to attend the MaxEnt2019 conference. Moreover, the author would like to express once again his deep gratitude to Leopold M. Simons who introduced the author to the Bayesian data analysis and without whom this work could not have been started. The development of this program would not be possible without the close interactions and discussions with many collaborators that the author would like to thank as well: N. Winckler, R. Grisenti, A. Lévy, D. Gotta, Y. Litvinov, J. Machado, N. Paul and all members of the Pionic Hydrogen, FOCAL and GSI Oscillation collaborations and the ASUR group at INSP.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Skilling, J. Nested Sampling. AIP Conf. Proc. 2004, 735, 395–405. [Google Scholar]
  2. Sivia, D.S.; Skilling, J. Data Analysis: A Bayesian Tutorial, 2nd ed.; Oxford University Press: Oxford, UK, 2006. [Google Scholar]
  3. Skilling, J. Nested sampling for general Bayesian computation. Bayesian Anal. 2006, 1, 833–859. [Google Scholar] [CrossRef]
  4. Theisen, M. Analyse der Linienform von Röntgenübergängen nach der Bayesmethode. Master Thesis, Faculty of Mathematics, Computer Science and Natural Sciences, RWTH Aachen University, Aachen, Germany, 2013. [Google Scholar]
  5. Trassinelli, M. Bayesian data analysis tools for atomic physics. Nucl. Instrum. Methods B 2017, 408, 301–312. [Google Scholar] [CrossRef]
  6. Dierckx, P. Curve and Surface Fitting with Splines; Oxford University Press: Oxford, UK, 1995. [Google Scholar]
  7. Mukherjee, P.; Parkinson, D.; Liddle, A.R. A Nested Sampling Algorithm for Cosmological Model Selection. Astrophys. J. Lett. 2006, 638, L51. [Google Scholar] [CrossRef]
  8. Feroz, F.; Hobson, M.P. Multimodal nested sampling: An efficient and robust alternative to Markov Chain Monte Carlo methods for astronomical data analyses. Mon. Not. R. Astron. Soc. 2008, 384, 449–463. [Google Scholar] [CrossRef]
  9. Feroz, F.; Hobson, M.P.; Bridges, M. MultiNest: An efficient and robust Bayesian inference tool for cosmology and particle physics. Mon. Not. R. Astron. Soc. 2009, 398, 1601–1614. [Google Scholar] [CrossRef]
  10. Veitch, J.; Vecchio, A. Bayesian coherent analysis of in-spiral gravitational wave signals with a detector network. Phys. Rev. D 2010, 81, 062003. [Google Scholar] [CrossRef]
  11. Skilling, J. Bayesian computation in big spaces-nested sampling and Galilean Monte Carlo. AIP Conf. Proc. 2012, 1443, 145–156. [Google Scholar]
  12. Feroz, F.; Skilling, J. Exploring multi-modal distributions with nested sampling. AIP Conf. Proc. 2013, 1553, 106–113. [Google Scholar]
  13. Litvinov, Y.A.; Bosch, F.; Winckler, N.; Boutin, D.; Essel, H.G.; Faestermann, T.; Geissel, H.; Hess, S.; Kienle, P.; Knöbel, R.; et al. Observation of non-exponential orbital electron capture decays of hydrogen-like 140Pr and 142Pm ions. Phys. Lett. B 2008, 664, 162–168. [Google Scholar] [CrossRef]
  14. Kienle, P.; Bosch, F.; Bühler, P.; Faestermann, T.; Litvinov, Y.A.; Winckler, N.; Sanjari, M.; Shubina, D.; Atanasov, D.; Geissel, H.; et al. High-resolution measurement of the time-modulated orbital electron capture and of the decay of hydrogen-like 142Pm60+ ions. Phys. Lett. B 2013, 726, 638–645. [Google Scholar] [CrossRef]
  15. Ozturk, F.C.; Akkus, B.; Atanasov, D.; Beyer, H.; Bosch, F.; Boutin, D.; Brandau, C.; Bühler, P.; Cakirli, R.B.; Chen, R.J.; et al. Recision Test of Purely Exponential Electron Capture Decay of Hydrogen-Like 142 Pm Ions. arXiv 2019, preprint. arXiv:1907.06920. [Google Scholar]
  16. King, G.; Lovell, A.; Neufcourt, L.; Nunes, F. Direct Comparison between Bayesian and Frequentist Uncertainty Quantification for Nuclear Reactions. Phys. Rev. Lett. 2019, 122, 232502. [Google Scholar] [CrossRef] [PubMed]
  17. Trassinelli, M.; Kumar, A.; Beyer, H.F.; Indelicato, P.; Märtin, R.; Reuschl, R.; Stöhlker, T. Doppler-tuned Bragg spectroscopy of excited levels in He-like uranium: A discussion of the uncertainty contributions. J. Phys. CS 2009, 163, 012026. [Google Scholar] [CrossRef]
  18. Trassinelli, M.; Anagnostopoulos, D.F.; Borchert, G.; Dax, A.; Egger, J.P.; Gotta, D.; Hennebach, M.; Indelicato, P.; Liu, Y.W.; Manil, B.; et al. Measurement of the charged pion mass using X-ray spectroscopy of exotic atoms. Phys. Lett. B 2016, 759, 583–588. [Google Scholar] [CrossRef]
  19. Trassinelli, M.; Anagnostopoulos, D.; Borchert, G.; Dax, A.; Egger, J.P.; Gotta, D.; Hennebach, M.; Indelicato, P.; Liu, Y.W.; Manil, B.; et al. Measurement of the charged pion mass using a low-density target of light atoms. EPJ Web Conf. 2016, 130, 01022. [Google Scholar] [CrossRef]
  20. Papagiannouli, I.; Patanen, M.; Blanchet, V.; Bozek, J.D.; de Anda Villa, M.; Huttula, M.; Kokkonen, E.; Lamour, E.; Mevel, E.; Pelimanni, E.; et al. Depth Profiling of the Chemical Composition of Free-Standing Carbon Dots Using X-ray Photoelectron Spectroscopy. J. Phys. Chem. A 2018, 122, 14889–14897. [Google Scholar] [CrossRef]
  21. Villa, M.D.A.; Gaudin, J.; Amans, D.; Boudjada, F.; Bozek, J.; Grisenti, R.E.; Lamour, E.; Laurens, G.; Macé, S.; Nicolas, C.; et al. Assessing the surface oxidation state of free-standing gold nanoparticles produced by laser ablation. Langmuir 2019. submitted. [Google Scholar]
Figure 1. Scheme of the Nested_fit program.
Figure 1. Scheme of the Nested_fit program.
Proceedings 33 00014 g001
Figure 2. Scheme of lawn mower robot algorithm.
Figure 2. Scheme of lawn mower robot algorithm.
Proceedings 33 00014 g002
Figure 3. Top: Data relative to the single decay of H-like 61 142 Pm to 60 142 Nd bare nucleus obtained with a binning of 0.08 s. The profile curves relative to pure exponential and exponential with modulation models are also represented. Bottom: 2D histogram of the joint probability of the amplitude a and pulsation ω of the model with modulation. Red, yellow and green colors represent approximatively the regions corresponding to 68%, 95% and 66% confidence intervals. Both figures are obtained by Python nested_res.py library that accompany Nested_fit program.
Figure 3. Top: Data relative to the single decay of H-like 61 142 Pm to 60 142 Nd bare nucleus obtained with a binning of 0.08 s. The profile curves relative to pure exponential and exponential with modulation models are also represented. Bottom: 2D histogram of the joint probability of the amplitude a and pulsation ω of the model with modulation. Red, yellow and green colors represent approximatively the regions corresponding to 68%, 95% and 66% confidence intervals. Both figures are obtained by Python nested_res.py library that accompany Nested_fit program.
Proceedings 33 00014 g003
Table 1. Summary of the results provided from Nested_fit for the two considered models. The parameter values are given in terms of most probable value and 95% confidential interval (CI).
Table 1. Summary of the results provided from Nested_fit for the two considered models. The parameter values are given in terms of most probable value and 95% confidential interval (CI).
Model 1Model 2
Function y = N 0 e t / τ y = N 0 e t / τ [ 1 + a sin ( ω t + ϕ ) ]
log e (Evidence) 1594.11± 0.30 1594.60± 0.36
Probability34.2–41.9%58.1–68.8%
Complexity2.0515.19
Extracted information [nat]4.766.32
ω (CI 95%) [rad s 1 ] 0.89( 0.17 6.86)
a (CI 95%) 9.2× 10 2 ( 2.2× 10 4 7.2× 10 2 )
ϕ (CI 95%) [rad] 3.84( 0.18 6.14)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Trassinelli, M. The Nested_fit Data Analysis Program. Proceedings 2019, 33, 14. https://doi.org/10.3390/proceedings2019033014

AMA Style

Trassinelli M. The Nested_fit Data Analysis Program. Proceedings. 2019; 33(1):14. https://doi.org/10.3390/proceedings2019033014

Chicago/Turabian Style

Trassinelli, Martino. 2019. "The Nested_fit Data Analysis Program" Proceedings 33, no. 1: 14. https://doi.org/10.3390/proceedings2019033014

Article Metrics

Back to TopTop