# TI-Stan: Model Comparison Using Thermodynamic Integration and HMC


## Abstract


## 1. Introduction

#### 1.1. Motivation

#### 1.2. Thermodynamic Integral Derivation

- use MCMC to estimate the expected energy at each value of $\beta $,
- develop a fixed schedule for $\beta $ or employ an adaptive strategy to choose optimal stepped values of $\beta $,
- and compute the quadrature estimate of the integral.
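The three steps above can be sketched numerically on a conjugate toy problem. The example below assumes a unit-variance Gaussian likelihood for a single datum $x_0$ under a standard normal prior, chosen (purely for illustration) so that both the power posterior and the exact evidence are available in closed form and no MCMC is needed:

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = 1.0  # single observed datum (toy assumption)

def energy(theta):
    # E(theta) = -log L(theta) for the likelihood N(x0 | theta, 1)
    return 0.5 * np.log(2 * np.pi) + 0.5 * (x0 - theta) ** 2

# Step 2 (fixed-schedule variant): a uniform grid of beta values.
betas = np.linspace(0.0, 1.0, 51)

# Step 1: estimate <E>_beta at each beta.  For this conjugate toy the power
# posterior is N(beta*x0/(1+beta), 1/(1+beta)), so we sample it directly.
mean_E = np.empty_like(betas)
for k, b in enumerate(betas):
    var = 1.0 / (1.0 + b)
    theta = rng.normal(b * x0 * var, np.sqrt(var), size=20000)
    mean_E[k] = energy(theta).mean()

# Step 3: trapezoid quadrature; log Z = -integral of <E>_beta over [0, 1].
log_z_ti = -float(np.sum(0.5 * (mean_E[1:] + mean_E[:-1]) * np.diff(betas)))

# Exact evidence for comparison: Z = N(x0 | 0, 2).
log_z_exact = -0.5 * np.log(4 * np.pi) - x0 ** 2 / 4
print(log_z_ti, log_z_exact)
```

With this many samples per temperature the two printed values agree to about two decimal places, which is the basic consistency check for any TI implementation.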

#### 1.3. Outline

## 2. Adaptive Annealing and Importance Sampling

- Start at $\beta =0$ where $\mathrm{p}(\mathsf{\Theta}|\mathit{M},\mathit{D},\beta ,I)=\mathrm{p}(\mathsf{\Theta}|\mathit{M},I)$, and draw C samples from this distribution (the prior).
- Compute the Monte Carlo estimator for the expected energy at the current $\beta$,$${\langle {E}_{L}\rangle}_{\beta}\approx \frac{1}{C}\sum_{j=1}^{C}{E}_{L}({\mathsf{\Theta}}_{j}).$$
- Increment $\beta$ by $\Delta {\beta}_{i}$, where$$\Delta {\beta}_{i}=\frac{\mathrm{log}\left({max}_{j}\,{w}_{j}/{min}_{j}\,{w}_{j}\right)}{{max}_{j}\,{E}_{L}({\mathsf{\Theta}}_{j})-{min}_{j}\,{E}_{L}({\mathsf{\Theta}}_{j})},\qquad {w}_{j}=\mathrm{exp}[-\Delta {\beta}_{i}\,{E}_{L}({\mathsf{\Theta}}_{j})].$$
- Re-sample the population of samples using importance sampling.
- Use MCMC to refresh the current population of samples. This yields a more accurate sampling of the distribution at the current temperature. This step can be easily parallelized, as each sample’s position can be shifted independently of the others.
- Return to step 2 and continue until ${\beta}_{i}$ reaches 1.
- Estimate Equation (9) using quadrature and the expected energy estimates built up using Equation (10).
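The adaptive step-size rule above can be sketched as follows. The energies here are fixed hypothetical values (in the full algorithm the population, and hence its energies, is re-sampled and refreshed by MCMC between temperature steps), and $W$ denotes the user-chosen ratio $max\,{w}_{j}/min\,{w}_{j}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical energies E_L(Theta_j) of a population of C = 256 samples.
E = rng.normal(10.0, 2.0, size=256)

W = 1.5  # user-chosen ratio max(w_j)/min(w_j) controlling the step size

beta, schedule = 0.0, [0.0]
while beta < 1.0:
    # Adaptive increment from the text: Delta beta = log(W) / range(E).
    delta = np.log(W) / (E.max() - E.min())
    if beta + delta >= 1.0:
        delta, beta = 1.0 - beta, 1.0   # clip the final step at beta = 1
    else:
        beta += delta
    schedule.append(beta)
    # Importance weights for re-sampling at this step; by construction
    # max(w)/min(w) = W (except on the clipped final step).
    w = np.exp(-delta * (E - E.min()))

print(len(schedule), schedule[-1])
```

The design choice is that a wide spread of energies forces small $\Delta\beta$ (many temperatures), while a narrow spread lets the schedule take large steps, so the weight degeneracy at each re-sampling stage is held roughly constant.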

#### 2.1. Importance Sampling with Re-Sampling

**Algorithm 1** Importance sampling with re-sampling

1: function ImportanceSampling($w,\alpha ,{E}^{*},C$)

2: Sort $w$, $\alpha$, and ${E}^{*}$ by $w$

3: $w\leftarrow (C/\sum w)\,w$

4: $u\leftarrow \mathrm{RAND}(0,1)$

5: ${w}_{old}\leftarrow w$

6: for $i\leftarrow 1,C$ do

7: ${w}_{i}\leftarrow {\sum}_{k=1}^{i}{w}_{old,k}$

8: end for

9: $j\leftarrow 0$

10: $q\leftarrow -1$

11: for $m\leftarrow 1,C$ do

12: while ${w}_{m}>u$ do

13: ${\alpha}_{j}\leftarrow {\alpha}_{m}$

14: ${E}_{j}^{*}\leftarrow {E}_{m}^{*}$

15: $q\leftarrow m$

16: $u\leftarrow u+1$

17: $j\leftarrow j+1$

18: end while

19: end for

20: end function
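A Python rendering of Algorithm 1's systematic re-sampling step may make the cumulative-weight logic clearer. This is a sketch: the population is a generic list of objects rather than the paired $(\alpha, {E}^{*})$ arrays of the text, and the initial sort is omitted since it does not change which members are copied:

```python
import numpy as np

def importance_resample(w, population, rng):
    """Systematic re-sampling: copy each member of the population a number
    of times proportional, on average, to its weight (cf. Algorithm 1)."""
    C = len(w)
    w = np.asarray(w, dtype=float) * (C / np.sum(w))  # weights now sum to C
    cum = np.cumsum(w)              # running totals, as in lines 6-8
    u = rng.uniform(0.0, 1.0)       # one shared random offset
    out = []
    for m in range(C):
        while len(out) < C and cum[m] > u:
            out.append(population[m])   # copy member m into the new population
            u += 1.0
    return out

rng = np.random.default_rng(2)
print(importance_resample([0.0, 5.0], ["a", "b"], rng))  # -> ['b', 'b']
```

Because a single uniform offset is shared by all members, the number of copies of member $m$ is always within one of its expected value, which gives lower re-sampling variance than drawing $C$ independent categorical samples.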

#### 2.2. Adaptive Annealing

## 3. Representing the Model Parameters

## 4. Thermodynamic Integration with Binary Slice Sampling

**Algorithm 2** Thermodynamic integration with binary slice sampling

1: procedure TI($P,M,S,N,C,B,W,data$)

2: Inputs: P–Number of parameters, M–Number of combined binary slice sampling and leapfrog steps, S–Number of slice sampling steps, N–New origin probability, C–Number of chains, B–Number of bits per parameter, W–Ratio to control adaptive annealing, $data$–Data

3: for $m\leftarrow 1,C$ do

4: $X\leftarrow \mathrm{RANDINT}(0,{2}^{PB}-1)$

5: ${\alpha}^{m}\leftarrow \mathrm{LINETOAXES}(X,B,P)$

6: ${E}_{m}^{*}\leftarrow \mathrm{ENERGY}({\alpha}^{m},data)$

7: end for

8: $i\leftarrow 1$

9: Compute ${\langle {E}^{*}\rangle}_{i}$

10: ${\beta}_{1}\leftarrow min\{\mathrm{log}(W)/[max({E}^{*})-min({E}^{*})],1\}$

11: $w\leftarrow \mathrm{exp}(-{\beta}_{1}{E}^{*})$

12: $\mathrm{IMPORTANCESAMPLING}(w,\alpha ,{E}^{*},C)$

13: while ${\beta}_{i}>0$ and ${\beta}_{i}<1$ do

14: for $k\leftarrow 1,M$ do

15: for $m\leftarrow 1,C$ do

16: $\mathrm{BINARYSLICESAMPLING}({\alpha}^{m},{E}_{m}^{*},B,C,P,S,N,{\beta}_{i},data)$

17: end for

18: $\mathrm{LEAPFROG}(\alpha ,{E}^{*},B,C,P,data)$

19: end for

20: $i\leftarrow i+1$

21: $\Delta \beta \leftarrow \mathrm{log}(W)/[max({E}^{*})-min({E}^{*})]$

22: ${\beta}_{i}\leftarrow min({\beta}_{i-1}+\Delta \beta ,1)$

23: if ${\beta}_{i-1}+\Delta \beta >1$ then

24: $\Delta \beta \leftarrow 1-{\beta}_{i-1}$

25: end if

26: $w\leftarrow \mathrm{exp}(-\Delta \beta \,{E}^{*})$

27: $\mathrm{IMPORTANCESAMPLING}(w,\alpha ,{E}^{*},C)$

28: end while

29: Estimate Equation (9) using the trapezoid rule with $\left\{{\beta}_{i}\right\}$ and $\left\{{\langle {E}^{*}\rangle}_{i}\right\}$

30: end procedure

**Algorithm 3** Binary slice sampling

1: function BinarySliceSampling($\alpha ,{E}^{*},{B}_{in},C,P,S,N,\beta ,data$)

2: $B\leftarrow {2}^{P{B}_{in}}$

3: ${X}_{orig}\leftarrow \mathrm{RANDINT}(0,B)$

4: ${\alpha}_{orig}\leftarrow \mathrm{LINETOAXES}({X}_{orig},{B}_{in},P)$

5: ${\alpha}^{\prime}\leftarrow \alpha$

6: for $i\leftarrow 1,S$ do

7: if $\mathrm{RAND}(0,1)<N$ then

8: ${X}_{orig}\leftarrow \mathrm{RANDINT}(0,B)$

9: ${\alpha}_{orig}\leftarrow \mathrm{LINETOAXES}({X}_{orig},{B}_{in},P)$

10: end if

11: ${\alpha}^{\prime}\leftarrow ({\alpha}^{\prime}-{\alpha}_{orig})\bmod {2}^{{B}_{in}}$

12: $X\leftarrow \mathrm{AXESTOLINE}({\alpha}^{\prime},{B}_{in},P)$

13: $U\leftarrow \mathrm{RANDINT}(0,B)$

14: $y\leftarrow \beta \,\mathrm{ENERGY}(\alpha ,data)-\mathrm{log}(\mathrm{RAND}(0,1))$

15: $l\leftarrow P{B}_{in}$

16: ${E}^{*\prime}\leftarrow \infty$

17: while $\beta {E}^{*\prime}>y$ and $l>1$ do

18: $R\leftarrow \mathrm{RANDINT}(0,{2}^{l})$

19: ${X}^{\prime}\leftarrow (\{[(X-U)\bmod B]\oplus R\}+U)\bmod B$

20: ${\alpha}^{\prime}\leftarrow \mathrm{LINETOAXES}({X}^{\prime},{B}_{in},P)$

21: ${\alpha}^{\prime}\leftarrow ({\alpha}^{\prime}+{\alpha}_{orig})\bmod {2}^{{B}_{in}}$

22: ${E}^{*\prime}\leftarrow \mathrm{ENERGY}({\alpha}^{\prime},data)$

23: $l\leftarrow l-P$

24: end while

25: $\alpha \leftarrow {\alpha}^{\prime}$

26: end for

27: end function
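A simplified one-dimensional sketch of the binary slice-sampling update may help: with $P=1$ the space-filling curve is the identity, and the random-origin shuffle of lines 7-10 is dropped. The toy energy function and parameter values below are illustrative assumptions, not from the paper:

```python
import numpy as np

def binary_slice_step(X, energy, bits, beta, rng):
    """One binary slice-sampling update of an integer state X in [0, 2^bits)."""
    B = 1 << bits
    U = int(rng.integers(0, B))                    # random translation offset
    y = beta * energy(X) - np.log(rng.uniform())   # slice level (line 14)
    l = bits
    while l >= 1:
        R = int(rng.integers(0, 1 << l))           # flip up to l low-order bits
        Xp = ((((X - U) % B) ^ R) + U) % B         # XOR proposal (line 19)
        if beta * energy(Xp) <= y:
            return Xp                              # inside the slice: accept
        l -= 1                                     # otherwise shrink the scale
    return X                                       # X itself is always in the slice

# Toy energy: the lower half of the lattice is strongly preferred.
energy = lambda X: 0.0 if X < 8 else 50.0
rng = np.random.default_rng(3)
chain = [0]
for _ in range(1000):
    chain.append(binary_slice_step(chain[-1], energy, bits=4, beta=1.0, rng=rng))
print(sum(x < 8 for x in chain) / len(chain))
```

The XOR-with-offset proposal is an involution for fixed $U$ and $R$, and halving the number of flippable bits on each rejection plays the same role as the interval shrinkage in Neal's continuous slice sampler.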

**Algorithm 4** Leapfrog sampling function

1: procedure Leapfrog($\alpha ,{E}^{*},B,C,P,data$)

2: for $m\leftarrow 1,C$ do

3: ${X}_{m}\leftarrow \mathrm{AXESTOLINE}({\alpha}_{m},B,P)$

4: end for

5: Sort $X$

6: for $m\leftarrow 1,C$ do

7: ${\alpha}_{m}\leftarrow \mathrm{LINETOAXES}({X}_{m},B,P)$

8: ${E}_{m}^{*}\leftarrow \mathrm{ENERGY}({\alpha}_{m},data)$

9: end for

10: for $m\leftarrow 1,C$ do

11: ${\alpha}_{cur}\leftarrow {\alpha}_{m}$

12: if $m=1$ then

13: $l\leftarrow {\alpha}_{C}$

14: else

15: $l\leftarrow {\alpha}_{m-1}$

16: end if

17: if $m=C$ then

18: $r\leftarrow {\alpha}_{1}$

19: else

20: $r\leftarrow {\alpha}_{m+1}$

21: end if

22: ${X}_{l}\leftarrow \mathrm{AXESTOLINE}(l,B,P)$

23: ${X}_{r}\leftarrow \mathrm{AXESTOLINE}(r,B,P)$

24: ${\alpha}_{new}\leftarrow (l+r-{\alpha}_{cur})\bmod {2}^{B}$

25: ${X}_{new}\leftarrow \mathrm{AXESTOLINE}({\alpha}_{new},B,P)$

26: if ($m=1$ and (${X}_{new}>{X}_{l}$ or ${X}_{new}<{X}_{r}$)) or ($m=C$ and (${X}_{new}>{X}_{l}$ or ${X}_{new}<{X}_{r}$)) or ($m>1$ and $m<C$ and ${X}_{new}>{X}_{l}$ and ${X}_{new}<{X}_{r}$) then

27: ${E}_{new}\leftarrow \mathrm{ENERGY}({\alpha}_{new},data)$

28: if ${E}_{new}<{E}_{m}^{*}$ then

29: ${E}_{m}^{*}\leftarrow {E}_{new}$

30: ${\alpha}_{m}\leftarrow {\alpha}_{new}$

31: else

32: $u\leftarrow \mathrm{RAND}(0,1)$

33: if ${E}_{new}-{E}_{m}^{*}<-\mathrm{log}(u)$ then

34: ${E}_{m}^{*}\leftarrow {E}_{new}$

35: ${\alpha}_{m}\leftarrow {\alpha}_{new}$

36: end if

37: end if

38: end if

39: end for

40: end procedure
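A one-dimensional sketch of the leapfrog move follows: with $P=1$ the curve index and the parameter coincide, so the AxesToLine calls disappear. The boundary handling for the first and last chains is simplified here to always test the proposal, and the toy quadratic energy is an illustrative assumption:

```python
import numpy as np

def leapfrog_pass(alpha, energy_fn, bits, rng):
    """One leapfrog pass over a population of integer states
    (sketch of Algorithm 4, specialized to P = 1)."""
    alpha = np.sort(np.asarray(alpha))              # order along the line
    E = np.array([energy_fn(int(a)) for a in alpha])
    B = 1 << bits
    C = len(alpha)
    for m in range(C):
        l = alpha[m - 1] if m > 0 else alpha[C - 1]   # left neighbour (wraps)
        r = alpha[(m + 1) % C]                        # right neighbour (wraps)
        new = (int(l) + int(r) - int(alpha[m])) % B   # reflect through neighbours
        # Interior chains only accept proposals that land between the
        # neighbours; the two edge chains here always test the proposal.
        if 0 < m < C - 1 and not (l < new < r):
            continue
        e_new = energy_fn(new)
        # Metropolis test on the energy difference (lines 28-36).
        if e_new < E[m] or e_new - E[m] < -np.log(rng.uniform()):
            alpha[m], E[m] = new, e_new
    return alpha, E

rng = np.random.default_rng(4)
alpha = rng.integers(0, 256, size=16)
energy_fn = lambda a: ((a - 128) / 64.0) ** 2   # toy quadratic energy
alpha, E = leapfrog_pass(alpha, energy_fn, bits=8, rng=rng)
print(len(alpha), int(alpha.min()), int(alpha.max()))
```

Reflecting a chain through the midpoint of its two nearest neighbours lets isolated chains jump between well-populated regions, which is the mode-hopping role this move plays alongside the local slice-sampling updates.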

#### 4.1. Space-Filling Curves

- Locality. Points that are nearby in ${\mathbb{N}}_{0}^{N}$ should have nearby curve indexes in ${\mathbb{N}}_{0}^{1}$, and vice versa.
- Time-efficiency. The algorithms for performing the mapping between parameter space and curve indexes should be time-efficient.
- Bi-directionality. Algorithms should exist for mapping parameter space to curve indexes and from curve indexes to parameter space.

#### 4.1.1. Hilbert Curve

**Algorithm 5** Hilbert curve line-to-axes function

1: function LineToAxes($Line,B,P$)

2: for $i\leftarrow 1,P$ do

3: $line{n}_{P-i}\leftarrow [Line\gg B(i-1)]\bmod {2}^{B}$

4: end for

5: $M\leftarrow 1\ll (B-1)$

6: for $i\leftarrow 1,P$ do

7: ${X}_{i}\leftarrow 0$

8: end for

9: $q\leftarrow 0$, $p\leftarrow M$

10: for $i\leftarrow 1,P$ do

11: $j\leftarrow M$

12: while $j>0$ do

13: if $line{n}_{i}\wedge j$ then

14: ${X}_{q}\leftarrow {X}_{q}\vee p$

15: end if

16: $q\leftarrow q+1$

17: if $q=P$ then

18: $q\leftarrow 0$, $p\leftarrow p\gg 1$

19: end if

20: $j\leftarrow j\gg 1$

21: end while

22: end for

23: $t\leftarrow {X}_{P}\gg 1$

24: for $i\leftarrow P,2$ do

25: ${X}_{i}\leftarrow {X}_{i}\oplus {X}_{i-1}$

26: end for

27: ${X}_{1}\leftarrow {X}_{1}\oplus t$, $M\leftarrow 2\ll (B-1)$, $Q\leftarrow 2$

28: while $Q\ne M$ do

29: $R\leftarrow Q-1$

30: for $i\leftarrow P,2$ do

31: if ${X}_{i}\wedge Q$ then

32: ${X}_{1}\leftarrow {X}_{1}\oplus R$

33: else

34: $t\leftarrow ({X}_{1}\oplus {X}_{i})\wedge R$, ${X}_{1}\leftarrow {X}_{1}\oplus t$, ${X}_{i}\leftarrow {X}_{i}\oplus t$

35: end if

36: end for

37: if ${X}_{1}\wedge Q$ then

38: ${X}_{1}\leftarrow {X}_{1}\oplus R$

39: end if

40: $Q\leftarrow Q\ll 1$

41: end while

42: return $X$

43: end function
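Algorithm 5 is the transpose-form decoder. For intuition, the two-dimensional case can also be written in the classic rotate-and-flip form below (a textbook formulation, not the algorithm above), together with a check of the locality property from Section 4.1:

```python
def hilbert_d2xy(order_bits, d):
    """Map curve index d to (x, y) on a 2^order_bits x 2^order_bits grid
    (classic rotate-and-flip 2-D Hilbert decode)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order_bits):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate this quadrant
            if rx == 1:                  # flip before rotating, if needed
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx                      # add the quadrant offset
        y += s * ry
        t //= 4
        s <<= 1
    return x, y

# Locality: consecutive curve indexes are lattice neighbours, and the curve
# visits every cell exactly once.
pts = [hilbert_d2xy(4, d) for d in range(256)]
assert all(abs(ax - bx) + abs(ay - by) == 1
           for (ax, ay), (bx, by) in zip(pts, pts[1:]))
assert len(set(pts)) == 256
```

The two assertions are exactly the properties that make the Hilbert curve attractive here: unit steps along the index are unit steps on the lattice, so a one-dimensional sampler moving along the index makes local moves in parameter space.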

#### 4.1.2. Z-Order Curve

1. Generate bitmasks based on the number of bits b and the number of parameters n.
2. AND the first mask with the Z-order integer to select every nth bit.
3. Loop over each mask. For the ith mask, XOR the Z-order integer with itself shifted to the right by i, then mask the result.
4. Shift the original Z-order integer to the right by 1, then repeat from step 2 for the next dimension.
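For two dimensions the masks are fixed constants, and the steps above reduce to the well-known bit-compaction routine, sketched here with up to 16 bits per axis (Algorithm 6 generalizes the mask computation to arbitrary bit counts and dimensions):

```python
def z_decode_2d(z):
    """Recover (x, y) from a 2-D Z-order (Morton) index, 16 bits per axis."""
    def compact(v):
        v &= 0x55555555                  # keep every 2nd bit (step 2)
        v = (v ^ (v >> 1)) & 0x33333333  # fold bit pairs together (step 3)
        v = (v ^ (v >> 2)) & 0x0F0F0F0F
        v = (v ^ (v >> 4)) & 0x00FF00FF
        v = (v ^ (v >> 8)) & 0x0000FFFF
        return v
    return compact(z), compact(z >> 1)   # step 4: shift by 1 for the y axis

def z_encode_2d(x, y):
    """Inverse operation: interleave the bits of x and y."""
    def spread(v):
        v = (v | (v << 8)) & 0x00FF00FF
        v = (v | (v << 4)) & 0x0F0F0F0F
        v = (v | (v << 2)) & 0x33333333
        v = (v | (v << 1)) & 0x55555555
        return v
    return spread(x) | (spread(y) << 1)

assert z_decode_2d(z_encode_2d(3, 5)) == (3, 5)
```

Each compaction pass halves the gaps between the wanted bits, so decoding costs $O(\log B)$ shift-and-mask operations per axis; this is the time-efficiency property required of the curve in Section 4.1.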

**Algorithm 6** Z-order curve mask computation function

1: function ComputeBitMask($B,P$)

2: $P\leftarrow P-1$

3: for $i\leftarrow 1,B$ do

4: $b{d}_{i}\leftarrow (i+1)P$ ▹ Stored as binary strings, with the leftmost two bits discarded

5: end for

6: $maxLength\leftarrow$ length of the longest string in $bd$

7: $moveBits\leftarrow$ empty list

8: for $i\leftarrow 1,maxLength$ do

9: Append an empty list to $moveBits$

10: for $j\leftarrow 1,P$ do

11: if $\mathrm{LENGTH}(b{d}_{j})\ge i$ then

12: if the $i$th bit from the end of $b{d}_{j}$ is 1 then

13: Append $j$ to $moveBit{s}_{i}$

14: end if

15: end if

16: end for

17: end for

18: for $i\leftarrow 1,B$ do

19: $bitPosition{s}_{i}\leftarrow i$

20: end for

21: $maskOld\leftarrow (1\ll B)-1$

22: $bitmasks\leftarrow$ empty list

23: for $i\leftarrow \mathrm{LENGTH}(moveBits),1$ do

24: if $\mathrm{LENGTH}(moveBit{s}_{i})>0$ then

25: $shifted\leftarrow 0$

26: for $bitIdxToMove\in moveBit{s}_{i}$ do

27: $shifted\leftarrow shifted\vee (1\ll bitPosition{s}_{bitIdxToMove})$

28: $bitPosition{s}_{bitIdxToMove}\leftarrow bitPosition{s}_{bitIdxToMove}+{2}^{i}$

29: end for

30: $nonshifted\leftarrow \neg shifted\wedge maskOld$

31: $shifted\leftarrow shifted\ll {2}^{i}$

32: $maskNew\leftarrow shifted\vee nonshifted$

33: Append $maskNew$ to $bitmasks$

34: $maskOld\leftarrow maskNew$

35: end if

36: end for

37: return $bitmasks$

38: end function

**Algorithm 7** Z-order curve line-to-axes function

1: function LineToAxes($z,B,P$)

2: if $P=1$ then

3: return $z$

4: end if

5: $masks\leftarrow \mathrm{COMPUTEBITMASK}(B,P)$ ▹ Call only once for each B and P

6: Pop the final entry from the $masks$ list into $first$

7: Reverse the $masks$ list

8: Append $1\ll B$ to $masks$

9: $minshift\leftarrow P-1$

10: for $i\leftarrow 1,P$ do

11: $zz\leftarrow z\gg i$

12: $zz\leftarrow zz\wedge first$

13: $shift\leftarrow minshift$

14: for $mask\in masks$ do

15: $zz\leftarrow (zz\oplus (zz\gg shift))\wedge mask$

16: $shift\leftarrow shift\ll 1$

17: end for

18: ${\alpha}_{i}\leftarrow zz$

19: end for

20: return $\alpha$

21: end function

#### 4.2. Parallel Implementation

## 5. Thermodynamic Integration with Stan

**Algorithm 8** Thermodynamic integration with Stan

1: procedure TI($P,S,C,W,data$)

2: Inputs: P–Number of parameters, S–Number of Stan iterations per temperature, C–Number of chains, W–Ratio to control adaptive annealing, $data$–Data

3: for $m\leftarrow 1,C$ do

4: $X\leftarrow \mathrm{RANDINT}(0,{2}^{PB}-1)$

5: ${\alpha}^{m}\leftarrow \mathrm{LINETOAXES}(X,B,P)$

6: ${E}_{m}^{*}\leftarrow \mathrm{ENERGY}({\alpha}^{m},data)$

7: end for

8: $i\leftarrow 1$

9: Compute ${\langle {E}^{*}\rangle}_{i}$

10: ${\beta}_{1}\leftarrow min\{\mathrm{log}(W)/[max({E}^{*})-min({E}^{*})],1\}$

11: $w\leftarrow \mathrm{exp}(-{\beta}_{1}{E}^{*})$

12: $\mathrm{IMPORTANCESAMPLING}(w,\alpha ,{E}^{*},C)$

13: while ${\beta}_{i}>0$ and ${\beta}_{i}<1$ do

14: for $m\leftarrow 1,C$ do

15: $\mathrm{STANSAMPLING}({\alpha}^{m},{E}_{m}^{*},C,P,S,{\beta}_{i},data)$

16: end for

17: $i\leftarrow i+1$

18: $\Delta \beta \leftarrow \mathrm{log}(W)/[max({E}^{*})-min({E}^{*})]$

19: ${\beta}_{i}\leftarrow min({\beta}_{i-1}+\Delta \beta ,1)$

20: if ${\beta}_{i-1}+\Delta \beta >1$ then

21: $\Delta \beta \leftarrow 1-{\beta}_{i-1}$

22: end if

23: $w\leftarrow \mathrm{exp}(-\Delta \beta \,{E}^{*})$

24: $\mathrm{IMPORTANCESAMPLING}(w,\alpha ,{E}^{*},C)$

25: end while

26: Estimate Equation (9) using the trapezoid rule with $\left\{{\beta}_{i}\right\}$ and $\left\{{\langle {E}^{*}\rangle}_{i}\right\}$

27: end procedure

## 6. Examples

#### 6.1. Eggcrate Likelihood

#### 6.2. Detection of Multiple Stationary Frequencies

#### 6.3. Twin Gaussian Shells

#### 6.4. Ideal Gas Partition Function

#### 6.5. Results

#### 6.5.1. Eggcrate Likelihood Results

#### 6.5.2. Twin Gaussian Shells’ Results

#### 10-Dimensional Twin Gaussian Shells

#### 30-Dimensional Twin Gaussian Shells

#### 100-Dimensional Twin Gaussian Shells

#### 6.5.3. Detection of Multiple Stationary Frequencies’ Results

#### 6.5.4. Ideal Gas Partition Function Results

## 7. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| NS | Nested sampling |
| NS-CC | Combined-chain nested sampling |
| NS-MR | Multiple-replacement nested sampling |
| TI | Thermodynamic integration |
| BSS | Binary slice sampling |
| TI-Stan | Thermodynamic integration with Stan |
| TI-BSS | Thermodynamic integration with binary slice sampling |
| TI-BSS-H | Thermodynamic integration with binary slice sampling and the Hilbert curve |
| TI-BSS-Z | Thermodynamic integration with binary slice sampling and the Z-order curve |
| HMC | Hamiltonian Monte Carlo |
| NUTS | No-U-Turn Sampler |
| MCMC | Markov chain Monte Carlo |
| CDF | Cumulative distribution function |

## References


**Figure 4.**The simulated signal. The points represent the non-uniformly sampled points from the original signal corrupted by Gaussian noise.

**Figure 5.**Pseudo-color plot of a two-dimensional twin Gaussian shell with ${w}_{1}={w}_{2}=0.1$, ${r}_{1}={r}_{2}=2$, ${\mathbf{c}}_{1}={[-3.5,0]}^{T}$, and ${\mathbf{c}}_{2}={[3.5,0]}^{T}$. The color values correspond to likelihood values.

**Figure 6.**Box-plot of log-evidence for the egg-crate problem for each TI method. TI-Stan with W=xx is denoted by ti stan, xx; TI-BSS-H with W=xx is denoted by tih, xx; and TI-BSS-Z with W=xx is denoted by tiz, xx.

**Figure 7.**Box-plot of run time in seconds for the egg-crate problem for each TI method. TI-Stan with W=xx is denoted by ti stan, xx; TI-BSS-H with W=xx is denoted by tih, xx; and TI-BSS-Z with W=xx is denoted by tiz, xx.

**Figure 8.**Box-plot of log-evidence for the 10-D twin Gaussian shell problem for TI-Stan and TI-BSS-H. TI-Stan with W=xx is denoted by ti stan, xx and TI-BSS-H with W=xx is denoted by tih, xx.

**Figure 9.**Box-plot of run time in seconds for the 10-D twin Gaussian shell problem for TI-Stan and TI-BSS-H. TI-Stan with W=xx is denoted by ti stan, xx and TI-BSS-H with W=xx is denoted by tih, xx.

**Figure 14.**Box-plot of log-evidence for the one stationary frequency model for TI-Stan, TI-BSS-H, and TI-BSS-Z, for two values of W. TI-Stan with W=xx is denoted by ti stan, xx; TI-BSS-H with W=xx is denoted by tih, xx; and TI-BSS-Z with W=xx is denoted by tiz, xx.

**Figure 15.**Box-plot of log-evidence for the two stationary frequency model for TI-Stan and TI-BSS-H, for two values of W. TI-Stan with W=xx is denoted by ti stan, xx and TI-BSS-H with W=xx is denoted by tih, xx.

**Figure 16.**Box-plot of log-evidence for the three stationary frequency model for TI-Stan, TI-BSS-H, and TI-BSS-Z, for two values of W. TI-Stan with W=xx is denoted by ti stan, xx; TI-BSS-H with W=xx is denoted by tih, xx; and TI-BSS-Z with W=xx is denoted by tiz, xx.

**Figure 17.**Box-plot of run time for the $J=1$ stationary frequency model for TI-Stan, TI-BSS-H, and TI-BSS-Z, for two values of W. TI-Stan with W=xx is denoted by ti stan, xx; TI-BSS-H with W=xx is denoted by tih, xx; and TI-BSS-Z with W=xx is denoted by tiz, xx.

**Figure 18.**Box-plot of run time for the $J=2$ stationary frequency model for TI-Stan and TI-BSS-H, for two values of W. TI-Stan with W=xx is denoted by ti stan, xx; and TI-BSS-H with W=xx is denoted by tih, xx.

**Figure 19.**Box-plot of run time for the $J=3$ stationary frequency model for TI-Stan, TI-BSS-H, and TI-BSS-Z, for two values of W. TI-Stan with W=xx is denoted by ti stan, xx; TI-BSS-H with W=xx is denoted by tih, xx; and TI-BSS-Z with W=xx is denoted by tiz, xx.

| Parameter | Lower Bound | Upper Bound |
|---|---|---|
| ${A}_{j}$ | $-2$ | 2 |
| ${B}_{j}$ | $-2$ | 2 |
| ${f}_{j}$ | 0 Hz | $6.4$ Hz |

| j | ${A}_{j}$ | ${B}_{j}$ | ${f}_{j}$ (Hz) |
|---|---|---|---|
| 1 | $1.0$ | $0.0$ | $3.1$ |
| 2 | $1.0$ | $0.0$ | $5.9$ |

| Parameter | Value | Definition |
|---|---|---|
| S | 200 | Number of binary slice sampling steps |
| M | 2 | Number of combined binary slice sampling and leapfrog steps |
| C | 256 | Number of chains |
| B | 32 | Number of bits per parameter in SFC |

| Parameter | Value | Definition |
|---|---|---|
| S | 200 | Number of steps allowed in Stan |
| C | 256 | Number of chains |

**Table 5.** Ideal Gas Partition Function $\mathrm{log}\tilde{Z}$ results for TI-Stan for two values of W. Twenty TI-Stan runs were completed for each value of W and N.

| W | N | Mean Relative Error | StDev Relative Error | Mean $\mathrm{log}\tilde{Z}$ | StDev $\mathrm{log}\tilde{Z}$ | Analytic $\mathrm{log}\tilde{Z}$ (Equation (30)) |
|---|---|---|---|---|---|---|
| 1.05 | 12 | 0.52% | 0.37% | −12.43 | 0.0565 | −12.49 |
| 1.05 | 102 | 0.51% | 0.20% | −118.20 | 0.235 | −118.81 |
| 1.05 | 1002 | 0.62% | 0.26% | −1184.16 | 3.04 | −1191.51 |
| 1.5 | 12 | 2.94% | 1.72% | −12.12 | 0.215 | −12.49 |
| 1.5 | 102 | 3.39% | 1.44% | −114.78 | 1.71 | −118.81 |
| 1.5 | 1002 | 4.50% | 1.40% | −1137.83 | 16.73 | −1191.51 |

**Table 6.** Ideal Gas Partition Function $\mathrm{log}\tilde{Z}$ run times for TI-Stan for two values of W. Twenty TI-Stan runs were completed for each value of W and N.

| W | N | Mean Run Time (s) | StDev Run Time (s) |
|---|---|---|---|
| 1.05 | 12 | 69.80 | 3.01 |
| 1.05 | 102 | 230.56 | 5.74 |
| 1.05 | 1002 | 2076.28 | 44.82 |
| 1.5 | 12 | 8.42 | 0.43 |
| 1.5 | 102 | 27.87 | 1.44 |
| 1.5 | 1002 | 247.53 | 6.80 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Henderson, R.W.; Goggans, P.M.
TI-Stan: Model Comparison Using Thermodynamic Integration and HMC. *Entropy* **2019**, *21*, 1161.
https://doi.org/10.3390/e21121161
