High-Precision Arithmetic in Mathematical Physics

: For many scientiﬁc calculations, particularly those involving empirical data, IEEE 32-bit ﬂoating-point arithmetic produces results of sufﬁcient accuracy, while for other applications IEEE 64-bit ﬂoating-point is more appropriate. But for some very demanding applications, even higher levels of precision are often required. This article discusses the challenge of high-precision computation, in the context of mathematical physics, and highlights what facilities are required to support future computation, in light of emerging developments in computer architecture.


Introduction
For many scientific calculations, particularly those that employ empirical data, IEEE 32-bit floating-point arithmetic is sufficiently accurate, and is preferred since it saves memory, run time and energy usage.For other applications, 64-bit floating-point arithmetic is required to produce results of sufficient accuracy, although some users find that they can obtain satisfactory results by switching between 32-bit and 64-bit, using the latter only for certain numerically sensitive sections of code.One challenge of modern computing is to develop tools that will help users determine which portions of a computation can be performed with lower precision and which must be performed with higher precision.Moreover, some scientists and engineers running large computations have recently discovered, often to their great dismay, that with the rapidly increasing scale of their computations, numerical difficulties render the results of questionable accuracy even with 64-bit arithmetic.What often happens is that a conditional test takes the wrong branch, so that in some circumstances the results are completely wrong.
As a single example that has been reported to the present authors, the ATLAS experiment at the Large Hadron Collider (which was employed in the 2012 discovery of the Higgs boson) relies crucially on the ability to track charged particles with exquisite precision (10 microns over a 10m length) and high reliability (over 99% of roughly 1000 charged particles per collision correctly identified).The software used for these detections involves roughly five million lines of C++ and Python code, developed over a 15-year period by some 2000 physicists and engineers.
Recently, in an attempt to speed up these calculations, researchers found that merely changing the underlying math library caused some collisions to be missed and others misidentified.This suggests that their code has significant numerical sensitivities, and results may be invalid in certain cases.
How serious are these difficulties?Are they an inevitable consequence of the limited accuracy of empirical data, or are they exacerbated by numerical sensitivities in the code?How can they be tracked down in source code?And, once identified, how can they most easily be remedied or controlled?

Numerical Reproducibility
Closely related to the question of numerical error is the issue of reproducibility in scientific computing.A December 2012 workshop held at Brown University in Rhode Island, USA, noted that: Numerical round-off error and numerical differences are greatly magnified as computational simulations are scaled up to run on highly parallel systems.As a result, it is increasingly difficult to determine whether a code has been correctly ported to a new system, because computational results quickly diverge from standard benchmark cases.And it is doubly difficult for other researchers, using independently written codes and distinct computer systems, to reproduce published results.[1,2].
Example 1.1 (Variable precision I).As a simple illustration of these issues, suppose one wishes to compute the arc length of the irregular function g(x) = x + 0 k 10 2 −k sin(2 k x), over the interval (0, π),using 10 7 abscissa points.See Figure 1.
If this computation is performed with ordinary IEEE double arithmetic, the calculation takes 2.59 seconds and yields the result 7.073157029008510.If performed on another system, the results typically differ in the last four digits.If it is done using "double-double" arithmetic (approximately 31-digit accuracy; see Section 2), the run takes 47.39 seconds and yields the result 7.073157029007832, which is correct to 15 digits.But if only the summation is performed using double-double arithmetic, and the integrand is computed using standard double arithmetic, the result is identical to the double-double result (to 15 digits), yet the computation only takes 3.47 seconds.In other words, the judicious usage of double-double arithmetic in a critical summation completely eliminates the numerical non-reproducibility, with only a minor increase in run time.
depart from any benchmark standard case, nonetheless it is essential to distinguish avoidable numerical error from fundamental chaos.
Researchers working with this atmospheric model were perplexed by the difficulty of reproducing benchmark results.Even when their code was ported from one system to another, or when the number of processors used was changed, the computed data diverged from a benchmark run after just a few days of simulated time.As a result, they could never be sure that in the process of porting their code or changing the number of processors that they did not introduce a bug into their code.
After an in-depth analysis of this code, He and Ding found that merely by employing double-double arithmetic in two critical global summations, almost all of this numerical variability was eliminated.This permitted benchmark results to be accurately reproduced for a significantly longer time, with virtually no change in total run time [3].
Researchers working with some other large computational physics applications reported similar success, again merely by modifying their codes to employ a form of high-precision arithmetic in several critical summation loops [4].
In short, while higher-precision arithmetic is not a guarantee of absolute bit-for-bit reproducibility, nonetheless in many cases it is an excellent solution to numerical difficulties.

Dealing with Numerical Difficulties
Some suggest that the only proper way to deal with such difficulties is to carefully examine each algorithm and subroutine in a code, to ensure that only the best schemes are being used, and that they are all implemented in the most stable manner.Others suggest employing interval arithmetic in some or all sections of the code.The trouble with these choices, from a pragmatic point of view, stems in part from the regrettable fact that very few scientists and engineers who do scientific computation in their work have the requisite advanced training in numerical analysis.
For example, as we pointed out in [5], in 2010 a total of 870 students graduated in mathematics, physics, statistics, chemistry, computer science and engineering at the University of California, Berkeley.Several hundred others, in fields such as biology, geology, medicine, economics, psychology, sociology, are also likely to do scientific computing in their work.By comparison, in 2010 only 219 students enrolled in two sections of Math 128A, a one-semester introductory numerical analysis course, and only 24 enrolled in Math 128B, a more rigorous numerical analysis course (with similar numbers for previous years).Thus of the 2010 U.C. Berkeley graduates who likely will do scientific computing in their professional work, only about 2% had advanced training in numerical analysis.This percentage does not appear to be any higher at other universities, and may well be lower.
Along this line, a 2008 report issued jointly by the Association for Computing Machinery and the IEEE Computer Society noted that "numerical methods" have been dropped from the recommended curriculum, in recognition of the fact that at most institutions, "numerical analysis is not seen to form a major part of the computer science curriculum" [6].
High-precision arithmetic is thus potentially quite useful in real-world computing, because even in cases where superior algorithms or implementation techniques are known in the literature, simply increasing the precision used for the existing algorithm, using tools such as those described below, is often both simpler and safer, because in most cases only minor changes need to be made to the existing code.And for other computations, as we shall see, there is no alternative to higher precision.

U.C. Berkeley's "Precimonious" Tool
Recently a team led by James Demmel of U. C. Berkeley has begun developing software facilities to find and ameliorate numerical anomalies in large-scale computations.These include facilities to: • Test the level of numerical accuracy required for an application.
• Delimit the portions of code that are inaccurate.
• Search the space of possible code modifications.
• Repair numerical difficulties, including usage of high-precision arithmetic.
• Navigate through a hierarchy of precision levels (32-bit, 64-bit or higher as needed).
The current version of this tool is known as "Precimonious."Details are presented in [7].

Computations that Require Extra Precision
Some scientific computations actually require more than the standard IEEE 64-bit floating-point arithmetic.Typically these applications feature one or more of these characteristics: (a) ill-conditioned linear systems; (b) large summations; (c) long-time simulations; (d) large-scale, highly parallel simulations; (e) high-resolution computations; or (f) experimental mathematics computations.
The following example, adapted from [5], demonstrates how numerical difficulties can arise in a very innocent-looking setting, and shows how these difficulties can be ameliorated, in many cases, by the judicious usage of high-precision arithmetic.
Most scientists and engineers will employ a familiar least-squares scheme to recover this polynomial [8, pg. 44].This can be done by constructing the (n + 1) × (n + 1) system , where (x k ) and (y k ) are the arguments and data values given above.This system is typically solved for An implementation of this approach using standard 64-bit IEEE-754 arithmetic and the LINPACK routines for LU decomposition, with final results rounded to the nearest integer, correctly finds the vector of coefficients (1, 0, 0, 32769, 0, 0, 1), which corresponds to the polynomial function f (x) = 1 + (2 15 + 1)x 3 + x 6 .
Readers trained in numerical analysis may recognize that the above scheme is not the best approach for this problem, because the matrix system is known to be ill-conditioned.A better approach is to employ either a scheme based on the Lagrange interpolating polynomial, or else a scheme due to Demmel and Koev [11] (currently the state-of-the-art for such problems).An implementation of either scheme with 64-bit IEEE-754 arithmetic finds the correct polynomial in the degree-8 case.However, both fail to recover the degree-12 polynomial 1 + (2 27 + 1)x 6 + x 12 when given the input (1,134217731,8589938753,97845255883,549772595201,2097396156251,6264239146561,15804422886323,35253091827713,71611233653971,135217729000001,240913322581691,409688091758593).
On the other hand, merely by modifying a simple LINPACK program to employ double-double arithmetic, using the QD software (see the next section), all three problems (degrees 6, 8 and 12) are correctly solved without incident.

Techniques and Software for High-Precision Arithmetic
By far the most common form of extra-precision arithmetic is roughly twice the level of standard 64-bit IEEE floating-point arithmetic.One option is the IEEE standard for 128-bit binary floating-point arithmetic, with 113 mantissa bits, but sadly it is not yet widely implemented in hardware.However, this datatype is now supported in software by the gfortran compiler, for both real and complex data.
A more common option, implemented in software, is known as "double-double" arithmetic (approximately 31-digit accuracy).This datatype consists of a pair of 64-bit IEEE floats (s, t), where s is the 64-bit floating-point value closest to the desired value, and t is the difference (positive or negative) between the true value and s.Arithmetic on such data can be performed using schemes originally proposed by Donald E. Knuth [12] (see also [13]).Similar techniques can be used for "quad-double" (approximately 62-digit) arithmetic.These techniques have been implemented in the QD package [14].
For higher-levels of precision, one typically represents a high-precision datum as a string of floats or integers, where the first few words contain bookkeeping and exponent information, and each subsequent word contains B bits of the mantissa.For moderate precision levels (up to a few hundred digits), arithmetic on such data is typically performed using adaptations of familiar schemes.
Above several hundred decimal digits, advanced algorithms, such as Karatsuba multiplication [15, pp. 222-223] or (for larger precision levels) FFT-based multiplication [15, pp. 223-224] should be used for maximum efficiency.FFT-based convolution is done as follows.Let the two high-precision data vectors (without exponents and other "bookkeeping" words) be x = (x 0 , x 1 , • • • , x n−1 ) and y = (y 0 , y 1 , • • • , y n−1 ), where each word contains B bits, i.e., an integer from 0 to 2 B − 1.First extend x and y to length 2n by appending n zeroes to each.Then compute the vector z = (z 0 , z 1 , • • • z 2n−1 ) as Finally, release carries, which is done by iterating z k := z k + int(z k+1 /2 B ), beginning with k = 2n − 2 and proceeding in reverse order to k = 0.The result is the 2n-long mantissa of the high-precision product of x and y.After inserting appropriate sign and exponent words and rounding the result if needed to n mantissa words, the operation is complete.
The convolution operation (1) may be performed using fast Fourier transforms (FFTs): It should be noted in the above that it is often necessary to limit B, so that the final results of the floating-point FFT operations can be reliably rounded to the nearest integer.Another option is to use a number-theoretic FFT, which is not subject to round-off error.Either way, efficient implementations of this scheme can dramatically accelerate multiplication for very high-precision data, since in effect it reduces an O(n 2 ) operation to an O(n log n log log n) operation [16,Sec. 2.3].Square roots and n-th roots can be computed efficiently by Newton iteration-based schemes.The basic transcendental functions can be computed by means of Taylor series-based algorithms or, for higher levels of precision, quadratically convergent algorithms that approximately double the number of correct digits with each iteration, [15, pp. 215-245] and [16,Sec. 1.5].
Software for performing high-precision arithmetic has been available for quite some time, for example in the commercial packages Mathematica and Maple.However, until 10 or 15 years ago those with applications written in more conventional languages, such as C++ or Fortran-90, often found it necessary to rewrite their codes, replacing each arithmetic operation with a subroutine call, which was a very tedious and error-prone process.
Nowadays there are several freely available high-precision software packages, together with accompanying high-level language interfaces, utilizing operator overloading, that make code conversions relatively painless.In most cases, one merely changes the type statements of those variables that are to be treated as high precision and makes a handful of other modifications.Thereafter when one of these variables appears in an expression, the correct underlying routines are automatically called.
Here are a few currently available packages for high-precision floating-point arithmetic: • ARPREC: Supports arbitrary precision real, integer and complex, with many algebraic and transcendental functions.Includes high-level interfaces for C++ and Fortran-90 [17].
• CLN: A C++ library supporting arbitrary precision integer, real and complex, with numerous algebraic and transcendental functions [18].
• GMP: Supports high-precision integer, rational and floating-point calculations.Distributed under the GNU license by the Free Software Foundation [19].
• Julia: A high-level programming environment that incorporates GMP and MPFR [20].
• MPFUN2015: A new thread-safe system supporting multiprecision real and complex datatypes, with transcendental functions and FFT-based arithmetic.Includes high-level Fortran interface [24].See Section 9.1.
• Pari/GP: A computer algebra system that includes facilities for high-precision arithmetic, with many transcendental functions [27].
• Sage: An open-source mathematical software system that includes high-precision arithmetic facilities [28].
Obviously there is an extra cost for performing high-precision arithmetic.Quad precision or double-double computations typically run five to ten times slower than 64-bit; oct precision or quad-double computations typically run 25 to 50 times slower; and arbitrary precision arithmetic may be hundreds or thousands of times slower.Fortunately, however, high-precision arithmetic is often only needed for a small portion of code, so that the total run time may increase only modestly.
What's more, the advent of highly parallel computer systems means that high-precision computations that once were impractical can now be completed in reasonable run time.Indeed, numerous demanding high-precision computations have been done in this way, most often by invoking parallelism at the level of loops in the application rather than within individual high-precision arithmetic operations.For example, up to 512 processors were employed to compute some two-and three-dimensional integrals in [29].Parallel high-precision computation will be further discussed in Section 9.

Applications of High-Precision Arithmetic
Here we briefly summarize some of the numerous applications that have recently arisen for high-precision arithmetic in conventional scientific computing.Sections 3.4 through 4.1 are condensed from [5]; see [5] for additional details and examples.

Optimization Problems
In 2014, well-known optimization theorist Michael Saunders of Stanford University and his colleague Ding Ma explored the usage of quadruple precision, using the real*16 facility of the gfortran compiler, to address problems in optimization.They found that they were able to find numerically stable solutions to numerous problems, including several in mathematical biology, that had eluded other researchers for years.Saunders concluded, "Just as double-precision floating-point hardware revolutionized scientific computing in the 1960s, the advent of quad-precision data types (even in software) brings us to a new era of greatly improved reliability in optimization solvers."[30,31].

Computer-Assisted Solution of Smale's 14th Problem
In 2002, Warwick Tucker of Cornell University employed a combination of interval arithmetic and 80-bit IEEE extended arithmetic (approx.17-digit precision) to prove that the Lorenz equations support a "strange" attractor.Berkeley mathematician Steven Smale had previously included this as one of the most challenging problems for the 21st century [32].

Anharmonic Oscillators
Malcolm H. Macfarlane of Indiana University has employed up to 80-digit arithmetic to obtain numerically reliable eigenvalues for a class of anharmonic oscillators, using a shifted-Lanczos method.Anharmonicity plays a key role in studies of molecular vibrations, quantum oscillations and semiconductor engineering.High precision not only handled the highly singular linear systems involved, but also prevented convergence to the wrong eigenvalue [33].

Planetary Orbit Dynamics
Planetary scientists have long debated whether the solar system is stable over many millions or billions of years, and whether our solar system is typical in this regard.Researchers running simulations of the solar system typically find that 64-bit computations are sufficiently accurate for long periods of time, but then fail at certain critical points.As a result, some researchers have employed higher precision arithmetic (typically IEEE extended or double-double) to reduce numerical error, in combination with other techniques [34,35].

Coulomb n-Body Atomic Systems
Alexei Frolov of the Western University (formerly the University of of Western Ontario) in Canada has employed high-precision software in studies of n-body Coulomb atomic systems, which require solutions of large and nearly singular linear systems.His computations typically use 120-digit arithmetic, which is sufficient to solve certain bound state problems that until recently were consider intractable [36].

Nuclear Physics Computations
Researchers have applied high-precision computations to produce accurate solutions to the Schrödinger equation (from quantum theory) for the lithium atom.In particular, they have been able to compute the non-relativistic ground state energy for lithium to nearly 12-digit accuracy, which is 1500 times better than earlier results [37].These researchers are now targeting a computation of the fine structure constant of physics (approx.7.2973525698 × 10 −3 ), hopefully to within a few parts per billion, directly from the QED theory [38].

Scattering Amplitudes
A team of physicists working on the Large Hadron Collider at CERN (distinct from the team mentioned in Section 1) is computing scattering amplitudes of collisions involving various fundamental particles (quarks, gluons, bosons).This program performs computations using 64-bit IEEE floating-point arithmetic in most cases, but then recomputes results where necessary using double-double arithmetic.These researchers are now designing a scheme that dynamically varies the precision level according to the inherent stability of the underlying computation [39][40][41][42].

Zeroes of the Riemann Zeta Function
In a recent study by Gleb Beliakov of Deakin University in Australia and Yuri Matiyasevich of the Steklov Institute of Mathematics in Russia, very high precision (up to 10,000-digit arithmetic) was employed to compute determinants and minors of very large and ill-conditioned matrices that arise in the analysis of the zeroes of the Riemann zeta function.Their codes employed graphics processing units (GPUs) and a parallel cluster (up to 144 CPUs) [43].

Dynamical Systems
High-precision arithmetic has long been employed in the study of dynamical systems, especially in the analysis of bifurcations and periodic orbit stability.Many of these computations have employed Runge-Kutta methods, but in the past few years researchers have found that the Taylor method, implemented with high-precision arithmetic, is even more effective [44].
The Taylor method can be defined as follows [45][46][47].Consider the initial value problem ẏ = f (t, y), where f is assumed to be a smooth function.Then the solution at t i is approximated by y i from the n-th degree Taylor series approximation of y(t) at t = t i .So, denoting h i = t i − t i−1 , one can write This method thus reduces the solution of the dynamical system to the determination of certain Taylor coefficients, which may be done efficiently using automatic differentiation techniques, provided one uses sufficiently accurate arithmetic [45,46,48,49].Figure 2 (left) compares computations of the unstable periodic orbit LR in the Lorenz model [53] with three methods: (a) the Taylor method (using the TIDES code [54]), (b) an eighth order Runge-Kutta method (using the dop853 code [55]), and (c) an extrapolation method (using the ODEX code) [50,51].In symbolic dynamics notation, "LR" means one loop around the left equilibrium point, followed by one loop around the right equilibrium point.
When these methods are performed using double precision arithmetic, the Runge-Kutta code is most accurate, but when done in double-double or higher-precision arithmetic, the Taylor method is the fastest.In many problems, high-precision arithmetic is required to obtain accurate results, and so for such problems the Taylor scheme is the only reliable method among the standard methods.The right graph in Figure 2 shows the error for an unstable orbit using 500-digit arithmetic using the TIDES code [52].TIDES code with 300 digits, compared with just one time period using IEEE 64-bit floating-point arithmetic.Note that since more than 16 digits are lost on each period, it is not possible to accurately compute even a single period of this orbit using only IEEE 64-bit floating-point arithmetic.

Periodic orbits
The Lindstedt-Poincaré method [56] for computing periodic orbits in dynamical systems is based on the Lindstedt-Poincaré technique of perturbation theory, together with Newton's method for solving nonlinear systems and Fourier interpolation.D. Viswanath [57] has used this method, implemented in part with high-precision arithmetic software (more than 100 digits of precision) to compute highly unstable periodic orbits.These calculations typically may lose more than 50 digits of precision in each orbit.The fractal properties of the Lorenz attractor, which require very high precision to compute, are shown in Figure 4 [57,58].
Another option currently being pursued by researchers computing these periodic orbits is to employ the Taylor series method in conjunction with Newton iterations, together with more than 1000-digit arithmetic [59].
To conclude this section, we mention a fascinating article [60], which describes how simulations must employ quantum molecular dynamics and special relativity to obtain realistic results on the element Mercury's properties that correspond to measured properties (notably its melting temperature).In arenas such as this, one must first accurately capture the physics, but one should also remember the option to parsimoniously increase precision when needed.

The PSLQ Algorithm and Experimental Mathematics
Very high-precision floating-point arithmetic is now considered an indispensable tool in experimental mathematics, which in turn is increasingly being employed in mathematical physics [15,61].
Many of these computations involve variants of Ferguson's PSLQ integer relation detection algorithm [62,63].Suppose one is given an n-long vector (x i ) of real or complex numbers (presented as a vector of high-precision values).The PSLQ algorithm finds the integer coefficients (a i ), not all zero, such that (to available precision), or else determines that there is no such relation within a certain bound on the size of the coefficients.Integer relation detection almost always requires very high precision-at least (n × d)-digit precision, where d is the size in digits of the largest a i and n is the vector length, or else the true relation, if one exists, will be lost in a sea of spurious numerical artifacts.
PSLQ constructs a sequence of integer-valued matrices B n that reduce the size of the vector y = x•B n , until either the relation is found (as one of the columns of matrix B n ), or else numeric precision is exhausted.A relation is detected when the size of smallest entry of the y vector suddenly drops to zero or to less than the "epsilon" of the high-precision arithmetic system being employed (i.e.b −p , where b is the number base, typically 2 or 10, and p is the number of digits of precision).The size of the drop in min(|y i |) at the iteration when the relation is detected can be viewed as a confidence level that the relation so discovered is not a numerical artifact.A drop of 20 or more decimal orders of magnitude almost always indicates a real relation (although a numerical calculation such as this cannot be interpreted as rigorous proof that the relation is mathematically correct).See Figure 5.
Two-and three-level variants of PSLQ are known, which perform almost all iterations with only double or intermediate precision, updating full-precision arrays only as needed.These variants are hundreds of times faster than the original PSLQ.A multipair variant of PSLQ is known that dramatically reduces the number of iterations required and is thus well-suited to parallel systems.As a bonus, it runs faster even on one CPU.Finally, two-and three-level versions of multipair PSLQ are now available, which combine the best of both worlds [63].Most of our computations have been done with a two-level multipair PSLQ program.
Figure 5, which illustrates a typical multipair PSLQ run, shows the abrupt drop in min |y i | (by nearly 200 orders of magnitude in this case) at iteration 199, when the relation was detected.The problem of finding an integer relation in a vector of real numbers is closely related to the problem of finding a small vector in a lattice.Indeed, another widely used integer relation algorithm, the "HJLS" algorithm, is based on the Lenstra-Lenstra-Lovasz (LLL) algorithm for lattice basis reduction.Both PSLQ and HJLS can be viewed as schemes to compute the intersection between a lattice and a vector subspace.For details, see [64].

The BBP Formula for π and Normality
One of the earliest applications of PSLQ was to numerically discover what is now known as the "BBP" formula for π: This remarkable formula, after a simple manipulation, can be used to calculate binary or base-16 digits of π beginning at the n-th digit, without needing to calculate any of the first n − 1 digits.The underlying scheme requires very little memory and no multiple-precision arithmetic software [15, pp. 135-143].Since 1996, when the BBP formula for π was discovered, numerous other formulas of this type have been found using PSLQ and then subsequently proven [65].For example, a binary formula has been found for Catalan's constant G = n 0 (−1) n /(2n + 1) 2 , and both binary and ternary (base-3) formulas have been found for π 2 [66].
These formulas have been used to compute digits that until just a decade or two ago were widely regarded as forever beyond the reach of humankind.For example, on March 14 (Pi Day), 2013, Ed Karrels of Santa Clara University computed 26 base-16 digits of π beginning at position one quadrillion [67].His result: 8353CB3F7F0C9ACCFA9AA215F2.In 2011, researchers at IBM Australia employed the formulas mentioned above to calculate base-64 digits of π 2 , base-729 digits of π 2 and base-4096 digits of G, beginning at the ten trillionth position in each case.These results were then validated by independent computations [66].
At first, many regarded these PSLQ-discovered results to be amusing but of no deep mathematical significance.But then it was found that these BBP-type formulas have connections to the ancient (and still unanswered) question of why the digits of π and certain other related constants appear "random." To be precise, a constant α is said to be 2-normal, or normal base-2, if it has the property that every string of m base-2 digits appears, in the limit, with frequency 2 −m , and similarly in other bases.Establishing normality (in any base) for various well-known mathematical constants, such as π, e or log 2, is a surprisingly difficult problem.Rigorous results are known only for a few special constants, such as Champernowne's number (0.12345678910111213141516 . ..) (produced by concatenating the decimal integers), certain Liouville numbers [68] and Stoneham numbers.
In 2002, Richard Crandall (deceased December 20, 2012) and one of the present authors found that the question of whether constants such as π and log 2 are 2-normal reduces to a conjecture about the behavior of certain pseudorandom number generators derived from the associated BBP-type formulas [15, pp. 163-178].This same line of research subsequently led to a proof that an uncountably infinite class of real numbers are 2-normal [15, pp.148-154], including, for instance, (This particular constant was proven 2-normal by Stoneham in 1973 [69]; the more recent results span a much larger class.)Although this constant is provably 2-normal, interestingly it is provably not 6-normal [70].This is the first concrete example of a number normal in one base but not normal in another.

High-Precision Arithmetic in Mathematical Physics
Very high-precision computations, combined with variants of the PSLQ algorithm, have been remarkably effective in resolving certain classes of definite integrals that arise in mathematical physics settings.We summarize here a few examples of this methodology in action, following [5] and some related references as shown below.

Ising Integrals
In one study, the tanh-sinh quadrature scheme [61, pp. 312-315], [71], implemented using the ARPREC high-precision software, was employed to study the following classes of integrals [29].The C n are connected to quantum field theory, the D n integrals arise in the Ising theory of mathematical physics, while the E n integrands are derived from D n : In the last line u k = k i=1 t i .In general, it is very difficult to compute high-precision numerical values of n-dimensional integrals such as these.But as it turn out, the C n integrals can be converted to one-dimensional integrals, which are amenable to evaluation with the tanh-sinh scheme: Here K 0 is the modified Bessel function [72,73] and it is in this form that C N appears in quantum field theory.Now 1000-digit values of these sufficed to identify the first few instances of C n in terms of well-known constants.For example, C 4 = 7ζ(3)/12, where ζ denotes the Riemann zeta function.For larger n, it quickly became clear that the C n approach the limit lim n→∞ C n = 0.630473503374386796122040192710 . . . .This numerical value was quickly identified, using the Inverse Symbolic Calculator 2.0 [74], as where γ is Euler's constant.This identity was then proven [29].Some results were also obtained for the D n and E n integrals, although the numerical calculations involved there were much more expensive, requiring highly parallel computation [29].In 2013 Erik Panzer was able to formally evaluate all the E n in terms of the alternating multizeta values [75].In particular, our purely experimental result for E 5 was confirmed, and our high precision computation of E 6 was used to confirm Panzer's evaluation.

Ramble Integrals
A separate study considered n-dimensional ramble integrals [76] for complex s.These integrals occur in the theory of uniform random walk integrals in the plane, where at each iteration, a step of length one is taken in a random direction.Integrals such as (4) are the s-th moment of the distance to the origin after n steps.In [77], it is shown that when s = 0, the first derivatives of these integrals, which arise independently in the study of Mahler measures, are Here J n (x) is the Bessel function of the first kind [72,73].These integrand functions, which are highly oscillatory, are very challenging to evaluate to high precision.We employed tanh-sinh quadrature and Gaussian quadrature, together with the Sidi mW extrapolation algorithm, as described in a 1994 paper by Lucas and Stone [78], which, in turn, is based on two earlier papers by Sidi [79,80].While the computations were relatively expensive, we were able to compute 1000-digit values of these integrals for odd n up to 17 [76].Unfortunately, this scheme does not work well for even n, but by employing a combination of symbolic and numeric computation we were able to obtain 50 to 100 good digits.
In the wake of this research, Sidi produced a refined method [81] that can be used for even n.

Moments of Elliptic Integral Functions
The analysis of ramble integrals led to a study of moments of integrals involving products of elliptic integral functions [76]: Here the elliptic functions K, E and their complementary versions are given by: These and related expressions arise in a variety of physical contexts [82].We computed over 4000 individual integrals of this form to between 1500 and 3000-digit precision using the ARPREC software.In most cases we were not able to identify these integrals in closed form.But we did find many interesting relations among these integrals, using a high-precision multipair PSLQ program.Two of the conjectured identities found by the multipair PSLQ program are [76]: −243 James Wan [83] has been able to prove many of these identities, including ( 6), but by no means all.A followup work [84] contains remarkable identities, such as for integrands which are three-fold products of elliptic integrals.

Lattice Sums Arising from a Poisson Equation
In this section we describe some results, just completed [85], that arose out of attempts to solve the Poisson equation, which arises in various contexts such as engineering applications, the analysis of crystal structures, and even the sharpening of photographic images.
The following lattice sums arise as values of basis functions in the Poisson solutions [85]: By noting some striking connections with Jacobi ϑ-function values, Crandall, Zucker and the present authors were able to develop new closed forms for certain values of the arguments r k [85].
Computing high-precision values of such sums was facilitated by deriving formulas such as where O + means positive odd integers, which is valid for x, y ∈ [−1, 1].
After extensive high-precision numerical experimentation using (9), we discovered (then proved) the remarkable fact that when x and y are rational, where A is an algebraic number, namely the root of an algebraic equation with integer coefficients.
In this case we computed α = A 8 = exp(8πφ 2 (x, y)) (as it turned out, the '8' substantially reduces the degree of polynomials and so computational cost), and then generated the vector (1, α, α 2 , • • • , α d ), which, for various conjectured values of d, was input to the multipair PSLQ program.When successful, Table 1.Minimal polynomials found by PSLQ computations k Minimal polynomial for exp(8πφ 2 (1/k, 1/k)) the program returned the vector of integer coefficients (a 0 , a 1 , a 2 , • • • , a d ) of a polynomial satisfied by α as output.With some experimentation on the degree d, and after symbolic verification using Mathematica, we were able to ensure that the resulting polynomial is in fact the minimal polynomial satisfied by α.Table 1 shows some examples of the minimal polynomials discovered by this process [85].
Using this data, we were able to conjecture a formula that gives the degree d as a function of k [85].These computations required prodigiously high precision: up to 20,000-digit floating-point arithmetic in some cases, such as in finding the degree-128 polynomial satisfied by α = exp(8πφ 2 (1/32, 1/32)).Related computations in a follow-up study required up to 50,000-digit arithmetic.

Integrals Arising in the Study of Wigner Electron Sums
In this section, we report some new results on "Wigner electron sums," which are discussed in [86].Throughout this section, Q(x) = Q(x 1 , . . ., x d ) is a positive definite quadratic form in d variables with real coefficients and determinant ∆ > 0.
On the other hand, each integral β N (s) is only defined for Re s < d/2.A priori it is unclear, for any s, whether the limit σ(s) := lim N →∞ σ N (s) should exist.In what follows, we will write σ(s) for the limit.The infinite sums and integrals (which never both converge) were introduced by Wigner eighty years ago to offer an alternative way of viewing crystal sums within quantum theory, but rigorous mathematical analysis is much more recent see [86].
It is therefore natural to ask in what senses the phenomenon observed for the cubic lattice when d = 3 extends both to higher dimensions and to more general quadratic forms.Theorem 8.1 below extends this result to arbitrary positive definite Q in all dimensions.

Jump Discontinuities in Wigner Limits
As above, let Q(x) = Q A (x) = 1 i,j d a ij x i x j , with A = (a ij ) 1 i,j d symmetric and positive definite.Set also B(s) = tr(A)A − 2(s + 1)A 2 .Finally, define with λ d−1 the induced (d − 1)-dimensional measure on the faces.
Then the limit σ(s) := lim N →∞ σ N (s) exists in the strip d/2 − 1 < Re s < d/2 and for s = d/2 − 1.In the strip, σ(s) coincides with the analytic continuation of α(s).On the other hand, with V Q as introduced in equation (13).

The Behavior of
We now examine the nature of V Q (d/2 − 1) in somewhat more detail.From the definition (13) we obtain that The equality is a consequence of [86,Lemma 2.5].Theorem 8.1 reduces to the result given in [86] in the cubic lattice case.In that case, A = I and tr(A) = d, so the integral in (15), involving the logarithm, vanishes.Hence, in agreement with the value given in [86].
In [86] the question was not settled of whether V Q (d/2 − 1) can possibly vanish for some positive definite quadratic form Q. However, a simple criterion for V Q (d/2 − 1) < 0 was given.Proposition 8.2 ( [86]).Let Q be a positive definite quadratic form with for all x with x ∞ = 1.Then V Q (d/2 − 1) < 0. The same conclusion holds if ' ' is replaced with ' ' in (16).
Proposition 8.2 can fail comprehensively if its conditions are not fully met [86].Nonetheless, on the basis of our analysis, motivated by cases where Proposition 8.2 does not apply, we were led to the physically attractive conjecture below.Truth of the conjecture would provide confidence to try to resolve it by proving that V Q (s) is sufficiently strongly decreasing for all s > 0.

Numerical Exploration of
In an attempt to better understand Conjecture 8.3, we employed double-double computations to test the inequality V (d/2 − 1) < 0 (what follows are new results for this paper).We first observed that the integral in (15) decomposes as 2d integrals of the form The task at hand is to explore whether the inequality can ever fail.
As the dimension grows, the cost of the required numerical integration increases substantially, as noted above in Section 6.1.Thus, we opted only to computationally examine whether the ratio L/R = LHS/RHS in (17) is always less than one for d = 3, which is the physically most meaningful case of Conjecture 8.3.In our tests of Conjecture 8.3, we programmed both sides of (17) using double-double arithmetic with the QD software package [14].This permitted highly accurate analysis, which is essential since, as we discovered, the most interesting regions of the space also correspond to nearly singular matrices and correspondingly difficult integrands.
In particular, we generated a set of 1000 symmetric positive definite 3 × 3 test matrices, each of which was constructed as A = ODO * , where O is the orthonormalization of a pseudorandom 3 × 3 matrix with integer entries chosen in [−1000, 1000], and D is a diagonal matrix with integer entries in [1,1000].Note that the diagonal entries of D are also the eigenvalues of the test matrix A = ODO * .The resulting 2-D integrals implicit in (17) were computed using the tanh-sinh quadrature scheme ([61, pp.312-315], [71]) with sufficiently large numbers of quadrature points to ensure more than 10-digit accuracy in the results.
We found that the ratio L/R is indeed less than one in all cases, but that it is close to one for those matrices whose eigenvalues have two very small entries and one large entry.For example, a test matrix A with eigenvalues (1, 5, 987) produced the ratio 0.987901 . .., and a test matrix A with eigenvalues (1, 1, 999) produced the ratio 0.993626 . ... Finally, we explored the case where the O matrix is generated pseudorandomly as above, but the eigenvalues in D (and thus the eigenvalues of A) are set to (1, 1, 10 n ), for n = 1, 2, • • • , 6.The resulting L/R ratios are shown in Table 2, truncated to 8 digits, with quadrature levels and run times in seconds.
Here the column labeled "quad.level" indicates the quadrature level Q.The number of quadrature points, approximately 8 × 2 Q in each of the two dimensions, is a index of the computational effort required.
All of these tests confirm that indeed the ratio L/R < 1, although evidently it can be arbitrarily close to one with nearly singular matrices.Thus, we see no reason to reject Conjecture 8.3.
In October 2014, mathematical physicists Mathieu Lewin and Elliott H. Lieb published an analysis consistent with our result, together with an argument concluding that the derivative never vanishes [89].

Requirements for Future High-Precision Arithmetic Software
As we have seen, high-precision arithmetic facilities are indispensable for a growing body of numerically demanding applications spanning numerous disciplines, especially applications in mathematical physics.Software for performing such computations, and tools for converting scientific codes to use this software, are in general more efficient, robust and useable than they were in the past.Yet it is clear, from these examples and other experiences, that the presently available tools still have a long way to go.
Even commercial packages, which in general are significantly more complete than open software packages, could use some improvements.For example, a recent study by the present authors and Richard Crandall of Mordell-Tornheim-Witten sums, which arise in mathematical physics, required numerical values of derivatives of polylogarithms with respect to the order.Our version of Maple did not offer this functionality, and while our version of Mathematica attempted to evaluate these derivatives, the execution was rather slow and did not return the expected number of correct digits [90].
Here are some specific areas of needed improvement.

High-Precision and Emerging Architectures
The scientific computing world is moving into multicore and multinode parallel computing, because the frequency and performance of individual processors is no longer rapidly increasing [91].It is possible to perform high-precision computations in parallel by utilizing message passing interface (MPI) software at the application level (rather than attempting to parallelize individual high-precision operations).MPI employs a "shared none" environment that avoids many difficulties.Indeed, numerous high-precision applications have been performed on highly parallel systems using MPI, including the Ising study mentioned in Section 6.1 [29].
But on modern systems that feature multicore processors, parallel computing is more efficiently performed using a shared memory, threaded environment such as OpenMP [91] within a single node, even if MPI is employed for parallelism between nodes.Computations that use an environment such as OpenMP must be entirely "thread-safe," which means, among other things, that no shared variables are actively written to, or otherwise there may be difficulties with processors stepping on each other during parallel execution.Employing "locks" and the like may remedy such difficulties, but only by reducing parallel efficiency.
From a brief survey performed by the present authors, only a few of the currently available high-precision software packages are certified thread-safe (in most cases, the available documentation makes no statement one way or the other).One high-level package that is thread-safe is the MPFR / MPFR C++ library [21,23], which has a "thread-safe" build option.It is is based on the GNU multiprecision library, which in turn is thread-safe.Also, one of the present authors has recently completed the MPFUN2015 package, a new Fortran-90-based system that is thread-safe [24].Both of these two packages employ data structures for multiprecision data that incorporate the working precision level and other key information, enabling the precision level to be dynamically changed during execution without loss of thread safety.
Another important development here is the recent emergence of graphics processing units (GPUs) and on-chip accelerators (such as Intel's MIC design), which are now being employed for large-scale high-performance computing applications [91].At the time of this writing, five of the top ten computer Recent research gives hope in this arena-see, for instance [15, pp. 215-245], [72, [95][96][97]-but these published schemes need to be implemented in publicly available high-precision computing environments.
Along this line, our experience with high precision implementation of special functions in Mathematica has been rather disappointing.This is not only our opinion.For example, Johansson [97] writes, "The series expansion of the gamma function in Mathematica, included for scale, seems to be poorly implemented."

Reproducibility
As we noted above in Section 1, reproducibility is increasingly important in scientific computing, at least in the general sense of producing numerically reliable results consistent across different platforms and equivalent implementations [1,2].For example, computations involved in highly regulated industries, such as the financial, pharmaceutical and aircraft industries, are increasingly required to certify their numerical reliability.Required or not, engineers often deploy reproducibility checks to forestall legal challenges.
As is well known, architectural differences, subtle changes in the order in which compilers generate instructions and even changes to the processor count can alter results.What's more, round-off errors inevitably accumulate in large calculations, which may render results invalid.Some of these difficulties stem from the fact that floating-point arithmetic operations (standard or high-precision) are not associative: (a+b)+c is not guaranteed to be the same as a+(b+c).They are further exacerbated in a highly parallel environment, where operations are performed disjointly on many different processors, and where deterministic, sequential order of execution typically cannot be guaranteed.
One benefit of high-precision arithmetic is to enhance reproducibility of results, since it can dramatically reduce floating-point roundoff error.Extra precision is not a cure-all, though.One frequently-cited example is due to Siegfried Rump [98, pg. 12 where a = 77617 and b = 33096.On a present-day IEEE-754 system, the results for 32-bit, 64-bit, double-double and quad-double are −6.338253× 10 29 , −1.1805916207 × 10 21 , 1.1726039400 and −0.8273960599, respectively.Only the last value, computed with quad-double (62-digit) arithmetic, returned a correct answer to 10-digit accuracy.On at least one other system, 32-bit and 64-bit results were nearly the same (but wrong), potentially leading a user to think that 64-bit was sufficient.In most cases, including Rump's polynomial, by appropriately adjusting the level of precision, a high level of reproducibility can be achieved.But there is no foolproof approach; some analysis and experimentation is always required.Some researchers are investigating facilities to guarantee bit-for-bit reproducibility in standard IEEE floating-point arithmetic, not just at the single operation level, but also at the application level, across different platforms, although for the time being they feature only partial reproducibility [99].Similar work is also underway to provide this functionality in a high-precision environment.For example, in a significant step forward, the GMP, MPFR and MPFR C++ packages now provide correctly rounded arbitrary precision arithmetic, in analogy to the correctly rounded modes of IEEE-754 arithmetic.
At the application level, however, high-precision bit-for-bit reproducibility is still a challenge, since high-precision operations, like their IEEE-754 equivalents, are not associative, and high-level compilers may alter the order of operations that are sent to the lower-level multiprecision routines.In Maple, for example, according to a Maplesoft researcher whom we consulted, "all bets are off" once more than one operation is involved.What's more, even if bit-for-bit reproducibility is achieved for high precision on a single-processor computer system, parallel systems present additional challenges, for the reasons mentioned above.
Thus for the foreseeable future, scientific researchers and engineers who use high-precision libraries must design their application-level codes to be robust in the presence of some level of numerical error and non-reproducibility.
Additionally, bit-for-bit reproducibility, although it may be helpful for some high-precision applications, has the potential downside of masking serious numerical difficulties.Even with IEEE 32-or 64-bit floating-point codes, numerical problems often come to light only when a minor code change or a run on a different number of processors produces a surprisingly large change in the results (recall the experience of the ATLAS code, mentioned in the Introduction).Thus ensuring bit-for-bit reproducibility in high-precision computation might hide such problems from the user.At the least, bit-for-bit reproducibility should be an optional feature, and rounding modes should be controllable by the user, just as they are with the IEEE-754 standard.

Figure 2 .
Figure 2. Left: Error vs. CPU time for the numerical solution of the unstable periodic orbit LR for the Lorenz model (in double-double arithmetic) using a Runge-Kutta code (dop853), an extrapolation code (odex)[50,51] and a Taylor series method (TIDES).Right: Error vs. CPU time for the numerical solution of an unstable periodic orbit for the Lorenz model (in 500-digit arithmetic) using the TIDES code.Taken from[52].

Figure 3 Figure 3 .
Figure2(left) compares computations of the unstable periodic orbit LR in the Lorenz model[53] with three methods: (a) the Taylor method (using the TIDES code[54]), (b) an eighth order Runge-Kutta method (using the dop853 code[55]), and (c) an extrapolation method (using the ODEX code)[50,51].In symbolic dynamics notation, "LR" means one loop around the left equilibrium point, followed by one loop around the right equilibrium point.When these methods are performed using double precision arithmetic, the Runge-Kutta code is most accurate, but when done in double-double or higher-precision arithmetic, the Taylor method is the fastest.In many problems, high-precision arithmetic is required to obtain accurate results, and so for such problems the Taylor scheme is the only reliable method among the standard methods.The right graph in Figure2shows the error for an unstable orbit using 500-digit arithmetic using the TIDES code[52].Figure3demonstrates the need for high accuracy in these solutions.It shows the results for completing 16 time periods of the L 25 R 25 unstable periodic orbit of the Lorenz model using the the

Figure 4 .
Figure 4. Fractal property of the Lorenz attractor.The first plot shows a rectangle in the plane.All later plots zoom in on a tiny region (too small to be seen by the unaided eye) at the center of the red rectangle of the preceding plot to show that what appears to be a line is in fact not a line.(Reproduced with permission from [58]).

Itera4on&number&Figure 5 .
Figure 5. Plot of log 10 (min |y i |) versus iteration number in a typical multipair PSLQ run.Note the sudden drop, by nearly 200 orders of magnitude, at iteration 199.

Table 2 .
Computational results and run times for tests of Conjecture 8.3.