1. Introduction
High-speed Monte Carlo simulations are used across a broad spectrum of applications, from mathematics to economics. As input for such simulations, the probability distributions are usually generated by pseudo-random number sampling, a method derived from the work of John von Neumann in 1951 [1]. In the era of “big data”, such methods have to be fast and reliable; a sign of this necessity was the release of Quside’s inaugural processing unit in 2023 [2]. However, these samplings need to be cross-validated by exact methods, and this requires knowledge of the analytical functions that describe the underlying stochastic processes; among those, the error function is of tremendous importance.
By definition, a function is called analytic if it is locally given by a convergent Taylor series. Even if a function itself is not analytic, its inverse may be. The error function can be given analytically, and one such analytic expression is the integral representation found by Craig in 1991 [3]. Craig mentioned this representation only briefly and did not provide a derivation of it. Since then, several derivations of this formula have appeared [4,5,6]. In Section 2, we describe an additional one that is based on the same geometric considerations as employed in [7]. In Section 3, we provide the series expansion for Craig’s integral representation and show the rapid convergence of this series.
For the inverse error function, guides to special functions (e.g., [8]) do not reveal such an analytic property. Instead, this function has to be approximated. Known approximations date back to the late 1960s and early 1970s [9,10] and include semi-analytical approximations by asymptotic expansion (e.g., [11,12,13,14,15,16]). Using the same geometric considerations, as shown in Section 4, we developed a couple of useful approximations that can easily be implemented in different computer languages, together with their deviations from an exact treatment. In Section 5, we discuss our results and evaluate the CPU time. Section 6 contains our conclusions.
2. Derivation of Craig’s Integral Representation
The authors of [7] provided an approximation for the integral over the Gaussian standard normal distribution that is obtained by geometric considerations and is related to the cumulative distribution function via $F(x)=\tfrac12+\Phi(x)$, where $\Phi(x)=\frac{1}{\sqrt{2\pi}}\int_0^x e^{-u^2/2}\,du$ is the Laplace function. The same considerations apply to the error function $\operatorname{erf}(t)=\frac{2}{\sqrt{\pi}}\int_0^t e^{-u^2}\,du$, which is related to $\Phi$ via $\operatorname{erf}(t)=2\Phi(t\sqrt{2})$. Translating the results of [7] into the error function, we obtained the approximation of order p by the following:
where the values ( ) are found in the intervals between and . A method for selecting those values was extensively described in [7], where the authors showed the following:
for . With times larger precision, the following was expressed:
for and . For the parameters of the upper limits of those intervals, we calculated the deviation by the following:
Given the values with and the limit , the sum over n in Equation (2) could be replaced by an integral with measure to obtain the following:
3. Power Series Expansion
The integral in Equation (6) could be expanded into a power series in ,
with
where . The coefficients could be expressed by the hypergeometric function , also known as Barnes’ extended hypergeometric function. However, we could derive a constraint for the explicit finite series expression for that rendered the series in Equation (7) convergent for all values of t. In order to be self-contained, the intermediate steps to derive this constraint and to show the convergence are given in the following, in which the sum over the rows of Pascal’s triangle, $\sum_{k=0}^{n}\binom{n}{k}=2^n$, was required:
Returning to Equation (8), we had . Therefore,
The result in Equation (8) led to the following:
where there exists a real number between and 1 such that . We found the following:
Because of , there was again a real number in the corresponding open interval so that the following was true:
As the latter was the power series expansion of , which is convergent for all values of t, the original series was also convergent, and thus
with the limiting value shown in Equation (7). A more compact form of the power series expansion is the following:
4. Approximations for the Inverse Error Function
Based on the geometric approach described in [7], we were able to derive simple, useful formulas that, guided by consistently higher orders of the approximation (2) for the error function, led to consistently more advanced approximations of the inverse error function. The starting point was the degree , that is, the approximation in Equation (3). Inverting led to , and using the parameter from Equation (3) yielded the following:
For , the relative deviation from the exact value t was less than , and for , the deviation was less than . Therefore, for , a more precise formula has to be used. However, such higher values of E appeared only in of the cases, so this would not significantly influence the CPU demand.
Continuing with , we inserted into Equation (2) to obtain the following:
where and are the same as for Equation (4). Using the derivative of Equation (1) and approximating it by the difference quotient, we obtained the following:
resulting in . In this case, for the larger interval , the relative deviation was less than . Using instead of and inserting instead of , we obtained with a relative deviation of at most for the same interval. The results are shown in Figure 1.
The method could be optimized by a procedure similar to the shooting method for boundary value problems, which adds dynamics to the calculation. Suppose that, following one of the previous methods, for a particular argument E we found an approximation for the value of the inverse error function of this argument. Using , we could adjust the improved result by inserting and calculating A for . In general, this procedure provided a vanishing deviation close to . In this case, as well as for , the maximal deviation in the interval was slightly larger than , while up to the deviation was restricted to . A more general ansatz could be adjusted by inserting for and , yielding the following system of equations:
with . Therefore, could be solved for A and B to obtain the following:
For , we obtained a relative deviation of . For , the maximal deviation was . Finally, an adjustment of led to the following:
where . For , the relative deviation was restricted to , while up to , the maximal relative deviation was . The results for the deviations of ( ) for the linear, quadratic, and cubic dynamical approximations are shown in Figure 2.
5. Discussion
In order to test feasibility and speed, we coded our algorithm in the programming language C under Slackware 15.0 (Linux 5.15.19) on an ordinary HP laptop with an Intel® Core™2 Duo CPU P8600 @ 2.4 GHz with 3 MB of memory used. The CPU time was estimated by calculating the value times in sequence. The speed of the calculation did not depend on the value of E, as the precision was not optimized; this would be required for practical applications. Using an arbitrary starting value , we performed this test, and the results are shown in Table 1. An analysis of this table showed that a further step in the degree p doubled the runtime, while the dynamics for increasing n added a constant value of approximately seconds to the result. Though the increase in the dynamics required the solution of a linear system of equations and the coding of the results, this endeavor was justified: by using the dynamics, we could increase the precision of the results without sacrificing computational speed.
The results for the deviations in Figure 1 and Figure 2 were multiplied by increasing decimal powers in order to keep the results comparable. This indicated that the convergence improved with each of the steps in p and n, at least by the corresponding inverse power. While the static approximations in Figure 1 showed deviations close to for both, and for higher values of E, the dynamical approximations in Figure 2 showed no deviation at and only moderate deviations for higher values. However, the cost of an improvement step in either p or n was, at most, a 2-fold increase in CPU time. This indicated that the calculation and coding of expressions such as Equation (9) were justified by the increased precision. Given the precision goals, the user can decide to which degrees of p and n the algorithm should be developed. In order to assess the precision, in Table 2 we showed the convergence of our procedure for with fixed and increasing values of n. The last column shows the CPU times for runs of the algorithm proposed in [12], with N given in the last column of the table in [12], as coded in C.
6. Conclusions
In this paper, we developed and described an approximation algorithm for the determination of the error function and its inverse, based on geometric considerations. As demonstrated, the algorithm can be easily implemented and extended. We showed that each refinement step improved the precision by a factor of ten or more, at the cost of an increase in CPU time of at most a factor of two. In addition, we provided a geometric derivation of Craig’s integral representation of the error function and a convergent power series expansion for this formula.