Analytic Solution to the Piecewise Linear Interface Construction Problem and its Application in Curvature Calculation for Volume-of-Fluid Simulation Codes

The plane-cube intersection problem has been discussed in the literature since 1984, and iterative solutions to it have been used as part of piecewise linear interface construction (PLIC) in computational fluid dynamics simulation codes ever since. In many cases, PLIC is the compute-time bottleneck of these simulations, so a faster, analytic solution to the plane-cube intersection would greatly reduce compute time for such simulations. We derive an analytic solution for all intersection cases and compare it to the one previous solution by Scardovelli and Zaleski (Ruben Scardovelli and Stephane Zaleski. "Analytical relations connecting linear interfaces and volume fractions in rectangular grids". In: Journal of Computational Physics 164.1 (2000), pp. 228-237.), which we further improve to cover edge cases and micro-optimize to reduce arithmetic operations and branching. We then extend our comparison regarding compute time and accuracy to include two different iterative solutions as well. We find that the best choice depends on the employed hardware platform: on the CPU, Newton-Raphson is fastest with vectorization, while the analytic solutions perform better without, because vectorization instruction sets do not include the trigonometric functions used in the analytic solutions. On the GPU, the fastest method is our optimized version of the analytic SZ solution. We finally provide details on one of the applications of PLIC: curvature calculation for the Volume-of-Fluid model used for free surface fluid simulations in combination with the lattice Boltzmann method.


Introduction
Piecewise linear interface construction (PLIC) -first occurring in the literature for 2D in 1982 [1] and for 3D in 1984 [2] -refers to the problem of calculating the offset along the given normal vector of a plane intersecting a unit cube for a given truncated volume. There are five possible intersection cases (cf. figure 1), of which cases (1), (2) and (5) were already solved in the original 1984 work by Youngs [2], while the cubic polynomial cases (3) and (4) -dismissed as impossible to invert algebraically [3] -are in the majority of the literature approximated by a Newton-Raphson iterative solution. Nevertheless, there does exist an analytic solution by Scardovelli and Zaleski (SZ) [4] and a single documented implementation thereof in Fortran [5], which also includes an approximative version termed APPLIC.
Here, we formulate the PLIC problem from the ground up -first in the inverse direction -and derive an alternative analytic solution for all intersection cases by inverting the inverse formulation. We then compare our novel solution with (i) the original SZ solution, (ii) an improved and micro-optimized version of the SZ solution developed in the present work, (iii) an iterative solution using Newton-Raphson and (iv) an iterative solution using nested intervals. Depending on the available microarchitecture (GPU/CPU), vectorization may be available, which strongly favors multiplications and additions while not speeding up trigonometric functions, impacting which of the algorithms is fastest.
Among the applications for PLIC are Volume-of-Fluid simulation codes such as FluidX3D [6,7] and others [8][9][10][11][12][13][14][15][16][17][18], often in conjunction with GPU implementations [6, 17, 19-25] of the lattice Boltzmann method [26][27][28], used for simulating free surface fluid flows. In particular, these simulations work on a cubic lattice with every lattice point having a fill level assigned to it, and PLIC is used in the process of surface reconstruction during curvature calculation for calculating physical surface tension effects [6,8]. In the final section of this work, we provide a detailed overview of the state-of-the-art curvature calculation procedure using PLIC.

Plane-Cube Intersection
Inputs to the PLIC algorithm are the truncated volume V_0 ∈ [0, 1] and the (normalized) normal vector of the plane n = (n_x, n_y, n_z)^T with |n| = √(n_x² + n_y² + n_z²) = 1. The desired output is the plane offset d_0 from the origin (center of the unit cube) along the normal vector,

(V_0, (n_x, n_y, n_z)^T) → d_0 ∈ [−√3/2, +√3/2].   (1)

The interval is determined by the normal vector orientation: depending on the normal vector, the maximum possible distance from the cube center at which the plane still touches the cube in at least one point varies between 1/2 (normal vector parallel to one of the coordinate axes) and √3/2 (normal vector along the space diagonal).

Applying symmetry conditions to reduce problem complexity
To reduce the number of possible cases and to avoid having to consider all possible intersections of the plane and cube edges -following the scheme in [2] and [14] -the normal vector is component-wise mirrored into the positive octant. The mirrored normal vector components are then sorted in ascending order of magnitude such that 0 ≤ n_1 ≤ n_2 ≤ n_3 ≤ 1. Because n is normalized, its largest component n_3 is always greater than zero.
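This symmetry reduction amounts to taking absolute values followed by a three-element sort. A minimal sketch in plain C (the function name is ours, not from the paper's OpenCL C listings):

```c
#include <math.h>

// Hypothetical helper: mirror the plane normal into the positive octant and
// sort its components ascending, so that 0 <= n1 <= n2 <= n3 afterwards.
static void plic_reduce_normal(double nx, double ny, double nz,
                               double* n1, double* n2, double* n3) {
    double a = fabs(nx), b = fabs(ny), c = fabs(nz); // mirror into positive octant
    double t;
    if(a > b) { t = a; a = b; b = t; } // 3-element sorting network
    if(b > c) { t = b; b = c; c = t; }
    if(a > b) { t = a; a = b; b = t; }
    *n1 = a; *n2 = b; *n3 = c;
}
```

Since the truncated volume is invariant under these mirror operations, the plane offset computed in reduced symmetry space can afterwards be transformed back to the original orientation.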
This symmetry condition for the case V_0 > 1/2 is now applied to d_0, and the coordinate origin is shifted from (0, 0, 0) (center of the unit cube) to (−1/2, −1/2, −1/2) (bottom left corner in the back in fig. 1), resulting in the distance d ∈ [0, (n_1+n_2+n_3)/2] in reduced symmetry space. With this reduction in symmetry, there are only five different intersection cases remaining (see figure 1).
Figure 1: All possible intersection cases of a plane and a unit cube. The truncated volume of cases (1) to (4) is a tetrahedral pyramid with zero (1), one (2), two (3) or all three (4) corners extending outside of the unit cube, the cut-off corners being tetrahedral pyramids themselves.

Formulating the inverse PLIC problem
In order to derive the analytic PLIC solution, the inverse problem is formulated first -again following the scheme in [2]. In the inverse problem, the intersection volume is calculated from the plane offset and normal vector as inputs. First, the intersection points s_1, s_2 and s_3 of the plane with the coordinate system axes (see figure 1) are determined as s_i = d/n_i. Then the actual volume in reduced symmetry space is calculated. The approach is to compute the volume of the tetrahedral pyramid formed by the plane and the coordinate system axes and, if necessary, subtract the volumes of zero, one, two or all three corners that extend beyond 1. For case (3), an additional condition is required to mutually exclude case (5), in which the bottom four corners of the cube are located beneath the plane.
For case (3) this condition reads s_2 ≥ 1, s_3 ≤ 1 and s_1(s_2 − 1) ≤ s_2. To shorten equation (10), s_1, s_2 and s_3 are substituted and the expression is simplified, yielding a considerably friendlier form which completes the inverse PLIC formulation in conjunction with equations (2) to (4), (7) and (6). The condition for case (5) is the remaining free sector of the possible range of d, mutually excluded by the other four cases. Listing 1 shows the fully optimized OpenCL C implementation of the inverse PLIC solution.

Inverting the inverse PLIC Formulation analytically
Equation (11) is now inverted for each case individually. Cases (1), (2) and (5) are straightforward, but cases (3) and (4) are non-trivial third-order polynomials. Here we make use of the tool Mathematica, which outputs three complex solutions for each of cases (3) and (4) (section 9), of which the third solution is in each case the correct one, as its imaginary part vanishes in the desired range after simplification. Conveniently, both are of the same overall form (eq. (13)). However, a complex solution is not useful here since the expected result is a real number -a problem known as the casus irreducibilis -and most programming languages cannot deal with complex numbers natively. Carrying along the imaginary part during computation, which in the end is zero anyway, would also lead to unwanted computational overhead. To overcome this, we again make use of Mathematica to simplify the general form of the complex solution (eq. (13)) in order to obtain the real, trigonometric solution, which is then further simplified using the trigonometric identity √3 sin(α) − cos(α) = −2 sin(π/6 − α) such that the number of trigonometric functions (which are computationally expensive compared to simpler operations such as additions or multiplications) is minimized (eq. (12)).
For better readability, a few expressions are pre-defined. Hereby the normalization condition n 2 1 + n 2 2 + n 2 3 = 1 is applied.
In equation (25) it is noteworthy that the conditions for the five different cases are determined a posteriori by the result itself. This means that each case has to be evaluated successively and for the resulting value d the respective condition has to be tested. If the condition is true, calculation is stopped and d is returned. If the condition is false, the next case has to be evaluated and so on, until the last case is reached.
The order in which the cases are computed and checked can be optimized to calculate the most difficult and infrequent cases last, when the probability is high that one of the easier and more frequent cases has already been chosen. 'Frequent' here refers to some cases appearing more often than others with randomized V 0 and n as expected in typical PLIC applications. Here also special considerations for edge cases (more on that below) need to be taken into account to avoid possible divisions by zero. With this in mind, the order (5)→(2)→(1)→(3)→(4) is preferred.
Additional speedup can be gained by noting that the implicit condition involving d can be replaced by an explicit condition involving V_0 for cases (1), (2) and (5). Since V_0 is known, these conditions are checked a priori in order to avoid root function calls if the condition is false.
For even more speedup, all redundant mathematical operations are reduced to a minimum by pre-calculating them into variables (micro-optimization), and condition checks mutually excluded by previous checks are skipped, especially all conditions for the very last case. In the implementation order (5) → (2) → (1) → (3) → (4), the conditions for case (3) simplify to d_3 ≤ n_3, which decides mutually exclusively between cases (3) and (4). Since both d_3 and d_4 are very complicated expressions, no simplified a priori condition is formulated here.
In equations (19), (21) and (25), the argument of the square root may be negative before the case condition is tested. In this case -since floating-point exception handling is turned off in the actual code for performance reasons -the resulting NaN of a square root of a negative number would not be captured by the case condition, leading to a wrong result. An additional fdim function call inside the square root solves this issue. In the implementation, we artificially exclude the edge case x_4 = 0 in order to use the faster atan(y/x) instead of atan2(y, x), giving the algorithm a 15% speedup. In case branching is undesirable, bit masking is also an option, but it turned out to be slower even on GPUs.
Two edge cases still need careful consideration: n_1 = 0 (2D) and n_3 = 1 (1D). n_1 = 0 restricts n to a plane spanned by two coordinate axes, and n_3 = 1 restricts n to be parallel to one of the coordinate axes.

For the (1D) case, n_3 = 1 and the solution is always case (5), so n_1 = n_2 = 0 and min(n_1 + n_2, n_3) = 0, simplifying equation (25) without loss of generality. A clean derivation of the 1D case yields the same result without any conditions, so it is a necessary requirement that both additional conditions in eq. (27) are fulfilled automatically. d lies in the range 0 ≤ d ≤ (n_1+n_2+n_3)/2, so in this special case 0 ≤ d ≤ (0+0+n_3)/2 = n_3/2, and both conditions are indeed fulfilled automatically.

In the (2D) case, n_1 = 0 (n_2 = 0 is excluded here since it is already covered by the (1D) case, so n_2 > 0). Only intersection cases (2) or (5) are possible, 0 < n_2 ≤ n_3 ≤ 1 and min(n_1 + n_2, n_3) = n_2, simplifying eq. (25) without loss of generality. A clean derivation of the 2D case yields a result with simpler conditions, so again it is a necessary requirement that both additional conditions in eq. (29) are fulfilled automatically. d lies in the range 0 ≤ d ≤ (0+n_2+n_3)/2 ≤ (n_3+n_3)/2 = n_3, meaning that both 0 ≤ d and d ≤ n_3 are indeed fulfilled automatically.
Listing 2 shows the fully optimized OpenCL C implementation of the analytic PLIC solution with equation (26).

The analytic SZ solution
The analytic PLIC solution by Scardovelli and Zaleski from 2000 [4] was implemented in Fortran in 2016 by Kawano [5], where it is used as comparison to the approximative APPLIC method. Here we focus on the exact SZ solution. The SZ solution is particularly interesting in that it builds upon the L¹-normalized plane normal vector instead of the more common L² normalization as used in our own solution in section 2.3. We first translate the Fortran implementation to OpenCL C, make it compatible with an L²-normalized plane normal vector as input and rescale the result from the original [0, 1] to [−√3/2, +√3/2]. The implementation is provided in listing 3.
When closely studying the implementation in listing 3, we notice that the 1D edge case is poorly handled despite the addition of a tiny constant to the denominator for the calculation of v1. In the edge cases of the normal vector n = (1, 0, 0) T and the volume V 0 ∈ {0, 1}, the algorithm returns -NaN both in our OpenCL and in the original Fortran implementation. By checking cases (5) and (2) first and by avoiding divisions by vm1 and vm2 before the checks for cases (5) and (2) respectively, we improve handling of the 1D edge case. We further apply some micro-optimization to significantly reduce the number of arithmetic operations, resulting in an optimized implementation of the SZ solution shown in listing 4 below.

Performance and Accuracy Comparison
Apart from floating-point errors, the SZ solution in listing 4 and our own solution in listing 2 produce identical results. For the accuracy comparison, we define the error of a single evaluation as the round-trip deviation E_i = |V_0 − plic_cube_inverse(plic_cube(V_0, n_i), n_i)|, with plic_cube_inverse(d_0, n_i) referring to the implementation in listing 1 and plic_cube(V_0, n_i) referring to the various PLIC implementations; the average error E_avg is the mean of E_i over all test evaluations. The execution time for a 'blank run' -containing the time for memory loads and stores as well as the compute time for the inverse PLIC algorithm -is measured and later subtracted from the execution times of the different PLIC variants in order to isolate their execution time. The time is also divided by the number of PLIC function evaluations (N·L) in order to obtain the time for a single execution.

CPU Testing
In this test, n_i is set to N = 4096 different normal vectors, of which one is (1, 0, 0)^T, one is (1/√2, 1/√2, 0)^T, and the rest are random 2D and 3D directions. When AVX2 vectorization is available, Newton-Raphson is considerably faster than all the other solutions. However, we note that our benchmark is a particularly synthetic scenario where vectorization is easily achievable for the compiler: the benchmark just calculates PLIC repeatedly in a loop. With FP32 and AVX2, every eight consecutive iterations are combined into one vectorized iteration, completing in far fewer clock cycles than the eight iterations would separately. In real-world applications, PLIC is not computed repeatedly in multiples of eight, greatly limiting vectorization potential such that the strict criteria for auto-vectorization [29] might not be met. Surprisingly, our analytic solution and the optimized SZ solution are within margin of error of each other regarding compute time. In the edge cases of the normal vector n = (1, 0, 0)^T and the volume V_0 ∈ {0, 1}, the SZ solution by Kawano returns -NaN, which propagates through the entire error averaging procedure. For the IEEE-754 FP32 floating-point format, the machine epsilon is ε = 5.96 · 10⁻⁸, meaning that for all other PLIC variants the average error E_avg is within machine precision. Without compiler optimization (/Od, /Qpar-), the execution time results are very different, as shown in table 2. With AVX2 vectorization not available, Newton-Raphson falls considerably behind the analytic solutions.

GPU Testing
Since in many applications the target platform for the PLIC algorithm is the GPU, we also benchmark the different variants in OpenCL on an Nvidia Titan Xp GPU (3840 CUDA cores at 1582 MHz, driver version 442.59, OpenCL 1.2). The test here differs from the CPU C++ test in that, for sufficient saturation of the parallel compute capabilities, the number of random normal vectors is increased to 67108864, of which again one is (1, 0, 0)^T, one is (1/√2, 1/√2, 0)^T, 8388606 are random 2D directions in the x-y-plane and the remaining 58720256 are random 3D directions. For these normal vectors, PLIC is run in parallel, and for each of them the volume V_0 is varied in the interval [0, 1] -edge cases included -in L = 256 equally spaced steps in series. For more accurate results through averaging, this test is run 64 times, and the mean execution time is averaged and divided by the number of parallel and serial PLIC computations, resulting in an average compute time more than three orders of magnitude shorter than for single-core CPU execution, as listed in table 3 below. Even though the tests are designed differently for the CPU and GPU, this comparison is appropriate because in both cases the same algorithms are computed and the hardware is fully saturated. The table also includes the number of arithmetic operations (floating-point, integer and bit operations combined) N_a and the number of branching operations N_b in PTX assembly [30]. GPUs are especially bad at branching, so a small N_b is desired. These operation counts -in analogy to the compute time -refer to the isolated PLIC variants with the background 'blank run' for memory loads and stores as well as the inverse PLIC algorithm subtracted. N_a indicates that Newton-Raphson is unrolled by the compiler whereas nested intervals is not. The number of iterations to reach machine precision for nested intervals depends on the initial interval width, which is a function of the normal vector components.
For Newton-Raphson, the number of branching operations N b is smallest, resulting in rather good performance. Nevertheless, our optimized implementation of the SZ solution pulls ahead rather significantly and thus is preferred by us.

Plane-Sphere Intersection
The PLIC problem can be extended to spherical cells with unit volume (see figure 2): The offset along the plane normal vector is the desired result while the truncated volume is given. Since we have a sphere, the normal vector direction of the plane is irrelevant.
Calculating the inverse PLIC here (getting the volume V_0 from a given offset d_0 ∈ [−r, r]) is straightforward and covered by the spherical-cap equation V_0(d_0) = (π/3) (r − d_0)² (2r + d_0). Calculating the inverse function of this equation again is quite difficult due to it being a third-order polynomial with three complex solutions, but by using the same trick as in the plane-cube intersection case (equations (12) and (13)), a real, trigonometric expression is obtained.

Curvature Calculation for the Volume-of-Fluid Model

Volume-of-Fluid (VoF) is a model to simulate a sharp, freely moving interface between a fluid and a gas phase on a Cartesian lattice [8][9][10][11]. While it can be coupled to any flow solver, here we focus on its usage in conjunction with the lattice Boltzmann method (LBM). The interface is ensured to be exactly one lattice cell thick at any time (illustrated in figure 3). As an indicator for each lattice point type, the fill level ϕ is introduced, whereby ϕ = 1 for fluid lattice points, 1 > ϕ > 0 for interface and ϕ = 0 for gas; here ϕ = m/ρ, where ρ is the density provided by the LBM and m is the fluid mass. m is a conserved quantity and cannot be gained or lost, only moved within the simulation box. The key difficulty of modeling a free surface on a discretized lattice is obtaining the surface curvature, which is a necessary ingredient for calculating the surface tension via the Young-Laplace pressure Δp = 2σκ, with κ = 1/R denoting the local mean curvature and σ denoting the surface tension parameter of the simulated fluid. The equation is simple in principle, but calculating κ from the discretized interface geometry is not. Specifically, discretized interface here means that only a local 3³ neighborhood of fill levels ϕ ∈ [0, 1] is known, in addition to the point in the center of this neighborhood being an interface lattice point.
The most common algorithm in literature [8,11] is the curvature calculation via a least-squares paraboloid fit from a neighborhood of points located on the interface. It assumes the local interface to be a paraboloid, the specifics of which will be given in the following sections.
Finding an appropriate set of neighboring points on the interface requires the PLIC solution.

Obtaining neighboring interface points: PLIC point neighborhood
Piecewise linear interface construction works on a 3³ neighborhood of an interface lattice point (illustrated in 2D in figure 4a). Within this neighborhood, only interface points other than the center interface point -which is always interface -are considered as candidates for the later fitting procedure (figure 4b). The difficult part is to accurately obtain the heights z_i of at least five neighboring points located on the true interface (figure 4c). For this, first the normal vector n of the center interface point is calculated via the Parker-Youngs approximation as described in appendix 10.1, eq. (54). A new coordinate system is introduced with its first base vector b_z defined as this normal vector. Then, the cross product with an arbitrary vector such as r := (0.56270900, 0.32704452, 0.75921047)^T -which in practice is never colinear with b_z -provides the second and third orthonormal vectors, forming a new coordinate system in which the z-axis is colinear with the surface normal and the center interface point is at the origin. Now the relative positions e_i (equal to the D3Q27 LBM streaming directions, eq. (53)) of all neighboring interface lattice points are gathered and transformed into the rotated coordinate system. During this step, the approximate interface position of neighboring interface points (streaming directions, figure 4b) is corrected along the normal direction to the much more accurate PLIC interface offset. Here i runs only over the subset of {0, .., 26} for which 0 < ϕ_i < 1 holds (interface points). d_0(V_0 = ϕ_i, n) denotes the PLIC function (equation (8)). Note that d_0(V_0 = ϕ_0, n) only needs to be calculated once, while d_0(V_0 = ϕ_i, n) has to be calculated for each neighboring interface point, and that the normal vectors of neighboring interface lattice points are approximated to be equal to the normal vector of the center lattice point.
In theory, using separately calculated neighbor normal vectors -which would require either an additional data buffer for normal vectors in memory or alternatively a 5³ neighborhood, which would break the multi-GPU capability of the code -should be more accurate, but our practical tests indicated no significant difference (see figure 5).
The set of points p_i is then used to fit a local paraboloid (figure 4e). This paraboloid here lacks a vertical offset parameter, as that is already handled by the center point being defined as the origin, reducing computational cost to an LU-decomposition of dimensionality N = 5. The paraboloid has the form z(x, y) = Ax² + By² + Cxy + Hx + Iy =: x · Q. (47)

Validating Curvature Calculation
To test the accuracy of the presented curvature calculation method, we validate it on spherical drops of different radii R = 1/κ_theo. The fill levels ϕ are initialized with the inverse PLIC algorithm. To let the drop relax and the error converge, we simulate up to 50000 LBM time steps with D3Q19 SRT (τ = 1, σ = 0.001). The curvature error is calculated using the L¹ error norm with summation over all interface points. We plot the error in figure 5. For small drop radii R < 32, the error of our proposed method (red curve) is around 0.5%. However, as R is increased, the error also generally increases, as a result of more interface points entering the summation and an increased likelihood that the deviation in curvature for some points is much larger than average. If we do not make the assumption that the PLIC normal direction of neighboring interface points equals the center normal vector (orange curve), accuracy is slightly improved. However, the additional computational complexity may not justify this small improvement in accuracy.
For comparison, we also plot the curvature approximation method proposed by Donath [16] without and with the proposed π/4 correction factor (green and blue curves in figure 5). As expected, the error is much larger, but this method is also computationally less expensive than ours and certainly has its use-cases as well.

Application Example: Simulating a terminal Velocity Raindrop Impact
The analytic plane-cube intersection solution presented in this work was originally developed for the VoF-LBM GPU simulation code FluidX3D, where we observed a significant speedup compared to the previously used iterative nested-intervals solution. To illustrate this particular application of PLIC, we show a simulation of a 5 mm diameter raindrop impact at terminal velocity in figure 6. The parameters for this simulation are Re = 35195, We = 5702, Fr = 41.54, Ca = 0.1620, Bo = 3.3042. The simulation code FluidX3D is documented in great detail in [6].

Conclusions
We derived an analytic solution to the PLIC problem and compared it to the existing solution by Scardovelli and Zaleski [4] in two variants: an implementation by Kawano [5] and an improved and micro-optimized version thereof. We furthermore compared these three analytic solutions to two iterative solutions using Newton-Raphson and nested intervals. We provide OpenCL C implementations of all variants as well as of the inverse PLIC formulation. We observed that in a synthetic benchmark scenario where AVX2 vectorization is available on the CPU, the Newton-Raphson solution (listing 6) is considerably faster than all other solutions, because the analytic solutions require trigonometric functions which cannot be vectorized. Our benchmark calculated PLIC repeatedly in a loop; with FP32 and AVX2, every eight consecutive iterations are combined into one vectorized iteration, completing in far fewer clock cycles than the eight iterations would separately. In real-world applications, however, PLIC is not computed repeatedly in multiples of eight, greatly limiting vectorization potential. Without vectorization, on both the CPU and GPU the analytic solutions are faster, with our micro-optimized version of the SZ solution as presented in listing 4 being fastest. For a generic PLIC problem, this is the solution we recommend. In the most common application of PLIC -curvature calculation for Volume-of-Fluid LBM, which we presented and also validated on spherical drops -profiling revealed PLIC to be the main bottleneck regarding compute time. Here, our proposed fast PLIC solution led to a significant speedup of VoF calculations. We hope that our findings will also make other simulation codes more computationally efficient.

(a) t = 0 ms (b) t = 1 ms (c) t = 2 ms (d) t = 3 ms (e) t = 4 ms (f) t = 5 ms Figure 6: A 5 mm diameter raindrop impacting a lake at 9.2 m/s -the terminal velocity at mean sea level pressure [31] -and 10 °C water temperature, simulated with the VoF-LBM GPU simulation code FluidX3D at a lattice resolution of 400 × 400 × 340 with the D3Q19 discretization and the SRT collision operator. The simulation box in SI units has the dimensions 5.00 cm × 5.00 cm × 4.25 cm and the pool height is 2.00 cm. Compute time for this simulation is less than one minute on a single Nvidia Titan Xp GPU. Visualization is done with a custom GPU implementation of the marching cubes algorithm [32][33][34].

Acknowledgements
We gratefully acknowledge computing time provided by the SuperMUC system of the Leibniz Rechenzentrum, Garching. We further acknowledge support through the computational resources provided by the Bavarian Polymer Institute. We acknowledge the NVIDIA Corporation for donating a Titan Xp GPU for our research.

Conflicts of interest
The authors declare no potential conflicts of interests.

Availability of data and material
All data is archived and available upon request.

Code availability
All code beyond what is provided in the listings is archived and available upon request.

The real part of the complex solution is extracted and simplified in Mathematica via FullSimplify[ComplexExpand[Re[f]]]. Equation (63) yields the mean curvature; the strategy for finding the required fitting parameters is to apply a least-squares fit on a neighborhood of points on the interface.

Curvature from Least-Squares Paraboloid Fit
The least-squares method [41] is a procedure for fitting an analytic curve -here a Monge patch -to a set of discretized points located near the analytic curve. The general idea is to define the total error as a general expression of all fitting parameters and the entire set of discretized points and then find its global minimum by zeroing its gradient.

The analytic curve first needs to be written in the dot-product form z(x, y) = x · Q (64), with x being defined as the vector of parameters that define the curve and Q = Q(x, y) being an expression of the continuous coordinates x and y. This equation is then discretized to a set of individual data points, with Q_i = Q_i(x_i, y_i) being a vector containing expressions only dependent on the discretized points (x_i, y_i) whose corresponding z-component z_i is located close to the curve. In this notation, the error E between the z-positions of the analytic curve x · Q and a set of z-positions of at least N neighboring interface points z_i is defined by summing up the squared differences, whereby N denotes the dimensionality, which is equal to the number of desired fitting parameters.

The gradient of the error E is calculated and set to zero, where the error must have a global minimum. With some algebra, this equation is then transformed into a linear equation which is solved by LU-decomposition and provides the desired solution x that uniquely defines the curve. Note that the matrix M is always symmetric, meaning that only the upper half and the diagonal have to be calculated explicitly and the lower half is copied over. This reduces computational cost significantly, since every matrix element is a sum over an expression depending on all fitted points. In case there are fewer than N data points available (lattice points next to solid boundaries may have fewer interface neighbors), the regular fitting will not work.
Instead, the number of fitting parameters is then decreased to match the number of available data points by reducing the dimensionality in the LU-decomposition; the ignored fitting parameters remain zero.
Finally, the constants defining the fitted curve are extracted from the solution vector x, and the curvature is calculated from them using equation (63).