High-Speed Scientific Computing Using Adaptive Spline Interpolation
Abstract
1. Introduction
2. Foundations and Related Work
2.1. Function Approximation Using Polynomial Interpolation
2.2. Function Approximation Using Spline Interpolation
2.3. Adaptive Spline Interpolation
3. Materials and Methods
3.1. Adaptive Spline Interpolation Algorithm
1. Specify the maximum error tolerance εmax. Comments: The maximum error tolerance, εmax, is the maximum acceptable absolute error between the spline S(x) and the function f(x) for any value of x in the interval [a, b]. Since the current study considers the approximation of statistical cumulative distribution functions (which generate p-values), εmax was set to a very small value of 1 × 10−8 (i.e., 0.00000001) when constructing the spline interpolation models used in the experiments described later in this section.
2. Identify the endpoints of the spline’s interval (a and b). Comments: For statistical cumulative distribution functions, a can be easily identified by using the distribution’s inverse CDF, F−1(x), such that a = F−1(εmax). Using this approach, any valid values of x that are less than a can be evaluated as S(a) without the result exceeding the maximum error tolerance εmax. If the CDF does not exhibit point symmetry about the mean (e.g., the chi-squared distribution), then b can also be easily identified by using the distribution’s inverse CDF, such that b = F−1(1 − εmax).
3. Generate the training data. Comments: Fitting a spline model within a specified error tolerance naturally requires a set of points against which the model’s accuracy can be evaluated. The x-coordinates of these points should span the closed interval [a, b], with their corresponding y-coordinates computed directly from the original function that the spline model is being trained to approximate. Since the maximum error tolerance εmax in the current study was very small, a large dataset containing one million points served as the basis for constructing each of the spline models described in the experiments later in this section.
4. Define and fit the initial spline model. Comments: To be parsimonious, a spline model must approximate its original function within the maximum error tolerance εmax using the fewest possible knots. The simplest possible spline is one that consists of a single polynomial, which makes it a natural starting point for the iterative spline construction process. The initial spline model should thus consist of a single cubic polynomial. Fitting a cubic polynomial requires a minimum of four points, so in addition to the two interval endpoints a and b (i.e., the boundary knots), the initial spline models used in the current study’s experiments included two additional interior knots spaced equidistantly between a and b. These initial models were then fitted using their corresponding training datasets.
5. Iteratively add knots to the spline model until the maximum observed error falls below εmax. Comments: After defining the initial spline, the model is iteratively expanded by adding new interior knots until the maximum absolute error between the spline S(x) and the function f(x) falls below the maximum error threshold εmax for all values of x in the interval [a, b]. This is accomplished by iteratively performing the following sequence of steps:
   (a) Evaluate the error function for S(x) using the training data.
   (b) Identify the maximum absolute error and the point within the training dataset at which that maximum error was observed.
   (c) If the maximum observed error is less than εmax, then no additional knots are necessary. Otherwise:
      i. Add a new knot to the model at the point at which the maximum error was observed.
      ii. Fit the revised spline model using the training data.
      iii. Go to step 5.a.

   This approach to constructing the spline model targets the region of poorest fit, thereby ensuring a maximally efficient reduction in error for each additional knot that is added to the model [36]. This approach has also been shown to yield a final model that is vastly more efficient than could otherwise be obtained by using a uniformly spaced vector of knots [37].
6. Prune unnecessary knots from the spline model. Comments: After a spline model that satisfies Equation (1) has been identified, unnecessary interior knots must be pruned in order to ensure that the model is as parsimonious as possible. This is accomplished by performing the following sequence of steps for each interior knot in the model:
   (a) Remove the current knot from the spline model.
   (b) Fit the revised spline model using the training data.
   (c) If the revised model no longer satisfies Equation (1), then restore the current knot.
   (d) Proceed to the next interior knot.
3.2. Additional Considerations
1. Fit a spline model for df = 1 that is accurate within εmax. This model becomes the initial reference model Sref.
2. Use a binary search to find the next essential spline model: the model S whose df value is closest to that of Sref and for which the maximum absolute error between Sref and S exceeds εmax. Once identified, this model becomes the new reference model Sref. For any degrees of freedom that fall between the previous reference model and the new reference model, the previous model can be used to calculate values of the corresponding CDF, because the maximum absolute error between that model and the hypothetical model for the specified df will always be less than or equal to εmax.
3. Repeat Step 2 until all essential spline models between df = 1 and dfmax have been identified, trained, and added to the collection.
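The binary search in Step 2 can be sketched as follows. This is an illustrative reconstruction rather than the paper’s code: `model(df, x)` is a hypothetical stand-in for whatever evaluates the fitted model for a given df, and the search assumes, as the procedure above implicitly does, that the divergence from the reference model grows monotonically with df.

```python
import numpy as np

def next_essential_df(model, ref_df, df_max, eps_max, x_grid):
    """Smallest df > ref_df whose model differs from the reference model
    by more than eps_max somewhere on x_grid; None if no such df exists.
    Assumes the divergence grows monotonically with df."""
    ref = model(ref_df, x_grid)

    def diverges(df):
        return np.max(np.abs(model(df, x_grid) - ref)) > eps_max

    if not diverges(df_max):
        return None  # the reference model covers everything up to df_max
    lo, hi = ref_df, df_max  # invariant: lo does not diverge, hi does
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if diverges(mid):
            hi = mid
        else:
            lo = mid
    return hi

def essential_dfs(model, df_max, eps_max, x_grid):
    """Collect the essential reference models (Steps 1-3)."""
    refs = [1]  # Step 1: df = 1 is the initial reference model
    while True:
        nxt = next_essential_df(model, refs[-1], df_max, eps_max, x_grid)
        if nxt is None:
            return refs  # Step 3: all essential models identified
        refs.append(nxt)  # Step 2: new reference model
```

Because every df between two consecutive reference models stays within εmax of the earlier reference, only the returned dfs need fitted spline models; all intermediate dfs reuse the preceding reference.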
3.3. Evaluative Experiments
4. Results and Discussion
4.1. Model Characteristics
4.2. Experiment Results—Computational Accuracy
4.3. Experiment Results—Computational Speed
5. Summary, Limitations, and Concluding Remarks
5.1. Summary and Contributions
5.2. Limitations
5.3. Concluding Remarks
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CDF | Cumulative Distribution Function |
| df | Degrees of Freedom |
| GPU | Graphics Processing Unit |
| MAE | Mean Absolute Error |
References
- Heinecke, A. Accelerators in Scientific Computing Is It Worth the Effort? In Proceedings of the 2013 International Conference on High Performance Computing & Simulation (HPCS), Helsinki, Finland, 1–5 July 2013; p. 504. [Google Scholar]
- Rahman, A.F.B.; Yusof, Z.B. Optimizing Resource Allocation for Big Data Workloads in Cloud Computing Platforms. Algorithms Comput. Theory Optim. Tech. Appl. Res. Q. 2024, 14, 15–27. [Google Scholar]
- Cheng, S.; Liu, B.; Shi, Y.; Jin, Y.; Li, B. Evolutionary Computation and Big Data: Key Challenges and Future Directions. In Proceedings of the International Conference on Data Mining and Big Data, Bali, Indonesia, 25–30 June 2016; pp. 3–14. [Google Scholar]
- Prudius, A.; Karpunin, A.; Vlasov, A. Analysis of Machine Learning Methods to Improve Efficiency of Big Data Processing in Industry 4.0. J. Phys. Conf. Ser. 2019, 1333, 032065. [Google Scholar] [CrossRef]
- Geist, A.; Reed, D.A. A Survey of High-Performance Computing Scaling Challenges. Int. J. High Perform. Comput. Appl. 2017, 31, 104–113. [Google Scholar] [CrossRef]
- Pilz, K.F.; Heim, L.; Brown, N. Increased Compute Efficiency and the Diffusion of AI Capabilities. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; pp. 27582–27590. [Google Scholar]
- Goldstein, I.; Spatt, C.S.; Ye, M. Big Data in Finance. Rev. Financ. Stud. 2021, 34, 3213–3225. [Google Scholar] [CrossRef]
- Knuteson, B.; Padley, P. Statistical Challenges with Massive Datasets in Particle Physics. J. Comput. Graph. Stat. 2003, 12, 808–828. [Google Scholar] [CrossRef]
- Schnase, J.L.; Lee, T.J.; Mattmann, C.A.; Lynnes, C.S.; Cinquini, L.; Ramirez, P.M.; Hart, A.F.; Williams, D.N.; Waliser, D.; Rinsland, P. Big Data Challenges in Climate Science: Improving the Next-Generation Cyberinfrastructure. IEEE Geosci. Remote Sens. 2016, 4, 10–22. [Google Scholar] [CrossRef]
- Zou, J.; Huss, M.; Abid, A.; Mohammadi, P.; Torkamani, A.; Telenti, A. A Primer on Deep Learning in Genomics. Nat. Genet. 2019, 51, 12–18. [Google Scholar] [CrossRef] [PubMed]
- Fan, J.; Han, F.; Liu, H. Challenges of Big Data Analysis. Natl. Sci. Rev. 2014, 1, 293–314. [Google Scholar] [CrossRef]
- Klemetti, A.; Raatikainen, M.; Myllyaho, L.; Mikkonen, T.; Nurminen, J.K. Systematic Literature Review on Cost-Efficient Deep Learning. IEEE Access 2023, 11, 90158–90180. [Google Scholar] [CrossRef]
- Soper, D.S. Greed Is Good: Rapid Hyperparameter Optimization and Model Selection Using Greedy K-Fold Cross Validation. Electronics 2021, 10, 1973. [Google Scholar] [CrossRef]
- Soper, D.S. Hyperparameter Optimization Using Successive Halving with Greedy Cross Validation. Algorithms 2022, 16, 17. [Google Scholar] [CrossRef]
- Laughlin, G.; Aguirre, A.; Grundfest, J. Information Transmission between Financial Markets in Chicago and New York. Financ. Rev. 2014, 49, 283–312. [Google Scholar] [CrossRef]
- Adewusi, A.O.; Okoli, U.I.; Adaga, E.; Olorunsogo, T.; Asuzu, O.F.; Daraojimba, D.O. Business Intelligence in the Era of Big Data: A Review of Analytical Tools and Competitive Advantage. Comput. Sci. IT Res. J. 2024, 5, 415–431. [Google Scholar] [CrossRef]
- Shah, T.R. Can Big Data Analytics Help Organisations Achieve Sustainable Competitive Advantage? A Developmental Enquiry. Technol. Soc. 2022, 68, 101801. [Google Scholar] [CrossRef]
- Bu, Y.; Howe, B.; Balazinska, M.; Ernst, M.D. The HaLoop Approach to Large-Scale Iterative Data Analysis. VLDB J. 2012, 21, 169–190. [Google Scholar] [CrossRef]
- Cody, W.J. Algorithm 715: SPECFUN—A Portable Fortran Package of Special Function Routines and Test Drivers. ACM Trans. Math. Softw. (TOMS) 1993, 19, 22–30. [Google Scholar] [CrossRef]
- Hill, G.W. ACM Algorithm 395: Student’s t-Distribution. Commun. ACM 1970, 13, 617–619. [Google Scholar] [CrossRef]
- De, R.; Bush, W.S.; Moore, J.H. Bioinformatics Challenges in Genome-Wide Association Studies (GWAS). Clin. Bioinform. 2014, 1168, 63–81. [Google Scholar] [CrossRef]
- DiDonato, A.R.; Morris, A.H., Jr. Computation of the Incomplete Gamma Function Ratios and Their Inverse. ACM Trans. Math. Softw. (TOMS) 1986, 12, 377–393. [Google Scholar] [CrossRef]
- Gasca, M.; Sauer, T. Polynomial Interpolation in Several Variables. Adv. Comput. Math. 2000, 12, 377–410. [Google Scholar] [CrossRef]
- Stirling, J. Methodus Differentialis: Sive Tractatus de Summatione et Interpolatione Serierum Infinitarum; Typis Gul. Bowyer; Impensis G. Strahan: London, UK, 1730. [Google Scholar]
- Runge, C. Über Empirische Funktionen Und Die Interpolation Zwischen Äquidistanten Ordinaten. Z. Math. Phys. 1901, 46, 20. [Google Scholar]
- Schoenberg, I.J. Contributions to the Problem of Approximation of Equidistant Data by Analytic Functions. Part A. On the Problem of Smoothing or Graduation. A First Class of Analytic Approximation Formulae. Q. Appl. Math. 1946, 4, 45–99. [Google Scholar] [CrossRef]
- Schoenberg, I.J. Contributions to the Problem of Approximation of Equidistant Data by Analytic Functions. Part B. On the Problem of Osculatory Interpolation. A Second Class of Analytic Approximation Formulae. Q. Appl. Math. 1946, 4, 112–141. [Google Scholar] [CrossRef]
- Cox, M.G. The Numerical Evaluation of B-Splines. IMA J. Appl. Math. 1972, 10, 134–149. [Google Scholar] [CrossRef]
- De Boor, C. On Calculating with B-Splines. J. Approx. Theory 1972, 6, 50–62. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R. Generalized Additive Models. Stat. Sci. 1986, 1, 297–310. [Google Scholar] [CrossRef]
- De Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 1978; Volume 27. [Google Scholar]
- Magalhaes, P.A.A.; Magalhaes, P.A.A., Jr.; Magalhaes, C.A.; Magalhaes, A.L.M.A. New Formulas of Numerical Quadrature Using Spline Interpolation. Arch. Comput. Methods Eng. 2021, 28, 553–576. [Google Scholar] [CrossRef]
- Budzinskiy, S.; Razgulin, A. Defocus Optical Transfer Function: Fast Evaluation and Lightweight Storage Based on Cubic Spline Interpolation. J. Opt. Soc. Am. A 2019, 36, 436–442. [Google Scholar] [CrossRef] [PubMed]
- Romano, D.; Kovacevic-Badstuebner, I.; Antonini, G.; Grossner, U. Accelerated Evaluation of Quasi-Static Interaction Integrals Via Cubic Spline Interpolation in the Framework of the PEEC Method. IEEE Trans. Electromagn. Compat. 2024, 66, 829–836. [Google Scholar] [CrossRef]
- Wegman, E.J.; Wright, I.W. Splines in Statistics. J. Am. Stat. Assoc. 1983, 78, 351–365. [Google Scholar] [CrossRef]
- Jupp, D.L. Approximation to Data by Splines with Free Knots. SIAM J. Numer. Anal. 1978, 15, 328–343. [Google Scholar] [CrossRef]
- Dierckx, P. Curve and Surface Fitting with Splines; Oxford University Press: Oxford, UK, 1995. [Google Scholar]
- Ross, S.M. A First Course in Probability; Pearson Harlow: London, UK, 2020. [Google Scholar]
- Casella, G.; Berger, R. Statistical Inference; Chapman and Hall/CRC: New York, NY, USA, 2024. [Google Scholar]
- Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms; MIT Press: Cambridge, MA, USA, 2022. [Google Scholar]
- Wilson, E.B.; Hilferty, M.M. The Distribution of Chi-Square. Proc. Natl. Acad. Sci. USA 1931, 17, 684–688. [Google Scholar] [CrossRef] [PubMed]
- Van Rossum, G.; Drake, F.L., Jr. The Python Language Reference; Python Software Foundation: Wilmington, DE, USA, 2014. [Google Scholar]
- Harris, C.R.; Millman, K.J.; Van Der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J. Array Programming with NumPy. Nature 2020, 585, 357–362. [Google Scholar] [CrossRef] [PubMed]
- Marsaglia, G. Evaluating the Normal Distribution. J. Stat. Softw. 2004, 11, 1–11. [Google Scholar] [CrossRef]
- Welch, B.L. The Generalization of ‘Student’s’ Problem When Several Different Population Variances Are Involved. Biometrika 1947, 34, 28–35. [Google Scholar]




| Function | Input Range | Degrees of Freedom Range |
|---|---|---|
| Standard Normal CDF | −100 to 100 | N/A |
| Student’s t CDF | −10,000 to 10,000 | 1 to 100,000 |
| Chi-Squared CDF | 0 to 1,000,000 | 1 to 1,000,000 |
| Function | Number of Spline Models | Knots per Model |
|---|---|---|
| Standard Normal CDF | 1 | 90 |
| Student’s t CDF | 9,857 | 124 to 268 |
| Chi-Squared CDF | 46,418 | 12 to 193 |
| Function | Trials | Minimum Absolute Error | Mean Absolute Error | Maximum Absolute Error |
|---|---|---|---|---|
| Standard Normal CDF | 30 | 0.0 | 9.73 × 10−11 | 1.00 × 10−8 |
| Student’s t CDF | 30 | 0.0 | 1.21 × 10−9 | 9.93 × 10−9 |
| Chi-Squared CDF | 30 | 0.0 | 3.02 × 10−11 | 9.99 × 10−9 |
| Function | Trials | SciPy Algorithms: Mean Wall-Clock Time (s) | Spline Models: Mean Wall-Clock Time (s) |
|---|---|---|---|
| Standard Normal CDF | 30 | 243.117 (sd = 3.551) | 2.863 (sd = 0.035) *** |
| Student’s t CDF | 30 | 273.869 (sd = 2.091) | 31.364 (sd = 0.231) *** |
| Chi-Squared CDF | 30 | 270.129 (sd = 2.158) | 36.041 (sd = 0.689) *** |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Soper, D.S. High-Speed Scientific Computing Using Adaptive Spline Interpolation. Big Data Cogn. Comput. 2025, 9, 308. https://doi.org/10.3390/bdcc9120308

