Stable and Efficient Gaussian-Based Kolmogorov–Arnold Networks
Abstract
1. Introduction
2. Related Work
3. Mathematical Background
3.1. Kolmogorov–Arnold Representation Theorem
3.2. Approximation Theory in Sobolev Spaces
3.3. Conditioning and Numerical Stability
1. Regularization: Penalty terms such as improve effective conditioning by adding to the Hessian of the loss, bounding the inverse curvature.
2. Iterative refinement: Gradient descent naturally explores well-conditioned directions in parameter space, implicitly avoiding pathological subspaces associated with small singular values.
3. Adaptive parameterization: Learning width parameters individually for each basis function allows the network to automatically adjust widths to maintain numerical stability while preserving approximation power where needed.
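
The effect of the first and third strategies can be illustrated on the smallest possible case: two Gaussian basis functions whose Gram matrix is 2×2, so its eigenvalues are available in closed form. The sketch below is only an illustration under assumptions that are not taken from the paper: the basis is parameterized as exp(-d²/(2w²)), and the penalty is modeled as a ridge term added to the diagonal. The function name and the numeric values are hypothetical.

```python
import math

def gram_condition_number(distance, width, ridge=0.0):
    """Condition number of the 2x2 Gram matrix [[1, g], [g, 1]] + ridge * I.

    g = exp(-distance^2 / (2 * width^2)) is the overlap of two Gaussian basis
    functions whose centers are `distance` apart (assumed parameterization).
    The eigenvalues of the regularized matrix are 1 + g + ridge and 1 - g + ridge.
    """
    g = math.exp(-distance ** 2 / (2.0 * width ** 2))
    return (1.0 + g + ridge) / (1.0 - g + ridge)

# Wide, strongly overlapping basis functions: nearly singular Gram matrix.
wide = gram_condition_number(distance=0.1, width=1.0)
# Same geometry with a small ridge term (hypothetical value) on the diagonal.
wide_ridge = gram_condition_number(distance=0.1, width=1.0, ridge=1e-2)
# Narrower widths reduce the overlap and therefore the condition number.
narrow = gram_condition_number(distance=0.1, width=0.1)

print(f"wide: {wide:.1f}, wide + ridge: {wide_ridge:.1f}, narrow: {narrow:.2f}")
```

In this toy setting the ridge term lifts the smallest eigenvalue away from zero, and shrinking the width has a similar effect at the cost of less overlap between neighboring basis functions, which is the trade-off that learnable widths are meant to balance.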
3.4. Native Space Theory and Reproducing Kernel Hilbert Spaces
3.5. Gaussian vs. B-Spline Basis Functions: Comparative Analysis
3.6. Neural Tangent Kernel Perspective
4. Shared-Basis Architecture and Gradient Computation
4.1. Architectural Design Principles
4.2. Convergence Rate Preservation Under Parameter Sharing
- (i) Shared-basis parameterization with K global centers quasi-uniformly distributed on achieves fill distance
- (ii) Per-connection parameterization with independent centers quasi-uniformly distributed on achieves
- (iii) Both interpolants satisfy identical convergence rates: where the constant C depends on s and the quasi-uniformity constant from Equation (8) but is independent of the parameter storage scheme.
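
To make the role of the fill distance concrete, the following sketch computes it exactly for one-dimensional centers and compares a shared set of K centers with per-connection copies of the same layout. The interval [0, 1], the equispaced layout, and the layer sizes are assumptions chosen for illustration; the helper names are hypothetical.

```python
def fill_distance_1d(centers, a, b):
    """Fill distance h = sup over x in [a, b] of min_k |x - c_k| (1D case).

    Assumes every center lies inside [a, b]. The supremum is attained either
    at a domain endpoint or at the midpoint of the widest gap between centers.
    """
    pts = sorted(centers)
    candidates = [pts[0] - a, b - pts[-1]]
    candidates += [(right - left) / 2.0 for left, right in zip(pts, pts[1:])]
    return max(candidates)

def equispaced(k, a, b):
    """k >= 2 equally spaced centers on [a, b], endpoints included."""
    return [a + (b - a) * i / (k - 1) for i in range(k)]

K, n_in, n_out = 9, 3, 2

# Shared basis: one set of K centers for the whole layer.
shared_centers = equispaced(K, 0.0, 1.0)
# Per-connection basis: every (input, output) pair stores its own K centers.
per_connection = [equispaced(K, 0.0, 1.0) for _ in range(n_in * n_out)]

h_shared = fill_distance_1d(shared_centers, 0.0, 1.0)
h_per_conn = max(fill_distance_1d(c, 0.0, 1.0) for c in per_connection)

stored_shared = len(shared_centers)
stored_per_conn = sum(len(c) for c in per_connection)
print(h_shared, h_per_conn, stored_shared, stored_per_conn)
```

Both layouts give the same fill distance because it depends only on where the centers sit, not on how many copies are stored; only the storage count differs.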
4.3. Numerical Stability via Width Parameter Regularization
- (i) ;
- (ii) The exponential argument satisfies the bound
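
The practical consequence of bounding the exponential argument can be sketched as follows. This is a generic illustration, not the bound stated in the paper: it assumes a Gaussian of the form exp(-(x - c)²/(2w²)), inputs and centers confined to an interval of known diameter, and a hypothetical lower limit `width_floor` on the width.

```python
import math

def gaussian_basis(x, center, width, width_floor):
    """Evaluate exp(-(x - c)^2 / (2 * w^2)) with the width clamped from below.

    Returns the basis value together with the exponential argument so the
    bound on the argument can be inspected.
    """
    w = max(width, width_floor)
    arg = (x - center) ** 2 / (2.0 * w ** 2)
    return math.exp(-arg), arg

domain_diameter = 1.0  # assumption: inputs and centers lie in [0, 1]
width_floor = 0.05     # hypothetical lower bound on the width

# Worst case: input and center at opposite ends, with a collapsing width.
value, arg = gaussian_basis(x=1.0, center=0.0, width=1e-6, width_floor=width_floor)

# With the floor in place the argument cannot exceed D^2 / (2 * floor^2).
arg_bound = domain_diameter ** 2 / (2.0 * width_floor ** 2)

# Without the floor, the same evaluation underflows to exactly zero.
unclamped = math.exp(-(1.0 - 0.0) ** 2 / (2.0 * (1e-6) ** 2))

print(arg, arg_bound, value, unclamped)
```

With the floor, the basis value stays a small but representable positive number, so gradients with respect to the center and width do not vanish identically.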
4.4. Forward Propagation
| Algorithm 1 Shared-basis RBF-KAN forward propagation |
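
A minimal sketch of a shared-basis forward pass for a single layer is given below. It is not a transcription of Algorithm 1: the layer form (a bias plus a coefficient-weighted sum of Gaussian basis values), the Gaussian parameterization, and all names are assumptions made for illustration.

```python
import math

def rbf_kan_layer_forward(x, centers, widths, coeffs, bias):
    """Assumed layer: y[j] = bias[j] + sum_i sum_k coeffs[j][i][k] * phi_k(x[i]).

    phi_k(t) = exp(-(t - centers[k])^2 / (2 * widths[k]^2)). The K centers and
    widths are shared by every (input i, output j) connection of the layer.
    """
    # Evaluate the shared basis once per input coordinate: phi[i][k].
    phi = [[math.exp(-(xi - c) ** 2 / (2.0 * w ** 2))
            for c, w in zip(centers, widths)]
           for xi in x]
    y = []
    for j, b in enumerate(bias):
        total = b
        for i, phi_i in enumerate(phi):
            # Only the coefficients differ between connections.
            total += sum(a * p for a, p in zip(coeffs[j][i], phi_i))
        y.append(total)
    return y, phi

# Tiny example: 2 inputs, 1 output, K = 2 shared basis functions.
x = [0.0, 1.0]
centers, widths = [0.0, 1.0], [1.0, 1.0]
coeffs = [[[1.0, 0.0], [0.0, 1.0]]]  # indexed as coeffs[output][input][basis]
bias = [0.5]

y, phi = rbf_kan_layer_forward(x, centers, widths, coeffs, bias)
print(y)
```

Because the basis values `phi[i][k]` depend only on the input coordinate and the shared parameters, they are computed once and reused for every output unit.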
4.5. Backward Propagation via Automatic Differentiation
- Coefficient gradients (line 4): operations.
- Basis parameter gradients (lines 6–13): operations.
- Input gradients (lines 15–20): operations.
| Algorithm 2 Shared-basis RBF-KAN backpropagation | |
| Require: Forward activations , output gradient | |
| Ensure: Parameter gradients | |
| 1: for do | |
| 2: Compute | ▹ Coefficients |
| 3: Compute | ▹ Biases |
| 4: Initialize and | |
| 5: for do | |
| 6: Evaluate | |
| 7: Compute | ▹ Weighted gradient |
| 8: Evaluate | |
| 9: Accumulate | ▹ Theorem 8 |
| 10: if then | |
| 11: Accumulate | ▹ Theorem 9 |
| 12: end if | |
| 13: end for | |
| 14: if then | ▹ Input gradients when not at input layer |
| 15: Initialize | |
| 16: for do | |
| 17: Compute | |
| 18: Evaluate | |
| 19: Accumulate | ▹ Negative sign, Theorem 10 |
| 20: end for | |
| 21: end if | |
| 22: end for | |
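
The structure of Algorithm 2 can be sketched for one layer as below. The sketch assumes the Gaussian form exp(-(x - c)²/(2w²)) and a layer output y[j] = bias[j] + Σ_i Σ_k a[j][i][k] φ_k(x[i]); under that assumption the center gradient scales as 1/w², the width gradient as 1/w³, and the input gradient is the negative of the center derivative. The function names and the example values are hypothetical.

```python
import math

def phi(t, c, w):
    """Gaussian basis value exp(-(t - c)^2 / (2 * w^2)) (assumed form)."""
    return math.exp(-(t - c) ** 2 / (2.0 * w ** 2))

def layer_backward(x, centers, widths, coeffs, delta):
    """Gradients of a scalar loss, given delta[j] = dLoss/dy[j]."""
    n_out, n_in, K = len(delta), len(x), len(centers)
    g_coeffs = [[[0.0] * K for _ in range(n_in)] for _ in range(n_out)]
    g_bias = list(delta)  # dLoss/dbias[j] = delta[j]
    g_centers, g_widths, g_x = [0.0] * K, [0.0] * K, [0.0] * n_in
    for i in range(n_in):
        for k in range(K):
            p = phi(x[i], centers[k], widths[k])
            diff = x[i] - centers[k]
            # Weighted gradient: sum over outputs of delta[j] * coeffs[j][i][k].
            weighted = 0.0
            for j in range(n_out):
                g_coeffs[j][i][k] = delta[j] * p
                weighted += delta[j] * coeffs[j][i][k]
            dphi_dc = p * diff / widths[k] ** 2        # 1 / w^2 scaling
            dphi_dw = p * diff ** 2 / widths[k] ** 3   # 1 / w^3 scaling
            g_centers[k] += weighted * dphi_dc
            g_widths[k] += weighted * dphi_dw
            g_x[i] -= weighted * dphi_dc               # dphi/dx = -dphi/dc
    return g_coeffs, g_bias, g_centers, g_widths, g_x

# Tiny example: 2 inputs, 1 output, K = 2 shared basis functions.
x = [0.3, -0.2]
centers, widths = [0.0, 0.5], [0.4, 0.6]
coeffs = [[[0.1, -0.2], [0.4, 0.5]]]  # indexed as coeffs[output][input][basis]
delta = [1.0]

g_coeffs, g_bias, g_centers, g_widths, g_x = layer_backward(
    x, centers, widths, coeffs, delta)
print(g_centers, g_widths, g_x)
```

Center and width gradients are accumulated over all inputs and outputs because the basis parameters are shared, whereas each coefficient receives a single contribution.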
4.6. Initialization, Optimization, and Stability
4.6.1. Parameter Initialization Strategy
4.6.2. Optimization Protocol
- Coefficients: (baseline learning rate).
- Centers: (compensates for the scaling established in Theorem 8).
- Widths: (compensates for the scaling established in Theorem 9).
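
One way to organize group-specific learning rates is sketched below. Plain gradient descent is used purely for illustration, and the learning-rate values and ratios are hypothetical placeholders, not the settings used in the paper.

```python
def sgd_step(params, grads, learning_rates):
    """One gradient-descent update with a separate learning rate per group."""
    return {
        name: [p - learning_rates[name] * g for p, g in zip(values, grads[name])]
        for name, values in params.items()
    }

base_lr = 1e-2  # hypothetical baseline learning rate
learning_rates = {
    "coeffs": base_lr,          # baseline rate
    "centers": 0.1 * base_lr,   # hypothetical reduction for center updates
    "widths": 0.01 * base_lr,   # hypothetical reduction for width updates
}

params = {"coeffs": [1.0, -1.0], "centers": [0.0, 1.0], "widths": [0.5, 0.5]}
grads = {"coeffs": [1.0, 1.0], "centers": [1.0, 1.0], "widths": [1.0, 1.0]}

updated = sgd_step(params, grads, learning_rates)
print(updated)
```

With equal gradients in every group, the widths move least and the coefficients most, which is the qualitative behavior that smaller rates for the basis parameters are intended to produce.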
4.6.3. Regularized Objective Function
- : coefficient sparsity penalty via Frobenius norm.
- : width penalty discouraging excessively narrow basis functions.
- : center diversity enforcement via Gaussian repulsion with characteristic scale
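
The three penalty terms can be sketched as follows. The functional forms are assumptions chosen to match the verbal descriptions above (a squared Frobenius norm, an inverse-square width penalty, and a pairwise Gaussian repulsion between centers); the exact expressions and weights in the paper may differ, and all names and values here are hypothetical.

```python
import math

def regularization_terms(coeffs, widths, centers, repulsion_scale):
    """Return the unweighted (coefficient, width, center-diversity) penalties."""
    # Squared Frobenius norm of the flattened coefficient tensor.
    coeff_pen = sum(a * a for a in coeffs)
    # Inverse-square width penalty: grows as any width shrinks toward zero.
    width_pen = sum(1.0 / (w * w) for w in widths)
    # Gaussian repulsion: each pair of nearly coincident centers adds close to 1.
    center_pen = sum(
        math.exp(-(ci - cj) ** 2 / (2.0 * repulsion_scale ** 2))
        for idx, ci in enumerate(centers)
        for cj in centers[idx + 1:]
    )
    return coeff_pen, width_pen, center_pen

def regularized_loss(data_loss, terms, lambdas):
    """Data loss plus the weighted sum of the penalty terms."""
    return data_loss + sum(lam * t for lam, t in zip(lambdas, terms))

coeffs, widths = [1.0, 2.0], [0.5, 0.5]
spread = regularization_terms(coeffs, widths, [0.0, 0.5, 1.0], repulsion_scale=0.1)
clustered = regularization_terms(coeffs, widths, [0.5, 0.5, 0.5], repulsion_scale=0.1)

lambdas = (0.1, 0.01, 0.001)  # hypothetical penalty weights
total = regularized_loss(1.0, clustered, lambdas)
print(spread, clustered, total)
```

Well-separated centers contribute almost nothing to the repulsion term, while coincident centers contribute one unit per pair, so the penalty pushes the optimizer away from redundant basis functions.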
- (i) Coefficient Boundedness: The Frobenius norm of each coefficient tensor satisfies uniformly over all layers and iterations .
- (ii) Width Lower Bound: Each width parameter satisfies uniformly over all basis indices , layers , and iterations .
- (iii) Center Diversity: The separation between distinct centers satisfies where the probability is taken over the stochastic optimization trajectory.
5. Experimental Results
5.1. Image Classification on MNIST
Experimental Setup
1. Standard KAN: Original KAN implementation with B-spline basis functions and SiLU activation
2. RBF-KAN: Our proposed shared-basis architecture with learnable Gaussian RBFs
5.2. MNIST Classification with Controlled Architecture
5.2.1. Learning Dynamics and Generalization Analysis
5.2.2. Architectural Implications
5.2.3. Comparison with Alternative Efficient KAN Variants
5.3. Global Support and Generalization: Theoretical Analysis
5.3.1. Support Structure and Effective Capacity
5.3.2. Regularization Mitigation and Empirical Ablation
5.4. Comparison with Standard MLP Baseline
5.5. Physics-Informed Neural Networks for PDE Solving
5.5.1. General PINN Formulation
5.5.2. Case 1: Elliptic PDE
5.5.3. Case 2: Parabolic PDE
5.5.4. Case 3: Hyperbolic PDE
5.6. Convergence Dynamics and Initialization Sensitivity
5.6.1. Parameter Sharing Strategies
5.6.2. Ablation Studies
5.7. Computational Efficiency Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| KAN | Kolmogorov–Arnold Network |
| RBF | Radial Basis Function |
| MLP | Multilayer Perceptron |
| PINN | Physics-Informed Neural Network |
| PDE | Partial Differential Equation |
| ReLU | Rectified Linear Unit |
| SiLU | Sigmoid Linear Unit |
| GELU | Gaussian Error Linear Unit |
| MNIST | Modified National Institute of Standards and Technology |
| NTK | Neural Tangent Kernel |
| GPU | Graphics Processing Unit |
Appendix A

| Architecture | Basis | Complexity | MNIST | Speedup | Wave |
|---|---|---|---|---|---|
| Standard KAN | B-spline | | 89.1% | 1.0× | |
| RBF-KAN | Gaussian | | 87.8% | 1.4× | |
| WavKAN | Wavelets | | ∼ | ∼1.3× | ∼ |
| fKAN | Frac. Jacobi | | ∼ | ∼1.2× | ∼ |
| rKAN | Rational | | ∼ | ∼1.1× | ∼ |

| Architecture | Configuration | Parameters |
|---|---|---|
| Standard KAN | Width | , |
| RBF-KAN | Width | , |

| Architecture | Parameters | Training Time (s) | Memory (GB) | Speedup |
|---|---|---|---|---|
| Standard KAN | | 682 | 4.8 | |
| RBF-KAN | | 487 | 3.2 | |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
De Luca, P.; Di Nardo, E.; Marcellino, L.; Ciaramella, A. Stable and Efficient Gaussian-Based Kolmogorov–Arnold Networks. Mathematics 2026, 14, 513. https://doi.org/10.3390/math14030513

