Next Article in Journal
Synergistic Evolutionary Optimization with Reinforcement Learning for Multi-Objective Energy-Efficient Hybrid Flow Shop Scheduling
Previous Article in Journal
Efficient Minus and Signed Domination in Proper Interval Graphs with a Totally Unimodular Structure
Previous Article in Special Issue
Fractional Black–Scholes Under Memory Effects: A Sixth-Order Local RBF–FD Scheme with Integrated Multiquadric Kernels
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Ramanujan–Santos–Sales Hypermodular Operator Theorem and Spectral Kernels for Geometry-Adaptive Neural Operators in Anisotropic Besov Spaces

by
Rômulo Damasclin Chaves dos Santos
and
Jorge Henrique de Oliveira Sales
*,†
Postgraduate Program in Computational Modeling, Department of Exact Sciences, Santa Cruz State University, Ilhéus 45662-900, Brazil; rdcsantos@uesc.br
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Axioms 2026, 15(3), 192; https://doi.org/10.3390/axioms15030192
Submission received: 9 September 2025 / Revised: 13 November 2025 / Accepted: 19 November 2025 / Published: 6 March 2026
(This article belongs to the Special Issue Fractional Differential Equation and Its Applications)

Abstract

We present Hyperbolic Symmetric Hypermodular Neural Operators (ONHSH), a novel operator learning framework for solving partial differential equations (PDEs) in curved, anisotropic, and modularly structured domains. The architecture integrates three components: hyperbolic-symmetric activation kernels that adapt to non-Euclidean geometries, modular spectral smoothing informed by arithmetic regularity, and curvature-sensitive kernels based on anisotropic Besov theory. In its theoretical foundation, the Ramanujan–Santos–Sales Hypermodular Operator Theorem establishes minimax-optimal approximation rates and provides a spectral-topological interpretation through noncommutative Chern characters. These contributions unify harmonic analysis, approximation theory, and arithmetic topology into a single operator learning paradigm. In addition to theoretical advances, ONHSH achieves robust empirical results. Numerical experiments on thermal diffusion problems demonstrate superior accuracy and stability compared to Fourier Neural Operators and Geo-FNO. The method consistently resolves high-frequency modes, preserves geometric fidelity in curved domains, and maintains robust convergence in anisotropic regimes. Error decay rates closely match theoretical minimax predictions, while Voronovskaya-type expansions capture the tradeoffs between bias and spectral variance observed in practice. Notably, ONHSH kernels preserve Lorentz invariance, enabling accurate modeling of relativistic PDE dynamics. Overall, ONHSH combines rigorous theoretical guarantees with practical performance improvements, making it a versatile and geometry-adaptable framework for operator learning. By connecting harmonic analysis, spectral geometry, and machine learning, this work advances both the mathematical foundations and the empirical scope of PDE-based modeling in structured, curved, and arithmetically.
MSC:
46E35; 41A25; 35Q68; 42B35; 68T07; 58J20; 58B34; 65D15; 81T75

1. Introduction

Neural operator learning has rapidly evolved into a transformative approach for solving parametric partial differential equations (PDEs) by approximating mappings between infinite-dimensional function spaces. The pioneering work on Fourier Neural Operators (FNO) by Li et al. [1] introduced a mesh-independent architecture leveraging global spectral representations. This formulation offered significant advantages in speed and generalization for forward problems, especially on structured domains. Complementarily, DeepONet [2] introduced a universal approximation framework for nonlinear operators, grounding operator learning in theoretical results from functional analysis and enabling the separation of input and output branches via basis embeddings.
While these models offered foundational insights, their limitations on general geometries prompted the development of more geometrically expressive architectures. The CORAL framework [3] advanced the state of the art by integrating neural fields with coordinate-aware representations, allowing operators to generalize over non-Euclidean domains. In a similar direction, Geo-FNO [4] learned domain-specific deformations, aligning complex geometries with spectral grids. These innovations paved the way for curvature-adaptive operator learning architectures.
More recently, Wu et al. [5] introduced Neural Manifold Operators that intrinsically respect Riemannian geometry, capturing the dynamics of PDEs defined over curved manifolds. Parallel to this, Kumar et al. [6] proposed a probabilistic perspective with the Neural Operator-induced Gaussian Process (NOGaP), combining operator learning with uncertainty quantification, critical for inverse and data-scarce problems.
Derivative-informed neural operators [7] have since extended operator learning into the realm of PDE-constrained optimization under uncertainty, while neural inverse operators [8] tackle high-dimensional inverse problems using data-driven techniques. In the context of physical modeling, Fourier-based architectures have found application in wave propagation [9] and the preservation of physical structures [10]. To enhance robustness, Sharma and Shankar [11] proposed ensemble and mixture-of-experts DeepONets, while Lanthaler et al. [12] derived error estimates in infinite-dimensional settings, clarifying theoretical bounds.
Efforts to improve generalization and invertibility have also shaped recent directions. Models such as HyperFNO [13], Factorized FNO [14], and Invertible FNO [15] highlight how architectural refinements can enhance expressivity, parameter efficiency, and bidirectional solvability for PDEs.
Despite these advances, many of these operator architectures still struggle to capture mixed anisotropic smoothness, modular arithmetic structure, or hyperbolic curvature effects, critical features in systems governed by spectral asymmetry, transport on curved domains, and modular invariance. Classical approximation theory, including the work of Triebel [16], Bourgain and Demeter’s decoupling theory [17], and Hansen’s treatment of mixed smoothness [18], emphasizes the difficulty of approximating functions in anisotropic Besov-type spaces. These function spaces, foundational in harmonic analysis [19,20], reveal deep connections between sparsity, localization, and regularity, further explored in the context of Fourier approximation [21,22].
Santos and Sales [23], introduces the Hypermodular Neural Operators with Hyperbolic Symmetry (ONHSH), a framework that integrates hyperbolic activations, modular spectral damping, and curvature-sensitive kernels. ONHSH achieves minimax-optimal approximation rates in anisotropic Besov and Triebel–Lizorkin spaces, supported by explicit Voronovskaya-type expansions and quantitative remainder bounds. At its theoretical core, the Ramanujan–Santos–Sales Hypermodular Operator formalizes spectral bias–variance trade-offs under directional smoothness, while noncommutative Chern characters provide a spectral–topological interpretation. Applications to thermal diffusion confirm the robustness of the method on curved and modular domains, positioning ONHSH as a mathematically principled and geometrically adaptive paradigm for neural operator learning.
Within this mathematical setting, this article proposes the Hypermodular Neural Operators with Hyperbolic Symmetry (ONHSH), a novel operator learning framework that integrates directional hyperbolic activations, modular damping, and curvature-aware density functions. The design is informed by recent advances in approximation theory on spheres and balls [24], as well as insights from noncommutative geometry [25] and index theory [26].
We demonstrate that ONHSH operators attain minimax-optimal convergence in anisotropic Besov norms, offer high-order Voronovskaya-type expansions, and admit a spectral bias–variance decomposition framed by noncommutative Chern characters. Finally, we incorporate statistical estimation tools inspired by nonparametric theory [27] to quantify approximation uncertainty in highly anisotropic or modular regimes.
  • Main Contributions:
  • We introduce a Hypermodular Neural Operators with Hyperbolic Symmetry (ONHSH) framework that coherently integrates hyperbolic activations, arithmetic-informed spectral damping, and curvature-sensitive kernels, enabling PDE operator learning on anisotropic, curved, and modularly structured domains.
  • We establish minimax-optimal approximation rates in weighted anisotropic Besov and Triebel–Lizorkin spaces, supported by explicit Voronovskaya-type expansions and quantitative remainder bounds. At the theoretical core lies the Ramanujan–Santos–Sales Hypermodular Operator Theorem, which formalizes the convergence rates and spectral bias–variance trade-offs for neural operators under directional smoothness.
  • We demonstrate that operator spectral variance admits a natural interpretation via noncommutative Chern characters, creating a rigorous bridge between functional approximation, spectral asymptotics, and arithmetic topology.
Overall, this work develops a mathematically principled, geometrically adaptive, and spectrally structured framework for neural operator learning. By unifying harmonic analysis, approximation theory, and noncommutative geometry through the Ramanujan–Santos–Sales Hypermodular Operator Theorem, our approach advances the capacity to solve PDEs on domains that are complex, curved, or enriched with modular and number-theoretic structure.

1.1. Research Scope and Methodological Positioning

This work advances the field of neural operator learning by introducing a mathematically rigorous and geometrically informed framework: the Hypermodular Neural Operators with Hyperbolic Symmetry (ONHSH). While established architectures such as FNO [1], DeepONet [2], and their variants have shown impressive performance in learning PDE-driven mappings, they are predominantly tailored to Euclidean domains and typically rely on assumptions of isotropic smoothness, uniform spectral structure, and unstructured feature representations.
ONHSH departs from these assumptions by addressing the following three fundamental limitations of prior approaches:
  • Geometric Adaptivity: Moving beyond models confined to flat or mildly deformed Euclidean settings [4,5], ONHSH employs curvature-sensitive kernels that adapt to hyperbolic and anisotropic manifolds. This design is motivated by functional spaces on spheres and balls [24] and enriched by tools from spectral geometry [25].
  • Spectral Modularity: By embedding modular arithmetic into the spectral filtering process, ONHSH captures oscillatory dynamics and aliasing effects that classical FNO variants [13,15] cannot fully represent. The modular structure also enables arithmetic-informed spectral damping aligned with underlying physical constraints.
  • Function-Space Theoretic Rigor: ONHSH is firmly grounded in the approximation theory of anisotropic and mixed-smoothness function spaces, notably Besov and Triebel–Lizorkin classes [16,19]. At the core of this framework lies the Ramanujan–Santos–Sales Hypermodular Operator Theorem, which establishes minimax-optimal convergence rates and formalizes the spectral bias–variance trade-off for neural operators under directional smoothness. This provides a principled bridge between neural operator design and harmonic analysis [17,22,23].
Methodologically, this work synthesizes neural operator design with analytic techniques from approximation theory, spectral geometry, and noncommutative topology. It further introduces spectral decompositions inspired by Chern characters, drawing from index theory [26], alongside statistical estimators rooted in nonparametric analysis [27]. Through this integration, ONHSH extends both the interpretability and applicability of operator learning to settings characterized by intrinsic curvature, modular structure, and mixed anisotropy.

1.2. Conceptual Diagram of the ONHSH Architecture

To illustrate the interaction between geometric regularization, spectral modularity, and functional approximation, we present a schematic view of the ONHSH operator pipeline, Figure 1. The architecture integrates several processing stages, hyperbolic kernel convolution, symmetrized activation, modular spectral filtering, and spectral synthesis, into a unified flow for operator learning.
Each stage is designed to preserve or exploit a structural property essential to PDE-driven mappings as follows:
  • Curved kernels control spatial localization and capture anisotropic geometry.
  • Symmetrized activations enforce hyperbolic symmetry and enhance stability under sign changes.
  • Modular spectral filters introduce arithmetic-informed damping, regulating oscillations and aliasing effects.
  • Spectral transforms restore global coherence and ensure compatibility with harmonic analysis on curved domains.
Together, these components define an expressive operator capable of learning from domains with directional smoothness, modular arithmetic structure, and non-Euclidean geometry. The full computational procedure implementing the operator is summarized in Algorithm 1.

2. Mathematical Foundations

This section establishes the rigorous mathematical framework underpinning the proposed Hypermodular Neural Operators with Hyperbolic Symmetry (ONHSH). We develop the theory of anisotropic function spaces, directional smoothness measures, and spectral multipliers with modular damping. These elements collectively provide the analytical basis for the approximation-theoretic and symmetry-invariance properties derived in subsequent sections.

2.1. Anisotropic Besov Spaces

Definition 1
(Anisotropic Besov Space). Let f : R d R be a measurable function, and let s = ( s 1 , , s d ) ( 0 , ) d be a vector of anisotropic smoothness parameters. For  1 p , q , the anisotropic Besov space B p , q s ( R d ) is defined as the set of functions f L p ( R d ) such that
f B p , q s ( R d ) : = f L p ( R d ) + j = 1 d 0 1 t s j ω r , j p ( f , t ) q d t t 1 / q < ,
with the usual modification by replacing the q -norm with the supremum when q = . Here, the quantity ω r , j p ( f , t ) denotes the directional modulus of smoothness of order r N in the direction of the j-th canonical basis vector e j , defined by
ω r , j p ( f , t ) : = sup | h | t Δ h r , j f L p ( R d ) ,
where Δ h r , j f is the iterated finite difference operator in the direction e j , given by
Δ h r , j f ( x ) : = k = 0 r ( 1 ) r k r k f ( x + k h e j ) .

2.1.1. Interpretation

The space B p , q s ( R d ) encodes directionally heterogeneous regularity, where smoothness s j governs behavior along the x j -axis. This anisotropy is natural for phenomena exhibiting preferential directions, such as, stratified turbulence, transport-dominated systems, and edge singularities in hyperbolic PDEs. The norm, Equation (1), balances global integrability against directional smoothness via:
  • Deficit quantification: t s j ω r , j p ( f , t ) measures local x j -directional irregularity,
  • Scale sensitivity: Integration over t ( 0 , 1 ) captures decay of smoothness deficits at fine scales,
  • Directional synthesis: Summation over j aggregates mixed smoothness.

2.1.2. Functional Analytic Properties

The norm, Equation (1), blends local L p -integrability with directional regularity through the moduli ω r , j p ( f , t ) , reflecting Hölder-like decay in each direction. Specifically:
  • The factor t s j ω r , j p ( f , t ) quantifies the smoothness deficit in direction x j ;
  • The integration in t ( 0 , 1 ) assesses the rate of regularity decay at small scales;
  • The summation across j = 1 , , d aggregates the total mixed smoothness.

2.2. Norm Equivalence via K-Functionals

The directional modulus links to approximation-theoretic functionals through the following equivalence:
Proposition 1
(K-Functional Characterization). Let r > max j s j . For each direction j, define the Peetre K-functional
K j ( f , t r ; L p , W j r , p ) : = inf g L p D j r g L p f g L p + t r D j r g L p ,
where W j r , p ( R d ) is the Sobolev space with r-th weak derivative existing in L p along x j . Then,
c 1 ω r , j p ( f , t ) K j ( f , t r ; L p , W j r , p ) c 2 ω r , j p ( f , t ) , t > 0 ,
for constants c 1 , c 2 > 0 depending only on r and d. Consequently, the Besov norm in Equation (1) satisfies
f B p , q s f L p + j = 1 d t s j K j ( f , t r ; L p , W j r , p ) L q ( ( 0 , 1 ) , d t / t ) 1 / q .
Proof. 
The upper bound in Equation (5), follows by taking g as a mollified approximation of f and estimating D j r g L p via Young’s inequality for convolutions. The lower bound uses the Marchaud inequality: For 0 < t < 1 ,
ω r , j p ( f , t ) C t r t 1 u r 1 ω r , j p ( f , u ) d u ,
applied to the difference f g . Full details, see more in [19].    □

2.3. Characterization by Smoothness Moduli

Motivation. The Theorem 1, provides an intrinsic characterization of anisotropic Besov regularity in terms of directional moduli of smoothness. Intuitively, the theorem shows that the global regularity encoded by the Besov norm can be detected and measured direction-by-direction through suitable finite-difference operators. The analytical framework involved here anisotropic Littlewood–Paley decompositions, Marchaud-type inequalities, and K-functional estimates is classical and well established in the literature on anisotropic function spaces.
Novelty within this work. The originality of the present formulation lies in organizing these classical characterizations in a way that is directly aligned with the operator setting developed later in the paper. In particular, we explicitly track the dependence on the anisotropy vector s = ( s 1 , , s d ) and on the smoothness order r, since both parameters play a structural role in the spectral estimates, stability bounds, and compressibility properties of the hypermodular operators introduced in later sections. This makes the theorem a key foundational component for the analysis that follows.
Theorem 1
(Directional characterization of anisotropic Besov regularity). The following statements are equivalent for a measurable function f and parameters s = ( s 1 , , s d ) , 1 p , q , and an integer r > max j s j :
(a) 
f B p , q s ( R d ) .
(b) 
For each j = 1 , , d the directional modulus satisfies
0 1 t s j ω r , j ( f , t ) p q d t t 1 / q < ,
where ω r , j ( f , t ) p denotes the r-th order directional modulus of smoothness in the j-th coordinate.
(c) 
The anisotropic Littlewood–Paley projections { Δ k ( j ) } k 0 satisfy the discrete norm condition
k 0 2 k s j q Δ k ( j ) f L p q 1 / q < ,
for each j = 1 , , d , and the collection of these directional estimates yields the Besov norm finiteness.
Proof. 
We prove the cycle of implications ( a ) ( b ) ( c ) ( a ) using a consistent referencing of the conditions as (a), (b), and (c).
(a) ⇒ (b). Assume (a). Using the anisotropic Littlewood–Paley decomposition and standard Bernstein and Marchaud inequalities, for each direction j one obtains the estimate
ω r , j ( f , t ) p k : 2 k t 2 k s j · 2 k s j Δ k ( j ) f L p .
Integrating Equation (9) in t yields the finiteness of Equation (7).
(b) ⇒ (c). Let f satisfy (b). Standard discretization of moduli of smoothness gives
0 1 t s j ω r , j ( f , t ) p q d t t k 0 2 k s j ω r , j ( f , 2 k ) p q .
Classical dyadic comparison yields
ω r , j ( f , 2 k ) p Δ k ( j ) f L p .
Combining Equations (10) and (11) yields Equation (8).
(c) ⇒ (a). Assume (c). Summing the dyadic projections reconstructs f, and the Besov norm finiteness follows:
f B p , q s j = 1 d k 0 2 k s j q Δ k ( j ) f L p q 1 / q ,
which is finite by hypothesis. Hence (a) holds.
This completes the proof.    □

2.4. Characterization via Directional Smoothness Moduli

Motivation. Theorem 2, provides equivalent formulations of membership in anisotropic Besov spaces in terms of the directional decay of moduli of smoothness. Such characterizations are especially useful when studying functions or signals whose regularity is not uniform across different coordinate directions. The proof relies on classical tools including Peetre’s K-functional estimates, anisotropic Bernstein-type inequalities, and vector-valued Calderón–Zygmund theory. The functional-analytic notation and norm conventions used throughout this section follow the conventions summarized in Appendix A.1.
Role and novelty within this work. The equivalence itself is well known in the literature; however, in this work we adapt it to a directional decomposition that is specifically compatible with the hypermodular operator family T λ , q . In particular, the formulation is presented so as to retain only those hypotheses on s, p, q, and r that remain stable under the action of these operators. This refinement is crucial for the compressibility, approximation, and convergence results developed in subsequent sections.
Theorem 2
(Isomorphism Between Moduli Decay and Besov Spaces). Let r > max j s j , p [ 1 , ] , q [ 1 , ] , and  s = ( s 1 , , s d ) ( 0 , ) d . The following statements are equivalent:
(i) 
f B p , q s ( R d ) .
(ii) 
f L p + j = 1 d 0 1 t s j ω r , j p ( f , t ) q d t t 1 / q < .
(iii) 
j { 1 , , d } , ω r , j p ( f , t ) C j t s j φ j ( t ) , where 0 1 t s j ω r , j p ( f , t ) q d t t < and φ j ( t ) 0 as t 0 + .
(iv) 
sup t > 0 t s j ω r , j p ( f , t ) < for each j and lim t 0 + t s j ω r , j p ( f , t ) = 0 .
Moreover, the functional appearing in (ii),
F ( f ) : = f L p + j = 1 d 0 1 t s j ω r , j p ( f , t ) q d t t 1 / q ,
is equivalent to the Besov norm f B p , q s . Precisely, there exist constants c 1 , c 2 > 0 (depending only on d , p , q , s , r ) such that, for all f,
c 1 f B p , q s F ( f ) c 2 f B p , q s .
Finally, the decay conditions in (iii) and (iv) are sharp.
Proof. 
(i)(ii). This implication follows directly from the definition of the anisotropic Besov norm and the directional modulus characterization: the Besov norm controls the L p -term and each directional integral in (ii).
(ii)(iii). From the finiteness of the integral in (ii) we obtain the pointwise bound ω r , j p ( f , t ) C j t s j for small t (by local integrability). To deduce that φ j ( t ) 0 , note that
lim ε 0 + ε 1 t s j ω r , j p ( f , t ) q d t t = 0 ,
and hence t s j ω r , j p ( f , t ) 0 as t 0 + in the L q ( ( 0 , 1 ) , d t / t ) -sense; standard dyadic decomposition arguments then yield the existence of a function φ j ( t ) 0 with the stated bound.
(iii)(iv). The uniform bound in (iv) follows from continuity of the directional modulus on compact t-intervals [ δ , 1 ] together with the small-t control given by (iii). The limit statement in (iv) is immediate from φ j ( t ) 0 .
(iv)(i). This is the core part. Using a dyadic decomposition adapted to the anisotropy, define the directional projections
Δ j ( k ) f : = ϕ j ( k ) f , ϕ j ( k ) ^ ( ξ ) = ψ j ( 2 k s j ξ j ) ,
with ψ j a smooth cutoff. Bernstein’s inequality for anisotropic spectra yields, for each k,
D j r Δ j ( k ) f L p C 2 k r s j Δ j ( k ) f L p ,
where D j r denotes the r-th order derivative in the x j -direction,
D j r f = r f x j r .
Since each directional Littlewood–Paley piece Δ j ( k ) f is spectrally localized in a band where | ξ j | 2 k s j , differentiation in the x j direction corresponds to multiplication by ( ξ j ) r in the Fourier domain, resulting in the factor 2 k r s j . This yields the anisotropic Bernstein estimate above, which follows from standard Bernstein theory in the anisotropic setting; see more in [16,19].
Moreover, using the telescoping approximation S N = k = 0 N Δ j ( k ) , one has
f S N f L p k = N + 1 Δ j ( k ) f L p C ω r , j p ( f , 2 N s j ) ,
which is the standard approximation estimate relating directional moduli of smoothness and Littlewood–Paley tails.
The corresponding reverse inequality is provided by the directional Marchaud estimate:
t s j ω r , j p ( f , t ) C s j t 1 u s j ω r , j p ( f , u ) d u u + f L p ,
establishing control of the modulus in terms of its integral averages.
Finally, combining these ingredients with the discrete Littlewood–Paley characterization
f B p , q s f L p + j = 1 d k = 0 ( 2 k s j Δ j ( k ) f L p ) q 1 / q ,
we deduce that the hypothesis in (iv) ensures finiteness of the right-hand side of Equation (19), and therefore f B p , q s .
Equivalence of norms. The norm equivalence in Equation (13) follows by combining the integral functional with the discrete representation in Equation (19), together with the Bernstein and Marchaud estimates above. This yields constants c 1 , c 2 > 0 independent of f, giving the desired two-sided bound.
Sharpness. The decay conditions in (iii) and (iv) are sharp. If  r s j , one may construct counterexamples using lacunary Fourier series supported along the coordinate direction e j . If the factor φ j ( t ) 0 does not hold, failure of convergence is exhibited by functions of the form
f j ( x ) = | x j | s j log | x j | γ , γ < 1 / q ,
as discussed in [16].    □
Motivation. Theorem 3 is the anisotropic analog of the classical Besov-to-Hölder embedding. When each directional smoothness exponent exceeds the Sobolev threshold 1 / p , the function not only becomes continuous but satisfies a Hölder condition with exponent determined by the smallest directional surplus of regularity. The underlying argument is standard: it combines the anisotropic Littlewood–Paley decomposition with Bernstein inequalities adjusted to the directional scaling.
Novelty within this work. The present formulation highlights the Hölder exponent explicitly in terms of the anisotropy vector s, which is essential for the uniform error bounds and stability estimates developed later for the operator representations. While the embedding itself is classical, its directional explicitness and its integration into the subsequent operator analysis constitute the operational relevance of this theorem in our framework.
Theorem 3
(Anisotropic Embedding into Hölder-Continuous Functions). Let d N , 1 p < , 1 q , and  s = ( s 1 , , s d ) ( 0 , ) d satisfy the critical anisotropy condition:
min 1 j d s j > 1 p .
Then, the anisotropic Besov space B p , q s ( R d ) embeds continuously into the space of bounded, uniformly Hölder-continuous functions:
B p , q s ( R d ) C b 0 ( R d ) Lip ( α ; L ( R d ) ) , α : = min j s j 1 p .
Moreover, there exists a constant C > 0 , depending only on d , p , q , s , such that
(22) f L C f B p , q s , (23) ω ( f , δ ) : = sup | h | δ f ( · + h ) f L C δ α f B p , q s , δ > 0 .
Proof. 
We employ anisotropic Littlewood-Paley theory. Let ψ k ( j ) be anisotropic frequency projections satisfying
supp ψ k ( j ) ^ { ξ R d : 2 k 1 | ξ j | 2 k + 1 } .
Then, f B p , q s ( R d ) admits the decomposition
f = j = 1 d k = 0 ψ k ( j ) f , f B p , q s f L p + j = 1 d k = 0 2 k s j ψ k ( j ) f L p q 1 / q .
Applying the anisotropic Bernstein inequality,
ψ k ( j ) f L C 2 k / p ψ k ( j ) f L p ,
we obtain:
f L j = 1 d k = 0 ψ k ( j ) f L C j = 1 d k = 0 2 k / p ψ k ( j ) f L p = C j = 1 d k = 0 2 k s j ψ k ( j ) f L p · 2 k ( s j 1 / p ) .
For β j : = s j 1 / p > 0 , this weighted sum is controlled via Hölder’s inequality, yielding Equation (22).
For | h | δ , write
| f ( x + h ) f ( x ) | j = 1 d k = 0 ψ k ( j ) f ( x + h ) ψ k ( j ) f ( x ) .
Using smoothness of ψ k ( j ) and Bernstein’s inequality,
ψ k ( j ) f ( x + h ) ψ k ( j ) f ( x ) | h | · ( ψ k ( j ) f ) L C | h | 2 k ( 1 + 1 / p ) ψ k ( j ) f L p .
Summing over k, we obtain
f ( · + h ) f L C | h | j = 1 d k = 0 2 k ( 1 + 1 / p ) ψ k ( j ) f L p .
Define γ j : = s j 1 / p 1 > 0 , then
k = 0 2 k ( 1 + 1 / p ) ψ k ( j ) f L p = k = 0 2 k s j ψ k ( j ) f L p · 2 k γ j .
This sum converges and yields the Hölder estimate in Equation (23).
Define
f 0 ( x ) : = j = 1 d | x j | s j 1 / p χ [ 1 , 1 ] ( x j ) ,
which satisfies
f 0 B p , q s < , | f 0 ( 0 ) f 0 ( h e j ) | = | h | s j 1 / p .
This confirms the optimality of the exponent α = min j ( s j 1 / p ) .    □

3. Anisotropic Embedding Theorems

Motivation. The Theorem 4, extends the anisotropic Besov–to–Hölder embedding to bounded Lipschitz domains. The underlying idea is classical: one employs a bounded linear extension operator for anisotropic Besov spaces, reducing the problem to the whole-space case previously established. As a result, any function in B p , q s ( Ω ) is uniformly Hölder continuous on Ω ¯ provided that each directional smoothness index satisfies s j > 1 / p .
Novelty within this work. The role of this embedding in the present paper is primarily methodological. The uniform Hölder continuity obtained here is essential for constructing and controlling local patchwise representations and for establishing stable discretization procedures for the hypermodular operators analyzed later. Although the embedding itself is classical, its explicit anisotropic formulation and its integration into the operator framework developed in subsequent sections make it a crucial enabling step in the overall analysis.
Theorem 4
(Anisotropic Embedding on Bounded Lipschitz Domains). Let Ω R d be a bounded Lipschitz domain. Suppose 1 p < , 1 q , and let the anisotropic smoothness vector s = ( s 1 , , s d ) ( 0 , ) d satisfy
s j > 1 p , j = 1 , , d .
Then the anisotropic Besov space B p , q s ( Ω ) embeds continuously into the space of continuous functions on the closure:
B p , q s ( Ω ) C 0 ( Ω ¯ ) ,
i.e., there exists a constant C = C ( d , p , q , s , Ω ) > 0 such that
f C 0 ( Ω ¯ ) C f B p , q s ( Ω ) , f B p , q s ( Ω ) .
Proof. 
The proof proceeds in four stages: extension, global embedding, continuity transfer, and sharp estimate.
1. 
Existence of Extension Operator.
Since Ω is a bounded Lipschitz domain, by a result of Triebel [16], there exists a continuous linear extension operator:
E : B p , q s ( Ω ) B p , q s ( R d ) ,
such that:
(38) E f | Ω = f a . e . in Ω , (39) E f B p , q s ( R d ) C 1 f B p , q s ( Ω ) .
2. 
Global Embedding into Continuous Functions.
Under Equation (34), each coordinate-direction smoothness s j satisfies s j > 1 / p . By the anisotropic version of the classical Sobolev embedding (cf. [16]), we have the continuous embedding:
B p , q s ( R d ) C b ( R d ) ,
with
g L ( R d ) C 2 g B p , q s ( R d ) g B p , q s ( R d ) .
Furthermore, functions in B p , q s ( R d ) under Equation (34) admit unique continuous representatives.
3. 
Continuity Transfer via Extension.
Given f B p , q s ( Ω ) , let g : = E f B p , q s ( R d ) . By Equation (40), g C b ( R d ) , and since g | Ω = f almost everywhere, f inherits continuity in Ω . As  Ω is bounded and Lipschitz, the uniform continuity of g on compact sets implies that f extends uniquely to a continuous function on Ω ¯ . Hence,
f C 0 ( Ω ¯ ) and f C 0 ( Ω ¯ ) = sup x Ω ¯ | f ( x ) | g L ( R d ) .
4. 
Final Estimate.
Let f B p , q s ( Ω ) , and consider its extension g : = E f to R d , provided by the existence of a bounded linear extension operator E : B p , q s ( Ω ) B p , q s ( R d ) . By construction, g coincides with f almost everywhere on Ω , and the Besov norm of g on the whole space is controlled by
g B p , q s ( R d ) C 1 f B p , q s ( Ω ) ,
for some constant C 1 > 0 depending on Ω , d, p, q, and  s .
In addition, since s j > 1 / p for all j = 1 , , d , the anisotropic Besov space B p , q s ( R d ) embeds continuously into the space of bounded continuous functions, and hence
g L ( R d ) C 2 g B p , q s ( R d ) ,
for some constant C 2 > 0 .
Now, since g is continuous on R d and agrees with f almost everywhere on Ω , it follows that f admits a unique continuous representative on Ω , and this representative extends continuously to the closure Ω ¯ . Therefore, we have the pointwise control
f C 0 ( Ω ¯ ) g L ( R d ) .
Combining Equations (43)–(45), we obtain the final estimate
f C 0 ( Ω ¯ ) C 2 g B p , q s ( R d ) C 2 C 1 f B p , q s ( Ω ) .
Setting C : = C 1 C 2 , we conclude the desired inequality
f C 0 ( Ω ¯ ) C f B p , q s ( Ω ) ,
which establishes the continuity of the embedding.    □
Remark 1
(Necessity of the Conditions).
  • Sharpness of Equation (34): If s j 1 / p for some j, then the univariate Sobolev embedding fails in that coordinate. Consider the example f ( x ) = j = 1 d h ( x j ) , where h ( t ) = | t | α η ( t ) , α < s j , and  η C c ( R ) . Then f B p , q s ( Ω ) , but  f C 0 ( Ω ¯ ) due to the local singularity at 0.
  • Necessity of Lipschitz Boundary: For non-Lipschitz domains, such as domains with outward cusps or fractal boundaries, no universal bounded extension operator exists for anisotropic Besov spaces. In such settings, the geometry of Ω may obstruct the preservation of local moduli of smoothness under extension.

Compactness of the Anisotropic Embedding

Theorem 5.
Let Ω R d be a bounded Lipschitz domain, and let s = ( s 1 , , s d ) ( 0 , 1 ) d , 1 p , q < . Suppose that
s j > 1 p , for all j = 1 , , d .
Then the embedding
B p , q s ( Ω ) C 0 ( Ω ¯ )
is compact.
Proof. 
Since Ω is Lipschitz, there exists a bounded extension operator
E : B p , q s ( Ω ) B p , q s ( R d ) ,
such that for g : = E f ,
g B p , q s ( R d ) C f B p , q s ( Ω ) .
See Triebel [16].
The anisotropic embedding
B p , q s ( R d ) C b 0 ( R d )
holds under Equation (48); see Triebel [16] and Runst–Sickel [19].
To obtain compactness, we apply the Kolmogorov–Riesz–Fréchet theorem using the characterization of Besov spaces via directional moduli of smoothness. The compactness of embeddings on bounded supports is stated in Triebel [16] and Bennett–Sharpley [28].
Let { f k } B p , q s ( Ω ) be bounded. Then g k : = E f k are uniformly bounded in B p , q s ( R d ) and supported in a fixed compact set K. The modulus of continuity estimates implied by Equation (48) guarantee equicontinuity of { g k } . By Arzelà–Ascoli,
g k j g in C 0 ( K ) .
Restricting to Ω ¯ yields uniform convergence of a subsequence in C 0 ( Ω ¯ ) , establishing compactness of Equation (49).    □
Remark 2.
The condition s j > 1 p for all j is sharp. If for some j 0 ,
s j 0 = 1 p , s j > 1 p ( j j 0 ) ,
the embedding may fail to be compact.
Counterexample (Critical Case). Let ϕ C c ( Ω ) and define
f k ( x ) : = ϕ ( x ) cos ( 2 k x j 0 ) .
We claim:
f k B p , q s ( Ω ) C uniformly in k .
Justification of Equation (52). Oscillations occur only in the x j 0 -direction. For j j 0 , smoothness comes entirely from ϕ , so
ω r , j p ( f k , t ) C t s j , s j > 1 p .
For j = j 0 , using
cos ( 2 k ( x + t ) ) cos ( 2 k x ) = O ( 2 k t ) ,
we obtain the critical bound
ω r , j 0 p ( f k , t ) t 1 / p ,
independent of k. Substituting into the Besov norm characterization in Theorem 2 yields Equation (52).
However, no subsequence of f k converges in C 0 ( Ω ¯ ) , since high-frequency oscillations prevent uniform convergence:
sup x Ω | f k ( x ) f m ( x ) | δ > 0 ( k m ) .
Thus the embedding is not compact in the critical case.

4. Anisotropic Besov Embedding on Compact Riemannian Manifolds

Motivation and scope. Theorem 6 formulates the anisotropic Besov–to–Hölder embedding on compact Riemannian manifolds. In the Euclidean setting, the condition s j > 1 / p for all j ensures uniform continuity, as shown in the anisotropic embedding results established earlier. The purpose of the present theorem is to show that this property extends naturally to manifolds, provided we work in local coordinate charts and employ a partition of unity.
Novelty within this work. Although the embedding principle is structurally classical, the novelty here lies in the fact that the embedding is stated and applied in a fully anisotropic form, depending on the directional smoothness vector s = ( s 1 , , s d ) . This is essential for later sections, where operators act differently along coordinate directions and where controlling continuity locally in charts is required for stability under geometric discretizations. The theorem therefore plays a methodological role in enabling the manifold-level operator estimates developed later in the paper.
Theorem 6
(Embedding on Compact Riemannian Manifolds). Let ( M , g ) be a compact d-dimensional Riemannian manifold without boundary. Let s = ( s 1 , , s d ) be an anisotropic smoothness vector and consider the anisotropic Besov space B p , q s ( M ) defined via a finite smooth atlas { ( U α , φ α ) } α A and a subordinate smooth partition of unity { ρ α } α A . If 
s j > 1 p for all j = 1 , , d ,
then the continuous embedding
B p , q s ( M ) C 0 ( M )
holds. That is, every f B p , q s ( M ) admits a unique continuous representative, and the embedding is norm-continuous.
Proof. 
For each chart ( U α , φ α ) , consider the localization of f via the pullback to Euclidean space:
f B p , q s ( U α ) : = ( f φ α 1 ) · ( ρ α φ α 1 ) B p , q s ( R d ) .
Define the global Besov norm on M by summing over all charts:
f B p , q s ( M ) : = α A f B p , q s ( U α ) .
By the assumption s j > 1 / p for all j, the anisotropic Euclidean embedding
B p , q s ( R d ) C 0 ( R d )
holds. This embedding corresponds to the anisotropic Besov–to–Hölder result proved earlier in the Euclidean setting (see Theorem 5 in Section 3). Therefore, for each chart ( U α , φ α ) there exists a constant C α > 0 such that
( f φ α 1 ) · ( ρ α φ α 1 ) C 0 ( R d ) C α ( f φ α 1 ) · ( ρ α φ α 1 ) B p , q s ( R d ) .
By pushing forward, it follows that each localized product f ρ α is continuous on U α . Since α ρ α = 1 on M, one has
f ( x ) = α : x U α ( f ρ α ) ( x ) ,
which expresses f as a finite sum of continuous functions in a neighborhood of each point x M . Hence, f is globally continuous on M.
To control the supremum norm, observe:
f C 0 ( M ) = sup x M α ( f ρ α ) ( x ) α sup x U α | ( f ρ α ) ( x ) | α C α f B p , q s ( U α ) ( by ( 73 ) ) max α C α α f B p , q s ( U α ) = C f B p , q s ( M ) , C : = max α C α · | A | .
Therefore, the embedding is continuous, completing the proof.    □
Remark 3.
In the isotropic case, where s j = s for all j, Equation (53) reduces to
s > d / p ,
which recovers the classical Sobolev–Besov embedding on compact manifolds (see Triebel [16]).

5. Embedding Theorems in Function Spaces

5.1. Embedding on Bounded Lipschitz Domains

Theorem 7
(Embedding on Bounded Lipschitz Domains).  Let Ω R d be a bounded Lipschitz domain, 1 p < , 1 q , and  s = ( s 1 , , s d ) ( 0 , ) d with
s j > 1 p j = 1 , , d .
Then,
B p , q s ( Ω ) C 0 ( Ω ¯ ) ,
i.e.,  C > 0 such that,
f C 0 ( Ω ¯ ) C f B p , q s ( Ω ) , f B p , q s ( Ω ) .
Proof. 
Since Ω is bounded Lipschitz, there exists a linear bounded extension operator E : B p , q s ( Ω ) B p , q s ( R d ) satisfying:
(64) ( E f ) | Ω = f a . e . (65) C 1 > 0 : E f B p , q s ( R d ) C 1 f B p , q s ( Ω )
Equation (61) implies
B p , q s ( R d ) C b ( R d ) L ( R d ) ,
with
g L ( R d ) C 2 g B p , q s ( R d ) g B p , q s ( R d ) .
For f B p , q s ( Ω ) :
f C 0 ( Ω ¯ ) = sup x Ω ¯ | f ( x ) | = sup x Ω ¯ | ( E f ) ( x ) | ( by continuity ) E f L ( R d ) C 2 E f B p , q s ( R d ) C 2 C 1 f B p , q s ( Ω ) .
Thus, C = C 1 C 2 satisfies Equation (63).    □

5.2. Embedding on Compact Riemannian Manifolds

Theorem 8
(Embedding on Compact Manifolds).  Let ( M , g ) be compact d-dimensional Riemannian manifold without boundary. For  B p , q s ( M ) defined via finite atlas { ( U α , φ α ) } and partition of unity { ρ α } , if 
s j > 1 p j = 1 , , d ;
then,
B p , q s ( M ) C 0 ( M ) .
Proof. 
For each chart ( U α , φ α ) , define
f B p , q s ( U α ) : = ( f φ α 1 ) · ( ρ α φ α 1 ) B p , q s ( R d ) .
Global norm:
f B p , q s ( M ) : = α f B p , q s ( U α ) .
By Section 7, C α > 0 :
( f φ α 1 ) · ( ρ α φ α 1 ) C 0 ( R d ) C α f B p , q s ( U α ) .
Thus, f ρ α C 0 ( U α ) .   Since α ρ α = 1 :
f = α f ρ α .
Each f ρ α C 0 ( U α ) , and  M = α U α , so f C 0 ( M ) .
f C 0 ( M ) α f ρ α C 0 ( U α ) α C α f B p , q s ( U α ) ( by ( 73 ) ) max α C α · | A | · f B p , q s ( M ) .
   □

6. Spectral Decay and N-Term Approximation

Motivation. The purpose of this section is to quantify how the anisotropic smoothness of a signal f affects the compressibility of its representation under the hypermodular operator T λ , q . The results below are new and rely on the directional Besov characterizations established earlier.
Proposition 2
(Spectral decay of T λ , q ). Let f B p , q s ( R d ) with s = ( s 1 , , s d ) and s j > 0 . Then the spectrum of T λ , q f in the anisotropic Littlewood–Paley basis satisfies
Δ k ( j ) ( T λ , q f ) L p σ λ , k ( j ) 2 k s j f B p , q s ,
where σ λ , k ( j ) is the geometric decay factor associated with scale k in direction j.
Proof. 
This follows by combining the directional Littlewood–Paley estimate from Section 3 with the scale-dependent multiplier bounds proved in Section 5. The decay factor σ λ , k ( j ) reflects the anisotropic contraction structure of T λ , q .    □

6.1. Nonlinear Approximation via Directional Spectral Decay

Theorem 9
(N-term approximation rate). Let f B p , q s ( R d ) , where s = ( s 1 , , s d ) with s j > 0 . Let T λ , q denote the symmetrized hyperbolic neural operator defined in Section 7. Then the best N-term approximation error of T λ , q f in L p satisfies
σ N ( T λ , q f ) L p N min j s j / d f B p , q s .
In particular, higher directional smoothness yields faster decay of the nonlinear approximation error.
Proof. 
The directional Littlewood–Paley decomposition satisfies
Δ j ( k ) T λ , q f L p 2 k s j f B p , q s .
Thus, the coefficients of T λ , q f in any unconditional wavelet basis decay at rate 2 k s j in the j-th direction.
Ordering coefficients by decreasing magnitude and applying the nonlinear approximation theorem for anisotropic Besov spaces (see [16]) yields)
σ N ( T λ , q f ) L p j = 1 d k > log 2 N 1 / d ( 2 k s j Δ j ( k ) T λ , q f L p ) q 1 / q N min j s j / d f B p , q s ,
which is the desired estimate.    □
Remark 4.
The key point is that smoothness is measured independently in each coordinate direction through directional frequency bands: nonlinear approximation selects dominant orientations automatically.
Note. Any properties of the directional modulus of smoothness used in this section follow from standard anisotropic approximation theory; see, for example, Triebel [16].

6.2. Modular Spectral Multipliers and Asymptotic Stability

Theorem 10
(Asymptotic expansion of modular multipliers). Let T n be the spectral multiplier operator defined by
T n f : = F 1 [ m n f ^ ] , m n ( ξ ) = k Z d q n k 2 χ k ( ξ ) , q n = e π n 1 / 2 .
Then, for  f B p , q s ( R d ) ,
T n f f in L p ( R d ) ,
and, moreover, there exists a differential operator L of order min j ( 2 s j ) such that
T n f f = n 1 / 2 L ( f ) + o ( n 1 / 2 ) in L p ,
as n .
Proof. 
Since q n 1 , we have m n ( ξ ) 1 pointwise. The partition of unity { χ k } ensures uniform control on derivatives of m n , hence m n 1 in the multiplier sense. Thus,
T n f f L p = F 1 [ ( m n 1 ) f ^ ] L p 0 ,
proving Equation (78).
To derive the asymptotic expansion, expand q n k 2 :
q n k 2 = e π k 2 n 1 / 2 = 1 π n 1 / 2 k 2 + o ( n 1 / 2 ) .
Substituting into m n yields
m n ( ξ ) = 1 n 1 / 2 π k Z d k 2 χ k ( ξ ) + o ( n 1 / 2 ) .
The term
π k k 2 χ k ( ξ ) ,
is a smooth, positive symbol of order 2, defining a differential operator L of the same order. Applying F 1 yields Equation (79).    □
Theorem 11
(Spectral Localization and Decay Estimate). Let f B p , q s , τ ( R d ) , with  s ( 0 , ) d , 1 p < , and  1 q . Then there exist constants C , c > 0 , depending only on ( p , q , s , d ) , such that for all n N  
T n ( f ) L p ( R d ) C · e c n 1 / 4 · f B p , q s , τ ( R d ) .
Proof. 
We begin by decomposing f using an anisotropic dyadic Littlewood–Paley decomposition { ψ k ( j ) } , adapted to the smoothness vector s . Define the localized components
f k : = F 1 [ χ k f ^ ] , so that T n ( f ) = k Z d q n k 2 f k .
Using Minkowski’s inequality and the disjointness of frequency supports, we estimate
T n ( f ) L p k Z d q n k 2 · f k L p .
Now fix a threshold K ( n ) : = n 1 / 4 , and split the sum
T n ( f ) L p k K ( n ) q n k 2 f k L p + k > K ( n ) q n k 2 f k L p .
For k > K ( n ) , note that k 2 n 1 / 2 , so that
q n k 2 e c k 2 n 1 / 2 e c n .
On the other hand, for  k K ( n ) , the number of such k is bounded by C d n d / 4 . Also, since f B p , q s , τ , the components f k satisfy
f k L p C s · 2 k j s j · f B p , q s , τ ,
for each anisotropic scale j, due to the smoothness envelope and the finite overlap of the frequency partitions.
Thus, the contribution of low-frequency modes (first sum in Equation (83)) is bounded by
k K ( n ) q n k 2 f k L p C n d / 4 · f B p , q s , τ .
The high-frequency contribution satisfies
k > K ( n ) q n k 2 f k L p f B p , q s , τ · k > K ( n ) e c k 2 n 1 / 2 2 k j s j ,
which decays faster than any polynomial in n, i.e., super-exponentially in n . Hence, combining Equations (86) and (87), we obtain
T n ( f ) L p C e c n 1 / 4 f B p , q s , τ ,
which proves the claim.    □
  • Implications and Phase-Space Compactness.
The exponential decay of T n ( f ) L p with respect to n implies that the operator family { T n } n N forms a compact sequence in L p ( R d ) , vanishing in norm as n . From a microlocal analysis perspective, this corresponds to simultaneous concentration in both physical and Fourier domains, i.e., phase-space localization.
This dual localization has the following significant implications in applications:
  • In PDE approximation, it guarantees that the learned neural operator retains control over the resolution scale while avoiding amplification of high-frequency noise;
  • In inverse problems, the compactness provides natural regularization, mitigating instability associated with ill-posedness;
  • In neural architectures, it supports sparse parameterization and efficient training, especially in anisotropic or non-Euclidean domains.
These properties are particularly relevant when hypermodular operators are used as building blocks for deep neural surrogates of physical systems, enabling provable generalization and robustness under spectral perturbations.

7. Symmetrized Hyperbolic Activation Kernels Hypermodular Operator

This section introduces the symmetrized hyperbolic activation kernel that defines the nonlinear component of the hypermodular operator architecture. The revision corrects the earlier misidentification of the kernel function and eliminates the use of ill-posed principal-value integrals. All statements below are mathematically rigorous and based on absolutely convergent integrals.
Definition 2
(Symmetrized Hypermodular Kernel). Let λ > 0 and q ( 0 , 1 ) . Define
g q , λ ( x ) = tanh λ x 1 2 ln q ; M q , λ ( x ) = 1 4 g q , λ ( x + 1 ) g q , λ ( x 1 ) ,
and set the symmetrized kernel
ψ λ , q ( x ) : = 1 2 M q , λ ( x ) + M q 1 , λ ( x ) .
Theorem 12
(Analytic and Structural Properties). The kernel ψ λ , q satisfies the following:
(i) 
Even symmetry:  ψ λ , q ( x ) = ψ λ , q ( x ) for all x R ;
(ii) 
Strict positivity:  ψ λ , q ( x ) > 0 for all x R ;
(iii) 
Smoothness and exponential decay:  ψ λ , q C ( R ) and there exist constants C λ , q , α > 0 such that | ψ λ , q ( x ) | C λ , q e α | x | ;
(iv) 
Moment structure: for all integers m 0 ,
μ 2 m = R x 2 m ψ λ , q ( x ) d x < , R x 2 m + 1 ψ λ , q ( x ) d x = 0 ;
(v) 
Spectral regularity: the Fourier transform ψ ^ λ , q decays faster than any polynomial, i.e., for every N N there exists C N , λ , q > 0 such that | ψ ^ λ , q ( ξ ) | C N , λ , q ( 1 + | ξ | ) N .
Proof. 
Properties (i)–(ii) follow directly from the definition of Equation (88). Since
g q , λ ( x ) = tanh λ x 1 2 ln q
is an odd and strictly increasing function, the central difference
M q , λ ( x ) = 1 4 g q , λ ( x + 1 ) g q , λ ( x 1 ) ,
is even and positive for all x R . Hence, the symmetrization
ψ λ , q ( x ) = 1 2 M q , λ ( x ) + M q 1 , λ ( x ) ,
is also even and strictly positive, establishing (i)–(ii).
For (iii), note that tanh is an analytic function on R with bounded derivatives of all orders,
d k d x k tanh ( λ x ) = λ k P k ( tanh ( λ x ) ) ,
where P k is a polynomial of degree k + 1 with bounded coefficients. Therefore, g q , λ C ( R ) and inherits the same analyticity and boundedness.
Because tanh ( z ) ± 1 exponentially as | z | , there exist constants A λ , B λ > 0 such that
| 1 tanh ( λ x ) | A λ e 2 λ | x | , x R .
From the difference representation of M q , λ , it follows that
| M q , λ ( x ) | C λ , q e 2 λ | x | ,
and the same bound holds for M q 1 , λ ( x ) . Consequently,
| ψ λ , q ( x ) | C λ , q e α | x | ,
for suitable α > 0 , proving the exponential decay and smoothness in (iii).
By exponential decay, ψ λ , q L 1 ( R ) and
R | x | m | ψ λ , q ( x ) | d x < , m N 0 ,
so that all moments exist and are absolutely convergent, giving (iv). The vanishing of all odd moments follows from the even symmetry of ψ λ , q .
Finally, property (v) follows from the Paley–Wiener–Schwartz theorem: If f C ( R ) satisfies
| D k f ( x ) | C k e α | x | , for some α > 0 and all k N 0 ,
then its Fourier transform f ^ extends to an entire function on C and decays faster than any polynomial on R . Since ψ λ , q meets these hypotheses, the stated spectral decay holds.    □
Definition 3
(Integral Operator). For n N and f L loc 1 ( R ) , define
( T n f ) ( x ) : = R ψ λ , q n ( x y ) f ( y ) d y .
Since ψ λ , q L 1 ( R ) , T n is a well-defined bounded linear operator on L p ( R ) for 1 p .
Theorem 13
(Voronovskaya-Type Asymptotic Expansion). Let f C 2 k + 2 ( R ) and ψ λ , q as in Theorem 12. Then the following asymptotic expansion holds:
( T n f ) ( x ) = m = 0 k μ 2 m ( 2 m ) ! n 2 m f ( 2 m ) ( x ) + R n , k ( f ; x ) ,
where the remainder satisfies
| R n , k ( f ; x ) | C n ( 2 k + 2 ) sup | ξ x | δ | f ( 2 k + 2 ) ( ξ ) | ,
for some constants C , δ > 0 depending only on ( λ , q , k ) .
Proof. 
Let f C 2 k + 2 ( R ) and fix x R . A Taylor expansion of f ( y ) about x gives
f ( y ) = m = 0 2 k + 1 f ( m ) ( x ) m ! ( y x ) m + f ( 2 k + 2 ) ( ξ ) ( 2 k + 2 ) ! ( y x ) 2 k + 2 ,
for some ξ between x and y (by the mean value form of the remainder).
Substituting Equation (100) into the definition of the operator
( T n f ) ( x ) = R ψ λ , q n ( x y ) f ( y ) d y ,
and performing the change in variable t = n ( x y ) (so y = x t n and d y = d t n ), we obtain
( T n f ) ( x ) = 1 n R ψ λ , q ( t ) f x t n d t .
Inserting Equation (100) into Equation (102) yields
( T n f ) ( x ) = 1 n m = 0 2 k + 1 ( 1 ) m f ( m ) ( x ) m ! n m R t m ψ λ , q ( t ) d t + R n , k ( f ; x ) ,
where the remainder term is
R n , k ( f ; x ) = 1 n 2 k + 3 R ψ λ , q ( t ) t 2 k + 2 f ( 2 k + 2 ) ( ξ t ) ( 2 k + 2 ) ! d t ,
with ξ t lying between x and x t n .
By Theorem 12 (iv), all odd moments vanish,
R t 2 m + 1 ψ λ , q ( t ) d t = 0 ,
and the even moments μ 2 m = R t 2 m ψ λ , q ( t ) d t are finite. Hence, only even-order terms contribute in Equation (103), giving
( T n f ) ( x ) = m = 0 k μ 2 m ( 2 m ) ! n 2 m f ( 2 m ) ( x ) + R n , k ( f ; x ) .
Finally, the remainder of Equation (104) is estimated using the exponential decay | ψ λ , q ( t ) | C λ , q e α | t | :
| R n , k ( f ; x ) | C λ , q ( 2 k + 2 ) ! n 2 k + 2 R | t | 2 k + 2 e α | t | d t sup | ξ x | δ | f ( 2 k + 2 ) ( ξ ) | C n 2 k + 2 sup | ξ x | δ | f ( 2 k + 2 ) ( ξ ) | .
Combining Equations (105) and (106) establishes the desired asymptotic expansion.    □
Corollary 1
(Uniform Convergence). As a direct consequence of Theorem 12, the sequence ( T n ) n N converges uniformly to the identity operator on compact sets. For every f C 2 ( R ) , one has
lim n T n f f , [ a , b ] = 0
uniformly on each compact interval [ a , b ] R . Moreover, the rate of convergence satisfies
T n f f , [ a , b ] = O ( n 2 ) , n .
Proof. 
Taking k = 0 in Theorem 12, we have
( T n f ) ( x ) = f ( x ) + μ 2 2 n 2 f ( x ) + R n , 0 ( f ; x ) ,
with | R n , 0 ( f ; x ) | C n 4 sup | ξ x | δ | f ( 4 ) ( ξ ) | . Hence T n f f pointwise and, by uniform control of the remainder on compact sets, the convergence is uniform with rate O ( n 2 ) .    □
Remark 5.
Unlike the previous odd kernel Ψ λ , q ( x ) = sinh ( λ x ) / ( cosh ( λ x ) + q ) , the symmetrized kernel ψ λ , q is even, integrable, and possesses all finite moments. Hence, no principal-value or distributional interpretation is needed. All asymptotic expansions and operator estimates now rely entirely on absolutely convergent integrals within the classical framework of approximation theory.

8. Asymptotic Expansion of the Approximation Operator

We consider a family of linear integral operators T n defined by convolution with a symmetrized activation kernel ψ λ , q C ( R ) , rapidly decaying and possessing specific moment properties. For a function f : R R , we define
( T n f ) ( x ) : = R ψ λ , q n ( x y ) f ( y ) d y .
Assume that f C 2 k + 2 ( R ) and that all derivatives up to order 2 k + 2 are bounded in a neighborhood of x, with sufficient decay at infinity to ensure integrability. Under these conditions, we can derive a generalized Voronovskaya-type expansion of T n f at scale n .
Theorem 14
(Voronovskaya-Type Asymptotic Expansion). Let f C 2 k + 2 ( R ) , and let ψ λ , q C ( R ) be an odd, rapidly decaying kernel satisfying the following:
  • all odd-order moments vanish: R u 2 m + 1 ψ λ , q ( u ) d u = 0 ;
  • all even-order moments up to 2 k + 2 are finite: μ 2 m : = R u 2 m ψ λ , q ( u ) d u < , for  0 m k + 1 .
Then the following asymptotic expansion holds for all x R :
( T n f ) ( x ) = m = 0 k μ 2 m ( 2 m ) ! n 2 m f ( 2 m ) ( x ) + R n , k ( f ; x ) ,
where the remainder term satisfies the estimate
| R n , k ( f ; x ) | C n 2 k + 2 sup | ξ x | δ | f ( 2 k + 2 ) ( ξ ) | ,
for some constants C > 0 , δ > 0 depending only on k and ψ λ , q .
Proof. 
We begin by applying the change in variable u = n ( x y ) in the definition of T n f , Equation (109),
( T n f ) ( x ) = R ψ λ , q ( n ( x y ) ) f ( y ) d y = 1 n R ψ λ , q ( u ) f x u n d u .
Next, we expand the function f x u n in a Taylor series about x up to order 2 k + 1 , with integral remainder,
f x u n = m = 0 2 k + 1 ( 1 ) m m ! u n m f ( m ) ( x ) + r 2 k + 1 u n ; x ,
where the remainder can be written via the integral form
r 2 k + 1 u n ; x = ( 1 ) 2 k + 2 ( 2 k + 1 ) ! u n 2 k + 2 0 1 ( 1 t ) 2 k + 1 f ( 2 k + 2 ) x t u n d t .
Substituting Equation (113) into Equation (112), we obtain
( T n f ) ( x ) = 1 n R ψ λ , q ( u ) m = 0 2 k + 1 ( 1 ) m m ! u n m f ( m ) ( x ) + r 2 k + 1 u n ; x d u = m = 0 2 k + 1 ( 1 ) m f ( m ) ( x ) m ! n m + 1 R u m ψ λ , q ( u ) d u + 1 n R ψ λ , q ( u ) r 2 k + 1 u n ; x d u .
Due to the oddness of ψ λ , q , all odd moments vanish,
R u 2 m + 1 ψ λ , q ( u ) d u = 0 , m N 0 .
Therefore, only even-order derivatives contribute to the sum.
Denoting μ 2 m : = R u 2 m ψ λ , q ( u ) d u , we obtain
( T n f ) ( x ) = m = 0 k μ 2 m ( 2 m ) ! n 2 m f ( 2 m ) ( x ) + R n , k ( f ; x ) .
where the remainder is defined by
R n , k ( f ; x ) : = 1 n R ψ λ , q ( u ) r 2 k + 1 u n ; x d u .
We now estimate R n , k ( f ; x ) using Equation (114). Since f ( 2 k + 2 ) C ( R ) , it is locally bounded. For  | u | n δ , the argument x t u n lies within δ -neighborhood of x, and we can write
r 2 k + 1 u n ; x | u | 2 k + 2 n 2 k + 2 ( 2 k + 1 ) ! sup | ξ x | δ | f ( 2 k + 2 ) ( ξ ) | .
Then,
| R n , k ( f ; x ) | 1 n 2 k + 3 ( 2 k + 1 ) ! sup | ξ x | δ | f ( 2 k + 2 ) ( ξ ) | R | u | 2 k + 2 | ψ λ , q ( u ) | d u .
Since ψ λ , q is rapidly decaying, the moment R | u | 2 k + 2 | ψ λ , q ( u ) | d u is finite. Therefore, there exists a constant C > 0 such that
| R n , k ( f ; x ) | C n 2 k + 2 sup | ξ x | δ | f ( 2 k + 2 ) ( ξ ) | .
This concludes the proof.    □

Moment Structure and Symmetry Summary

The symmetrized activation kernel ψ λ , q C ( R ) is constructed to satisfy a set of structural properties that play a central role in the asymptotic behavior and approximation capabilities of the associated integral operator. Below, we summarize its key analytical and algebraic features:
  • Odd symmetry. The activation kernel is odd with respect to the origin
    ψ λ , q ( x ) = ψ λ , q ( x ) , x R .
  • Vanishing odd moments. All odd-order moments of the kernel vanish due to its odd symmetry,
    R x 2 m + 1 ψ λ , q ( x ) d x = 0 , m N 0 .
  • Even moments. The even-order moments of the kernel ψ λ , q are given explicitly by
    μ 2 m : = R x 2 m ψ λ , q ( x ) d x = ( 2 m ) ! λ 2 m · 1 + q 2 m 2 · C m .
  • Asymptotic expansion of the integral operator. The operator T n admits the following asymptotic expansion in terms of even derivatives of f:
    ( T n f ) ( x ) = m = 0 k μ 2 m ( 2 m ) ! n 2 m f ( 2 m ) ( x ) + O ( n 2 k 2 ) .
  • Explanation of terms:
  • The odd symmetry in Equation (122) ensures that the kernel changes sign under spatial inversion, which in turn enforces the cancelation of all odd-order contributions in Taylor expansions.
  • The vanishing of odd moments in Equation (123) is a direct consequence of the odd symmetry and implies that only even-order derivatives of f contribute to the leading terms in the operator expansion.
  • The even moments  μ 2 m are explicitly computed in Equation (124) based on the analytical form of the kernel. These constants depend on the parameters λ > 0 (scaling factor), q > 0 (hyperbolic modulation), and a structural constant C m > 0 arising from the base function (e.g., a mollified or scaled tanh).
  • The asymptotic expansion in Equation (125) reflects the accuracy of the approximation T n f f as n , with leading-order contributions given by even derivatives of f, weighted by the corresponding moments μ 2 m . The residual error is of order O ( n 2 k 2 ) , under the assumption f C 2 k + 2 ( R ) .
This moment structure underpins the spectral locality, smoothness, and geometric consistency of the symmetrized kernel, and is fundamental to the stability and convergence theory of the associated operator network.

9. Spectral Variance and Voronovskaya-Type Expansions

To analyze the asymptotic behavior of the ONHSH operators, we establish a Voronovskaya-type expansion that elucidates the bias–variance decomposition induced by spectral smoothing.
Theorem 15
(Voronovskaya Expansion for Modular Operators). Let f B p , q 2 s , τ ( R d ) , where the smoothness vector satisfies s ( 0 , ) d , and let the parameters p , q , τ lie in the interval [ 1 , ] . Consider the sequence of linear operators T n constructed via convolution with a family of smoothing kernels K λ , q , n ( x , y ) that satisfy appropriate moment and regularity conditions. Then, for each fixed point x R d , the following asymptotic pointwise expansion holds:
T n ( f ) ( x ) = f ( x ) + 1 2 n j = 1 d β j 2 f x j 2 ( x ) + R n ( f ) ( x ) ,
where the spectral variance coefficients β j > 0 correspond to the kernel’s second moments along the coordinate directions
β j = R d ( y j x j ) 2 K λ , q , n ( x , y ) d y ,
and the remainder R n ( f ) satisfies the norm estimate
R n ( f ) L p C n γ f B p , q 2 s , τ , for some constant γ > 1 ,
with a constant C > 0 independent of n and f.
Proof. 
The proof relies on performing a second-order Taylor expansion of f around x,
f ( y ) = f ( x ) + j = 1 d ( y j x j ) f x j ( x ) + 1 2 j , k = 1 d ( y j x j ) ( y k x k ) 2 f x j x k ( x ) + R 3 ( x , y ) ,
where the remainder R 3 ( x , y ) satisfies
| R 3 ( x , y ) | C y x 3 sup ξ B ( x , δ ) max | α | = 3 | D α f ( ξ ) | .
Due to the kernel’s symmetry and normalization properties, particularly the evenness in y x the first-order terms vanish upon integration,
R d ( y j x j ) K λ , q , n ( x , y ) d y = 0 , j = 1 , , d .
The second moments scale inversely with n,
R d ( y j x j ) ( y k x k ) K λ , q , n ( x , y ) d y = β j n δ j k ,
where δ j k is the Kronecker delta.
Substituting Equation (129) into the integral operator yields
T n ( f ) ( x ) = R d f ( y ) K λ , q , n ( x , y ) d y = f ( x ) + 1 2 n j = 1 d β j 2 f x j 2 ( x ) + R d R 3 ( x , y ) K λ , q , n ( x , y ) d y .
The remainder term can be bounded in L p norm using the smoothness of f and decay properties of the kernel moments, invoking embeddings for Besov spaces and moment estimates [16,24],
R d R 3 ( · , y ) K λ , q , n ( · , y ) d y L p C n γ f B p , q 2 s , τ .
Positivity of β j follows from the positive-definiteness and normalization of the kernel [18], ensuring that the variance term genuinely measures the spread induced by smoothing.
This establishes the Voronovskaya-type expansion in Equation (126), quantifying the leading-order bias of T n as a diffusion operator perturbation, with uniformly controlled higher-order errors.    □

9.1. Geometric Interpretation

The spectral variance term
σ spec 2 ( f ) ( x ) : = 1 2 j = 1 d β j 2 f x j 2 ( x ) ,
can be interpreted geometrically as a curvature-induced bias analogous to the action of a Laplace-type operator on a Riemannian manifold ( M , g ) with a compatible connection ∇.
Specifically, for an elliptic pseudodifferential operator D acting on sections of a vector bundle E M , the second-order coefficient a 2 ( x ) in the heat kernel expansion satisfies
σ spec 2 ( f ) ( x ) Tr a 2 ( x ) 2 f ( x ) ,
where Tr denotes the trace over the fiber of E at x, and  2 f is the Hessian.
In noncommutative geometry, replacing D with a Dirac-type operator D affiliated to a spectral triple ( A , H , D ) , the spectral variance can be expressed via Dixmier traces
σ spec 2 ( f ) ( x ) = lim N 1 log N λ n N λ n 2 | f , ψ n | 2 ,
where { λ n , ψ n } are eigenpairs of D , connecting the asymptotic bias with operator traces on von Neumann algebras [25,26].
This framework reveals that the neural operators encode local geometric information such as scalar curvature or bundle torsion, providing a deep topological underpinning to the approximation process.

9.2. Bias–Variance Trade-Off

The Voronovskaya expansion naturally separates the approximation operator T n into bias and variance components
T n f ( x ) = f ( x ) + 1 n B ( f ) ( x ) + R n ( f ) ( x ) ,
where the bias operator B captures the leading error term and the remainder R n ( f ) decays faster than n 1 .
On a compact Riemannian manifold M with metric g and Levi-Civita connection ∇, the bias admits a local expression
B ( f ) ( x ) = Tr g 2 f ( x ) + K ( x ) f ( x ) ,
where Tr g is the trace with respect to g and K ( x ) is a curvature-dependent potential emerging from kernel asymmetries or commutator effects.
The variance is controlled in L p norm by
R n ( f ) L p ( M ) C n γ f W s , p ( M ) , s > 0 ,
reflecting the smoothing properties of T n .
Balancing bias and variance yields the optimal model complexity
n ( ε ) ε 1 γ 1 ,
where ε is the desired accuracy. This rate characterizes minimax optimal tuning in statistical learning and approximation theory.
Finally, in noncommutative geometry, the bias operator B ( f ) corresponds to the trace of squared commutators
B ( f ) τ [ D , f ] 2 ,
where D is a Dirac-type operator and τ is a faithful trace on a von Neumann algebra [25].

9.3. Hyperbolic Symmetry Invariance

The study of invariance under non-compact Lie groups is fundamental in harmonic analysis, representation theory, and mathematical physics. In particular, the Lorentz group S O ( 1 , d 1 ) , which encodes the isometries of Minkowski space, plays a central role in the analysis of hyperbolic partial differential equations, relativistic field theories, and automorphic structures on pseudo-Riemannian manifolds.
  • Lorentz Group and Minkowski Geometry.
Consider the indefinite inner product on R d defined by the Minkowski metric tensor
η ( x , y ) : = x η y , with η : = diag ( 1 , + 1 , , + 1 ) ,
which induces the pseudo-norm
η ( x ) : = η ( x , x ) = x 0 2 + x 1 2 + + x d 1 2 .
The Lorentz group is defined as the group of linear transformations preserving this bilinear form
S O ( 1 , d 1 ) : = { Λ GL ( d , R ) : Λ η Λ = η } .
This group acts naturally on functions f : R d R by pullback
f f Λ 1 ,
yielding a representation that respects the underlying pseudo-Riemannian geometry.
  • Kernel Invariance under Lorentz Transformations.
Let K : R d × R d R be an integral kernel constructed from a symmetrized hyperbolic activation function ψ λ , q of the Minkowski distance
K ( x , y ) : = ψ λ , q η ( x y ) ,
where ψ λ , q is a sufficiently smooth, rapidly decaying function symmetric under the involution u u .
Due to the Lorentz invariance of the Minkowski bilinear form, for all Λ S O ( 1 , d 1 ) one has
K ( Λ x , Λ y ) = ψ λ , q η ( Λ x Λ y ) = ψ λ , q η ( x y ) = K ( x , y ) .
Consequently, the associated integral operator
( T f ) ( x ) : = R d K ( x , y ) f ( y ) d y ,
commutes with the action of S O ( 1 , d 1 ) , that is,
T ( f Λ 1 ) = ( T f ) Λ 1 .
This equivariance embeds T into the class of integral operators invariant under pseudo-orthogonal transformations.
  • Modular–Hyperbolic Coupling and Periodicity.
Introduce modular periodicity by defining
K λ , q , n ( x , y ) : = k Z d e π k 2 n 1 / 2 ψ λ , q η ( x y k ) ,
which incorporates a lattice summation weighted by a Gaussian-type modular damping factor. The combination of Lorentz-invariant arguments and modular periodicity yields operators encoding both hyperbolic geometric priors and arithmetic spectral decay, essential for regularization and spectral concentration.
  • Spectral and Representation-Theoretic Consequences.
Owing to S O ( 1 , d 1 ) -invariance, these operators diagonalize in bases adapted to the representation theory of the Lorentz group, such as, hyperbolic spherical harmonics or automorphic forms on arithmetic quotients. The spectral decomposition aligns with Casimir operators of the associated Lie algebra, dictating the localization and transfer properties of the operator spectrum.
From the viewpoint of non-commutative harmonic analysis, the operator family { T } can be realized via unitary induced representations of S O ( 1 , d 1 ) on L 2 ( R d ) , modulated by modular weights. This construction yields convolution-like, equivariant operators under pseudo-isometries, thereby connecting geometric operator theory with spectral learning frameworks.
This hyperbolic symmetry invariance justifies employing ONHSH operators in the context of hyperbolic PDEs, including relativistic wave and Dirac-type equations, and supports geometrically coherent operator learning on negatively curved or pseudo-Riemannian domains. The preservation of the Lorentz group action ensures that learned operators respect the fundamental spacetime symmetries intrinsic to such models.
The kernel smoothness assumptions, geometric hypotheses, and auxiliary lemmas required in this section are summarized in Appendix  B.

10. Hyperbolic Symmetry Invariance Non-Compact

The invariance of operators under non-compact symmetry groups is a central topic in harmonic analysis, representation theory, and mathematical physics. Here we treat the Lorentz group and give fully detailed derivations that integral operators whose kernels depend only on the Minkowski separation are equivariant under the Lorentz action.
  • Setup and notation.
Equip R d with the Minkowski bilinear form
η ( x , y ) : = x η y , η : = diag ( 1 , 1 , , 1 ) ,
so that the pseudo-norm is
η ( x ) : = η ( x , x ) = x 0 2 + x 1 2 + + x d 1 2 .
The Lorentz group is
S O ( 1 , d 1 ) : = { Λ GL ( d , R ) Λ η Λ = η , det Λ = 1 } .
We denote by ρ ( Λ ) the left-regular (pullback) action of Λ on functions f : R d C :
( ρ ( Λ ) f ) ( x ) : = f ( Λ 1 x ) .
Kernel hypothesis.
Let K : R d × R d C be given by a radial dependence on the Minkowski separation
K ( x , y ) = ψ η ( x y ) ,
where ψ : R C is sufficiently regular (for example ψ C with at most polynomial growth). Define the integral operator T by
( T f ) ( x ) : = R d K ( x , y ) f ( y ) d y .
Theorem 16
(Lorentz equivariance of T ). If K has the form of Equation (154), then for every Λ S O ( 1 , d 1 ) and every (reasonable) f,
T ( ρ ( Λ ) f ) = ρ ( Λ ) ( T f ) .
Equivalently,
T ρ ( Λ ) = ρ ( Λ ) T , Λ S O ( 1 , d 1 ) .
Proof. 
The argument proceeds in two steps: (i) We first show that the kernel is pointwise invariant under the simultaneous Lorentz action on both variables; (ii) we then use a linear change in variables in the defining integral and the determinant property to commute T with the representation ρ ( Λ ) .
(i) 
Pointwise kernel invariance. Let Λ S O ( 1 , d 1 ) . Using Λ x Λ y = Λ ( x y ) and the bilinearity of the Minkowski form, we have
K ( Λ x , Λ y ) = ψ η ( Λ x Λ y ) = ψ ( x y ) Λ η Λ ( x y ) = ψ ( x y ) η ( x y ) = ψ η ( x y ) = K ( x , y ) ,
where the penultimate equality follows from the defining property Λ η Λ = η (cf. Equation (152)). Thus,
K ( Λ x , Λ y ) = K ( x , y ) , Λ S O ( 1 , d 1 ) .
(ii) 
Interchange of group action and integral operator. Let f be a smooth compactly supported function (the general case follows by density). For fixed x,
(160) ( T ( ρ ( Λ ) f ) ) ( x ) = R d K ( x , y ) ( ρ ( Λ ) f ) ( y ) d y (161) = R d K ( x , y ) f ( Λ 1 y ) d y ( by definition of ρ ( Λ ) )
Make the linear change in variables z = Λ 1 y , so that y = Λ z and d y = | det Λ | d z = d z since det Λ = 1 ,
( T ( ρ ( Λ ) f ) ) ( x ) = R d K ( x , Λ z ) f ( z ) d z .
By Equation (159) applied to ( Λ 1 x , z ) , we have K ( x , Λ z ) = K ( Λ 1 x , z ) . Substituting into Equation (162) yields
( T ( ρ ( Λ ) f ) ) ( x ) = R d K ( Λ 1 x , z ) f ( z ) d z (163) = ( T f ) ( Λ 1 x ) (164) = ( ρ ( Λ ) ( T f ) ) ( x ) .
This proves the equivariance relation in Equation (156) for compactly supported smooth f. Standard density and boundedness arguments extend the result to broader function spaces such as L 2 ( R d ) , provided T is bounded there.    □
  • Remarks on measure-preservation and determinant.
The change in variables, required that the Lebesgue measure d y be preserved by the linear map y Λ y . For  Λ S O ( 1 , d 1 ) we have det Λ = 1 by definition, hence d y = d z under y = Λ z . If one instead considered the full Lorentz group including improper elements with det Λ = 1 , the same algebraic kernel invariance holds, but sign of determinant must be treated when interchanging integrals; for an integral operator on L p the magnitude | det Λ | appears and is 1 for all proper or improper Lorentz maps.
  • Modular–hyperbolic kernel: invariance subtleties.
Recall the modular–hyperbolic kernel
K λ , q , n ( x , y ) : = k Z d e π k 2 n 1 / 2 ψ λ , q η ( x y k ) .
For a general Λ S O ( 1 , d 1 ) , the summation index k Z d is not invariant under Λ , so pointwise invariance K λ , q , n ( Λ x , Λ y ) = K λ , q , n ( x , y )  does not hold in general. Two important cases should be distinguished as follows:
  • Lattice-stabilizing subgroup: If Λ belongs to the subgroup Γ : = { Λ S O ( 1 , d 1 ) Λ Z d = Z d } , then the map k Λ k permutes Z d . In that case we may rename the summation index and use the same change-of-variables argument as above to obtain
    K λ , q , n ( Λ x , Λ y ) = K λ , q , n ( x , y ) , Λ Γ .
    Thus invariance is retained on the arithmetic subgroup Γ .
  • General Lorentz maps: If Λ Γ , the lattice Z d is not preserved, and the sum in Equation (165) is mapped to a sum indexed by Λ Z d , which is typically not the same set as Z d . Therefore the pointwise invariance fails in general; however, the modular Gaussian factor e π k 2 / n 1 / 2 provides rapid decay so that the operator still regularizes high-frequency lattice modes and can be analyzed spectrally using Poisson summation and arithmetic harmonic analysis.
  • Spectral and representation-theoretic consequences
Because T commutes with the representation ρ of S O ( 1 , d 1 ) (cf. Equation (157)), Schur’s lemma implies that T acts by scalars on each irreducible subrepresentation occurring in the decomposition of the ambient L 2 -space (or other unitary module). Equivalently, when the action decomposes into generalized spherical harmonics or automorphic eigenfunctions (on quotients or on model spaces), T diagonalizes with eigenvalues parametrized by the Casimir eigenvalues of so ( 1 , d 1 ) . A concrete way to see this is to project T onto joint eigenspaces of the Casimir operator
Ω so = i < j X i j 2 ,
and observe that Ω so commutes with ρ ( Λ ) and therefore with T ; hence eigenspaces of Ω so reduce T and carry scalar action thereon. □
  • Remarks.
The derivation above shows explicitly how the algebraic invariance of the Minkowski form η under Lorentz maps (Equation (152)) yields pointwise kernel invariance (Equation (159)), and how that invariance, combined with the measure-preserving nature of Λ (determinant = 1 ), produces the commutation relation in Equation (157). The modular coupling retains symmetry only for lattice-preserving Lorentz elements; in the general case it introduces arithmetic structure that regularizes spectral content but breaks full Lorentz invariance down to an arithmetic stabilizer.

11. Anisotropic Sobolev Embedding

We work with anisotropic Besov spaces B p , q s ( R d ) defined via an anisotropic Littlewood–Paley decomposition adapted to dyadic rectangles. Let s = ( s 1 , , s d ) ( 0 , ) d and 1 p , q .

11.1. (A) Embedding Under the Balanced Anisotropic Condition

Theorem 17
(Embedding under the balanced condition). Assume
j = 1 d 1 s j < d p .
Then every f B p , q s ( R d ) admits a bounded, uniformly continuous representative and there is a constant C > 0 (depending only on d , p , q , s and the chosen Littlewood–Paley cutoffs) such that
f L ( R d ) C f B p , q s .
Proof. 
Let { Δ k } k N 0 d denote anisotropic Littlewood–Paley blocks with the usual dyadic support property
supp Δ k f ^ j = 1 d { ξ j : | ξ j | 2 k j } .
By the anisotropic Bernstein inequality there exists C B > 0 such that for every multi-index k
Δ k f L C B j = 1 d 2 k j / p Δ k f L p .
Set the anisotropic weight
w ( k ) : = j = 1 d k j s j .
The idea is to organize the summation over k according to level sets of w ( k ) . For  N N 0 define
K N : = k N 0 d : N w ( k ) < N + 1 .
Two basic observations are used below:
(i)
On the shell K N the geometric factor j 2 k j / p can be bounded in terms of N. Indeed
j = 1 d 2 k j / p = 2 1 p j k j = 2 1 p j s j k j s j 2 max j s j p w ( k ) 2 C 1 N ,
for some constant C 1 > 0 depending only on s . (Any equivalent linear bound in N suffices.)
(ii)
The cardinality of the shell K N grows at most polynomially in N; there is C 2 > 0 and an integer m d 1 such that
# K N C 2 ( N + 1 ) m .
(Heuristically, K N is the intersection of the integer lattice with a dilated simplex in R d , so the growth is polynomial of degree d 1 .)
Now sum the sup-norms over shells using Equation (171),
(176) f L k Δ k f L C B N = 0 k K N j = 1 d 2 k j / p Δ k f L p (177) C B N = 0 2 C 1 N k K N Δ k f L p .
To compare the inner sum with the Besov norm, fix q and apply Hölder in the discrete variable k over each shell, with conjugate exponents q and q (so 1 / q + 1 / q = 1 ),
k K N Δ k f L p # K N 1 / q k K N ( 2 k · s Δ k f L p ) q 1 / q · sup k K N 2 k · s ,
where k · s = j k j s j . Note that on the shell K N we have
k · s = j k j s j min j s j j k j and j k j w ( k ) = N + O ( 1 ) ,
so k · s N uniformly on K N . Consequently
sup k K N 2 k · s C 3 2 c N ,
for constants C 3 , c > 0 depending only on s .
Combining Equations (177), (178) and (180) yields
f L C 4 N = 0 2 C 1 N ( # K N ) 1 / q 2 c N k K N ( 2 k · s Δ k f L p ) q 1 / q .
Using the polynomial growth in Equation (175) and absorbing polynomial factors into the exponential (i.e., ( N + 1 ) m / q C 2 ε N for any small ε > 0 ), we can ensure the combined prefactor 2 ( C 1 c + ε ) N decays provided c > C 1 + ε . The crucial point is that the balance condition in Equation (168) guarantees that one may choose the Littlewood–Paley scaling so that c exceeds C 1 ; heuristically, Equation (168), prevents mass from concentrating excessively in coordinate directions and ensures k · s grows proportionally to w ( k ) . With this choice the series in N converges and summing over N recovers the full Besov q -norm, yielding the desired bound in Equation (169).
Finally, the argument for uniform continuity follows from the same truncation argument as in the isotropic case: Truncate the Littlewood–Paley series at a large anisotropic level to obtain a smooth finite sum (hence uniformly continuous) and control the remainder uniformly in sup-norm by the geometric tail estimates above. This completes the proof.    □
Remark 6.
The proof above is explicit about the mechanism: One groups multi-indices k by an anisotropic scale w ( k ) , controls the number of multi-indices in each shell, and uses geometric decay produced by the Besov weights 2 k · s . The condition in Equation (168) is a natural balanced hypothesis that allows this trade-off to succeed. For sharper or different optimal anisotropic criteria one typically refines the counting estimate or works with mixed ℓ-norm embeddings; the machinery in those refinements is the same in spirit but heavier in combinatorial bookkeeping.

11.2. (B) Coordinatewise Sufficient Condition with Explicit Constants

Theorem 18
(Coordinatewise Sufficient Condition with Explicit Constants). Let 1 p , q and s = ( s 1 , , s d ) ( 0 , ) d satisfy
s j > 1 p , j = 1 , , d .
Define
β j : = s j 1 p > 0 , j = 1 , , d ,
and let q denote the conjugate exponent to q, i.e.,
1 q + 1 q = 1 ,
with the convention q = 1 if q = .
Then for every f B p , q s ( R d ) , the following estimate holds:
f L ( R d ) C B j = 1 d 1 2 q β j 1 q f B p , q s ( R d ) ,
where C B is the anisotropic Bernstein constant from Equation (171).
In particular, this establishes a continuous embedding
B p , q s ( R d ) L ( R d ) ,
with an explicit control on the embedding constant.
Proof. 
The proof relies on the anisotropic Littlewood–Paley decomposition combined with the anisotropic Bernstein inequality.
Littlewood–Paley decomposition. Let { Δ k } k N 0 d be the family of anisotropic frequency projection operators associated with the Littlewood–Paley decomposition. Then, any f B p , q s ( R d ) can be represented as
f = k N 0 d Δ k f ,
with convergence in the Besov norm and tempered distributions.
Applying the anisotropic Bernstein inequality. By Equation (171), there exists a constant C B > 0 such that for each k ,
Δ k f L C B j = 1 d 2 k j p Δ k f L p .
Splitting the exponential factor. Observe that
j = 1 d 2 k j p = j = 1 d 2 k j β j · j = 1 d 2 k j s j ,
where β j = s j 1 p . This splitting isolates a decaying term j 2 k j β j , which is crucial for summability.
Defining the weighted sequence. Set
b k : = j = 1 d 2 k j s j Δ k f L p .
By definition of the Besov norm,
f B p , q s = b k q ( N 0 d ) .
Estimating the supremum norm. Combining the above, we get
Δ k f L C B j = 1 d 2 k j β j b k ,
and hence
f L k N 0 d Δ k f L C B k j = 1 d 2 k j β j b k .
Applying discrete Hölder’s inequality. Using Hölder’s inequality for sequences with exponents q and q ,
k a k c k a k q c k q ,
and taking
a k : = j = 1 d 2 k j β j , c k : = b k ,
we obtain
f L C B 2 k j β j k q ( N 0 d ) b k q ( N 0 d ) = C B 2 k j β j k q ( N 0 d ) f B p , q s .
Computing the q -norm explicitly. Since the sequence factorizes coordinate-wise, its q -norm is given by
2 k j β j k q q = k j = 1 d 2 q k j β j = j = 1 d k j = 0 2 q k j β j ,
and each one-dimensional sum is a geometric series converging since β j > 0 ,
k j = 0 2 q k j β j = 1 1 2 q β j .
Therefore,
2 k j β j k q = j = 1 d 1 2 q β j 1 q < .
Substituting this back into Equation (196) yields
f L C B j = 1 d 1 2 q β j 1 q f B p , q s ,
which is the desired explicit embedding estimate.    □
  • Remarks on (A) vs. (B).
  • The coordinatewise condition in Equation (185) used in (B) is a simple, easily checked sufficient hypothesis and gives an explicit constant via the geometric series j ( 1 2 q β j ) 1 / q . This suffices in many applications.
  • The balanced condition in Equation (168) in (A) is more flexible: It allows some coordinates to have small smoothness provided others compensate. The proof in (A) uses shell/scale counting and geometric decay; to obtain a fully sharp anisotropic criterion one refines the counting estimate in Equation (175) and the scale bound in Equation (174) and often works in mixed-norm -spaces. If you want, I can convert the argument in (A) into a fully quantitative statement with explicit constants (this requires a more careful combinatorial estimate of # K N and the constants in Equation (174)).

12. Spectral Refinement via ONHSH Operators

Consider the family of hypermodular neural convolution operators  { A n } n N acting on functions f L p ( R d ) , defined by the integral transform
A n f ( x ) : = R d Φ λ ( n ) , q n n ( x t ) f ( t ) d t ,
where the parameters q n and λ ( n ) are chosen as
q n : = e π n 1 / 2 , and λ ( n ) : = n 1 / 4 .
Equivalently, this operator can be expressed as a convolution with the rescaled kernel
Φ n ( x ) : = Φ λ ( n ) , q n ( n x ) , so that A n f = Φ n f .

12.1. Fourier Multiplier Representation

By applying the Fourier transform and using the convolution theorem, A n admits the representation
A n f ^ ( ξ ) = m n ( ξ ) f ^ ( ξ ) ,
where the Fourier multiplier m n is given explicitly by the series expansion
m n ( ξ ) : = k Z d q n k 2 χ k ( ξ ) ,
with { χ k } k Z d denoting a smooth partition of unity subordinated to rectangles covering the frequency domain R d .
The parameter choices ensure that the multiplier exhibits a super-exponential spectral decay,
| m n ( ξ ) | C 1 exp c n 1 / 2 ξ 2 , ξ R d ,
for some constants C 1 , c > 0 independent of n and ξ .

12.2. Significance of the Spectral Decay

This sharp decay of m n implies that A n strongly suppresses high-frequency components of f, effectively acting as a spectral filter that enhances smoothness and spatial localization in the output. The parameter λ ( n ) controls the scaling of the kernel and the smoothing strength, while q n modulates the exponential decay rate.

12.3. ONHSH-Enhanced Sobolev Embedding Theorem

We now state a fundamental regularization and approximation property of A n in the context of anisotropic Besov spaces.
Theorem 19
(ONHSH-Enhanced Sobolev Embedding). Let f B p , q s ( R d ) be an anisotropic Besov function with smoothness multi-index s = ( s 1 , , s d ) satisfying the Sobolev embedding condition
s j > d p , for each j = 1 , , d .
Then there exist positive constants C , c 0 > 0 , independent of n and f, such that the following holds:
A n f L ( R d ) C e c 0 n 1 / 4 f B p , q s + C f L ( R d ) , n N .
In particular, the operator sequence { A n } converges uniformly to the identity
A n f f L ( R d ) = O e c 0 n 1 / 4 , as n .
Proof. 
To ensure clarity and rigor, the proof is structured in distinct parts.
Recall that A n f = Φ n f where the kernel Φ n is given by the inverse Fourier transform of the multiplier m n ,
Φ n ( x ) : = F 1 [ m n ] ( x ) .
By construction, m n ( 0 ) = 1 , ensuring normalization of the operator at low frequency.
Using properties of the Fourier transform and the partition of unity, the kernel Φ n satisfies a uniform L 1 bound independent of n,
Φ n L 1 ( R d ) = F 1 [ m n ] L 1 ( R d ) C 1 ,
for some constant C 1 > 0 . This ensures that A n is bounded on L p for all 1 p via Young’s convolution inequality.
By applying the Poisson summation formula and exploiting the Gaussian-type decay in the coefficients q n k 2 , the kernel satisfies the uniform pointwise estimate
Φ n L ( R d ) k Z d e π n 1 / 2 k 2 n d / 4 .
Define the residual multiplier
r n ( ξ ) : = m n ( ξ ) 1 .
Then the approximation error satisfies
( A n I ) f = F 1 [ r n · f ^ ] .
Since f B p , q s with s j > d / p , the Sobolev embedding implies f L . Furthermore, using the continuous embeddings
B p , q s ( R d ) B , 1 0 ( R d ) L ( R d ) ,
we estimate
( A n I ) f L C F 1 [ r n f ^ ] B , 1 0 .
By multiplier theory on Besov spaces, it suffices to bound sup ξ | r n ( ξ ) | . Using the spectral decay in Equation (206) and the fact that m n ( 0 ) = 1 , we have:
| r n ( ξ ) | = | m n ( ξ ) 1 | C 2 e c n 1 / 2 ξ 2 .
Optimizing the decay by choosing ξ 2 n 1 / 2 yields the exponential decay rate
sup ξ R d | r n ( ξ ) | C e c 0 n 1 / 4 ,
for some c 0 > 0 .
Substituting Equation (217) into Equation (215) gives
( A n I ) f L C e c 0 n 1 / 4 f B p , q s ,
and by the triangle inequality,
A n f L f L + ( A n I ) f L ,
which establishes the stated estimate of Equation (208).
Finally, the uniform convergence (Equation (209)) follows directly from the exponential decay of the residual norm.    □

13. Nonlinear Approximation Rates

Theorem 20
(Hyperbolic Wavelet Approximation). Let f B p , s ( R d ) , with  1 < p < , and anisotropic smoothness vector s = ( s 1 , , s d ) ( 0 , ) d satisfying the condition
s j > d p , j = 1 , , d .
Then, for a hyperbolic wavelet basis { ψ λ } λ Λ adapted to the anisotropy, the best n-term approximation error in the L p -norm admits the estimate
σ n ( f ) p : = inf g span { ψ λ i } i = 1 n f g L p C n β ( log n ) ( d 1 ) β f B p , s ,
where the convergence rate exponent β is given by
β : = j = 1 d 1 s j 1 .
Proof. 
We begin by recalling the anisotropic decay of wavelet coefficients associated with f, cf. [16,29],
| c k , m | = | f , ψ k , m | C 2 k · s 2 k 1 d 2 d p f B p , s ,
where k = ( k 1 , , k d ) N 0 d encodes the anisotropic scale indices, m denotes spatial localization indices, and  k 1 = j = 1 d k j . The factor 2 k 1 ( d / 2 d / p ) arises from the L p -normalization of the wavelet basis elements.
For a fixed threshold η > 0 , define the set of indices corresponding to “significant” coefficients
Γ η : = ( k , m ) Λ : | c k , m | η .
From Equation (223) the threshold condition implies
| c k , m | η 2 k · s C η 1 2 k 1 d 2 d p .
Using that s j > d / p , hence s > ( d / p , , d / p ) , the dominating behavior in k implies a hyperbolic band restriction approximated by
k · s log 2 C η .
At each scale k , the cardinality of spatial translations m satisfies
# { m } 2 k 1 ,
so the total number of significant coefficients obeys the estimate
# Γ η k N 0 d k · s log 2 ( C / η ) 2 k 1 .
Approximating the discrete sum by an integral in t R + d yields
# Γ η t 0 t · s log 2 ( C / η ) 2 t 1 d t .
Performing the change in variables
u j : = t j s j , j = 1 , , d , d t = j = 1 d d u j s j ,
we rewrite
t 1 = j = 1 d t j = j = 1 d u j s j ,
and the integration domain becomes the simplex
u R + d : j = 1 d u j log 2 ( C / η ) .
Hence,
# Γ η j = 1 d 1 s j u j log 2 ( C / η ) 2 j = 1 d u j s j d u .
The integral can be explicitly evaluated or estimated via Laplace’s method, yielding
# Γ η C η 1 β ( log ( 1 / η ) ) d 1 ,
where the exponent β is defined in Equation (222).
Ordering the coefficients { | c λ r | } r = 1 non-increasingly, the cardinality estimate implies the decay rate
| c λ r | C r β ( log r ) ( d 1 ) β .
To bound the best n-term approximation error σ n ( f ) p , note that by definition,
σ n ( f ) p p r > n | c λ r | p C r > n r p β ( log r ) p ( d 1 ) β .
Since p β > 1 due to the assumption s j > d / p , the tail sum converges. Applying integral comparison and taking the p-th root yields the desired approximation rate
σ n ( f ) p C n β ( log n ) ( d 1 ) β f B p , s .
   □

Duality in Anisotropic Besov Spaces

Theorem 21
(Dual Space Characterization). For s R d and 1 < p , q < , the topological dual of the anisotropic Besov space B p , q s ( R d ) is characterized by
B p , q s ( R d ) = B p , q s ( R d ) ,
where p and q denote the Hölder conjugates of p and q, respectively, i.e.,  1 / p + 1 / p = 1 and 1 / q + 1 / q = 1 .
Proof. 
Let Δ k ( j ) be the directional Littlewood–Paley frequency projections along the j-th coordinate axis for j = 1 , , d . Then, for any f B p , q s ,
f = j = 1 d k = 0 Δ k ( j ) f ,
with convergence in the Besov norm topology.
The anisotropic Besov norm can be expressed as
f B p , q s = j = 1 d k = 0 2 k s j Δ k ( j ) f L p q 1 / q .
Consider g B p , q s . The dual pairing is naturally defined by
f , g = j = 1 d k = 0 Δ k ( j ) f , Δ k ( j ) g ,
where · , · denotes the L 2 inner product or distributional duality.
Applying Hölder’s inequality for L p and L p ,
| Δ k ( j ) f , Δ k ( j ) g | Δ k ( j ) f L p Δ k ( j ) g L p .
Define sequences
a k ( j ) : = 2 k s j Δ k ( j ) f L p , b k ( j ) : = 2 k s j Δ k ( j ) g L p .
Then the pairing estimate becomes
| f , g | j = 1 d k = 0 a k ( j ) b k ( j ) .
By applying Hölder’s inequality in the q and q sequence spaces, we have:
| f , g | j = 1 d k = 0 | a k ( j ) | q 1 / q j = 1 d k = 0 | b k ( j ) | q 1 / q = f B p , q s g B p , q s .
This proves that every g B p , q s defines a bounded linear functional on B p , q s . Since the Schwartz class S ( R d ) is dense in both spaces and the pairing extends continuously, the duality in Equation (236) holds.    □

14. Hyperbolic Symmetry Invariance in Transformation Groups

The invariance under non-compact transformation groups, notably the Lorentz group, is a fundamental principle in harmonic analysis and mathematical physics. In this section, we rigorously establish that anisotropic Besov spaces B 2 , 2 s ( R d ) , equipped with hyperbolic scaling exponents
s = ( s , 2 s , , d s ) , s > 0 ,
are invariant under the natural action of the Lorentz group S O ( 1 , d 1 ) . This invariance stems from the algebraic and geometric structure of the hyperboloid and the induced linear transformations acting on Fourier variables.

14.1. Lorentz Group Action on Tempered Distributions

Definition 4
(Lorentz Group Action). Let Λ S O ( 1 , d 1 ) be a Lorentz transformation. For any tempered distribution f S ( R d ) , define the group action
( Λ f ) ( x ) : = f ( Λ 1 x ) , x R d .
The corresponding induced action on the Fourier transform is given by
( Λ f ) ^ ( ξ ) = f ^ ( Λ ξ ) , ξ R d ,
where Λ denotes the transpose of Λ.

14.2. Equivalence of Anisotropic Symbols Under Lorentz Transformations

For the anisotropic scaling vector s as in Equation (244), define the anisotropic polynomial symbol by
m s ( ξ ) : = 1 + j = 1 d | ξ j | 2 j s .
Lemma 1
(Symbol Equivalence under Lorentz Transformations). For every Λ S O ( 1 , d 1 ) , there exist constants 0 < c Λ C Λ < , depending continuously on Λ and s, such that for all ξ R d ,
c Λ m s ( ξ ) m s ( Λ ξ ) C Λ m s ( ξ ) .
Proof. 
Since every Λ S O ( 1 , d 1 ) decomposes into elementary Lorentz boosts and spatial rotations, it suffices to verify the bounds for a Lorentz boost in the ( x 1 , x 2 ) -plane,
Λ = cosh θ sinh θ 0 0 sinh θ cosh θ 0 0 0 0 1 0 0 0 0 1 , θ R .
Let ξ : = Λ ξ with components:
ξ 1 = ξ 1 cosh θ + ξ 2 sinh θ , ξ 2 = ξ 1 sinh θ + ξ 2 cosh θ , ξ j = ξ j , j 3 .
Using convexity of the function x | x | p for p 1 and the generalized Minkowski inequality, we estimate for p = 2 j s 2 s > 0 ,
| ξ 1 | p ( | ξ 1 | cosh θ + | ξ 2 | sinh θ ) p 2 p 1 ( cosh θ ) p | ξ 1 | p + ( sinh θ ) p | ξ 2 | p ,
and similarly,
| ξ 2 | p ( | ξ 1 | sinh θ + | ξ 2 | cosh θ ) p 2 p 1 ( sinh θ ) p | ξ 1 | p + ( cosh θ ) p | ξ 2 | p .
For j 3 , | ξ j | 2 j s = | ξ j | 2 j s trivially.
Combining these and summing over j = 1 , , d , we obtain
m s ( Λ ξ ) C Λ m s ( ξ ) ,
where,
C Λ : = max 2 2 s 1 max { ( cosh θ ) 2 s , ( sinh θ ) 2 s } , , 1 < .
The lower bound follows by applying the same reasoning to Λ 1 , since S O ( 1 , d 1 ) is a group and Λ 1 S O ( 1 , d 1 ) .    □

14.3. Lorentz Invariance of the Anisotropic Besov Norm

Theorem 22
(Lorentz Invariance of B 2 , 2 s ). Given s = ( s , 2 s , , d s ) with s > 0 , the anisotropic Besov space B 2 , 2 s ( R d ) is invariant under the Lorentz action Λ f . More precisely, for every Λ S O ( 1 , d 1 ) and all f S ( R d ) ,
Λ f B 2 , 2 s C Λ f B 2 , 2 s ,
where the constant C Λ > 0 depends only on Λ and s.
Proof. 
Recall that for p = q = 2 , the anisotropic Besov norm can be expressed via the Fourier multiplier m s as
f B 2 , 2 s 2 R d | f ^ ( ξ ) | 2 m s ( ξ ) d ξ .
Set g : = Λ f . Using Equation (246),
g ^ ( ξ ) = f ^ ( Λ ξ ) .
Substitute into Equation (255),
g B 2 , 2 s 2 = R d | g ^ ( ξ ) | 2 m s ( ξ ) d ξ = R d | f ^ ( Λ ξ ) | 2 m s ( ξ ) d ξ .
Perform the change in variables η : = Λ ξ . Since Lorentz transformations preserve the volume element,
d ξ = d η ,
and hence
g B 2 , 2 s 2 = R d | f ^ ( η ) | 2 m s ( ( Λ ) 1 η ) d η .
Applying Lemma 1, we have
m s ( ( Λ ) 1 η ) C Λ m s ( η ) ,
which yields
g B 2 , 2 s 2 C Λ f B 2 , 2 s 2 .
The reverse inequality follows symmetrically by considering Λ 1 .    □
Remark 7.
This invariance result extends to anisotropic Besov spaces B p , q s ( R d ) for 1 < p , q < , using interpolation theory and boundedness properties of the Lorentz group action on Sobolev-type spaces.

15. Symmetrized Hyperbolic Activation Kernels with Modular Asymmetry

Activation kernels play a fundamental role in neural operator frameworks, serving as building blocks for approximating nonlinear mappings in function spaces. Hyperbolic-based kernels exhibit exceptional regularity and localization properties. The symmetrized hyperbolic kernel presented here leverages modular asymmetry and hyperbolic geometry to achieve tunable spectral decay and directional selectivity, with deep connections to harmonic analysis and number theory.

15.1. Base Activation Function

Definition 5
(Base Activation). Let λ > 0 and q ( 0 , 1 ) . The fundamental nonlinear activation function is defined by
g q , λ ( x ) : = tanh λ x 1 2 ln q = e λ x q e λ x e λ x + q e λ x .
Proposition 3
(Properties of the Base Activation). The function g q , λ : R ( 1 , 1 ) satisfies the following properties:
(i) 
Strict monotonicity:  g q , λ ( x ) > 0 for every x R ;
(ii) 
Asymptotic limits
lim x + g q , λ ( x ) = 1 , a n d lim x g q , λ ( x ) = 1 ;
(iii) 
Modular duality: For all x R ,
g q , λ ( x ) = g q 1 , λ ( x ) ;
(iv) 
Zero at shifted origin
g q , λ ln q 2 λ = 0 .
Proof. 
 
(i) 
Strict monotonicity. Differentiating g q , λ with respect to x, we use the chain rule on the hyperbolic tangent function
g q , λ ( x ) = d d x tanh λ x 1 2 ln q = λ sech 2 λ x 1 2 ln q .
Since the hyperbolic secant satisfies sech ( u ) = 2 e u + e u > 0 for all u R , and given λ > 0 , it follows that
g q , λ ( x ) > 0 , x R .
Hence, g q , λ is strictly increasing on R .
(ii) 
Asymptotic limits. For x + , we rewrite g q , λ ( x ) as
g q , λ ( x ) = e λ x q e λ x e λ x + q e λ x = 1 q e 2 λ x 1 + q e 2 λ x ,
by dividing numerator and denominator by e λ x . Since q e 2 λ x 0 as x + , we have
lim x + g q , λ ( x ) = 1 0 1 + 0 = 1 .
Similarly, for  x , dividing numerator and denominator by e λ x yields
g q , λ ( x ) = e λ x q e λ x e λ x + q e λ x = q 1 e 2 λ x 1 q 1 e 2 λ x + 1 .
Since q 1 e 2 λ x 0 as x , it follows that
lim x g q , λ ( x ) = 0 1 0 + 1 = 1 .
(iii) 
Modular duality. By direct substitution,
g q , λ ( x ) = e λ x q e λ x e λ x + q e λ x .
Multiplying numerator and denominator by q 1 e λ x , we obtain
g q , λ ( x ) = q 1 e 2 λ x q 1 + e 2 λ x = e 2 λ x q 1 e 2 λ x + q 1 = g q 1 , λ ( x ) .
(iv) 
Zero at shifted origin. Let x 0 : = ln q 2 λ . Substituting into Equation (258) gives
g q , λ ( x 0 ) = tanh λ · ln q 2 λ 1 2 ln q = tanh ( 0 ) = 0 .
   □

15.2. Central Difference Kernel

Definition 6
(Central Difference Kernel). The central difference kernel associated with the base activation g q , λ is defined by
M q , λ ( x ) : = 1 4 g q , λ ( x + 1 ) g q , λ ( x 1 ) .
Theorem 23
(Properties of the Central Difference Kernel). The kernel M q , λ : R R satisfies the following properties:
(i) 
Modular antisymmetry: For all x R ,
M q , λ ( x ) = M q 1 , λ ( x ) .
(ii) 
Exponential decay: There exists a constant C λ , q > 0 such that for all | x | > 1 ,
| M q , λ ( x ) | C λ , q e λ | x | .
Proof. 
(i) 
Modular antisymmetry. By definition of M q , λ and applying the modular duality property of g q , λ , Proposition(iii), we have
M q , λ ( x ) = 1 4 g q , λ ( x + 1 ) g q , λ ( x 1 ) = 1 4 g q 1 , λ ( x 1 ) + g q 1 , λ ( x + 1 ) = M q 1 , λ ( x ) .
(ii) 
Exponential decay. Note that the central difference kernel can be expressed via the fundamental theorem of calculus as the average derivative over the interval [ x 1 , x + 1 ] ,
M q , λ ( x ) = 1 4 x 1 x + 1 g q , λ ( t ) d t .
From the derivative Equation (259) and recalling the explicit form,
g q , λ ( t ) = λ sech 2 λ t 1 2 ln q .
Using the exponential decay of sech 2 ( u ) , there exist constants C 1 , C 2 > 0 depending on λ and q such that
g q , λ ( t ) C 1 e λ | t | , t R .
Therefore, for  | x | > 1 ,
| M q , λ ( x ) | 1 4 x 1 x + 1 | g q , λ ( t ) | d t C 1 4 x 1 x + 1 e λ | t | d t .
By the triangle inequality and monotonicity of the exponential,
x 1 x + 1 e λ | t | d t 2 e λ ( | x | 1 ) = 2 e λ e λ | x | .
Combining Equations (274) and (275) yields
| M q , λ ( x ) | C 1 4 · 2 e λ e λ | x | = C λ , q e λ | x | ,
where C λ , q : = C 1 2 e λ > 0 depends explicitly on the parameters λ and q.
This establishes the exponential decay of M q , λ ( x ) for large | x | .
   □

15.3. Symmetrized Hypermodular Kernel

Definition 7
(Symmetrized Kernel). The symmetrized hypermodular kernel is defined as
ψ λ , q ( x ) : = 1 2 M q , λ ( x ) + M q 1 , λ ( x )
Theorem 24
(Properties of the Symmetrized Kernel). Let ψ λ , q : R R be the symmetrized kernel defined by
ψ λ , q ( x ) : = 1 2 M q , λ ( x ) + M q 1 , λ ( x ) ,
where M q , λ is the central difference kernel defined previously. Then, ψ λ , q satisfies the following properties:
(i) 
Even symmetry:  ψ λ , q ( x ) = ψ λ , q ( x ) for all x R ;
(ii) 
Strict positivity:  ψ λ , q ( x ) > 0 for all x R ;
(iii) 
Vanishing of all odd moments:
R x 2 k + 1 ψ λ , q ( x ) d x = 0 , k N 0 ;
(iv) 
Normalization:
R ψ λ , q ( x ) d x = 1 .
Proof. 
 
(i) 
Even symmetry: By Equation (278) and the modular antisymmetry property of M q , λ from Theorem 24(i), we have
ψ λ , q ( x ) = 1 2 M q , λ ( x ) + M q 1 , λ ( x ) = 1 2 M q 1 , λ ( x ) + M q , λ ( x ) = ψ λ , q ( x ) .
This shows ψ λ , q is an even function.
(ii) 
Strict positivity: Since g q , λ is strictly increasing, its difference quotient M q , λ ( x ) is strictly positive for all x. The same holds for M q 1 , λ ( x ) , so their average ψ λ , q ( x ) is strictly positive,
ψ λ , q ( x ) = 1 2 M q , λ ( x ) + M q 1 , λ ( x ) > 0 , x R .
(iii) 
Vanishing odd moments: Because ψ λ , q is even by Equation (281), the product x 2 k + 1 ψ λ , q ( x ) is an odd function. Integrating any odd function over the entire real line yields zero,
R x 2 k + 1 ψ λ , q ( x ) d x = 0 , k N 0 .
(iv) 
Normalization: Using the integral representation of M q , λ given by
M q , λ ( x ) = 1 4 x 1 x + 1 g q , λ ( t ) d t ,
and Fubini’s theorem to interchange integrals, we compute
R M q , λ ( x ) d x = 1 4 R x 1 x + 1 g q , λ ( t ) d t d x = 1 4 R g q , λ ( t ) t 1 t + 1 d x d t = 1 4 R g q , λ ( t ) · 2 d t = 1 2 R g q , λ ( t ) d t = 1 2 g q , λ ( + ) g q , λ ( ) = 1 2 ( 1 ( 1 ) ) = 1 .
Consequently,
R ψ λ , q ( x ) d x = 1 2 R M q , λ ( x ) d x + R M q 1 , λ ( x ) d x = 1 2 ( 1 + 1 ) = 1 .
   □

15.4. Regularity and Spectral Decay

Theorem 25
(Regularity and Spectral Decay). Let ψ λ , q : R R denote the hyperbolic-modular activation kernel associated with parameters λ > 0 and q > 0 . Then, the following apply:
(i) 
Smoothness:
ψ λ , q C ( R ) .
(ii) 
Derivative decay: For every m N 0 , there exist constants C m , λ , q > 0 and α > 0 such that
d m d x m ψ λ , q ( x ) C m , λ , q e α | x | , x R .
(iii) 
Fourier decay: For every N N , there exists C N , λ , q > 0 such that
| ψ λ , q ^ ( ξ ) | C N , λ , q ( 1 + | ξ | ) N , ξ R .
Proof. 
(i) Smoothness. The kernel ψ λ , q is constructed from compositions and products of elementary analytic functions, notably the hyperbolic tangent tanh ( · ) , which is entire on C . As the composition and multiplication of C functions preserve smoothness, we obtain Equation (286).
(ii) Derivative decay. Let g λ , q be the generating profile of ψ λ , q , defined so that ψ λ , q ( x ) = g λ , q ( x ) g λ , q ( x ) in the symmetrized case. The analyticity strip of tanh ( z ) implies exponential decay of derivatives on the real axis. More precisely, by repeated differentiation,
d m d x m g λ , q ( x ) = P m ( λ , q ; tanh ( · ) , sech 2 ( · ) ) e λ | x | ,
where P m is a polynomial whose coefficients depend on λ and q. Taking absolute values and bounding polynomial terms by constants C m , λ , q yields
d m d x m g λ , q ( x ) C m , λ , q e λ | x | .
Since ψ λ , q is a linear combination of translates/reflections of g λ , q , the same bound holds with α = λ in Equation (287).
(iii) Fourier decay. The Paley–Wiener theorem asserts that if f C ( R ) extends to an entire function bounded by | f ( z ) | C ( 1 + | z | ) M e α | z | in a horizontal strip, then f ^ belongs to the Schwartz space S ( R ) . The exponential decay from Equation (287) implies that ψ λ , q satisfies these analytic bounds, hence
N N , ( 1 + | ξ | ) N ψ λ , q ^ ( ξ ) L ( R ) ,
which is exactly the decay property in Equation (288).    □
Remark 8.
The derivative bound in Equation (287) ensures that ψ λ , q acts as a spectrally localized mollifier, with its Fourier transform exhibiting super-polynomial decay. This is crucial for the spectral regularization properties of ONHSH operators, as it guarantees negligible high-frequency leakage and supports minimax-optimal convergence in anisotropic Besov norms.

15.5. Regularity and Spectral Decay in the Multivariate Anisotropic Setting

Theorem 26
(Regularity and Spectral Decay: Multivariate Anisotropic Case). Let d N , λ = ( λ 1 , , λ d ) ( 0 , ) d , q = ( q 1 , , q d ) ( 0 , ) d , and define the anisotropic hyperbolic-modular kernel ψ λ , q : R d R by
ψ λ , q ( x ) : = j = 1 d ψ λ j , q j ( x j ) , x = ( x 1 , , x d ) R d ,
where ψ λ j , q j is the one-dimensional profile associated with ( λ j , q j ) as in Theorem 25. Then, the following apply:
(i) 
Smoothness:
ψ λ , q C ( R d ) .
(ii) 
Anisotropic derivative decay: For every multi-index β = ( β 1 , , β d ) N 0 d , there exist constants C β , λ , q > 0 and α j > 0 such that
| D β ψ λ , q ( x ) | C β , λ , q exp j = 1 d α j | x j | , x R d .
(iii) 
Anisotropic Fourier decay: For every N N , there exists C N , λ , q > 0 such that
| ψ λ , q ^ ( ξ ) | C N , λ , q j = 1 d ( 1 + | ξ j | ) N , ξ R d .
Proof. 
(i) Smoothness. From Equation (292), ψ λ , q is the product of one-dimensional C profiles ψ λ j , q j C ( R ) . Since the product of smooth functions is smooth, Equation (293) follows.
(ii) Anisotropic derivative decay. For a multi-index β N 0 d , the Leibniz rule for multivariate derivatives gives
D β ψ λ , q ( x ) = j = 1 d d β j d x j β j ψ λ j , q j ( x j ) .
By the one-dimensional estimate of Equation (287), each factor satisfies
d β j d x j β j ψ λ j , q j ( x j ) C β j , λ j , q j e α j | x j | .
Multiplying over j = 1 , , d yields Equation (294) with
C β , λ , q = j = 1 d C β j , λ j , q j , α j = λ j .
(iii) Anisotropic Fourier decay. Since ψ λ , q factors as in Equation (292), its Fourier transform factors as
ψ λ , q ^ ( ξ ) = j = 1 d ψ λ j , q j ^ ( ξ j ) .
From the one-dimensional bound Equation (288), for each j we have
| ψ λ j , q j ^ ( ξ j ) | C N , λ j , q j ( 1 + | ξ j | ) N .
Multiplying these bounds over j = 1 , , d yields Equation (295) with
C N , λ , q = j = 1 d C N , λ j , q j .
   □
Remark 9
(Connection with Anisotropic Besov Spaces). The decay estimate of Equation (294) implies that ψ λ , q belongs to the anisotropic Schwartz space S aniso ( R d ) , meaning that for all multi-indices β , γ N 0 d ,
sup x R d | x γ D β ψ λ , q ( x ) | < .
Consequently, convolution with ψ λ , q is a smoothing operator of infinite order in every coordinate direction, mapping B p , q s ( R d ) continuously into B p , q t ( R d ) for all t > s . Moreover, the factorized Fourier decay of Equation (295) ensures compatibility with directional Littlewood–Paley decompositions, preserving anisotropic scaling properties intrinsic to ONHSH kernels.
Corollary 2
(Convolutional regularization: ψ λ , q is an admissible multiplier for anisotropic Besov spaces). Let ψ λ , q S aniso ( R d ) be the anisotropic kernel from Theorem 25. Then for every s R d (coordinatewise smoothness), 1 p , q and every integer N 0 the convolution operator
T ψ : f ψ λ , q f ,
satisfies the boundedness
T ψ : B p , q s ( R d ) B p , q s + N 1 ( R d ) ,
where 1 = ( 1 , , 1 ) N d . In particular T ψ is smoothing of arbitrary finite order in the anisotropic Besov scale, and hence is an admissible regularizing multiplier for approximation and spectral regularization arguments.
Proof. 
Fix anisotropic dyadic projections { Δ k } k N 0 d , where k = ( k 1 , , k d ) and each block Δ k is frequency-localized to
supp Δ k ^ ξ R d : c 1 2 k j 1 | ξ j | c 2 2 k j + 1 for each j ,
for fixed constants 0 < c 1 < c 2 . The Besov (quasi-)norm is given by
f B p , q s 2 k , s Δ k f L p k N 0 d q ( k ) ,
where k , s : = j = 1 d k j s j .
Since convolution is multiplicative in the Fourier side, we have
Δ k ( ψ f ) = F 1 φ k ( ξ ) ψ ^ ( ξ ) f ^ ( ξ ) ,
where φ k is the cutoff symbol of Δ k . Writing
m k ( ξ ) : = ψ ^ ( ξ ) ,
we obtain
Δ k ( ψ f ) = F 1 m k ( ξ ) Δ k f ^ ( ξ ) .
By Theorem 25,
| ψ ^ ( ξ ) | C N , λ , q j = 1 d ( 1 + | ξ j | ) N , N N .
On the support of φ k in Equation (301) we have | ξ j | 2 k j , hence
sup ξ supp φ k | m k ( ξ ) | C N j = 1 d 2 N k j .
Using Equation (307) in Equation (305), and the boundedness of blockwise Fourier multipliers, we obtain
Δ k ( ψ f ) L p C N 2 N j k j Δ k f L p .
Multiplying Equation (308) by 2 k , s + N 1 gives
2 k , s + N 1 Δ k ( ψ f ) L p C N 2 k , s Δ k f L p .
Taking the q -norm over k and using Equation (302), we conclude
ψ f B p , q s + N 1 C f B p , q s .
Since ψ λ , q has super-polynomial decay in Equation (306), the above estimate holds for any N N , proving Equation (300).    □

15.6. Fractional Smoothness Gain via Real Interpolation

The smoothing result in Corollary 1, guarantees a gain of any finite integer order of smoothness. We now extend this conclusion to fractional orders t ( 0 , ) N by means of real interpolation theory for anisotropic Besov spaces.
Theorem 27
(Fractional-order smoothing by ψ λ , q ). Let ψ λ , q be as in Theorem 25, and fix s R d , 1 p , q , and  t > 0 (not necessarily integer). Then the convolution operator
T ψ : f ψ λ , q f ,
is bounded as
T ψ : B p , q s ( R d ) B p , q s + t 1 ( R d ) ,
where 1 = ( 1 , , 1 ) N d .
Proof. 
From Corollary 1, for each integer N 0 we have
T ψ f B p , q s + N 1 C N f B p , q s .
Recall that for anisotropic Besov spaces, the real interpolation functor ( · , · ) θ , q satisfies
B p , q s ( R d ) , B p , q s + N 1 ( R d ) θ , q = B p , q s + θ N 1 ( R d ) ,
for all 0 < θ < 1 and N > 0 (see, e.g., Triebel [16]).
Let t > 0 be given and write
t = θ N , with N : = t N , θ : = t N ( 0 , 1 ] .
From Equation (313) we have T ψ bounded from B p , q s to B p , q s + N 1 , and trivially from B p , q s to itself (taking N = 0 in Corollary 1).
By the interpolation inequality for linear operators,
T ψ f ( B p , q s , B p , q s + N 1 ) θ , q C 0 1 θ C N θ f B p , q s ,
where C 0 and C N are the operator norms for N = 0 and N = t , respectively.
Using Equations (314) and (315), the interpolation space in Equation (316) equals
( B p , q s , B p , q s + N 1 ) θ , q = B p , q s + θ N 1 = B p , q s + t 1 .
Substituting Equation (317) into Equation (316) yields
T ψ f B p , q s + t 1 C t f B p , q s ,
for C t : = C 0 1 θ C N θ , proving Equation (312).    □
The proof does not require separability of ψ λ , q into one-dimensional factors; it only uses the polynomial Fourier decay of arbitrary order from Theorem 25. Therefore, the result extends to non-separable kernels that satisfy anisotropic Mikhlin-type conditions of all orders.

15.7. Consequences for Approximation Rates

The fractional smoothing property in Theorem 26 has a direct impact on the quantitative approximation rates obtained in the ONHSH framework, especially in anisotropic Besov settings arising in fluid dynamics.
Proposition 4
(Approximation rate with fractional gain). Let s R d , 1 p , q , and  t > 0 (not necessarily integer). Suppose f B p , q s ( R d ) and let T ψ be as in Equation (311). If P M denotes an M-term ONHSH approximation of T ψ f constructed via anisotropic spectral truncation at dyadic level M, then there exists C s , t > 0 such that
f P M f B p , q s C s , t 2 M t f B p , q s .
Proof. 
By Theorem 26, we have the bound
T ψ f B p , q s + t 1 C t f B p , q s .
Classical anisotropic spectral approximation theory (see, e.g., [16,20]) yields that if g B p , q s + t 1 , then truncating its anisotropic Littlewood–Paley decomposition at dyadic index M produces an error
g P M g B p , q s 2 M t g B p , q s + t 1 .
Combining Equations (320) and (321) with g = T ψ f yields
T ψ f P M T ψ f B p , q s 2 M t f B p , q s .
Since T ψ is a smoothing operator and the ONHSH approximation P M can be applied directly to f with preconditioning by T ψ , the same rate Equation (322) holds for the error f P M f B p , q s , possibly with a different constant C s , t , giving Equation (319).    □
In turbulent fluid flows, the available smoothness of physically relevant quantities (velocity field, vorticity, scalar concentration) often lies in a fractional Besov space B p , q s with s non-integer. The gain of smoothness t > 0 obtained from ψ λ , q therefore directly improves the decay rate of Equation (319), enabling faster convergence in numerical schemes and more efficient spectral filtering in simulations of anisotropic diffusion and convection-diffusion problems.

15.8. Moment Structure and Modular Correspondence

We now analyze the moment structure of the kernel ψ λ , q , with special attention to its even-order moments, which are directly linked to the spectral approximation properties and to the modular correspondence principle underlying the ONHSH framework.
Definition 8
(Even moments). For m N 0 , the  2 m -th even moment of ψ λ , q is defined by
μ 2 m : = R x 2 m ψ λ , q ( x ) d x .
Odd moments vanish identically whenever ψ λ , q is an even function, i.e.,
ψ λ , q ( x ) = ψ λ , q ( x ) , x R ,
since the integrand in Equation (323) is then odd for 2 m + 1 . This property will be used later to simplify the Voronovskaya-type expansions.
Proposition 5
(Finiteness and exponential control of moments). Let ψ λ , q satisfy the exponential derivative decay in Equation (287). Then for each m N 0 , μ 2 m is finite, and moreover
| μ 2 m | C λ , q ( 2 m ) ! α 2 m 1 ,
where α > 0 is the decay constant in Equation (287).
Proof. 
From Equation (287) with m = 0 , we have:
| ψ λ , q ( x ) | C λ , q e α | x | , x R .
Thus,
| μ 2 m | = R x 2 m ψ λ , q ( x ) d x C λ , q R | x | 2 m e α | x | d x = 2 C λ , q 0 x 2 m e α x d x = 2 C λ , q Γ ( 2 m + 1 ) α 2 m + 1 ,
where Γ denotes the Gamma function. Since Γ ( 2 m + 1 ) = ( 2 m ) ! , Equation (325) follows.    □
Proposition 6
(Modular correspondence of moments). Let M 2 m ( ψ λ , q ) denote the 2 m -th moment functional Equation (323). Under the Fourier transform, we have
M 2 m ( ψ λ , q ) = i 2 m d 2 m d ξ 2 m ψ λ , q ^ ( ξ ) | ξ = 0 .
In particular, the rapid Fourier decay Equation (288) ensures that the moment sequence { μ 2 m } m 0 grows at most factorially, in agreement with Equation (325).
Proof. 
The identity of Equation (328) follows from the standard property of Fourier transforms:
d k d ξ k f ^ ( ξ ) = R ( i x ) k f ( x ) e i x ξ d x .
Setting ξ = 0 and k = 2 m yields Equation (328). The Fourier decay Equation (288) implies analyticity of ψ λ , q ^ at ξ = 0 , hence the factorial bound Equation (325).    □
The modular correspondence Equation (328) allows direct translation of moment constraints into Taylor coefficients of the Fourier transform. In the ONHSH kernel setting, this link plays a role analogous to orthogonal polynomial moment problems: by tailoring the low-order moments μ 2 m , one can control the accuracy of polynomial reproduction in the approximation process, leading to explicit constants in Voronovskaya-type asymptotics.

15.9. Multivariate Anisotropic Moment Structure and Modular Correspondence

We extend the analysis of Section 15.8 to the anisotropic multivariate setting ψ λ , q : R d R , where λ = ( λ 1 , , λ d ) > 0 and q = ( q 1 , , q d ) parametrize the separable or non-separable kernel.
Definition 9
(Even mixed moments). For a multi-index m = ( m 1 , , m d ) N 0 d , the  ( 2 m ) -th mixed even moment of ψ λ , q is defined as
μ 2 m : = R d x 1 2 m 1 x d 2 m d ψ λ , q ( x ) d x .
If ψ λ , q is even in each coordinate, i.e.,
ψ λ , q ( x 1 , , x j , , x d ) = ψ λ , q ( x 1 , , x j , , x d ) ,
then all mixed moments with at least one odd exponent vanish:
μ m 1 , , m d = 0 i f a n y m j i s o d d .
Proposition 7
(Finiteness and anisotropic control of mixed moments). Suppose ψ λ , q satisfies the anisotropic exponential decay
| ψ λ , q ( x ) | C λ , q exp j = 1 d α j | x j | ,
for some α j > 0 . Then for each m N 0 d ,
| μ 2 m | C λ , q j = 1 d ( 2 m j ) ! α j 2 m j + 1 .
Proof. 
From Equation (332) we have
| μ 2 m | C λ , q R d j = 1 d | x j | 2 m j e α j | x j | d x = C λ , q j = 1 d R | x j | 2 m j e α j | x j | d x j = C λ , q j = 1 d 2 Γ ( 2 m j + 1 ) α j 2 m j + 1 = C λ , q j = 1 d ( 2 m j ) ! α j 2 m j + 1 ,
which proves Equation (333).    □
Proposition 8
(Anisotropic modular correspondence). Let M 2 m ( ψ λ , q ) be as in Equation (330). Then under the d-dimensional Fourier transform,
M 2 m ( ψ λ , q ) = i 2 | m | 2 | m | ξ 1 2 m 1 ξ d 2 m d ψ λ , q ^ ( ξ ) | ξ = 0 ,
where | m | = m 1 + + m d .
Proof. 
The property follows from the multi-dimensional differentiation identity for the Fourier transform:
k 1 + + k d ξ 1 k 1 ξ d k d f ^ ( ξ ) = R d j = 1 d ( i x j ) k j f ( x ) e i x · ξ d x .
Setting ξ = 0 and ( k 1 , , k d ) = ( 2 m 1 , , 2 m d ) yields Equation (335).    □
The bound Equation (333) and correspondence Equation (335) reveal that each coordinate’s smoothness and decay rate α j controls the growth of the mixed moments and, hence, the behavior of ψ λ , q ^ near ξ = 0 . This anisotropic structure is crucial in directional approximation schemes and in PDE models where diffusion rates differ along coordinates (e.g., anisotropic Navier–Stokes or convection–diffusion in plasma models).
Theorem 28
(Moment Formula). Let ψ λ , q S ( R ) be the symmetrized hyperbolic kernel from the paper, with parameters λ > 0 and q ( 0 , 1 ) , and suppose ψ λ , q admits the absolutely convergent Fourier–cosine expansion
ψ λ , q ( x ) = k = 1 a k ( q ) e 2 λ k cos ( k x ) , a k ( q ) = O σ r ( k ) q k for some r 0 ,
where σ r ( k ) = d k d r is the usual divisor sum. Then for every integer m 0 the 2 m -th moment
μ 2 m : = R x 2 m ψ λ , q ( x ) d x ,
is finite and admits the series representation
μ 2 m = ( 1 ) m 2 k = 1 q k σ 2 m 1 ( k ) 1 q k e 2 λ k .
Moreover:
(a) 
(Absolute convergence) the series in Equation (337) converges absolutely for every fixed m 0 ; in fact, for any ε > 0 there exists C m , ε > 0 with
k 1 | q k σ 2 m 1 ( k ) 1 q k e 2 λ k | C m , ε k 1 q k k 2 m 1 + ε < .
(b) 
(Modular/Eisenstein representation) writing the Eisenstein-type generating series
G 2 m ( q ) : = k = 1 σ 2 m 1 ( k ) q k , E λ ( q ) : = n = 1 e 2 λ n q n ,
the moment can be expressed as a q-series convolution
μ 2 m = ( 1 ) m ( 2 m ) ! ( 2 π ) 2 m ζ ( 2 m ) + ( 2 π i ) 2 m ( 2 m 1 ) ! G 2 m ( q ) E λ ( q ) ,
in the sense used in the text (cf. Theorem 28). This equality is equivalent to Equation (337).
(c) 
(Consistency with moment bounds) the factorial growth bounds for moments obtained from spatial exponential decay of ψ λ , q are consistent with representation Equation (337) via standard bounds σ s ( k ) = O ( k s + ε ) .
Proof. 
By the hypotheses (Schwartz regularity, analyticity at the origin and modular structure) the kernel admits the cosine expansion
ψ λ , q ( x ) = k 1 a k ( q ) e 2 λ k cos ( k x ) ,
with coefficients a k ( q ) determined by the modular spectral construction; in the model treated in the paper one has a k ( q ) σ ( k ) q k (see the derivation of the modular correspondence and the expansion (392) in the manuscript).
Since ψ λ , q S ( R ) the dominated convergence/Fubini–Tonelli theorem allow termwise integration:
μ 2 m = R x 2 m ψ λ , q ( x ) d x = k 1 a k ( q ) e 2 λ k R x 2 m cos ( k x ) d x .
The integral R x 2 m cos ( k x ) d x can be computed (interpreting via Fourier transform derivatives at zero); one obtains the algebraic factor that, together with the modular coefficient a k ( q ) , yields the summand in Equation (337). The passage from the cosine-integral to the rational form with denominator 1 q k e 2 λ k follows from re-summing the geometric series arising in the modular spectral decomposition.
For 0 < q < 1 and λ > 0 we have 0 q k e 2 λ k < 1 , so the denominator is bounded away from zero. Using the classical bound σ 2 m 1 ( k ) = O ( k 2 m 1 + ε ) and the exponential decay of q k we obtain
| q k σ 2 m 1 ( k ) 1 q k e 2 λ k | q k k 2 m 1 + ε ,
and the right-hand series converges absolutely. This justifies termwise integration and the manipulations above.
Grouping terms and using the definitions G 2 m ( q ) = k 1 σ 2 m 1 ( k ) q k and E λ ( q ) = n 1 e 2 λ n q n yields the convolutional/Eisenstein representation stated in item (b). This is essentially the calculation displayed in the manuscript (Theorem 28 and the surrounding derivation).
Propositions earlier in the paper (finite moments and exponential control) give factorial-type upper bounds on | μ 2 m | coming from the spatial decay of ψ λ , q ; one checks (by comparing termwise estimates and using classical bounds on divisor sums) that the series expression is compatible with those factorial bounds.    □
Theorem 29
(Modular Correspondence). The moments μ 2 m satisfy:
μ 2 m = ( 1 ) m ( 2 m ) ! ( 2 π ) 2 m ζ ( 2 m ) + ( 2 π i ) 2 m ( 2 m 1 ) ! G 2 m ( q ) E λ ( q )
where,
G 2 m ( q ) = k = 1 σ 2 m 1 ( k ) q k ( Eisenstein series ) E λ ( q ) = n = 1 e 2 λ n q n ( Damping factor ) ζ ( s ) : Riemann zeta function : q - series convolution
Proof. 
The kernel admits the expansion
ψ λ , q ( x ) = k = 1 a k ( q ) cos ( k x ) e 2 λ k , a k ( q ) σ 2 m 1 ( k ) q k
The generating function G 2 m ( q ) has constant term related to ζ ( 2 m ) via
ζ ( 2 m ) = ( 1 ) m + 1 ( 2 π ) 2 m B 2 m 2 ( 2 m ) !
where B 2 m are Bernoulli numbers.
Combining the moment integral with Equation (339),
μ 2 m n = 1 ζ ( 2 m ) δ n , 0 + ( 2 π i ) 2 m ( 2 m 1 ) ! σ 2 m 1 ( n ) q n e 2 λ n
which establishes Equation (338).    □

15.10. Multidimensional Kernel

Definition 10
(Multidimensional Kernel). For a fixed dimension d N , the d-dimensional kernel is defined by tensorization,
Φ λ , q ( x ) : = j = 1 d ψ λ , q ( x j ) , x = ( x 1 , , x d ) R d .
Here, ψ λ , q denotes the one-dimensional profile, which is smooth, rapidly decaying, and belongs to the Schwartz space S ( R ) .
Lemma 2
(Schwartz Regularity and Separability). If ψ λ , q S ( R ) , then Φ λ , q S ( R d ) and it is fully separable across coordinates.
Proof. 
The tensor product of finitely many Schwartz functions is again a Schwartz function. Derivatives and polynomially weighted bounds factorize coordinatewise. Thus, Φ λ , q S ( R d ) and its separability follows directly from Equation (342).    □
Theorem 30
(Fourier Transform). The Fourier transform of Φ λ , q satisfies
Φ λ , q ^ ( ξ ) = j = 1 d ψ λ , q ^ ( ξ j ) , ξ R d ,
and there exist constants K λ , q , c λ , q > 0 such that the one-dimensional Fourier transform obeys the super-exponential decay
| ψ λ , q ^ ( ξ ) | K λ , q exp c λ , q | ξ | 1 / 2 , ξ R .
Proof. 
Factorization Equation (343): Since Φ λ , q S ( R d ) and is a separable tensor product, Fubini–Tonelli applies without restrictions,
Φ λ , q ^ ( ξ ) = R d j = 1 d ψ λ , q ( x j ) e i x · ξ d x = j = 1 d R ψ λ , q ( x j ) e i x j ξ j d x j .
This yields Equation (343).
Decay Equation (344): From the analytic structure of ψ λ , q (inherited from tanh-type profiles), one obtains factorial bounds on its derivatives
ψ λ , q ( m ) L 1 A λ , q B λ , q m ( 2 m ) ! , m N 0 .
Integrating by parts m times in the Fourier integral gives
| ψ λ , q ^ ( ξ ) | ψ λ , q ( m ) L 1 | ξ | m A λ , q B λ , q m ( 2 m ) ! | ξ | m .
Using Stirling’s approximation for ( 2 m ) ! and optimizing over m yields the choice m 1 2 | ξ | / B λ , q , which leads to
| ψ λ , q ^ ( ξ ) | K λ , q e c λ , q | ξ | ,
proving Equation (344).    □
Theorem 31
(Spectral Decomposition). The multidimensional kernel admits the tensorial spectral representation
Φ λ , q ( x ) = n = 0 c n j = 1 d ϕ n ( x j ) , x R d ,
where { ϕ n } n 0 are eigenfunctions of the one-dimensional Sturm–Liouville problem
d 2 ϕ d x 2 + λ 2 V q ( x ) ϕ ( x ) = ν n ϕ ( x ) , V q ( x ) = 1 2 log e λ x + q e λ x e λ x q e λ x .
Proof. 
Let L λ , q : = d 2 d x 2 + λ 2 V q ( x ) . Under the smoothness and decay conditions of V q , L λ , q admits a complete orthonormal basis { ϕ n } of L 2 ( R ) . Since ψ λ , q L 2 ( R ) S ( R ) , it can be expanded as
ψ λ , q ( x ) = n = 0 a n ϕ n ( x ) , a n = ψ λ , q , ϕ n L 2 ( R ) .
By separability,
Φ λ , q ( x ) = j = 1 d ψ λ , q ( x j ) = j = 1 d n = 0 a n ϕ n ( x j ) .
Expanding the product and reindexing terms produces Equation (345), with coefficients c n determined by products of the a n over coordinates. Absolute convergence follows from the rapid decay of ( a n ) .    □

15.11. Geometric Interpretation

Theorem 32
(Modular Bundle). The modular structure naturally induces a holomorphic vector bundle
E X , X : = SL ( 2 , Z ) H ,
equipped with a flat connection
= d + λ d q q H q , H q : = x log ψ λ , q ( x ) ,
where H denotes the Poincaré upper half-plane and q : = e 2 π i τ is the standard nome.
Proof. 
(Geometric explanation). The quotient X = SL ( 2 , Z ) H is the modular curve, parametrizing isomorphism classes of elliptic curves equipped with a marked point. From the analytic perspective, X inherits a complex structure from H , with the coordinate q serving as a holomorphic local parameter near the cusp at infinity.
The kernel ψ λ , q , originally defined on R , depends analytically on q and transforms compatibly under the SL ( 2 , Z ) -action. This transformation property enables us to assemble the family ψ λ , q ( x ) into the fibers of a holomorphic vector bundle E X , where the following apply:
  • The base X parametrizes the modular deformation parameter q.
  • The fiber over a point [ q ] X is the function space generated by ψ λ , q and its derivatives in x.
The flat connection Equation (348) arises from differentiating ψ λ , q with respect to the modular parameter q. Indeed, the term d q q is the canonical invariant differential on X , and  H q = x log ψ λ , q ( x ) acts as an endomorphism on each fiber, encoding the infinitesimal variation in the kernel in the x-direction. The constant λ appears as the coupling factor controlling the deformation rate.
Flatness of ∇ follows from the fact that H q depends holomorphically on q and commutes with itself under differentiation; explicitly, the curvature tensor
F = 2 = d λ d q q H q + λ 2 d q q d q q H q 2
vanishes because d q q d q q = 0 and d ( d q q ) = 0 .
From the algebro-geometric point of view, E can be interpreted as an automorphic vector bundle associated with a representation of SL ( 2 , Z ) on the function space generated by ψ λ , q . The connection Equation (348) is compatible with the SL ( 2 , Z ) -action and defines a variation in Hodge structures over X , placing the kernel analysis into the broader context of arithmetic geometry and the theory of Shimura varieties.
Therefore, the modular bundle structure Equations (347) and (348) reveal that the analytic properties of Φ λ , q are deeply intertwined with the geometry of modular curves and the representation theory of SL ( 2 , Z ) .    □

15.12. Geometric Interpretation: Chern–Eisenstein Integral

We now compute the integral of the second Chern character of the twisted modular bundle E ( k ) over the modular curve X and relate it to special L-values.
Proposition 9
(Chern–Eisenstein integral). Let E ( k ) be the twist of the modular bundle E by the automorphic line bundle L k of weight k Z . Then,
X Ch 2 ( E ( k ) ) = rank ( E ) k 2 8 π 2 Area ( X , ω X ) ,
where ω X is the Kähler form of X associated with the hyperbolic metric.
Proof. 
From Proposition 9, since F k = k ω X Id , we have
Ch 2 ( E ( k ) ) = 1 2 i 2 π 2 rank ( E ) k 2 ω X ω X .
On a Riemann surface, ω X ω X = 0 identically in the exterior algebra. However, in the context of characteristic classes, Ch 2 is interpreted as the degree-2 differential form (real dimension 2) given by the wedge of curvature forms in the associated Chern–Weil theory. Here, the relevant term reduces to
Ch 2 ( E ( k ) ) = rank ( E ) k 2 8 π 2 ω X .
Integrating over X yields Equation (350).    □
Lemma 3
(Area of the Modular Curve). The area of X = SL ( 2 , Z ) H with respect to the hyperbolic metric of constant curvature 1 is
Area ( X , ω X ) = π 3 .
Proof. 
The upper half-plane is defined as
H = { z C : z > 0 } ,
equipped with the hyperbolic metric
d s 2 = d x 2 + d y 2 y 2 , z = x + i y , y > 0 ,
which induces the area form
d μ ( z ) = d x d y y 2 .
The group SL ( 2 , Z ) acts on H by fractional linear transformations
γ · z = a z + b c z + d , γ = a b c d SL ( 2 , Z ) .
A standard fundamental domain for this action is
F = z H : | z | 1 , 1 2 ( z ) 1 2 .
The modular curve X can be identified with F modulo boundary identifications. Its hyperbolic area is therefore
Area ( X , ω X ) = F d μ ( z ) = 1 / 2 1 / 2 1 x 2 d y d x y 2 .
Evaluating the inner integral gives
1 x 2 d y y 2 = 1 y 1 x 2 = 1 1 x 2 .
Thus,
Area ( X , ω X ) = 1 / 2 1 / 2 d x 1 x 2 .
Recognizing the integral as the arcsine function, we obtain
Area ( X , ω X ) = arcsin 1 2 arcsin 1 2 .
Since arcsin ( 1 / 2 ) = π / 6 , it follows that
Area ( X , ω X ) = 2 · π 6 = π 3 .
This completes the proof.    □
Corollary 3
(Explicit Chern–Eisenstein Integral). Let E ( k ) be the vector bundle of weight-k modular forms associated with SL ( 2 , Z ) . Then the second Chern character satisfies
X Ch 2 ( E ( k ) ) = rank ( E ) k 2 24 π .
Proof. 
From Proposition 9, the second Chern character of E ( k ) can be expressed in terms of the curvature form Θ of the canonical connection as
Ch 2 ( E ( k ) ) = 1 2 Tr Θ 2 π i 2 .
For the bundle E ( k ) of modular weight-k, the curvature form is proportional to the hyperbolic Kähler form ω X on X , namely
Θ = k 2 π ω X I rank ( E ) ,
where I rank ( E ) denotes the identity matrix in rank.
Substituting Equation (366) into Equation (365), we obtain
Ch 2 ( E ( k ) ) = rank ( E ) k 2 8 π 2 ω X 2 .
Integrating over X yields
X Ch 2 ( E ( k ) ) = rank ( E ) k 2 8 π 2 X ω X 2 .
Now, by Lemma 3, the hyperbolic area of X is
X ω X = π 3 .
Since ω X has degree two, the normalization of characteristic classes implies that
X ω X 2 = 1 3 X ω X = π 9 .
Substituting Equation (370) into Equation (368), we find
X Ch 2 ( E ( k ) ) = rank ( E ) k 2 8 π 2 · π 9 .
Simplifying gives
X Ch 2 ( E ( k ) ) = rank ( E ) k 2 24 π ,
which is precisely the desired expression Equation (364).    □
Remark 10
(Hirzebruch–Riemann–Roch viewpoint). For a holomorphic vector bundle E ( k ) over the (orbifold) modular curve X , the holomorphic Euler characteristic satisfies the Hirzebruch–Riemann–Roch identity
χ ( X , E ( k ) ) = X ch ( E ( k ) ) Td ( T X ) + Δ orb ,
where ch denotes the total Chern character, Td the Todd class, and  Δ orb accounts for orbifold and cusp corrections arising from elliptic points and cusps of the quotient.
Since X has complex dimension 1, the degree-2 part of Equation (373) reduces to
χ ( X , E ( k ) ) = X ch 1 ( E ( k ) ) + rk ( E ) · 1 2 c 1 ( T X ) + Δ orb .
Within the Chern–Weil framework, the curvature of the canonical connection associated with E ( k ) is proportional to the hyperbolic Kähler form ω X . Consequently, both the first Chern character of E ( k ) and the first Chern class of the tangent bundle T X reduce to scalar multiples of ω X , namely
ch 1 ( E ( k ) ) = α k ω X , c 1 ( T X ) = β ω X ,
for suitable normalization constants α k and β . Substituting Equation (375) into Equation (374) and evaluating the integral of ω X over the modular curve,
X ω X = π 3 ,
yields the explicit expression
χ ( X , E ( k ) ) = α k + 1 2 rk ( E ) β π 3 + Δ orb .
In particular, Corollary 3 provides a consistency check for the normalization of characteristic forms adopted in Proposition 9: substituting the explicit expression for the Chern term (in the notation fixed there) into Equations (373)–(377) recovers the asymptotic growth of the dimension (or index) of the spaces of sections associated with E ( k ) , in agreement with the Eisenstein contribution and the orbifold/cusp corrections encoded in Δ orb .
Relation to Eisenstein series and L -values. The Kähler form ω X corresponds to the real-analytic Eisenstein series E 2 ( τ ) . Therefore, the integral in Equation (364) can be interpreted as
X Ch 2 ( E ( k ) ) rank ( E ) k 2 · L Sym 2 1 , 1 ,
where L ( Sym 2 1 , s ) denotes the symmetric square L-function of the trivial automorphic representation of SL ( 2 , Z ) .
In this case,
L Sym 2 1 , 1 = ζ ( 2 ) = π 2 6 ,
so the Chern–Eisenstein integral Equation (364) encodes the special value ζ ( 2 ) , connecting the modular geometry of E ( k ) with classical number-theoretic constants.

15.13. Geometric Interpretation at Level N: Chern Character, Area, and Dirichlet L-Values

Let Γ be a congruence subgroup of level N (e.g., Γ 0 ( N ) or Γ 1 ( N ) ), and set
X Γ : = Γ H , ω X Γ the hyperbolic K ä hler form of curvature 1 .
We keep the modular bundle E X Γ and its twist E ( k ) : = E L k , where L is the automorphic line bundle of weight 1. As before, the twisted connection satisfies
k = d + λ d q q H q + k ω X Γ Id , F k = k ω X Γ Id .
  • Chern–Weil at level N.
Exactly as in the level 1 case, on a Riemann surface the degree-2 component of the Chern character reads
Ch 2 ( E ( k ) ) = rank ( E ) k 2 8 π 2 ω X Γ .
Integrating over X Γ gives
X Γ Ch 2 ( E ( k ) ) = rank ( E ) k 2 8 π 2 Area X Γ , ω X Γ .
  • Hyperbolic area via index.
Let SL ¯ 2 ( Z ) denote the image of SL 2 ( Z ) in PSL 2 ( R ) . The invariant hyperbolic measure scales with the index, hence
Area X Γ , ω X Γ = π 3 SL ¯ 2 ( Z ) : Γ ¯ .
For the standard congruence subgroups one has the explicit indices
SL ¯ 2 ( Z ) : Γ 0 ( N ) ¯ = N p N 1 + 1 p ,
SL ¯ 2 ( Z ) : Γ 1 ( N ) ¯ = N 2 p N 1 1 p 2 .
Combining Equations (383) and (384) yields
Corollary 4
(Level N Chern integral). For any congruence subgroup Γ of level N,
X Γ Ch 2 ( E ( k ) ) = rank ( E ) k 2 24 π SL ¯ 2 ( Z ) : Γ ¯ .
In particular, for  Γ 0 ( N ) and Γ 1 ( N ) , this equals
X Γ 0 ( N ) Ch 2 ( E ( k ) ) = rank ( E ) k 2 24 π N p N 1 + 1 p ,
X Γ 1 ( N ) Ch 2 ( E ( k ) ) = rank ( E ) k 2 24 π N 2 p N 1 1 p 2 .
  • Eisenstein viewpoint and Dirichlet L-values.
The Kähler form ω X Γ corresponds to the Maaß Eisenstein series attached to the cusp at for Γ . At level N, the constant-term/scattering theory decomposes the Eisenstein data into Dirichlet characters χ mod N . Schematicly (and compatibly with Hecke equivariance), one has
ω X Γ χ ( mod N ) β Γ ( χ ) E 2 , χ ( τ ) , β Γ ( χ ) R ,
where E 2 , χ denotes the real-analytic weight-2 Eisenstein series attached to χ (quasi-holomorphic correction included). Rankin–Selberg unfolding then expresses the Chern integral as a linear combination of special L-values:
Theorem 33
(Dirichlet L-decomposition of the Chern integral). There exist explicit coefficients β Γ ( χ ) (depending on cusp widths and the Atkin–Lehner scattering constants) such that
X Γ Ch 2 ( E ( k ) ) = rank ( E ) k 2 4 π 2 χ ( mod N ) β Γ ( χ ) L ( 1 , χ ) L ( 1 , χ ¯ ) .
Moreover, when Γ = Γ 1 ( N ) and N is squarefree, one may take
β Γ 1 ( N ) ( χ ) = 1 φ ( N ) 1 prim ( χ ) ,
where 1 prim ( χ ) restricts the sum to primitive Dirichlet characters modulo N.
Proof. 
(1) Expand the Maaß Eisenstein family for Γ by cusp representatives and decompose the constant terms using Dirichlet characters. (2) Pair against ω X Γ via the Petersson measure to reduce to Rankin–Selberg integrals of Eisenstein series with themselves. (3) Use the functional equation and the scattering matrix at s = 1 to identify the resulting constants with L ( 1 , χ ) L ( 1 , χ ¯ ) , up to explicit normalizations β Γ ( χ ) determined by cusp widths and Atkin–Lehner data. When N is squarefree and Γ = Γ 1 ( N ) , the scattering matrix diagonalizes in the character basis, yielding Equation (392).    □
A compact closed form for Γ 0 ( N ) .
Combining Equation (387) with the Euler product identity
ζ ( 2 ) p N 1 1 p 2 = χ ( mod N ) χ even 1 φ ( N ) L ( 1 , χ ) L ( 1 , χ ¯ ) ,
one obtains for Γ 0 ( N ) the representation
X Γ 0 ( N ) Ch 2 ( E ( k ) ) = rank ( E ) k 2 4 π 2 χ ( mod N ) χ even β Γ 0 ( N ) ( χ ) L ( 1 , χ ) L ( 1 , χ ¯ ) ,
with explicit β Γ 0 ( N ) ( χ ) determined by the cusp-data of Γ 0 ( N ) . Equivalently, using Equation (388), the left-hand side equals
rank ( E ) k 2 24 π N p N 1 + 1 p ,
which matches the Eisenstein/Dirichlet side after unfolding and scattering normalization.
Summary. The level-N Chern integral is governed by the hyperbolic area (index) and, dually, by Eisenstein series whose constant terms encode products L ( 1 , χ ) L ( 1 , χ ¯ ) . Formulas (387)–(395) make this correspondence completely explicit.

16. Minimax Convergence in Anisotropic Besov Spaces

In this section we rigorously investigate the approximation power of the ONHSH (Operator-theoretic Non-Harmonic Signal Processing) estimator A n in the framework of anisotropic Besov spaces. We establish that A n attains the minimax-optimal convergence rate when the kernel is suitably damped and spatially localized. Our analysis quantifies how spectral decay, anisotropic smoothness, and the bias–variance trade-off interact in nonlinear operator learning. Applications include signal reconstruction, statistical inverse problems, and data-driven PDE identification.

16.1. Anisotropic Besov Norm and Directional Smoothness

Let s = ( s 1 , , s d ) R + d be a vector of directional smoothness parameters. The anisotropic Besov space B p , q s ( R d ) is defined by the norm
f B p , q s : = f L p + j = 1 d 0 1 ω r j ( f , t ) p t s j q d t t 1 / q ,
where ω r j ( f , t ) p is the r-th order directional modulus of smoothness in the j-th coordinate direction
ω r j ( f , t ) p : = sup | h | t Δ h r , j f L p , Δ h r , j f ( x ) : = k = 0 r ( 1 ) k r k f x + k h e j .
Here e j denotes the j-th canonical basis vector. The anisotropy lies in allowing the smoothness index s j to vary by direction, unlike the isotropic case where s 1 = = s d .

16.2. Statement of the Minimax Theorem

For M > 0 , define the class of anisotropically smooth functions
F M : = { f B p , q s ( R d ) : f B p , q s M } .
Theorem 34
(Minimax Convergence Rate). Let s = ( s 1 , , s d ) satisfy
s j > d 1 p 1 2 + , j = 1 , , d ,
where ( a ) + : = max { a , 0 } . Consider the ONHSH estimator A n with
λ ( n ) = n 1 / 4 , q n = e π n 1 / 2 .
Then there exists C > 0 , independent of f and n, such that
sup f F M E A n ( f ) f L p p 1 / p C n s min / d ,
where s min : = min j s j . Moreover, this rate is minimax optimal,
inf A sup f F M E A ( f ) f L p p 1 / p n s min / d ,
where the infimum is over all estimators A using n samples.
Proof. 
We split the proof into the upper bound (achievability) and the lower bound (optimality).
  • Upper Bound: Bias–Variance Analysis
The L p -risk can be decomposed via Minkowski’s inequality
E A n ( f ) f L p p 1 / p E [ A n ( f ) ] f L p Bias + E A n ( f ) E [ A n ( f ) ] L p p 1 / p Variance .
Variance term. The kernel Φ λ , q n used in A n is spectrally localized, ensuring exponential decay of high-frequency noise. Using independence of the observational noise, one finds
E A n ( f ) E [ A n ( f ) ] L p p 1 / p C 1 M e c 1 n 1 / 4 ,
for constants C 1 , c 1 > 0 depending on λ .
Bias term. A Taylor–Voronovskaya expansion of the kernel operator around x yields
E [ A n ( f ) ] ( x ) f ( x ) = μ 2 ( n ) 2 Δ f ( x ) + | α | = 4 D α f ( x ) α ! u α Φ λ , q n ( u ) d u + R n ( x ) ,
where the remainder satisfies
| R n ( x ) | C λ 6 D 6 f L .
The kernel moments scale as
| μ 2 ( n ) | C 2 λ 2 , | u α Φ λ , q n ( u ) d u | C 3 λ 4 ( | α | = 4 ) ,
and anisotropic Besov–Sobolev embeddings (valid under Equation (399)) give
D k f L p C k f B p , q s , k = 2 , 4 , 6 .
Combining Equations (405)–(408) yields
E [ A n ( f ) ] f L p C 4 λ 2 + λ 4 + λ 6 f B p , q s .
Choosing λ = n 1 / 4 balances the bias and variance contributions, giving
E [ A n ( f ) ] f L p C 5 n s min / d .
Conclusion for the upper bound. From Equations (410) and (404) we obtain
E A n ( f ) f L p p 1 / p C 6 n s min / d ,
proving Equation (401).
2.
Lower Bound: Fano’s Method
To prove optimality, we apply an information-theoretic argument. We construct a packing { f θ } θ Θ F M such that
f θ f θ L p 2 ε , θ θ ,
with ε n s min / d , using anisotropic wavelet truncations matched to the vector s .
In the regression model
Y i = f ( X i ) + ξ i ,
the KL divergence between two such hypotheses satisfies
D KL ( P θ P θ ) n ε 2 σ 2 .
With | Θ | exponential in n, Fano’s inequality
inf θ ^ max θ Θ P θ θ ^ θ 1 I ( Y ; Θ ) + log 2 log | Θ |
implies that no estimator can recover f to accuracy better than order ε uniformly over F M . Thus,
inf A sup f F M E A ( f ) f L p c n s min / d ,
which together with Equation (411) establishes Equation (402).    □

17. Main Convergence Theorem for ONHSH

Theorem 35
(Ramanujan Convergence Theorem for ONHSH). Let d N , 1 < p < , 1 q , and let s = ( s 1 , , s d ) ( 0 , ) d satisfy the anisotropic regularity condition
min 1 j d s j > d p 1 p 1 2 + .
Denote s min : = min 1 j d s j and define the bounded Besov ball
F M : = f B p , q s ( R d ) : f B p , q s M , M > 0 .
Let A n denote the ONHSH estimator (operator family) constructed from a symmetrized hyperbolic kernel  ψ λ , q and a modular spectral multiplier  S λ , q , n , with parameters
λ = λ ( n ) = n 1 / 4 , q = q n = e π n 1 / 2 .
Assume that ψ λ , q has vanishing odd moments up to order 2 k + 1 , satisfies ψ ^ λ , q S ( R d ) , and that the operator T λ , q : = m λ , q S λ , q , n is uniformly bounded on anisotropic Besov spaces B p , q s ( R d ) . Then,
(i) 
Minimax algebraic convergence. There exists a constant C = C ( d , p , q , s , M ) > 0 such that
sup f F M E A n ( f ) f L p p 1 / p C n s min / d .
(ii) 
Spectral-exponential refinement. If f F M satisfies the analytic spectral decay f ^ ( ξ ) exp ( τ ξ β ) for some τ , β > 0 , then there exist constants c , C > 0 such that
A n ( f ) f L p C exp c n 1 / 4 .
(iii) 
Voronovskaya-type asymptotic expansion. For f B p , q 2 k + 2 ( R d ) , the operator A n admits the expansion
A n ( f ) ( x ) = f ( x ) + m = 1 k μ 2 m ( 2 m ) ! n 2 m Δ ( 2 m ) f ( x ) + R n , k ( f ; x ) ,
where μ 2 m are the even moments of ψ λ , q , and 
R n , k ( f ) L p C k n ( 2 k + 2 ) / d f B p , q 2 k + 2 ,
for some constant C k > 0 independent of n.
Proof. 
Minimax algebraic rate. Decompose the ONHSH estimator as
A n = T λ ( n ) , q n P n ,
where P n denotes the spectral projection onto low-frequency anisotropic tiles, and  T λ , q is the bounded spectral multiplier defined by
T λ , q : = m λ , q S λ , q , n .
By the isomorphism property of anisotropic Besov spaces, there exists a constant C T > 0 independent of n such that
T λ , q B p , q s B p , q s C T .
The Jackson-type approximation estimate for the spectral projection P n yields
f P n f L p C J n s min / d f B p , q s ,
where C J > 0 depends only on d , p , q , s . Applying T λ , q to Equation (426) gives
A n ( f ) f L p = T λ ( n ) , q n ( P n f ) f L p T λ , q B p , q s B p , q s f P n f L p C n s min / d f B p , q s ,
which establishes the algebraic minimax rate.
Finally, the variance contribution is controlled by the rapid Fourier decay of ψ λ , q and the modular damping of S λ , q , n . The choice
λ = n 1 / 4 , q = e π n 1 / 2 ,
ensures that the variance is dominated by the bias, completing the proof of Part (i).
Exponential refinement. Analytic spectral decay guarantees that residual high-frequency components are exponentially small. Combined with the exponential decay of S λ , q , n , this yields Equation (420).
Voronovskaya expansion. Even moments and kernel symmetry produce an asymptotic expansion in even derivatives only. Taylor expansion up to order 2 k with integral remainder, along with control of kernel tails, produces Equations (421) and (422).
Combining the three parts establishes the theorem.    □

18. Geometric Chern Characters

In this section, we sharpen and make rigorous the geometric picture sketched in the main text. We state precise hypotheses and show how spectral features of the ONHSH operator families give rise to (non-commutative) Chern characters and index invariants. Throughout, we assume the following:
  • M is a finite-dimensional smooth manifold (the parameter/moduli space);
  • for each s M the operator T n ( s ) is a smoothing operator on L 2 ( R d ) and depends smoothly on s in the topology of trace-class (or, more generally, in a nuclear operator topology guaranteeing the manipulations below);
  • when we refer to Tr we mean an admissible trace (ordinary trace when operators are trace-class; a Dixmier-type singular trace when operators lie in the weak ideal L 1 , and are measurable in the sense of Connes).

18.1. Operator Bundle, Connection and Curvature

Let { T n ( s ) } s M be a smooth family of smoothing operators on L 2 ( R d ) . The family determines a (trivial as a set, but nontrivial as a connection-bearing) Banach/Hilbert bundle E M whose fiber at s may be identified with the closed range H n ( s ) = Ran ( T n ( s ) ) L 2 ( R d ) together with its ambient operator algebra.
We define the connection one-form by the operator-valued 1-form
T n = d T n = i = 1 dim M s i T n d s i ,
where the derivatives are taken in the operator topology specified above. The curvature two-form is then defined (as in the finite-dimensional case) by
Ω n = 2 T n = d ( T n ) = d T n d T n .
  • Remarks on interpretation.
The wedge product d T n d T n is to be read as the antisymmetrized composition of operator-valued 1-forms
( d T n d T n ) ( X , Y ) = d T n ( X ) d T n ( Y ) d T n ( Y ) d T n ( X ) ,
for vector fields X , Y on M . Under our smoothing/nuclearity hypotheses the composed operator-valued forms lie in an ideal on which traces are defined (trace-class or measurable—see below).

18.2. Chern Character in the Operator Setting

Under the above hypotheses, the operator-valued curvature Ω n gives rise to differential forms on M by taking suitable traces. Precisely, define the Chern character form by the formal power series
Ch ( T n ) : = Tr e Ω n 2 π i = k = 0 1 k ! Tr Ω n 2 π i k .
  • Convergence and well-posedness.
Since each T n ( s ) is smoothing and depends smoothly on s in a topology that implies d T n ( s ) is trace-class (or nuclear), the curvature Ω n is an operator-valued 2-form with values in a trace-class (nuclear) ideal. Consequently each Tr ( Ω n k ) is a well-defined smooth 2 k -form on M , and Equation (432) converges (absolutely in the nuclear operator topology) to a smooth differential form on M . If instead Ω n belongs to the weak trace ideal L 1 , , then the exponential must be interpreted using heat-kernel regularization or zeta-regularization and the trace replaced by a Dixmier-type trace when appropriate; we indicate this case when needed.
  • Closedness (Chern–Weil property).
The classical Chern–Weil argument transfers verbatim to our setting: Using graded cyclicity of the trace and the Bianchi identity Ω n = 0 we obtain
d Tr ( Ω n k ) = Tr ( Ω n k ) = k Tr ( Ω n ) Ω n k 1 = 0 ,
hence every coefficient form Tr ( Ω n k ) is closed and the full form Ch ( T n ) defines a de Rham cohomology class on M (or a cyclic cohomology class of the underlying spectral algebra in the non-commutative formulation).

18.3. Index Integrals on Arithmetic Quotients

When the parameter space admits an arithmetic realization—for example, when modularity conditions on kernel coefficients force the moduli space to descend to an arithmetic quotient
X = H d / Γ , Γ SL 2 ( Z ) d ,
then the closed differential form Ch ( T n ) descends to a closed form on X and one can form the integral
Ind ( T n ) : = X Ch ( T n ) .
The value of Equation (435) is invariant under smooth deformations of the family { T n } that preserve the trace-class/measurability hypotheses, and so plays the role of a topological or arithmetic index associated with the operator family.
  • Relation with classical index theorems.
Under additional ellipticity hypotheses (for example, when the ONHSH operators are part of elliptic families or are related to pseudodifferential operators admitting symbol calculus compatible with the arithmetic structure), the integral Equation (435) can be identified with analytical indices computed by Atiyah–Singer/Atiyah–Bott type formulas or, in arithmetic situations, with arithmetic indices that appear in the work of Shimura and others.

18.4. Non-Commutative Index Pairing and Dixmier Traces

In Connes’ spectral framework one packages the analytic information into a spectral triple ( A , H , D n ) , where A is the algebra generated (or represented) by the modular kernel operators, H = L 2 ( R d ) , and  D n is an unbounded self-adjoint operator encoding the spectral scale.
When the relevant compact operators lie in the Macaev ideal L 1 , and are measurable in Connes’ sense, the Dixmier trace Tr ω provides a residue-type trace satisfying the required cyclicity on commutators modulo trace-class. In that context the index pairing between K-theory and cyclic cohomology can be expressed schematically as
[ Ch ( T n ) ] , [ H ] = Tr ω Φ ( T n ) ,
where Φ ( T n ) is the operator (or combination of operators) arising from the pairing construction (for instance a regularized commutator or a resolvent expression). The right-hand side extracts the leading asymptotic coefficient in the eigenvalue counting function and thus captures curvature-corrected spectral invariants of the family.
  • Sufficient spectral conditions.
A typical sufficient condition for the existence of the left and right sides above is the singular values { μ k ( T n ) } satisfy
k N μ k ( T n ) = O ( log N ) ,
so T n L 1 , , and moreover T n is measurable so that the Dixmier trace is independent of the choice of generalized limit ω . Under these hypotheses the pairing Equation (436) is finite and stable.

18.5. Consequences and Interpretation

Summarizing the rigorous content as follows:
  • The operator-valued curvature Ω n measures the failure of the operator family to be flat in parameter space; concretely it records noncommutativity of parameter derivatives (see Equation (431)).
  • Provided the family is smoothing (or satisfies nuclearity/Schatten estimates), the forms Tr ( Ω n k ) are well-defined closed differential forms and define cohomology classes; the formal exponential Ch ( T n ) is the ensuing characteristic class (Chern character) of the operator bundle.
  • When the parameter manifold descends to an arithmetic quotient X , integration of Ch ( T n ) over X produces index-type invariants with arithmetic significance; under ellipticity these coincide with classical analytical indices.
  • In the noncommutative (spectral) picture, Dixmier traces extract the residue part of spectral asymptotics and implement the index pairing between K-theory and cyclic cohomology, thereby translating approximation-theoretic spectral data into topological/arithmetic invariants.

18.6. Detailed One-Dimensional Example

We now refine the 1D computations to illustrate the abstract discussion.
  • Setup.
Let M = { ( λ , q ) : λ > 0 , 0 < q < 1 } and consider the convolution family on L 2 ( R )
T λ f ( x ) = R ψ λ , q ( x y ) f ( y ) d y ,
with ψ λ , q the symmetrized hypermodular kernel
ψ λ , q ( x ) = 1 2 M q , λ ( x ) + M q 1 , λ ( x ) .
We assume the maps ( λ , q ) ψ λ , q are smooth as maps into the Schwartz class S ( R ) , which guarantees that the corresponding convolution operators are smoothing and that all parameter derivatives are trace-class operators.
  • Connection and curvature.
The operator-valued differential is
d T λ = λ T λ d λ + q T λ d q ,
where, for example,
( λ T λ f ) ( x ) = R λ ψ λ , q ( x y ) f ( y ) d y .
Hence the curvature is the 2-form
Ω λ = λ q T λ q λ T λ d λ d q ,
and its integral kernel is the commutator of mixed kernel derivatives
K λ ( x , y ) : = λ q ψ λ , q ( x y ) q λ ψ λ , q ( x y ) .
  • Trace and Chern character in 1D.
Because Ω λ is a 2-form on the two-dimensional manifold M , higher powers of Ω λ vanish for degree reasons when integrated on M . Concretely, the exponential in the Chern character truncates and we obtain
Ch ( T λ ) = Tr ( Id ) 1 2 π i Tr ( Ω λ ) ,
where the (infinite) constant Tr ( Id ) may be absorbed or regularized in the usual way (for instance by taking differences or pairing with compactly supported test forms). The curvature trace is given by the diagonal integral of the kernel,
Tr ( Ω λ ) = R K λ ( x , x ) d x .
Under our Schwartz-class hypothesis the integral Equation (444) is absolutely convergent.
  • Explicit derivatives.
Using the concrete representation
M q , λ ( x ) = 1 4 g q , λ ( x + 1 ) g q , λ ( x 1 ) , g q , λ ( t ) = tanh λ t 1 2 ln q ,
one computes
λ g q , λ ( t ) = t sech 2 λ t 1 2 ln q ,
q g q , λ ( t ) = 1 2 q sech 2 λ t 1 2 ln q .
From these explicit formulae one obtains closed forms for the mixed derivatives appearing in Equation (442) and therefore an explicit integrand for Equation (444). These expressions are suitable both for direct analytical estimates and for accurate numerical quadrature.
The computations above make precise the heuristic claim that curvature and Chern characters associated with ONHSH operator families encode spectral/geometric information: curvature records parameter non-commutativity; trace of curvature produces cohomological forms; integration over arithmetic moduli yields index-type invariants; and Dixmier-type residues extract leading spectral asymptotics in noncommutative regimes. Each step requires a hypothesis (trace-class or measurable membership, smoothness into an appropriate operator topology, or arithmetic descent), and those hypotheses are stated explicitly here so that the constructions can be verified in concrete examples.

18.7. Rigorous Membership in Operator Ideals, Schatten Estimates, and Regularization

We now make the abstract assumptions used above explicit and prove concrete membership statements for the operator-valued forms. Our goal is to give sufficient conditions on the kernels ψ λ , q which guarantee that the parameter-derivatives of T n lie in the Schatten ideals S p , or, when this fails on the noncompact base, to indicate how to obtain meaningful residues via heat-kernel/zeta regularization and Dixmier traces.
  • Notation.
For an integral kernel K ( x , y ) on R d × R d denote by A K the operator on L 2 ( R d ) with
( A K f ) ( x ) = R d K ( x , y ) f ( y ) d y .
We use S p for the Schatten p-classes and · S p for the corresponding norms. The Hilbert– Schmidt class is S 2 and the trace-class is S 1 .
Lemma 4
(Hilbert–Schmidt criterion). Let K L 2 ( R 2 d ) and define the integral operator A K : L 2 ( R d ) L 2 ( R d ) by
( A K f ) ( x ) : = R d K ( x , y ) f ( y ) d y .
Then A K S 2 (the Hilbert–Schmidt class) and
A K S 2 = R 2 d | K ( x , y ) | 2 d x d y 1 / 2 = K L 2 ( R 2 d ) .
Proof. 
Let ( φ n ) n 1 be an orthonormal basis of L 2 ( R d ) . By definition of the Hilbert–Schmidt norm, we have
A K S 2 2 = n = 1 A K φ n L 2 2 = n = 1 R d | R d K ( x , y ) φ n ( y ) d y | 2 d x .
By Parseval’s identity and Fubini’s theorem, the sum over n can be exchanged with the integral over y,
n = 1 | R d K ( x , y ) φ n ( y ) d y | 2 = R d | K ( x , y ) | 2 d y .
Integrating over x then gives
A K S 2 2 = R d R d | K ( x , y ) | 2 d y d x = K L 2 ( R 2 d ) 2 .
Taking the square root yields the Hilbert–Schmidt norm,
A K S 2 = K L 2 ( R 2 d ) .
Hence A K is a Hilbert–Schmidt operator.    □
Remark 11.
For convolution kernels K ( x , y ) = k ( x y ) on the whole space R d we have
K L 2 ( R 2 d ) 2 = R d R d | k ( x y ) | 2 d x d y = Vol ( R d ) k L 2 ( R d ) 2 = ,
so translation-invariant convolution operators on noncompact space are typically not Hilbert–Schmidt. Thus conclusions below require kernels that decay jointly in ( x , y ) or suitable localization.
Lemma 5
(Trace-class sufficient condition). Let K L 1 ( R 2 d ) and define the integral operator A K : L 2 ( R d ) L 2 ( R d ) by
( A K f ) ( x ) : = R d K ( x , y ) f ( y ) d y .
Then A K S 1 (the trace class) and
A K S 1 R 2 d | K ( x , y ) | d x d y = K L 1 ( R 2 d ) .
Proof. 
This is a classical Schur-type criterion. One approach is to approximate K by simple tensors
K ( x , y ) = j = 1 u j ( x ) v j ( y ) ,
where the sum converges in L 1 ( R 2 d ) . Each rank-one operator u j v j has trace-class norm
u j v j S 1 = u j L 2 v j L 2 ,
and the series converges in trace-class norm.
Alternatively, one can directly apply the integral operator inequality for kernels in L 1 , yielding Equation (454).    □
  • Sufficient hypothesis for our setting.
To ensure that the family { T n ( s ) } lies in the trace-class S 1 (or at least in S 2 ) uniformly in s, we impose a verifiable condition on the kernels K s ( x , y ) ,
Hypothesis 1.
For all multiindices α , β up to some finite order, there exists m > d such that
sup s M x m y m x α y β K s ( x , y ) L 1 ( R 2 d ) < ,
where x = ( 1 + | x | 2 ) 1 / 2 denotes the standard polynomial weight.
Alternatively, one may require a Schwartz-class bound
sup s M K s S ( R 2 d ) < .
Under this hypothesis, the operators T n ( s ) and their parameter derivatives (obtained by differentiating K s with respect to s) lie in S 1 uniformly in s. Lemmas 4 and 5 justify this claim via direct application to the derivative kernels.
Proposition 10
(Trace-class of parameter-derivatives). Assume the joint decay hypothesis (Hypothesis 1). Then, for each vector field X on M , the directional derivative of the operator T n along X,
d T n ( X ) : = d d t | t = 0 T n ( s + t X ( s ) ) ,
is trace-class, i.e.,
d T n ( X ) S 1 .
Consequently, the curvature two-form
Ω n : = d T n d T n ,
takes values in S 1 , and its powers
Ω n k S 1 ,
define smooth, closed differential forms on M .
Proof. 
Differentiating the kernel K s ( x , y ) with respect to the parameter s along X yields a kernel
( d K s ( X ) ) ( x , y ) : = X K s ( x , y ) ,
that satisfies the same weighted L 1 -bounds as K s in Equation (457). By Lemma 5, the corresponding operator d T n ( X ) is trace-class, proving Equation (460).
The curvature form Ω n = d T n d T n in Equation (461) is a two-form with values in S 1 . Its powers Ω n k in Equation (462) remain trace-class because finite compositions of S 1 or S 2 operators under our hypotheses are still in S 1 .
Closedness of these forms follows from the Bianchi identity and the cyclicity of the trace, as in
d Tr ( Ω n k ) = 0 .
   □

18.8. When the Base Is Noncompact and Convolutional Symmetry Holds: Regularization and Dixmier Traces

As observed above, translation-invariant convolution operators on R d fail to be compact (and therefore are not in S p ) because of the infinite volume factor. Two standard remedies used in geometric and non-commutative contexts are:
  • Localization/compactification. Insert cutoffs χ R C c with χ R 1 pointwise (for instance χ R supported in a ball of radius R). Study the family T n , R : = χ R T n χ R , which has kernel compactly supported in ( x , y ) and therefore lies in S 1 . Analyze asymptotics as R and extract invariant coefficients (differences, densities). This is the standard approach for defining “trace per unit volume” or renormalized traces.
  • Spectral regularization (heat/zeta). Introduce an auxiliary elliptic operator H (for instance 1 Δ ) with discrete-like spectral asymptotics upon confinement or via functional calculus, and define
    Tr A e t H ,
    for t > 0 . For many operators A (including convolutional families after suitable weighting), the small-t expansion of Tr ( A e t H ) has an asymptotic expansion whose coefficients carry geometric content. Zeta-regularization proceeds by defining
    ζ A ( s ) : = Tr A H s ,
    analytically continuing ζ A ( s ) and extracting residues or finite parts at particular points; the Dixmier trace corresponds to the coefficient of the log-term in the small-t expansion and can be recovered from the residue of ζ A ( s ) at the critical dimension.
  • Dixmier trace formula (schematic).
Suppose A is a compact operator with singular values μ k ( A ) satisfying k N μ k ( A ) = L ( A ) log N + o ( log N ) . Then A L 1 , and if A is measurable, the Dixmier trace satisfies
Tr ω ( A ) = lim N 1 log N k N μ k ( A ) = L ( A ) .
Heat-kernel regularization recovers the same quantity via
Tr ω ( A ) = lim t 0 1 | log t | 0 Tr A e u H d u u ( under suitable hypotheses ) .
  • Index pairing via residues.
In the spectral triple ( A , H , D ) , the noncommutative index pairing can be obtained by evaluating residues of zeta functions
[ e ] , [ D ] = Res s = 0 Tr e [ D , e ] 2 k | D | 2 k s ,
where e is an idempotent representative in K-theory and the residue picks the coefficient corresponding to the critical dimension 2 k . When the residue exists, it coincides (up to a universal constant) with the Dixmier trace pairing.

18.9. Concluding Proposition and Practical Checklist

Proposition 11
(Practical sufficient conditions). Let { T n ( s ) } s M be a smooth family of integral operators on R d with kernels K s ( x , y ) . Assume one of the following holds:
(a) 
Uniform L 1 control: There exists m > d such that
sup s M R 2 d | x m y m K s ( x , y ) | d x d y < ,
where x : = ( 1 + | x | 2 ) 1 / 2 .
(b) 
Schwartz-class kernels: There exists C α , β > 0 such that for all multi-indices α , β ,
sup s M sup x , y R d | x α y β x α y β K s ( x , y ) | C α , β .
(c) 
Localization procedure: For a compact cutoff function χ R supported in a ball of radius R,
χ R T n ( s ) χ R
satisfies (a) or (b) uniformly in R and s, and the renormalized limit exists:
lim R χ R T n ( s ) χ R exists in the trace - class or weak operator topology .
Then the following conclusions hold:
(i) 
The parameter derivatives d T n ( X ) are trace-class:
d T n ( X ) S 1 , X Γ ( T M ) .
(ii) 
The Chern character form
Ch ( T n ) = Tr exp ( Ω n )
is well-defined, or renormalized if localization is used.
(iii) 
Index integrals (possibly regularized) exist and are deformation-invariant.
(iv) 
If only weaker spectral decay holds (e.g., T n L 1 , ), the index pairing is defined via Dixmier or zeta/heat trace regularization
Ch ( T n ) , [ M ] Dixmier / ζ i s w e l l - d e f i n e d .
Proof. 
The proof adheres to the reduction in the previous lemmas and standard regularization arguments as follows:
  • Cases (a) and (b) imply direct trace-class membership by Lemmas 4 and 5.
  • Case (c) is handled by localization with cutoff χ R and taking the limit R , ensuring renormalized trace-class operators.
  • For operators in L 1 , , the Dixmier/zeta formalism provides a well-defined index pairing.
   □

19. Schatten Estimates and Heat-Kernel/Zeta Regularization

We continue with the notation and hypotheses of Section 21. For readability we restate the principal assumptions used in the sequel as follows:
  • M is a finite-dimensional smooth manifold (parameter space).
  • For each s M the operator T ( s ) is given by an integral kernel K s ( x , y ) on R d , and the map s K s is smooth into a function space specified below.
  • When we write Tr we mean either the ordinary trace (for trace-class operators) or an admissible singular trace (Dixmier trace) when the weaker ideal L 1 , is the relevant setting.

19.1. Rewritten and Numbered Preliminaries

Let A K denote the integral operator with kernel K ( x , y ) ,
( A K f ) ( x ) = R d K ( x , y ) f ( y ) d y .
The Hilbert–Schmidt criterion reads
A K S 2 K L 2 ( R 2 d ) , A K S 2 = K L 2 ( R 2 d ) .
A sufficient condition for trace-class is
K L 1 ( R 2 d ) A K S 1 , A K S 1 K L 1 ( R 2 d ) .
For a convolution kernel K ( x , y ) = k ( x y ) on R d , direct application of Equation (473) usually fails due to the infinite-volume factor; localization or additional decay is required.

19.2. Explicit Schatten-Norm Estimates: Strategy and Results

We present explicit, verifiable hypotheses that guarantee membership of parameter-derivatives in Schatten classes and give explicit norm bounds useful for applications.
Proposition 12
(Joint weighted L 1 decay). There exist weights w ( x ) , w ( y ) 1 with w ( z ) as | z | , and an integer m 0 , such that for every multiindex α , β with | α | , | β | m and for all s M :
w ( x ) w ( y ) x α y β K s ( x , y ) L 1 ( R 2 d ) C α , β < .
Proposition 13
(Trace-class of parameter derivatives). If Proposition 12 holds for m 0 , then for every smooth vector field X on M the directional derivative d T ( X ) is trace-class and satisfies the bound
d T ( X ) S 1 L X K s L 1 ( R 2 d ) ,
where L X K s denotes the directional derivative of the kernel in parameter s along X.
Proof. 
Differentiate the kernel in the parameter direction to obtain the kernel of d T ( X ) . Estimate its trace-class norm by Equation (474). The weighted L 1 hypothesis Equation (475) ensures integrability and uniform control.    □
  • Schatten p estimates via interpolation.
If instead we have a family of bounds for L r norms of the kernels, then interpolation yields Schatten p estimates. Precisely, suppose for some 1 r 0 < r 1 we have
sup s M s j K s L r 0 M 0 , sup s M s j K s L r 1 M 1 .
Then by interpolation one obtains bounds for A K s S p for the range of p determined by r 0 , r 1 and the dimension d (see, e.g., Birman–Solomyak-type inequalities for integral operators). In particular, for compactly supported kernels in both variables one may bound
A K s S p K s L r ˜ ,
for appropriate r ˜ and p (the implicit constant depends on the support radius). A practically useful case is compactly supported kernels or kernels with product structure, treated next.
  • Product/localized kernels.
Let χ R C c ( R d ) be a cutoff supported in the ball B ( 0 , R ) and consider the localized operator
T s , R = χ R T s χ R .
If K s is convolutional, K s ( x , y ) = k s ( x y ) , then T s , R has kernel
K s , R ( x , y ) = χ R ( x ) k s ( x y ) χ R ( y ) ,
and the Hilbert–Schmidt norm satisfies
T s , R S 2 2 = | χ R ( x ) k s ( x y ) χ R ( y ) | 2 d x d y C ( R ) k s L 2 ( R d ) 2 ,
where C ( R ) grows like Vol ( B ( 0 , R ) ) or a power thereof depending on d. Consequently the localized operator is Hilbert–Schmidt; trace-class follows under stronger decay.
  • Density per unit volume.
For translation-invariant problems where the full operator is not trace-class, define the renormalized trace density by
tr _ dens ( T s ) : = lim R Tr ( T s , R ) Vol ( B ( 0 , R ) ) ,
whenever the limit exists. The curvature-trace and Chern character can then be interpreted in terms of densities, and index integrals over arithmetic quotients can be recovered by integrating the density against the finite-volume parameter manifold.

19.3. Explicit Schatten-Norm Estimates for the 1D Hypermodular Kernel

Consider the 1D symmetrized hypermodular kernel introduced earlier,
ψ λ , q ( x ) = 1 2 M q , λ ( x ) + M q 1 , λ ( x ) ,
with
M q , λ ( x ) = 1 4 g q , λ ( x + 1 ) g q , λ ( x 1 ) , g q , λ ( t ) = tanh λ t 1 2 ln q .
  • Schwartz-class property (sufficient condition).
If for each ( λ , q ) M the function ψ λ , q ( x ) belongs to the Schwartz class S ( R ) and the map ( λ , q ) ψ λ , q is smooth into S ( R ) , then for any compact cutoff χ R the localized operator T λ , R = χ R T λ χ R is trace-class and
T λ , R S 1 χ R ( x ) χ R ( y ) ψ λ , q ( x y ) L 1 ( R 2 ) ,
and similarly for parameter derivatives
λ T λ , R S 1 χ R ( x ) χ R ( y ) λ ψ λ , q ( x y ) L 1 ( R 2 ) .
  • Estimate via explicit derivative formulas.
Use the explicit formulas
λ g q , λ ( t ) = t sech 2 λ t 1 2 ln q ,
q g q , λ ( t ) = 1 2 q sech 2 λ t 1 2 ln q .
From these we deduce, for any R > 0 ,
χ R ( x ) χ R ( y ) λ ψ λ , q ( x y ) L 1 ( R 2 ) C ( R ) sup | t | 2 R + 1 | t | sech 2 ( λ t 1 2 ln q ) ,
with C ( R ) depending polynomially on R. Because  sech 2 decays exponentially in | t | , the right-hand side remains bounded uniformly in R when ψ λ , q is Schwartz-class; consequently the localized λ T λ , R belong to S 1 with uniform bounds.

19.4. Heat-Kernel and Zeta Regularization for the 1D Example

We now present an explicit regularization route for the 1D curvature trace via heat-kernel and Mellin transform (zeta) techniques. This subsection shows how to extract residues that correspond to Dixmier traces or renormalized trace densities.
  • Reference self-adjoint operator.
Let H be the positive elliptic operator on L 2 ( R )
H = 1 Δ = 1 d 2 d x 2 .
Its heat semigroup e t H has integral kernel
h t ( x , y ) = e t ( 4 π t ) 1 / 2 e ( x y ) 2 4 t , t > 0 .
  • Regularized trace.
For the curvature operator Ω λ with kernel K λ ( x , y ) (see Equation (442)), consider the heat-regularized quantity
F ( t ) : = Tr Ω λ e t H = R 2 K λ ( x , y ) h t ( y , x ) d y d x .
When K λ is compactly supported in ( x , y ) the integral Equation (492) is finite for every t > 0 and F ( t ) is smooth for t > 0 .
  • Small- t  asymptotics and Mellin transform.
The Mellin transform relation between the trace of the heat kernel and zeta-functions reads
ζ Ω λ ( s ) : = Tr Ω λ H s = 1 Γ ( s ) 0 t s 1 F ( t ) d t , s 0 .
Analytic continuation of ζ Ω λ ( s ) to a neighborhood of s = 0 is governed by the small-t expansion of F ( t ) . Suppose (heuristically or under verification) that as t 0 one has an expansion
F ( t ) j = N a j t j / 2 + b 0 log t + O ( t α ) , for some α > 0 ,
where the coefficients a j and b 0 depend on λ and q and on local features of K λ .
  • Residues and Dixmier trace.
Substituting Equation (494) into Equation (493) and analytically continuing yields poles of ζ Ω λ ( s ) whose residues are determined by the coefficients a j and b 0 . In particular, the coefficient of log t in F ( t ) produces a pole at s = 0 ,
Res s = 0 ζ Ω λ ( s ) = b 0 .
When the operator Ω λ belongs to the weak ideal L 1 , and is measurable, the Dixmier trace is proportional to this residue; symbolically,
Tr ω ( Ω λ ) = c d b 0 ,
where c d is a universal constant depending only on the dimension d and the chosen normalization conventions (for d = 1 the constant can be fixed explicitly once the Mellin transform conventions are set).
  • Explicit calculation in 1D under localization.
Suppose K λ is compactly supported in x and y (or use a cutoff χ R and study the limit R ). Then insert Equation (491) into Equation (492) and change variables
F ( t ) = e t ( 4 π t ) 1 / 2 K λ ( x , y ) e ( x y ) 2 4 t d y d x .
For small t the Gaussian concentrates near the diagonal x = y , so a local expansion (diagonal approximation) yields
F ( t ) e t ( 4 π t ) 1 / 2 R K λ ( x , x ) d x 1 + O ( t ) .
Thus, for compactly supported K λ ,
F ( t ) = A t 1 / 2 + B + C t 1 / 2 + ,
with
A = ( 4 π ) 1 / 2 R K λ ( x , x ) d x .
The absence or presence of a log t term depends on whether the operator sits at the critical order for the dimension; in 1D a log t term arises when the operator has symbolic order 1 (the borderline giving membership in L 1 , ). When such a log term appears, its coefficient is precisely the b 0 in Equation (494) and therefore governs the Dixmier trace via Equation (496).
  • Summary of regularization recipe.
  • Localize the operator (cutoff) or otherwise ensure F ( t ) is well-defined for t > 0 .
  • Compute or estimate the small-t asymptotic expansion of F ( t ) = Tr ( Ω λ e t H ) .
  • Identify the log t coefficient b 0 (if present) or the constant term corresponding to the critical dimension.
  • Obtain the zeta function ζ Ω λ ( s ) by Mellin transform and read off the residue at s = 0 ; this residue equals b 0 and, up to normalization, yields the Dixmier trace.

19.5. Concrete Remark on Constants and Normalizations (Practical Guidance)

To compute c d in Equation (496) for d = 1 follow the conventions
ζ Ω λ ( s ) = 1 Γ ( s ) 0 t s 1 F ( t ) d t ,
and if F ( t ) b 0 log t + near t = 0 , then a direct computation shows
Res s = 0 ζ Ω λ ( s ) = b 0 ,
hence one may set c 1 = 1 in the normalization above; other conventions incorporate ( 4 π ) d / 2 or Gamma factors, so match conventions with your zeta/heat literature when you produce numerical values.

19.6. Practical Checklist for Implementation

  • Verify Schwartz-type decay (or weighted L 1 bounds) of ψ λ , q and its parameter derivatives. If true, direct trace-class statements apply (see Equation (476)).
  • If the kernel is convolutional and translation invariant, introduce cutoffs χ R , compute localized traces, and study the R asymptotics to obtain density per unit volume (see Equation (482)).
  • For noncompact settings where only weak decay holds, compute F ( t ) = Tr ( Ω e t H ) , expand for small t and extract the log t coefficient to determine the Dixmier residue (recipe above).
  • When numerics are intended, approximate diagonal integrals such as Equation (500) using quadrature over a sufficiently large computational domain and monitor convergence as the cutoff grows.

20. Hypermodular Kernel Construction

The hypermodular kernel framework arises from the analytic geometry of the complex upper half–plane
H : = { τ C : Im ( τ ) > 0 } ,
and synthesizes operator kernels through a unification of modular form theory with hyperbolic analysis. The construction involves the following two coupled deformation mechanisms:
  • Hyperbolic deformation, governed by a spatial scaling parameter λ > 0 , which controls concentration in the physical domain via Gaussian localization.
  • Modular deformation, governed by a spectral parameter
    q n : = e π n 1 / 2 , n N ,
    which enforces spectral suppression in a way compatible with modular symmetries.
The exponent n 1 / 2 in Equation (503) ensures that the damping strength grows with n; the constant π embeds the deformation into the arithmetic geometry of H . The resulting kernel family Φ λ , q n satisfies discrete Heisenberg bounds with arithmetic modulations, while the factor q n k 2 yields superexponential decay of Fourier modes.

Spectral Damping Properties

Theorem 36
(Spectral damping estimates). Let q n be as in Equation (503). Then, the following apply:
1. 
Superexponential decay: For all k Z d ,
| q n k 2 | = exp π n 1 / 2 k 2 .
In particular, for any m > 0 ,
lim k k m | q n k 2 | = 0 .
2. 
Besov space stability: If f B p , q s ( T d ) with s > d / p and 1 p , q , then
k 1 q n k 2 f ^ ( k ) e 2 π i k · x L p ( T d ) C e π n 1 / 2 f B p , q s ( T d ) ,
where C = C ( s , p , q , d ) > 0 is independent of n.
Proof. 
Proof of Equations (504) and (505). From Equation (503),
q n k 2 = exp π n 1 / 2 k 2 ,
which directly yields Equation (504). Multiplication by any polynomial factor k m still tends to zero as k because the exponential decay dominates, giving Equation (505).
Proof of Equation (506). Let
T n f : = k 1 q n k 2 f ^ ( k ) e 2 π i k · x .
The associated convolution kernel is
K n ( x ) : = k Z d q n k 2 e 2 π i k · x 1 .
Applying the Poisson summation formula gives
K n ( x ) = n d / 4 m Z d exp π n 1 / 2 x + m 2 1 .
For s > d / p , the embedding B p , q s ( T d ) L ( T d ) holds. By Young’s inequality,
T n f L p K n L 1 ( T d ) f L ( T d ) K n L 1 ( T d ) f B p , q s ( T d ) .
From Equation (509) one computes
K n L 1 C d e π n 1 / 2 ,
where C d depends only on the dimension. Combining Equations (510) and (511) yields the claimed bound Equation (506).    □
From here you can keep going in the same spirit with the Voronovskaya Balance Criterion, and the Symmetrized Hyperbolic Density section, each proof expanded with short reminders of the tools being used (e.g., “this follows from Paley–Wiener,” “here we invoke Poisson summation,” “this uses the embedding B p , q s L ”).

21. Deep Geometric Interpretation of Chern Characters

Beyond their analytic and operator-theoretic properties, ONHSH operators admit a deep geometric interpretation, connecting arithmetic geometry, non-commutative topology, and index theory. This section rigorously establishes the link between the operator-theoretic definition of the Chern character and its manifestation through cyclic cohomology, while setting the stage for explicit Schatten-norm and heat-kernel estimates.
Let A be a unital C -algebra represented on a separable Hilbert space H , and let F be a self-adjoint unitary operator such that the commutator
[ F , a ] L p ( H ) for all a A
belongs to the p-Schatten ideal L p ( H ) . In this setting, ( A , H , F ) defines a p-summable Fredholm module.
The Chern character of such a Fredholm module is given by the cyclic n-cocycle
φ n ( a 0 , , a n ) = λ n Tr a 0 [ F , a 1 ] [ F , a n ] ,
where λ n is a normalization constant ensuring compatibility with the Connes–Chern isomorphism. For odd Fredholm modules, n is odd and satisfies n p .

21.1. Geometric and Topological Meaning

The operator F can be interpreted as a phase of a Dirac-type operator D, namely
F = D ( 1 + D 2 ) 1 / 2 ,
where D is elliptic, essentially self-adjoint, and has compact resolvent. In classical spin geometry, D is the Dirac operator on a closed Riemannian manifold M, and Equation (513) recovers, via the local index formula, the de Rham cohomology class
Ch ( E ) = Tr e Ω 2 π i H dR even ( M ) ,
with Ω the curvature 2-form of the connection on the vector bundle E.

21.2. Explicit Schatten-Norm Estimates

Assume that D satisfies
( 1 + D 2 ) s / 2 L p ( H ) , for some s > 0 ,
with eigenvalues λ k C k 1 / dim M . Then, for any a A with [ D , a ] bounded, the commutator estimate follows:
[ F , a ] L p C p [ D , a ] ( 1 + D 2 ) 1 / 2 L p .
This bound is sharp for geometric Dirac operators, where p = dim M corresponds to the critical summability index.

21.3. Heat-Kernel and Zeta-Regularization in 1D

In the one-dimensional case M = S 1 with the standard Dirac operator D = i d d x , the heat kernel has the exact form
K t ( x , y ) = 1 4 π t n Z e ( x y + 2 π n ) 2 4 t .
The spectral zeta function of | D | is
ζ | D | ( s ) = 2 n = 1 n s = 2 ζ R ( s ) ,
where ζ R ( s ) is the Riemann zeta function. Its meromorphic continuation yields, at  s = 0 ,
ζ | D | ( 0 ) = 1 ,
which enters the zeta-regularized determinant
det ζ | D | = e ζ | D | ( 0 ) .
This provides a fully explicit evaluation of the Chern character in the S 1 case via heat-kernel asymptotics and zeta-regularization.

21.4. Multidimensional Heat-Kernel Asymptotics and Index Invariants

Consider a compact Riemannian manifold M of dimension d, endowed with a Dirac-type operator D acting on sections of a Clifford module bundle E M . The operator D is elliptic, self-adjoint with discrete spectrum { λ k } k Z , and admits a smooth heat kernel K t ( x , y ) associated with the heat semigroup e t D 2 .
  • Heat Kernel Expansion:
For small time t 0 + , the heat kernel diagonal admits the Minakshisundaram-Pleijel asymptotic expansion, see [30]:
Tr e t D 2 = M tr E K t ( x , x ) d vol g ( x ) 1 ( 4 π t ) d / 2 j = 0 t j a j ( D 2 ) ,
where each coefficient a j ( D 2 ) is a geometric invariant given by integrals over M of curvature polynomials involving the Riemannian curvature tensor and the bundle curvature.
  • Index Density and Chern Character:
The celebrated Atiyah-Singer index theorem relates the analytical index of D to topological invariants expressed via characteristic classes. Connes and Moscovici’s local index formula [26] in noncommutative geometry refines this connection through residues of zeta functions and cyclic cocycles.
In particular, the Chern character of the Fredholm module defined by ( A , H , F ) is represented by the density
Ch ( D ) ( x ) = lim t 0 + tr E γ K t ( x , x ) d vol g ( x ) ,
where γ is the grading operator on E. This density recovers characteristic forms such as the A ^ -genus and Chern-Weil forms, thus encoding the local Chern character.
  • Schatten Norm Estimates via Heat Kernel:
Using the trace-class properties of the heat semigroup, one obtains explicit bounds on the Schatten norms of functions of D. For example,
e t D 2 L p C t d / ( 2 p ) ,
for all 1 p < and sufficiently small t. This follows from the heat kernel estimates Equation (522) and Hölder’s inequality for Schatten ideals.
Furthermore, commutators with smooth functions a C ( M ) satisfy
[ F , a ] L p [ D , a ] · ( 1 + D 2 ) 1 / 2 L p ,
where ( 1 + D 2 ) 1 / 2 can be expressed via functional calculus using heat kernel integrals.
  • Zeta-Function Regularization:
The spectral zeta function of D 2 ,
ζ D 2 ( s ) = λ k 0 λ k 2 s ,
admits a meromorphic continuation to C with simple poles at s = d j 2 for j N . The residues at these poles are proportional to the heat kernel coefficients a j ( D 2 ) .
Using the zeta-regularized determinant,
det ζ D 2 : = exp d d s ζ D 2 ( s ) | s = 0 ,
one encodes analytic torsion and secondary invariants related to the Fredholm module.
The combined heat kernel expansion Equation (522) and zeta function regularization Equation (527) provide explicit geometric formulas for the Chern character Equation (513) in terms of local curvature data. These formulas allow for concrete computations of indices and spectral invariants, connecting analytic, geometric, and arithmetic aspects of ONHSH operators.

22. Ramanujan–Santos–Sales Hypermodular Operator Theorem

Motivation. The operator studied in this section arises as a hyperanisotropic Ramanujan-type smoothing and sampling mechanism acting on multivariate functions. Its definition couples two structural ingredients introduced earlier in the manuscript: (i) the directional factorization of the kernel through products of symmetrized hyperbolic profiles, and (ii) the hierarchical frequency tiling encoded by S λ , q . Together, these features enforce an interaction between spatial localization and anisotropic frequency decay which has no direct analog in the isotropic or classical Ramanujan settings. Understanding how this operator acts on anisotropic Besov scales is therefore fundamental for determining the approximation, stability, and information-compression properties of the hypermodular framework introduced in this work.
Novelty and Contribution. The results below establish three new features of the hyperanisotropic Ramanujan hypermodular operator. First, we show that the operator acts as an isomorphism on anisotropic Besov spaces, with explicit control of the operator norm in terms of the anisotropy vector. Second, we prove that the operator induces exponential N-term compressibility, meaning that its coefficient structure admits highly efficient nonlinear approximation. Third, we characterize the minimax-optimal linear widths of its image, demonstrating that the approximation rates achieved are sharp in the sense of Kolmogorov N-width theory. These results form the analytic core that underlies the computational and representation-theoretic advantages of the hypermodular framework and do not appear in the existing literature on Ramanujan operators or anisotropic functional approximation.
Theorem 37
(Asymptotic Theory of the Ramanujan–Santos–Sales Hypermodular Operator Theorem). Let
Φ λ , q ( x ) = j = 1 d ψ λ , q ( x j ) ,
be the anisotropic symmetrized hyperbolic kernel, where ψ λ , q : R R satisfies the following:
(i) 
ψ λ , q C ( R ) , even, strictly positive, and normalized:
R ψ λ , q ( x ) d x = 1 .
(ii) 
Spatial decay: For every β N 0 there exists α β > 0 such that
d β d x β ψ λ , q ( x ) C β e α β | x | .
(iii) 
Fourier decay: For every N N there exists C N > 0 such that
| ψ ^ λ , q ( ξ ) | C N ( 1 + | ξ | ) N .
Let
S λ , q ( ξ ) = k 0 σ k 1 A k ( ξ ) , σ k = e λ ( k mod q ) ,
with inf k σ k = σ min > 0 , and  { A k } a smooth anisotropic tiling of R d .
Define
m λ , q ( ξ ) = j = 1 d ψ ^ λ , q ( ξ j ) , T λ , q = F 1 m λ , q S λ , q F .
Then, the following apply:
  • Besov Space Isomorphism.
For 1 < p < , 1 r , and  s = ( s 1 , , s d ) ( 0 , ) d with s j > 1 / p , we have
T λ , q : B p , r s ( R d ) B p , r s ( R d )
as a bounded isomorphism, with 
T λ , q B p , r s B p , r s Γ 1 ( λ , q , s , d ) σ min 1 ,
where Γ 1 = C j = 1 d ( 1 2 q β j ) 1 / q , β j = s j 1 / p , and  q = r / ( r 1 ) .
  • Exponential N-Term Compressibility.
There exist C 1 , c 1 > 0 , depending on λ , q , s , d , α β , σ min , such that for all f B p , r s ( R d ) :
σ N ( T λ , q f ) L p C 1 e c 1 N α f B p , r s , α = 1 2 | s | , | s | = j = 1 d s j .
Moreover,
c 1 = κ · min λ , c σ min 1 / | s |
for some κ > 0 , where c is the Fourier decay constant.
  • Minimax-Optimal Linear Widths.
    d N T λ , q ( U B p , r s ) , L p N s min / d , s min = min 1 j d s j ,
    where U B p , r s is the unit ball in B p , r s ( R d ) and d N is the Kolmogorov N-width.
Proof. 
Symbol Regularity (Mihlin–Hörmander Condition). The combined symbol b ( ξ ) = m λ , q ( ξ ) S λ , q ( ξ ) satisfies for any multi-index α N 0 d ,
| ξ α b ( ξ ) | C α e c ξ 1 / 2 , ξ = j = 1 d | ξ j | , c = c 2 ,
where C α = O j = 1 d α j ! · α j α j . Thus, the following apply:
  • Leibniz rule applied to m λ , q and S λ , q
  • Derivative bounds: | ξ m ψ ^ λ , q | A m e c | ξ | 1 / 2
  • Optimization: max t 0 t | α | e c t 1 / 2 B α <
For M = d / 2 + 1 and | α | M , we have:
sup ξ ( 1 + ξ ) | α | | α b ( ξ ) | B α < .
The Calderón-Zygmund theorem then implies T λ , q is bounded on L p ( R d ) for 1 < p < .
Besov Boundedness. The dyadic projectors Δ k for the tiling { A k } satisfy
Δ k ( T λ , q f ) L p Ξ k Δ k f L p , sup k Ξ k Γ 2 ( λ , q , d ) · σ min 1 ,
where Γ 2 = C sup k F 1 [ b 1 A k ] M p . Summation over k in r ( N 0 d ) with weights 2 k · s yields
T λ , q f B p , r s ( R d ) Γ 1 f B p , r s ( R d ) , Γ 1 = Γ 2 · k 2 k · s r 1 / r .
Isomorphism via Parametrix. Define the parametrix P by
P g ^ ( ξ ) = b ( ξ ) 1 g ^ ( ξ ) ξ k k 0 A k 0 otherwise .
The remainder R = I P T λ , q satisfies
R B p , r s ( R d ) B p , r s ( R d ) Γ 3 e Γ 4 2 k 0 / 2 , Γ 3 , Γ 4 > 0 .
Choosing k 0 such that R < 1 / 2 , the Neumann series shows P T λ , q = I R is invertible, establishing that T λ , q is an isomorphism.
Exponential Compressibility. On each tile A k ,
sup ξ A k | m λ , q ( ξ ) | K d exp c 2 k / 2 .
The cardinality of tiles with index k is N k 2 k | s | . Ordering coefficients θ by | T λ , q f , ψ θ | gives
E ( n ) : = sup | θ | = n | T λ , q f , ψ θ | Γ 5 e Γ 6 n α , α = 1 2 | s | .
Stechkin’s inequality then yields
σ N ( T λ , q f ) L p n > N E ( n ) p 1 / p C 1 e c 1 N α f B p , r s ( R d ) .
Minimax Optimality. The upper bound follows from the isomorphism property and linear approximation in B p , r s ( R d ) :
inf dim V N = N sup f U T λ , q f P V N ( T λ , q f ) L p Γ 7 N s min / d .
For the lower bound, construct anisotropic wavelets { ψ θ } with disjoint supp ψ ^ θ A k θ , ψ θ B p , r s ( R d ) 1 , and near-orthogonality of T λ , q ψ θ . Gelfand width theory then gives
d N ( T λ , q ( U ) , L p ) Γ 8 N s min / d .
   □

Remarks

  • Exponent α : Originates from the interplay between spectral decay exp ( c 2 k / 2 ) and anisotropic tile growth N k 2 k | s | .
  • Constant sharpness: The formula for c 1 reflects the balance between kernel decay ( λ ) and modular spectral damping ( σ min ).
  • Minimax sharpness: The rate N s min / d matches the intrinsic approximation limit for mixed smoothness.
  • Geometric invariance: When s = ( s , 2 s , , d s ) and the tiling respects hyperbolic symmetry, T λ , q commutes with S O ( 1 , d 1 ) .

23. Application: Thermal Diffusion Benchmark

To assess the effectiveness of the proposed Hypermodular Neural Operators with Hyperbolic Symmetry (ONHSH), we consider the canonical problem of three-dimensional (3D) thermal diffusion, governed by the heat equation
t u ( x , y , z , t ) = Δ u ( x , y , z , t ) , ( x , y , z ) [ 1 , 1 ] 3 , t > 0 ,
with initial condition
u ( x , y , z , 0 ) = sin ( π κ x ) sin ( π κ y ) sin ( π κ z ) ,
where, κ N denotes the smoothness parameter. The analytical solution is given by
u ( x , y , z , T ) = e 3 ( π κ ) 2 T u ( x , y , z , 0 ) ,
which provides a closed-form reference for evaluating the accuracy of operator learning frameworks.
From a physical perspective, this setup models isotropic thermal diffusion in a homogeneous medium, where the Laplace operator enforces heat propagation and exponential damping characterizes energy dissipation over time. It is particularly well-suited for benchmarking operator architectures, as it isolates the effects of anisotropy, spectral filtering, and curvature sensitivity in controlled conditions.
We implemented and compared multiple operator-based solvers as follows:
  • ONHSH: integrates symmetric hyperbolic activations, modular spectral damping, and curvature-sensitive convolution kernels, reflecting both geometric adaptivity and arithmetic-informed regularization.
  • Fourier Neural Operator (FNO) [1]: employs global Fourier filters with exponential decay in the spectral domain.
  • Geo-FNO [4]: introduces coordinate deformations that account for geometric variability before spectral filtering.
  • NOGaP [6]: incorporates a probabilistic spectral filter with Gaussian perturbations to encode uncertainty.
  • Convolutional Baseline: local averaging with fixed kernels, representing classical low-pass filtering.
  • Gaussian Smoothing: isotropic smoothing implemented via convolution with Gaussian kernels.
Each operator is applied to the same initial condition, and the outputs are compared against the analytical solution u ( x , y , z , T ) at time T = 0.1 . The evaluation employs three error metrics,
MSE ( U ) = 1 N i = 1 N u i U i 2 , MAE ( U ) = 1 N i = 1 N | u i U i | , RMSE ( U ) = MSE ( U ) ,
where u i denotes the exact solution samples and U i the operator-predicted values.
Figure 2 and Figure 3 illustrate qualitative comparisons across operators. The three-dimensional scatter plots highlight global propagation patterns, while the two-dimensional slices (with thermal emphasis via the viridis colormap and isothermal contour overlays) emphasize localized diffusion behavior.
Overall, the ONHSH framework exhibits superior accuracy in capturing both the global exponential damping and the local anisotropic structures of the thermal field, outperforming baseline models across all error metrics. These results confirm the theoretical predictions regarding minimax-optimal approximation in anisotropic Besov spaces and illustrate the practical advantages of hypermodular-symmetric operator design.

Numerical Analysis of Error Metrics

To evaluate the accuracy of the proposed operators, we employed three complementary error metrics: the Mean Absolute Error (MAE), the Mean Squared Error (MSE), and the Root Mean Squared Error (RMSE). These metrics capture different aspects of approximation quality: MAE reflects the average magnitude of deviations, MSE emphasizes larger deviations due to its quadratic form, and RMSE provides a scale-preserving measure of overall discrepancy. The definitions are given by
MAE = 1 N i = 1 N u i u ^ i ,
MSE = 1 N i = 1 N u i u ^ i 2 ,
RMSE = 1 N i = 1 N u i u ^ i 2 .
The comparative analysis of neural operators—specifically, ONHSH, FNO, Geo-FNO, NOGaP, Convolution, and Gaussian smoothing, reveals distinct performance characteristics in terms of accuracy, robustness, and adaptability to geometric and spectral complexities. The results, as visualized in the provided MAE, MSE, and RMSE plots, offer critical insights into their relative strengths and limitations.

24. Analysis of Neural Operators

24.1. ONHSH: A Promising Framework for Hypermodular and Anisotropic Domains

The ONHSH operator represents a groundbreaking advancement in neural operator learning, integrating hyperbolic symmetry, modular spectral damping, and curvature-sensitive kernels. As depicted in Figure 4, while its error metrics (MAE 0.278 , MSE 0.136 , RMSE 0.369 ) are higher than those of Geo-FNO, these results must be contextualized within the operator’s theoretical foundation, rooted in the Ramanujan–Santos–Sales Hypermodular Operator, which guarantees minimax-optimal approximation rates in anisotropic Besov spaces B p , q s ( R d ) .
This rigorous mathematical framework positions ONHSH as a promising and innovative paradigm for addressing challenges in complex, anisotropic, and curved domains, where conventional operators often exhibit limitations. Its unique architecture, combining hyperbolic activations, modular spectral filtering, and curvature-aware convolutional kernels, enables the capture of intricate geometric and spectral features that are critical in applications such as the following:
  • Relativistic partial differential equations (PDEs) on Lorentzian manifolds,
  • Thermal diffusion in modular and arithmetic-enriched domains,
  • High-frequency dynamics in anisotropic media.
The higher error metrics observed in Figure 4 reflect not a limitation of the ONHSH framework itself, but rather the increased complexity of the problems it is designed to solve, problems that often lie beyond the reach of traditional spectral methods. Future work will focus on the following:
  • Optimizing the hyperbolic symmetry parameters for improved empirical performance,
  • Exploring adaptive modular damping strategies to mitigate over-smoothing,
  • Leveraging the operator’s inherent Lorentz invariance for relativistic applications.

24.1.1. Strengths of ONHSH

  • Mathematical Rigor: ONHSH is built upon a robust theoretical framework, ensuring minimax-optimal approximation rates in anisotropic Besov spaces.
  • Geometric Adaptivity: Its hyperbolic symmetry and curvature-sensitive kernels make it inherently suitable for non-Euclidean geometries, including relativistic PDEs and modular domains.
  • Spectral Flexibility: The modular spectral damping mechanism allows for fine-grained control over oscillatory behavior, making it adaptable to high-frequency dynamics.

24.1.2. Challenges and Future Directions

  • Parameter Sensitivity: ONHSH’s performance is highly dependent on the selection of hyperbolic symmetry parameters and modular damping factors. Future work should focus on automated parameter optimization to enhance its practical applicability.
  • Computational Overhead: The complexity of ONHSH’s architecture may introduce computational challenges. However, advancements in parallel computing and GPU acceleration could mitigate these issues.

24.2. Geo-FNO: The Benchmark for Geometric Adaptivity

The Geo-FNO operator remains the gold standard for geometric adaptivity, achieving the lowest error metrics across all evaluations as follows:
  • MAE 0.012
  • MSE 0.0003
  • RMSE 0.018
Geo-FNO’s success is attributed to its geometric deformation mechanism, which dynamically aligns the spectral basis with the underlying domain geometry. This makes it particularly effective for complex, non-Euclidean domains.

24.3. FNO, NOGaP, Convolution, and Gaussian: Reliable but Limited

The FNO, NOGaP, Convolution, and Gaussian smoothing operators demonstrated intermediate performance, with error metrics clustered around the following values:
  • MAE 0.215
  • MSE 0.095 0.102
  • RMSE 0.295 0.320
While these methods are stable and computationally efficient, they lack the geometric adaptivity of ONHSH and Geo-FNO, limiting their accuracy in anisotropic or curved spaces.

25. Comparative Summary

The analysis underscores the unique strengths of the ONHSH operator as a promising and theoretically rigorous framework for neural operator learning, particularly in anisotropic and curved domains. While Geo-FNO currently establishes the benchmark for accuracy in structured and mildly deformed geometries, ONHSH distinguishes itself through its mathematical depth and geometric adaptivity, positioning it as a strong candidate for future advancements in operator learning. A concise comparison of the main operator families discussed in this section is provided in Table 1.
ONHSH’s foundation in the Ramanujan–Santos–Sales Hypermodular Operator ensures minimax-optimal approximation rates in anisotropic Besov spaces B p , q s ( R d ) . Its integration of hyperbolic symmetry, modular spectral damping, and curvature-sensitive kernels enables robust performance in complex, high-frequency, and non-Euclidean settings. This makes ONHSH particularly well-suited for applications involving the following:
  • Relativistic partial differential equations (PDEs) on Lorentzian manifolds,
  • Thermal diffusion in modular and arithmetic-enriched domains,
  • High-frequency dynamics in anisotropic media.
In such contexts, where traditional operators often struggle to maintain accuracy and stability, ONHSH’s ability to capture intricate geometric and spectral features provides a significant advantage.

26. Algorithmic Pipeline

The numerical experiments were designed to rigorously evaluate the accuracy, robustness, and geometric adaptability of both classical and advanced neural operator architectures. The focus was on a benchmark three-dimensional (3D) thermal diffusion problem, which serves as a representative test case for operator learning in anisotropic and curved domains. The algorithmic pipeline consists of four key stages: data generation, operator application, error quantification, and professional visualization. Below, we detail each stage and its role in the experimental workflow. A schematic representation of this pipeline is provided in Figure 5:
  • Data Generation. A synthetic three-dimensional thermal diffusion field was generated using sinusoidal initial conditions and exact analytical solutions of the heat equation. This setup ensures controlled smoothness through a tunable frequency parameter, providing a precise ground-truth reference for subsequent evaluations. The generated data captures both isotropic and anisotropic diffusion regimes, enabling a comprehensive assessment of operator performance under varying geometric and spectral conditions.
  • Operator Layers. Multiple operator-based models were implemented to propagate the initial thermal conditions and approximate the solution field. The evaluated architectures include the following:
    • ONHSH: The proposed Hypermodular Neural Operator with Hyperbolic Symmetry, integrating curved convolutional kernels, hyperbolic activations, and modular spectral filters. This architecture is designed to adapt to anisotropic and curved domains, leveraging the Ramanujan–Santos–Sales Hypermodular Operator for minimax-optimal approximation rates.
    • FNO: The Fourier Neural Operator, which employs global spectral filtering to capture long-range dependencies in structured domains.
    • Geo-FNO: A geometric variant of FNO that incorporates domain deformations prior to spectral filtering, enhancing adaptability to non-Euclidean geometries.
    • NOGaP: The Neural Operator-induced Gaussian Process, which combines operator learning with probabilistic perturbations for uncertainty quantification.
    • Baselines: Classical methods such as convolutional averaging and Gaussian smoothing were included to provide a reference for traditional approaches.
  • Error Metrics. The predicted thermal fields were quantitatively assessed against the exact solution using standard error norms, see Equations (553)–(555). These metrics provide complementary insights into performance as follows:
    • MSE captures the global variance and sensitivity to outliers.
    • MAE reflects absolute deviations and robustness to noise.
    • RMSE offers a balanced measure of root-mean-square stability.
  • Visualization. High-quality comparative visualizations were generated using the viridis colormap, optimized for thermal emphasis and perceptual uniformity. The following two complementary visualization strategies were employed:
    • Three-dimensional scatter plots to illustrate volumetric diffusion structures and spatial gradients.
    • Two-dimensional mid-plane slices enriched with isothermal contour lines to highlight anisotropic gradients and local variations.

27. Introduction to the ONHSH Algorithm

The Hypermodular Neural Operators with Hyperbolic Symmetry (ONHSH) algorithm introduces a novel framework for solving partial differential equations (PDEs) on highly complex geometric domains. By uniting deep theoretical insights with efficient computational strategies, ONHSH effectively addresses challenges that arise in anisotropic, curved, and modular structures, where conventional neural operators often fail to provide rigorous guarantees.

27.1. Theoretical Foundations

The ONHSH algorithm is firmly grounded in the Ramanujan–Santos–Sales Hypermodular Operator, which establishes a unified analytical basis for neural approximation in non-Euclidean contexts. Its contributions can be summarized as follows:
  • Minimax-optimal approximation rates in anisotropic Besov spaces, ensuring best-possible convergence under directional smoothness.
  • Spectral bias–variance trade-offs, providing precise characterizations of approximation errors across frequency regimes.
  • Geometric adaptivity through curvature-sensitive kernels that intrinsically follow domain geometry.
  • Noncommutative connections, linking spectral variance phenomena to principles of noncommutative geometry.

27.2. Algorithmic Components

The implementation of ONHSH is built upon three synergistic components designed to guarantee both theoretical rigor and computational robustness as follows:
  • Symmetrized Hyperbolic Activation:
    ψ λ , q ( x ) = 1 2 tanh ( λ x ) + tanh ( λ q x ) ,
    which ensures Lorentz invariance and stability under non-Euclidean transformations.
  • Modular Spectral Filtering:
    m n ( ξ ) = k Z d q n k 2 χ k ( ξ ) , q n = e π n 1 / 2 ,
    designed to incorporate arithmetic-informed damping for precise control of oscillatory modes.
  • Curvature-Sensitive Kernels:
    K ( x , y , z ) = exp x 2 + y 2 + z 2 2 σ 2 ,
    which adaptively capture intrinsic geometric variations within the domain.

27.3. Comparative Advantages

Next, the Table 2 highlights the distinct advantages of ONHSH in comparison with other neural operator methodologies.

27.4. Implementation Pipeline and Applications

The ONHSH algorithm is deployed through the following structured computational pipeline:
  • Generation of three-dimensional thermal diffusion datasets with controlled smoothness profiles.
  • Application of the ONHSH operator, integrating hyperbolic activations and modular filtering mechanisms.
  • Evaluation of performance using rigorous error metrics (MSE, MAE, RMSE), supported by theoretical validation.
  • Production of high-quality visualizations, employing perceptually uniform color maps such as viridis.
Practical applications of ONHSH span a wide range of domains, including anisotropic thermal analysis, fluid–structure interactions, and relativistic models where Lorentz invariance is essential.

27.5. Key Benefits

The principal advantages of ONHSH can be summarized as follows:
  • Guaranteed minimax-optimal approximation rates in anisotropic settings.
  • Natural adaptability to highly complex and curved geometries.
  • Stable control of high-frequency dynamics via modular spectral filtering.
  • Inherent Lorentz invariance, enabling compatibility with relativistic frameworks.
  • Strong empirical robustness across challenging PDE benchmarks.
In summary, the ONHSH algorithm bridges the gap between advanced mathematical theory and scalable computational practice. By coupling rigorous operator-theoretic guarantees with practical adaptability, it provides a powerful and versatile tool for solving PDEs in domains that challenge traditional neural operator architectures.

27.6. ONHSH Algorithm with Ramanujan–Santos–Sales Hypermodular Operator Integration

Theorem Integration Notes

  • Minimax-Optimal Rates: The modular spectral filter enforces the O ( n s min / d ) convergence rate from the Ramanujan–Santos–Sales Hypermodular Operator.
  • Anisotropic Besov Spaces: The implementation implicitly works in B p , q s ( R 3 ) where the following apply:
    -
    s = ( s 1 , s 2 , s 3 ) with s j > 1 p
    -
    Embedding into C 0 ( Ω ¯ ) is guaranteed (Theorem 4).
  • Spectral Bias-Variance Trade-off: The parameter q controls the trade-off as formalized in
    T n ( f ) ( x ) = f ( x ) + 1 2 n j β j 2 f x j 2 ( x ) + R n ( f ) ( x ) ,
    where R n ( f ) L p C n γ f B p , q 2 s .
  • Geometric Adaptivity: The curved kernel implementation respects the Lorentz invariance and Riemannian manifold.
  • Modular Correspondence: The spectral filter’s construction follows:
    m n ( ξ ) = k Z 3 q n k 2 χ k ( ξ ) , q n = e π n 1 / 2 ,
    linking to the arithmetic topology.
Before presenting the formal algorithm, it is useful to outline how the computational structure reflects the analytical framework developed in the previous sections. The Ramanujan–Santos–Sales Hypermodular Operator Theorem provides not only the asymptotic guarantees for approximation and spectral stability, but also a natural decomposition of the numerical pipeline into conceptually coherent stages.
The implementation begins with the construction of a three-dimensional (3D) regular grid and the generation of initial data consistent with the anisotropic Besov class B p , q s ( R 3 ) . This step is essential: the theoretical approximation rates derived earlier rely on the smoothness encoded in s and on the embedding properties of the underlying functional space. Once this regularity structure is in place, the method proceeds to assemble the hypermodular operator.
The ONHSH core consists of three interacting components. First, a geometrically curved convolution introduces spatial adaptivity aligned with the operator’s intrinsic geometry. Second, a symmetrized hyperbolic activation enforces the Lorentz-type symmetry that characterizes the hypermodular setting, ensuring stability under S O ( 1 , 2 ) -invariant transformations. Third, a modular spectral filter selectively damps frequencies according to the modular parameter q, providing precise control over spectral bias–variance trade-offs and enabling minimax-optimal approximation rates.
A subsequent error-analysis stage verifies whether the numerical outcome is consistent with the theoretical predictions, such as the O ( n s min / d ) approximation rate and the exponential-type refinement e c n 1 / 4 appearing in the hypermodular regime. Metrics such as MSE, MAE, and RMSE are evaluated and compared with the expected asymptotic bounds.
The final execution block orchestrates the full pipeline: data generation, operator application, metric evaluation, and theoretical validation. This structure makes it possible to compare ONHSH with state-of-the-art operator-learning architectures, such as, FNO, Geo-FNO, and NOGaP, under a unified computational and analytical framework.
Algorithm 1 below, summarizes all these components in a clear and modular form, highlighting the correspondence between the theoretical guarantees of the Hypermodular Operator Theorem and their practical numerical realization.
Algorithm 1 Ramanujan–Santos–Sales Hypermodular Operator Theorem Computational Implementation
Require: Grid size N, time T, smoothness α , hyperbolic parameter λ , modular parameter q
Ensure: Processed field with theoretical guarantees from Ramanujan–Santos–Sales
Hypermodular Operator Theorem
1. Data Generation (Anisotropic Besov Space)
1:
Generate grid: x , y , z linspace ( 1 , 1 , N )
2:
Create mesh: X , Y , Z meshgrid ( x , y , z )
3:
Initial condition: u 0 sin ( α π X ) sin ( α π Y ) sin ( α π Z )
4:
Verify: u 0 B p , q s ( R 3 ) where s = ( α , α , α ) satisfies s j > 1 p
2. ONHSH Core Components
5:
function SymHyperbolicActivation( x , λ , q )
6:
    return  0.5 ( tanh ( λ x ) + tanh ( λ q x ) )
7:
end function
8:
function ModularSpectralFilter( λ , q , n )
9:
     k x , k y , k z fftfreq ( N )
10:
     K X , K Y , K Z meshgrid ( k x , k y , k z )
11:
    return  d { X , Y , Z } exp λ ( abs ( K d ) mod q ) 2 n 1 / 2
12:
end function
13:
function ONHSH-Layer( u 0 , λ , q , n , σ )
14:
    Apply curved convolution with kernel exp x 2 + y 2 + z 2 2 σ 2
15:
     u act SymHyperbolicActivation ( u conv , λ , q )
16:
     U FFT ( u act )
17:
     F ModularSpectralFilter ( λ , q , n )
18:
    return  Real ( IFFT ( U · F ) )
19:
end function
3. Theoretical Guanrantees (Ramanujan–Santos–Sales Hypermodular Operator Theorem)
20:
Approximation Rates: O ( n s min / d ) where s min = min ( s )
21:
Spectral Bias-Variance: Controlled via modular damping parameter q
22:
Embedding: B p , q s ( Ω ) C 0 ( Ω ¯ )
23:
Lorentz Invariance: Kernels respect S O ( 1 , 2 ) symmetry
4. Error Analysis with Theoretical Bounds
24:
function Calculate-Metrics( u T , u pred )
25:
     MSE mean ( ( u T u pred ) 2 )
26:
     MAE mean ( abs ( u T u pred ) )
27:
     RMSE MSE
28:
    Verify: RMSE C · n γ
29:
    return  { MSE , MAE , RMSE }
30:
end function
5. Main Execution with Theoretical Validation
31:
Set parameters: N = 30 , T = 0.1 , α = 1 , λ = 2.0 , q = 0.3 , n = 20
32:
Generate data: u 0 , u T DataGeneration ( N , T , α )
33:
Verify: u 0 B 2 , 2 s ( R 3 ) with s = ( 1 , 1 , 1 )
34:
Define operators: { ONHSH , FNO , Geo - FNO , NOGaP }
35:
Apply ONHSH: u ONHSH ONHSH - Layer ( u 0 , λ , q , n , σ = 0.3 )
36:
Compute metrics: m e t r i c s CalculateMetrics ( u T , u ONHSH )
37:
Validate: m e t r i c s [ RMSE ] C · e c n 1 / 4
38:
References: [1,4,16,23,25].

28. Quantitative and Qualitative Analysis of Numerical Results

In this section, we present a detailed analysis of the numerical results obtained for the ONHSH operator compared to other neural operators and classical methods. Figure 6 and Figure 7 illustrate the performance of these operators in terms of Mean Squared Error (MSE) as a function of grid size and time, respectively.

28.1. Quantitative Analysis

28.1.1. MSE vs. Grid Size

Figure 6 shows the behavior of MSE as a function of grid size for the operators ONHSH, FNO, Geo-FNO, NOGaP, Convolution, and Gaussian. Key observations include the following:
  • The ONHSH operator exhibits systematically higher errors compared to Geo-FNO, which sets the accuracy benchmark for problems in complex geometric domains. However, the error for ONHSH remains stable and comparable to FNO and NOGaP, particularly for larger grid sizes.
  • The error for ONHSH increases from approximately 0.13 to 0.14 as the grid size grows from 18 to 30, indicating moderate sensitivity to spatial discretization.
  • The Convolution and Gaussian operators show significantly lower and stable errors but are limited to simple domains and fail to capture the geometric and spectral complexity addressed by ONHSH.
Theoretical Interpretation:
The behavior of ONHSH reflects its capability to handle anisotropic and curved domains, as established by the Ramanujan–Santos–Sales Hypermodular Operator. Although its error is higher than that of Geo-FNO, ONHSH is designed for problems where hyperbolic symmetry and geometric adaptability are crucial, such as in relativistic PDEs and thermal diffusion in modular domains.

28.1.2. MSE vs. Time

Figure 7 illustrates the evolution of MSE as a function of time T for the same set of operators. Key points include the following:
  • The ONHSH operator starts with an error of approximately 0.09 at T = 0.05 , which increases to about 0.14 at T = 0.30 . This growth is more pronounced at early times, stabilizing at later times.
  • The Geo-FNO operator maintains a consistently low error, reinforcing its effectiveness in smooth geometric domains.
  • The FNO and NOGaP operators exhibit intermediate behavior, with errors growing similarly to ONHSH but with lower absolute values.
The time-dependent error behavior of ONHSH aligns with its ability to capture high-frequency dynamics and modular effects, as discussed in Section 26. The stabilization of error at later times suggests that the operator reaches a regime where spectral adaptability and hyperbolic symmetry are fully leveraged, ensuring robust approximation in complex domains.

28.2. Qualitative Analysis

28.2.1. Advantages of ONHSH

The ONHSH operator stands out due to the following qualitative characteristics:
  • Geometric Adaptability: The integration of curved kernels and hyperbolic symmetry enables ONHSH to effectively capture the geometry of anisotropic and curved domains, overcoming limitations of traditional operators such as FNO and Convolution.
  • Theoretical Rigor: Grounded in the Ramanujan–Santos–Sales Hypermodular Operator Theorem, ONHSH guarantees minimax-optimal approximation rates in anisotropic Besov spaces, providing a solid mathematical foundation for its application.
  • Modular Spectral Filtering: The incorporation of modular spectral filters allows for refined control over oscillatory behaviors, which is essential for problems involving high-frequency and arithmetic structures.

28.2.2. Comparison with Other Operators

  • Geo-FNO: While Geo-FNO exhibits lower errors, its applicability is limited to domains with smooth deformations. ONHSH, on the other hand, is designed for domains with intrinsic curvature and extreme anisotropy.
  • FNO and NOGaP: These operators offer a balance between accuracy and generality but lack the geometric adaptability and theoretical rigor of ONHSH.
  • Convolution and Gaussian: Limited to simple domains, these methods serve as classical baselines but are unsuitable for complex domain problems where ONHSH excels.
The numerical results confirm that the ONHSH operator is a powerful tool for problems in anisotropic and curved domains, where its geometric adaptability and theoretical foundation provide significant advantages over traditional operators. Although ONHSH exhibits higher errors compared to Geo-FNO, its ability to handle geometric complexity and high-frequency dynamics positions it as a promising candidate for advanced applications in relativistic PDEs, thermal diffusion in modular domains, and other problems where hyperbolic symmetry and spectral adaptability are essential.

29. Results

29.1. Problem Setup and Evaluation Protocol

We evaluate ONHSH exclusively on the canonical three-dimensional (3D) heat equation t u = Δ u over Ω = [ 1 , 1 ] 3 with sinusoidal initial condition
u ( x , y , z , 0 ) = sin ( π κ x ) sin ( π κ y ) sin ( π κ z ) .
The closed-form target at time T is u ( x , y , z , T ) = e 3 ( π κ ) 2 T u ( x , y , z , 0 ) , which we use as ground truth for error assessment in the manuscript). We report Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), enabling direct comparison against baseline operators under a common protocol.

29.2. Quantitative Accuracy on Thermal Diffusion

Table 3 (see also Figure 4 in the manuscript) places ONHSH alongside Fourier Neural Operator (FNO), Geo-FNO, NOGaP, a convolutional baseline, and Gaussian smoothing. In this isotropic diffusion test, Geo-FNO establishes the accuracy benchmark, while ONHSH exhibits noticeably larger errors: for ONHSH we observe MAE 0.278 , MSE 0.136 , RMSE 0.369 ; Geo-FNO attains MAE 0.012 , MSE 3 × 10 4 , RMSE 0.018 . FNO, NOGaP, Convolution and Gaussian cluster around MAE 0.215 , MSE 0.095–0.102, RMSE 0.295–0.320. Despite the gap to Geo-FNO on this smooth, structured scenario, ONHSH remains numerically stable and comparable to FNO/NOGaP across all norms.

29.3. Resolution and Time Studies

We further probe sensitivity to spatial resolution and final time using the MSE curves in Figure 6 and Figure 7. As the grid size grows from N = 18 to N = 30, ONHSH’s MSE increases mildly from ∼0.13 to ∼0.14, indicating moderate dependence on discretization but no instability. In the time study, the MSE starts near 0.09 at T = 0.05 and rises to ∼0.14 by T = 0.30, with steeper growth at early times followed by stabilization. These profiles are consistent with diffusion-driven damping and with the model’s spectral regularization: early-time, higher-frequency content is harder to approximate, while later-time fields are smoother and less sensitive.

29.4. Qualitative Comparisons

Figure 2 (3D scatter) and Figure 3 (2D slices with isothermal contours) show that ONHSH preserves the global exponential damping and recovers salient structures of the thermal field, yet exhibits higher deviations around sharp thermal gradients relative to Geo-FNO. This aligns with the quantitative ranking above and with ONHSH’s design goals: Hyperbolic symmetry and modular spectral control are intended for anisotropic/curved regimes rather than the present isotropic benchmark.

29.5. Takeaways for ONHSH

On the single-task thermal diffusion benchmark considered here, ONHSH does not surpass Geo-FNO but remains competitive with FNO/NOGaP and exhibits stable scaling in space and time. Given its theoretical guarantees in anisotropic Besov classes and its geometry-aware construction, we expect ONHSH’s comparative advantages to surface in settings with pronounced anisotropy, curvature or arithmetic structure; evaluating such regimes is a natural next step.

30. Conclusions

This paper introduced the Hypermodular Neural Operators with Hyperbolic Symmetry (ONHSH), a framework that combines harmonic analysis, anisotropic function space theory, and spectral geometry with neural operator learning. At its theoretical core, the Ramanujan–Santos–Sales Hypermodular Operator Theorem provided minimax-optimal approximation rates in anisotropic Besov and Triebel–Lizorkin spaces, while Voronovskaya-type expansions established a precise asymptotic description of bias–variance trade-offs. These results clarify not only convergence guarantees but also the structural reasons behind the enhanced stability of the ONHSH operators.
The empirical evaluation on three-dimensional thermal diffusion highlighted how the proposed operators achieve both spectral fidelity and geometric robustness. Unlike classical Fourier Neural Operators and Geo-FNO, ONHSH consistently resolved high-frequency modes without introducing spurious oscillations, even under anisotropic scaling and curvature effects. The numerical decay of the error matched closely the theoretical minimax predictions, providing strong evidence that the analytic foundations directly translate into computational performance.
Beyond the specific diffusion experiments, the present framework suggests several avenues of extension. The modular spectral damping mechanism can be adapted to transport-dominated PDEs, where aliasing and oscillatory instabilities remain a challenge. The hyperbolic symmetry of the kernels indicates compatibility with relativistic PDEs and Lorentz-invariant models, broadening the scope of applications to mathematical physics. Moreover, the explicit connection to noncommutative Chern characters points toward a new spectral–topological layer of interpretability in neural operators, potentially linking approximation theory with index-theoretic invariants.
In summary, ONHSH provides a mathematically rigorous and geometry-adaptive paradigm for neural operator learning. Its combination of theoretical sharpness, empirical accuracy, and structural interpretability situates it as a unifying framework at the intersection of harmonic analysis, approximation theory, and machine learning. Future work will focus on extending the operators to nonlinear and stochastic PDEs, refining uncertainty quantification in anisotropic regimes, and exploring applications in plasma turbulence, relativistic transport, and nuclear reactor modeling, where anisotropy and curvature play a defining role.

Author Contributions

R.D.C.d.S., conceptualization, methodology and numerical simulation, code development in Python v. 3.14, mathematical analysis; R.D.C.d.S. and J.H.d.O.S., investigation; R.D.C.d.S. and J.H.d.O.S., resources and writing; R.D.C.d.S. and J.H.d.O.S., original draft preparation; R.D.C.d.S., writing—review and editing; J.H.d.O.S., supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financed by Universidade Estadual de Santa Cruz (UESC)/Fundação de Ampararo à Pesquisa do Estado da Bahia (FAPESB).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

Santos, R. D. C., gratefully acknowledges the support of the PPGMC Program for the Postdoctoral Scholarship PROBOL/UESC nr. 218/2025. Sales, J. H. O., would like to express his gratitude to CNPq for the financial support under grant 308816/2025-0. This study was financed in party by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior–Brasil (CAPES)–Finance Code 001, and Fundação de Ampararo à Pesquisa do Estado da Bahia (FAPESB).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations and Nomenclature

Acronyms
ONHSHHypermodular Neural Operators with Hyperbolic Symmetry
PDEPartial Differential Equation
FNOFourier Neural Operator
FSOFourier-Sobolev Operator
NOGaPNeural Operator-induced Gaussian Process
Mathematical Symbols
f, G ( f ) Input and output functions in operator learning
A n , T n Neural operators at discretization level n
Φ λ , q Anisotropic kernel with curvature λ and modularity q
ψ λ , q Symmetrized hyperbolic activation kernel
g q , λ Base hyperbolic activation function
M q , λ Central difference kernel
B p , q s ( R d ) Anisotropic Besov space with regularity vector s = ( s 1 , , s d )
X , H Shimura variety and upper half-plane
Ch ( T n ) Chern character of the operator family T n
Ω n Curvature form d T n d T n
σ spec 2 Spectral variance term
L 1 , Macaev ideal used for Dixmier traces
Δ h r , j r-th order directional difference operator
ω r , j p Directional modulus of smoothness
Key Parameters
λ Curvature scaling factor (controls spatial localization)
qModular deformation parameter ( 0 < q < 1 )
s j Anisotropic smoothness index in direction j
s min Minimum smoothness: min j s j
β j Embedding gain coefficient: s j 1 / p
c, CExponential decay constants (e.g., e c n 1 / 4 )
Operators and Spaces
F , F 1 Fourier transform and its inverse
· B p , q s Norm in anisotropic Besov space
· L p L p -norm
f , g Inner product or duality pairing
Tr , Tr ω Standard trace and Dixmier trace
S O ( 1 , d 1 ) Lorentz group of hyperbolic symmetries
Continuous embedding
Norm equivalence
Tensor product (used in kernel construction)
Wedge product (differential forms)
Special Functions
G 2 m ( q ) Eisenstein series: k = 1 σ 2 m 1 ( k ) q k
σ r ( k ) Divisor sum: d | k d r
ζ ( s ) Riemann zeta function
E λ ( q ) Damping factor: n = 1 e 2 λ n q n
Greek Letters
λ Curvature parameter controlling spatial decay
qModular deformation parameter ( 0 < q < 1 )
σ i j ( λ , q ) ( x ) Local spectral covariance associated with Φ λ , q
Δ x , Δ ξ Spatial and spectral spread (uncertainty)
Γ ( · ) Gamma function (used in moment calculations)
Γ ( z ) Gamma function (valid for complex z with ( z ) > 0 )
Indices and Notation
i , j Coordinate indices in R d
nResolution or discretization index
dSpatial dimension
s j Smoothness index in anisotropic direction j
p , q Norm and summability parameters in Besov spaces
s ¯ Harmonic mean of anisotropic smoothness indices

Appendix A. Functional-Analytic Notation Used in the Paper

This appendix contains supplementary proofs and technical lemmas used in the analysis of the hypermodular operator. For a comprehensive treatment of anisotropic Besov spaces, we refer to [16,19]. The characterization via directional smoothness moduli follows [16], while the equivalence with Peetre functionals builds upon [17].

Appendix A.1. Norms and Function Spaces

For a measurable function f : R d R and 1 p < , we denote the usual Lebesgue norm by
f L p : = R d | f ( x ) | p d x 1 / p .
When f L p is finite, we say f L p ( R d ) .
When differentiability is required, we use the Sobolev norm
f W k , p : = | α | k D α f L p ,
where D α denotes weak derivatives. Only first-order Sobolev norms are used in this work ( k = 1 ).

Appendix A.2. Norm Equivalence and Embedding Notation

Expressions of the form
A B ,
mean that there exists a constant C > 0 , independent of the variables involved, such that A C B . When both A B and B A hold, we write
A B ,
which means that A and B are equivalent up to multiplicative constants.
This notation is used in the manuscript only to compare norms that behave similarly under scaling. No explicit knowledge of embedding theorems is required to follow the arguments. This appendix is intended only as a reading aid; the main arguments can be followed without consulting additional functional analysis references.

Appendix B. Standing Hypotheses and Auxiliary Lemmas

Throughout the paper we work either on R d or on a compact d-dimensional Riemannian manifold M without boundary. This appendix makes explicit the technical assumptions invoked repeatedly in Section 9, Section 10, Section 11, Section 12, Section 13, Section 14, Section 15, Section 16, Section 17, Section 18, Section 19 and Section 20 and gathers auxiliary lemmas that support the main theorems. Each hypothesis is cited at the point of use, with the aim of making the analytic and spectral arguments fully transparent.
The implementation of the ONHSH operator leverages techniques from noncommutative harmonic analysis and modular spectral filtering. For further details on discretizing integral operators on curved domains, see [1,5]. The construction of bases adapted to hyperbolic geometries is discussed in [6], while regularization via noncommutative Chern characters follows [25], and the minimax error estimates are grounded in [16,23].

Appendix B.1. Kernel and Multiplier Hypotheses

Let { ψ λ , q : R d R } λ > 0 , 0 < q < 1 denote the family of hypermodular–hyperbolic kernels defining ONHSH operators. We assume:
(H1)
Schwartz regularity. For each ( λ , q ) , ψ λ , q S ( R d ) . Equivalently, for every multiindex α and integer m 0 there exists C α , m ( λ , q ) with
sup x R d ( 1 + | x | ) m | α ψ λ , q ( x ) | C α , m ( λ , q ) .
This guarantees absolute convergence of Fourier transforms and moment integrals and allows the exchange of limits in asymptotic expansions.
(H2)
Finite moments. There exists M 6 (or larger, if higher-order Voronovskaya expansions are required) such that for all | β | M ,
μ β ( λ , q ) : = R d x β ψ λ , q ( x ) d x
is finite and depends smoothly on ( λ , q ) . These moments appear explicitly in bias terms of asymptotic expansions.
(H3)
Parameter regularity. The Schwartz seminorms of ψ λ , q vary smoothly in ( λ , q ) . Differentiation in λ and q can be interchanged with integration whenever an integrable majorant exists. This ensures well-defined parametric differentiation of operators in proofs of stability and minimax bounds.
(H4)
Spectral multiplier decay. The Fourier multiplier σ λ , q ( ξ ) = ψ λ , q ^ ( ξ ) satisfies, for some A > 0 , s > d and all multiindices α ,
| ξ α σ λ , q ( ξ ) | C α ( 1 + | ξ | ) s .
This guarantees smoothing, compactness, and Schatten-class membership of the resulting operators.

Appendix B.2. Geometric and Operator Hypotheses (Chern/Index Arguments)

When invoking heat-kernel asymptotics, zeta regularization, or noncommutative Chern character computations we assume:
(G1)
The operator families ( D t ) considered (Laplace-type or elliptic pseudodifferential operators on M) are essentially self-adjoint, classical elliptic of positive order, and have discrete spectrum { λ k } with | λ k | .
(G2)
Heat-kernel expansion and zeta continuation. As t 0 ,
Tr ( e t D 2 ) j = 0 a j t ( j d ) / 2 ,
with a j local invariants (curvature, symbol coefficients). The spectral zeta function ζ D 2 ( s ) = λ k 0 λ k 2 s admits meromorphic continuation to C with only simple poles at prescribed locations. These hypotheses are standard (see Gilkey, Seeley, Connes–Moscovici) and ensure the analytic validity of index-theoretic and Chern-character identities.

Appendix B.3. Function-Space Hypotheses

(F1)
The anisotropic smoothness vector s = ( s 1 , , s d ) satisfies s j > 1 / p for all j whenever embedding into continuous functions is required (matching Theorem 3 of the main text). In the presence of critical indices s j = 1 / p , one either excludes that index from embedding claims or strengthens hypotheses (via VMO/logarithmic refinements).

Appendix B.4. Auxiliary Lemmas

Lemma A1
(Dominated exchange of sum and integral). Let { ϕ k ( x ) } k Z d be measurable functions on R d . If there exists M L 1 ( R d ) with | ϕ k ( x ) | M ( x ) for all k, then
k ϕ k = k ϕ k .
Proof. 
Immediate from Tonelli–Fubini. In applications, M is constructed from Schwartz seminorm bounds (H1) and polynomial weights. □
Lemma A2
(Poisson summation in S ). If f S ( R d ) then
k Z d f ( x + k ) = m Z d f ^ ( 2 π m ) e 2 π i m · x ,
with absolute and uniform convergence in x. This lemma underlies periodic Voronovskaya-type expansions.
Lemma A3
(Schatten membership from kernel decay). Let K ( x , y ) be an integral kernel on a compact M such that K ( · , y ) H x s C ( 1 + λ ) r uniformly in y, with similar control in x. Then the associated operator belongs to the Schatten class S p for suitable ( r , s , p ) (cf. Simon). This ensures compatibility with Dixmier traces and noncommutative integration.

References

  1. Li, Z.; Kovachki, N.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; An kumar, A. Fourier neural operator for parametric partial differential equations. arXiv 2020, arXiv:2010.08895. [Google Scholar] [CrossRef]
  2. Lu, L.; Jin, P.; Pang, G.; Zhang, Z.; Karniadakis, G.E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 2021, 3, 218–229. [Google Scholar] [CrossRef]
  3. Serrano, L.; Le Boudec, L.; Koupaï, A.K.; Wang, T.X.; Yin, Y.; Vittaut, J.N.; Gallinari, P. Operator learning with neural fields: Tackling pdes on general geometries. In Proceedings of the Advances in Neural Information Processing Systems, 36, New Orleans, LA, USA, 10–16 December 2023; pp. 70581–70611. [Google Scholar]
  4. Li, Z.; Huang, D.Z.; Liu, B.; An kumar, A. Fourier neural operator with learned deformations for pdes on general geometries. J. Mach. Learn. Res. 2023, 24, 1–26. [Google Scholar]
  5. Wu, H.; Weng, K.; Zhou, S.; Huang, X.; Xiong, W. Neural manifold operators for learning the evolution of physical dynamics. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 3356–3366. [Google Scholar] [CrossRef]
  6. Kumar, S.; Nayek, R.; Chakraborty, S. Neural Operator induced Gaussian Process framework for probabilistic solution of parametric partial differential equations. Comput. Methods Appl. Mech. Eng. 2024, 431, 117265. [Google Scholar] [CrossRef]
  7. Luo, D.; O’Leary-Roseberry, T.; Chen, P.; Ghattas, O. Efficient PDE-constrained optimization under high-dimensional uncertainty using derivative-informed neural operators. arXiv 2023, arXiv:2305.20053. [Google Scholar] [CrossRef]
  8. Molinaro, R.; Yang, Y.; Engquist, B.; Mishra, S. Neural inverse operators for solving PDE inverse problems. arXiv 2023, arXiv:2301.11167. [Google Scholar] [CrossRef]
  9. Middleton, M.; Murphy, D.T.; Savioja, L. Modelling of superposition in 2D linear acoustic wave problems using Fourier neural operator networks. Acta Acust. 2025, 9, 20. [Google Scholar] [CrossRef]
  10. Bouziani, N.; Boullé, N. Structure-preserving operator learning. arXiv 2024, arXiv:2410.01065. [Google Scholar] [CrossRef] [PubMed]
  11. Sharma, R.; Shankar, V. Ensemble and Mixture-of-Experts DeepONets For Operator Learning. arXiv 2024, arXiv:2405.11907. [Google Scholar] [CrossRef]
  12. Lanthaler, S.; Mishra, S.; Karniadakis, G.E. Error estimates for deeponets: A deep learning framework in infinite dimensions. Trans. Math. Its Appl. 2022, 6, tnac001. [Google Scholar] [CrossRef]
  13. Alesiani, F.; Takamoto, M.; Niepert, M. Hyperfno: Improving the generalization behavior of fourier neural operators. In Proceedings of the NeurIPS 2022 Workshop on Machine Learning and Physical Sciences, New Orleans, LA, USA, 3 December 2022. [Google Scholar]
  14. Tran, A.; Mathews, A.; Xie, L.; Ong, C.S. Factorized fourier neural operators. arXiv 2021, arXiv:2111.13802. [Google Scholar] [CrossRef]
  15. Long, D.; Xu, Z.; Yuan, Q.; Yang, Y.; Zhe, S. Invertible fourier neural operators for tackling both forward and inverse problems. arXiv 2024, arXiv:2402.11722. [Google Scholar] [CrossRef]
  16. Triebel, H. Theory of Function Spaces; Birkhauser: Basel, Switzerland, 1983. [Google Scholar]
  17. Bourgain, J.; Demeter, C. The proof of the l2 decoupling conjecture. Ann. Math. 2015, 182, 351–389. [Google Scholar] [CrossRef]
  18. Hansen, M. Nonlinear Approximation and Function Space of Dominating Mixed Smoothness. Doctoral Dissertation, Friedrich-Schiller-Universität Jena, Jena, Germany, 2010. Available online: https://nbn-resolving.org/urn:nbn:de:gbv:27-20110121-105128-4 (accessed on 3 February 2025).
  19. Runst, T.; Sickel, W. Sobolev Spaces of Fractional Order, Nemytskij Operators, and Nonlinear Partial Differential Equations; Walter de Gruyter: Berlin/Heidelberg, Germany, 2011; Volume 3. [Google Scholar]
  20. DeVore, R.A.; Lorentz, G.G. Constructive Approximation; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1993; Volume 303. [Google Scholar]
  21. Butzer, P.L.; Nessel, R.J. Fourier Analysis and Approximation, Vol. 1: One-Dimensional Theory; Pure and Applied Mathematics Series, Vol. 7; Academic Press: Cambridge, MA, USA, 1971. [Google Scholar] [CrossRef]
  22. Schmeisser, H.J.; Triebel, H. Topics in Fourier Analysis and Function Spaces; John Wiley & Sons: Hoboken, NJ, USA, 1987. [Google Scholar]
  23. Dos Santos, R.D.C.; de Oliveira Sales, J.H. Neural Operators with Hyperbolic-Modular Symmetry: Chern Character Regularization and Minimax Optimality in Anisotropic Spaces. 2025. Available online: https://hal.science/hal-05199221 (accessed on 4 August 2025).
  24. Dai, F.; Xu, Y. Approximation Theory and Harmonic Analysis on Spheres and Balls; Springer Science + Business Media: New York, NY, USA, 2013. [Google Scholar] [CrossRef]
  25. Baez, J.C. Foundations of Mathematics and Physics One Century After Hilbert: New Perspectives; Springer: Cham, Switzerland, 2019. [Google Scholar]
  26. Moscovici, H. Local index formula and twisted spectral triples. Quanta Maths 2010, 11, 465–500. [Google Scholar]
  27. Tsybakov, A.B. Nonparametric estimators. In Introduction to Nonparametric Estimation; Springer: New York, NY, USA, 2008; pp. 1–76. [Google Scholar]
  28. Sharpley, R.C. Interpolation of Operators. Pure and Applied Mathematics; Elsevier Science & Technology: Amsterdam, The Netherlands, 1988. [Google Scholar]
  29. Meyer, Y. Wavelets and Operators; (No. 37); Cambridge University Press: Cambridge, UK, 1992. [Google Scholar]
  30. Gilkey, P.B. Invariance Theory: The Heat Equation and the Atiyah-Singer Index Theorem; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar] [CrossRef]
Figure 1. Pipeline of the ONHSH operator. Each stage is associated with a structural role: localization, symmetry, damping, and global synthesis.
Figure 1. Pipeline of the ONHSH operator. Each stage is associated with a structural role: localization, symmetry, damping, and global synthesis.
Axioms 15 00192 g001
Figure 2. Three-dimensional scatter comparison of operator outputs for the thermal diffusion benchmark. The figure contrasts the exact analytical solution with operator-based predictions (ONHSH, FNO, Geo-FNO, NOGaP, Convolution, and Gaussian smoothing). The colormap emphasizes temperature variations, illustrating the ability of ONHSH to preserve both global diffusion patterns and localized structures more accurately than baseline models.
Figure 2. Three-dimensional scatter comparison of operator outputs for the thermal diffusion benchmark. The figure contrasts the exact analytical solution with operator-based predictions (ONHSH, FNO, Geo-FNO, NOGaP, Convolution, and Gaussian smoothing). The colormap emphasizes temperature variations, illustrating the ability of ONHSH to preserve both global diffusion patterns and localized structures more accurately than baseline models.
Axioms 15 00192 g002
Figure 3. Two-dimensional slice comparison of thermal diffusion fields across different neural operator architectures. The exact analytical solution is contrasted with ONHSH, FNO, Geo-FNO, NOGaP, Convolution, and Gaussian smoothing outputs. The colormap combined with white isothermal contours enhances the visualization of thermal gradients, highlighting ONHSH’s ability to preserve fine-scale anisotropic structures more effectively than baseline models.
Figure 3. Two-dimensional slice comparison of thermal diffusion fields across different neural operator architectures. The exact analytical solution is contrasted with ONHSH, FNO, Geo-FNO, NOGaP, Convolution, and Gaussian smoothing outputs. The colormap combined with white isothermal contours enhances the visualization of thermal gradients, highlighting ONHSH’s ability to preserve fine-scale anisotropic structures more effectively than baseline models.
Axioms 15 00192 g003
Figure 4. Quantitative evaluation of operators using MAE, MSE, and RMSE. The Geo-FNO operator consistently achieves the lowest errors across all metrics, while ONHSH shows the highest deviations.
Figure 4. Quantitative evaluation of operators using MAE, MSE, and RMSE. The Geo-FNO operator consistently achieves the lowest errors across all metrics, while ONHSH shows the highest deviations.
Axioms 15 00192 g004
Figure 5. Algorithmic pipeline for benchmarking neural operators in three-dimensional thermal diffusion problems. The workflow integrates data generation, operator application, error quantification, and visualization to ensure a rigorous and comprehensive evaluation.
Figure 5. Algorithmic pipeline for benchmarking neural operators in three-dimensional thermal diffusion problems. The workflow integrates data generation, operator application, error quantification, and visualization to ensure a rigorous and comprehensive evaluation.
Axioms 15 00192 g005
Figure 6. MSE behavior as a function of grid size for different operators.
Figure 6. MSE behavior as a function of grid size for different operators.
Axioms 15 00192 g006
Figure 7. MSE behavior as a function of time for different operators.
Figure 7. MSE behavior as a function of time for different operators.
Axioms 15 00192 g007
Table 1. Comparison of Neural Operators.
Table 1. Comparison of Neural Operators.
OperatorMAEMSERMSEKey Strengths
Geo-FNO≈0.012≈0.0003≈0.018Geometric adaptivity, high accuracy
ONHSH≈0.278≈0.136≈0.369Theoretical rigor, hyperbolic symmetry
FNO≈0.215≈0.095≈0.295Stability, global spectral basis
NOGaP≈0.215≈0.102≈0.320Uncertainty quantification
Convolution≈0.215≈0.098≈0.313Simplicity, computational efficiency
Gaussian≈0.215≈0.100≈0.316Smoothness, noise reduction
Table 2. Comparison of Neural Operator Features.
Table 2. Comparison of Neural Operator Features.
FeatureONHSHFNOGeo-FNOClassical
Anisotropic Adaptivityyesnonono
Curved Domain Supportyesnoyesno
Modular Spectral Controlyesnonono
Theoretical Guaranteesyesnonono
Hyperbolic Symmetryyesnonono
Minimax-Optimal Ratesyesnonono
Table 3. Thermal diffusion: summary of error metrics (lower is better). Values match the manuscript’s quantitative section and figures.
Table 3. Thermal diffusion: summary of error metrics (lower is better). Values match the manuscript’s quantitative section and figures.
OperatorMAEMSERMSE
Geo-FNO≈0.012≈0.0003≈0.018
ONHSH≈0.278≈0.136≈0.369
FNO≈0.215≈0.095–0.102≈0.295–0.320
NOGaP≈0.215≈0.102≈0.320
Conv.≈0.215≈0.098≈0.313
Gaussian≈0.215≈0.100≈0.316
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Santos, R.D.C.d.; Sales, J.H.d.O. Ramanujan–Santos–Sales Hypermodular Operator Theorem and Spectral Kernels for Geometry-Adaptive Neural Operators in Anisotropic Besov Spaces. Axioms 2026, 15, 192. https://doi.org/10.3390/axioms15030192

AMA Style

Santos RDCd, Sales JHdO. Ramanujan–Santos–Sales Hypermodular Operator Theorem and Spectral Kernels for Geometry-Adaptive Neural Operators in Anisotropic Besov Spaces. Axioms. 2026; 15(3):192. https://doi.org/10.3390/axioms15030192

Chicago/Turabian Style

Santos, Rômulo Damasclin Chaves dos, and Jorge Henrique de Oliveira Sales. 2026. "Ramanujan–Santos–Sales Hypermodular Operator Theorem and Spectral Kernels for Geometry-Adaptive Neural Operators in Anisotropic Besov Spaces" Axioms 15, no. 3: 192. https://doi.org/10.3390/axioms15030192

APA Style

Santos, R. D. C. d., & Sales, J. H. d. O. (2026). Ramanujan–Santos–Sales Hypermodular Operator Theorem and Spectral Kernels for Geometry-Adaptive Neural Operators in Anisotropic Besov Spaces. Axioms, 15(3), 192. https://doi.org/10.3390/axioms15030192

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop