Abstract
We give a systematic investigation of the reproducing property of the zonal translation network and apply this property to kernel regularized regression. We propose the concept of the Marcinkiewicz–Zygmund setting (MZS) for scattered nodes on the unit sphere. We show that, under the MZ condition, the corresponding convolutional zonal translation network is a reproducing kernel Hilbert space. Based on these facts, we propose a kernel regularized regression learning framework and provide an upper bound estimate for the learning rate. We also prove the density of the zonal translation network by means of spherical Fourier–Laplace series.
Keywords:
kernel regularized regression; learning theory; convolution translation network; reproducing kernel Hilbert space; Marcinkiewicz–Zygmund inequality; quadrature rule; learning rate
MSC:
41A25
1. Introduction
It is known that convolutional neural networks provide various models and algorithms for processing data in many fields, such as computer vision (see [1]), natural language processing (see [2]), and sequence analysis in bioinformatics (see [3]). Regularized neural network learning has thus become an attractive research topic (see [4,5,6,7,8,9]). In this paper, we give a theoretical analysis of the learning rate of regularized regression associated with the zonal translation network on the unit sphere.
1.1. Kernel Regularized Learning
Let X be a compact subset of the d-dimensional Euclidean space with the usual norm, and let Y be a nonempty closed subset of the real line contained in [−M, M] for a given M > 0. The aim of regression learning is to learn, from a hypothesis function space, the target function that describes the relationship between the input x and the output y. In most cases, the target function is available only through a set of observations drawn independently and identically distributed (i.i.d.) according to a joint probability distribution (measure) ρ on X × Y, where ρ(y|x) is the conditional probability of y for a given x and ρ_X is the marginal probability about x, i.e., for every integrable function φ, there holds ∫_{X×Y} φ(x, y) dρ = ∫_X ∫_Y φ(x, y) dρ(y|x) dρ_X(x).
For a given normed space B consisting of real functions on X, we define the regularized learning framework with B as
where λ > 0 is the regularization parameter and the data-fitting term is the empirical mean of the squared loss over the observations.
To analyze the convergence of algorithm (1) quantitatively, we often use the integral framework (see [10,11])
where .
The optimal target function is the regression function
satisfying
where the infimum is taken over all measurable functions f. Moreover, there holds the famous equality (see [12])
The choices for the hypothesis space B in (1) are rich. For example, C.P. An et al. chose the algebraic polynomial class as B (see [13,14,15]). In [16], C. De Mol et al. chose a dictionary as B. Recently, some papers have chosen a Sobolev space as the hypothesis space B (see [17,18]). By the kernel method, we traditionally mean replacing B with a reproducing kernel Hilbert space (RKHS), which is a Hilbert space consisting of real functions defined on X, equipped with a Mercer kernel K on X × X (i.e., K is a continuous and symmetric function on X × X and, for any finite set of points in X, the Mercer matrices (K(x_i, x_j)) are positive semi-definite) such that
and there holds the embedding inequality
where c is a constant independent of f and x. Two results follow for the optimal solution . The reproducing property (3) yields the representation
The embedding inequality (4) yields the inequality
1.2. Marcinkiewicz–Zygmund Setting (MZS)
It is particularly important to mention here that translation networks have recently been used as the hypothesis space in regularized learning (see [25,26]). From the viewpoint of approximation theory, a simple single-layer translation network with m neurons is a function class produced by translating a given function, and can be written as
where is a given node set, and a given is a translation operator corresponding to . For example, when and , we choose as the usual convolution translation operator for a function defined on or a periodic function (see [27,28]). When is the unit sphere in , one can choose as the zonal translation operator for a given function defined on the interval [−1, 1] (see [29]). In [30], we defined a kind of translation operator for . To ensure that the single-layer translation network can approximate the constant function, is modified as
In the case of and , R.D. Nowak et al. used (7) to design regularized learning frameworks (see [31]). An algorithm for designing such networks is provided by S.B. Lin et al. in [26] and is applied to construct regularized learning algorithms. In [5], was used to construct deep neural network learning frameworks. Investigations of the same type are given in [32,33,34].
It is easy to see that the approximation ability and the construction of a translation network depend upon the node set (see [35,36,37]). On the other hand, according to the view of [38], the quadrature rule and the Marcinkiewicz–Zygmund (MZ) inequality associated with the node set also influence the construction of the translation network. Let Ω be a bounded closed set equipped with a measure μ satisfying μ(Ω) < +∞. We denote by the linear space of polynomials on Ω of degree at most n, equipped with the corresponding L²-inner product. The n-point quadrature rule (QR) is
where the nodes lie in Ω and the weights are all positive for every index. We say the QR (8) has polynomial exactness n if there is a positive integer n such that
The Marcinkiewicz–Zygmund (MZ) inequality based on the set is
where the weights in (10) may not be the same as those in (8) and (9). Another important inequality associated with polynomial approximation, called the MZ condition in the case of the unit sphere in [39], is
where C is a constant independent of p, and n and r are any positive integers.
(i) In many cases of the domain Ω, relations (9)–(11) coexist. For example, when Ω = [−1, 1], (9)–(11) hold when the nodes are the zeros of the n-th Jacobi polynomial orthogonal with respect to the given weight and the weights are the associated Cotes–Christoffel numbers (see Theorem A and Theorem B in [40] and Theorem 3.4.1 in [41]); a toy numerical illustration is given after these remarks. H.N. Mhaskar et al. first showed in [42] that (9) and (10) coexist on the unit sphere, and the corresponding relation (11) was shown in [43].
(ii) According to the view of [38], the quadrature rule (QR) follows automatically from the Marcinkiewicz–Zygmund (MZ) inequality in many cases of Ω. H.N. Mhaskar et al. gave a general method of transition from the MZ inequality to the polynomial exact QR in [44]; see also Theorem 4.1 in [42]. In particular, in the case of the interval, (10) may be obtained from (9) and (11) directly (see [45]).
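The following toy numerical sketch (not taken from the paper; all names and parameter values are illustrative) shows both phenomena on the interval [−1, 1]: the n-point Gauss–Legendre rule, whose nodes are the zeros of the n-th Legendre polynomial and whose weights are the Cotes–Christoffel numbers, is exact for polynomials of degree at most 2n − 1, and the same nodes and weights give an MZ-type comparison between the discrete and continuous L² norms of lower-degree polynomials.

# Toy illustration (not from the paper): on [-1, 1] with the Lebesgue measure,
# the n-point Gauss-Legendre rule is exact for polynomials of degree <= 2n - 1,
# and the same nodes/weights give a Marcinkiewicz-Zygmund-type norm equivalence
# for polynomials of moderate degree.
import numpy as np

n = 12
nodes, weights = np.polynomial.legendre.leggauss(n)  # Gauss-Legendre nodes/weights

# Polynomial exactness (relation (9)): integrate x^k exactly for k <= 2n - 1.
for k in range(2 * n):
    quad = np.sum(weights * nodes**k)
    exact = 0.0 if k % 2 == 1 else 2.0 / (k + 1)     # integral of x^k over [-1, 1]
    assert abs(quad - exact) < 1e-12

# MZ-type comparison (relation (10)): for a polynomial p of degree <= n - 1,
# the discrete L^2 norm on the nodes matches the continuous L^2 norm
# (here even with equality, because the rule is exact for p^2).
rng = np.random.default_rng(0)
coeffs = rng.standard_normal(n)                      # Legendre coefficients of p, degree n - 1
p = np.polynomial.legendre.Legendre(coeffs)
discrete_sq = np.sum(weights * p(nodes) ** 2)
# Continuous squared L^2 norm via orthogonality of the Legendre polynomials:
continuous_sq = np.sum(coeffs**2 * 2.0 / (2 * np.arange(n) + 1))
print(discrete_sq, continuous_sq)                    # the two values coincide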
These facts show that, besides the polynomial exactness formula (QR) (9), the MZ inequality (10) is also an important feature for describing the node set. For this reason, a node set that yields an MZ inequality is given a special terminology, the Marcinkiewicz–Zygmund family (MZF) (see [38,46,47,48]). However, from this literature, we know that the MZF does not totally coincide with the Lagrange interpolation nodes in the case of . Hyperinterpolation was then developed with the help of the exact QR (see [49,50,51,52,53]) and has been applied to approximation theory and regularized learning (see [13,14,15,54]). On the other hand, the problem of the polynomial exact QR has also been investigated on its own (see [55,56]). The concept of the spherical t-design was first defined in [57] and has subsequently been investigated in many papers; one can see the classical references [58,59]. We say is a spherical t-design if
where is the volume of and is any spherical polynomial of degree at most t. Moreover, in many applications, the polynomial exact QR and the MZF have been used as assumptions. For example, C.P. An et al. gave an approximation order for hyperinterpolation under the assumptions that (9), (12), and the MZ inequality (10) hold (see [60,61]). Also, in [25], Lin et al. investigated regularized regression associated with a zonal translation network by assuming that the node set is a type of spherical t-design.
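As a toy illustration of the t-design condition (12) (our example, not from the paper), the six vertices of the octahedron form a spherical 3-design on the two-dimensional sphere: averaging any polynomial of degree at most 3 over these nodes reproduces its average over the sphere, while degree-4 polynomials already show a gap. The helper names below are ours.

# Toy check (not from the paper): the 6 vertices of the octahedron form a
# spherical 3-design on S^2, i.e., their average reproduces the sphere average
# of every polynomial of degree <= 3, but they are not a 4-design.
import itertools
import numpy as np

nodes = np.array([[ 1, 0, 0], [-1, 0, 0],
                  [ 0, 1, 0], [ 0,-1, 0],
                  [ 0, 0, 1], [ 0, 0,-1]], dtype=float)

def sphere_average_monomial(a, b, c):
    """Exact average of x^a y^b z^c over S^2 (normalized surface measure)."""
    if a % 2 or b % 2 or c % 2:
        return 0.0
    double_fact = lambda k: np.prod(np.arange(k, 0, -2)) if k > 0 else 1.0
    return (double_fact(a - 1) * double_fact(b - 1) * double_fact(c - 1)
            / double_fact(a + b + c + 1))

def design_defect(t):
    """Largest deviation between the node average and the sphere average over
    all monomials of total degree <= t."""
    worst = 0.0
    for a, b, c in itertools.product(range(t + 1), repeat=3):
        if a + b + c > t:
            continue
        node_avg = np.mean(nodes[:, 0]**a * nodes[:, 1]**b * nodes[:, 2]**c)
        worst = max(worst, abs(node_avg - sphere_average_monomial(a, b, c)))
    return worst

print(design_defect(3))   # ~0: exact for all polynomials of degree <= 3
print(design_defect(4))   # clearly nonzero: not a 4-design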
The polynomial exact QR is also a good tool in approximation theory. For example, H.N. Mhaskar et al. used a polynomial exact QR to construct the first periodic translation operators (see [27]) and the zonal translation network operators (see [29]). Along this line, the translation operators defined on the unit ball, on the Euclidean space, and on the interval were constructed (see [28,30,62]).
The above investigations encourage us to define a terminology that covers both the Marcinkiewicz–Zygmund family (MZF) and the polynomial exact QR: we call it the Marcinkiewicz–Zygmund setting (MZS).
Definition 1
(Marcinkiewicz–Zygmund setting (MZS) on Ω). We say a given finite node set forms a Marcinkiewicz–Zygmund setting on Ω if (9)–(11) simultaneously hold.
In this paper, we design the translation network by taking , assuming that the node set satisfies the MZS, and choosing the zonal translation with being a given integrable function on . Under these assumptions, we provide a learning framework with as the hypothesis space and establish its learning rate.
The contributions of this paper are twofold. First, after absorbing the ideas of [38,46,47,48] and the successful experience of [13,25,27,29,42,60,61,63], we propose the concept of the Marcinkiewicz–Zygmund setting (MZS) for scattered nodes on the unit sphere; based on this assumption, we show the convergence rate for the approximation error of kernel regularized learning associated with spherical Fourier analysis. Second, we give a new application of translation networks and, at the same time, expand the application scope of kernel regularized learning.
The paper is organized as follows. In Section 2, we first show the density of the zonal translation class and then show the reproducing property of the translation network . In Section 3, we present the main results of the paper: a new regression learning framework and learning setting, the error decomposition used in the analysis, and an estimate for the convergence rate. In Section 4, we give several lemmas that are used to prove the main results. The proofs of all the theorems and propositions are given in Section 5.
Throughout the paper, we write A ≲ B if there is a positive constant C, independent of A and B, such that A ≤ CB. In particular, we write A = O(1) to indicate that A is a bounded quantity. We write A ∼ B if both A ≲ B and B ≲ A.
2. The Properties of the Translation Networks on the Unit Sphere
Let . Then, H.N. Mhaskar et al. constructed in [29] a sequence of approximation operators to show that the zonal translation class
is dense in if for all where
and is the n-th Legendre polynomial, which satisfies the orthogonality relation
with , , and it is known that (see (B.2.1), (B.2.2), and (B.5.1) of [64]) It follows that
where .
Let denote the space of all homogeneous polynomials of degree n in d variables. We denote by the class of all measurable functions defined on with the finite norm
and for , we assume that is the space of continuous functions on with the uniform norm.
For a given integer , the restriction to of a homogeneous harmonic polynomial of degree n is called a spherical harmonic of degree n. If , then , so that Y is determined by its restriction to the unit sphere. Let denote the space of the spherical harmonics of degree n. Then,
Spherical harmonics of different degrees are orthogonal on the unit sphere. For further properties of spherical harmonics, one can refer to [65].
For let be an orthonormal basis of . Then,
where denotes the surface area of and . Furthermore, by (1.2.8) in [64], we have,
where is the n-th generalized Legendre polynomial, the same as in (13). Combining (13) and (14), we have
Also, there holds the Funk–Hecke formula (see (1.2.11) in [64] or (1.2.6) in [66]):
In particular, there holds
For a we define . Then,
where It is known that (see (6.1.4) in [66])
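For the special case d = 3, the addition formula (14) reduces to the classical Legendre addition theorem, and it can be checked numerically. The following sketch is our illustration (not part of the paper); it assumes SciPy's complex orthonormal spherical harmonics and their angle conventions, and the helper random_point is ours.

# Numerical check (special case d = 3, not taken from the paper) of the addition
# formula: sum_{k=-n}^{n} Y_{n,k}(x) * conj(Y_{n,k}(y)) = (2n+1)/(4*pi) * P_n(x . y),
# using SciPy's complex orthonormal spherical harmonics.
import numpy as np
from scipy.special import sph_harm, eval_legendre

rng = np.random.default_rng(1)

def random_point():
    """Random point on S^2; returns the vector and its (azimuth, polar) angles."""
    v = rng.standard_normal(3)
    v /= np.linalg.norm(v)
    theta = np.arctan2(v[1], v[0]) % (2 * np.pi)   # azimuthal angle in [0, 2*pi)
    phi = np.arccos(v[2])                          # polar angle in [0, pi]
    return v, theta, phi

n = 5
x, tx, px = random_point()
y, ty, py = random_point()

lhs = sum(sph_harm(k, n, tx, px) * np.conj(sph_harm(k, n, ty, py))
          for k in range(-n, n + 1))
rhs = (2 * n + 1) / (4 * np.pi) * eval_legendre(n, np.dot(x, y))

print(lhs.real, rhs)   # the two values agree up to rounding error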
2.1. Density
We first give a general discrimination method for density.
Proposition 1
(see Lemma 1 in Chapter 18 of [67]). For a subset V in a normed linear space E, the following two properties are equivalent:
(a) V is fundamental in E (that is, its linear span is dense in E).
(b) The annihilator of V in the dual space E* is {0} (that is, 0 is the only element of E* that annihilates V).
Based on this proposition, we can show the density of in a qualitative way.
Theorem 1.
Let satisfy for all Then, is dense in .
Proof.
See the proof in Section 5. □
We can quantitatively show the density of in .
Let denote the set of all continuous functions defined on and
Define a differential operator as
and
Theorem 2.
Let be sufficiently smooth (for example, for all ) and satisfy for all . Then, for a given and , there is a such that
Proof.
See the proof in Section 5. □
2.2. MZS on the Unit Sphere
We first restate a proposition.
Proposition 2.
There is a finite subset and positive constants such that for any given , we have two nonnegative number sets and satisfying
such that
and
Moreover, for any and , there exists a constant such that
and
Proof.
Inequalities (21)–(22) were proved by H.N. Mhaskar et al. in [42] (see also [29]) and have since been extended to other domains (see [68]). Inequality (24) may be found in [29]. Inequality (23) follows from (22) and the following fact (see Theorem 2.1 in [43]):
Suppose that Ω is a finite subset of , is a set of positive numbers, and n is a positive integer. If for a holds the inequality
with independent of f, then for any and any with
where depends only on d and p.
Proposition 3.
For any given , there exists a finite subset , which forms an MZS on .
Proof.
The results follow from (21)–(23). □
2.3. The Reproducing Property
Let be a given even function. For a given finite set , i.e., , and the corresponding finite number set , we define a zonal translation network as
where . Then, it is easy to see that
where for , and we define and
For , we define a bivariate operation as
and
Because of (15), we see by Theorem 4 in Chapter 17 of [67] that the matrix is positive definite for a given n.
It follows that the vector is unique. Then, for a given n, is a finite-dimensional Hilbert space, whose dimension is , and it is isometrically isomorphic to , where
Since is a finite-dimensional Hilbert space, we know by Theorem A in Section 3 of Part I of [69] that it must be a reproducing kernel Hilbert space; what remains is to find the reproducing kernel.
We have the following proposition.
Proposition 4.
Let satisfy and
If satisfies the MZ condition (23) by , then is a finite-dimensional reproducing kernel Hilbert space associated with the kernel
i.e.,
and there is a constant such that
Proof.
See the proof in Section 5. □
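The reproducing kernel in Proposition 4 is built from the zonal translations at the nodes; its precise form is the one given in (26). As a hedged illustration only (not the paper's formula), the following minimal sketch shows why a kernel of the candidate convolutional zonal form Σ_k w_k φ(⟨x, z_k⟩) φ(⟨y, z_k⟩) is automatically positive semi-definite: it is a weighted Gram kernel. The node set, weights, and activation below are hypothetical stand-ins.

# Minimal sketch (assumed candidate form; formula (26) of the paper is not
# reproduced here): a convolutional zonal kernel built from nodes z_k on the
# sphere, positive weights w_k, and an even activation phi on [-1, 1],
#   K(x, y) = sum_k w_k * phi(<x, z_k>) * phi(<y, z_k>),
# is positive semi-definite because it is a weighted Gram kernel.
import numpy as np

rng = np.random.default_rng(2)
d, m = 3, 20                                   # ambient dimension, number of nodes

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

Z = unit(rng.standard_normal((m, d)))          # hypothetical scattered nodes z_k
w = np.full(m, 4 * np.pi / m)                  # hypothetical positive weights w_k
phi = np.cosh                                  # hypothetical even activation on [-1, 1]

def K(X, Y):
    """Kernel matrix K(x_i, y_j) = sum_k w_k * phi(x_i . z_k) * phi(y_j . z_k)."""
    FX = phi(X @ Z.T)                          # features phi(<x_i, z_k>)
    FY = phi(Y @ Z.T)
    return FX @ (w[:, None] * FY.T)

X = unit(rng.standard_normal((50, d)))         # test points on the sphere
G = K(X, X)
eigs = np.linalg.eigvalsh(G)
print(eigs.min() >= -1e-10)                    # True: the Gram matrix is PSD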
Corollary 1.
Under the assumptions of Proposition 4, is a finite-dimensional reproducing kernel Hilbert space associated with the inner product defined by
where
and the corresponding reproducing kernel is
Furthermore, there is a constant such that
Proof.
The results can be obtained from Proposition 4 together with the fact that the real line R is a reproducing kernel Hilbert space whose reproducing kernel is 1 and whose inner product is the usual product of two real numbers. □
Corollary 2.
Proof.
See the proof in Section 5. □
3. Apply to Kernel Regularized Regression
We now apply the above reproducing kernel Hilbert spaces to kernel regularized regression.
3.1. Learning Framework
For a set of observations drawn i.i.d. according to a joint distribution on , and a given real number satisfying , we define a regularized framework as
where are the regularization parameters, and
Note that the n in (31) may differ from the sample size m; it can be chosen according to our needs in order to increase the learning rate.
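To make the framework concrete, the following minimal sketch (our illustration, under the assumption that the hypothesis space consists of zonal translation networks f(x) = Σ_k c_k φ(⟨x, z_k⟩) and that the penalty is a plain squared coefficient norm; the exact norm in (31) may be a weighted one) solves the resulting finite-dimensional regularized least squares problem in closed form. All names and parameter values are illustrative.

# Minimal sketch of a regularized regression of the type considered here,
# under the stated assumptions; not the paper's exact framework (31).
import numpy as np

rng = np.random.default_rng(3)
d, m_nodes, m_samples, lam = 3, 25, 200, 1e-3

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

Z = unit(rng.standard_normal((m_nodes, d)))        # nodes z_k (assumed to form an MZS)
phi = np.cosh                                      # hypothetical even activation on [-1, 1]

# Synthetic sample (x_i, y_i) on the sphere: a smooth target plus noise.
X = unit(rng.standard_normal((m_samples, d)))
f_target = lambda x: np.exp(x[:, 0]) + 0.5 * x[:, 2] ** 2
y = f_target(X) + 0.05 * rng.standard_normal(m_samples)

# Regularized least squares over the coefficients c:
#   min_c (1/m) * sum_i (sum_k c_k phi(<x_i, z_k>) - y_i)^2 + lam * ||c||^2
F = phi(X @ Z.T)                                   # design matrix phi(<x_i, z_k>)
c = np.linalg.solve(F.T @ F / m_samples + lam * np.eye(m_nodes),
                    F.T @ y / m_samples)

predict = lambda Xnew: phi(Xnew @ Z.T) @ c
X_test = unit(rng.standard_normal((1000, d)))
print(np.mean((predict(X_test) - f_target(X_test)) ** 2))  # small test error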
To carry out the convergence analysis for (31), we need to bound the error
which is an approximation problem whose convergence rate depends upon the approximation ability of . An error decomposition will be given in Section 3.2.
3.2. Error Decompositions
By (2) and the definition of , we have
where we have used the fact that for and , there holds
and
is a K-functional that quantifies the approximation error, whose decay will be described later. Thus, the main quantity that we need to estimate is the sample error
3.3. Convergence Rate for the K-Functional
We first provide a convergence rate for the K-functional .
Proposition 5.
Let satisfy for all , and let be a positive integer. Then, there is a , which forms an MZS on such that
where
Proof.
See the proof in Section 5. □
Corollary 3.
Let satisfy for all , and for a given l, there holds . If are chosen such that and , then
3.4. The Learning Rate
Theorem 3.
Let satisfy (25) and
If , then there is a constant such that for any , with confidence , there holds
If , then there is a constant such that for any , with confidence , there holds
Proof.
See the proof in Section 5. □
Corollary 4.
Let satisfy (25) and
If , then there is a constant such that for any , with confidence , there holds
If , then there is a constant such that for any , with confidence , there holds
Proof.
See the proof in Section 5. □
Corollary 5.
Let satisfy for all and let be the MZS defined as in Proposition 5. If , then for any , with confidence , there holds
3.5. Comments
We have proposed the concept of the MZS for a scattered node set on the unit sphere, with which we have shown that the related convolutional zonal translation network is a reproducing kernel Hilbert space, and we have established the learning rate for the kernel regularized least squares regression model. We now give further comments on the results.
(1) The zonal translation network that we have chosen is a finite-dimensional reproducing kernel Hilbert space; our discussion therefore belongs to the scope of the kernel method and combines (neural) translation networks with learning theory.
(2) Compared with existing convergence rate estimates for neural network learning, our upper estimates are dimension-independent (see Theorem in [25], Theorem 3.1 in [70], Theorem 7 in [71], and Theorem 1 in [26]).
(3) The density derivation in Theorem 1 for the zonal translation network is qualitative; the density deduction in Theorem 2 is quantitative, with the help of spherical Fourier analysis. We think that this method can be extended to other domains, such as the unit ball and the Euclidean space, etc.
(4) We expect that, with the help of the MZ condition, one may show the reproducing property for deep translation networks and thus investigate the performance of deep convolutional translation learning with the kernel method (see [6,7,33]).
(5) We provide a method for constructing a finite-dimensional reproducing kernel Hilbert space with a convolutional kernel on domains that admit a near-best-approximation operator, for example, the interval , the unit sphere , and the unit ball , etc. (see [64]). The only assumption that we need is (25). The set may be any finite scattered set satisfying the MZ condition (23), whose parameters can be obtained according to (5.3.5) in Theorem 5.3.6 of [64]. To the best of our knowledge, this is the first time that the reproducing property of a zonal neural network has been shown.
(6) In many references, to obtain an explicit learning rate, one often assumes that the approximation error (i.e., the K-functional) decays at a polynomial rate. It was proved in [72] that the K-functional is equivalent to a modulus of smoothness. In the present paper, an upper estimate for the convergence rate of the K-functional is provided for the first time (see (34)).
(7) One advantage of framework (31) is that it is a finite-dimensional strictly convex quadratic optimization problem; because of the structure of , the optimal solution of (31) is unique and can be obtained by the gradient descent algorithm (a minimal sketch is given after these comments).
(8) It is easy to see that the optimal solution depends upon both the distribution and the function . How to quantitatively describe this influence, i.e., the robustness of with respect to and , is a significant research direction; for research of this kind, one can refer to [73,74,75].
(9) Combining the upper estimate (40) and the convergence (35), we know that if are chosen such that and , then for any , with confidence , there holds the convergence
Convergence (41) shows that under these assumptions, algorithm (31) is convergent.
(10) We now compare the learning framework of the present paper with the general learning framework (1) associated with a reproducing kernel Hilbert space . Theorem 1 in [76] gives the sample error bound
Inequality (42) is a fundamental inequality for obtaining the optimal learning rate with the integral operator approach (see Theorem 2 in [76]). Inequality (37) in Theorem 3 shows that the sample error estimate (42) also holds for (31). But is a finite-dimensional proper subset of , which is itself a reproducing kernel Hilbert space. These facts show that the learning framework (1) may attain convergence as well as the optimal learning rate if and are chosen properly.
(11) In this paper, we have illustrated our idea of kernel regularized translation network learning with the zonal translation network. The essence is an application of the MZ inequality and the exact QR, or the MZ condition and the MZS. We conjecture that the results in the present paper may be extended to many other translation networks whose domains satisfy the MZ condition and the MZS, for example, the periodic translation network (see [27]), the translation on the interval (see [30]), the unit sphere (see [29]), and the unit ball (see [62]).
(12) Recently, the exact spherical QR has been used to investigate convergence for the spherical scattered data-fitting problem (see [25,77,78]). The Tikhonov regularization model used is of the following type:
where is a native space of the type , is a scattered set, and and are the positive numbers defined in the polynomial exact QR as in (21); is the target function to be fitted. It is hoped that the method used in the present paper can be applied to investigate the convergence of the algorithm
where and are defined as in Section 2.
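As noted in comment (7), framework (31) is a finite-dimensional strictly convex quadratic problem, so its unique minimizer can also be reached by gradient descent. The following sketch is our illustration only, using the same coefficient parametrization and illustrative names as in the earlier sketch (the paper's exact norm may differ); it compares gradient descent with the direct solve of the same quadratic objective.

# Minimal gradient descent sketch for a strictly convex quadratic objective of
# the type mentioned in comment (7):
#   J(c) = (1/m) * ||F c - y||^2 + lam * ||c||^2,
#   grad J(c) = (2/m) * F^T (F c - y) + 2 * lam * c.
import numpy as np

def fit_gradient_descent(F, y, lam, lr=None, iters=5000):
    m, n = F.shape
    if lr is None:
        # Safe step size 1/L, with L an upper bound on the largest Hessian eigenvalue.
        L = 2 * np.linalg.norm(F, 2) ** 2 / m + 2 * lam
        lr = 1.0 / L
    c = np.zeros(n)
    for _ in range(iters):
        grad = 2 * F.T @ (F @ c - y) / m + 2 * lam * c
        c -= lr * grad
    return c

# Example: compare with the closed-form solution of the same quadratic problem.
rng = np.random.default_rng(4)
F = rng.standard_normal((200, 25))
y = rng.standard_normal(200)
lam = 1e-2
c_gd = fit_gradient_descent(F, y, lam)
c_direct = np.linalg.solve(F.T @ F / 200 + lam * np.eye(25), F.T @ y / 200)
print(np.max(np.abs(c_gd - c_direct)))   # close to 0: both reach the unique minimizer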
4. Lemmas
To give a capacity-independent generalization error for algorithm (31), we need some concepts of convex analysis.
Differentiability. Let H be a Hilbert space and F be a real function on H. We say that F is differentiable at f ∈ H if there is an element of H, denoted ∇F(f), such that for any g ∈ H, there holds
and we write F′(f) or ∇F(f). It is known that, for a differentiable convex function F, an element minimizes F on H if and only if its gradient vanishes there (see Proposition 17.4 in [79]).
To prove the main results, we need some lemmas.
Lemma 1.
Let H be a Hilbert space, ξ be a random variable on with values in H, and let be independent samples drawn according to ρ. Assume that almost surely. Denote . Then, for any , with confidence , there holds
Proof.
See [76]. □
Lemma 2.
Let H be a reproducing kernel Hilbert space over X with respect to the kernel K. If E and F are closed subspaces of H such that and , then K = L + M, where L and M are the reproducing kernels of E and F, respectively. Moreover, for , we have
Proof.
See Corollary 1 in Chapter 31 of [67] or the Theorem in Section 6 in part I of [69]. □
Lemma 3.
There hold the following equalities:
and
Proof of (44). By the equality
we have
Since and by the definition of the derivative, we have from the above equality that
We then have (44). In the same way, we can obtain (45). □
Lemma 4.
Let be a Hilbert space consisting of real functions on X. Then,
and
Proof.
Equality (47) is a rearrangement of the parallelogram law. Equality (48) can be shown with (47).
□
Lemma 5.
Framework (31) has a unique solution and (32) has a unique solution . Moreover, there holds the bound
where κ is defined as in (30).
There holds the equality
and the equality
Proof.
Lemma 6.
The solutions and satisfy the inequality
where
5. Proofs of Theorems and Propositions
Proof of Theorem 1.
If is not dense in , i.e.,
then by (b) in Proposition 1, we know , and there is a nonzero functional such that
Proof of Theorem 2.
For a nonnegative function satisfying (a) and , or (b) , we define a near-best-approximation operator as
Then, by [80], we know
, and for any , there hold ,
and
where
In the same way, we define a near-best-approximation operator as
Then, it is known (see Lemma 4.1.1 in [66] or Theorem 2.6.3 in [64]) that and for , and there is a constant such that for any
Since , we have
On the other hand, by (16), we have for and that
Since , we have by (21) that
Define an operator as
Then, and
It follows that
where
Because of (22) and (20), we have
where we have used the fact that . It follows that
where we have used equality (18) and . Substituting (62) into (61), we then have
Since depends upon n and , we can choose sufficiently large l and N such that
Also, since for , we have for sufficiently large n that
Proof of Proposition 4.
By the definition of and the definition of the kernel in (26), we have for any that
The reproducing property (27) then holds. We now show (28). In fact, by Cauchy’s inequality, we have
On the other hand, by the Minkowski inequality and inequality (23), we have
where we have used (17), (57), and (25). Inequality (28) thus holds, where we have also used (16). □
Proof of Corollary 2.
Since is defined as in Proposition 2, we know and , and condition (25) is satisfied with ; we then have the results of Corollary 2 by Proposition 4.
□
Proof of Proposition 5.
Since , we have
where
and
Author Contributions
Conceptualization, X.R.; methodology, B.S.; validation, X.R.; formal analysis, X.R.; resources, B.S. and S.W.; writing—original draft preparation, X.R. and B.S.; writing—review and editing, B.S. and S.W.; supervision, B.S. All authors have read and agreed to the published version of the manuscript.
Funding
The work is supported by the National Natural Science Foundation of China under Grants No. 61877039, the NSFC/RGC Joint Research Scheme (Project No. 12061160462 and N_CityU102/20) of China and Natural Science Foundation of Jiangxi Province of China (20232BAB201021).
Data Availability Statement
The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.-V.; Norouzi, M.; Macherey, W.; Cao, Y.; Gao, Q. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144. [Google Scholar]
- Alipanahi, B.; Delong, A.; Weirauch, M.T.; Frey, B.J. Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning. Nat. Biotechnol. 2015, 33, 831–838. [Google Scholar] [CrossRef] [PubMed]
- Chui, C.K.; Lin, S.-B.; Zhou, D.-X. Construction of neural networks for realization of localized deep learning. arXiv 2018, arXiv:1803.03503. [Google Scholar] [CrossRef]
- Chui, C.K.; Lin, S.-B.; Zhou, D.-X. Deep neural networks for rotation-invariance approximation and learning. Anal. Appl. 2019, 17, 737–772. [Google Scholar] [CrossRef]
- Fang, Z.-Y.; Feng, H.; Huang, S.; Zhou, D.-X. Theory of deep convolutional neural networks II: Spherical analysis. Neural Netw. 2020, 131, 154–162. [Google Scholar] [CrossRef]
- Feng, H.; Huang, S.; Zhou, D.-X. Generalization analysis of CNNs for classification on spheres. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 6200–6213. [Google Scholar] [CrossRef]
- Zhou, D.-X. Deep distributed convolutional neural networks: Universality. Anal. Appl. 2018, 16, 895–919. [Google Scholar] [CrossRef]
- Zhou, D.-X. Universality of deep convolutional neural networks. Appl. Comput. Harmon. Anal. 2020, 48, 787–794. [Google Scholar] [CrossRef]
- Cucker, F.; Zhou, D.-X. Learning Theory: An Approximation Theory Viewpoint; Cambridge University Press: New York, NY, USA, 2007. [Google Scholar]
- Steinwart, I.; Christmann, A. Support Vector Machines; Springer: New York, NY, USA, 2008. [Google Scholar]
- Cucker, F.; Smale, S. On the mathematical foundations of learning. Bull. Amer. Math. Soc. 2001, 39, 1–49. [Google Scholar] [CrossRef]
- An, C.-P.; Chen, X.-J.; Sloan, I.H.; Womersley, R.S. Regularized least squares approximations on the sphere using spherical designs. SIAM J. Numer. Anal. 2012, 50, 1513–1534. [Google Scholar] [CrossRef][Green Version]
- An, C.-P.; Wu, H.-N. Lasso hyperinterpolation over general regions. SIAM J. Sci. Comput. 2021, 43, A3967–A3991. [Google Scholar] [CrossRef]
- An, C.-P.; Ran, J.-S. Hard thresholding hyperinterpolation over general regions. arXiv 2023, arXiv:2209.14634. [Google Scholar]
- De Mol, C.; De Vito, E.; Rosasco, L. Elastic-net regularization in learning theory. J. Complex. 2009, 25, 201–230. [Google Scholar] [CrossRef]
- Fischer, S.; Steinwart, I. Sobolev norm learning rates for regularized least-squares algorithms. J. Mach. Learn. Res. 2020, 21, 8464–8501. [Google Scholar]
- Lai, J.-F.; Li, Z.-F.; Huang, D.-G.; Lin, Q. The optimality of kernel classifiers in Sobolev space. arXiv 2024, arXiv:2402.01148. [Google Scholar]
- Sun, H.-W.; Wu, Q. Least square regression with indefinite kernels and coefficient regularization. Appl. Comput. Harmon. Anal. 2011, 30, 96–109. [Google Scholar] [CrossRef]
- Wu, Q.; Zhou, D.-X. Learning with sample dependent hypothesis spaces. Comput. Math. Appl. 2008, 56, 2896–2907. [Google Scholar] [CrossRef]
- Chen, H.; Wu, J.-T.; Chen, D.-R. Semi-supervised learning for regression based on the diffusion matrix. Sci. Sin. Math. 2014, 44, 399–408. (In Chinese) [Google Scholar]
- Sun, X.-J.; Sheng, B.-H. The learning rate of kernel regularized regression associated with a correntropy-induced loss. Adv. Math. 2024, 53, 633–652. [Google Scholar]
- Wu, Q.; Zhou, D.-X. Analysis of support vector machine classification. J. Comput. Anal. Appl. 2006, 8, 99–119. [Google Scholar]
- Sheng, B.-H. Reproducing property of bounded linear operators and kernel regularized least square regressions. Int. J. Wavelets Multiresolut. Inf. Process. 2024, 22, 2450013. [Google Scholar] [CrossRef]
- Lin, S.-B.; Wang, D.; Zhou, D.-X. Sketching with spherical designs for noisy data fitting on spheres. SIAM J. Sci. Comput. 2024, 46, A313–A337. [Google Scholar] [CrossRef]
- Lin, S.-B.; Zeng, J.-S.; Zhang, X.-Q. Constructive neural network learning. IEEE Trans. Cybern. 2019, 49, 221–232. [Google Scholar] [CrossRef]
- Mhaskar, H.N.; Micchelli, C.A. Degree of approximation by neural and translation networks with single hidden layer. Adv. Appl. Math. 1995, 16, 151–183. [Google Scholar] [CrossRef]
- Sheng, B.-H.; Zhou, S.-P.; Li, H.-T. On approximation by translation networks in Lp(Rk) spaces. Adv. Math. 2007, 36, 29–38. [Google Scholar]
- Mhaskar, H.N.; Narcowich, F.J.; Ward, J.D. Approximation properties of zonal function networks using scattered data on the sphere. Adv. Comput. Math. 1999, 11, 121–137. [Google Scholar] [CrossRef]
- Sheng, B.-H. On approximation by reproducing kernel spaces in weighted Lp-spaces. J. Syst. Sci. Complex. 2007, 20, 623–638. [Google Scholar] [CrossRef]
- Parhi, R.; Nowak, R.D. Banach space representer theorems for neural networks and ridge splines. J. Mach. Learn. Res. 2021, 22, 1–40. [Google Scholar]
- Oono, K.; Suzuki, Y.J. Approximation and non-parameteric estimate of ResNet-type convolutional neural networks. arXiv 2023, arXiv:1903.10047. [Google Scholar]
- Shen, G.-H.; Jiao, Y.-L.; Lin, Y.-Y.; Huang, J. Non-asymptotic excess risk bounds for classification with deep convolutional neural networks. arXiv 2021, arXiv:2105.00292. [Google Scholar]
- Mallat, S. Understanding deep convolutional networks. Phil. Trans. R. Soc. A 2016, 374, 20150203. [Google Scholar] [CrossRef] [PubMed]
- Narcowich, F.J.; Ward, J.D.; Wendland, H. Sobolev error estimates and a Bernstein inequality for scattered data interpolation via radial basis functions. Constr. Approx. 2006, 24, 175–186. [Google Scholar] [CrossRef]
- Narcowich, F.J.; Ward, J.D. Scattered data interpolation on spheres: Error estimates and locally supported basis functions. SIAM J. Math. Anal. 2002, 33, 1393–1410. [Google Scholar] [CrossRef]
- Narcowich, F.J.; Sun, X.P.; Ward, J.D.; Wendland, H. Direct and inverse Sobolev error estimates for scattered data interpolation via spherical basis functions. Found. Comput. Math. 2007, 7, 369–390. [Google Scholar] [CrossRef]
- Gröchenig, K. Sampling, Marcinkiewicz-Zygmund inequalities, approximation and quadrature rules. J. Approx. Theory 2020, 257, 105455. [Google Scholar] [CrossRef]
- Gia, Q.T.L.; Mhaskar, H.N. Localized linear polynomial operators and quadrature formulas on the sphere. SIAM J. Numer. Anal. 2008, 47, 440–466. [Google Scholar] [CrossRef]
- Xu, Y. The Marcinkiewicz-Zygmund inequalities with derivatives. Approx. Theory Its Appl. 1991, 7, 100–107. [Google Scholar] [CrossRef]
- Szegö, G. Orthogonal Polynomials; American Mathematical Society: New York, NY, USA, 1967. [Google Scholar]
- Mhaskar, H.N.; Narcowich, F.J.; Ward, J.D. Spherical Marcinkiewicz-Zygmund inequalities and positive quadrature. Math. Comput. 2001, 70, 1113–1130, Corrigendum in Math. Comp. 2001, 71, 453–454. [Google Scholar] [CrossRef]
- Dai, F. On generalized hyperinterpolation on the sphere. Proc. Amer. Math. Soc. 2006, 134, 2931–2941. [Google Scholar] [CrossRef]
- Mhaskar, H.N.; Narcowich, F.J.; Sivakumar, N.; Ward, J.D. Approximation with interpolatory constraints. Proc. Amer. Math. Soc. 2001, 130, 1355–1364. [Google Scholar] [CrossRef]
- Xu, Y. Mean convergence of generalized Jacobi series and interpolating polynomials, II. J. Approx. Theory 1994, 76, 77–92. [Google Scholar] [CrossRef]
- Marzo, J. Marcinkiewicz-Zygmund inequalities and interpolation by spherical harmonics. J. Funct. Anal. 2007, 250, 559–587. [Google Scholar] [CrossRef]
- Marzo, J.; Pridhnani, B. Sufficient conditions for sampling and interpolation on the sphere. Constr. Approx. 2014, 40, 241–257. [Google Scholar] [CrossRef][Green Version]
- Wang, H.P. Marcinkiewicz-Zygmund inequalities and interpolation by spherical polynomials with respect to doubling weights. J. Math. Anal. Appl. 2015, 423, 1630–1649. [Google Scholar] [CrossRef]
- Gia, T.L.; Sloan, I.H. The uniform norm of hyperinterpolation on the unit sphere in an arbitrary number of dimensions. Constr. Approx. 2001, 17, 249–265. [Google Scholar] [CrossRef]
- Sloan, I.H. Polynomial interpolation and hyperinterpolation over general regions. J. Approx. Theory 1995, 83, 238–254. [Google Scholar] [CrossRef]
- Sloan, I.H.; Womersley, R.S. Constructive polynomial approximation on the sphere. J. Approx. Theory 2000, 103, 91–118. [Google Scholar] [CrossRef]
- Wang, H.-P. Optimal lower estimates for the worst case cubature error and the approximation by hyperinterpolation operators in the Sobolev space setting on the sphere. Int. J. Wavelets Multiresolut. Inf. Process. 2009, 7, 813–823. [Google Scholar] [CrossRef]
- Wang, H.-P.; Wang, K.; Wang, X.-L. On the norm of the hyperinterpolation operator on the d-dimensional cube. Comput. Math. Appl. 2014, 68, 632–638. [Google Scholar]
- Sloan, I.H.; Womersley, R.S. Filtered hyperinterpolation: A constructive polynomial approximation on the sphere. Int. J. Geomath. 2012, 3, 95–117. [Google Scholar] [CrossRef]
- Bondarenko, A.; Radchenko, D.; Viazovska, M. Well-separated spherical designs. Constr. Approx. 2015, 41, 93–112. [Google Scholar] [CrossRef]
- Hesse, K.; Womersley, R.S. Numerical integration with polynomial exactness over a spherical cap. Adv. Comput. Math. 2012, 36, 451–483. [Google Scholar] [CrossRef]
- Delsarte, P.; Goethals, J.M.; Seidel, J.J. Spherical codes and designs. Geom. Dedicata 1977, 6, 363–388. [Google Scholar] [CrossRef]
- An, C.-P.; Chen, X.-J.; Sloan, I.H.; Womersley, R.S. Well conditioned spherical designs for integration and interpolation on the two-sphere. SIAM J. Numer. Anal. 2010, 48, 2135–2157. [Google Scholar] [CrossRef]
- Chen, X.; Frommer, A.; Lang, B. Computational existence proof for spherical t-designs. Numer. Math. 2010, 117, 289–305. [Google Scholar] [CrossRef]
- An, C.-P.; Wu, H.-N. Bypassing the quadrature exactness assumption of hyperinterpolation on the sphere. J. Complex. 2024, 80, 101789. [Google Scholar] [CrossRef]
- An, C.-P.; Wu, H.-N. On the quadrature exactness in hyperinterpolation. BIT Numer. Math. 2022, 62, 1899–1919. [Google Scholar] [CrossRef]
- Sun, X.-J.; Sheng, B.-H.; Liu, L.; Pan, X.-L. On the density of translation networks defined on the unit ball. Math. Found. Comput. 2024, 7, 386–404. [Google Scholar] [CrossRef]
- Wang, H.-P.; Wang, K. Optimal recovery of Besov classes of generalized smoothness and Sobolev class on the sphere. J. Complex. 2016, 32, 40–52. [Google Scholar] [CrossRef]
- Dai, F.; Xu, Y. Approximation Theory and Harmonic Analysis on Spheres and Balls; Springer: New York, NY, USA, 2013. [Google Scholar]
- Müller, C. Spherical Harmonics; Springer: Berlin/Heidelberg, Germany, 1966. [Google Scholar]
- Wang, K.-Y.; Li, L.-Q. Harmonic Analysis and Approximation on the Unit Sphere; Science Press: New York, NY, USA, 2000. [Google Scholar]
- Cheney, W.; Light, W. A Course in Approximation Theory; China Machine Press: Beijing, China, 2004. [Google Scholar]
- Dai, F.; Wang, H.-P. Positive cubature formulas and Marcinkiewicz-Zygmund inequalities on spherical caps. Constr. Approx. 2010, 31, 1–36. [Google Scholar] [CrossRef][Green Version]
- Aronszajn, N. Theory of reproducing kernels. Trans. Amer. Math. Soc. 1950, 68, 337–404. [Google Scholar] [CrossRef]
- Lin, S.-B.; Wang, Y.-G.; Zhou, D.-X. Distributed filtered hyperinterpolation for noisy data on the sphere. SIAM J. Numer. Anal. 2021, 59, 634–659. [Google Scholar] [CrossRef]
- Montúfar, G.; Wang, Y.-G. Distributed learning via filtered hyperinterpolation on manifolds. Found. Comput. Math. 2022, 22, 1219–1271. [Google Scholar] [CrossRef]
- Sheng, B.-H.; Wang, J.-L. Moduli of smoothness, K-functionals and Jackson-type inequalities associated with kernel function approximation in learning theory. Anal. Appl. 2024, 22, 981–1022. [Google Scholar] [CrossRef]
- Christmann, A.; Xiang, D.-H.; Zhou, D.-X. Total stability of kernel methods. Neurocomputing 2018, 289, 101–118. [Google Scholar] [CrossRef]
- Sheng, B.-H.; Liu, H.-X.; Wang, H.-M. The learning rate for the kernel regularized regression (KRR) with a differentiable strongly convex loss. Commun. Pure Appl. Anal. 2020, 19, 3973–4005. [Google Scholar] [CrossRef]
- Wang, S.-H.; Sheng, B.-H. Error analysis of kernel regularized pairwise learning with a strongly convex loss. Math. Found. Comput. 2023, 6, 625–650. [Google Scholar] [CrossRef]
- Smale, S.; Zhou, D.-X. Learning theory estimates via integral operators and their applications. Constr. Approx. 2007, 26, 153–172. [Google Scholar] [CrossRef]
- Lin, S.-B. Integral operator approaches for scattered data fitting on sphere. arXiv 2024, arXiv:2401.15294. [Google Scholar]
- Feng, H.; Lin, S.-B.; Zhou, D.-X. Radial basis function approximation with distributively stored data on spheres. Constr. Approx. 2024, 60, 1–31. [Google Scholar] [CrossRef]
- Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2010. [Google Scholar]
- Kyriazis, G.; Petrushev, P.; Xu, Y. Jacobi decomposition of weighted Triebel-Lizorkin and Besov spaces. Stud. Math. 2008, 186, 161–202. [Google Scholar] [CrossRef]
- Chen, W.; Ditzian, Z. Best approximation and K-functionals. Acta Math. Hung. 1997, 75, 165–208. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).