From Kernel Matrices to Kernel Functions: An Eigenfunction-Based Approach

Muñoz, Alberto; Torres, Aida; Muñoz García, Elvira

doi:10.3390/math14111971


by Alberto Muñoz ¹,*, Aida Torres ¹ and Elvira Muñoz García ²

¹ Department of Statistics, Universidad Carlos III de Madrid, Calle Madrid 126, Getafe, 28903 Madrid, Spain

² Department of Business and Business Analytics, Faculty of Economics, Business and Communication, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(11), 1971; https://doi.org/10.3390/math14111971

Submission received: 3 May 2026 / Revised: 25 May 2026 / Accepted: 31 May 2026 / Published: 3 June 2026


Abstract

Kernel-combination procedures used in classification often return only a combined kernel matrix on the training sample, rather than a kernel function that can be evaluated consistently at new points. This limitation is especially important for supervised or label-aware combinations, whose entries may depend on training labels and therefore have no immediate out-of-sample meaning. We study the problem of constructing an inductive, finite-rank kernel extension from such empirical matrices. The proposed framework makes the non-uniqueness of this extension explicit: it is determined by empirical coordinates, a positive-semidefinite coefficient matrix, and a continuation model for the coordinates. Experiments on vector, tabular, and relational classification problems give a deliberately diagnostic picture. Smooth direct combinations are stable: on Synthetic, the direct mean gives error $0.0793 \pm 0.0227$, essentially matching the best individual RBF kernel ($0.0809 \pm 0.0231$), and on Telco it remains close to the best individual polynomial kernel ($0.2061 \pm 0.0154$ versus $0.2045 \pm 0.0154$). In the controlled Synthetic oracle diagnostic, reconstructing a smooth sum/mean gives relative Frobenius error $4.13 \times 10^{-6} \pm 9.41 \times 10^{-6}$ and functional MSE at numerical scale. By contrast, abrupt label-aware matrix-only rules are less robust: the Synthetic percentile_inout_auto rule has error $0.1404 \pm 0.1198$, Telco matrix-only supervised rules are around $0.307$–$0.326$ error, and the Chickenpieces pickout_auto rule fails under strict out-of-sample reconstruction ($0.3545 \pm 0.2666$ error), whereas direct relational combinations match the best individual relational kernel within $10^{-3}$. Overall, the empirical evidence supports the method as a bridge from finite matrix-level information fusion to deployable kernels, while also identifying abrupt label-aware geometries as the main limitation for stable generalization.

Keywords: kernel methods; reproducing kernel Hilbert spaces; kernel combination; spectral reconstruction; out-of-sample extension; support vector machines

MSC: 46E22; 65D15; 65F15; 62H30; 68T05

1. Introduction

Kernel methods provide an effective and natural way to bring distances and similarities directly into learning problems. A kernel matrix can encode how close, comparable, or compatible two observations are, even when the objects are not conveniently described by ordinary Euclidean coordinates. This makes kernels particularly useful for classification, regression, and similarity-based learning: geometric or relational information is transformed into an inner-product representation in a reproducing kernel Hilbert space (RKHS), where linear operations become available, nonlinear decision rules can be handled through linear algorithms, and standard classifiers can work from similarities rather than only from raw coordinates [1,2,3]. Their practical success depends heavily on the choice of the kernel, and in many applications a single kernel is not rich enough to capture all relevant aspects of the problem. For that reason, combinations of kernels or similarities have been extensively studied, both as a way of fusing heterogeneous information and as a mechanism for improving discrimination in supervised tasks [4,5,6].

A central difficulty appears when the combination rule produces only a matrix on a finite sample. This happens, for instance, in several label-aware or similarity-driven fusion rules [6,7]: the practitioner obtains a combined empirical matrix $K_{S}^{*}$ on the training set $S = \{x_{i}\}_{i=1}^{n}$, but not a genuine kernel function $K^{*}(x, z)$ that can be evaluated at new points. Once the learning algorithm leaves the training sample, the combined object is no longer directly available. In consequence, the resulting model cannot be deployed consistently unless one reconstructs an out-of-sample extension of the combined kernel.

Recent work has confirmed that this passage from finite kernel or proximity information to an out-of-sample evaluator remains an active problem. Fanuel et al. [8] constructed data-dependent positive-semidefinite kernels with explicit out-of-sample formulas for positive-semidefinite embeddings, while Münch et al. [9] studied spectral fusion of heterogeneous, possibly indefinite, proximity data. These approaches are complementary to the present paper: we start from a combined matrix already produced by a supervised or multi-source rule and seek a functional representation of that final matrix, up to the selected spectral rank, without using labels for new points.

To make the background and scope more explicit, we position the problem at the intersection of three related literatures. Multiple-kernel learning and kernel-alignment methods study how to combine kernels for supervised learning [4,5,10,11]. Similarity and proximity learning study how finite pairwise matrices, including non-Euclidean or indefinite objects, can be used or rectified for learning [7,12]. Kernel approximation and data-dependent RKHS bases study how finite spectral or interpolation information can be evaluated stably away from the sample [8,9,13,14]. The present paper uses the first two lines of work as sources of finite combined matrices and the third as the mechanism for obtaining deployable out-of-sample evaluators. This positioning clarifies why the empirical question is not only whether a combination improves classification, but also whether the combined matrix can be converted into a stable kernel on unseen observations.

This article addresses precisely that matrix-to-kernel step. Starting from a positive-semidefinite combined matrix, we construct one out-of-sample extension through a finite feature representation and data-dependent bases in the empirical RKHS [13,14,15]. Such an extension is not unique: a finite matrix admits infinitely many positive-semidefinite extensions outside the observed sample. The contribution of the paper is therefore not to recover “the” true kernel, but to define a controlled inductive extension specified by empirical coordinates, a PSD coefficient matrix, and a regression model for those coordinates. The resulting kernel can then be evaluated between training and test points without access to test labels.

Figure 1 summarizes the conceptual pipeline studied in this paper. The figure emphasizes that the objective is not merely to train a classifier on a fixed matrix, but to transform sample-based kernel or dissimilarity constructions into evaluable functions.

Contributions.

The paper makes four concrete contributions.

1. It formulates the passage from a combined sample matrix $K_{S}^{*}$ to an out-of-sample kernel $K^{*}(x, z)$ as a reconstruction problem with three target properties: interpolation on the sample, positive definiteness, and practical out-of-sample evaluation.
2. It proposes a reconstruction strategy based on data-dependent bases in the empirical RKHS, with Newton and SVD bases as the two practically relevant choices.
3. It provides a reproducible experimental protocol that separates base kernels, matrix combination, kernel reconstruction, and downstream classification.
4. It validates the method on completed strict runs for Synthetic, Breast Cancer, Ionosphere, Telco Customer Churn, and Chickenpieces, covering ordinary vector data, heterogeneous tabular data, and relational dissimilarity data.

Illustrative Example: From a Sample-Based Construction to an Out-of-Sample Function

Before turning to the full experimental protocol, it is useful to isolate the main objective on a simple one-dimensional example. The central issue is not primarily to propose a classifier competing with highly specialized alternatives, but rather to transform a construction originally defined only on a finite sample into a function that can be evaluated at previously unseen points.

Suppose that we observe sample points $x_{1}, \dots, x_{n} \in X$ and a kernel matrix $K_{S}$ defined on these points. Let $\{\lambda_{j}, \varphi_{j}\}_{j \geq 1}$ denote the eigenvalue–eigenfunction pairs associated with the corresponding integral operator, or, in practice, the finite-sample approximation derived from the kernel matrix. A truncated representation with $m_{0}$ terms has the form

$$f_{m_{0}}(x) = \sum_{j=1}^{m_{0}} a_{j} \varphi_{j}(x).$$

Once the coefficients $a_{j}$ have been determined from the sample, the expression above defines an object that can be evaluated not only at the observed points, but also at a new point $x_{\mathrm{new}}$, provided that the basis functions admit an out-of-sample extension.

This is precisely the situation considered throughout the paper. In many applications, one starts from kernels or dissimilarity-based constructions that are explicitly available only on the training sample. The contribution of the proposed framework is to show how such constructions, including ad hoc kernel combinations, can be turned into evaluable functions outside the sample by means of finite basis representations. In this sense, the classification experiments should be interpreted mainly as a validation that the resulting extensions remain informative and competitive, rather than as an attempt to outperform specialized classification methods.

Figure 2 fixes the idea visually. A smooth target function is observed only at finitely many input locations. From a kernel-based basis constructed on the sample, one obtains a truncated approximation

$$\hat{f}_{m_{0}}(x) = \sum_{j=1}^{m_{0}} \hat{a}_{j} \hat{\varphi}_{j}(x),$$

which interpolates or approximates the observed data and can then be evaluated at new points. The role of the truncation parameter $m_{0}$ is to control the balance between fidelity to the sample-based construction and regularity of the resulting extension. The same principle underlies the methodology developed below for combined kernels.

The main message of the paper is deliberately cautious. The proposed reconstruction is especially reliable for sum-type and modular combinations, where the out-of-sample kernel remains close to the oracle combination and classification performance is preserved. In contrast, more abrupt label-driven combinations such as pick-out are harder to reconstruct faithfully and do not consistently improve classification once they are extended outside the sample.

The rest of the paper is organized as follows. Section 2 introduces the RKHS framework and formulates the reconstruction problem. Section 3 presents the proposed method. Section 4 describes the experimental protocol. Section 5 reports the numerical results, and Section 6 discusses what they imply for the use of combined kernels in practice.

2. Background and Problem Formulation

2.1. Positive-Definite Kernels, RKHSs, and Integral Operators

Let $K : X \times X \to \mathbb{R}$ be a continuous, symmetric, positive-definite kernel on a compact domain $X$. By the classical Moore–Aronszajn construction, K induces a unique reproducing kernel Hilbert space (RKHS) $H_{K}$ of real-valued functions on $X$ such that point evaluation is continuous and the reproducing property

$$f(x) = \langle f, K(\cdot, x) \rangle_{H_{K}}, \qquad f \in H_{K},$$

holds for every $x \in X$ [1]. The empirical subspace associated with a sample $S = \{x_{1}, \dots, x_{n}\}$ is

$$H_{K}(S) = \mathrm{span}\{K(\cdot, x_{i}) : 1 \leq i \leq n\},$$

which is the finite-dimensional space where all computations in the paper take place.

If $\nu$ is a Borel probability measure on $X$, the kernel defines the integral operator

$$(L_{K} f)(x) = \int_{X} K(x, t) f(t) \, d\nu(t). \tag{1}$$

Under our standing assumptions, $L_{K}$ is self-adjoint, positive, and compact. This operator-theoretic viewpoint is crucial because it links the analytic kernel K to a countable eigensystem and therefore provides the language in which sample matrices can be interpreted as discretizations of a functional object. It is also the standard bridge between kernels, RKHSs, and kernel-induced operators in learning theory and in the earlier basis-combination work that motivates the present manuscript [16,17,18]. The common message is that kernel reconstruction is best understood spectrally rather than only entrywise.

2.2. Mercer Expansions, Regularization, and Projection onto the RKHS

Mercer’s theorem states that if K is a Mercer kernel on a compact domain and $\nu$ is non-degenerate, then there exist non-negative eigenvalues $\{\lambda_{j}\}_{j \geq 1}$ and orthonormal eigenfunctions $\{\phi_{j}\}_{j \geq 1} \subset L_{\nu}^{2}(X)$ such that

$$K(x, z) = \sum_{j \geq 1} \lambda_{j} \phi_{j}(x) \phi_{j}(z), \tag{2}$$

with absolute and uniform convergence on $X \times X$ [15,16,17]. In particular, each eigenfunction associated with a positive eigenvalue belongs to $H_{K}$, and the RKHS norm can be expressed through the Mercer coordinates of the function. This is what turns the eigenfunctions into the natural degrees of freedom of the problem.

Given data $\{(x_{i}, y_{i})\}_{i=1}^{n}$, learning in the RKHS is often posed through Tikhonov regularization:

$$\min_{f \in H_{K}} \frac{1}{n} \sum_{i=1}^{n} (f(x_{i}) - y_{i})^{2} + \gamma \|f\|_{H_{K}}^{2}, \qquad \gamma > 0. \tag{3}$$

By the Representer Theorem, the solution admits the finite expansion

$$f_{\gamma}(x) = \sum_{i=1}^{n} \alpha_{i} K(x_{i}, x), \tag{4}$$

and the coefficients solve a linear system in the sample kernel matrix [19]. In the population picture, the corresponding minimizer is $(L_{K} + \gamma I)^{-1} L_{K} f_{\nu}$, where $f_{\nu}$ is the regression function. This regularized projection point of view is important here for two reasons. First, it shows how functions in the RKHS are recovered from sample evaluations. Second, it clarifies that out-of-sample continuation is not an arbitrary interpolation problem, but a regularized projection problem in a space whose geometry is already fixed by the kernel. Classical RKHS regularization theory and earlier basis-combination work emphasize this projection interpretation, which is why we use it as the conceptual entry point to kernel reconstruction [17,18,19].
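As an illustration of (3) and (4), the following minimal sketch solves the regularized problem on synthetic data. The Gaussian kernel, its bandwidth `sigma`, the toy data, and the helper names `rbf_kernel` and `f_gamma` are assumptions made for the example. With the $1/n$ loss scaling in (3), the first-order condition under the expansion (4) reduces to the linear system $(K_{S} + n\gamma I)\alpha = y$.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                       # toy inputs (assumption)
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)    # toy responses (assumption)

gamma = 1e-2
n = len(y)
K_S = rbf_kernel(X, X)

# Stationarity of (3) under the expansion (4): (K_S + n * gamma * I) alpha = y
alpha = np.linalg.solve(K_S + n * gamma * np.eye(n), y)

def f_gamma(X_new):
    """Evaluate f_gamma(x) = sum_i alpha_i K(x_i, x) at new points."""
    return rbf_kernel(X_new, X) @ alpha

X_new = rng.normal(size=(5, 2))
print(f_gamma(X_new))
```

The fitted function is evaluated at unseen points only through the kernel sections $K(x, x_{i})$, which is the ingredient that matrix-only combinations lack.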

2.3. Finite-Sample Eigenfunction Approximation and Nyström-Type Ideas

Closed-form eigensystems of $L_{K}$ are exceptional. In practice one observes only a finite sample and its Gram matrix

$$K_{S} = [K(x_{i}, x_{j})]_{i,j=1}^{n}.$$

Let

$$K_{S} U = U \Lambda^{\mathrm{mat}}, \tag{5}$$

where the columns of U are orthonormal eigenvectors and $\Lambda^{\mathrm{mat}} = \mathrm{diag}(\lambda_{1}^{\mathrm{mat}}, \dots)$ contains the eigenvalues of the matrix $K_{S}$. If the empirical matrix is viewed as a quadrature approximation of the integral operator, then the corresponding operator eigenvalues are scaled as

$$\hat{\mu}_{j} = \frac{\lambda_{j}^{\mathrm{mat}}}{n}, \qquad \hat{\phi}_{j}(x_{i}) \approx \sqrt{n}\, u_{ij}.$$

With this convention the sample reconstruction is correctly normalized:

$$\sum_{j} \hat{\mu}_{j} \hat{\phi}_{j}(x_{i}) \hat{\phi}_{j}(x_{k}) = \sum_{j} \frac{\lambda_{j}^{\mathrm{mat}}}{n} (\sqrt{n}\, u_{ij})(\sqrt{n}\, u_{kj}) = (K_{S})_{ik}.$$

Thus the factor $1/n$ belongs to the operator eigenvalue, not to the matrix eigenvalue.

For the computational reconstruction used in the rest of the paper we avoid this normalization ambiguity by working directly with matrix coordinates

$$\hat{\psi}_{j}(x_{i}) = u_{ij}. \tag{6}$$

The finite-rank kernel is then assembled as

$$\hat{K}_{r}(x, z) = \sum_{j=1}^{r} \lambda_{j}^{\mathrm{mat}} \hat{\psi}_{j}(x) \hat{\psi}_{j}(z). \tag{7}$$

At the sample points this gives exactly $U_{r} \Lambda_{r}^{\mathrm{mat}} U_{r}^{\top}$. The Mercer-normalized notation is still useful for interpretation, but (6) and (7) are the normalization used by the algorithm.
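The equivalence of the two normalizations on the sample can be checked numerically. The sketch below is only an illustration: the Gaussian Gram matrix, the toy sample, and the rank `r = 5` are assumptions, not choices taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(25, 3))                       # toy sample (assumption)
sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
K_S = np.exp(-sq / 2.0)                            # Gaussian Gram matrix (assumption)
n = K_S.shape[0]

lam_mat, U = np.linalg.eigh(K_S)                   # K_S U = U diag(lam_mat)

# Operator normalization: mu_hat = lam_mat / n, phi_hat(x_i) = sqrt(n) * u_ij
mu_hat = lam_mat / n
Phi_hat = np.sqrt(n) * U
K_operator = (Phi_hat * mu_hat) @ Phi_hat.T

# Matrix coordinates (6)-(7): psi_hat(x_i) = u_ij with the matrix eigenvalues
K_matrix = (U * lam_mat) @ U.T

# Rank-r truncation of (7) on the sample: U_r Lambda_r U_r^T
r = 5
top = np.argsort(lam_mat)[::-1][:r]
K_r = (U[:, top] * lam_mat[top]) @ U[:, top].T
rel_err = np.linalg.norm(K_S - K_r) / np.linalg.norm(K_S)
print(rel_err)
```

Both full-rank reconstructions return $K_{S}$ up to rounding, while `rel_err` measures what is lost by keeping only the leading `r` directions.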

When the original analytic kernel K is known, a classical way to continue operator-normalized eigenfunctions outside the sample is the Nyström formula,

$$\hat{\phi}_{j}(x) \approx \frac{1}{\sqrt{n}\, \hat{\mu}_{j}} \sum_{i=1}^{n} u_{ij} K(x, x_{i}), \tag{8}$$

or equivalently the same formula written with $\lambda_{j}^{\mathrm{mat}} = n \hat{\mu}_{j}$. Nyström is computationally appealing and widely used in low-rank kernel methods [20,21,22,23]. However, it is intrinsically a low-rank approximation strategy and it requires access to the kernel section $K(x, x_{i})$ of the kernel being extended. For matrix-only combinations, this is exactly the unavailable object. The present paper therefore extends the coordinate traces (6) through learned data-dependent coordinates rather than assuming that $K^{*}(x, x_{i})$ is already known.
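For reference, formula (8) can be sketched as follows in the case where the kernel sections are available. The Gaussian kernel, the toy data, the choice of four leading eigenpairs, and the helper name `nystrom_phi` are assumptions for illustration; this is the classical Nyström continuation, not the coordinate-extension method of the paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))                       # training sample (assumption)
X_new = rng.normal(size=(6, 2))                    # unseen points (assumption)
n = X.shape[0]

K_S = rbf_kernel(X, X)
lam_mat, U = np.linalg.eigh(K_S)
top = np.argsort(lam_mat)[::-1][:4]                # leading eigenpairs only
lam_r, U_r = lam_mat[top], U[:, top]
mu_hat = lam_r / n                                 # operator-scaled eigenvalues

def nystrom_phi(X_eval):
    """phi_hat_j(x) ~ (1 / (sqrt(n) * mu_hat_j)) * sum_i u_ij K(x, x_i)."""
    return rbf_kernel(X_eval, X) @ U_r / (np.sqrt(n) * mu_hat)

Phi_new = nystrom_phi(X_new)                       # one row per new point
print(Phi_new.shape)
```

Evaluated at the training points, `nystrom_phi` returns $\sqrt{n}\, u_{ij}$, in agreement with the operator normalization. The division by $\hat{\mu}_{j}$ is also why the formula is normally restricted to leading eigenpairs.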

The out-of-sample extension of empirical eigentraces has also been revisited recently from different perspectives. Fanuel et al. [8] derived extension formulas that turn finite PSD embeddings into data-dependent positive-semidefinite kernels. Neural and parametric alternatives have also appeared; for example, Deng et al. [24] approximated leading eigenfunctions of kernel integral operators by neural networks. Our approach remains non-parametric and basis-driven: the coordinate traces of the combined matrix are extended through stable empirical bases rather than through a newly learned embedding or neural parametrization.

For the present paper, this is precisely the point where alternate RKHS bases become useful: instead of extending eigenvectors only through the raw kernel sections, we first move them to a numerically better coordinate system.

2.4. Interpolation Viewpoint and Why Alternate Bases Are Needed

The need for alternate bases can be understood through a classical interpolation problem [14,25,26,27,28]. Given scattered data $\{(x_{i}, y_{i})\}_{i=1}^{n} \subset \Omega \times \mathbb{R}$, one seeks a function s such that $s(x_{i}) = y_{i}$ for all i. A generic finite-dimensional ansatz has the form

$$s(x) = \sum_{j=1}^{n} c_{j} B_{j}(x), \tag{9}$$

and is well posed if and only if the interpolation matrix $B = [B_{j}(x_{i})]_{i,j}$ is nonsingular. In one dimension, suitable polynomial bases may work, but in several dimensions the Haar–Mairhuber–Curtis theorem shows that no fixed finite-dimensional space of continuous functions can serve as a universal interpolation space for arbitrary sets of points [25,26]. This is one of the reasons why kernel methods are so natural here: instead of fixing the basis in advance, they let the basis depend on the data through the kernel sections.

With the standard kernel basis the interpolant becomes

$$s(x) = \sum_{j=1}^{n} c_{j} K(x, x_{j}) = k(x)^{\top} c, \qquad K_{S} c = y, \tag{10}$$

where $k(x)^{\top} = (K(x, x_{1}), \dots, K(x, x_{n}))$ and $y = (y_{1}, \dots, y_{n})^{\top}$. Positive definiteness guarantees uniqueness whenever the sample points are distinct. The drawback is numerical: even when the interpolant exists uniquely, the basis $\{K(\cdot, x_{i})\}$ can be severely ill-conditioned and therefore unstable for out-of-sample evaluation. This is exactly the motivation for data-dependent bases in kernel approximation [13,14,18]. The role of Lagrange, Newton, and SVD bases is not merely cosmetic; they are the mechanism that makes the same empirical subspace usable in practice.
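The conditioning issue is easy to observe numerically. The following sketch assumes a one-dimensional Gaussian kernel with bandwidth `sigma = 0.2` on nested equispaced nodes; these choices and the helper name `gaussian_gram` are illustrative only.

```python
import numpy as np

def gaussian_gram(x, sigma=0.2):
    """Gram matrix of the Gaussian kernel on one-dimensional nodes x."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

conds = {}
for n in (5, 9, 17):
    x = np.linspace(0.0, 1.0, n)                   # distinct, nested equispaced nodes
    conds[n] = np.linalg.cond(gaussian_gram(x))    # 2-norm condition number of K_S
    print(n, conds[n])
```

The nodes are distinct in every case, so the interpolant in (10) exists and is unique, yet the condition number of $K_{S}$ grows quickly as the nodes fill the interval.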

2.5. Data-Dependent Alternate Bases in the Empirical RKHS

The standard kernel sections $k(x)^{\top} = (K(x, x_{1}), \dots, K(x, x_{n}))$ span $H_{K}(S)$, but this representation is often ill-conditioned. A unifying result for kernel-based spaces states that every data-dependent basis $v(x)^{\top} = (v_{1}(x), \dots, v_{r}(x))$ spanning the same empirical space can be written as

$$v(x)^{\top} = k(x)^{\top} T, \tag{11}$$

with a sample matrix $V = [v_{j}(x_{i})]$ satisfying

$$K_{S} = V T^{-1}, \tag{12}$$

as shown in Ref. [13]. This factorization viewpoint is central to the present paper: interpolation, eigentrace continuation, and out-of-sample kernel reconstruction can all be expressed in the same coordinates.

The three bases relevant here are Lagrange, Newton, and SVD. The Lagrange basis $\ell(x)^{\top} = k(x)^{\top} K_{S}^{-1}$ is the exact cardinal basis and therefore the natural reference point, but it inherits the instability of $K_{S}^{-1}$. The Newton basis is obtained from a Cholesky factorization $K_{S} = N N^{\top}$ through

$$n(x)^{\top} = k(x)^{\top} N^{-\top}, \tag{13}$$

and yields a triangular, pivoted, adaptive basis that is particularly natural when a rank cap is imposed. The SVD basis follows from $K_{S} = Q \Sigma^{2} Q^{\top}$ and is given by

$$s(x)^{\top} = k(x)^{\top} Q \Sigma^{-1}, \tag{14}$$

which aligns the basis with the empirical spectral directions of the matrix.
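The three transformation matrices $T$ in (11) can be written down directly for a small, well-conditioned example. The sketch below assumes a Gaussian kernel on a toy sample and uses an unpivoted Cholesky factorization for simplicity; the variable names are hypothetical, and the pivoted, adaptive Newton construction is not reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, sigma=0.2):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

rng = np.random.default_rng(4)
X = rng.uniform(size=(10, 2))                      # toy sample (assumption)
X_new = rng.uniform(size=(3, 2))                   # unseen points (assumption)
K_S = rbf_kernel(X, X)
k_new = rbf_kernel(X_new, X)                       # rows are k(x)^T at the new points

# Lagrange: T = K_S^{-1}, sample matrix V = I (cardinal basis)
T_lagrange = np.linalg.inv(K_S)

# Newton: K_S = N N^T (unpivoted Cholesky here), T = N^{-T}, sample matrix V = N
N = np.linalg.cholesky(K_S)
T_newton = np.linalg.inv(N).T

# SVD: K_S = Q Sigma^2 Q^T, T = Q Sigma^{-1}, sample matrix V = Q Sigma
lam, Q = np.linalg.eigh(K_S)
Sigma = np.sqrt(lam)
T_svd = Q / Sigma

# Out-of-sample basis values v(x)^T = k(x)^T T for each choice of T
bases_new = {name: k_new @ T for name, T in
             [("lagrange", T_lagrange), ("newton", T_newton), ("svd", T_svd)]}
print({name: B.shape for name, B in bases_new.items()})
```

In each case the sample matrix $V = K_{S} T$ takes a simple form (identity, triangular factor, or scaled eigenvectors), which is the factorization $K_{S} = V T^{-1}$ of (12).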

We keep these definitions and their role in the main text, but move the construction details—including the Newton recursion, the power function, the weighted-SVD variant, and the Hermite-type alternative—to Appendix A [13,14,18]. This keeps the body of the paper focused on the central matrix-to-function argument while still documenting the ingredients required to reproduce the method.

This basis-centered view is consistent with recent stabilized greedy kernel algorithms. Wenzel, Santin and Haasdonk [29] show that adaptive, data-dependent selection strategies can improve stability and convergence properties in kernel interpolation. In the present framework this supports treating the basis-extension step as part of the reconstruction model, not merely as a numerical implementation detail.

2.6. Combination Rules and Positive-Semidefinite Rectification

Suppose now that several base kernels $K_{1}, \dots, K_{m}$ are available on the same sample. We first combine them at matrix level, obtaining

$$K_{S}^{*} = F(K_{1,S}, \dots, K_{m,S}, y), \tag{15}$$

where $y = (y_{1}, \dots, y_{n})$ if the rule is supervised. The two families used in this paper are standard matrix-combination rules from kernel fusion and from earlier kernel-combination development [5,6,18].

2.6.1. Sum-Type Combinations

The simplest possibility is a linear or convex combination,

$$K_{S}^{\mathrm{sum}} = \sum_{\ell=1}^{m} w_{\ell} K_{\ell,S}, \qquad w_{\ell} \geq 0, \tag{16}$$

which is automatically positive semidefinite whenever the base matrices are. This is the main family used in Telco, where different kernels are associated with different business blocks and multiple Gaussian scales.

2.6.2. Pick-Out/Max–Min

The supervised rule inherited from the kernel-combination literature and preliminary work is the pick-out or Max–Min construction, which acts entrywise on the sample matrix [6,18]:

$$K^{*}(x_{i}, x_{j}) = \begin{cases} \max\{K_{1}(x_{i}, x_{j}), \dots, K_{m}(x_{i}, x_{j})\}, & y_{i} = y_{j}, \\ \min\{K_{1}(x_{i}, x_{j}), \dots, K_{m}(x_{i}, x_{j})\}, & y_{i} \neq y_{j}. \end{cases} \tag{17}$$

For $m = 2$ and labels coded as $y_{i} \in \{-1, +1\}$, this becomes

$$K^{*}(x_{i}, x_{j}) = \frac{1}{2} \left( K_{1}(x_{i}, x_{j}) + K_{2}(x_{i}, x_{j}) \right) + \frac{1}{2} y_{i} y_{j} \left| K_{1}(x_{i}, x_{j}) - K_{2}(x_{i}, x_{j}) \right|. \tag{18}$$

The broader parametric family studied for supervised kernel-matrix combination is

$$K^{*}(x_{i}, x_{j}) = \frac{1}{m} \sum_{t=1}^{m} K_{t}(x_{i}, x_{j}) + \tau y_{i} y_{j} \sum_{t < \ell} g\left( K_{t}(x_{i}, x_{j}) - K_{\ell}(x_{i}, x_{j}) \right), \tag{19}$$

where g is convex and $\tau > 0$ controls the supervised deformation [6]. These rules are useful because they may sharpen class separation, but they are also precisely the rules that are naturally defined as sample matrices rather than as analytic kernels on all of $X \times X$.
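The entrywise rule (17) and its two-kernel form (18) can be compared on a toy sample. In the sketch below the helper name `pick_out`, the two base kernels (one Gaussian, one polynomial), and the synthetic labels in $\{-1, +1\}$ are assumptions for illustration.

```python
import numpy as np

def pick_out(kernels, y):
    """Entrywise rule (17): max over kernels if labels agree, min otherwise."""
    stack = np.stack(kernels)                      # shape (m, n, n)
    same = y[:, None] == y[None, :]
    return np.where(same, stack.max(axis=0), stack.min(axis=0))

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))                       # toy sample (assumption)
y = np.where(X[:, 0] + 0.3 * rng.normal(size=20) > 0, 1, -1)   # labels in {-1, +1}

sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
K1 = np.exp(-sq / (2.0 * 0.5**2))                  # Gaussian base kernel
K2 = (1.0 + X @ X.T)**2                            # degree-2 polynomial base kernel

K_star = pick_out([K1, K2], y)

# Two-kernel closed form (18); valid because y_i * y_j is +1 or -1
K_closed = 0.5 * (K1 + K2) + 0.5 * np.outer(y, y) * np.abs(K1 - K2)

# The combined matrix need not be PSD, which motivates a rectification step
min_eig = np.linalg.eigvalsh(K_star).min()
print(min_eig)
```

Every entry of `K_star` depends on the pair of labels $(y_{i}, y_{j})$, so the rule cannot be evaluated for a test point whose label is unknown.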

2.6.3. Rectification

Supervised combinations may produce indefinite matrices. Several standard corrections are therefore used in similarity-based kernel learning [6,7,30]: (i) spectral clipping or MDS-style truncation, which keeps only positive eigenvalues; (ii) absolute-value rectification, which replaces negative eigenvalues by their modulus; and (iii) diagonal shifting, which adds a constant to the diagonal to move the spectrum into the positive half-line. In the experiments reported here we use spectral clipping as the default rectification:

$$K_{+} = Q \Lambda_{+} Q^{\top}, \qquad \Lambda_{+} = \mathrm{diag}(\max\{\lambda_{j}, 0\}).$$

This choice is the most transparent for the proposed spectral reconstruction, because only positive directions are retained. To monitor how much the rule has altered the original matrix, we report the relative PSD perturbation

$$\Delta_{\mathrm{PSD}} = \frac{\|K_{+} - K\|_{F}}{\|K\|_{F}}.$$
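Spectral clipping and the perturbation measure can be sketched in a few lines. The helper name `clip_psd` and the $2 \times 2$ test matrix are assumptions chosen so that the result can be checked by hand: the matrix has eigenvalues $3$ and $-1$, and clipping removes the negative direction.

```python
import numpy as np

def clip_psd(K):
    """Spectral clipping: keep non-negative eigenvalues, report Delta_PSD."""
    K = 0.5 * (K + K.T)                            # enforce exact symmetry
    lam, Q = np.linalg.eigh(K)
    lam_plus = np.maximum(lam, 0.0)
    K_plus = (Q * lam_plus) @ Q.T
    delta_psd = np.linalg.norm(K_plus - K, "fro") / np.linalg.norm(K, "fro")
    return K_plus, delta_psd

K = np.array([[1.0, 2.0],
              [2.0, 1.0]])                         # indefinite: eigenvalues 3 and -1
K_plus, delta_psd = clip_psd(K)
print(K_plus)       # all entries equal to 1.5
print(delta_psd)    # 1 / sqrt(10), about 0.316
```

Here $K_{+} - K$ has Frobenius norm $1$ and $\|K\|_{F} = \sqrt{10}$, so $\Delta_{\mathrm{PSD}} = 1/\sqrt{10}$.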

A recent alternative is the polar-decomposition correction of Münch, Röder and Schleif [31], which maps non-PSD kernel matrices to PSD matrices while aiming to preserve the topology of the data. Comparing that correction with elementary spectral clipping before the coordinate-extension step is left for future work.

2.7. When Kernel Combinations Are Analytic and When They Are Only Sample Objects

An important distinction in multiple-kernel learning is whether the combination is a genuine pointwise kernel or only a sample-level matrix rule. Some combinations are defined pointwise from the start. If the base kernels $K_{1}, \dots, K_{m}$ are analytic on $X \times X$ and the rule is, for instance, a convex sum,

$$K^{*}(x, z) = \sum_{\ell=1}^{m} \alpha_{\ell} K_{\ell}(x, z), \qquad \alpha_{\ell} \geq 0, \tag{20}$$

then $K^{*}$ is already an out-of-sample kernel and no reconstruction problem arises [3,4,5]. By contrast, supervised rules such as pick-out, Max–Min, or other matrix-level transformations may only be available through their sample matrix $K_{S}^{*}$ because they depend on the observed labels or on sample-specific rectifications. In that case the analytic object is not given directly even if all the base kernels are known pointwise.

This distinction explains why the matrix-to-function problem is substantive rather than formal. The issue is not “combining kernels” in the abstract, but recovering an evaluable kernel once the combination has been performed at matrix level. This matters not only for classification but for any task that needs a coherent out-of-sample geometry, including clustering, embedding, or information fusion [6,7,32]. Our emphasis on train–test blocks $K^{*}(S, Z)$ is therefore only one operational face of a broader problem: how to turn a finite, possibly supervised, similarity object into a positive-semidefinite finite-feature kernel that can be used consistently beyond the training sample. When the continued coordinates are continuous on a compact domain, this finite-feature kernel is a Mercer kernel.

2.8. Fusion Kernel as an Alternative Reconstruction Route

The Fusion Kernel (FK) construction is the natural alternative to the direct basis-based route [18,32]. Starting from the same combined matrix $K_{S}^{*}$, FK seeks an out-of-sample kernel by expressing the target eigenfunctions as linear combinations of eigenfunctions of the original base kernels. Conceptually, FK works through the eigenspaces of $K_{1}, \dots, K_{m}$, while the method proposed in this paper works directly with the eigensystem of $K_{S}^{*}$.

We keep FK visible in the main text because it clarifies the relation between the proposed basis route and previous fusion-kernel approaches. At the same time, the full algebra of FK is not needed to follow the main argument. We therefore keep only the conceptual comparison here and move the explicit FK representation and the discussion of when both routes coincide or diverge to Appendix C. Empirically, FK is retained as a secondary baseline but is not one of the central methods in the main tables.

The distinction is also useful for positioning the method relative to the recent work of Fanuel et al. [8] and Münch et al. [9]. Fanuel et al. provided explicit out-of-sample formulas for PSD embeddings, whereas our input is the final combined matrix and our central task is to reconstruct its eigentraces through stable RKHS bases. Münch et al. designed subspace-fusion mechanisms for heterogeneous proximity data; our method can instead be applied after such a fusion stage to obtain a kernel evaluator on new points. Studying combinations of their extension or fusion formulas with the basis reconstruction proposed here is a natural direction for future work.

2.9. Problem Statement and Target Properties

We formulate the matrix-to-function problem through four target properties, in line with the basis-combination motivation and with the kernel-fusion literature. Starting from a sample matrix $K_{S}^{*}$, we do not seek the unique continuation of $K_{S}^{*}$, because no such unique continuation exists. Instead, we seek an inductive finite-feature extension

$$\hat{K}(x, z) = g(x)^{\top} C g(z), \qquad C \succeq 0,$$

where the feature map g is learned from the training representation available for the objects. This representation may consist of ordinary vector covariates, tabular variables, distances to the training objects, or other relational descriptors. The extension should satisfy:

1. sample agreement: $\hat{K}(x_{i}, x_{j}) = (K_{S}^{*})_{ij}$ whenever the retained coordinates are exact on the sample, or approximately so under a controlled truncation or coordinate-regression error;
2. positive definiteness: the extension is PSD because $C \succeq 0$;
3. operational out-of-sample evaluation: the block $\hat{K}(S, Z)$ can be computed without access to test labels;
4. discriminative fidelity: the geometry induced by the combined matrix is preserved up to a reconstruction error that can be monitored.

These requirements are trivial when the combination rule is already defined pointwise, but they are the central difficulty when F is only a matrix rule. A related strategy, also described in earlier work on functional learning of kernels, is the Fusion Kernel (FK) [18,32], which reconstructs the target kernel by expressing its eigenfunctions as linear combinations of the eigenfunctions of the base kernels,

$$\phi_{h}(x) = \sum_{\ell=1}^{m} \sum_{j=1}^{d_{\ell}} c_{j\ell} \phi_{j}^{(\ell)}(x), \tag{21}$$

so that the target kernel is then recovered through a Mercer-like expansion in those fused eigenfunctions. FK provides an important conceptual baseline because it shows that one may reconstruct the combined kernel either indirectly through the eigensystems of the base kernels or directly from the eigensystem of the combined matrix. The method developed here follows the second route: rather than first expanding on the base kernels, we reconstruct directly in data-dependent bases associated with the combined matrix itself. This direct matrix-to-function route is what allows us to connect the spectral decomposition of $K_{S}^{*}$ with a stable out-of-sample evaluator.

Overall, the theoretical framework can be summarized as follows. Classical RKHS and Mercer theory justify interpreting a PSD matrix spectrally; alternate bases make the empirical eigentraces stable enough to evaluate away from the sample; and recent PSD-embedding and subspace-fusion approaches show that the same out-of-sample issue appears in neighboring settings [8,9,31]. The contribution of the present paper is not a new fusion rule and not a new embedding objective. It is a reconstruction step that lifts a given empirical kernel matrix—often produced by supervised or multi-source combination—to an evaluable kernel function. This perspective opens up future work on combining PSD-embedding formulas, subspace-fusion rules, and PSD corrections with the basis-extension step studied here.

3. Proposed Reconstruction Method

3.1. Spectral Decomposition of the Combined Matrix

Let

K_{S}^{*}

be the rectified combined matrix on the training sample. Its thin eigendecomposition is

K_{S}^{*} = U_{r} Λ_{r} U_{r}^{⊤},

(22)

where $Λ_{r} = diag (λ_{1}, \dots, λ_{r})$ contains the retained positive matrix eigenvalues and $U_{r} = (u_{1}, \dots, u_{r})$ the corresponding orthonormal eigenvectors. In the computational method we use the associated matrix-coordinate functions

{\hat{ψ}}_{h} (x_{i}) = u_{i h}, 1 \leq i \leq n .

(23)

This convention differs from the Mercer-operator normalization ${\hat{ϕ}}_{h} (x_{i}) \approx \sqrt{n} u_{i h}$, whose eigenvalues are $λ_{h} / n$. Both conventions are equivalent if the scaling is used consistently, but (23) is preferable algorithmically because it reconstructs $K_{S}^{*}$ directly with the matrix eigenvalues:

K_{S}^{*} = \sum_{h = 1}^{r} λ_{h} u_{h} u_{h}^{⊤} .
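The equivalence of the two scaling conventions can be checked numerically. The following is a minimal NumPy sketch, not the C++ implementation used in the experiments: the helper name `thin_eig`, the tolerance, and the toy rank-3 matrix are assumptions introduced only for illustration.

```python
import numpy as np

def thin_eig(K, tol=1e-10):
    """Positive eigenpairs of a symmetric PSD matrix, largest eigenvalue first."""
    K = 0.5 * (K + K.T)                  # guard against round-off asymmetry
    lam, U = np.linalg.eigh(K)           # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    keep = lam > tol                     # retained positive spectrum
    return lam[keep], U[:, keep]

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
K = X @ X.T                              # toy rank-3 PSD matrix standing in for K_S^*
lam, U = thin_eig(K)
n = K.shape[0]

# Matrix convention (23): coordinates u_ih, eigenvalues lam_h.
K_matrix = (U * lam) @ U.T

# Mercer-operator convention: coordinates sqrt(n) u_ih, eigenvalues lam_h / n.
Phi = np.sqrt(n) * U
K_mercer = (Phi * (lam / n)) @ Phi.T
```

Both `K_matrix` and `K_mercer` reproduce `K` up to round-off, which is the sense in which the two conventions agree when the scaling is applied consistently.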

3.2. Extending Empirical Coordinates

The previous step gives only sample coordinates. To evaluate a new object we must learn a map from the available object representation to those coordinates. Let

G_{S} = U_{r} = {[g_{h} (x_{i})]}_{i, h} \in R^{n \times r}, g {(x_{i})}^{⊤} = (u_{i 1}, \dots, u_{i r}) .

For each coordinate h we fit a scalar regression problem

x_{i} ⟼ u_{i h},

or, in relational data, a regression from the available dissimilarities to the training objects. The fitted coordinate functions are denoted ${\hat{g}}_{h} (x)$, and

\hat{g} {(x)}^{⊤} = ({\hat{g}}_{1} (x), \dots, {\hat{g}}_{r} (x)) .

This is the formal step that turns a sample matrix into an inductive object. It also makes clear why the construction is not unique: changing the coordinate model, the retained rank, or the basis used for coordinate regression changes the resulting extension while leaving the same training matrix as the target.

The same step can be described through Newton or SVD bases. Let $v_{1}, \dots, v_{q}$ be a data-dependent basis, with sample matrix $V = [v_{j} (x_{i})] \in R^{n \times q}$. Each coordinate vector $u_{h}$ may be projected onto that basis through

a_{h} = arg min_{a \in R^{q}} {∥V a - u_{h}∥}_{2}^{2} + η {∥ a ∥}_{2}^{2},

(24)

where $η \geq 0$ is a small numerical ridge. The resulting continuation is

{\hat{g}}_{h} (x) = \sum_{m = 1}^{q} a_{m h} v_{m} (x) .

(25)

When V is square and nonsingular and $η = 0$, this is an exact change of coordinates. When a rank cap is imposed or a numerical ridge is added, it is a stable projection of the empirical coordinate onto the retained basis.
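The ridge projection (24) can be sketched in a few lines. The NumPy example below is illustrative only: the function name `ridge_coefficients`, the random basis matrix, and the ridge value are assumptions, and the problem is solved in augmented least-squares form rather than through the normal equations.

```python
import numpy as np

def ridge_coefficients(V, U, eta=0.0):
    """Solve min_a ||V a - u_h||^2 + eta ||a||^2 for every column u_h of U."""
    q = V.shape[1]
    # Augmented system [V; sqrt(eta) I] a = [u_h; 0] has the same minimizer.
    V_aug = np.vstack([V, np.sqrt(eta) * np.eye(q)])
    U_aug = np.vstack([U, np.zeros((q, U.shape[1]))])
    return np.linalg.lstsq(V_aug, U_aug, rcond=None)[0]

rng = np.random.default_rng(1)
n, r = 6, 2
V = rng.normal(size=(n, n))                      # square basis matrix (toy data)
U = np.linalg.qr(rng.normal(size=(n, r)))[0]     # orthonormal stand-in for U_r

A_exact = ridge_coefficients(V, U, eta=0.0)      # exact change of coordinates
A_ridge = ridge_coefficients(V, U, eta=1e-2)     # regularized projection
resid_exact = np.linalg.norm(V @ A_exact - U)
resid_ridge = np.linalg.norm(V @ A_ridge - U)
```

With a square nonsingular `V` and no ridge the sample residual vanishes up to round-off, whereas a positive ridge trades a small residual for better conditioning.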

3.3. Finite-Feature Kernel and Validity

After the coordinates have been extended, the reconstructed kernel is

{\hat{K}}_{r} (x, z) = \sum_{h = 1}^{r} λ_{h} {\hat{g}}_{h} (x) {\hat{g}}_{h} (z) = \hat{g} {(x)}^{⊤} Λ_{r} \hat{g} (z) .

(26)

More generally, for a basis representation $v (x)$ we use

{\hat{K}}_{r} (x, z) = v {(x)}^{⊤} C v (z), C ⪰ 0 .

(27)

The coefficient matrix is diagonal only in coordinates aligned with the retained eigenvectors. In a pivoted Newton basis, for example, C is generally dense but still positive semidefinite.
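A small numerical illustration of this remark is given below. It is a sketch under stated assumptions: the basis matrix `V` is taken as the unpivoted Cholesky factor of a different PSD matrix `K0`, used here only as a stand-in for a Newton-type basis, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.normal(size=(n, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

K_star = np.exp(-sq / 2.0)                        # stand-in for the combined matrix
K0 = np.exp(-np.sqrt(sq)) + 1e-8 * np.eye(n)      # different PSD matrix supplying the basis (assumption)

lam, U = np.linalg.eigh(K_star)
keep = lam > 1e-10
lam, U = lam[keep], U[:, keep]

V = np.linalg.cholesky(K0)                        # lower-triangular basis sample matrix
A = np.linalg.solve(V, U)                         # change of coordinates: V A = U
C = A @ np.diag(lam) @ A.T                        # coefficient matrix in the new basis

off_diag_mass = np.abs(C - np.diag(np.diag(C))).sum()
min_eig_C = np.linalg.eigvalsh(0.5 * (C + C.T)).min()
recon_err = np.linalg.norm(V @ C @ V.T - (U * lam) @ U.T)
```

In this toy setting `C` has non-negligible off-diagonal mass, its smallest eigenvalue is non-negative up to round-off, and `V C V^T` reproduces the retained part of the matrix, in line with (27).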

The following proposition is the basic mathematical guarantee used throughout the paper.

Proposition 1 (PSD extension, interpolation, and coordinate error).

Let $g : X \to R^{r}$ be any feature map and let $C ⪰ 0$. Then $\hat{K} (x, z) = g {(x)}^{⊤} C g (z)$ is a positive-semidefinite finite-feature kernel. Let G be the sample matrix with rows $g {(x_{i})}^{⊤}$. If $G C G^{⊤} = K_{S}^{*}$, then $\hat{K} (S, S) = K_{S}^{*}$. If the fitted sample coordinates are $\hat{G} = G + E$, then

{∥ \hat{G} C {\hat{G}}^{⊤} - G C G^{⊤} ∥}_{F} \leq 2 {∥ G ∥}_{F} {∥ C ∥}_{2} {∥ E ∥}_{F} + {∥ C ∥}_{2} {∥ E ∥}_{F}^{2} .

Proof.

For any finite collection $x_{1}, \dots, x_{N}$ and coefficients $c_{1}, \dots, c_{N}$,

\sum_{i, j} c_{i} c_{j} \hat{K} (x_{i}, x_{j}) = {(\sum_{i} c_{i} g (x_{i}))}^{⊤} C (\sum_{j} c_{j} g (x_{j})) \geq 0,

because $C ⪰ 0$. The interpolation statement follows immediately from $G C G^{⊤} = K_{S}^{*}$. For the error bound, expand

(G + E) C {(G + E)}^{⊤} - G C G^{⊤} = E C G^{⊤} + G C E^{⊤} + E C E^{⊤}

and apply the triangle inequality together with the mixed bound ${∥ A B ∥}_{F} \leq {∥ A ∥}_{F} {∥ B ∥}_{2}$ and the inequality ${∥ G ∥}_{2} \leq {∥ G ∥}_{F}$. □
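The two quantitative claims of Proposition 1 can be checked on random data. The following NumPy sketch is only a sanity check under assumed toy dimensions and noise level; it does not replace the proof.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 20, 4
G = rng.normal(size=(n, r))                   # exact sample coordinates
M = rng.normal(size=(r, r))
C = M @ M.T                                   # PSD coefficient matrix
E = 1e-2 * rng.normal(size=(n, r))            # coordinate-fitting error
G_hat = G + E

# Left- and right-hand sides of the coordinate-error bound.
lhs = np.linalg.norm(G_hat @ C @ G_hat.T - G @ C @ G.T, "fro")
c2 = np.linalg.norm(C, 2)                     # spectral norm of C
gF = np.linalg.norm(G, "fro")
eF = np.linalg.norm(E, "fro")
rhs = 2.0 * gF * c2 * eF + c2 * eF ** 2

# PSD check: Gram matrix of g(x)^T C g(z) on arbitrary new feature vectors.
F = rng.normal(size=(7, r))
min_eig = np.linalg.eigvalsh(F @ C @ F.T).min()
```

Here `lhs` does not exceed `rhs`, and `min_eig` is non-negative up to round-off, whatever feature vectors are placed in `F`.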

Equation (27) also clarifies the interpolation issue. If the full effective rank is retained and the coordinate model is exact on the sample, then ${\hat{K}}_{r} (S, S) = K_{S}^{*}$. If the spectrum is truncated, the PSD part is clipped, or the coordinate regression is imperfect, then agreement is exact only for the retained fitted coordinates. For this reason the article reports predictive metrics together with PSD and reconstruction diagnostics rather than claiming a unique exact reconstruction.

3.4. Out-of-Sample Basis Evaluation

To evaluate ${\hat{K}}_{r}$ on new points it remains to compute the extended coordinate vector $\hat{g} (z)$, or equivalently $\hat{v} (z)$ when a non-spectral basis is used. In our implementation this is done by regressing each coordinate on the representation available for the objects and then predicting the coordinate values on the new points [18]. For vector or tabular data the regressors use the observed variables. For relational data they may use the dissimilarities from the new object to the training objects. Let $v_{m} (x_{i})$ be the m-th basis coordinate on the sample. We fit a scalar regression model

x_{i} ⟼ v_{m} (x_{i}),

for each $m = 1, \dots, q$, and obtain predicted values ${\hat{v}}_{m} (z)$ on the new point. Stacking them gives

\hat{v} {(z)}^{⊤} = ({\hat{v}}_{1} (z), \dots, {\hat{v}}_{q} (z)) .

The train–test block required by a precomputed-kernel classifier is then assembled as

{\hat{K}}_{r} (S, Z) = V C {\hat{V}}_{Z}^{⊤},

(28)

where ${\hat{V}}_{Z} = {[{\hat{v}}_{j} (z_{ℓ})]}_{ℓ, j}$. In our implementation this regression step uses kernelized regression models, but the theory does not depend on the particular regressor: any method that estimates the coordinates stably can be used.

This basis-regression view is what differentiates the present method from both Nyström and FK. Nyström extends eigenfunctions directly through the original kernel sections. FK reconstructs the target eigensystem through the eigensystems of the base kernels. In contrast, our method first changes coordinates to a stable basis attached to the combined matrix and only then performs the out-of-sample continuation. In the language of the kernel approximation and fusion literature, the procedure combines two ingredients that are often treated separately [6,13,14]: the interpolation-oriented perspective of alternate bases and the kernel-fusion perspective of matrix combinations. The whole point is that these are not separate technicalities but consecutive parts of the same pipeline.

3.5. Algorithmic Summary

The whole procedure can be summarized as follows.

1. Compute the base kernel matrices on the training sample and build the combined matrix $K_{S}^{*}$ with the chosen combination rule.
2. Symmetrize and project $K_{S}^{*}$ to the PSD cone if needed; record the relative PSD perturbation.
3. Compute the eigendecomposition (22) and retain the desired positive-rank subspace.
4. Define empirical matrix coordinates by ${\hat{ψ}}_{h} (x_{i}) = u_{i h}$, avoiding the Mercer scaling ambiguity in the practical algorithm.
5. Optionally express those coordinates in a Newton or SVD basis by solving (24).
6. Learn a coordinate map from the available object representation to the retained coordinates.
7. Form the finite-feature kernel through (26) and (27) and assemble the train–test block (28).
8. Classify the test points using a precomputed-kernel SVM [3,33].

This reorganizes the sample-to-function algorithm from earlier basis-combination work [18] in an explicitly spectral way, so that each stage has a direct interpretation in Mercer terms.

3.6. Complexity, Conditioning, and Practical Remarks

The dominant costs are the eigendecomposition or the Cholesky/SVD factorization of the combined matrix and the set of regressions used to predict the basis coordinates. If $r ≪ n$, storage is $O (n r)$ and out-of-sample evaluation is reduced to matrix products involving V, C, and the predicted basis matrix on the test set. Newton and SVD therefore provide two practically complementary regimes: Newton is attractive when one wants rank-adaptive, pivoted, or triangular structure; SVD is attractive when one wants the most direct spectral route and a basis already aligned with the dominant sample eigendirections.

This cost profile should be compared with alternative out-of-sample devices with some care. Nyström-type methods are typically cheaper when a small landmark set or a known kernel section $K (x, x_{i})$ is available, but for a matrix-only supervised combination the section $K^{*} (x, x_{i})$ is exactly the missing object. Fusion-kernel constructions can also continue a combined matrix, but they do so through the eigenspaces of the base kernels rather than directly through the eigensystem of $K_{S}^{*}$. The present implementation therefore benchmarks the matrix-to-function step primarily through leakage-free train–test gaps and, where an analytic oracle exists, relative Frobenius and functional-MSE errors. A full large-scale runtime benchmark against all low-rank extension strategies is outside the present experimental scope.

In the experiments, the Newton and SVD reconstructions were consistently close to each other, which indicates that the main limitation often lies in the quality of the combined matrix itself rather than in the basis used for continuation. This is why the article keeps Fusion Kernel only as a secondary computational baseline: for the direct basis-based route, Newton and SVD are the two methods that best capture the stable matrix-to-function transition.

4. Materials and Methods

4.1. Aim of the Experiments

The experiments are intended as a validation of the matrix-to-function step, not as a search for a universally optimal classifier. The question is the following: after a finite kernel matrix has been produced on a training sample, can it be converted into a kernel function that is evaluable on new points, and does the classifier built from that function behave on held-out points as it behaves on the points used to build the extension?

For this reason the predictive metrics reported below should be read as functional diagnostics. We compare the error measured on the training matrix by inner cross-validation with the error obtained after extending the same matrix to the test points. A small discrepancy between these two quantities indicates that the conversion from matrix to function has preserved the relevant geometry of the classifier. We deliberately do not tune the kernel dictionary to obtain optimal classification results. The kernels are the same broad dictionary used in the earlier kernel-combination experiments: linear, cosine, polynomial, multiscale Gaussian, Laplacian, rational-quadratic, class-scale, local-scaling, prototype, and supervised weighted variants. This makes the test more honest: the goal is to show that the reconstruction works even for a historical, non-specialized collection of kernels.

We distinguish three experimental cases.

Case A: analytic combinations. Direct means and alignment-weighted sums of evaluable base kernels already have a train–test formula. They are used as stability references. If the matrix-to-function implementation is sound, these combinations should remain close to the behavior observed on the training matrix.
Case B: matrix-only supervised rules. Pick-out/max–min and percentile rules are constructed only on the training matrix. They are the main stress test for the reconstruction layer, because no test labels are available when the train–test block is generated.
Case C: relational data. Chickenpieces starts from dissimilarity matrices between shape objects. Here the usefulness of the framework is most visible: the input is not a privileged coordinate representation, but a family of pairwise shape comparisons.

The protocol follows a strict split discipline. Training and test sets are formed before any kernel parameter, scaling constant, alignment score, kernel selection, or combination rule is estimated, as required for unbiased model assessment in supervised learning [34,35]. For label-aware rules, labels are used only on the training set. Test labels are never used to construct test kernels. This point is essential for supervised matrix combinations such as max–min/pick-out.

The computational workflow is implemented as ISO C++17 source code, with accompanying preparation scripts written for R version 4.2.0 or later and Python version 3.10 or later.

4.2. Data Sets

The main vector-data evidence reported in the present tables uses four complete 30-repetition strict runs: Synthetic, Breast Cancer, Ionosphere, and Telco Customer Churn. Telco is the heterogeneous tabular benchmark, while Chickenpieces is reported separately as the relational case in which kernels are built from dissimilarity matrices.

The data sets were selected to cover complementary regimes rather than to form a large-scale leaderboard. Synthetic provides a controlled nonlinear problem in which an oracle train–test block can be computed for smooth analytic rules. Breast Cancer gives a small ordinal biomedical benchmark; Ionosphere gives a continuous radar benchmark; Telco gives a heterogeneous tabular problem with binary, categorical, account, and billing variables; and Chickenpieces gives a pure relational setting in which the objects are available through multiple shape dissimilarities. The resulting collection tests whether the proposed matrix-to-function step behaves consistently across vector, heterogeneous business-table, and dissimilarity-only settings. It does not exhaust the range of possible applications. Larger or more complex data sets would require the same mathematical construction combined with scalable eigensolvers, landmark or randomized low-rank factorizations, and faster coordinate-regression or basis evaluation schemes, as discussed in the complexity remarks and conclusions.

Synthetic. A balanced two-class banana-style simulation inherited from the preliminary experiments [18]. A latent variable $u \sim U [- 1, 1]$ and Gaussian noise $e \sim N (0, 0.01)$ are generated, and the two classes are defined by $(x_{1}, x_{2}) = (u + 1, u^{2} + e)$ and $(x_{1}, x_{2}) = (u + \frac{7}{5}, - u^{2} + 1 + e)$ . This produces two curved clouds that are not linearly separable and provides a controlled setting in which an oracle train–test kernel block can be computed for diagnostic purposes.
Breast Cancer Wisconsin (Original). A compact biomedical benchmark retained for continuity with the preliminary experiments [36]. The original database contains 699 fine-needle aspirate cases described by nine ordinal cytological attributes measured on a 1–10 scale. After removing the 16 cases with missing BareNuclei, the usable sample contains 683 observations, with benign and malignant diagnoses as the target classes.
Ionosphere. A radar-return classification benchmark from the UCI repository [37]. The data were collected by a phased-array system in Goose Bay, Labrador, and they contain 351 observations with 34 continuous predictors derived from the complex autocorrelation values of 17 pulse numbers. The binary label distinguishes “good” returns, which show ionospheric structure, from “bad” returns.
Telco Customer Churn. A heterogeneous customer-level table distributed as an IBM/Kaggle churn benchmark [38]. The original file contains 7043 customers and 21 columns, including the binary response Churn. Its predictors combine demographic information, subscribed telephone and internet services, contract type, payment method, tenure, and billing amounts. Telco is therefore a natural tabular fusion benchmark: different kernels can be attached to different semantic blocks of variables, rather than merely to different bandwidths on a single homogeneous feature space.
Chickenpieces. A purely relational benchmark consisting of 446 silhouette objects from five chicken-part classes (wing, back, drumstick, thigh-and-back, and breast) [12,39]. The PRDisData release provides 44 dissimilarity matrices obtained from 11 contour normalizations and four edit-cost settings. Each silhouette is encoded through a contour-string representation. The raw objects are therefore not treated as vectors of coordinates; they are compared through several shape dissimilarities, which are subsequently converted into kernel similarities for classification.

One of the 44 Chickenpieces matrices can be obtained as follows. Fix one contour normalization r and one edit-cost setting c. Let $s_{i}^{(r)}$ and $s_{j}^{(r)}$ be the resulting contour strings for silhouettes i and j. Since the starting point on a closed contour is arbitrary, the comparison minimizes the edit cost over all cyclic shifts of the second string:

D_{i j}^{(r, c)} = \min_{τ \in \mathcal{C}} \mathrm{Edit}_{c} (s_{i}^{(r)}, τ (s_{j}^{(r)})) .

Here, $\mathcal{C}$ is the set of cyclic shifts. In the rotation-invariant cyclic edit distance of Bunke and Bühler [40], substituting contour directions with angles $α$ and $β$ is weighted by $| α - β |$, whereas insertions and deletions receive the fixed penalty associated with the chosen cost setting c. Computing this value for all pairs gives one $446 \times 446$ dissimilarity matrix $D^{(r, c)}$. The 44 matrices arise from the $11 \times 4$ choices of $(r, c)$; the preprocessing step below turns each of them into RBF and Laplacian similarity kernels.
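The minimization over cyclic shifts can be illustrated with a brute-force sketch in plain Python. This is not the algorithm of Bunke and Bühler [40] nor the code behind the PRDisData matrices: the function names, the toy angle sequences, the insertion/deletion penalty of 45, and the substitution cost without angular wrap-around are all assumptions made for illustration.

```python
def edit_distance(a, b, indel_cost, sub_cost):
    """Weighted edit distance between sequences a and b (dynamic programming)."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * indel_cost]
        for j in range(1, len(b) + 1):
            cur.append(min(
                prev[j] + indel_cost,                        # delete a[i-1]
                cur[j - 1] + indel_cost,                     # insert b[j-1]
                prev[j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitute
            ))
        prev = cur
    return prev[-1]

def cyclic_edit_distance(a, b, indel_cost, sub_cost):
    """Minimum edit distance over all cyclic shifts of b (brute force)."""
    if not b:
        return edit_distance(a, b, indel_cost, sub_cost)
    return min(
        edit_distance(a, b[k:] + b[:k], indel_cost, sub_cost)
        for k in range(len(b))
    )

def angle_cost(alpha, beta):
    """Substitution cost |alpha - beta| (no wrap-around; assumption)."""
    return abs(alpha - beta)

s_i = [0.0, 90.0, 180.0, 270.0]
s_j = [180.0, 270.0, 0.0, 90.0]          # same contour, different starting point
d_plain = edit_distance(s_i, s_j, indel_cost=45.0, sub_cost=angle_cost)
d_cyclic = cyclic_edit_distance(s_i, s_j, indel_cost=45.0, sub_cost=angle_cost)
```

For these two toy strings the plain edit distance is positive, while the cyclic version is zero because one shift of the second string reproduces the first. The brute-force loop costs one dynamic programme per shift; faster cyclic matching schemes exist but are not sketched here.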

Telco requires a separate comment because it is not a purely numerical benchmark. The customer identifier is removed from the model, and the snapshot shown in Table 1 is only illustrative: the full source file contains additional service indicators, including online security, online backup, device protection, technical support, streaming TV, and streaming movies. The displayed rows combine binary variables such as SeniorCitizen, categorical service variables such as InternetService, account variables such as Contract and PaymentMethod, continuous billing variables such as MonthlyCharges and TotalCharges, and the target variable Churn. This mixture makes Telco a natural case for kernel fusion: blocks of variables may encode complementary notions of customer similarity.

For visual context, Figure 3 shows six distinct binary silhouettes from the Chickenpieces benchmark, without contour extraction or contour-string processing. These examples were selected from the public example panel of the data set in [41], avoiding near-duplicate visual shapes. The panel is intended only to indicate the type of binary shape objects used in this benchmark.

The 30-repetition vector experiments use train–test sizes 350/150 for Synthetic, 478/205 for Breast Cancer, 245/106 for Ionosphere, and 1400/600 for Telco. Each Telco repetition is based on a stratified subsample of 2000 customers from the full 7043-customer table. Chickenpieces uses the full set of 446 objects and five classes; the repeated splits use a 70/30 train–test partition.

4.3. Preprocessing

For the vector data sets, numeric variables are standardized using training statistics only. Categorical variables are encoded using a training-based one-hot design. In Telco, the customer identifier is removed, categorical fields are expanded, and the continuous variables tenure, MonthlyCharges, and TotalCharges are standardized on the training split.

For Chickenpieces, each dissimilarity matrix is symmetrized when needed, its diagonal is set to zero, and it is re-scaled by its positive off-diagonal median. Two distance-to-kernel transformations are then used:

K_{RBF} (i, j) = exp (- \frac{D_{i j}^{2}}{2 σ^{2}}), K_{LAP} (i, j) = exp (- \frac{D_{i j}}{σ}),

with $σ$ estimated from the corresponding training distances. Each kernel is normalized before entering the comparison.
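The preprocessing and the two transformations can be sketched as follows. This NumPy example is a sketch under assumptions: the text does not specify how $σ$ is estimated, so the median of the training distances is used here as a hypothetical choice, and the function names and toy data are introduced only for illustration.

```python
import numpy as np

def preprocess_dissimilarity(D):
    """Symmetrize, zero the diagonal, and rescale by the positive off-diagonal median."""
    D = 0.5 * (D + D.T)                          # new array, input is left untouched
    np.fill_diagonal(D, 0.0)
    off = D[~np.eye(D.shape[0], dtype=bool)]
    return D / np.median(off[off > 0])

def distance_kernels(D, sigma):
    """RBF and Laplacian similarity kernels built from a dissimilarity matrix."""
    K_rbf = np.exp(-D ** 2 / (2.0 * sigma ** 2))
    K_lap = np.exp(-D / sigma)
    return K_rbf, K_lap

rng = np.random.default_rng(4)
P = rng.normal(size=(10, 2))                     # toy objects, only used to create distances
D_raw = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
D = preprocess_dissimilarity(D_raw)

train = np.arange(7)                             # training indices of a toy split
D_tr = D[np.ix_(train, train)]
sigma = np.median(D_tr[np.triu_indices(len(train), k=1)])   # assumed bandwidth rule
K_rbf, K_lap = distance_kernels(D_tr, sigma)
```

Only training distances enter the bandwidth, so the same `sigma` can be reused for the distances from a test object to the training objects without touching test labels.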

4.4. Kernel Dictionary for Vector Data

The vector experiments use the same deliberately broad dictionary in every split:

D = {LIN, POLY 2, COS, RBF 01, \dots, RBF 10, LAP, RQ, INTRA, INTER, WRBF, LOCAL, PROTO} .

Table 2 gives the definitions used before trace normalization. All quantities that depend on the data—bandwidths, nearest-neighbor scales, class scales, prototype locations, and variable weights—are estimated on the training split only.

The RBF kernels use empirical distance quantiles as bandwidths, the Laplacian kernel uses the median $L^{1}$ distance, and the rational-quadratic kernel uses the median Euclidean scale. The class-scale kernels INTRA and INTER estimate their bandwidths from within-class and between-class training distances. LOCAL is a self-tuning kernel with a nearest-neighbor scale [43]. PROTO is a supervised prototype kernel estimated from training labels. All kernels are trace-normalized before combination:

{\tilde{K}}_{ℓ} = \frac{n K_{ℓ}}{tr (K_{ℓ})} .

Automatic Selection Protocol

The suffix auto has a fixed meaning throughout the tables. Within each outer training split, kernels are first ranked by centered alignment with the training-label kernel, computed on the training block only. For a candidate number $m_{0}$, the retained index set is

I_{m_{0}} = \{ \text{the } m_{0} \text{ highest-ranked training kernels} \} .

The pair $(m_{0}, C_{SVM})$ is then selected by stratified inner cross-validation on the training split. The vector-data runs use $m_{0} \in \{3, 5, 7, 10\}$ and $C_{SVM} \in \{0.25, 0.5, 1, 2, 4, 8, 16\}$; the Chickenpieces runs use the same $C_{SVM}$ grid and $m_{0} \in \{3, 5, 7, 10, 15\}$. For mean_auto, the selected kernels are averaged with uniform weights. For alignment_auto, they are combined with non-negative weights proportional to their positive centered alignments. For matrix-only rules, the same selected kernels are first combined on the training block, the resulting matrix is projected onto the PSD cone when necessary, and the train–test block is obtained by the spectral/KRR continuation described below. The label “best individual” is a descriptive reference chosen from the completed single-kernel summaries; inside each outer split only its SVM penalty is tuned by inner CV. The label “best matrix-only” denotes the predeclared matrix-only rule with the lowest mean inner-CV error for that data set; it is not selected from held-out test performance.
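The ranking step and the two weighting schemes can be sketched with NumPy. The example assumes binary labels coded as ±1, so that the ideal label kernel is $y y^{⊤}$; the function names and the two toy kernels are hypothetical, and the inner cross-validation over $(m_{0}, C_{SVM})$ is not reproduced.

```python
import numpy as np

def trace_normalize(K):
    """Rescale K so that its trace equals n."""
    return K.shape[0] * K / np.trace(K)

def centered_alignment(K, L):
    """Centered kernel alignment between two n x n matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    denom = np.linalg.norm(Kc, "fro") * np.linalg.norm(Lc, "fro")
    return float(np.sum(Kc * Lc) / denom) if denom > 0 else 0.0

def rank_kernels(kernels, y, m0):
    """Indices of the m0 kernels best aligned with the training-label kernel."""
    L = np.outer(y, y).astype(float)             # ideal label kernel for labels in {-1, +1}
    scores = np.array([centered_alignment(K, L) for K in kernels])
    return np.argsort(-scores)[:m0], scores

def alignment_weights(scores):
    """Non-negative weights proportional to the positive alignments."""
    pos = np.clip(scores, 0.0, None)
    total = pos.sum()
    return pos / total if total > 0 else np.full(scores.shape, 1.0 / scores.size)

rng = np.random.default_rng(5)
y = np.array([1, 1, 1, -1, -1, -1])
Z = rng.normal(size=(6, 2))
kernels = [trace_normalize(np.outer(y, y) + np.eye(6)),   # label-informative toy kernel
           trace_normalize(Z @ Z.T)]                       # noise kernel
selected, scores = rank_kernels(kernels, y, m0=1)
w = alignment_weights(scores)
K_mean = sum(kernels[i] for i in selected) / len(selected)   # mean over retained kernels
```

All quantities are computed from the training block and the training labels only, which is the property that keeps the selection leakage-free.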

4.5. Combination Rules

We compare three families of combinations.

1. Direct convex combinations. We use a simple mean and an alignment-weighted mean over the selected kernels:

$K^{mean} = \frac{1}{m} \sum_{ℓ \in I_{m}} {\tilde{K}}_{ℓ}, \quad K^{align} = \sum_{ℓ \in I_{m}} w_{ℓ} {\tilde{K}}_{ℓ} .$

The weights $w_{ℓ}$ are proportional to the positive centered alignment between ${\tilde{K}}_{ℓ}$ and the ideal training label kernel [10,11].
2. Label-aware matrix combinations. The pick-out/max–min rule is defined on training pairs by

$K_{i j}^{MM} = \begin{cases} \max_{ℓ \in I_{m}} {\tilde{K}}_{ℓ, i j}, & y_{i} = y_{j}, \\ \min_{ℓ \in I_{m}} {\tilde{K}}_{ℓ, i j}, & y_{i} \neq y_{j} . \end{cases}$

Percentile variants replace the maximum and/or minimum by upper and lower empirical percentiles [6]. These rules are supervised and matrix-defined; their test blocks are obtained only through the out-of-sample reconstruction.
3. Hybrid score selection. As an additional diagnostic, we tested a score

$score (K_{ℓ}) = α \, rankCV (K_{ℓ}) + (1 - α) \, rankAlignment (K_{ℓ}),$

but it did not improve the direct mean combinations and is therefore not used as a principal method.

When a label-aware combination is indefinite, it is symmetrized and projected onto the positive semidefinite cone by clipping negative eigenvalues [7,30].
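The max–min rule and the eigenvalue-clipping projection can be sketched as follows. This is a minimal NumPy illustration with hypothetical function names and two toy Gaussian kernels; the percentile variants are not sketched because their exact definition is not reproduced in this section.

```python
import numpy as np

def max_min_rule(kernels, y):
    """Entrywise maximum for same-label pairs, entrywise minimum otherwise."""
    stack = np.stack(kernels)                    # shape (m, n, n)
    same = y[:, None] == y[None, :]
    return np.where(same, stack.max(axis=0), stack.min(axis=0))

def project_psd(K):
    """Symmetrize, clip negative eigenvalues, and report the relative change."""
    K = 0.5 * (K + K.T)
    lam, U = np.linalg.eigh(K)
    K_psd = (U * np.clip(lam, 0.0, None)) @ U.T
    rel_change = np.linalg.norm(K_psd - K, "fro") / np.linalg.norm(K, "fro")
    return K_psd, rel_change

rng = np.random.default_rng(6)
X = rng.normal(size=(12, 2))
y = np.array([0] * 6 + [1] * 6)                  # training labels only
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
kernels = [np.exp(-sq / (2.0 * s ** 2)) for s in (0.5, 2.0)]

K_mm = max_min_rule(kernels, y)
K_psd, rel_change = project_psd(K_mm)
```

The returned `rel_change` is one way to record the relative PSD perturbation mentioned in the algorithmic summary. The rule needs the labels of both objects in a pair, which is why it cannot be evaluated directly on a train–test pair.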

4.6. Out-of-Sample Evaluation

Direct convex combinations have explicit train–test blocks because the base kernels can be evaluated between test and training points. In contrast, pick-out and percentile rules define a matrix only on the training set. For these rules, we use the eigenfunction-based reconstruction described in the theoretical sections. The training matrix is decomposed spectrally, its positive part is retained, empirical eigenvectors are interpreted as traces of eigenfunctions, and the resulting functions are extended through the chosen basis. The C++ runs reported here use a spectral/KRR continuation for these matrix-only rules. This is stricter than evaluating an oracle matrix, because no test label is used when constructing the train–test block. Concretely, after PSD projection we retain positive eigenvalues larger than $10^{- 10}$ until the retained positive spectrum explains $99 \%$ of the positive trace. If $K_{0}$ denotes the direct mean of the selected base kernels, the held-out coordinate matrix is obtained by kernel ridge continuation with ridge $η = 10^{- 6}$:

B = {(K_{0, S} + η I)}^{- 1} U_{r}, {\hat{U}}_{Z} = K_{0, Z S} B,

and the reconstructed train–test block is

{\hat{K}}_{Z S} = {\hat{U}}_{Z} Λ_{r} U_{r}^{⊤} .

The percentile variants use the upper and lower empirical percentiles $0.75$ and $0.25$, respectively. All SVM evaluations use precomputed-kernel C-SVMs with C selected by the same inner-CV grid; Chickenpieces, the only multiclass benchmark, is handled by a one-versus-one multiclass reduction.
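The spectral/KRR continuation of this subsection can be sketched in NumPy. The code below is not the C++ implementation: the function names and the one-dimensional toy data are assumptions, and the example uses the oracle setting in which the combined matrix coincides with $K_{0}$ on the training sample, so that the reconstructed block can be compared with the true one.

```python
import numpy as np

def retain_spectrum(K_star, eig_tol=1e-10, trace_frac=0.99):
    """Keep leading positive eigenpairs until trace_frac of the positive trace is explained."""
    lam, U = np.linalg.eigh(0.5 * (K_star + K_star.T))
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    pos = lam > eig_tol
    lam, U = lam[pos], U[:, pos]
    cum = np.cumsum(lam) / lam.sum()
    r = min(int(np.searchsorted(cum, trace_frac)) + 1, lam.size)
    return lam[:r], U[:, :r]

def krr_continuation(K0_S, K0_ZS, K_star, eta=1e-6):
    """Train-test block: B = (K0_S + eta I)^{-1} U_r, U_Z = K0_ZS B, K_ZS = U_Z Lambda_r U_r^T."""
    lam, U = retain_spectrum(K_star)
    B = np.linalg.solve(K0_S + eta * np.eye(K0_S.shape[0]), U)
    U_Z = K0_ZS @ B                               # predicted held-out coordinates
    return (U_Z * lam) @ U.T                      # shape (n_test, n_train)

def rbf(a, b, s=1.0):
    """Gaussian kernel between two 1-D point sets."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * s ** 2))

x_tr = np.linspace(-2.0, 2.0, 30)                 # toy training inputs
x_te = 0.5 * (x_tr[:-1] + x_tr[1:])               # toy test inputs between them
K0_S, K0_ZS = rbf(x_tr, x_tr), rbf(x_te, x_tr)

# Oracle check: the target matrix is the evaluable kernel itself.
K_hat_ZS = krr_continuation(K0_S, K0_ZS, K_star=K0_S)
rel_err = np.linalg.norm(K_hat_ZS - K0_ZS) / np.linalg.norm(K0_ZS)
```

In this smooth oracle case the relative Frobenius error `rel_err` is small, which mirrors the role of the Synthetic oracle diagnostic. For a label-aware `K_star` no such reference block exists, and only the train–test gap can be monitored.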

The controlled Synthetic experiment also includes an oracle diagnostic. For rules such as the sum or the mean of base kernels, the true train–test block is available because the analytic combination is known. We can therefore compare the reconstructed block against the oracle block directly. This diagnostic is not used for model selection; it is only a check that the matrix-to-function conversion recovers the off-sample behavior when the training matrix comes from a smooth evaluable combination.

5. Results

5.1. Vector Data Sets: 30-Repetition Validation

Table 3 summarizes the main 30-repetition vector-data experiments. For each data set we report the best individual kernel, the two direct combinations, and the supervised matrix-only rule selected by the inner-CV criterion described above after strict out-of-sample reconstruction. The table is not intended to claim that the combined kernels are optimized classifiers. It is intended to show that a matrix built from the historical kernel dictionary can be converted into an evaluable kernel whose test behavior remains close to the behavior observed on the training matrix.

The results support the intended interpretation. On Synthetic, the simple mean slightly improves on the best individual RBF scale, showing that the direct combination can recover a useful multiscale geometry. On Breast Cancer, the best individual kernel is the cosine kernel, and both direct combinations remain within about $0.002$ error units of it. On Ionosphere, the best individual kernel is the local-scaling kernel; the matrix-only percentile rule and the direct mean are both close to it. On Telco, the best fixed individual kernel by classification error is the quadratic polynomial kernel, and the direct mean and alignment-weighted combinations remain within about $0.002$ error units of it. The matrix-only percentile rule is considerably worse in Telco, which reinforces the distinction between stable analytic combinations and abrupt supervised matrix transformations. These outcomes are sufficient for the purpose of the article: the base kernels were not selected because they are special optimal classifiers, yet their matrix combinations can still be converted into usable out-of-sample kernels.

The large variability of percentile_inout_auto on the Synthetic data deserves a separate interpretation. This rule is not an analytic kernel combination: it is an entrywise, label-aware matrix transformation based on within-class and between-class empirical percentiles. As a result, small changes in the outer training split can change the selected kernels, the percentile thresholds, and the label-dependent entries of the combined matrix. The spectral/KRR continuation then has to extrapolate coordinate traces of an abrupt, sample-dependent geometry, while test labels are not available. This explains why the standard deviation of this method is much larger than that of mean_auto and alignment_auto. The result should therefore be read as a stability diagnostic for abrupt supervised matrix-only rules, rather than as a failure of direct kernel combination.

The fact that the best individual kernel changes across data sets is also expected. The Synthetic data favor a Gaussian scale adapted to a smooth curved geometry; Breast Cancer favors an angular/cosine similarity on ordinal biomedical attributes; and Telco favors a low-degree polynomial interaction on a heterogeneous tabular encoding. Thus, kernel choice does affect classification performance. However, the direct combinations remain close to the best individual choices, suggesting that stable combinations can reduce the need to commit to one manually selected kernel while preserving a deployable out-of-sample formula.

5.2. Telco Customer Churn

The Telco run consists of 30 independent stratified repetitions with 1400 training and 600 test customers per repetition. Each split uses the same 20-kernel vector dictionary and the same strict inner-CV protocol as the other vector benchmarks. Table 4 reports the most informative single-kernel rows. The best fixed individual kernel by mean classification error is POLY2, with error $0.2045 \pm 0.0154$. The best balanced accuracy among the fixed individual kernels is obtained by LIN, with balanced accuracy $0.7066 \pm 0.0222$, although its mean error is slightly larger. The prototype kernel has the largest alignment with the training label kernel, but it does not produce the best held-out classifier; this is a useful reminder that alignment is a screening statistic rather than a complete substitute for validation. The supervised weighted RBF is a clear failure mode on this encoding, with balanced accuracy essentially at chance.

The discrepancy between alignment and held-out accuracy is expected on this heterogeneous table. Centered alignment measures agreement with the ideal label kernel on the training block; it does not measure the SVM margin, the stability of local neighborhoods, or how well a similarity can generalize to unseen customers. The prototype kernel obtains the largest alignment because it summarizes customers through class-prototype similarities estimated on the training data, but this compression can remove within-class heterogeneity that is useful for prediction. Numerically, PROTO has the largest alignment ($0.1672$) but a worse error ($0.2150 \pm 0.0143$) than POLY2 ($0.2045 \pm 0.0154$). The WRBF row is even more informative: its negative alignment ($- 0.1408$), near-chance balanced accuracy ($0.5009 \pm 0.0036$), and nearly degenerate support-vector count indicate that the supervised diagonal RBF metric is poorly matched to the mixed one-hot and standardized Telco encoding. The automatic combination strategy is therefore not based on alignment alone: alignment is used only to rank candidate kernels, while the number of retained kernels and the SVM penalty are selected by inner cross-validation on the training split.

Table 5 gives the corresponding combination results. The two direct combinations are stable and almost indistinguishable from each other: mean_auto gives error $0.2061 \pm 0.0154$, and alignment_auto gives $0.2063 \pm 0.0158$. Their inner-CV errors are about $0.202$, so the train–test discrepancy is small. In contrast, all matrix-only supervised rules are substantially worse after spectral/KRR continuation. The best of them is percentile_inout_auto, but its mean error is $0.3069 \pm 0.0197$ and its balanced accuracy is $0.6321 \pm 0.0251$. Thus, Telco supports the same conservative conclusion as the other experiments: smooth averages are deployable and stable, whereas label-aware matrix-only operations produce a harder out-of-sample extension problem.

The Telco gap is also linked to the heterogeneous tabular nature of the data. The predictors combine demographic variables, categorical service indicators, contract and payment information, tenure, and billing amounts. Direct convex combinations preserve the smooth train–test structure of the base kernels and the block-level similarities induced by these variables. By contrast, matrix-only supervised rules impose an additional within-class/between-class deformation on the training matrix. In a heterogeneous table, this deformation can overwrite useful local similarities and create a coarse label-driven geometry that is not well aligned with the original covariate structure. The diagnostic in Table 6 shows that the train–test gap is moderate but the absolute error is already high, indicating that the main limitation in Telco is the induced supervised training geometry itself, not only the continuation step.

5.3. Training-Matrix Behavior Versus Out-of-Sample Behavior

Table 6 gives the most direct diagnostic for the matrix-to-function claim. The inner-CV column is measured using only the training matrix, that is, the points on which the matrix-to-function conversion is built. The test column is measured after extending the same construction to new points. The absolute gap is small for the direct mean on all four vector data sets. For the best matrix-only rule it is also small on Breast and Ionosphere, moderate on Telco, and large on Synthetic. Telco is especially informative because the best matrix-only rule has a moderate train–test gap but poor classification performance, so its limitation is the learned supervised matrix itself rather than only the continuation step.

This table is the main experimental evidence for the functionality of the extension. The relevant comparison is not whether the test error is the smallest possible among all classifiers, but whether the classifier obtained after extending the matrix behaves like the classifier assessed on the matrix used to build that extension. The direct mean passes this diagnostic robustly, including on Telco. The matrix-only percentile rules pass it most clearly on Breast and Ionosphere. On Telco the gap is not large, but the absolute error remains high, showing that the supervised matrix transformation did not produce a competitive geometry for this heterogeneous table. Synthetic shows a complementary negative case, where an abrupt supervised transformation can be hard to continue faithfully.

Quantitatively, the average absolute train–test gap of the direct mean over the four vector data sets is 0.0091, with a maximum of 0.0163 on Synthetic. The corresponding average for the selected matrix-only supervised rows is 0.0211, more than twice as large, and the largest vector-data gap is 0.0561 for Synthetic percentile_inout_auto. Including the relational stress test makes the contrast sharper: Chickenpieces mean_auto has a gap of 0.0115, whereas pickout_auto has a gap of 0.3141. This is the most compact numerical summary of the robustness pattern: smooth direct continuation is stable across data types, while abrupt supervised matrix-only continuation is data-dependent and can become unstable.

The two negative cases in Table 6 should therefore be distinguished. In Synthetic, percentile_inout_auto has both a larger train–test gap and a much larger across-split standard deviation, which points to instability of the abrupt label-aware continuation. In Telco, by contrast, the train–test gap is moderate, but both the inner-CV and test errors are high. This indicates that the matrix-only supervised rule has already produced a less competitive training geometry. These two behaviors support the same methodological conclusion: the reconstruction layer can make matrix-level rules deployable, but it cannot guarantee that every label-aware training matrix defines a stable and useful out-of-sample geometry.

5.4. Numerical Reconstruction Diagnostics

Table 7 records the numerical quantities monitored during the same runs. For analytic direct means, the PSD perturbation is zero and the main diagnostic is the inner-CV/test gap. For matrix-only rules, there is generally no oracle block K^{*}(S, Z), so the reported diagnostics are the selected m_{0}, the selected SVM penalty, the PSD perturbation, the inner-CV error, the test error, and their absolute gap. The controlled oracle diagnostic in Table 8 provides the additional Frobenius and MSE block errors in the synthetic case where an oracle block is available.

These diagnostics are chosen to avoid label leakage. For matrix-only supervised rules such as pick-out and the percentile variants, there is generally no genuine oracle block K^{*}(S, Z) at prediction time, because the rule depends on labels and the labels of the new points are not available. Constructing such a block with test labels would answer a different, retrospective question and would not represent a valid deployable classifier. For this reason, Table 7 reports diagnostics that are available under the strict protocol: PSD perturbation Δ_{PSD}, selected model complexity, inner-CV error, test error, and the train–test gap. These quantities measure both the numerical effect of PSD rectification and the stability of the out-of-sample continuation without using test labels.
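As an illustration of a PSD perturbation diagnostic, the sketch below rectifies a symmetric matrix by clipping negative eigenvalues and reports the relative Frobenius change. Both the clipping rule and this normalization are assumptions made for the example; the definition of Δ_{PSD} used in Table 7 may differ. The function name `psd_rectify` is hypothetical.

```python
import numpy as np

def psd_rectify(K, floor=0.0):
    """Clip eigenvalues below `floor` and return (rectified matrix, relative change).

    The relative Frobenius change is used here as an assumed stand-in for the
    PSD perturbation; it is zero when K is already positive semidefinite.
    """
    K_sym = 0.5 * (K + K.T)                      # enforce symmetry first
    w, U = np.linalg.eigh(K_sym)
    K_psd = (U * np.maximum(w, floor)) @ U.T     # U diag(max(w, floor)) U^T
    delta = np.linalg.norm(K_psd - K_sym, "fro") / np.linalg.norm(K_sym, "fro")
    return K_psd, delta

# An indefinite toy matrix (eigenvalues 3 and -1) and a PSD one (eigenvalues 1 and 3).
K_indef = np.array([[1.0, 2.0], [2.0, 1.0]])
K_ok = np.array([[2.0, 1.0], [1.0, 2.0]])

K_fixed, delta_indef = psd_rectify(K_indef)
_, delta_ok = psd_rectify(K_ok)
print(delta_indef, delta_ok)
```

A matrix that is already PSD, such as a mean of valid base kernels, is left unchanged up to rounding, which is consistent with the zero perturbation reported for analytic direct means.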

5.5. Oracle Reconstruction Diagnostic

The controlled Synthetic setting allows a second check. When the combined matrix comes from a known analytic rule such as a sum or a mean of base kernels, the oracle train–test block is available. Table 8 compares this oracle block with the block reconstructed from the training matrix only. Besides the relative Frobenius error, we report the functional block mean squared error

{MSE}_{func} = \frac{1}{| S | | Z |} \sum_{x_{i} \in S} \sum_{z_{j} \in Z} {(\hat{K} (x_{i}, z_{j}) - K_{oracle} (x_{i}, z_{j}))}^{2} .

The sum and mean rows have relative Frobenius error essentially at numerical precision. Pick-out is included as a negative control: it is a sharp label-aware matrix rule and does not correspond to the same smooth analytic continuation. In that row the oracle block uses test labels only for this retrospective negative-control diagnostic; test labels are never used to train, select, or reconstruct the evaluated model.
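The two block metrics can be computed as in the following minimal sketch. The function name `block_errors`, the RBF oracle kernel, and the small artificial perturbation standing in for a reconstruction are illustrative assumptions; they do not reproduce the Table 8 setting.

```python
import numpy as np

def block_errors(K_hat, K_oracle):
    """Return (relative Frobenius error, functional block MSE) for two |S| x |Z| blocks."""
    diff = K_hat - K_oracle
    rel_frob = np.linalg.norm(diff, "fro") / np.linalg.norm(K_oracle, "fro")
    mse_func = np.mean(diff ** 2)   # (1 / (|S||Z|)) times the sum of squared entry errors
    return rel_frob, mse_func

# Toy oracle block from an RBF kernel, and a slightly perturbed "reconstruction".
rng = np.random.default_rng(0)
S = rng.normal(size=(20, 3))        # training points
Z = rng.normal(size=(5, 3))         # new points
sq_dists = ((S[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
K_oracle = np.exp(-0.5 * sq_dists)
K_hat = K_oracle + 1e-6 * rng.normal(size=K_oracle.shape)

rel_frob, mse_func = block_errors(K_hat, K_oracle)
print(rel_frob, mse_func)
```

The relative Frobenius error is scale-free, whereas the functional MSE is expressed in squared kernel units, so the two are best read together.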

The oracle diagnostic clarifies the scope of the proposal. When the matrix is produced by a smooth combination of evaluable kernels, the matrix-to-function conversion recovers the off-sample kernel block. When the matrix is produced by an abrupt label-aware operation, the training matrix can still be represented spectrally, but the out-of-sample continuation is a harder modeling problem.

The relative Frobenius error and the functional MSE in Table 8 are therefore reconstruction-accuracy metrics for the case in which a leakage-free oracle block is meaningful. The pick-out row is included only as a negative control to show how an abrupt label-aware matrix rule differs from a smooth analytic combination.

5.6. Chickenpieces

Chickenpieces is the most important relational experiment because the input is not a feature matrix but a family of dissimilarities between shapes [12,39]. Table 9 reports the best individual kernel and the main combinations.

The best individual kernel is a Laplacian transformation of the norm29/cost60 contour dissimilarity. The mean and alignment combinations essentially recover the best individual kernel without selecting it manually: the difference in mean error is less than 10^{-3}. This is the strongest evidence that the workflow is useful for relational data, where the natural modeling question is not which coordinate system to use, but which dissimilarity source or kernel transformation should be trusted.

The pick-out rule is not competitive in Chickenpieces under strict out-of-sample reconstruction. This is not a contradiction of the method; it is a diagnostic result. The training matrix created by pick-out contains abrupt label-dependent information, and the experiment shows that such geometry is harder to continue to new silhouettes without test labels.

This result should not be interpreted as a general impossibility statement for label-aware matrix combinations on relational dissimilarity data. It shows that the particular abrupt pick-out rule is not robust in the strict Chickenpieces out-of-sample setting. The direct mean and alignment combinations remain stable because they preserve the similarity structures induced by the distance-to-kernel transformations. Pick-out, on the other hand, builds a sharply label-dependent training matrix, and the labels needed to reproduce that rule are unavailable for new silhouettes. In this sense, pickout_auto functions here as a negative-control or stress-test rule. Relational data remain a natural application of the proposed matrix-to-function framework, but smoother supervised combinations are needed if one wants to exploit label information while preserving stable deployability.

A practical way to improve relational label-aware rules would be to regularize the supervised deformation itself. Examples include convex shrinkage of pick-out towards the direct mean, temperature-smoothed max–min or percentile rules, PSD-constrained supervised weights, graph smoothing of the continued eigentraces, and cross-validation of the continuation ridge or retained spectral rank with a criterion that penalizes large train–test gaps. These modifications would preserve the relational input format while making the label-dependent geometry less discontinuous.
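The first of these options, convex shrinkage towards the direct mean, can be sketched in a few lines. The label-aware matrix below is built with a toy rule (elementwise maximum for same-class pairs, minimum otherwise) that is assumed only for illustration; the function names and the parameter `alpha` are hypothetical, and in practice `alpha` would have to be chosen by inner cross-validation.

```python
import numpy as np

def rbf(X, gamma):
    """Gaussian kernel matrix on the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def shrink_towards_mean(K_label_aware, K_mean, alpha):
    """Convex shrinkage (1 - alpha) * K_label_aware + alpha * K_mean, alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * K_label_aware + alpha * K_mean

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = np.where(X[:, 0] > 0, 1, -1)

K1, K2 = rbf(X, 0.1), rbf(X, 2.0)
K_mean = 0.5 * (K1 + K2)                      # smooth direct mean (PSD)

# Toy label-aware rule, assumed for illustration only.
same_class = y[:, None] == y[None, :]
K_toy = np.where(same_class, np.maximum(K1, K2), np.minimum(K1, K2))

for alpha in (0.0, 0.5, 1.0):
    K_alpha = shrink_towards_mean(K_toy, K_mean, alpha)
    print(alpha, np.linalg.eigvalsh(K_alpha).min())
```

Because the smallest eigenvalue is a concave function of a symmetric matrix, the shrunk matrix is never more indefinite than the interpolation of the two endpoints, and it coincides with the PSD mean at `alpha = 1`.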

5.7. Benchmarking Perspective Relative to Alternative Extension Routes

The experiments provide quantitative benchmarking for the matrix-to-function step, but not a full leaderboard against all possible kernel classifiers. This distinction is important. Existing out-of-sample methods such as Nyström approximations or random features assume that the kernel section to the new point is known or approximable from a known analytic kernel. For matrix-only supervised rules, that section is precisely unavailable without using the unknown test label. The leakage-free comparisons that are meaningful in this setting are therefore: (i) direct analytic combinations versus reconstructed matrix-only rules in held-out classification; (ii) training-matrix inner-CV versus held-out test performance; and (iii) oracle train–test block error in the controlled Synthetic case where an analytic smooth rule is available.

Table 10 summarizes these comparisons. The key numerical message is that the proposed continuation is essentially exact for smooth analytic matrices in the Synthetic oracle diagnostic, while classification robustness depends on the regularity of the matrix geometry being continued.

5.8. Additional Hybrid Score Experiment

We also ran the hybrid selector based on a convex combination of CV-rank and alignment-rank. These runs used only five repetitions and are therefore treated as auxiliary. On Breast, the best score version reached error 0.0322 ± 0.0122; on Synthetic, 0.0787 ± 0.0119; and on Ionosphere, 0.0566 ± 0.0094. These values do not improve the principal direct means in Table 3. The experiment was useful as a diagnostic, but it will not be retained as a main method until it is re-run with the same 30-repetition protocol.

6. Discussion

The final experiments lead to a deliberately conservative interpretation.

First, the experiments support the central matrix-to-function claim. For the stable mean combination, the classifier evaluated on held-out points behaves like the classifier assessed on the training matrix used to construct the extension. The 30-repetition train–test gaps are small in Synthetic, Breast, Ionosphere, and Telco for the direct mean. The oracle Synthetic diagnostic is even stronger: when the training matrix is generated by a smooth sum or mean of base kernels, the reconstructed train–test block agrees with the oracle block up to numerical precision.

Second, the results should not be read as an optimized classification benchmark. The kernel dictionary was deliberately inherited from earlier kernel-combination work and was not redesigned to make these data sets easy. It contains simple baselines, multiple Gaussian scales, class-scale variants, local scaling, prototype similarities, and a supervised weighted RBF. The fact that the mean or alignment combinations remain close to the best individual kernel is therefore useful evidence: the conversion works even with a non-specialized historical battery of kernels.

Third, alignment is useful as a diagnostic but not as a universal replacement for model selection [10,11]. In Breast, alignment identifies the cosine kernel, which is also the best individual kernel. In Ionosphere and Telco, however, prototype-type kernels have high alignment but poorer classification performance than simpler alternatives. This confirms that alignment should be treated as a screening measure rather than as a complete performance criterion.

Fourth, the label-aware rules reveal the distinction between a discriminative training matrix and a deployable out-of-sample kernel [6,18]. Percentile rules extend well in Breast and Ionosphere. In Telco, the best percentile rule has a moderate train–test gap but is still far less accurate than the direct means, so its limitation is the induced matrix geometry. The Synthetic and Chickenpieces results show an even stronger failure mode: abrupt supervised operations can be hard to continue faithfully. Pick-out is therefore best interpreted as a stress test or negative control, not as the empirical claim of the paper.

Fifth, Telco strengthens the heterogeneous-tabular side of the motivation. The customer table mixes categorical service fields, account variables, and billing amounts. The direct mean remains within 0.002 error units of the best fixed individual kernel, which indicates that the broad dictionary can be aggregated without losing deployability.

Sixth, Chickenpieces strengthens the relational side of the motivation. It shows that the matrix-to-function problem is not merely a tabular-kernel issue. In a relational problem with 44 dissimilarity matrices, the proposed protocol can compare and aggregate many kernels, identify a useful transformation family, and evaluate the resulting kernel out of sample.

Seventh, the benchmarking claim should be interpreted in the appropriate problem class. The state-of-the-art methods most often associated with out-of-sample kernels, such as Nyström approximations and random features, are designed for situations where the kernel section or an analytic base kernel is available. They do not directly solve the deployment of a final label-aware matrix rule whose entries are defined only on the training pairs. For that reason, the strongest quantitative evidence in this paper is not a generic runtime leaderboard but the combination of held-out classification baselines, train–test stability diagnostics, and the Synthetic oracle block comparison. Scaling the same reconstruction to much larger problems will require low-rank or randomized eigensolvers and faster coordinate evaluation, which we regard as an implementation direction rather than a change in the mathematical problem.

The unstable and negative-control cases discussed above clarify the scope of the framework. Kernel choice matters, because different data geometries favor different base kernels; nevertheless, stable direct combinations can aggregate a broad dictionary while keeping an explicit train–test formula. Reconstruction diagnostics also need to be interpreted according to the type of combination: for smooth analytic rules, relative Frobenius error and functional MSE can be computed against an oracle block, whereas for label-aware matrix-only rules the valid deployment diagnostics are PSD perturbation, inner-CV/test gaps, and held-out classification metrics without test-label access. Finally, the poor performance of percentile_inout_auto in Synthetic and pickout_auto in Chickenpieces should be viewed as informative negative controls. They show that a discriminative training matrix is not automatically a stable deployable kernel.

The final message is therefore not that kernel combinations always outperform the best individual kernel. Rather, the contribution is a principled and strictly validated way to move from a finite kernel matrix or a family of kernel matrices to an out-of-sample kernel evaluator. When the combination is smooth and geometrically stable, as in direct sums and means, the evaluator preserves the training-matrix behavior on new points. When the combination is highly label-driven, as in pick-out, the method exposes the difficulty of extending that matrix faithfully.

7. Conclusions

This paper studies inductive finite-rank extensions of empirical kernel matrices to out-of-sample kernel functions, with particular emphasis on matrices produced by kernel-combination rules. The proposed approach uses spectral decompositions and data-dependent RKHS bases to convert a matrix defined on a training sample into a controlled kernel evaluator for new points.

The experiments support three conclusions. First, the conversion from matrix to function is reliable for smooth, stable combinations such as sums and means. Quantitatively, the direct mean is essentially tied with the best individual kernel on Synthetic (0.0793 ± 0.0227 versus 0.0809 ± 0.0231 error), remains within about 0.002 error units of the best individual kernel on Telco (0.2061 ± 0.0154 versus 0.2045 ± 0.0154), and matches the best relational Chickenpieces kernel within 10^{-3} error units. In the controlled Synthetic oracle diagnostic, smooth sum/mean reconstruction gives relative Frobenius error 4.13 × 10^{-6} ± 9.41 × 10^{-6} and functional MSE at numerical scale. Second, the base kernels do not need to be specially engineered for optimal classification in order for the reconstruction idea to work; the broad historical dictionary used in earlier combination experiments is sufficient to obtain stable deployable kernels. Third, purely supervised matrix-only combinations such as pick-out and some percentile rules are harder to deploy: Synthetic percentile_inout_auto has error 0.1404 ± 0.1198, the best Telco matrix-only supervised rule has error 0.3069 ± 0.0197, and Chickenpieces pickout_auto has error 0.3545 ± 0.2666. This limitation is not a contradiction of the reconstruction method; rather, it identifies the boundary between smooth or moderately regular matrix combinations, which are naturally extendable, and abrupt supervised transformations, which require additional regularization or smoother design before they can be recommended as deployable kernels.

The evidential basis for these conclusions is as follows. Table 3 and Table 6 support the stability claim for direct combinations through held-out classification and train–test gap diagnostics. Table 8 supports the reconstruction-accuracy claim for smooth analytic rules through relative Frobenius error and functional MSE. Table 5 and Table 9 delimit the scope of label-aware matrix-only rules by showing that abrupt supervised transformations may perform poorly even when the matrix can be represented spectrally. Table 10 clarifies the relationship with alternative extension routes and explains why methods requiring an analytic test kernel section are not directly applicable to label-aware matrix-only combinations without an oracle. The conclusions are therefore deliberately restricted: the paper does not claim universal superiority of kernel combinations, does not present a large-scale state-of-the-art classifier benchmark, and does not recommend abrupt supervised matrix-only rules as default deployable kernels.

Future work should focus on three directions. The first is computational: large-scale eigensolvers, landmark approximations, and faster basis-evaluation schemes are needed for larger data sets. The second is methodological: smoother supervised combination rules, possibly constrained to remain positive semidefinite and easier to extend, may retain the discrimination of label-aware rules while preserving the stability of direct averages. For relational data, this includes shrinkage of abrupt pick-out matrices towards smooth means, temperature-smoothed max–min rules, and graph-regularized eigentrace continuation. The third direction is comparative: the present reconstruction should be combined and benchmarked with PSD-embedding extensions [8], subspace-fusion rules for heterogeneous proximities [9], polar-decomposition PSD corrections [31], and scalable Nyström or randomized low-rank variants when an analytic kernel section is available.

Author Contributions

Conceptualization, A.M. and A.T.; methodology, A.M., A.T. and E.M.G.; software, A.M. and A.T.; validation, A.M., A.T. and E.M.G.; formal analysis, A.M. and A.T.; investigation, A.M., A.T. and E.M.G.; writing—original draft preparation, A.M. and A.T.; writing—review and editing, A.M., A.T. and E.M.G.; supervision, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. The study used simulated data and public benchmark data sets; no new human-subject or animal-subject data were collected for this work.

Informed Consent Statement

Not applicable. The study used simulated data and public benchmark data sets.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Additional Details on Alternate Bases

This appendix records the implementation-level formulas that are useful for reproducibility but would slow down the flow of the main text.

Appendix A.1. Lagrange Basis

The Lagrange basis {ℓ_{j}} is defined by the cardinal property ℓ_{j}(x_{i}) = δ_{ij} [14]. In matrix form,

ℓ {(x)}^{⊤} = k {(x)}^{⊤} K_{S}^{- 1} .

(A1)

If s(x) = ℓ(x)^{⊤} y then s(x_{i}) = y_{i} for all sample points. This exact sample interpolation explains why the Lagrange basis is a useful reference, even though it is generally the least stable option numerically.
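A small numerical check of (A1) and of the cardinal property is sketched below. The Gaussian kernel, its width, and the sample are arbitrary choices made for the illustration, and the helper names `rbf_cross` and `lagrange_basis` are hypothetical.

```python
import numpy as np

def rbf_cross(X, Z, gamma=5.0):
    """Gaussian kernel block K(X, Z); the width is an arbitrary illustrative choice."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(2)
S = rng.uniform(-1, 1, size=(8, 2))      # sample points
y = np.sin(S[:, 0]) + S[:, 1]            # values to interpolate
K_S = rbf_cross(S, S)

def lagrange_basis(X_new):
    """Rows are ell(x)^T = k(x)^T K_S^{-1}, obtained with a linear solve."""
    k = rbf_cross(X_new, S)              # row i holds k(x_i)^T
    # K_S is symmetric, so k K_S^{-1} equals (K_S^{-1} k^T)^T.
    return np.linalg.solve(K_S, k.T).T

L_on_sample = lagrange_basis(S)          # should be close to the identity
s_on_sample = L_on_sample @ y            # interpolant evaluated on the sample
print(np.abs(L_on_sample - np.eye(len(S))).max(), np.linalg.cond(K_S))
```

The condition number printed at the end indicates how much accuracy the solve with K_S can lose, which is the practical source of the instability mentioned above.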

Appendix A.2. Newton Basis and Power Function

If K_{S} = N N^{⊤} is a Cholesky factorization with N lower triangular, the Newton basis is defined as

n {(x)}^{⊤} = k {(x)}^{⊤} N^{- ⊤} .

(A2)

This construction and its stability properties are discussed in kernel-basis references such as [13,14]. The corresponding sample matrix is V = N, which gives the triangular Newton property v_{j}(x_{i}) = V_{ij} = 0 for i < j, where v_{j} denotes the j-th Newton basis function (the j-th entry of n(x), written N_{j}(x) below). The adaptive construction is driven by the power function

P_{K, S, r}^{2} (x) = K (x, x) - \sum_{j = 1}^{r} N_{j} {(x)}^{2},

(A3)

which quantifies the part of K(\cdot, x) that is still unexplained after r terms. If ε_{1}, ε_{2}, \dots denote the selected pivots then

N_{1} (x) = \frac{K (x, ε_{1})}{\sqrt{K (ε_{1}, ε_{1})}},

(A4)

N_{m} (x) = \frac{K (x, ε_{m}) - \sum_{j = 1}^{m - 1} N_{j} (x) N_{j} (ε_{m})}{\sqrt{K (ε_{m}, ε_{m}) - \sum_{j = 1}^{m - 1} N_{j} {(ε_{m})}^{2}}}, \quad m \geq 2 .

(A5)

These formulas explain why Newton bases are attractive in practice: they are pivoted, naturally rank-adaptive, and directly linked to interpolation error.
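The relation between the Cholesky form (A2), the recursion (A4)–(A5), and the power function (A3) can be checked numerically with the following sketch. It takes the pivots to be the sample points in their given order, with no adaptive pivot selection, which is a simplifying assumption; the kernel and the helper names are likewise illustrative.

```python
import numpy as np

def rbf_cross(X, Z, gamma=5.0):
    """Gaussian kernel block K(X, Z); the width is an arbitrary illustrative choice."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(3)
S = rng.uniform(-1, 1, size=(6, 2))       # sample points, used as pivots in this order
X_new = rng.uniform(-1, 1, size=(4, 2))   # evaluation points outside the sample

K_S = rbf_cross(S, S)
N = np.linalg.cholesky(K_S)               # lower triangular, K_S = N N^T

def newton_basis(X):
    """Rows are n(x)^T = k(x)^T N^{-T}, as in (A2)."""
    k = rbf_cross(X, S)
    return np.linalg.solve(N, k.T).T

def newton_recursive(X, pivots):
    """Evaluate N_1, ..., N_r at the rows of X with the recursion (A4)-(A5)."""
    pts = np.vstack([X, pivots])          # evaluate on X and on the pivots together
    n_x = len(X)
    cols = []
    for m in range(len(pivots)):
        col = rbf_cross(pts, pivots[m:m + 1])[:, 0]   # K(., eps_m)
        for prev in cols:
            col = col - prev * prev[n_x + m]          # subtract N_j(.) N_j(eps_m)
        cols.append(col / np.sqrt(col[n_x + m]))      # normalize by the residual at eps_m
    return np.column_stack(cols)[:n_x]

V = newton_basis(S)                                   # sample matrix, expected to equal N
# Squared power function with all r = 6 terms; K(x, x) = 1 for this kernel.
power_sq_new = 1.0 - (newton_basis(X_new) ** 2).sum(axis=1)
power_sq_sample = 1.0 - (V ** 2).sum(axis=1)
print(power_sq_new, np.abs(power_sq_sample).max())
```

With all pivots used, the squared power function vanishes on the sample and stays nonnegative elsewhere, so its size at a new point indicates how far that point lies from the span already captured.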

Appendix A.3. SVD and Weighted-SVD Bases

If K_{S} = Q Σ^{2} Q^{⊤} then the SVD basis is

s {(x)}^{⊤} = k {(x)}^{⊤} Q Σ^{- 1},

(A6)

with sample matrix V = Q Σ. A weighted-SVD version is obtained by factorizing W^{1/2} K_{S} W^{1/2} = Q_{W} Σ_{W}^{2} Q_{W}^{⊤} and defining

s_{W} {(x)}^{⊤} = k {(x)}^{⊤} W^{1 / 2} Q_{W} Σ_{W}^{- 1}, V_{W} = W^{- 1 / 2} Q_{W} Σ_{W},

(A7)

for a diagonal positive weight matrix W [13,14]. The experiments in the paper use the unweighted version, but the weighted form is useful because it shows that the spectral basis can also encode sampling information.
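Formulas (A6) and (A7) can be verified on a toy sample as follows. The sketch assumes a Gaussian kernel, a retained rank r = 5, and random positive weights, all chosen only for illustration; the helper names are hypothetical.

```python
import numpy as np

def rbf_cross(X, Z, gamma=5.0):
    """Gaussian kernel block K(X, Z); the width is an arbitrary illustrative choice."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def top_eigenpairs(K, r):
    """Return (Q_r, sigma_r) for the r largest eigenvalues, with eigenvalue = sigma^2."""
    w, Q = np.linalg.eigh(K)
    idx = np.argsort(w)[::-1][:r]
    return Q[:, idx], np.sqrt(w[idx])

rng = np.random.default_rng(4)
S = rng.uniform(-1, 1, size=(7, 2))
K_S = rbf_cross(S, S)
r = 5                                       # retained rank (illustrative)

# Unweighted SVD basis (A6): s(x)^T = k(x)^T Q Sigma^{-1}.
Q, sigma = top_eigenpairs(K_S, r)

def svd_basis(X):
    return rbf_cross(X, S) @ Q / sigma      # divides column j by sigma_j

V = svd_basis(S)                            # expected to equal Q Sigma

# Weighted version (A7) with a diagonal positive weight matrix W.
w_diag = rng.uniform(0.5, 2.0, size=len(S))
W_half = np.sqrt(w_diag)
Q_W, sigma_W = top_eigenpairs(W_half[:, None] * K_S * W_half[None, :], r)

def weighted_svd_basis(X):
    return (rbf_cross(X, S) * W_half[None, :]) @ Q_W / sigma_W

V_W = weighted_svd_basis(S)                 # expected to equal W^{-1/2} Q_W Sigma_W
print(np.abs(V - Q * sigma).max(), np.abs(V_W - (Q_W * sigma_W) / W_half[:, None]).max())
```

Both sample-matrix identities hold for any retained rank, because they only use the eigenvector relation on the retained columns.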

Appendix A.4. Hermite-Type Bases

For one-dimensional Gaussian kernels, analytically motivated Hermite-type bases can be used [18,44,45]. We do not include them in the main experiments because our real benchmarks are multivariate and tabular, but they remain relevant as a bridge between the empirical basis construction and the analytic eigenstructure of the Gaussian kernel.

Appendix B. Coefficient Matrix in Basis Coordinates

The main text notes that C is diagonal only in bases aligned with the retained eigenspace. We record both the orthonormal and the general full-rank formulas, because Newton and other data-dependent bases are not necessarily orthonormal on the sample.

Proposition A1.

Let V \in R^{n \times r} have full column rank and let K_{S}^{*} be a positive-semidefinite matrix. Among all matrices C \in R^{r \times r}, the minimizer of ∥K_{S}^{*} - V C V^{⊤}∥_{F}^{2} is

C^{*} = {(V^{⊤} V)}^{- 1} V^{⊤} K_{S}^{*} V {(V^{⊤} V)}^{- 1} .

(A8)

Equivalently,

C^{*} = V^{†} K_{S}^{*} {(V^{†})}^{⊤},

(A9)

where V^{†} = {(V^{⊤} V)}^{-1} V^{⊤} is the Moore–Penrose inverse for a full-column-rank matrix. If V^{⊤} V = I_{r} then this reduces to

C = V^{⊤} K_{S}^{*} V .

(A10)

If, in addition, V = U_{r} is the matrix of retained eigenvectors of K_{S}^{*}, then

C = Λ_{r},

(A11)

so the coefficient matrix is diagonal. More generally, if V = U_{r} R for an orthogonal matrix R then C = R^{⊤} Λ_{r} R and is orthogonally similar to the retained diagonal spectrum.

Sketch of Proof.

Let A = V^{⊤} V and B = V^{⊤} K_{S}^{*} V. Expanding the Frobenius norm gives

∥ K_{S}^{*} - V C V^{⊤} ∥_{F}^{2} = {∥ K_{S}^{*} ∥}_{F}^{2} - 2 tr (C B) + tr (C A C^{⊤} A) .

For unrestricted C, the first-order condition is A C A = B. Since V has full column rank, A is invertible and the unique solution is (A8). The pseudo-inverse form (A9) is the same identity written with V^{†}. If V^{⊤} V = I_{r} then A = I_{r} and (A10) follows. If V = U_{r}, then the discarded eigenvectors are orthogonal to U_{r}, so only the retained part of the spectral decomposition contributes and V^{⊤} K_{S}^{*} V = U_{r}^{⊤} (U_{r} Λ_{r} U_{r}^{⊤}) U_{r} = Λ_{r}. □

This result explains a practical difference between the bases used in this paper. The SVD basis is spectrally aligned and therefore tends to produce a coefficient matrix close to diagonal. The Newton basis, by contrast, is adapted through pivoting and triangular structure, so V^{⊤} V need not be the identity and C is typically dense even when the reconstruction is accurate.
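Proposition A1 can be checked numerically with the sketch below. The random PSD matrix standing in for K_{S}^{*} and the random full-column-rank V are assumptions made for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 12, 4
B0 = rng.normal(size=(n, n))
K_star = B0 @ B0.T                      # a PSD stand-in for the combined training matrix
V = rng.normal(size=(n, r))             # generic basis matrix with full column rank

A = V.T @ V
B = V.T @ K_star @ V

# (A8): C* = (V^T V)^{-1} V^T K V (V^T V)^{-1}
C_star = np.linalg.solve(A, B) @ np.linalg.inv(A)
# (A9): the same matrix written with the Moore-Penrose inverse
V_pinv = np.linalg.pinv(V)
C_pinv = V_pinv @ K_star @ V_pinv.T

def objective(C):
    """Squared Frobenius reconstruction error of K_star by V C V^T."""
    return np.linalg.norm(K_star - V @ C @ V.T, "fro") ** 2

# (A10)-(A11): with retained eigenvectors the coefficient matrix is diagonal.
w, U = np.linalg.eigh(K_star)           # eigenvalues in ascending order
U_r, Lambda_r = U[:, -r:], np.diag(w[-r:])
C_eig = U_r.T @ K_star @ U_r

# Rotated eigenbasis V = U_r R gives C = R^T Lambda_r R (dense, same spectrum).
R, _ = np.linalg.qr(rng.normal(size=(r, r)))
V_rot = U_r @ R
C_rot = V_rot.T @ K_star @ V_rot

print(np.abs(C_star - C_pinv).max(), np.abs(C_eig - Lambda_r).max())
```

The rotated case shows why a dense coefficient matrix is not by itself a sign of a poor reconstruction: C_rot has the same eigenvalues as Λ_{r} even though it is no longer diagonal.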

Appendix C. Fusion Kernel Details and Comparison with the Basis Route

The Fusion Kernel (FK) route [18,32] starts from the same combined training matrix K_{S}^{*} but reconstructs the target kernel through the eigenspaces of the base kernels. In schematic form,

K_{FK} (x, z) = \sum_{h = 1}^{d} λ_{h} ψ_{h} (x) ψ_{h} (z),

(A12)

with

ψ_{h} (x) = \sum_{ℓ = 1}^{m} \sum_{j = 1}^{d_{ℓ}} c_{j ℓ}^{(h)} ϕ_{j}^{(ℓ)} (x),

(A13)

where ϕ_{j}^{(ℓ)} are eigenfunctions associated with the base kernels and the coefficients c_{j ℓ}^{(h)} are chosen so that the resulting eigensystem fits the eigenvectors of K_{S}^{*} on the sample.

If the retained eigenspace of K_{S}^{*} is exactly generated by the span of the retained base-kernel eigenfunctions, and those eigenfunctions are extended without further approximation, then FK and the direct basis method reconstruct the same finite-feature PSD kernel up to a change of coordinates in the retained subspace; if the continued coordinates are continuous on a compact domain, then this is also the same Mercer kernel. In practice they diverge for two reasons. First, FK approximates the target through the eigensystems of the base kernels, whereas the basis method starts from the eigensystem of K_{S}^{*} itself. Second, outside the sample FK continues the target through the base-kernel representations, while the basis method continues it through a stable basis adapted to K_{S}^{*}.

This conceptual comparison is the reason FK remains visible in the article even though it is not one of the central methods in the main tables. The same appendix perspective also clarifies the relation with modern out-of-sample constructions: PSD-embedding formulas [8] and heterogeneous subspace fusion [9] can be viewed as complementary mechanisms that may be placed before, after, or in partial replacement of the basis-extension step. A systematic comparison of these combinations is beyond the present experiments and is left for future work.

References

Aronszajn, N. Theory of reproducing kernels. Trans. Am. Math. Soc. 1950, 68, 337–404.
Schölkopf, B.; Smola, A.J. Learning with Kernels; MIT Press: Cambridge, MA, USA, 2002.
Shawe-Taylor, J.; Cristianini, N. Kernel Methods for Pattern Analysis; Cambridge University Press: Cambridge, UK, 2004.
Lanckriet, G.R.G.; Cristianini, N.; Bartlett, P.; El Ghaoui, L.; Jordan, M.I. Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 2004, 5, 27–72.
Gönen, M.; Alpaydın, E. Multiple kernel learning algorithms. J. Mach. Learn. Res. 2011, 12, 2211–2268.
de Diego, I.M.; Muñoz, A.; Moguerza, J.M. Methods for the combination of kernel matrices within a support vector framework. Mach. Learn. 2010, 78, 137–174.
Wu, G.; Chang, E.Y.; Zhang, Z. An analysis of transformation on non-positive semidefinite similarity matrices for kernel machines. In Proceedings of the 22nd International Conference on Machine Learning; Association for Computing Machinery: New York, NY, USA, 2005; pp. 828–835.
Fanuel, M.; Aspeel, A.; Delvenne, J.-C.; Suykens, J.A.K. Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions. SIAM J. Math. Data Sci. 2022, 4, 153–178.
Münch, M.; Röder, M.; Heilig, S.; Raab, C.; Schleif, F.-M. Static and adaptive subspace information fusion for indefinite heterogeneous proximity data. Neurocomputing 2023, 555, 126635.
Cristianini, N.; Shawe-Taylor, J.; Elisseeff, A.; Kandola, J. On kernel-target alignment. In Advances in Neural Information Processing Systems 14; MIT Press: Cambridge, MA, USA, 2002; pp. 367–373.
Cortes, C.; Mohri, M.; Rostamizadeh, A. Algorithms for learning kernels based on centered alignment. J. Mach. Learn. Res. 2012, 13, 795–828.
Pękalska, E.; Duin, R.P.W. The Dissimilarity Representation for Pattern Recognition: Foundations and Applications; World Scientific: Singapore, 2005.
Pazouki, M.; Schaback, R. Bases for kernel-based spaces. J. Comput. Appl. Math. 2011, 236, 575–588.
Fasshauer, G.E.; McCourt, M.J. Kernel-Based Approximation Methods Using MATLAB; World Scientific: Singapore, 2015.
Mercer, J. Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. R. Soc. A 1909, 209, 415–446.
Cucker, F.; Smale, S. On the mathematical foundations of learning. Bull. Am. Math. Soc. 2002, 39, 1–49.
Cucker, F.; Zhou, D.-X. Learning Theory: An Approximation Theory Viewpoint; Cambridge University Press: Cambridge, UK, 2007.
Torres Velázquez, A. Obtención de Autofunciones de Kernels de Mercer con Aplicaciones a Problemas de Clasificación. Master’s Thesis, Universidad Carlos III de Madrid, Madrid, Spain, 2018.
Schölkopf, B.; Herbrich, R.; Smola, A. A generalized representer theorem. In Computational Learning Theory, Proceedings of the 14th Annual Conference on Computational Learning Theory; Springer: Berlin/Heidelberg, Germany, 2001; pp. 416–426.
Williams, C.K.I.; Seeger, M. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2001; pp. 682–688.
Li, M.; Kwok, J.T.-Y.; Lu, B.-L. Making large-scale Nyström approximation possible. In Proceedings of the 27th International Conference on Machine Learning; Omnipress: Madison, WI, USA, 2010; pp. 631–638.
Zhang, K.; Tsang, I.W.; Kwok, J.T. Improved Nyström low-rank approximation and error analysis. In Proceedings of the 25th International Conference on Machine Learning; Association for Computing Machinery: New York, NY, USA, 2008; pp. 1232–1239.
Pourkamali-Anaraki, F.; Becker, S. Improved fixed-rank Nyström approximation via QR decomposition: Practical and theoretical aspects. arXiv 2017, arXiv:1708.04940.
Deng, Z.; Shi, J.; Zhu, J. NeuralEF: Deconstructing kernels by deep neural networks. In Proceedings of the 39th International Conference on Machine Learning; Proceedings of Machine Learning Research; PMLR: New York, NY, USA, 2022; Volume 162, pp. 4976–4992.
Mairhuber, J.C. On Haar’s theorem concerning Chebyshev approximation problems having unique solutions. Proc. Am. Math. Soc. 1957, 8, 609–615.
Curtis, P.C. Interpolation of functions of several variables. Proc. Am. Math. Soc. 1964, 15, 870–873.
Fasshauer, G.E. Meshfree Approximation Methods with MATLAB; World Scientific: Singapore, 2007.
Fasshauer, G.E. Positive definite kernels: Past, present and future. Dolomites Res. Notes Approx. 2011, 4, 21–63.
Wenzel, T.; Santin, G.; Haasdonk, B. A novel class of stabilized greedy kernel approximation algorithms: Convergence, stability and uniform point distribution. J. Approx. Theory 2021, 262, 105508.
Muñoz, A.; de Diego, I.M. From indefinite to positive semidefinite matrices. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition and Structural and Syntactic Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2006; pp. 764–772.
Münch, M.; Röder, M.; Schleif, F.-M. Unlocking the potential of non-PSD kernel matrices: A polar decomposition-based transformation for improved prediction models. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1867–1876.
Muñoz, A.; González, J. Functional learning of kernels for information fusion purposes. In Iberoamerican Congress on Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2008; pp. 277–283.
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2009.
Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013.
Wolberg, W.H. Breast Cancer Wisconsin (Original) [Dataset]. In UCI Machine Learning Repository; University of California, Irvine: Irvine, CA, USA, 1990.
Sigillito, V.; Wing, S.; Hutton, L.; Baker, K. Ionosphere [Dataset]. In UCI Machine Learning Repository; University of California, Irvine: Irvine, CA, USA, 1989. [Google Scholar] [CrossRef]
BlastChar, I.B.M./. Telco Customer Churn Data Set. IBM Sample Distributed Through Kaggle. Available online: https://www.kaggle.com/datasets/blastchar/telco-customer-churn (accessed on 1 April 2026).
Duin, R.P.W.; Pękalska, E. PRDisData: Dissimilarity Data Sets. 37 Steps/PRTools. Available online: https://www.37steps.com/data/zip/prdisdata.zip (accessed on 1 April 2026).
Bunke, H.; Bühler, U. Applications of approximate string matching to 2D shape recognition. Pattern Recognit. 1993, 26, 1797–1812.
Bicego, M.; Martins, A.F.T.; Murino, V.; Aguiar, P.M.Q.; Figueiredo, M.A.T. 2D shape recognition using information theoretic kernels. In Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2010.
Garreau, D.; Jitkrittum, W.; Kanagawa, M. Large sample analysis of the median heuristic. arXiv 2017, arXiv:1707.07269.
Zelnik-Manor, L.; Perona, P. Self-tuning spectral clustering. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2004.
Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; Dover: New York, NY, USA, 1965.
NIST Digital Library of Mathematical Functions. Release 1.0.20. 2018. Available online: https://dlmf.nist.gov/ (accessed on 1 April 2026).

Figure 1. Schematic overview of the proposed framework: (A) The original data may consist of complex objects, such as shape classes from the Chickenpieces data set. (B) Several pairwise dissimilarity matrices are available on the sample. (C) These dissimilarities are transformed into candidate kernels. (D) The kernels are combined and represented through a finite spectral or basis expansion, yielding a kernel-based construction that can be evaluated out of sample on previously unseen points. The panel colors distinguish the four conceptual stages and have no quantitative meaning. The main goal is not to outperform specialized classifiers per se, but to convert sample-based constructions into evaluable functions on the ambient domain.

Figure 2. Code-generated illustration of a sample-based Gaussian RKHS expansion extended out of sample. Noisy observations of $f (x) = \sin (x)$ are used to fit the regularized kernel expansion $(K + λ I) α = y$ with $σ = 1$ and $λ = 0.1$. The fitted function can then be evaluated on a dense grid and at a previously unseen point.
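The fit described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal sketch and not the authors' code: only the target $\sin (x)$, the system $(K + λ I) α = y$, $σ = 1$, and $λ = 0.1$ come from the caption. The sample size, input interval, noise level, random seed, function names, and the Gaussian form $\exp \{- (x - z)^{2} / (2 σ^{2})\}$ are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between two 1-D point sets a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

def fit_krr(x_train, y_train, sigma=1.0, lam=0.1):
    """Solve (K + lam * I) alpha = y for the expansion coefficients."""
    K = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

def predict(x_new, x_train, alpha, sigma=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x, x_i) at arbitrary new points."""
    x_new = np.atleast_1d(x_new).astype(float)
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

# Illustrative data: 40 noisy observations of sin(x) on [0, 2*pi] (assumed setup).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 2.0 * np.pi, size=40))
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(40)

alpha = fit_krr(x_train, y_train, sigma=1.0, lam=0.1)

# The same coefficients evaluate the function on a dense grid and at an unseen point.
grid = np.linspace(0.0, 2.0 * np.pi, 200)
f_grid = predict(grid, x_train, alpha)
f_new = predict(2.5, x_train, alpha)
```

The coefficients are computed once from the training sample. Out-of-sample evaluation then needs only the kernel values between the new point and the training points, which is what makes the sample-based construction a function.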

Figure 3. Six distinct binary silhouettes from the Chickenpieces benchmark, shown without preprocessing. The silhouettes are taken from the public example panel of Bicego et al. [41]; the panel is included only to illustrate the type of shape objects used in the relational experiments.

Table 1. Selected rows and representative columns from the Telco Customer Churn table. The example shows the coexistence of binary, categorical, account/ordinal, and continuous variables within the same churn-prediction task.

Demographics			Service/Account			Contract and Billing				Target
Gender	Senior	Partner	Tenure	Phone	Internet	Contract	Payment	Monthly	Total	Churn
Female	0	Yes	1	No	DSL	Month-to-month	Electronic check	29.85	29.85	No
Male	0	No	34	Yes	DSL	One year	Mailed check	56.95	1889.50	No
Male	0	No	2	Yes	DSL	Month-to-month	Mailed check	53.85	108.15	Yes
Male	0	No	45	No	DSL	One year	Bank transfer auto	42.30	1840.75	No
Female	0	No	2	Yes	Fiber optic	Month-to-month	Electronic check	70.70	151.65	Yes
Female	0	No	8	Yes	Fiber optic	Month-to-month	Electronic check	99.65	820.50	Yes

Table 2. Vector-data kernel dictionary. The formulas are written for a generic training split; $x, z \in \mathbb{R}^{p}$, Q denotes the empirical quantile set of non-zero training distances, and all label-dependent quantities are estimated only from training labels.

Name	Definition	Role in the Dictionary
`LIN`	$K (x, z) = x^{⊤} z$	Linear baseline.
`POLY2`	$K (x, z) = {(1 + x^{⊤} z / p)}^{2}$	Low-degree interaction baseline.
`COS`	$K (x, z) = x^{⊤} z / (∥ x ∥ ∥ z ∥)$	Angular similarity, useful when scale is less informative than direction.
`RBF01–RBF10`	$K_{q} (x, z) = exp {- ∥ x - z ∥^{2} / (2 σ_{q}^{2})}$, with $σ_{q}$ the q-quantile of non-zero Euclidean training distances and $q \in Q$	Multiscale Gaussian core of the dictionary [42].
`LAP`	$K (x, z) = exp {- ∥ x - z ∥_{1} / σ_{L 1}}$	Less-smooth distance kernel based on the median $L^{1}$ training scale.
`RQ`	$K (x, z) = {(1 + ∥ x - z ∥^{2} / (2 α σ_{0}^{2}))}^{- α}$, with $α = 1$	Scale-mixture alternative to a single Gaussian bandwidth.
`INTRA`	$K (x, z) = exp {- ∥ x - z ∥^{2} / (2 σ_{intra}^{2})}$	Gaussian bandwidth set by the median within-class training distance.
`INTER`	$K (x, z) = exp {- ∥ x - z ∥^{2} / (2 σ_{inter}^{2})}$	Gaussian bandwidth set by the median between-class training distance.
`WRBF`	$K (x, z) = exp {- \sum_{r = 1}^{p} w_{r} {(x_{r} - z_{r})}^{2} / (2 σ_{w}^{2})}$	Supervised diagonal metric; $w_{r}$ measures training-set variable relevance.
`LOCAL`	$K (x_{i}, x_{j}) = exp {- ∥ x_{i} - x_{j} ∥^{2} / (σ_{i} σ_{j})}$	Self-tuning kernel with $σ_{i}$ given by a nearest-neighbor distance [43].
`PROTO`	$p_{c} (x) = exp {- ∥ x - μ_{c} ∥^{2} / (2 σ_{p}^{2})}$ , $K (x, z) = p {(x)}^{⊤} p (z)$	Similarity through training-set class prototypes $μ_{c}$ .
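Two of the dictionary entries in Table 2 can be sketched as follows. This is an illustrative sketch under stated assumptions and not the authors' implementation: the function names, the toy data, and the quantile level `q=0.3` are hypothetical, since the quantile set Q is not listed here. What it takes from the table is the rule that bandwidths are computed from non-zero training distances only and then reused unchanged for the train–test block.

```python
import numpy as np

def pairwise_dist(A, B, p=2):
    """Pairwise L1 (p=1) or Euclidean (p=2) distances between rows of A and B."""
    diff = A[:, None, :] - B[None, :, :]
    if p == 1:
        return np.abs(diff).sum(axis=-1)
    return np.sqrt((diff ** 2).sum(axis=-1))

def quantile_scale(X_train, q, p=2):
    """q-quantile of the non-zero pairwise training distances."""
    D = pairwise_dist(X_train, X_train, p)
    d = D[np.triu_indices_from(D, k=1)]  # each unordered pair once
    return np.quantile(d[d > 0], q)

def rbf_kernel(A, B, sigma):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-pairwise_dist(A, B) ** 2 / (2.0 * sigma**2))

def laplacian_kernel(A, B, sigma_l1):
    """Laplacian kernel exp(-||x - z||_1 / sigma_l1)."""
    return np.exp(-pairwise_dist(A, B, p=1) / sigma_l1)

# Toy data standing in for one training/test split (assumed, not from the paper).
rng = np.random.default_rng(0)
X_train = rng.standard_normal((50, 3))
X_test = rng.standard_normal((5, 3))

# Bandwidths are estimated on the training split only.
sigma_q = quantile_scale(X_train, q=0.3)          # one RBF scale; q is illustrative
sigma_l1 = quantile_scale(X_train, q=0.5, p=1)    # median L1 training scale

K_train = rbf_kernel(X_train, X_train, sigma_q)
K_test = rbf_kernel(X_test, X_train, sigma_q)     # direct train-test block
L_test = laplacian_kernel(X_test, X_train, sigma_l1)
```

Because these kernels have a closed-form expression, the test block is obtained by direct evaluation. That is the sense in which Table 5 describes direct combinations as having explicit train–test blocks.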

Table 3. Main vector-data results over 30 repetitions. Mean classification error and balanced accuracy are reported as mean ± standard deviation.

Data Set	Method	Kernel/Combination	Error	Balanced Accuracy
Synthetic	best individual	`RBF03`	$0.0809 \pm 0.0231$	$0.9191 \pm 0.0231$
Synthetic	direct combination	`mean_auto`	$0.0793 \pm 0.0227$	$0.9207 \pm 0.0227$
Synthetic	direct combination	`alignment_auto`	$0.0798 \pm 0.0226$	$0.9202 \pm 0.0226$
Synthetic	best matrix-only	`percentile_inout_auto`	$0.1404 \pm 0.1198$	$0.8596 \pm 0.1198$
Breast	best individual	`COS`	$0.0289 \pm 0.0112$	$0.9736 \pm 0.0114$
Breast	direct combination	`mean_auto`	$0.0311 \pm 0.0108$	$0.9711 \pm 0.0111$
Breast	direct combination	`alignment_auto`	$0.0311 \pm 0.0113$	$0.9709 \pm 0.0116$
Breast	best matrix-only	`percentile_inout_auto`	$0.0358 \pm 0.0124$	$0.9616 \pm 0.0164$
Ionosphere	best individual	`LOCAL`	$0.0484 \pm 0.0183$	$0.9441 \pm 0.0239$
Ionosphere	direct combination	`mean_auto`	$0.0519 \pm 0.0171$	$0.9381 \pm 0.0219$
Ionosphere	direct combination	`alignment_auto`	$0.0528 \pm 0.0174$	$0.9370 \pm 0.0230$
Ionosphere	best matrix-only	`percentile_in_auto`	$0.0487 \pm 0.0166$	$0.9411 \pm 0.0234$
Telco	best individual	`POLY2`	$0.2045 \pm 0.0154$	$0.6959 \pm 0.0226$
Telco	direct combination	`mean_auto`	$0.2061 \pm 0.0154$	$0.6909 \pm 0.0218$
Telco	direct combination	`alignment_auto`	$0.2063 \pm 0.0158$	$0.6900 \pm 0.0229$
Telco	best matrix-only	`percentile_inout_auto`	$0.3069 \pm 0.0197$	$0.6321 \pm 0.0251$

Table 4. Selected individual-kernel results on Telco over 30 repetitions. The rows show the lowest-error kernel, the best RBF scale, the highest balanced-accuracy kernel, two alignment diagnostics, and the main failure mode.

Kernel	Family	Role	Error	Balanced Accuracy	Alignment	Support Vectors
`POLY2`	polynomial	lowest mean error	$0.2045 \pm 0.0154$	$0.6959 \pm 0.0226$	$0.0924$	$672.4$
`RBF10`	RBF quantile	best RBF scale	$0.2050 \pm 0.0145$	$0.6975 \pm 0.0222$	$0.1014$	$689.4$
`LIN`	linear	highest balanced accuracy	$0.2071 \pm 0.0150$	$0.7066 \pm 0.0222$	$0.0948$	$662.6$
`COS`	cosine	high alignment reference	$0.2112 \pm 0.0166$	$0.6958 \pm 0.0242$	$0.1183$	$651.1$
`PROTO`	prototype	largest alignment	$0.2150 \pm 0.0143$	$0.6714 \pm 0.0250$	$0.1672$	$718.8$
`WRBF`	weighted RBF	failure mode	$0.3419 \pm 0.1741$	$0.5009 \pm 0.0036$	$- 0.1408$	$0.7$

Table 5. Telco combination results over 30 repetitions. Direct combinations have explicit train–test blocks; matrix-only rules use the spectral/KRR continuation and no test labels.

Combination	Method	$m_{0}$	C	$Δ_{PSD}$	Inner CV Error	Test Error	Balanced Accuracy
`mean_auto`	direct	$7.50$	$2.40$	$0.0000$	$0.2016 \pm 0.0090$	$0.2061 \pm 0.0154$	$0.6909 \pm 0.0218$
`alignment_auto`	direct	$7.67$	$2.30$	$0.0000$	$0.2018 \pm 0.0090$	$0.2063 \pm 0.0158$	$0.6900 \pm 0.0229$
`percentile_inout_auto`	spectral/KRR	$10.00$	$2.50$	$0.0161$	$0.2893 \pm 0.0161$	$0.3069 \pm 0.0197$	$0.6321 \pm 0.0251$
`percentile_in_auto`	spectral/KRR	$10.00$	$2.70$	$0.1316$	$0.2942 \pm 0.0165$	$0.3114 \pm 0.0188$	$0.6274 \pm 0.0243$
`pickout_auto`	spectral/KRR	$10.00$	$2.30$	$0.1183$	$0.2997 \pm 0.0162$	$0.3194 \pm 0.0197$	$0.6241 \pm 0.0246$
`percentile_out_auto`	spectral/KRR	$10.00$	$2.40$	$0.0417$	$0.3044 \pm 0.0156$	$0.3264 \pm 0.0196$	$0.6212 \pm 0.0239$

Table 6. Training-matrix criterion versus strict out-of-sample performance over 30 repetitions. The test block for matrix-only rules is obtained by reconstruction and does not use test labels.

Data Set	Combination	Inner CV Error	Reconstructed/Direct Test Error	Absolute Gap
Synthetic	`mean_auto`	$0.0630 \pm 0.0110$	$0.0793 \pm 0.0227$	$0.0163$
Synthetic	`percentile_inout_auto`	$0.0844 \pm 0.0226$	$0.1404 \pm 0.1198$	$0.0561$
Breast	`mean_auto`	$0.0241 \pm 0.0041$	$0.0311 \pm 0.0108$	$0.0069$
Breast	`percentile_inout_auto`	$0.0278 \pm 0.0055$	$0.0358 \pm 0.0124$	$0.0080$
Ionosphere	`mean_auto`	$0.0431 \pm 0.0066$	$0.0519 \pm 0.0171$	$0.0088$
Ionosphere	`percentile_in_auto`	$0.0463 \pm 0.0065$	$0.0487 \pm 0.0166$	$0.0024$
Telco	`mean_auto`	$0.2016 \pm 0.0090$	$0.2061 \pm 0.0154$	$0.0044$
Telco	`percentile_inout_auto`	$0.2893 \pm 0.0161$	$0.3069 \pm 0.0197$	$0.0177$

Table 7. Numerical diagnostics for selected direct and matrix-only rules. Vector-data values are averaged over 30 repetitions; Chickenpieces values are averaged over 10 repeated stratified splits. $m_{0}$ and C are inner-CV selections averaged over repetitions.

Data Set	Rule	Method	$m_{0}$	C	$Δ_{PSD}$	Inner CV Error	Test Error/Gap
Synthetic	`mean_auto`	direct	$6.17$	$8.77$	$0.0000$	$0.0630 \pm 0.0110$	$0.0793 \pm 0.0227$ / $0.0163$
Synthetic	`percentile_inout_auto`	spectral/KRR	$4.77$	$6.71$	$0.0309$	$0.0844 \pm 0.0226$	$0.1404 \pm 0.1198$ / $0.0561$
Breast	`mean_auto`	direct	$5.00$	$2.60$	$0.0000$	$0.0241 \pm 0.0041$	$0.0311 \pm 0.0108$ / $0.0069$
Breast	`percentile_inout_auto`	spectral/KRR	$8.20$	$3.49$	$0.0104$	$0.0278 \pm 0.0055$	$0.0358 \pm 0.0124$ / $0.0080$
Ionosphere	`mean_auto`	direct	$6.47$	$7.50$	$0.0000$	$0.0431 \pm 0.0066$	$0.0519 \pm 0.0171$ / $0.0088$
Ionosphere	`percentile_in_auto`	spectral/KRR	$7.23$	$4.38$	$0.0204$	$0.0463 \pm 0.0065$	$0.0487 \pm 0.0166$ / $0.0024$
Telco	`mean_auto`	direct	$7.50$	$2.40$	$0.0000$	$0.2016 \pm 0.0090$	$0.2061 \pm 0.0154$ / $0.0044$
Telco	`percentile_inout_auto`	spectral/KRR	$10.00$	$2.50$	$0.0161$	$0.2893 \pm 0.0161$	$0.3069 \pm 0.0197$ / $0.0177$
Chickenpieces	`mean_auto`	direct	$8.90$	$6.60$	$0.0000$	$0.0317 \pm 0.0046$	$0.0433 \pm 0.0144$ / $0.0115$
Chickenpieces	`pickout_auto`	spectral/KRR	$12.50$	$4.75$	$0.0169$	$0.0403 \pm 0.0085$	$0.3545 \pm 0.2666$ / $0.3141$

Table 8. Oracle reconstruction diagnostic on the controlled Synthetic experiment. The relative Frobenius error compares the reconstructed train–test block with the oracle block over 100 repetitions.

Rule	Relative Frobenius Error	Functional MSE	$Δ_{PSD}$	Interpretation
`sum`	$4.13 \times 10^{- 6} \pm 9.41 \times 10^{- 6}$	$2.20 \times 10^{- 10}$	$2.18 \times 10^{- 15}$	smooth analytic rule
`mean`	$4.13 \times 10^{- 6} \pm 9.41 \times 10^{- 6}$	$5.51 \times 10^{- 11}$	$2.18 \times 10^{- 15}$	smooth analytic rule
`pickout`	$0.4501 \pm 0.7299$	$0.4074$	$0.1078$	label-aware negative control
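The relative Frobenius error reported in Table 8 can be read as the norm of the difference between the reconstructed and oracle train–test blocks, divided by the norm of the oracle block. The sketch below assumes this standard definition, which is not spelled out in the table itself. The function name and the toy blocks are hypothetical and do not reproduce the reported values.

```python
import numpy as np

def relative_frobenius_error(K_rec, K_oracle):
    """||K_rec - K_oracle||_F / ||K_oracle||_F for two blocks of equal shape."""
    return np.linalg.norm(K_rec - K_oracle, "fro") / np.linalg.norm(K_oracle, "fro")

# Toy stand-ins: an "oracle" block and a slightly perturbed "reconstruction".
rng = np.random.default_rng(0)
K_oracle = rng.uniform(0.1, 1.0, size=(5, 50))
K_rec = K_oracle + 1e-6 * rng.standard_normal((5, 50))

err_small = relative_frobenius_error(K_rec, K_oracle)     # tiny perturbation
err_zero = relative_frobenius_error(K_oracle, K_oracle)   # perfect reconstruction
```

The measure is scale-free: multiplying both blocks by the same constant leaves it unchanged, so the `sum` and `mean` rules, which differ only by a constant factor, share the same relative error in Table 8.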

Table 9. Chickenpieces results over 10 repeated stratified splits. The dictionary contains 88 kernels obtained from 44 dissimilarity matrices by RBF and Laplacian transformations.

Method	Kernel/Combination	Error	Balanced Accuracy
best individual	`LAP::norm29_cost60`	$0.0425 \pm 0.0141$	$0.9528 \pm 0.0176$
direct combination	`mean_auto`	$0.0433 \pm 0.0144$	$0.9524 \pm 0.0188$
direct combination	`alignment_auto`	$0.0433 \pm 0.0144$	$0.9524 \pm 0.0188$
matrix-only	`pickout_auto`	$0.3545 \pm 0.2666$	$0.6425 \pm 0.2805$

Table 10. Leakage-free benchmarking perspective for the out-of-sample extension problem. The comparisons are those that are meaningful for matrix-only supervised combinations without using test labels.

Comparison Route	Applicability	Quantitative Diagnostic	Main Outcome
Direct analytic kernels	Known train–test formula	Table 3 and Table 6	Direct mean gaps $0.0044$ – $0.0163$ on vector data
Synthetic oracle block	Smooth analytic rule only	Table 8	Frobenius error $4.13 \times 10^{- 6} \pm 9.41 \times 10^{- 6}$ for sum/mean
Spectral/KRR continuation	Matrix-only rules without test labels	Table 6 and Table 7	Stable on Breast/Ionosphere; unstable on Synthetic and Chickenpieces pick-out
Nyström-type extension	Requires $K^{*} (z, x_{i})$ or landmarks from a known kernel	Conceptual baseline	Not leakage-free for pick-out or percentile rules without an oracle block
Fusion Kernel route	Uses eigenspaces of base kernels	Appendix C	Complementary route; systematic runtime benchmark left for future work

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Muñoz, A.; Torres, A.; Muñoz García, E. From Kernel Matrices to Kernel Functions: An Eigenfunction-Based Approach. Mathematics 2026, 14, 1971. https://doi.org/10.3390/math14111971


