Article

Application of Non-Sparse Manifold Regularized Multiple Kernel Classifier

Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Mathematics 2025, 13(7), 1050; https://doi.org/10.3390/math13071050
Submission received: 14 February 2025 / Revised: 14 March 2025 / Accepted: 21 March 2025 / Published: 24 March 2025
(This article belongs to the Special Issue Bioinformatics, Computational Theory and Intelligent Algorithms)

Abstract

Non-sparse multiple kernel learning is efficient but cannot be applied directly in a semi-supervised scenario; we therefore extend it to semi-supervised learning via manifold regularization. The manifold regularizer is built on a graph constructed over all the data samples, labeled and unlabeled, and forces the regularized classifier to be smooth along the graph. In this study, we propose the manifold regularized p-norm multiple kernel model and provide its solutions with proofs. The risk bound is briefly introduced based on the local Rademacher complexity. Experiments on several datasets and comparisons with several methods show the efficiency of the proposed model in the semi-supervised scenario.

1. Introduction

Multiple kernel learning (MKL) has attracted many researchers because it is flexible and adaptable to different areas [1,2,3,4]; there are important works on algorithm design and theoretical proofs [5,6,7,8]. Kloft et al. proposed p-norm MKL to extend the traditional 1-norm to a p-norm, which can describe different properties of the same object from various complementary views [9]. However, p-norm MKL is used only in supervised settings. We adopt the manifold regularization (MR) technique [10] to extend p-norm MKL to semi-supervised learning (SSL). The proposed model is based on the manifold assumption in SSL, which states that if two samples x1 and x2 are close, then so should be the corresponding labels y1 and y2. This relationship is captured by a weighted graph that estimates the local neighborhood of each sample.
The advantage of non-sparse Multiple Kernel Learning (MKL) stems from its balanced allocation of weights across multiple kernels, preserving the contributions of all kernels. In semi-supervised scenarios, the geometric structure of unlabeled data is typically constrained through manifold regularization (MR), which aligns well with the regularization requirements for unlabeled data in SSL. The MR technique promotes SSL in various directions and shows its superiority in, for example, semi-supervised classification [11], clustering [12], feature selection [13] and image processing [14]. Combining MKL with MR has been demonstrated to be effective and applicable to machine learning problems characterized by complex nonlinear data [15].
In contrast, sparse MKL may neglect critical features associated with manifold structures due to its excessive selection of a limited number of kernels [16]. For instance, when different kernels capture complementary local neighborhood information, non-sparse weighting preserves such information, thereby enabling more effective utilization of manifold regularization to constrain classifier smoothness. Furthermore, the flexibility of p-norm MKL (controlling sparsity through adjustment of the p-value) allows adaptation to diverse data distributions [17]. While existing studies have explored p-norm MKL in supervised learning [17,18], its potential in SSL remains underexplored. This motivates the present work.
This work presents the first semi-supervised classification framework integrating p-norm MKL with manifold regularization (MR). Compared with sparse MKL, our non-sparse weighting mechanism enhances the efficiency of manifold regularization in leveraging unlabeled data by preserving the global complementarity of multiple kernels. Specifically, manifold regularization constrains classifier smoothness based on the full-sample graph structure, while non-sparse MKL provides enriched feature representations through multi-kernel fusion. Their synergistic interaction improves generalization performance in semi-supervised settings. The experimental results demonstrate that this integration achieves superior performance compared to conventional sparse MKL approaches across multiple benchmark datasets.
In this study, we first construct the weighted graph over all data samples and obtain the Laplacian matrix that describes the relationships among the samples. Using the Laplacian matrix and the manifold assumption, we add the MR term to the p-norm multiple kernel classifier; further, we provide proofs of the solutions and a risk bound analysis. Experiments comparing with other methods are conducted to show the efficiency of the proposed model.
The paper is organized as follows. In Section 2, the MR method is illustrated and p-norm multiple kernel classifier is reviewed. In Section 3, we provide the solutions and the corresponding proofs and give the procedure of algorithm and elementary risk bound analysis. In Section 4, we show the results of the experiments. Finally, conclusions of the paper are in Section 5.

2. Related Works

For notation consistency, assume that we are given ℓ labeled data samples {xi, yi}, i = 1,…, ℓ, and u unlabeled data samples {xi}, i = 1,…, u. For each sample, xi ∈ 𝒳, where 𝒳 is the input space, and yi ∈ ℝ is the label.

2.1. Manifold Regularization

The manifold regularization is based on a graph built on all the data samples {xi}, i = 1,…, n. We define the graph as G = (V, E), where V is the vertex set, each vi ∈ V corresponds to the sample xi, and E is the edge set; each eij ∈ E links adjacent vertices. If xi and xj are adjacent, then there exists an edge eij between xi and xj, and a weight is assigned to each edge according to the degree of closeness. First, adjacency is determined by the kNN method: xi and xj are adjacent if xi is among the k nearest neighbors of xj. Next, the edge weights are assigned with a Gaussian function; suppose an edge eij links xi and xj; then the weight wij is defined as follows:
w_{ij} = \exp\left( -\left\| x_i - x_j \right\|^2 / \sigma^2 \right) \quad (1)
where σ is the width parameter of the Gaussian function. From Equation (1), we know that a larger weight corresponds to a closer relationship between xi and xj. After all the samples are processed to construct the graph G, we obtain the weight matrix W.
The manifold regularization is a regularizer added to a classifier to make it smooth along the graph; in this way, the classifier can also handle the unlabeled data samples. Now, suppose the classifier is f, f: 𝒳 → ℝ. We define the manifold regularization as follows:
\|f\|_I^2 = \frac{1}{2}\sum_{i,j=1}^{n} w_{ij}\left( f(x_i) - f(x_j) \right)^2 = \mathbf{f}^T L \mathbf{f} \quad (2)
where L = D − W is the graph Laplacian, D is a diagonal matrix with elements d_ii = Σ_j w_ij, and f = [f(x_1), …, f(x_n)]^T. Equation (2) forces the classifier f to preserve the local relationships of the samples; thus, it complies with the manifold assumption in SSL.
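To make the construction concrete, the following minimal sketch (our own illustration, not the authors' code) builds the kNN graph, assigns Gaussian weights as in Equation (1), and returns the Laplacian L = D − W; the function name and the symmetrization step are our assumptions.

```python
# A minimal sketch of the graph construction in Section 2.1: kNN adjacency,
# Gaussian edge weights as in Eq. (1), and the graph Laplacian L = D - W
# used by the regularizer in Eq. (2).
import numpy as np

def build_graph_laplacian(X, k=7, sigma=1.0):
    """X: (n, d) array containing all labeled and unlabeled samples."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                     # exclude self-loops

    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]                 # k nearest neighbors of x_i
        W[i, nbrs] = np.exp(-d2[i, nbrs] / sigma ** 2)   # Eq. (1)
    W = np.maximum(W, W.T)                           # symmetrize the adjacency

    D = np.diag(W.sum(axis=1))                       # degree matrix, d_ii = sum_j w_ij
    return D - W                                     # graph Laplacian L
```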

2.2. P-Norm Multiple Kernel Classifier

In kernel learning methods, a mapping ϕ(x) is used to map the input space 𝒳 to a feature space ℋ, where ℋ is the reproducing kernel Hilbert space (RKHS) associated with a kernel function k(x, x′) = ⟨ϕ(x), ϕ(x′)⟩. The operator ⟨·,·⟩ is the inner product in ℋ. The RKHS has the following important properties:
f(x) = \left\langle f(\cdot), k(x, \cdot) \right\rangle, \qquad k(x, x') = \left\langle k(x, \cdot), k(x', \cdot) \right\rangle \quad (3)
In MKL, suppose there exist M mappings ϕ1,…, ϕM corresponding to M RKHSs ℋ1,…, ℋM. With each ℋm, a kernel function km is associated. A unified space is constructed as ℋ = ⊕m ℋm with weights d = {d1,…, dM}. According to [19], the space ℋ has the following kernel:
k(x, x') = \sum_{m=1}^{M} d_m k_m(x, x') \quad (4)
To avoid over-fitting, the weights d need to be constrained. The most frequently used constraint is the 1-norm:
\sum_{m=1}^{M} d_m = 1, \qquad d_m \geq 0 \quad (5)
As previous research [20] shows, the 1-norm often leads to sparsity of d, while Ref. [21] pointed out that sparsity can discard information, especially when the correlations among the different kernels K = {k1,…, kM} are low or the data sources are complementary. Therefore, p-norm MKL adopts the following constraint on d:
\|d\|_p^2 \leq 1, \qquad d_m \geq 0, \ \forall m \quad (6)
When p > 1, the solution is not sparse, which preserves all the information brought in by the multiple kernels.
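As an illustration of Equations (4) and (6), the short helper below (hypothetical, not from the paper) combines M base Gram matrices with non-negative weights rescaled so that the p-norm constraint is satisfied.

```python
# Hypothetical helper illustrating Eqs. (4) and (6): combine base Gram
# matrices with non-negative weights that satisfy the p-norm constraint.
import numpy as np

def combine_kernels(kernel_mats, d, p=2.0):
    """kernel_mats: list of M (n, n) Gram matrices; d: length-M weight vector."""
    d = np.maximum(np.asarray(d, dtype=float), 0.0)      # enforce d_m >= 0
    norm = np.sum(d ** p) ** (1.0 / p)
    if norm > 1.0:                                       # rescale so ||d||_p <= 1
        d = d / norm
    return sum(w * K for w, K in zip(d, kernel_mats))    # Eq. (4)
```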
In each ℋm, we define the classifier as follows:
f_m = \left\langle \omega_m, \phi_m(x) \right\rangle \quad (7)
It is natural to take the sum of the classifiers in ℋ:
f = \sum_{m=1}^{M} f_m, \qquad f = \left\langle \omega, \phi(x) \right\rangle + b \quad (8)
where b is an additional bias of the final f. Following the works of Bach and Kloft, we have the following p-norm multiple kernel classifier:
F = \min_{d_m, \omega_m, b, t} \ \frac{1}{2}\sum_{m=1}^{M} \frac{\|\omega_m\|_2^2}{d_m} + C\sum_{i=1}^{\ell} v\left(t_i, y_i\right)
\text{s.t.} \ \sum_{m=1}^{M} \left\langle \omega_m, \phi_m(x_i) \right\rangle + b = t_i, \ i = 1, \ldots, \ell; \qquad \|d\|_p^2 \leq 1, \ d_m \geq 0 \quad (9)
where v is a convex loss function. Duality analysis is usually used to obtain the solution of problem (9). In this study, we focus on extending (9) to semi-supervised classification through manifold regularization; the detailed proposal and theoretical analysis are given in the following sections.

3. Proposed Classifier

3.1. Manifold Regularized P-Norm Multiple Kernel Classifier

Section 2.1 gives us a method to construct the manifold regularizer; therefore, in the space ℋ, we have the following additional manifold constraint:
\|f\|_I^2 = \mathbf{f}^T L \mathbf{f}, \qquad \mathbf{f} = \sum_{m=1}^{M} \left\langle \omega_m, \phi_m(\mathbf{x}) \right\rangle \quad (10)
and
\phi_m(\mathbf{x}) = \left[ \phi_m(x_1), \ldots, \phi_m(x_n) \right] \quad (11)
Now, for the labeled and unlabeled data samples {xi}, i = 1,…, n, we propose the following manifold regularized non-sparse multiple kernel classifier:
F = \min_{d_m, \omega_m, b, t} \ C\sum_{i=1}^{\ell} v\left(t_i, y_i\right) + \frac{r_A}{2}\sum_{m=1}^{M} \frac{\|\omega_m\|_2^2}{d_m} + \frac{r_I}{2}\mathbf{f}^T L \mathbf{f}
\text{s.t.} \ \sum_{m=1}^{M} \left\langle \omega_m, \phi_m(x_i) \right\rangle + b = t_i, \ i = 1, \ldots, \ell; \qquad \|d\|_p^2 \leq 1, \ d_m \geq 0 \quad (12)
where rA and rI are regularization parameters, and L is the Laplacian matrix computed from all the samples. The optimal f* is the solution of problem (12); once it is obtained, the term f^T L f becomes a constant. We specify the loss function v to be the hinge loss, that is, v(t, y) = max(0, 1 − ty). With this setting, we state two theorems and provide their proofs.
Theorem 1. 
The optimal f * can be formed in the following:
f^* = \sum_{m=1}^{M} \left\langle \omega_m^*, \phi_m \right\rangle = \left( r_A I + r_I L K \right)^{-1} K J^T Y \alpha \quad (13)
where
K = \sum_{m=1}^{M} d_m K_m, \qquad K_{m, ij} = k_m\left( x_i, x_j \right) \quad (14)
and α ≥ 0, I is the identity matrix, Y is a diagonal matrix with elements Yii = yi, and J = [I 0] with size ℓ × (ℓ + u).
Theorem 2. 
The update of weight d is computed as follows:
d_m = \frac{\left\| \omega_m \right\|_2^{2/(p+1)}}{\left( \sum_{m'=1}^{M} \left\| \omega_{m'} \right\|_2^{2p/(p+1)} \right)^{1/p}}, \quad \forall m \quad (15)
With these two theorems, we can compute the optimal classifier f* and use it to infer the labels over the whole input space. The proofs are provided in the next subsection.
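The two closed-form expressions can be implemented directly; the sketch below (our own, with hypothetical function names) evaluates Theorem 1 given α, Y, J, L and the combined kernel K, and Theorem 2 given the current block norms ‖ω_m‖₂.

```python
# Sketch of the closed-form updates in Theorems 1 and 2; the function
# names are ours, and alpha, Y, J, L, K follow the notation of the text.
import numpy as np

def optimal_f(K, L, J, Y, alpha, r_A, r_I):
    """Theorem 1: f* = (r_A I + r_I L K)^{-1} K J^T Y alpha."""
    n = K.shape[0]
    A = r_A * np.eye(n) + r_I * L @ K
    return np.linalg.solve(A, K @ J.T @ Y @ alpha)

def update_weights(omega_norms, p):
    """Theorem 2: d_m = ||w_m||^{2/(p+1)} / (sum_m ||w_m||^{2p/(p+1)})^{1/p}."""
    w = np.asarray(omega_norms, dtype=float)
    numer = w ** (2.0 / (p + 1.0))
    denom = np.sum(w ** (2.0 * p / (p + 1.0))) ** (1.0 / p)
    return numer / denom
```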

3.2. Proofs of the Theorems

As we have specified the loss function to be the hinge loss function, we can establish the following Lagrangian formula of (12):
\mathcal{L} = \frac{r_A}{2}\sum_{m=1}^{M} \frac{\|\omega_m\|_2^2}{d_m} + \frac{r_I}{2}\mathbf{f}^T L \mathbf{f} - \sum_{i=1}^{\ell} \delta_i \xi_i + \frac{\lambda}{2}\left( \|d\|_p^2 - 1 \right) + \sum_{i=1}^{\ell} \alpha_i \left( 1 - \xi_i - y_i \left( \sum_{m=1}^{M} \left\langle \omega_m, \phi_m(x_i) \right\rangle + b \right) \right) + C\sum_{i=1}^{\ell} \xi_i \quad (16)
where α, δ, and λ are Lagrange multipliers. Here, we first assume that the non-negativity constraint on d is always satisfied.
Proof of Theorem 1. 
Set the derivative of (16) with respect to ωm to zero:
\frac{\partial \mathcal{L}}{\partial \omega_m} = r_A \frac{\omega_m}{d_m} + r_I \left\langle \omega_m, \phi_m \right\rangle L K_m - Y \phi_m^{\ell} \alpha = 0 \quad (17)
where ϕ_m^ℓ = [ϕ_m(x_1), …, ϕ_m(x_ℓ)]. We multiply (17) by ϕ_m^n, and by using K_m = ⟨ϕ_m, ϕ_m^T⟩, the following formula is obtained:
r_A \left\langle \omega_m, \phi_m \right\rangle + r_I \left\langle \omega_m, \phi_m \right\rangle L\, d_m K_m - d_m K_m J^T Y \alpha = 0 \quad (18)
By using f_m = ⟨ω_m, ϕ_m⟩ and summing over m from 1 to M, we obtain
r_A \mathbf{f} + r_I L K \mathbf{f} - K J^T Y \alpha = 0 \quad (19)
Solving Equation (19) yields the theoretically optimal f, which is exactly the conclusion of Theorem 1. ☐
Proof of Theorem 2. 
Set the derivative of (16) with respect to dm to zero:
-\frac{r_A}{2} \frac{\|\omega_m\|_2^2}{d_m^2} + \lambda\, d_m^{\,p-1} \|d\|_p^{\,2-p} = 0 \quad (20)
The constraint ‖d‖_p ≤ 1 is active at its upper bound, that is, ‖d‖_p = 1 holds for an optimal d. Hence, Equation (20) translates into the following condition:
d_m = \delta \left\| \omega_m \right\|_2^{2/(p+1)}, \quad \forall m \quad (21)
where δ is a normalization constant that does not depend on m. By enforcing ‖d‖_p = 1, we obtain that δ has the form:
\delta = \left( \sum_{m=1}^{M} \left\| \omega_m \right\|_2^{2p/(p+1)} \right)^{-1/p} \quad (22)
Inserting the above equation into (21), we obtain the conclusion of Theorem 2. ☐
We next summarize the procedure of the algorithm for the proposed manifold regularized p-norm multiple kernel classifier.

3.3. Procedure of the Algorithm

We set the derivatives of (16) with respect to all the decision variables to zero and substitute them into (12). Before that, we introduce the representer theorem [22], which is used in this mathematical transformation: any bounded f in an RKHS that minimizes the regularized risk functional admits a representation of the form
f = \sum_{i} \beta_i\, k\left( \cdot, x_i \right) \quad (23)
By converting the regularizer on ‖ωm‖ into one on ‖fm‖ and applying the representer theorem, we obtain the dual problem of (12):
\mathcal{L} = \max_{\alpha} \ \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \left\| \left( \alpha^T Y J K_m Q J^T Y \alpha \right)_{m=1}^{M} \right\|_{p^*}
\text{s.t.} \ \sum_{i=1}^{\ell} \alpha_i y_i = 0, \quad 0 \leq \alpha \leq C\mathbf{1}, \quad \beta = \left( r_A I + r_I L K \right)^{-1} J^T Y \alpha \quad (24)
where
Q = \left( r_A I + r_I L K \right)^{-1} \quad \text{and} \quad \frac{1}{p} + \frac{1}{p^*} = 1 \quad (25)
Thus, Algorithm 1 is summarized as follows:
Algorithm 1. Algorithm of the proposed model
  Step 1. Set M kernel functions {k1,…, kM}, initialize d with each dm = 1/M, the kernel matrix K = ∑dmKm, the norm parameter p, and the other parameters C, rA, rI;
Step 2. Given labeled data {xi, yi}, i = 1,…, ℓ, and unlabeled data {xi}, i = 1,…, u, use all the data samples to construct the adjacency graph G; use the kNN method and the Gaussian function to determine the edges of G and the weights on the edges, i.e., W and the graph Laplacian L = D − W.
Step 3. Repeat:
 Obtain the optimal α from (24) with an SVM solver.
 Obtain the current optimal d according to Theorem 2, denoted as dt.
 Re-compute the kernel matrix K = ∑dmKm with the current dt.
 Update the problem formulation of (24).
 Until ‖dt+1 − dt‖ ≤ ε;
Step 4. Output the final optimal classifier f * according to Theorem 1.
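A condensed sketch of the alternating scheme is given below. It is our own reading of Algorithm 1, not the authors' implementation: with d fixed, the reduced problem is solved as an SVM on the deformed labeled Gram matrix G = J K (r_A I + r_I L K)^{-1} J^T suggested by the dual (24); ‖ω_m‖₂² is recovered through the standard MKL identity ‖ω_m‖₂² = d_m² β^T K_m β, which the paper does not state explicitly and should be treated as our assumption; and scikit-learn's SVC with a precomputed kernel stands in for the SVM solver of [30].

```python
# Condensed sketch of Algorithm 1 under the stated assumptions: alternate
# between an SVM solve on the deformed labeled Gram matrix and the weight
# update of Theorem 2 until d converges.
import numpy as np
from sklearn.svm import SVC

def train_mr_pnorm_mkl(kernel_mats, L, y_labeled, p=2.0, C=100.0,
                       r_A=1e-3, r_I=1e-3, tol=1e-4, max_iter=50):
    # labeled samples are assumed to occupy the first l rows/columns
    M, n = len(kernel_mats), kernel_mats[0].shape[0]
    l = len(y_labeled)
    J = np.hstack([np.eye(l), np.zeros((l, n - l))])      # J = [I 0]
    Y = np.diag(np.asarray(y_labeled, dtype=float))
    d = np.full(M, 1.0 / M)                               # Step 1: d_m = 1/M

    for _ in range(max_iter):                             # Step 3: repeat
        K = sum(w * Km for w, Km in zip(d, kernel_mats))
        Q = np.linalg.inv(r_A * np.eye(n) + r_I * L @ K)
        G = J @ K @ Q @ J.T                               # deformed labeled Gram
        G = 0.5 * (G + G.T)                               # numerical symmetrization

        svc = SVC(C=C, kernel="precomputed").fit(G, y_labeled)
        alpha = np.zeros(l)
        alpha[svc.support_] = np.abs(svc.dual_coef_.ravel())

        beta = Q @ J.T @ Y @ alpha                        # expansion coefficients, cf. (24)
        omega_norms = np.array([d[m] * np.sqrt(max(beta @ Km @ beta, 0.0))
                                for m, Km in enumerate(kernel_mats)])
        d_new = (omega_norms ** (2.0 / (p + 1.0))         # Theorem 2
                 / np.sum(omega_norms ** (2.0 * p / (p + 1.0))) ** (1.0 / p))
        converged = np.linalg.norm(d_new - d) <= tol      # stopping rule
        d = d_new
        if converged:
            break

    f_star = Q @ (K @ (J.T @ (Y @ alpha)))                # Step 4: Theorem 1
    return d, alpha, f_star
```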

3.4. Risk Bound Analysis

The optimal classifier f lives in the space ℋ. We know that in MKL, f = ∑m⟨ωm, ϕm⟩; therefore, the class of functions of p-norm multiple kernels can be described as
H = \left\{ f : x \mapsto \left\langle \omega, \phi(x) \right\rangle \ \middle|\ \omega = \left( \omega_1, \ldots, \omega_M \right), \ \|\omega\|_{2, p} \leq D \right\} \quad (26)
In MKL, global Rademacher complexity (GRC) is commonly used [23]. For data samples {xi}, i = 1,…, n, drawn from the same distribution, the empirical GRC is defined as
R(H) = \mathbb{E}_{\sigma} \left[ \sup_{f \in H} \left\langle \omega, \frac{1}{n} \sum_{i} \sigma_i\, \phi(x_i) \right\rangle \right] \quad (27)
where {σi}, i = 1,…, n, is an i.i.d. family of Rademacher variables taking the values ±1 with equal probability. In this paper, we can express (27) as
R_n(H) = \frac{1}{n} \mathbb{E}_{\sigma} \left[ \sup_{f \in H} \sum_{i=1}^{n} \sigma_i f(x_i) \right] \quad (28)
From (23), we can obtain
R_n(H) = \frac{1}{n} \mathbb{E}_{\sigma} \left[ \sup_{d, \beta} \ \sigma^T K \beta \right] \quad (29)
According to the conclusion in [24], the following bound holds:
P\left( y f(x) \leq 0 \right) \leq P_n\left( y f(x) \leq 0 \right) + \frac{R_n(H)}{2} + \sqrt{\frac{\ln(1/\delta)}{2n}} \quad (30)
where Pn denotes the empirical average of the corresponding indicator function.
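To give a feel for Equation (28), the following Monte Carlo sketch estimates the empirical complexity of the simpler kernel ball {x ↦ ⟨ω, ϕ(x)⟩ : ‖ω‖ ≤ D}, for which the supremum has the closed form D·sqrt(σᵀKσ); the class H in (26) uses a block (2, p)-norm constraint, so this is only an illustrative surrogate, not the exact quantity analyzed in [23,24].

```python
# Illustrative Monte Carlo estimate of the empirical Rademacher complexity
# in Eq. (28) for the kernel ball {x -> <w, phi(x)> : ||w|| <= D}, where
# sup_f sum_i sigma_i f(x_i) = D * sqrt(sigma^T K sigma). This is a
# surrogate for the block-norm class H of Eq. (26), not the exact quantity.
import numpy as np

def empirical_rademacher(K, D=1.0, n_draws=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))        # Rademacher signs
    quad = np.einsum("ti,ij,tj->t", sigma, K, sigma)          # sigma^T K sigma per draw
    return D * np.sqrt(np.maximum(quad, 0.0)).mean() / n
```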

4. Experiments

The experimental datasets come in two parts: several UCI datasets [25] and benchmark datasets [26]. We take the following methods for comparison:
  • LapSVM [10]. An SVM with manifold regularization, using a Gaussian kernel with parameter σ in {0.1, 0.3, 0.5}; the actual parameter was selected by five-fold cross-validation;
  • Transductive SVM (TSVM) [27]. This model lets the classification hyperplane traverse the low-density data region. It adopts the radial basis function kernel k(a, b) = exp(−r‖a − b‖²) with r = 0.1;
  • Low-density separation (LDS) [28]. It forms a kernel function from the metrics of a graph; its kernel function is the same as in TSVM;
  • Harmonic function [29]. This method is essentially a labeling scheme carried out on a graph while preserving the similarities of all the data samples; the resulting classifier has a harmonic property;
  • Support vector machine (SVM). It is implemented with LIBSVM [30] and adopts the radial basis function kernel, whose parameter is set to the mean distance over all the data samples. Only the labeled data are used to train the SVM model;
  • Graph-structured MKL (GMKL) [31]. This MKL model prunes the kernel combination using feedback collected from a graph, actively choosing relevant kernels; it offers time-efficiency advantages in online settings.

4.1. Experimental Settings

In practice, the hyper-parameter C = 100 is a common choice, and the regularization parameters rA and rI are both selected from [10−5,…, 10−2] by five-fold cross-validation. The neighborhood of each graph node is computed with the kNN method, and its crucial parameter k is selected from the set {6, 7, 8} by five-fold cross-validation.
As for the multiple kernels, during the tests on the UCI datasets we select two kinds of candidate kernels: 10 Gaussian kernels with 10 different parameters chosen from the set [2−4, 2−3,…, 25], and three polynomial kernels k(x, x′) = (1 + x·x′)^d with parameter d chosen from the set {1, 2, 3}. During the tests on the benchmark datasets, there are also two kinds of candidate kernels: Gaussian kernels with parameter σ chosen from the set {0.01, 0.05, 0.1, 0.5, 1, 2, 5} and polynomial kernels with parameters d = 1, 2, 3. All kernel matrices are pre-processed to have unit trace. The SVM solver follows [30]. For the norm parameter, we select p = 1, 4/3, 2, 3, 4, 8, 10.
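The candidate kernel pool described above can be generated as in the sketch below (our own helper; the exact Gaussian parametrization, i.e., whether the grid values act as widths or bandwidths, is an assumption), including the unit-trace normalization applied to every Gram matrix.

```python
# Sketch of the candidate kernel pool: Gaussian kernels over a grid of
# width parameters, polynomial kernels k(x, x') = (1 + x.x')^deg for
# deg in {1, 2, 3}, and unit-trace normalization of every Gram matrix.
import numpy as np

def candidate_kernels(X, widths=tuple(2.0 ** k for k in range(-4, 6)),
                      degrees=(1, 2, 3)):
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T            # squared distances
    lin = X @ X.T
    mats = [np.exp(-d2 / w ** 2) for w in widths]             # Gaussian kernels
    mats += [(1.0 + lin) ** deg for deg in degrees]           # polynomial kernels
    return [K / np.trace(K) for K in mats]                    # unit-trace scaling
```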

4.2. UCI Datasets

From the UCI data repository, we take eight datasets. For semi-supervised classification, each dataset was sampled 10 times: in each run, 100 samples were randomly chosen as labeled and the rest were marked as unlabeled. This sampling process yields 10 groups of experimental data, each with 100 labeled samples and the remainder unlabeled. The 13 candidate kernel functions are computed on all variables and on each single variable. Average accuracy rates are computed on the unlabeled samples, and the detailed results are shown in Table 1.
From Table 1, the proposed manifold regularized p-norm MKL classifier is efficient and competitive: it achieves the best result on six of the eight datasets. On the german, liver, wdbc and vehicle datasets, the non-sparse kernel combinations (p = 4/3, 2, 3, 8) likely preserved complementary geometric information captured by the graph Laplacian. Conversely, sparse MKL underperformed, suggesting that discarding kernel contributions degrades performance when manifold structures are critical.

4.3. Benchmark Datasets

The benchmark datasets used are named digit1, g241c, g241n, COIL, BCI and USPS. These datasets were built to judge the effectiveness and efficiency of algorithms in a way that is as neutral as possible. There are 12 groups, each with 100 labeled samples and the rest unlabeled. The experiments were conducted on each dataset, and the average accuracy rates on the unlabeled samples are reported in Table 2.
The thirteen candidate kernel functions are computed on all variables. As the numbers of samples are larger than those of the eight UCI datasets, this experiment validates the applicability of the proposed model to medium and large datasets. The detailed results are in Table 2.
From Table 2, we see that the top accuracy rates mostly appear in the proposed classifier, which achieves the best result in five cases. We should also point out that the benchmark datasets were preprocessed with added noise by the original authors; this noise affects the geometric structure of the data and degrades the efficiency of manifold regularization to some extent. The results indicate that moderate non-sparsity balances kernel diversity and regularization, mitigating the noise effects, and they also motivate further research on an anti-noise robust variant of the proposed classifier. In addition, the variability of the optimal p-value across datasets underscores the importance of tuning p according to the data characteristics. During the experiments, we observed a trend that higher p-values may stabilize performance in noisy settings. These observations align with the theoretical advantage of p-norm flexibility in adapting to diverse data distributions.

4.4. Impact of P-Norm Choice

The choice of p-norm significantly influences the experiments by controlling the sparsity of kernel weight allocation. Smaller p-values enforce sparsity, selecting only a subset of kernels, while larger p-values distribute weights more evenly across kernels. For instance, on the german dataset (Table 1), p = 8 achieves 74.3% accuracy, outperforming p = 1 (72.0%), suggesting that non-sparse combinations better leverage complementary kernel information. Conversely, in the heart dataset, p = 1 slightly outperforms higher p-values, indicating that sparse selection may suffice when a few kernels dominate.
The trade-off between sparsity and diversity is evident on noisy benchmarks such as the g241c dataset (Table 2), where p = 3 or 4 achieves the highest accuracy (79.8%), outperforming both the sparse and the highly non-sparse configurations. This implies that moderate p-values balance noise resilience and kernel utilization. Overall, the experiments validate that p-norm adaptation is critical: low p-values risk discarding useful kernels, while excessively high p-values may dilute discriminative power.

5. Conclusions

In this study, we proposed a manifold regularized p-norm multiple kernel classifier for the semi-supervised classification scenario. The proposed model adopts a convex framework, i.e., manifold regularization, to extend the p-norm multiple kernel classifier to the semi-supervised setting. By integrating the non-sparse p-norm multiple kernel learning framework with graph-based manifold regularization, our method bridges the gap between kernel-based supervised learning and semi-supervised paradigms. The convex formulation of manifold regularization leverages both labeled and unlabeled data through a spectral graph embedding, enforcing smoothness of the classifier over the intrinsic data geometry. In short, we combined the flexibility and capability of multiple kernels with a graph-based regularization term, grounded in graph spectral theory, that exploits the local relationships of all the data samples. Theoretical guarantees were established via local Rademacher complexity analysis, demonstrating robust generalization bounds under semi-supervised conditions. From the experiments, we observed that the proposed model generalizes well and can deal with many kinds of datasets.
Our study and tests also reveal some limitations of the proposed method that suggest future directions. The graph Laplacian construction incurs a relatively high computational complexity, limiting scalability to ultra-large datasets; approximation techniques such as sampling could mitigate this limitation. The best performance relies on tuning the p-norm exponent, the kernel combination weights and the regularization coefficients, and automated hyperparameter optimization remains an open challenge for semi-supervised MKL. Future extensions could incorporate stochastic graph approximations for scalability, develop adaptive p-norm scheduling strategies or integrate contrastive learning principles to enhance low-label robustness.

Funding

This research was funded by National Key R&D Program of China, grant number 2024YFF0728900.

Data Availability Statement

The data presented in this study are available in the UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, 2013, reference number 25. These data were derived from the following resource available in the public domain: [https://archive.ics.uci.edu/ accessed on 14 February 2025].

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Mahdi, A.A.; Omar, C.C. Learning with Multiple Kernels. IEEE Access 2024, 12, 56973–56980. [Google Scholar]
  2. Yan, W.; Li, Y.; Yang, M. Towards deeper match for multi-view oriented multiple kernel learning. Pattern Recognit. 2023, 134, 109119. [Google Scholar]
  3. Yuan, L.; Mei, W. Multiple Kernel Learning for Learner Classification. In Proceedings of the International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 22–24 December 2023; pp. 113–118. [Google Scholar]
  4. Rakotomamonjy, A.; Bach, F.; Canu, S.; Grandvalet, Y. SimpleMKL. J. Mach. Learn. Res. 2008, 9, 2491–2521. [Google Scholar]
  5. Liu, X.; Wang, L.; Huang, G.B. Multiple kernel extreme learning machine. Neurocomputing 2015, 149, 253–264. [Google Scholar]
  6. Aiolli, F.; Donini, M. EasyMKL: A scalable multiple kernel learning algorithm. Neurocomputing 2015, 169, 215–224. [Google Scholar]
  7. Rebai, I.; Benayed, Y.; Mahdi, W. Deep multilayer multiple kernel learning. Neural Comput. Appl. 2016, 27, 2305–2314. [Google Scholar]
  8. Zhao, S.; Ding, Y.; Liu, X.; Su, X. HKAM-MKM: A hybrid kernel alignment maximization-based multiple kernel model for identifying DNA-binding proteins. Comput. Biol. Med. 2022, 145, 105395. [Google Scholar]
  9. Rastogi, A. Nonlinear Tikhonov regularization in Hilbert scales for inverse learning. J. Complex. 2024, 82, 101824. [Google Scholar]
  10. Belkin, M.; Niyogi, P.; Sindhwani, V. Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. J. Mach. Learn. Res. 2006, 7, 2399–2434. [Google Scholar]
  11. Liang, X.; Yu, Q.; Zhang, K.; Zeng, P.; Jian, L. LapRamp: A noise resistant classification algorithm based on manifold regularization. Appl. Intell. 2023, 53, 23797–23811. [Google Scholar]
  12. Xing, Z.; Peng, J.; He, X.; Tian, M. Semi-supervised sparse subspace clustering with manifold regularization. Appl. Intell. 2024, 54, 6836–6845. [Google Scholar]
  13. He, D.; Sun, S.; Xie, L. Multi-target feature selection with subspace learning and manifold regularization. Neurocomputing 2024, 582, 127533. [Google Scholar]
  14. Ma, F.; Huo, S.; Liu, S.; Yang, F. Multimode Low-Rank Relaxation and Manifold Regularization for Hyperspectral Image Super-Resolution. IEEE Trans. Instrum. Meas. 2024, 73, 5019614. [Google Scholar]
  15. Niu, G.; Ma, Z.; Lv, S. Ensemble Multiple-Kernel Based Manifold Regularization. Neural Process. Lett. 2017, 45, 539–552. [Google Scholar]
  16. Gu, Y.; Wang, Q.; Xie, B. Multiple Kernel Sparse Representation for Airborne LiDAR Data Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 1085–1105. [Google Scholar]
  17. Qi, J.; Liang, X.; Xu, R. A Multiple Kernel Learning Model Based on p-Norm. Comput. Intell. Neurosci. 2018, 2018, 1018789. [Google Scholar]
  18. Shervin, R.A. One-Class Classification Using p-Norm Multiple Kernel Fisher Null Approach. IEEE Trans. Image Process. 2023, 32, 1843–1856. [Google Scholar]
  19. Liu, X.; Wang, L.; Zhu, X.; Li, M.; Zhu, E.; Liu, T.; Liu, L.; Dou, Y.; Yin, J. Absent Multiple Kernel Learning Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 1303–1316. [Google Scholar]
  20. Liu, Z.; Huang, S.; Jin, W.; Mu, Y. Local kernels based graph learning for multiple kernel clustering. Pattern Recognit. 2024, 150, 110300. [Google Scholar]
  21. Gao, Y.L.; Yin, M.M.; Liu, J.X.; Shang, J.; Zheng, C.H. MKL-LP: Predicting Disease-Associated Microbes with Multiple-Similarity Kernel Learning-Based Label Propagation. In Proceedings of the International Symposium on Bioinformatics Research and Applications, Shenzhen, China, 26–28 November 2021; Springer: Cham, Switzerland, 2021; pp. 3–10. [Google Scholar]
  22. Wang, R.; Xu, Y.; Yan, M. Sparse Representer Theorems for Learning in Reproducing Kernel Banach Spaces. J. Mach. Learn. Res. 2024, 25, 1–45. [Google Scholar]
  23. Kloft, M.; Blanchard, G. The local rademacher complexity of lp-norm multiple kernel learning. Adv. Neural Inf. Process. Syst. 2011, 12, 2465–2502. [Google Scholar]
  24. Wang, Z.; Chen, D.; Che, X. Multi-kernel learning for multi-label classification with local Rademacher complexity. Inf. Sci. 2023, 647, 119462. [Google Scholar]
  25. Bache, K.; Lichman, M. UCI Machine Learning Repository; School of Information and Computer Science, University of California: Irvine, CA, USA, 2013. [Google Scholar]
  26. Chapelle, O.; Schölkopf, B.; Zien, A. Semi-Supervised Learning; MIT: Cambridge, MA, USA, 2006. [Google Scholar]
  27. Michalis, P.; Andreas, A. Least Squares Minimum Class Variance Support Vector Machines. Computing 2024, 13, 34. [Google Scholar]
  28. Vasilii, F.; Malik, T.; Aladin, V. Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 10008–10033. [Google Scholar]
  29. Celso, A.; Sousa, R. Kernelized Constrained Gaussian Fields and Harmonic Functions for Semi-supervised Learning. In Proceedings of the International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  30. Tao, Z.; Li, Y.; Teng, Z.; Zhao, Y. A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD. Comput. Math. Methods Med. 2020, 2020, 8926750. [Google Scholar]
  31. Ghari, P.M.; Shen, Y. Online multi-kernel learning with graph-structured feedback. In Proceedings of the International Conference on Machine Learning, Virtual, 12–18 July 2020; pp. 3474–3483. [Google Scholar]
Table 1. Average accuracy rates (%) on the eight UCI datasets with 100 labeled samples.
Method     Heart   German  Iono    Vote    Pima    Liver   Wdbc    Vehicle
p = 1      82.1    72.0    90.5    94.9    74.8    65.3    94.0    85.4
p = 4/3    78.6    72.0    90.1    91.2    74.3    66.5    91.7    90.9
p = 2      80.8    71.7    92.0    92.4    74.2    66.4    94.9    90.2
p = 3      81.3    72.3    92.4    93.1    74.7    68.8    94.2    90.1
p = 4      81.7    73.7    91.8    92.4    75.0    66.1    92.1    87.8
p = 8      81.8    74.3    92.8    92.8    75.5    67.5    94.1    89.1
p = 10     82.0    72.9    91.2    91.2    76.1    66.2    94.2    89.0
LapSVM     81.8    71.8    84.6    91.3    80.0    68.6    85.2    82.2
TSVM       81.3    68.7    82.9    93.1    73.8    65.3    92.4    83.6
harmonic   78.8    68.1    85.4    92.1    71.5    63.9    93.9    83.2
LDS        81.4    66.3    89.1    90.1    70.0    64.7    94.4    81.7
SVM        81.4    71.2    93.0    92.6    73.1    64.7    91.7    81.2
GMKL       81.9    73.8    92.9    93.7    76.6    67.5    94.3    90.1
Table 2. Average accuracy rates (%) on the benchmark datasets with 100 labeled samples.
Method     Digit1  USPS    COIL    g241c   g241n   BCI
p = 1      94.9    91.8    88.1    77.8    74.1    60.4
p = 4/3    93.5    87.9    85.9    79.6    74.4    60.0
p = 2      94.4    86.1    85.0    79.7    74.1    58.7
p = 3      94.6    85.8    85.1    79.8    74.4    60.3
p = 4      94.3    85.7    85.1    79.8    74.5    60.3
p = 8      94.4    85.4    85.2    78.6    74.4    62.6
p = 10     94.5    85.5    85.1    77.5    74.3    60.7
LapSVM     94.8    82.1    86.2    55.4    57.6    60.7
TSVM       92.3    86.5    80.5    79.1    74.1    60.5
harmonic   92.9    91.4    82.1    57.8    58.9    53.1
LDS        93.3    75.0    68.7    71.1    75.8    58.2
SVM        92.9    85.5    79.4    74.2    72.4    56.4
GMKL       94.6    90.1    86.2    78.4    75.6    60.3