Article

An Efficient Multiple Empirical Kernel Learning Algorithm with Data Distribution Estimation

Jinbo Huang, Zhongmei Luo and Xiaoming Wang
1 School of Computer and Software Engineering, Xihua University, Chengdu 610097, China
2 School of Computer and Software Engineering, Civil Aviation Flight University of China, Deyang 618311, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(9), 1879; https://doi.org/10.3390/electronics14091879
Submission received: 25 January 2025 / Revised: 7 April 2025 / Accepted: 9 April 2025 / Published: 5 May 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

The Multiple Random Empirical Kernel Learning Machine (MREKLM) typically generates multiple empirical feature spaces by selecting a limited group of samples, which helps reduce training duration. However, MREKLM does not incorporate data distribution information during the projection process, leading to inconsistent performance and issues with reproducibility. To address this limitation, we introduce a within-class scatter matrix that leverages the distribution of samples, resulting in the development of the Fast Multiple Empirical Kernel Learning Incorporating Data Distribution Information (FMEKL-DDI). This approach enables the algorithm to incorporate sample distribution data during projection, improving the decision boundary and enhancing classification accuracy. To further minimize sample selection time, we employ a border point selection technique utilizing locality-sensitive hashing (BPLSH), which helps in efficiently picking samples for feature space development. The experimental results from various datasets demonstrate that FMEKL-DDI significantly improves classification accuracy while reducing training duration, thereby providing a more efficient approach with strong generalization performance.

1. Introduction

The ongoing progression of computer technology has rendered machine learning increasingly pivotal throughout diverse research domains, including data analysis [1,2], data mining [3,4], and pattern recognition [5,6]. Kernel methods have become a significant research focus in the field of machine learning [7]. Recently, the concepts of implicit kernel mapping (IKM) and empirical kernel mapping (EKM) have been introduced consecutively. IKM implicitly transforms samples into feature space via inner product representation; nevertheless, its dependence on inner products constrains the applicability of certain approaches and may diminish separability due to inadequate feature selection [8]. Consequently, EKM, as a direct mapping technique, facilitates the direct mapping of samples to the empirical feature space, thereby streamlining the kernelization process and circumventing intricate inner product computations.
While choosing an appropriate kernel function is essential for addressing certain issues, researchers have identified this task as exceedingly difficult. Various methods for kernel selection have been proposed to overcome this issue, including support vector machine parameter selection based on inter-class distances [9], grid search [10], and evolutionary algorithms [11], all intended to optimize kernel parameters. Nonetheless, the constraints of a singular kernel diminish its efficacy in addressing intricate issues, resulting in the development of multiple kernel learning (MKL) [12], which facilitates the optimization of kernel weights during training by amalgamating various candidate kernels, thereby improving flexibility and performance.
In the study of MKL, many algorithms have been introduced to tackle the selection of kernel weights, including approaches that formulate convex quadratic constrained quadratic programming (QCQP) [13] and semidefinite programming (SDP) [14]. Moreover, Multiple Empirical Kernel Learning (MEKL) [15] integrates the benefits of EKM and can reconstruct the empirical feature space, thereby enhancing the applications of kernel methods. Significant contributions, including collaborative and geometric multi-kernel learning (CGMKL) [16] and Multiple Partial Empirical Kernel Learning with Instance Weighting and Boundary Fitting (IBMPEKL) [17] introduced by Zhu et al., exhibit efficacy in multi-class classification challenges. On the other hand, the authors of [18] introduce SA-nODE, a novel supervised classification method that utilizes ordinary differential equations with predefined stable attractors, to guide system dynamics toward specific points corresponding to input categories.
Despite its effectiveness, MEKL suffers from high computational demands due to the need to construct multiple empirical feature spaces, limiting its scalability to large datasets. To address this, researchers have explored ways to reduce training time. For instance, Fan [19] proposed the Multiple Random Empirical Kernel Learning Machine (MREKLM), which applies random projection techniques [20] to reduce computational cost by building empirical kernels using only a randomly selected subset of the data. However, MREKLM does not consider the distribution of samples, which can lead to suboptimal feature representations and decreased accuracy. To overcome these limitations, this study introduces Fast Multiple Empirical Kernel Learning Incorporating Data Distribution Information (FMEKL-DDI), which enhances empirical kernel construction by incorporating within-class distribution data. Additionally, to further reduce the sample selection time, FMEKL-DDI employs a boundary-aware sample selection strategy using the BPLSH algorithm [21], allowing the efficient identification of informative samples near decision boundaries. Together, these improvements result in faster training and more discriminative feature spaces, making FMEKL-DDI suitable for large-scale learning tasks.
To clarify the distinctions among the related models: Multiple Empirical Kernel Learning (MEKL) builds several feature spaces from the data to improve classification, but it is computationally intensive. The Multiple Random Empirical Kernel Learning Machine (MREKLM) improves speed by randomly selecting a small set of samples to create those spaces, although it ignores the distribution of the data, which can hurt accuracy. Our proposed method, Fast Multiple Empirical Kernel Learning with Data Distribution Information (FMEKL-DDI), addresses this by selecting samples more intelligently using the BPLSH algorithm and by considering how data points are spread within each class. This allows it to strike a better balance between accuracy and efficiency.
The main contributions of this work are summarized as follows:
  • We propose FMEKL-DDI, a novel empirical kernel learning method that integrates within-class distribution information through the use of the within-class scatter matrix, enhancing the quality of the empirical feature space.
  • We introduce an efficient boundary-preserving sample selection mechanism based on the BPLSH algorithm, which effectively identifies informative training samples while reducing redundancy and computational overhead.
  • Extensive experiments on benchmark and real-world datasets demonstrate that FMEKL-DDI achieves superior accuracy and significantly lower training time compared to several state-of-the-art kernel learning approaches.
The remainder of this paper is organized as follows. Section 2 provides a review of related work, including implicit kernel mapping and the MREKLM algorithm. Section 3 presents the proposed FMEKL-DDI algorithm, detailing the incorporation of data distribution information and the construction of the classifier. Section 4 discusses the experimental setup; evaluates the performance of the proposed method through various experiments; and includes comparisons, ablation studies, and real-world dataset validations. Finally, Section 5 concludes the paper and outlines future directions for this research.

2. Related Work and Preliminaries

Kernel learning has emerged as a fundamental technique in machine learning, enabling algorithms to handle non-linear data by implicitly mapping inputs into high-dimensional feature spaces through kernel functions. Traditional methods such as Support Vector Machines and Kernel Principal Component Analysis leverage fixed kernels, but their performance is highly sensitive to kernel choice [22]. To address this, multiple kernel learning frameworks were developed, combining several base kernels to improve flexibility and adaptability to complex data structures. Over time, variants such as nonlinear MKL, data-dependent MKL, and empirical kernel learning have evolved to enhance expressiveness, efficiency, and generalization [23]. Recent advances have focused on improving computational scalability and incorporating data distribution awareness, laying the groundwork for more robust and adaptive kernel-based models [24,25].

2.1. Implicit Kernel Mapping

This section introduces implicit kernels, expressed through the inner product form of a mapping function, in contrast to the direct computational expression of empirical kernels. Implicit kernels are primarily designed for nonlinear classifiers, in contrast to linear classifiers.
Let $\{(x_i, y_i)\}_{i=1}^{N}$ denote the training dataset, where $x_i \in \mathbb{R}^D$ is the $i$-th input sample and $y_i \in \{1, \ldots, C\}$ is its corresponding class label. Let P denote the number of samples selected via the BPLSH algorithm for constructing the empirical feature space. The kernel function is denoted by $k(x_i, x_j)$, and the kernel matrix is $K \in \mathbb{R}^{P \times P}$ with entries $K_{ij} = k(x_i, x_j)$. The empirical kernel mapping is represented as $\varphi(x): \mathbb{R}^D \to \mathbb{R}^r$, where r is the reduced dimensionality after eigendecomposition. The within-class scatter matrix is defined as $S_w = \sum_{c=1}^{C} \sum_{x_i \in X_c} (x_i - \mu_c)(x_i - \mu_c)^T$, where $\mu_c$ is the mean of class c and $X_c$ is the set of samples in class c. The weight matrix $W \in \mathbb{R}^{P \times P}$ has entries $W_{ij} = 1/N_c$ if $x_i$ and $x_j$ belong to the same class c, and $W_{ij} = 0$ otherwise. The diagonal matrix D has entries $D_{ii} = \sum_{j=1}^{P} W_{ij}$. The classifier is a linear combination of the mapped kernels, $f(x) = \Gamma^T \nu^e(x)$, where $\nu^e(x) = [\varphi_1(x), \ldots, \varphi_M(x), 1]^T$ is the augmented empirical feature vector and $\Gamma$ is the corresponding weight vector learned using the least-squares method. We summarize all frequently used notations in Table 1.
Figure 1a illustrates a scenario in which two-dimensional linearly separable samples can be effectively divided using a straight line. Consider a labeled dataset { ( x i , y i ) i = 1 , , N } , where each label y i { 1 , + 1 } indicates the class of the corresponding sample x i . Within linear classification, a sample can be expressed as x i = ( X i , Y i ) , where X i and Y i represent the horizontal and vertical coordinates, respectively. A linear decision boundary defined by the equation w T x + b = 0 can separate the two classes. The associated decision function f ( x ) = w T x + b assigns a class based on the sign of f ( x ) : if f ( x ) > 0 , the sample belongs to class + 1 ; if f ( x ) < 0 , it is assigned to class 1 ; if f ( x ) = 0 , it lies on the decision boundary.
However, real-world data are often non-linearly separable, as shown in Figure 1b, making linear classifiers insufficient. To address this, kernel-based methods have been introduced, which use implicit mappings to transform the data into higher-dimensional spaces where linear separation becomes feasible. These methods leverage the inner product form of kernel functions. For instance, mapping a two-dimensional point ( X , Y ) to a three-dimensional space can be carried out as ( X , Y ) ( X 2 , 2 X Y , Y 2 ) . This transformation can be defined by a mapping function ϕ . The inner product in the transformed space can be computed as follows:
\langle \phi(X, Y), \phi(X', Y') \rangle = \langle (X^2, \sqrt{2}XY, Y^2), (X'^2, \sqrt{2}X'Y', Y'^2) \rangle = X^2 X'^2 + 2XY X'Y' + Y^2 Y'^2 = (XX' + YY')^2 = \big(\langle (X, Y), (X', Y') \rangle\big)^2 = K\big((X, Y), (X', Y')\big)        (1)
Here, K is the kernel function, which defines an inner product in the higher-dimensional space without requiring the explicit computation of mapping ϕ —hence the term “implicit kernel”. This approach enables algorithms to operate in a transformed space where non-linear relationships can be addressed with linear classifiers, significantly boosting classification performance.
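To make this concrete, the short Python/NumPy check below (our illustration, not code from the paper) verifies numerically that the implicit polynomial kernel $K(u, v) = (\langle u, v \rangle)^2$ coincides with the inner product of the explicitly mapped points $(X, Y) \mapsto (X^2, \sqrt{2}XY, Y^2)$.

import numpy as np

def explicit_map(p):
    # (X, Y) -> (X^2, sqrt(2) X Y, Y^2), the explicit three-dimensional mapping
    x, y = p
    return np.array([x * x, np.sqrt(2.0) * x * y, y * y])

def poly_kernel(u, v):
    # the implicit kernel: squared inner product computed in the original space
    return float(np.dot(u, v)) ** 2

u = np.array([1.5, -0.7])
v = np.array([0.3, 2.0])
lhs = float(np.dot(explicit_map(u), explicit_map(v)))   # inner product after mapping
rhs = poly_kernel(u, v)                                  # kernel evaluated implicitly
assert np.isclose(lhs, rhs)
print(lhs, rhs)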
Nonetheless, directly applying kernel methods to certain linear discriminant analysis techniques, such as Kernel Direct Discriminant Analysis (KDDA) [26], poses challenges. Methods like Orthogonal Linear Discriminant Analysis (OLDA) [27] and Uncorrelated Linear Discriminant Analysis (ULDA) [28] rely on singular value decomposition, which complicates their kernelization. As a result, implicit kernel approaches are often employed to overcome these limitations in extending linear models to non-linear scenarios.

2.2. Multiple Random Empirical Learning Machine (MREKLM)

Fan [19] proposed the Multiple Random Empirical Kernel Learning Machine (MREKLM), which is a classic algorithm within the domain of empirical kernels. This section utilizes MREKLM as an illustrative example to introduce empirical kernels. MREKLM constructs the empirical feature space by randomly selecting a small subset of samples from the training data, and the model is presented below.
Assume the training samples $\{(x_i, y_i) \mid i = 1, \ldots, N\}$, where $x_i \in \mathbb{R}^D$ and $y_i \in \{-1, +1\}$. Randomly select M subsets, each of size P (where $P \ll N$), denoted as $Sub_1, Sub_2, \ldots, Sub_M$, for random projection [20]. The idea is to avoid using the entire dataset to build feature spaces. Instead, smaller random subsets are used to speed up computation while still preserving meaningful structure.
For each random subset S u b m , the corresponding random empirical kernel mapping is constructed as follows:
First, compute the kernel matrix $K_m = [(\ker_m)_{i,j}]_{i,j=1}^{P}$, where the kernel function is defined as $(\ker_m)_{i,j} = \phi_m(x_{mi}) \cdot \phi_m(x_{mj})$ with $x_{mi}, x_{mj} \in Sub_m$. This matrix captures the pairwise similarity between selected samples in the subset $Sub_m$.
The kernel matrix K m is a positive semidefinite matrix, which can be decomposed using the following:
K_m = \tilde{Q}_m \tilde{\Lambda}_m \tilde{Q}_m^{T}        (2)
Here, $\tilde{\Lambda}_m$ is a $P \times P$ diagonal matrix containing the eigenvalues of $K_m$, and $\tilde{Q}_m$ is a $P \times P$ matrix whose columns are the corresponding eigenvectors. This decomposition reveals the underlying structure of the kernel space and allows dimensionality to be reduced while preserving variance.
The random empirical kernel mapping function ϕ ˜ m e is defined as follows:
\tilde{\phi}_m^{e}(x) = \tilde{\Lambda}_m^{-1/2} \tilde{Q}_m^{T} \, \big[\ker_m(x, x_{m1}), \ker_m(x, x_{m2}), \ldots, \ker_m(x, x_{mP})\big]^{T}        (3)
This mapping projects a sample x into a new empirical feature space, where its coordinates are based on similarities to the reference samples in S u b m .
It is important to note that when the rank of $K_m$ is $r_m$, it has $(P - r_m)$ zero eigenvalues. Additionally, the eigenvector matrix $\tilde{Q}_m$ satisfies the following:
\tilde{Q}_m \tilde{Q}_m^{T} = \tilde{Q}_m^{T} \tilde{Q}_m = I_{P \times P}        (4)
The eigenvalue matrix Λ ˜ m can be expressed as follows:
\tilde{\Lambda}_m = \mathrm{diag}(\lambda_{m1}, \ldots, \lambda_{m r_m}, 0, \ldots, 0) \in \mathbb{R}^{P \times P}        (5)
Here, $\lambda_{m1}, \ldots, \lambda_{m r_m}$ are the positive eigenvalues of $K_m$, and the zero entries correspond to redundant or non-informative directions. We discard the zero components to reduce noise and computation, keeping only the meaningful directions in the space.
After removing the zero values, we obtain an r m -dimensional empirical feature vector. The reduced empirical kernel is then denoted by ϕ m e , and the mapping is as follows:
\phi_m^{e}(x) = \Lambda_m^{-1/2} Q_m^{T} \, \big[\ker_m(x, x_{m1}), \ker_m(x, x_{m2}), \ldots, \ker_m(x, x_{mP})\big]^{T}        (6)
Here, $\Lambda_m \in \mathbb{R}^{r_m \times r_m}$ contains the positive eigenvalues, and $Q_m$ is a $P \times r_m$ matrix of the corresponding eigenvectors.
In essence, ϕ m e ( x ) provides a compact, meaningful representation of x in the empirical feature space built from subset S u b m .
After constructing the M empirical kernels, all training samples are mapped into each of the M empirical feature spaces. The final transformed dataset is represented as $\{\phi_1^{e}(x_i), \ldots, \phi_M^{e}(x_i)\}_{i=1}^{N}$. This gives us M different views of the data, each built from a random subset, which collectively capture diverse structural aspects of the dataset. These mapped features can then be used for efficient and robust classification.
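As an illustration of Equations (2) through (6), the following Python/NumPy sketch (ours; the paper's experiments were implemented in MATLAB) builds one random empirical kernel mapping: it draws P reference samples at random, eigendecomposes their Gaussian kernel matrix, discards near-zero eigenvalues, and projects arbitrary samples into the resulting empirical feature space. The function names and the fixed bandwidth are illustrative choices.

import numpy as np

def gaussian_kernel(A, B, sigma2):
    # pairwise Gaussian kernel values between the rows of A and the rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma2))

def random_empirical_kernel(X, P, sigma2, rng):
    idx = rng.choice(len(X), size=P, replace=False)      # random subset Sub_m
    ref = X[idx]
    K = gaussian_kernel(ref, ref, sigma2)                 # P x P kernel matrix K_m
    eigvals, eigvecs = np.linalg.eigh(K)                  # K_m is symmetric PSD
    keep = eigvals > 1e-10                                # drop zero-eigenvalue directions
    Lam, Q = eigvals[keep], eigvecs[:, keep]
    def phi(Xnew):
        Kx = gaussian_kernel(Xnew, ref, sigma2)           # similarities to the P references
        return Kx @ Q / np.sqrt(Lam)                      # Lambda^{-1/2} Q^T k(x, .) per row
    return phi

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
phi = random_empirical_kernel(X, P=20, sigma2=1.0, rng=rng)
Z = phi(X)                                                # all samples mapped into the space
print(Z.shape)                                            # (200, r_m)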

3. Fast Multiple Empirical Kernel Learning Incorporating Data Distribution Information (FMEKL-DDI)

3.1. Model

MREKLM shows that creating an empirical feature space with a limited sample subset can significantly decrease training duration while maintaining training accuracy. Nonetheless, it exhibits considerable unpredictability and neglects sample distribution information. This section introduces a technique called Fast Multiple Empirical Kernel Learning Incorporating Data Distribution Information (FMEKL-DDI) to tackle this issue. FMEKL-DDI incorporates the within-class scatter matrix to assimilate sample distribution information. In contrast to the random projection method employed by MREKLM, FMEKL-DDI utilizes a locality-sensitive hashing algorithm (BPLSH) to select a subset of samples for the construction of the empirical feature space. We summarize the general pipeline of our proposal in Figure 2 and Algorithm 1.
BPLSH selects a sample as the reference instance and computes the similarity index of the remaining samples [21]. If the neighboring samples of the selected instance belong to the same class and exhibit a high similarity index, those neighboring samples are discarded. If the neighboring samples belong to different classes but exhibit a high similarity index and are proximate in spatial distance, the neighboring samples are retained. If the neighboring samples of the selected instance are from different classes, exhibit a high similarity index, and are widely spaced, then the neighboring samples of the same class are discarded while preserving those from different classes. Samples with a low similarity index are not subjected to further processing. This process is iterated until all samples have been evaluated. BPLSH maintains the instance boundaries while effectively reducing the dataset in this manner.
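The following is a deliberately simplified, hypothetical Python sketch of this boundary-preserving idea, not the full BPLSH procedure of [21]: samples are bucketed by random-projection hashes, buckets that contain more than one class (and are therefore likely to straddle a decision boundary) are kept in full, and single-class buckets are collapsed to a single representative. All names and parameter values are illustrative.

import numpy as np
from collections import defaultdict

def simple_boundary_selection(X, y, n_hashes=8, bucket_width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hashes))        # random projection directions
    b = rng.uniform(0, bucket_width, size=n_hashes)    # random offsets
    codes = np.floor((X @ W + b) / bucket_width).astype(int)
    buckets = defaultdict(list)
    for i, code in enumerate(map(tuple, codes)):
        buckets[code].append(i)
    keep = []
    for idx in buckets.values():
        if len(set(y[i] for i in idx)) > 1:
            keep.extend(idx)          # mixed-class bucket: preserve all members
        else:
            keep.append(idx[0])       # interior bucket: one representative suffices
    return np.array(sorted(keep))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)
sel = simple_boundary_selection(X, y)
print(len(sel), "of", len(X), "samples kept")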
Initially, BPLSH was designed to reduce the number of samples used for training Support Vector Machine (SVM) classifiers [29] to minimize training time. However, in the context of empirical kernels, it is employed to select samples for constructing the projection space. Following the application of the BPLSH algorithm to select samples for constructing the empirical feature space, the empirical kernel is reconstructed as follows:
\varphi^{e}(x_i) = \big(X_\varphi^{T} S_w X_\varphi\big)^{-1/2} \, \big[k(x_i, x_1), \ldots, k(x_i, x_P)\big]^{T}        (7)
This equation transforms a sample x i into the empirical kernel space while also normalizing the projection using class distribution information encoded in S w .
In this context, $X_\varphi = [\phi(x_1), \ldots, \phi(x_P)]$ denotes the matrix whose columns are the implicitly mapped selected samples. The P samples are chosen through the BPLSH algorithm, and $S_w$ represents the within-class scatter matrix computed over these samples. $S_w$ describes the distribution of samples within each class, and the expression above can therefore be understood as preserving this distributional information during projection into the empirical kernel space. The computation of $S_w$ is as follows:
S_w = \sum_{i=1}^{C} S_i        (8)
Here, S i denotes the covariance matrix, with i representing the i-th class, and S i is defined as follows:
S_i = \sum_{x \in X_i} (x - m_i)(x - m_i)^{T}        (9)
This matrix S i describes how data in class i are spread around its mean m i . By summing across all classes, S w captures the overall within-class variation.
Here, $m_i$ denotes the mean vector of the $i$-th class, $m_i = \frac{1}{N_{(i)}} \sum_{x \in X_i} \phi(x)$, where $N_{(i)}$ is the number of samples in class i. In Equation (7), $X_\varphi$ is the matrix of kernel-mapped selected samples described above, derived from the kernel mapping function. For computational efficiency, $S_w$ may be reformulated as follows:
S_w = X_\varphi (D - W) X_\varphi^{T}        (10)
This is a matrix reformulation of S w using graph-based structures where W encodes intra-class similarities.
$D$ is a diagonal matrix defined by $D_{i,i} = \sum_{j} W_{i,j}$, where $W$ represents a weight matrix. If $y_i = y_j = s$, where $s$ denotes the $s$-th class, then $W_{ij} = \frac{1}{N_{(s)}}$; otherwise, $W_{ij} = 0$. An alternative form of the empirical kernel presented in this paper can then be obtained as follows:
\varphi^{e}(x_i) = (K L K)^{-1/2} \, \big[k(x_i, x_1), \ldots, k(x_i, x_P)\big]^{T}        (11)
This expression simplifies the computation while retaining the class distribution structure via matrix L = D W , often called the graph Laplacian.
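To make the equivalence between Equations (7) and (11) explicit, recall that $X_\varphi = [\phi(x_1), \ldots, \phi(x_P)]$ collects the mapped reference samples as columns, so the kernel matrix over the selected samples satisfies $X_\varphi^{T} X_\varphi = K$. Substituting Equation (10) into the normalization term of Equation (7) gives

X_\varphi^{T} S_w X_\varphi = X_\varphi^{T} X_\varphi (D - W) X_\varphi^{T} X_\varphi = K (D - W) K = K L K,

so $(X_\varphi^{T} S_w X_\varphi)^{-1/2}$ can be evaluated purely from kernel values as $(K L K)^{-1/2}$, which is exactly Equation (11).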

3.2. Classifier

This section introduces a classifier developed from the empirical kernel of FMEKL-DDI, following the formulation outlined in Equation (11). The initial step involves a training dataset represented as $\{(x_i, y_i) \mid i = 1, \ldots, N\}$, where $y_i \in \{-1, +1\}$. The collection $\{\varphi_1^{e}(x_i), \ldots, \varphi_M^{e}(x_i)\}_{i=1}^{N}$ represents the transformed samples within the M empirical feature spaces produced by the empirical kernel. Each $\varphi_l^{e}(x_i)$ gives a different view of the same data point in its corresponding feature space, enriching the feature representation.
Upon the completion of the construction of these M feature spaces, FMEKL-DDI systematically maps P training samples into their kernel spaces. The final classifier is defined as a weighted combination of the empirical kernel feature spaces, and it is represented mathematically as follows:
f(x) = \sum_{l=1}^{M} \lambda_l \Gamma_l^{T} \varphi_l^{e}(x) + \Gamma_0^{T} = \big[\lambda_1 \Gamma_1, \ldots, \lambda_M \Gamma_M\big]^{T} \big[\varphi_1^{e}(x), \ldots, \varphi_M^{e}(x)\big] + \Gamma_0^{T} = \big[\lambda_1 \Gamma_1, \ldots, \lambda_M \Gamma_M, \Gamma_0\big]^{T} \big[\varphi_1^{e}(x), \ldots, \varphi_M^{e}(x), 1\big] = \Gamma^{T} \nu^{e}(x)        (12)
This form merges all mapped features into one augmented vector ν e ( x ) and performs a linear classification using learned weight vector Γ .
In Equation (12), $\lambda_l$ indicates the weight coefficient for the $l$-th empirical feature space, $\Gamma_l$ represents the corresponding weight vector, and $\Gamma_0$ denotes the bias term. The augmented empirical feature vector for x is represented as $\nu^{e}(x) = [\varphi_1^{e}(x), \ldots, \varphi_M^{e}(x), 1]^{T}$, and $\Gamma$ signifies the augmented weight vector. The classifier generates probability outputs for the sample x across multiple classes. If $\max f(x) = f_i(x)$, the sample is classified as belonging to class i. To determine $\Gamma$, the minimum-norm least-squares method is employed [30]. The classification information for the training samples is presented as follows:
t_k(x_i) = \begin{cases} 1, & \text{if } y_i = k \\ 0, & \text{otherwise} \end{cases}        (13)
This sets up a one-hot encoding for the true class labels, turning the classification problem into a regression task.
In this context, $t(x_i) = [t_1(x_i), \ldots, t_k(x_i)]^{T}$, with k denoting the total number of classes. The computation of the augmented weight vector $\Gamma$ is performed using the following method:
\Gamma^{*} = \arg\min_{\Gamma} \sum_{i=1}^{N} \big\| \Gamma^{T} \nu^{e}(x_i) - t(x_i) \big\|^{2} = \arg\min_{\Gamma} \big\| \Gamma^{T} \nu - T \big\|^{2}        (14)
We find Γ by minimizing prediction errors over all training samples using least squares. This yields a simple and fast solution.
In this context, $\nu = [\nu^{e}(x_1), \ldots, \nu^{e}(x_N)]$ and $T = [t(x_1), \ldots, t(x_N)]$. Differentiating the expression above with respect to $\Gamma$ yields the following:
\Gamma^{*} = (\nu^{T})^{-1} T^{T}        (15)
This gives a closed-form solution to compute the optimal weights Γ that define the final classifier.
The integration of empirically mapped samples into the classifier facilitates the generation of the final predicted classification outcomes. By combining the strengths of sample distribution awareness and fast feature construction, FMEKL-DDI achieves both efficiency and accuracy.
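A minimal Python/NumPy sketch of this least-squares classifier (Equations (12) through (15)) is given below. It is an illustration under our own naming, not the authors' implementation, and it uses a pseudo-inverse for the minimum-norm least-squares solve.

import numpy as np

def fit_ls_classifier(feature_list, y, n_classes):
    # feature_list: M arrays of shape (N, r_l), one per empirical feature space
    N = feature_list[0].shape[0]
    V = np.hstack(feature_list + [np.ones((N, 1))])        # augmented vectors nu^e(x_i) as rows
    T = np.zeros((N, n_classes))
    T[np.arange(N), y] = 1.0                               # one-hot targets t(x_i)
    return np.linalg.pinv(V) @ T                           # minimum-norm least-squares weights

def predict(feature_list, Gamma):
    N = feature_list[0].shape[0]
    V = np.hstack(feature_list + [np.ones((N, 1))])
    return np.argmax(V @ Gamma, axis=1)                    # class with the largest output

rng = np.random.default_rng(2)
y = rng.integers(0, 3, size=120)
feats = [rng.normal(size=(120, 10)) + y[:, None] for _ in range(3)]   # three toy feature spaces
Gamma = fit_ls_classifier(feats, y, n_classes=3)
print((predict(feats, Gamma) == y).mean())                 # training accuracy on the toy data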
Algorithm 1 Fast Multiple Empirical Kernel Learning Incorporating Data Distribution Information (FMEKL-DDI)
Require: Training samples $\{(x_i, y_i)\}_{i=1,\ldots,N}$, number of kernels M, kernel functions $\{k_l(x_i, x_j)\}_{l=1}^{M}$
Ensure: Augmented weight vector $\Gamma$
1: Select the M subspaces by choosing P samples from the training set using the BPLSH algorithm
2: for l = 1 to M do
3:     Compute the empirical kernel mapping $\varphi_l^{e}(x_i)$ for the l-th subspace using Equation (11)
4: end for
5: Form the augmented empirical feature vectors $\{\hat{\varphi}_1^{e}(x_i), \ldots, \hat{\varphi}_M^{e}(x_i)\}_{i=1}^{N}$
6: Construct the target matrix T using Equation (13)
7: Compute the augmented weight vector $\Gamma$ using the least-squares solution in Equation (15)
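For readers who prefer code to pseudocode, the following hedged end-to-end sketch mirrors Algorithm 1 in Python/NumPy (the paper's experiments were run in MATLAB, and helper names such as fmekl_mapping and inv_sqrt_psd are ours). It constructs the distribution-aware empirical kernel of Equation (11) from a set of selected samples; for brevity, a uniformly random subset stands in for the BPLSH selection step.

import numpy as np

def gaussian_kernel(A, B, sigma2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma2))

def inv_sqrt_psd(M, eps=1e-10):
    # (pseudo) inverse square root of a symmetric positive semidefinite matrix
    w, V = np.linalg.eigh((M + M.T) / 2.0)
    w_is = np.where(w > eps, 1.0 / np.sqrt(np.clip(w, eps, None)), 0.0)
    return V @ np.diag(w_is) @ V.T

def fmekl_mapping(X_sel, y_sel, sigma2):
    # distribution-aware empirical kernel of Equation (11) built on the selected samples
    P = len(X_sel)
    K = gaussian_kernel(X_sel, X_sel, sigma2)              # P x P kernel matrix
    W = np.zeros((P, P))
    for c in np.unique(y_sel):                             # W_ij = 1/N_c for same-class pairs
        mask = (y_sel == c)
        W[np.ix_(mask, mask)] = 1.0 / mask.sum()
    L = np.diag(W.sum(axis=1)) - W                         # graph Laplacian L = D - W
    proj = inv_sqrt_psd(K @ L @ K)                         # (K L K)^{-1/2}
    return lambda Xnew: gaussian_kernel(Xnew, X_sel, sigma2) @ proj

rng = np.random.default_rng(3)
X, y = rng.normal(size=(300, 6)), rng.integers(0, 2, size=300)
selected = rng.choice(300, size=40, replace=False)          # stand-in for the BPLSH selection
phi = fmekl_mapping(X[selected], y[selected], sigma2=1.0)
print(phi(X).shape)                                          # all samples mapped, shape (300, 40)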

4. Experimental Results

4.1. Experimental Settings

This section outlines the parameter design of the relevant algorithms, which applies to all experiments carried out here and subsequently. Any alterations to the parameters are clearly specified in the corresponding section. The parameter designs cover the compared algorithms, specifically FMEKL-DDI, MREKLM, MEKL, IBMPEKL [17], CGMKL [16], and NLMKL [31]. All of these algorithms are kernel-based, and the Gaussian kernel function $k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$ is chosen, where $\sigma^2 = \beta \cdot \frac{1}{N^2} \sum_{i,j} \|x_i - x_j\|^2$; N denotes the number of training samples, and the kernel parameter $\beta \in \{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$. The number of kernels is set to M = 3. The learning rate for IBMPEKL is set to 0.99, whereas for CGMKL it is set to 1. The ranges for the regularization parameters C and $\lambda$ are $\{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$. For BPLSH, the number of hash functions is $q \in \{10, 30, 50, 70, 90, 110\}$, and the number of hash families is $m \in \{10, 20, 30, 40, 50, 60\}$.
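As a small illustration of this kernel setup (our sketch, not the authors' code), the snippet below sets $\sigma^2$ to $\beta$ times the mean pairwise squared distance of the training samples and evaluates the resulting Gaussian kernel matrix for each value of $\beta$ in the grid above.

import numpy as np

def sigma2_from_data(X, beta):
    # sigma^2 = beta * (1/N^2) * sum_{i,j} ||x_i - x_j||^2, as described in the text
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return beta * d2.mean()

def gaussian_kernel_matrix(X, sigma2):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma2))

X = np.random.default_rng(4).normal(size=(50, 8))
for beta in [1e-2, 1e-1, 1e0, 1e1, 1e2]:                   # the beta grid used in the experiments
    K = gaussian_kernel_matrix(X, sigma2_from_data(X, beta))
    print(beta, round(float(K.mean()), 3))                  # smaller beta gives a sharper kernel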
MREKLM entails the selection of $P$ ($P \ll N$) samples, characterized by a selection ratio of $P/N$, where P signifies the number of selected samples and N the total number of training samples. Experimental data in reference [19] indicate that when the ratio P/N surpasses 0.5, there is no notable enhancement in accuracy, while the training duration increases substantially. Consequently, this study chooses $P/N \in \{0.01, 0.03, 0.05, 0.07, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5\}$. A five-fold cross-validation technique [32] is utilized to determine the optimal combinations of the specified parameters. The experimental configuration includes an i5-10300H processor running at 2.50 GHz, 16 GB of RAM, the Windows 10 operating system, and MATLAB R2021a; all experiments are conducted on this device. All datasets are sourced from the UCI repository [33].
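The grid search with five-fold cross-validation described here can be sketched generically as follows; the fit and score callables are placeholders (a dummy nearest-centroid model is used purely so the example runs), and the parameter grid shown only mirrors part of the ranges listed above.

import itertools
import numpy as np

def five_fold_cv(X, y, fit, score, grid, n_folds=5, seed=0):
    # exhaustive grid search with k-fold cross-validation over the given parameter grid
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    best_params, best_acc = None, -np.inf
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        accs = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            model = fit(X[train], y[train], **params)
            accs.append(score(model, X[test], y[test]))
        if np.mean(accs) > best_acc:
            best_acc, best_params = float(np.mean(accs)), params
    return best_params, best_acc

def fit(Xtr, ytr, beta, ratio):
    # dummy nearest-centroid stand-in so the example runs; beta and ratio are ignored here
    return {c: Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)}

def score(model, Xte, yte):
    labels = np.array(list(model.keys()))
    centers = np.stack(list(model.values()))
    pred = labels[np.argmin(((Xte[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)]
    return float((pred == yte).mean())

grid = {"beta": [1e-2, 1e-1, 1e0, 1e1, 1e2], "ratio": [0.01, 0.05, 0.1, 0.3, 0.5]}
X = np.random.default_rng(5).normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)
print(five_fold_cv(X, y, fit, score, grid))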
In particular, the selection of key parameters such as the kernel bandwidth σ 2 , the balancing factor β in distributional alignment, and the number of hash functions used in BPLSH was guided by both empirical tuning and theoretical insights. The kernel bandwidth σ 2 was chosen based on principles from kernel density estimation, where an appropriately scaled bandwidth ensures a smooth yet discriminative kernel matrix, preventing over-smoothing or overfitting [34]. The parameter β controls the trade-off between projection fidelity and distributional regularization; it was tuned empirically across datasets to maintain stability while capturing within-class scatter effectively. For BPLSH, the number of hash functions determines the sensitivity to local boundary structures. A moderate number of hashes ensures the accurate identification of informative boundary samples while avoiding excessive overlap or noise [35]. Together, these parameter choices reflect a balance between the theoretical guarantees of representation quality and practical performance observed in validation experiments.
All experiments were conducted on a system with an Intel Core i9 processor, 64 GB RAM, and an NVIDIA RTX 3090 GPU. The operating system was Ubuntu 22.04 LTS.

4.2. The Influence of the Number of Hash Functions and Hash Families on FMEKL-DDI

The BPLSH algorithm serves as the sample selection mechanism for the FMEKL-DDI algorithm, and it is characterized by two primary parameters: the quantity of hash functions q and the count of hash families m. This section of the experiment focuses on analyzing the impact of the two parameters of BPLSH on the FMEKL-DDI algorithm. The variable q is selected from the set { 10 , 30 , 50 , 70 , 90 , 110 } , and the variable m is chosen from the range { 10 , 20 , 30 , 40 , 50 , 60 } . The results from the experiments are detailed in Table 2 and Table 3. In particular, the authors of [36] demonstrate that training two-layer neural networks exhibits sharp phase transitions in generalization performance based on mini-batch size, with learning failing below a critical threshold and succeeding above it.
The results of the experiments indicate that the number of hash functions and hash families does not significantly impact the training time. It can be concluded that as long as these parameters fall within a certain range, they do not adversely affect the experimental results.
Nevertheless, certain settings in this experiment did not produce results. Table 2 and Table 3 illustrate that when the number of hash functions is 110 and the number of hash families is 50 or more, no experimental results were obtained, which initially suggested a software anomaly. After ruling out program-related issues, it was concluded that the classification inconsistencies stemmed from an inadequate number of samples being selected in the final stage. Moreover, the consistently strong performance of FMEKL-DDI is largely due to its ability to align with the underlying data distribution through the within-class scatter matrix and its focus on boundary sample selection via BPLSH, which enhances class separation. These mechanisms enable kernel learning to adapt better to both linear and non-linear class structures. The variations in performance across datasets can be explained by how well each algorithm captures critical decision boundaries: FMEKL-DDI performs especially well when boundary samples and distribution alignment significantly impact classification, whereas methods that rely on random or uniform sampling tend to underperform in such cases.
In conclusion, the performance of the FMEKL-DDI algorithm is significantly affected by the parameters m and q. On both the Iono and Twonorm datasets, higher values of m typically sustain high accuracy; however, in some instances, excessively high values of q can result in reductions in accuracy and increases in computational time. Therefore, the ranges 10 q 70 and 10 m 40 are regarded as appropriate.

4.3. Training Time Comparison

This part of the experiment examines the efficiency of the FMEKL-DDI algorithm in terms of training time. The compared algorithms are NLMKL, MREKLM, MEKL, IBMPEKL, and CGMKL. Furthermore, the previous subsection showed that the numbers of hash functions and hash families for FMEKL-DDI need not be large; hence, we set q = 30 and m = 20 for this investigation. This section uses eighteen datasets, including Wdbc, Wpbc, Iris, Wine, Knowledge, EEG, Letter, Pendigits, and Polish, all obtained from the UCI repository. Of these, ten datasets are small-scale, while eight are relatively larger, with details presented in Table 4. The experimental findings are presented in Table 5; the result reported for each dataset is the average over ten runs.
Table 5 indicates that the training time of FMEKL-DDI is markedly lower than that of the other algorithms. For some small datasets the reduction is present but not dramatic; on the comparatively larger datasets, however, the training time of FMEKL-DDI drops substantially, demonstrating its efficiency in terms of training time.
In particular, while methods such as MEKL and IBMPEKL can achieve competitive accuracy, they often do so at the cost of significantly longer training times due to complex kernel computations and iterative optimization procedures. Conversely, MREKLM reduces training times through random sampling but may sacrifice accuracy due to suboptimal feature space representation. FMEKL-DDI effectively balances this trade-off by leveraging distribution-aware sample selection and efficient empirical kernel construction, achieving high accuracy with minimal computational cost. Understanding and explicitly managing this trade-off is crucial, particularly in real-world applications where resources are limited and timely decision-making is critical.

4.4. Classification Accuracy Comparison

This part of the experiment evaluates the classification accuracy of FMEKL-DDI in comparison to the other algorithms. The parameter settings are unchanged from the previous section, and the evaluation uses the same eighteen datasets obtained from the UCI repository. The algorithms evaluated alongside FMEKL-DDI are NLMKL, MREKLM, MEKL, IBMPEKL, and CGMKL.
The experimental outcomes are presented in Table 2 and illustrated in Figure 3, where the height of each bar indicates the classification accuracy of a method on the corresponding dataset; taller bars signify greater accuracy. The results show that FMEKL-DDI attains the highest classification accuracy among the evaluated methods and does so consistently across all eighteen datasets, highlighting its classification effectiveness.

4.5. Ablation Experiments Conducted to Validate the Functions of the Intra-Class Variance and BPLSH Modules

FMEKL-DDI consists of two modules: Intra-Class Variance $S_w$ and BPLSH. This section assesses the impact of the two modules on five datasets: Iono, Iris, CMC, Twonorm, and EEG. The validation compares several configurations: the original empirical kernel (EKM), the empirical kernel enhanced with Intra-Class Variance (EKM+$S_w$), the empirical kernel integrated with BPLSH (EKM+BPLSH), and the empirical kernel that combines both Intra-Class Variance and BPLSH (EKM+$S_w$+BPLSH). The training durations and classification accuracies of each configuration across the datasets are compared in Table 7 and Table 8. Training utilizes 50% of the data, with the remaining 50% allocated for testing, and the reported results are averaged over repeated runs. The entries highlighted in Table 7 represent the highest accuracy, whereas those in Table 8 signify the shortest training time. Notably, we used a 50/50 train–test split to ensure a balanced evaluation between training efficiency and generalization performance. This ratio was consistently applied to maintain comparability across experiments and to avoid bias from overly skewed training or testing proportions.
Table 7 illustrates that the combination of EKM and $S_w$ achieves the best accuracy, indicating that the inclusion of Intra-Class Variance substantially boosts the accuracy of the empirical kernel and confirming that $S_w$ contributes to accuracy improvement. Introducing BPLSH alone results in a slight decrease in EKM accuracy, indicating that BPLSH by itself does not improve classification accuracy. This signifies that, within the EKM+$S_w$+BPLSH combination, it is $S_w$ that drives the accuracy gain.
Table 8 demonstrates that combining BPLSH with EKM yields the shortest training time, indicating that BPLSH significantly decreases training duration. Conversely, employing only the empirical kernel with $S_w$ slightly extends the training time, indicating that $S_w$ does not contribute to reducing training duration. Utilizing both $S_w$ and BPLSH therefore yields favorable outcomes in both classification accuracy and training time for the empirical kernel.
The training time differences among the compared algorithms primarily stem from the sample selection and kernel evaluation strategies. FMEKL-DDI achieves superior efficiency by leveraging the BPLSH algorithm to intelligently select a smaller, boundary-representative subset of samples, significantly reducing unnecessary computations. Additionally, the use of reduced kernel evaluations in smaller, well-structured empirical feature spaces contributes to faster processing. In contrast, while MREKLM also reduces training time by using random subsets, it occasionally incurs higher overhead due to the unpredictability of random sampling, which may select less informative or redundant data, requiring more iterations or larger projections to achieve acceptable performance.

4.6. Experiments on the Protein Subcellular Localization Dataset

To further validate the practicality and generalizability of the proposed algorithm, this section selects three datasets focused on protein subcellular localization to evaluate its performance. The three datasets analyzed are Plant, PsortPos, and PsortNeg. The Plant dataset consists of four classes and a total of 940 samples; PsortPos includes four classes with 541 samples, while PsortNeg comprises five classes with 1441 samples. Fifty percent of the data is utilized for training, while the remaining fifty percent is reserved for testing.
As illustrated in Figure 4, it is evident that FMEKL-DDI demonstrates strong performance in practical applications. Notably, FMEKL-DDI achieves the highest classification accuracy, consistent with the results obtained in previous experimental sections. Furthermore, the classification accuracy of FMEKL-DDI is markedly higher than that of the other algorithms.

5. Conclusions

This paper proposed an efficient algorithm—Fast Multiple Empirical Kernel Learning Incorporating Data Distribution Information (FMEKL-DDI)—that integrates within-class distribution characteristics and boundary-aware sample selection to improve classification performance and computational efficiency. The experimental results across 18 benchmark datasets and 3 protein subcellular localization datasets demonstrated that FMEKL-DDI consistently outperformed comparable methods in terms of both classification accuracy and training time. Our proposal is well suited for large-scale, high-dimensional tasks where accuracy and efficiency are critical. It is particularly useful in bioinformatics, medical diagnostics, and security applications—such as protein classification, disease detection, and fraud or intrusion detection—where preserving class boundaries and fast model training are essential for real-time decision-making.
The observed improvements are largely attributed to two main components of the method: (1) the use of the within-class scatter matrix ( S w ), which enables the model to preserve class-specific sample distributions in the empirical kernel space, and (2) the application of the BPLSH algorithm, which selects informative border samples while significantly reducing the size of the training set. This combination allows FMEKL-DDI to strike an effective balance between accuracy and efficiency, especially on large-scale and high-dimensional datasets.
Comparative analysis showed that FMEKL-DDI surpasses state-of-the-art algorithms such as MEKL, MREKLM, IBMPEKL, CGMKL, and NLMKL in both predictive power and computational scalability. While some baseline methods achieved acceptable accuracy, they often incurred higher computational costs. In contrast, FMEKL-DDI maintained competitive accuracy with substantially lower training times, making it more practical for real-world applications.
An interesting direction for enhancing the training efficiency of FMEKL-DDI, especially on large-scale datasets, lies in leveraging virtual threads—such as Java Virtual Threads or user-level threading in C++. These lightweight threads offer low-overhead concurrency, enabling fine-grained parallelism during computationally intensive tasks like kernel matrix construction and BPLSH-based boundary sample selection. By parallelizing kernel evaluations across multiple subspaces or concurrently processing similarity computations in BPLSH, the algorithm could significantly reduce wall-clock training time without increasing memory consumption. Integrating virtual threads into the existing framework would allow the more efficient utilization of multi-core architectures, making FMEKL-DDI even more scalable for high-dimensional or streaming data applications.
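The paper frames this in terms of Java Virtual Threads or C++ user-level threads; as a language-neutral sketch of the same idea (ours, with arbitrary sizes and worker count), the snippet below dispatches the M per-subspace kernel computations to a Python thread pool, which can overlap the heavy array work where the underlying numerical routines release the interpreter lock. A production version would more likely parallelize at the finer granularity suggested in the paragraph above.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def gaussian_kernel(A, B, sigma2=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma2))

rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 10))
subsets = [X[rng.choice(len(X), 200, replace=False)] for _ in range(3)]   # M = 3 reference subsets

# dispatch the three per-subspace kernel computations to a small thread pool
with ThreadPoolExecutor(max_workers=3) as pool:
    kernels = list(pool.map(lambda ref: gaussian_kernel(X, ref), subsets))
print([K.shape for K in kernels])                           # three 2000 x 200 kernel blocks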
Despite these advances, several opportunities for future development remain. One area involves exploring adaptive strategies for selecting the number and type of empirical kernels based on dataset characteristics. Another promising direction is the integration of class imbalance handling techniques and robust noise filtering, which could further enhance model generalization in complex or imbalanced datasets. Additionally, extending FMEKL-DDI to semi-supervised or online learning scenarios would broaden its applicability in dynamic and data-scarce environments.

Author Contributions

Methodology, J.H.; hardware, Z.L. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, H.K.; Qiu, S.; Suh, J.W.; Luo, D.; Zhu, Z. Machine Learning and Deep Learning in Remote Sensing Data Analysis. In Reference Module in Earth Systems and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2024. [Google Scholar] [CrossRef]
  2. Liu, Y.; Cao, S. The analysis of aerobics intelligent fitness system for neurorobotics based on big data and machine learning. Heliyon 2024, 10, e33191. [Google Scholar] [CrossRef] [PubMed]
  3. Zhou, X.; Du, H.; Xue, S.; Ma, Z. Recent advances in data mining and machine learning for enhanced building energy management. Energy 2024, 307, 132636. [Google Scholar] [CrossRef]
  4. Ge, Q.; Lu, X.; Jiang, R.; Zhang, Y.; Zhuang, X. Data mining and machine learning in HIV infection risk research: An overview and recommendations. Artif. Intell. Med. 2024, 153, 102887. [Google Scholar] [CrossRef]
  5. Rodrigues, C.H.P.; da Cruz Sousa, M.D.; dos Santos, M.A.; Filho, P.A.F.; Velho, J.A.; Leite, V.B.P.; Bruni, A.T. Forensic analysis of microtraces using image recognition through machine learning. Microchem. J. 2024, 207, 111780. [Google Scholar] [CrossRef]
  6. Wu, P.; Li, L.; Shao, S.; Liu, J.; Wang, J. Bioinspired PEDOT:PSS-PVDF(HFP) flexible sensor for machine-learning-assisted multimodal recognition. Chem. Eng. J. 2024, 495, 153558. [Google Scholar] [CrossRef]
  7. Sebastian, J.; S., K.R.; K.V., S. Adaptive control of a nonaffine nonlinear system using self-organising kernel extreme learning machine. ISA Trans. 2024, 146, 567–581. [Google Scholar] [CrossRef]
  8. Nazarpour, A.; Adibi, P. Two-stage multiple kernel learning for supervised dimensionality reduction. Pattern Recognit. 2015, 48, 1854–1862. [Google Scholar] [CrossRef]
  9. Wang, J.; Luo, J. A fast parameter optimization approach based on the inter-cluster induced distance in the feature space for support vector machines. Appl. Soft Comput. 2022, 118, 108519. [Google Scholar] [CrossRef]
  10. Huang, Q.; Mao, J.; Liu, Y. An improved grid search algorithm of SVR parameters optimization. In Proceedings of the 2012 IEEE 14th International Conference on Communication Technology, Chengdu, China, 9–11 November 2012; pp. 1022–1026. [Google Scholar]
  11. Li, B.; Yang, Y.; Liu, D.; Zhang, Y.; Zhou, A.; Yao, X. Accelerating surrogate assisted evolutionary algorithms for expensive multi-objective optimization via explainable machine learning. Swarm Evol. Comput. 2024, 88, 101610. [Google Scholar] [CrossRef]
  12. Ding, X.; Cui, M.; Li, Y.; Chen, S. A maximal accuracy and minimal difference criterion for multiple kernel learning. Expert Syst. Appl. 2024, 254, 124378. [Google Scholar] [CrossRef]
  13. Lanckriet, G.; Cristianini, N.; Bartlett, P.; Ghaoui, L.E.; Jordan, M.I. Learning the Kernel Matrix with Semi-Definite Programming. In Proceedings of the Nineteenth International Conference on Machine Learning (ICML 2002), Sydney, Australia, 8–12 July 2002. [Google Scholar]
  14. Jian, L.; Xia, Z.; Liang, X.; Gao, C. Design of a multiple kernel learning algorithm for LS-SVM by convex programming. Neural Networks 2011, 24, 476–483. [Google Scholar] [CrossRef] [PubMed]
  15. Wang, Z.; Wang, B.; Zhou, Y.; Li, D.; Yin, Y. Weight-based multiple empirical kernel learning with neighbor discriminant constraint for heart failure mortality prediction. J. Biomed. Inform. 2020, 101, 103340. [Google Scholar] [CrossRef] [PubMed]
  16. Wang, Z.; Zhu, Z.; Li, D. Collaborative and geometric multi-kernel learning for multi-class classification. Pattern Recognit. 2020, 99, 107050. [Google Scholar] [CrossRef]
  17. Zhu, Z.; Wang, Z.; Li, D.; Du, W.; Zhou, Y. Multiple Partial Empirical Kernel Learning with Instance Weighting and Boundary Fitting. Neural Netw. 2020, 123, 26–37. [Google Scholar] [CrossRef]
  18. Marino, R.; Buffoni, L.; Chicchi, L.; Giambagli, L.; Fanelli, D. Stable attractors for neural networks classification via ordinary differential equations (SA-nODE). Mach. Learn. Sci. Technol. 2024, 5, 035087. [Google Scholar] [CrossRef]
  19. Fan, Q.; Wang, Z.; Zha, H.; Gao, D. MREKLM: A fast multiple empirical kernel learning machine. Pattern Recognit. 2017, 61, 197–209. [Google Scholar] [CrossRef]
  20. Arriaga, R.I.; Vempala, S. An algorithmic theory of learning: Robust concepts and random projection. Mach. Learn. 2006, 63, 161–182. [Google Scholar] [CrossRef]
  21. Aslani, M.; Seipel, S. Efficient and decision boundary aware instance selection for support vector machines. Inf. Sci. 2021, 577, 579–598. [Google Scholar] [CrossRef]
  22. Li, T.; Shu, X.; Wu, J.; Zheng, Q.; Lv, X.; Xu, J. Adaptive weighted ensemble clustering via kernel learning and local information preservation. Knowl.-Based Syst. 2024, 294, 111793. [Google Scholar] [CrossRef]
  23. Tang, J.; Hou, Z.; Yu, X.; Fu, S.; Tian, Y. Multi-view cost-sensitive kernel learning for imbalanced classification problem. Neurocomputing 2023, 552, 126562. [Google Scholar] [CrossRef]
  24. Chen, Y.; Yang, X.; Dai, H.L. Cost-sensitive continuous ensemble kernel learning for imbalanced data streams with concept drift. Knowl.-Based Syst. 2024, 284, 111272. [Google Scholar] [CrossRef]
  25. Ober, S.W.; Rasmussen, C.E.; van der Wilk, M. The promises and pitfalls of deep kernel learning. In Proceedings of the Uncertainty in Artificial Intelligence, Virtual Event, 27–30 July 2021; pp. 1206–1216. [Google Scholar]
  26. Lu, J.; Plataniotis, K.N.; Venetsanopoulos, A.N. Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans. Neural Netw. 2003, 14, 117–126. [Google Scholar] [PubMed]
  27. Ye, J. Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems. J. Mach. Learn. Res. 2005, 6, 483–502. [Google Scholar]
  28. Ye, J.; Li, T.; Xiong, T.; Janardan, R. Using uncorrelated discriminant analysis for tissue classification with gene expression data. IEEE/ACM Trans. Comput. Biol. Bioinform. 2004, 1, 181–190. [Google Scholar]
  29. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28. [Google Scholar] [CrossRef]
  30. Lagoudakis, M.G.; Parr, R. Least-Squares Policy Iteration. J. Mach. Learn. Res. 2003, 4, 1107–1149. [Google Scholar]
  31. Cortes, C.; Mohri, M.; Rostamizadeh, A. Learning Non-Linear Combinations of Kernels; Curran Associates Inc.: Red Hook, NY, USA, 2009. [Google Scholar]
  32. Escobar, S. Model selection and error estimation in a nutshell. Comput. Rev. 2021, 62, 144. [Google Scholar]
  33. Kelly, M.; Longjohn, R.; Nottingham, K. The UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu (accessed on 8 April 2025).
  34. Silverman, B.W. The kernel method for univariate data. In Density Estimation for Statistics and Data Analysis; Springer: Berlin/Heidelberg, Germany, 1986; pp. 34–74. [Google Scholar]
  35. Gionis, A.; Indyk, P.; Motwani, R. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases, San Francisco, CA, USA, 7–10 September 1999; Volume 99, pp. 518–529. [Google Scholar]
  36. Marino, R.; Ricci-Tersenghi, F. Phase transitions in the mini-batch size for sparse and dense two-layer neural networks. Mach. Learn. Sci. Technol. 2024, 5, 015015. [Google Scholar] [CrossRef]
Figure 1. (a) Examples of two-dimensional linearly separable problems. (b) Examples of two-dimensional nonlinearly separable problems.
Figure 2. General pipeline of our proposal.
Figure 3. Classification accuracy comparison (%) on the datasets.
Figure 4. Classification accuracy comparison (%) on the selected protein subcellular localization datasets.
Table 1. Summary of notations and definitions.
Symbol: Definition
N: Number of training samples
D: Dimensionality of the input space
C: Number of classes
x_i ∈ R^D: i-th input sample
y_i ∈ {1, …, C}: Label of sample x_i
P: Number of selected samples via BPLSH
k(x_i, x_j): Kernel function between samples x_i and x_j
K ∈ R^{P×P}: Kernel matrix for the selected samples
φ(x): Empirical kernel mapping function
r: Reduced dimensionality after mapping
S_w: Within-class scatter matrix
μ_c: Mean vector of class c
X_c: Set of samples in class c
W: Weight matrix (intra-class similarity)
D: Diagonal matrix with D_ii = Σ_j W_ij
ν^e(x): Augmented empirical feature vector
Γ: Augmented weight vector
f(x): Final classifier function
Table 2. Classification accuracy (%) of different algorithms on various datasets.
Dataset      FMEKL-DDI   NLMKL   MREKLM   MEKL   IBMPEKL   CGMKL
Iono         95          94      93       92     93        91
Heart        85          82      80       75     80        80
ILPD         73          72      72       72     72        73
CMC          56          55      53       52     55        54
Bupa         63          62      61       60     61        62
Magic        85          84      84       73     78        75
Twonorm      100         99      98       98     99        97
Optdigits    99          97      96       95     96        94
Coil-2000    94          93      92       91     92        90
Wdbc         97          96      95       94     96        95
Wpbc         82          80      78       76     79        81
Iris         100         97      96       94     96        95
Wine         97          95      94       93     95        93
Knowledge    95          94      91       89     93        90
EEG          96          95      95       94     95        94
Letter       100         99      99       98     99        98
Pendigits    98          97      96       95     97        95
Polish       95          94      93       91     93        92
Table 3. Experimental results on the Twonorm dataset.
              q = 10                          q = 30
m     Accuracy   Time (s)          m     Accuracy   Time (s)
10    0.9748     9.1898            10    0.9739     8.0213
20    0.9751     8.8932            20    0.9738     9.3807
30    0.9744     8.0264            30    0.9746     8.2321
40    0.9753     10.3214           40    0.9749     7.3459
50    0.9749     7.3458            50    0.9755     6.1234
60    0.9763     6.3841            60    0.9755     6.1245
              q = 50                          q = 70
m     Accuracy   Time (s)          m     Accuracy   Time (s)
10    0.9716     9.3524            10    0.9753     9.2134
20    0.9747     8.2359            20    0.9748     9.8921
30    0.9753     5.3524            30    0.9750     6.5987
40    0.9754     6.1945            40    0.9749     8.1945
50    0.9741     7.4682            50    0.9745     6.1503
60    0.9743     5.1564            60    0.9743     6.1324
              q = 90                          q = 110
m     Accuracy   Time (s)          m     Accuracy   Time (s)
10    0.9748     8.0421            10    0.9758     5.3424
20    0.9748     8.4245            20    0.9738     8.8320
30    0.9754     9.5623            30    0.9752     7.5362
40    0.9773     4.3421            40    0.9763     5.4215
50    0.9751     5.8742            50    --         --
60    --         --                60    --         --
Table 4. The training datasets.
Dataset      Attributes   Classes   Instances
Iono         34           2         351
Heart        13           5         303
ILPD         10           2         583
CMC          9            3         1473
Bupa         6            2         345
Magic        10           2         19,020
Twonorm      20           2         7400
Optdigits    64           10        5620
Coil-2000    85           2         9822
Wdbc         30           2         569
Wpbc         33           2         198
Iris         4            3         150
Wine         14           2         178
Knowledge    5            4         403
EEG          14           2         14,980
Letter       16           26        20,000
Pendigits    16           10        10,992
Polish       64           2         10,313
Table 5. Training time (s) comparison on the datasets.
Dataset      FMEKL-DDI   NLMKL       MREKLM     MEKL         IBMPEKL     CGMKL
Iono         0.0021      0.0893      0.0024     0.0606       0.0931      0.0724
Heart        0.0025      0.0022      0.0050     0.0356       0.0542      0.0389
ILPD         0.0014      0.6277      0.0014     0.2634       0.0265      0.0254
CMC          0.0076      6.1457      0.0096     5.1785       5.5783      6.3125
Bupa         0.0021      0.0882      0.0035     0.0793       0.1024      0.0832
Magic        15.6392     351.1546    392.4778   2241.9020    2341.3212   2298.2154
Twonorm      1.1692      1425.3651   1.8548     1057.1010    1325.1134   1195.8754
Optdigits    16.2351     834.1526    73.2354    724.5102     925.3134    917.3874
Coil-2000    2.4478      380.7324    10.7393    320.7481     290.3487    215.1548
Wdbc         0.0297      0.2310      0.0397     0.1853       0.2312      0.1465
Wpbc         0.0051      0.1109      0.0121     0.0277       0.0359      0.0298
Iris         0.0011      0.0217      0.0321     0.0045       0.0054      0.0048
Wine         0.0015      0.0216      0.0123     0.0058       0.0093      0.0065
Knowledge    0.0022      0.0722      0.0053     0.0659       0.0836      0.0724
EEG          19.953969   1431.2545   256.7029   1344.23546   1521.3458   1132.4562
Letter       16.3313     2418.2577   640.3294   2138.7124    2531.4561   2214.5387
Pendigits    7.241025    340.5572    63.9967    337.2891     350.4852    362.1486
Polish       4.0005      380.5272    74.1873    369.9875     380.1354    360.5348
Table 6. Quantitative comparison of the kernel learning methods.
Method       Avg. Accuracy (%)   Avg. Training Time (s)   Scalability                        Interpretability
NLMKL        92.1                635.1                    Low                                Low
MREKLM       90.5                74.3                     High                               Moderate
MEKL         88.4                627.6                    Low                                Moderate
IBMPEKL      91.7                748.2                    Moderate                           Moderate
CGMKL        90.8                613.5                    Moderate                           Low
FMEKL-DDI    94.6                6.8                      High (efficient with large data)   Moderate to High
Table 7. The accuracy (%) comparison of the ablation experiments.
Dataset      EKM      EKM+S_w   EKM+BPLSH   EKM+S_w+BPLSH
Iono         92.94    95.36     92.13       95.28
Iris         97.35    99.26     96.94       99.19
CMC          54.50    56.54     54.31       56.50
Twonorm      97.87    100.00    97.46       99.98
EEG          90.73    96.50     90.41       96.36
Table 8. The comparison of training time (s) in the ablation experiments.
Dataset      EKM         EKM+S_w     EKM+BPLSH   EKM+S_w+BPLSH
Iono         0.0605      0.0656      0.0031      0.0124
Iris         0.0045      0.0042      0.0011      0.0014
CMC          5.1785      4.9620      0.0730      0.0931
Twonorm      1057.1012   960.3523    1.3521      1.7532
EEG          1344.2353   1432.5832   9.5432      15.3453

