Kernel-Based Optimal Subspaces (KOS): A Method for Data Classification
Abstract
1. Introduction
- As an optimization problem, SVM training may fail in practice, particularly in high dimensions, even though the cost function is convex (quadratic), e.g., because of round-off errors in the solver.
- Extending SVM to multi-class classification is not straightforward and becomes computationally expensive in high dimensions or when the number of classes is large. For instance, a common approach combining the one-against-one and one-against-all strategies requires computing on the order of n^2 separating hyperplanes (one-against-one alone needs n(n-1)/2), i.e., solving a quadratically growing number of optimization problems, where n is the number of classes.
- Imbalanced classes pose another challenge. A popular approach to addressing this involves adding synthetic attributes to underrepresented classes. While effective, the process is not straightforward and can alter the structure of the original class data.
- In dynamic classification, where classes may be added or removed, all separating hyperplanes must be recalculated, which can be prohibitively expensive for real-time applications.
2. KOS: Kernel-Based Optimal Subspaces Method
2.1. Proper Orthogonal Decomposition (POD): Overview
POD Basis System
Given data vectors (snapshots) $u_1, \dots, u_N \in \mathbb{R}^d$:
- 1. If $N \le d$ (fewer snapshots than ambient dimensions), let M be the $N \times N$ correlation matrix $M_{ij} = \langle u_i, u_j \rangle$, where $\langle \cdot, \cdot \rangle$ is any inner product. Let V be the matrix of eigenvectors of M; then the basis functions $\varphi_i$, called modes, are given by
$$\varphi_i = \sum_{k=1}^{N} v_k^i\, u_k,$$
where $v_k^i$ are the components of the ith eigenvector of M.
- 2. If $d < N$, the POD basis is defined by the eigenvectors of the $d \times d$ matrix $A A^{T}$, where $A = [u_1, \dots, u_N]$ is the snapshot matrix.
A short numerical sketch of both constructions follows this list.
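Below is a minimal NumPy sketch of the two constructions above. It assumes the snapshots $u_1, \dots, u_N$ are the columns of a matrix A, the inner product is the standard Euclidean one, and the truncation order r and all function names are illustrative only.

```python
# Minimal NumPy sketch of the two POD constructions above. Assumptions: the
# snapshots u_1, ..., u_N are the columns of A (shape d x N), the inner product
# is the Euclidean one, and r is the number of retained modes.
import numpy as np

def pod_modes_snapshots(A, r):
    """Case 1: eigenpairs of the N x N correlation matrix M_ij = <u_i, u_j>."""
    M = A.T @ A                                # correlation (Gram) matrix
    lam, V = np.linalg.eigh(M)                 # eigenvalues in ascending order
    lam, V = lam[::-1][:r], V[:, ::-1][:, :r]  # keep the r largest eigenpairs
    # mode_i = (1 / sqrt(lambda_i)) * sum_k V[k, i] * u_k   (normalized modes)
    return (A @ V) / np.sqrt(lam), lam

def pod_modes_direct(A, r):
    """Case 2 (d < N): eigenvectors of the d x d matrix A A^T."""
    lam, Phi = np.linalg.eigh(A @ A.T)
    return Phi[:, ::-1][:, :r], lam[::-1][:r]

# usage: 20 snapshots in R^100, keep 5 modes
Phi, lam = pod_modes_snapshots(np.random.default_rng(0).standard_normal((100, 20)), r=5)
```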
2.2. KOS: Global Description and Some Theoretical Aspects
| Algorithm 1: Main KOS Steps |
Let $C_1, \dots, C_P$ denote the P attribute sets representing the P different classes used during the learning stage, and let $\Phi(C_1), \dots, \Phi(C_P)$ denote the corresponding mapped sets in the feature space. The KOS classification procedure is as follows: (i) for each class p, build the kernel (correlation) matrix of $\Phi(C_p)$ and compute its leading eigenpairs to obtain an optimal POD feature subspace $F_p$ (Section 2.3); (ii) classify a new sample x by computing the distance of $\Phi(x)$ to each subspace $F_p$ (Section 2.5) and assigning x to the class with the smallest distance. A minimal end-to-end sketch is given after this box.
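To make the flow above concrete, here is a minimal, self-contained sketch of such a procedure in NumPy. It follows the non-centered formulation of Section 2.3 and the distance criterion of Section 2.5; the RBF kernel, its parameter gamma, the subspace dimension r, and all names (e.g., KOSClassifier) are illustrative choices for this sketch, not the paper's reference implementation.

```python
# Minimal, self-contained sketch of a KOS-style classifier (non-centered
# formulation, RBF kernel). gamma, r, and all names are illustrative.
import numpy as np

def rbf(X, Y, gamma=0.5):
    """RBF (Gaussian) kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KOSClassifier:
    def fit(self, class_data, r=10, gamma=0.5):
        """class_data: dict {label: array of shape (N_label, d)} of training sets."""
        self.gamma, self.models = gamma, {}
        for label, X in class_data.items():
            K = rbf(X, X, gamma)                       # Mercer kernel matrix of the class
            lam, V = np.linalg.eigh(K)
            lam, V = lam[::-1][:r], V[:, ::-1][:, :r]  # leading eigenpairs
            self.models[label] = (X, V / np.sqrt(lam)) # coefficients of the POD modes
        return self

    def predict(self, Xtest):
        labels, dists = list(self.models), []
        for label in labels:
            X, B = self.models[label]
            Kx = rbf(Xtest, X, self.gamma)        # k(x, x_k) for the class training points
            alpha = Kx @ B                        # projections onto the POD modes
            # squared distance to the subspace: k(x, x) - sum_i alpha_i^2 (k(x, x) = 1 for RBF)
            dists.append(1.0 - (alpha ** 2).sum(axis=1))
        return np.array(labels)[np.argmin(np.stack(dists, axis=1), axis=1)]
```

Note that no optimization problem is solved at any point; only one eigendecomposition per class is required, which is the source of the robustness and speed properties discussed in Section 3.3.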
2.3. POD Feature Subspaces
- 1. Non-centered data formulation: In this case, the data in the feature space are used without centering. Let $\Phi(x_1), \dots, \Phi(x_N)$ be the mapped data points of a selected class of size N. First, we construct the correlation matrix
$$M_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j),$$
where $k(x_i, x_j)$ denotes the kernel function evaluated at $x_i$ and $x_j$. As seen, the POD correlation matrix is equivalent to the kernel matrix derived from the mapping. The normalized POD modes are then given by
$$\varphi_i = \frac{1}{\sqrt{\lambda_i}} \sum_{k=1}^{N} v_k^i\, \Phi(x_k),$$
where $v^i$ is the ith eigenvector of the kernel matrix M and $\lambda_i$ the associated eigenvalue.
- 2. Centered data formulation: In this case, the feature vectors are centered. Let $\bar{\Phi} = \frac{1}{N}\sum_{k=1}^{N} \Phi(x_k)$ be the mean vector and $\tilde{\Phi}(x_i) = \Phi(x_i) - \bar{\Phi}$ be the centered data. The correlation matrix is then given by
$$\tilde{M}_{ij} = \langle \tilde{\Phi}(x_i), \tilde{\Phi}(x_j) \rangle.$$
Each of the inner products can be expressed in terms of the kernel function:
$$\langle \tilde{\Phi}(x_i), \tilde{\Phi}(x_j) \rangle = k(x_i, x_j) - \frac{1}{N}\sum_{m=1}^{N} k(x_i, x_m) - \frac{1}{N}\sum_{m=1}^{N} k(x_m, x_j) + \frac{1}{N^2}\sum_{m,n=1}^{N} k(x_m, x_n).$$
By substituting back into the expression for $\tilde{M}$, the centered correlation matrix becomes
$$\tilde{M} = M - \mathbf{1}_N M - M \mathbf{1}_N + \mathbf{1}_N M \mathbf{1}_N,$$
where $\mathbf{1}_N$ denotes the $N \times N$ matrix with all entries equal to $1/N$. As in the non-centered case, the correlation matrix can be entirely expressed using the kernel matrix. The normalized POD modes for centered data are then given by
$$\tilde{\varphi}_i = \frac{1}{\sqrt{\tilde{\lambda}_i}} \sum_{k=1}^{N} \tilde{v}_k^i\, \tilde{\Phi}(x_k),$$
where $\tilde{v}^i$ is the ith eigenvector of the centered correlation matrix defined above and $\tilde{\lambda}_i$ the associated eigenvalue. A short code sketch of this double centering follows this list.
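The double centering above (the same one used in kernel PCA) can be written as a one-line matrix identity. The sketch below assumes K is the raw N x N kernel matrix of a single class; the function name is illustrative.

```python
# One-line double centering of the kernel matrix, as in the centered
# formulation above. K is the raw N x N kernel matrix of a single class.
import numpy as np

def center_kernel_matrix(K):
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    # M~ = M - 1_N M - M 1_N + 1_N M 1_N, with 1_N the matrix of entries 1/N
    return K - one @ K - K @ one + one @ K @ one

# the centered POD eigenpairs are then, e.g.:  lam, V = np.linalg.eigh(center_kernel_matrix(K))
```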
2.4. Optimality of the POD Feature Subspaces
- Geometrical Optimality of POD Feature Spaces: As shown in [26], if $\Phi_r$ is the POD basis matrix constructed to represent the columns of a matrix $A$ and $\Psi_r$ is any other orthonormal basis of the same size constructed for the same purpose, then the following inequality holds:
$$\| A - \Phi_r \Phi_r^{T} A \|_F \;\le\; \| A - \Psi_r \Psi_r^{T} A \|_F,$$
where $\| \cdot \|_F$ is the Frobenius norm, defined by
$$\| A \|_F = \Big( \sum_{i,j} a_{ij}^2 \Big)^{1/2}.$$
This result demonstrates the geometric optimality of the POD basis: among all possible orthonormal bases of the same dimension, the POD basis provides the best approximation (in the least-squares sense) of the original data matrix A. Therefore, the POD basis vectors are the closest possible representatives of the original feature vectors, ensuring maximum fidelity in capturing the data structure. A small numerical illustration follows this list.
- Algebraic Optimality of POD Feature Spaces: For any truncation order r, the first r POD modes capture the largest possible share of the total energy of the data, namely $\sum_{i=1}^{r}\lambda_i$ out of $\sum_{i=1}^{N}\lambda_i$; equivalently, the squared Frobenius approximation error equals the sum of the discarded eigenvalues, $\sum_{i=r+1}^{N}\lambda_i$, which no other orthonormal basis of dimension r can improve upon.
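The following is purely a numerical illustration of the geometric optimality stated above, assuming the Euclidean inner product (for which the POD basis coincides with the leading left singular vectors of A): the rank-r POD projection never has a larger Frobenius error than a projection onto an arbitrary orthonormal basis of the same size. Matrix sizes and the random seed are arbitrary.

```python
# Numerical illustration of the geometric optimality above: the rank-r POD
# basis (leading left singular vectors of A for the Euclidean inner product)
# never gives a larger Frobenius reconstruction error than another orthonormal
# basis of the same size, here a random one. Sizes and seed are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
A, r = rng.standard_normal((50, 200)), 5

Phi = np.linalg.svd(A, full_matrices=False)[0][:, :r]   # POD basis
Psi = np.linalg.qr(rng.standard_normal((50, r)))[0]     # arbitrary orthonormal basis

frob_err = lambda B: np.linalg.norm(A - B @ B.T @ A, "fro")
assert frob_err(Phi) <= frob_err(Psi)                   # POD error is never larger
```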
2.5. Decision Criterion
- 1. Non-centered data formulation: The coordinates $\alpha_i$, $i = 1, \dots, r$, are given by the projection of the element $\Phi(x)$ on the POD modes:
$$\alpha_i = \langle \Phi(x), \varphi_i \rangle, \quad \text{with} \quad \varphi_i = \frac{1}{\sqrt{\lambda_i}} \sum_{k=1}^{N} v_k^i\, \Phi(x_k).$$
Then
$$\alpha_i = \frac{1}{\sqrt{\lambda_i}} \sum_{k=1}^{N} v_k^i\, k(x, x_k).$$
Using the Pythagoras rule, the distance of $\Phi(x)$ to the POD subspace F is given by
$$d\big(\Phi(x), F\big)^2 = \| \Phi(x) \|^2 - \sum_{i=1}^{r} \alpha_i^2 = k(x, x) - \sum_{i=1}^{r} \alpha_i^2.$$
In the above, $\| \Phi(x) \|^2 = \langle \Phi(x), \Phi(x) \rangle = k(x, x)$.
- 2. Centered data formulation: For the centered case, $\Phi(x)$ and $\Phi(x_k)$ are replaced by $\tilde{\Phi}(x) = \Phi(x) - \bar{\Phi}$ and $\tilde{\Phi}(x_k)$, respectively. That is,
$$\alpha_i = \langle \tilde{\Phi}(x), \tilde{\varphi}_i \rangle, \quad \text{with} \quad \tilde{\varphi}_i = \frac{1}{\sqrt{\tilde{\lambda}_i}} \sum_{k=1}^{N} \tilde{v}_k^i\, \tilde{\Phi}(x_k).$$
Then
$$\alpha_i = \frac{1}{\sqrt{\tilde{\lambda}_i}} \sum_{k=1}^{N} \tilde{v}_k^i \Big[ k(x, x_k) - \frac{1}{N}\sum_{m=1}^{N} k(x, x_m) - \frac{1}{N}\sum_{m=1}^{N} k(x_m, x_k) + \frac{1}{N^2}\sum_{m,n=1}^{N} k(x_m, x_n) \Big].$$
By substituting, we obtain
$$d\big(\tilde{\Phi}(x), \tilde{F}\big)^2 = \| \tilde{\Phi}(x) \|^2 - \sum_{i=1}^{r} \alpha_i^2, \qquad \| \tilde{\Phi}(x) \|^2 = k(x, x) - \frac{2}{N}\sum_{m=1}^{N} k(x, x_m) + \frac{1}{N^2}\sum_{m,n=1}^{N} k(x_m, x_n).$$
The new sample is assigned to the class whose subspace yields the smallest distance. A code sketch of the centered criterion follows this list.
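The centered decision criterion can be evaluated with kernel values only. The sketch below assumes K is the class kernel matrix, kx the vector of k(x, x_k) over the class training points, kxx = k(x, x), and (lam, V) the leading eigenpairs of the centered kernel matrix (for instance, as returned by the centering sketch in Section 2.3). All names are illustrative.

```python
# Kernel-only evaluation of the centered distance criterion above, for one
# class. Inputs (illustrative names): K is the class kernel matrix, kx the
# vector k(x, x_k) over the class training points, kxx = k(x, x), and lam, V
# the leading eigenpairs of the centered kernel matrix.
import numpy as np

def centered_distance_sq(kx, kxx, K, V, lam):
    kbar = K.mean()                              # (1/N^2) * sum_{m,n} k(x_m, x_n)
    # centered cross-kernel <Phi~(x), Phi~(x_k)> for every training point x_k
    kx_c = kx - kx.mean() - K.mean(axis=0) + kbar
    alpha = (kx_c @ V) / np.sqrt(lam)            # projections onto the centered modes
    norm_sq = kxx - 2.0 * kx.mean() + kbar       # ||Phi(x) - Phi_bar||^2
    return norm_sq - (alpha ** 2).sum()          # Pythagoras: squared distance to F~
```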
2.6. Imbalanced Classes
2.7. RBF Parameter Learning
2.8. Some Remarks
2.9. Summary of the Algorithm
3. Results: Validation and Discussion
3.1. 2D Test Cases
3.2. Higher-Dimension Test Cases
3.2.1. KOS vs. SVM
3.2.2. Comparison with KNN and KPCA
3.2.3. Feature Space Orthogonality
3.2.4. CPU Time Discussion
3.2.5. Complexity Analysis
3.3. KOS: Main Advantages Compared to SVM
- 1.
- 2.
- Robustness: The KOS algorithm is robust and fast; it requires no optimization process, which, in the case of SVM, can slow the algorithm down in high dimensions and can even fail, despite the quadratic (convex) cost function, because of round-off errors.
- 3.
- Complexity: SVM complexity grows quadratically with the number of classes (requiring on the order of n^2 separating hyperplanes), while KOS scales linearly, needing only n sets of eigenpairs.
- 4.
- Time efficiency: As a result of the two previous properties, KOS is significantly faster than SVM, by a factor of more than 60 in certain tests, as shown in Table 7.
- 5.
- Parallelization: KOS is highly parallelizable with respect to the number of classes since all subspaces are independent.
- 6.
- Dynamic classification: In the case of class creation or cancellation, SVM requires recalculating all feature space hyperplanes. By contrast, KOS only requires computing eigenpairs of the Mercer kernel matrix for the new class, with no update needed for canceled classes.
- 7.
- Imbalanced classes: In KOS, this issue is naturally handled by subdividing large classes into smaller, balanced subclasses (see Section 2.6 and the sketch after this list). This avoids the need for artificial attributes, which may distort the data structure, or for balancing weights, which often lack a clear and systematic procedure.
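As an illustration of the last point, one simple way to subdivide an over-represented class into roughly balanced subclasses is a lightweight clustering pass. The k-means-style split and the parameter target_size below are assumptions of this sketch, not the exact splitting rule of Section 2.6.

```python
# One possible realization of the subdivision idea (point 7 / Section 2.6):
# split an over-represented class into roughly equal subclasses, each of which
# then receives its own POD subspace. The k-means-style split and target_size
# are assumptions of this sketch, not the paper's exact rule.
import numpy as np

def subdivide_class(X, target_size):
    X = np.asarray(X, dtype=float)
    n_sub = max(1, int(np.ceil(len(X) / target_size)))
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), n_sub, replace=False)]
    for _ in range(10):                                        # a few Lloyd iterations
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(n_sub):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return [X[labels == j] for j in range(n_sub) if np.any(labels == j)]
```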
4. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Proof of the POD Best Representatives
References
- Alpaydin, E. Introduction to Machine Learning, 4th ed.; MIT: Cambridge, MA, USA, 2020; pp. xix, 1–3, 13–18. [Google Scholar]
- Vapnik, V. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995. [Google Scholar]
- Vapnik, V. Statistical Learning Theory; Wiley: New York, NY, USA, 1998. [Google Scholar]
- Cortes, C.; Vapnik, V. Support vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Mercer, J. Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. R. Soc. 1909, 209, 415–446. [Google Scholar] [CrossRef]
- Wang, H.; Li, G.; Wang, Z. Fast SVM classifier for large-scale classification problems. Inf. Sci. 2023, 642, 119136. [Google Scholar] [CrossRef]
- Shao, Y.H.; Lv, X.J.; Huang, L.W.; Bai, L. Twin SVM for conditional probability estimation in binary and multiclass classification. Pattern Recognit. 2023, 136, 109253. [Google Scholar] [CrossRef]
- Wang, H.; Shao, Y. Fast generalized ramp loss support vector machine for pattern classification. Pattern Recognit. 2024, 146, 109987. [Google Scholar] [CrossRef]
- Wang, B.Q.; Guan, X.P.; Zhu, J.W.; Gu, C.C.; Wu, K.J.; Xu, J.J. SVMs multi-class loss feedback based discriminative dictionary learning for image classification. Pattern Recognit. 2021, 112, 107690. [Google Scholar] [CrossRef]
- Borah, P.; Gupta, D. Functional iterative approaches for solving support vector classification problems based on generalized Huber loss. Neural Comput. Appl. 2020, 32, 1135–1139. [Google Scholar] [CrossRef]
- Gaye, B.; Zhang, D.; Wulamu, A. Improvement of Support Vector Machine Algorithm in Big Data Background. Hindawi Math. Probl. Eng. 2021, 2021, 5594899. [Google Scholar] [CrossRef]
- Tian, Y.; Shi, Y.; Liu, X. Advances on support vector machines research. Technol. Econ. Dev. Econ. 2012, 18, 5–33. [Google Scholar] [CrossRef]
- Ayat, N.E.; Cheriet, M.; Remaki, L.; Suen, C.Y. KMOD—A New Support Vector Machine Kernel with Moderate Decreasing for Pattern Recognition. Application to Digit Image Recognition. In Proceedings of the Sixth International Conference on Document Analysis and Recognition, Seattle, WA, USA, 10–13 September 2001; pp. 1215–1219. [Google Scholar]
- Remaki, L. Efficient Alternative to SVM Method in Machine Learning. In Intelligent Computing; Arai, K., Ed.; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2025; Volume 1426. [Google Scholar]
- Yoshikazu, W.; Nakayama, Y. Learning subspace classification using subset approximated kernel principal component analysis. IEICE Trans. Inf. Syst. 2016, 99, 1353–1363. [Google Scholar] [CrossRef]
- Jiang, W.; Chen, Y.; Wu, L.; Yu, P.S. Subspace learning for effective meta-learning. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 10177–10194. [Google Scholar]
- Cao, Y.-H.; Wu, J.X. Random subspace sampling for classification with missing data. J. Comput. Sci. Technol. 2024, 39, 472–486. [Google Scholar] [CrossRef]
- Schölkopf, B.; Smola, A.; Müller, K.R. Kernel principal component analysis. In Artificial Neural Networks—ICANN’97; Springer: Berlin/Heidelberg, Germany, 1997; pp. 583–588. [Google Scholar]
- Zhou, S.; Ou, Q.; Liu, X.; Wang, S.; Liu, L.; Wang, S. Multiple Kernel Clustering with Compressed Subspace Alignment. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 252–263. [Google Scholar] [CrossRef]
- Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27. [Google Scholar] [CrossRef]
- Lin, S.W.; Ying, K.C.; Chen, S.C.; Lee, Z.J. Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Syst. Appl. 2008, 35, 1817–1824. [Google Scholar] [CrossRef]
- Syarif, I.; Prugel-Bennett, A.; Wills, G. SVM Parameter Optimization Using Grid Search and Genetic Algorithm to Improve Classification Performance. TELKOMNIKA 2016, 14, 1502–1509. [Google Scholar] [CrossRef]
- Shekar, B.H.; Dagnew, G. Grid Search-Based Hyperparameter Tuning and Classification of Microarray Cancer Data. In Proceedings of the Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), Gangtok, India, 25–28 February 2019. [Google Scholar]
- Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
- Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
- Volkwein, S. Proper Orthogonal Decomposition: Theory and Reduced-Order Modelling; Lecture Notes; University of Konstanz: Konstanz, Germany, 2013; Volume 4. [Google Scholar]
- Wang, W.; Zhang, M.; Wang, D.; Jiang, Y. Kernel PCA feature extraction and the SVM classification algorithm for multiple-status, through-wall, human being detection. EURASIP J. Wirel. Commun. Netw. 2017, 2017, 151. [Google Scholar] [CrossRef]
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. Smote: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
- Remaki, L.; Cheriet, M. KCS—New kernel family with compact support in scale space: Formulation and impact. IEEE Trans. Image Process. 2000, 9, 970–981. [Google Scholar] [CrossRef]
- Koenderink, J.J. The structure of images. Biol. Cybern. 1984, 53, 363–370. [Google Scholar] [CrossRef]
- Vanschoren, J.; Van Rijn, J.N.; Bischl, B.; Torgo, L. OpenML: Networked science in machine learning. SIGKDD Explor. Newsl. 2014, 15, 49–60. [Google Scholar] [CrossRef]
- Golub, T.R.; Slonim, D.K.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J.P.; Coller, H.; Loh, M.L.; Downing, J.R.; Caligiuri, M.A.; et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 1999, 286, 531. [Google Scholar] [CrossRef] [PubMed]
- Hsu, C.W.; Chang, C.C.; Lin, C.J. A Practical Guide to Support Vector Classification; Technical Report; Department of Computer Science, National Taiwan University: Taipei, Taiwan, 2003. [Google Scholar]
- Available online: http://archive.ics.uci.edu/ml/index.php (accessed on 5 July 2025).
- Available online: https://www.csie.ntu.edu.tw/cjlin/libsvmtools/datasets/ (accessed on 5 July 2025).
- Guyon, I.; Gunn, S.; Ben Hur, A.; Dror, G. Result analysis of the NIPS 2003 feature selection challenge. Adv. Neural Inf. Process. Syst. 2004, 17, 545–552. [Google Scholar]
- Hsu, C.W.; Lin, C.J. A comparison of methods for multi-class support vector machines. IEEE Trans. Neural Netw. 2002, 13, 415–425. [Google Scholar] [PubMed]
- Hull, J.J. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–554. [Google Scholar] [CrossRef]
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 16, Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
- Andrews, T. Computation Time Comparison Between Matlab and C++ Using Launch Windows. Available online: https://digitalcommons.calpoly.edu/aerosp/78/ (accessed on 22 September 2024).
| Test Name | No. of Classes | Description/Reference | Training Set Size / Class Sizes | Test Set Size / Feature Vector Size |
|---|---|---|---|---|
| Leukemia | 2 | Molecular classification of cancer [32] | 38 (27 / 11) | 34 / 7129 |
| svmguide1 | 2 | Astroparticle application [33] | 3089 | 4000 / 4 |
| splice | 2 | Splice junctions in a DNA sequence [34] | 1000 | 2175 / 60 |
| austrian | 2 | Credit Approval dataset [35] | 349 | 341 / 14 |
| madelon | 2 | Analysis of the NIPS 2003 feature selection challenge [36] | 2000 | 600 / 14 |
| Test Name | No. of Classes | Description/Reference | Training Set Size / Class Sizes | Test Set Size / Feature Vector Size |
|---|---|---|---|---|
| DNA | 3 | DNA [37] | 2000 | 1186 / 180 |
| Satimage | 6 | Satellite images [37] | 4435 | 200 / 36 |
| USPS | 10 | Handwritten text recognition dataset [38] | 7291 | 2007 / 256 |
| letter | 26 | Letter recognition dataset [38] | 15,000 | 5000 / 16 |
| shuttle | 7 | Space shuttle sensors [38] | 43,500 (largest class: 34,108) | 14,500 / 9 |
| Name/Classes | Test Size | SVM Accuracy|Precision | KOS Accuracy|Precision |
|---|---|---|---|
| Leukemia/2 | Train.#38 Test.#34 | ||
| svmguide1/2 | Train.#3089 Test.#4000 | ||
| splice/2 | Train.#1000 Test.#2175 | ||
| austrian/2 | Train.#349 Test.#341 | ||
| madelon/2 | Train.#2000 Test.#600 | ||
| DNA/3 | Train.#2000 Test.#1186 | ||
| Satimage/6 | Train.#4435 Test.#200 | ||
| USPS/10 | Train.#7291 Test.#2007 | ||
| letter/26 | Train.#15,000 Test.#5000 | ||
| shuttle/7 | Train.#43,500 Test.#14,500 |
| Name/Classes | Test Size | SVM Accuracy|Precision | KOS Accuracy|Precision |
|---|---|---|---|
| BreastCancer/2 | Train.#398 Test.#171 | ||
| Zernike/10 | Train.#1400 Test.#600 | ||
| Diabetes/2 | Train.#537 Test.#231 | ||
| mfeat-morphological/10 | Train.#1400 Test.#600 |
| Name/Classes | KNN Accuracy|Precision | KPCA Accuracy|Precision | KOS Accuracy|Precision |
|---|---|---|---|
| Leukemia/2 | |||
| svmguide1/2 | |||
| splice/2 | |||
| austrian/2 | |||
| madelon/2 | |||
| DNA/3 | |||
| Satimage/6 | |||
| USPS/10 | |||
| letter/26 | |||
| shuttle/7 | |||
| BreastCancer/2 | |||
| Zernike/10 | |||
| Diabetes/2 | |||
| mfeat-morphological/10 |
| Name/Classes | PA Average | PA Standard Deviation | CD |
|---|---|---|---|
| Leukemia/2 | |||
| svmguide1/2 | |||
| splice/2 | |||
| austrian/2 | |||
| madelon/2 | |||
| DNA/3 | |||
| Satimage/6 | |||
| USPS/10 | |||
| letter/26 | |||
| shuttle/7 | |||
| BreastCancer/2 | |||
| Zernike/10 | |||
| Diabetes/2 | |||
| mfeat-morphological/10 |
| Name/Classes | Test Size | SVM Processing Time | KOS Processing Time |
|---|---|---|---|
| Leukemia/2 | Train.#38 Test.#34 | 0m8.646s | |
| svmguide1/2 | Train.#3089 Test.#4000 | 0m33.327s | |
| splice/2 | Train.#1000 Test.#2175 | 0m29.190s | |
| austrian/2 | Train.#349 Test.#341 | 0m1.721s | |
| madelon/2 | Train.#2000 Test.#600 | 16m50.721s | |
| DNA/3 | Train.#2000 Test.#1186 | 4m24.300s | |
| Satimage/6 | Train.#4435 Test.#200 | 3m53.042s | |
| USPS/10 | Train.#7291 Test.#2007 | 81m18.931s | |
| letter/26 | Train.#15,000 Test.#5000 | 44m11.996s | |
| shuttle/7 | Train.#43,500 Test.#14,500 | 62m28.783s | |
| BreastCancer/2 | Train.#398 Test.#171 | 0m1.696s | |
| Zernike/10 | Train.#1400 Test.#600 | 0m48.875s | |
| Diabetes/2 | Train.#537 Test.#231 | 0m6.959s | |
| mfeat-morphological/10 | Train.#1400 Test.#600 | 0m23.421s |
| Name/Classes | KNN Processing Time | KPCA Processing Time | KOS Processing Time |
|---|---|---|---|
| Leukemia/2 | 0m0.0630s | 0m0.28780s | |
| svmguide1/2 | 0m1.2287s | 0m0.8382s | |
| splice/2 | 0m0.5270s | 0m0.66s | |
| austrian/2 | 0m0.2475s | 0m0.1202s | |
| madelon/2 | 0m0.8191s | 0m0.541s | |
| DNA/3 | 0m1.0193s | 0m0.6386s | |
| Satimage/6 | 1m35.10s | 0m0.5332s | |
| USPS/10 | 0m24.9070s | 0m1.3262s | |
| letter/26 | 2m47.025s | 0m4.1978s | |
| shuttle/7 | 0m3.2707s | 0m56.142s | |
| BreastCancer/2 | 0m0.3280s | 0m0.2946s | |
| Zernike/10 | 0m1.0684s | 0m0.5318s | |
| Diabetes/2 | 0m0.4419s | 0m0.3174s | |
| mfeat-morphological/10 | 0m0.743s | 0m0.3104s | |