Article
Peer-Review Record

Kernel-Based Optimal Subspaces (KOS): A Method for Data Classification

Mach. Learn. Knowl. Extr. 2026, 8(2), 52; https://doi.org/10.3390/make8020052
by Lakhdar Remaki
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 24 September 2025 / Revised: 2 February 2026 / Accepted: 5 February 2026 / Published: 22 February 2026
(This article belongs to the Section Data)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The author proposes a method that builds a POD-based kernel subspace (for each class) and then performs classification by the minimum distance to these subspaces. Thus, no convex optimization is required. The author claims linear scaling in the number of classes and straightforward parallelism.

The proposed kernel classifiers and non-iterative SVM alternatives remain relevant today. However, the presented experiments rely on LIBSVM datasets and are compared only to SVM/LIBSVM, which limits the ability to evaluate the level of novelty. Therefore, in addition to SVM, it would be good from an innovation point of view to include up-to-date methods such as k-NN, logistic/softmax, RF/GBDT, LS-SVM or kernel ridge, and prototype/(affine) subspace methods.

Additionally, only accuracy (“rate of success”) is used, even for imbalanced sets. It would be appropriate to add other metrics that reflect imbalance and to evaluate the class-splitting strategy with these metrics as well.

Other inconsistencies:

- Tables contain inconsistencies. svmguide1: Test 400 in Table 1 vs 4000 in Table 3; madelon: Train 200 in Table 1 vs 2000 in Table 3.

- Timing gaps without protocol, e.g., USPS: 81m 18.9s (SVM) vs 6.35s (KOS), but hardware, implementation details, and tuning budgets are not reported.

Finally, the implementation code would be helpful to evaluate the method. 

Comments on the Quality of English Language

Please revise document removing typos (e.g., “claculate the distnace”).

Author Response

This report presents the author’s responses to the reviewers’ comments on the manuscript submitted to the Machine Learning and Knowledge Extraction journal for revision. The author is very grateful for the reviewers’ constructive feedback and the time they invested, which have significantly contributed to improving the quality of the paper.

Comment 1: The proposed kernel classifiers and non-iterative SVM alternatives remain relevant today. However, the presented experiments rely on LIBSVM datasets and are compared only to SVM/LIBSVM, which limits the ability to evaluate the level of novelty. Therefore, in addition to SVM, it would be good from an innovation point of view to include up-to-date methods such as k-NN, logistic/softmax, RF/GBDT, LS-SVM or kernel ridge, and prototype/(affine) subspace methods.

Response 1: As suggested by the reviewers, and for thorough validation, the proposed method has been compared with other kernel-based methods, namely K-NN (kernel nearest neighbors) and KPCA. In addition to the LIBSVM datasets, all methods were also evaluated using datasets from the OpenML database. The results confirm the performance of the proposed method. These changes are reported in the revised manuscript in the Validation section (pp. 13–19).

Comment 2: Additionally, only accuracy (“rate of success”) is used, even for imbalanced sets. It would be appropriate to add other metrics that reflect imbalance and to evaluate the class-splitting strategy with these metrics as well.

Response 2:  I agree with the reviewer that accuracy alone is not sufficient to fairly assess performance. In the revised manuscript, macro-precision is also reported. This metric accounts for false positives in each class independently of class size, making it appropriate for evaluating imbalanced datasets. The results show consistent behavior across all tests. These changes are included in the revised manuscript in the Validation section (pp. 13–19).
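To make the metric concrete, the following minimal sketch computes macro-precision as the unweighted mean of per-class precision. The function name `macro_precision` and the toy labels are illustrative assumptions and are not taken from the manuscript; a class that is never predicted is assigned precision 0 here, which is one common convention among several.

```python
from collections import Counter

def macro_precision(y_true, y_pred):
    """Unweighted mean over classes of precision = TP / (TP + FP)."""
    classes = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)  # TP + FP for each class
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)  # TP for each class
    per_class = [
        # Assumption: precision is taken as 0 when a class is never predicted.
        correct[c] / predicted[c] if predicted[c] else 0.0
        for c in classes
    ]
    return sum(per_class) / len(per_class)

# Hypothetical imbalanced toy labels: 8 samples of class 0, 2 of class 1.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 6 + [1] * 4  # two class-0 samples are wrongly assigned to class 1
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("macro-precision:", macro_precision(y_true, y_pred))
```

Because every class contributes equally to the average, false positives in a small class weigh as much as those in a large one, which is the property the response relies on.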

Comment 3: Tables contain inconsistencies. svmguide1: Test 400 in Table 1 vs 4000 in Table 3; madelon: Train 200 in Table 1 vs 2000 in Table 3.

Response 3: Corrected in the revised manuscript.

Comment 4: Timing gaps without protocol, e.g., USPS: 81m 18.9s (SVM) vs 6.35s (KOS), but hardware, implementation details, and tuning budgets are not reported.

Response 4: I agree with the reviewer's comment; additional implementation details are needed. The proposed method is implemented in MATLAB and, based on the common observation that C++ can be up to 100 times faster than MATLAB in certain cases, the processing time of the proposed method has been scaled by a factor of 1/50. The same adjustment has been applied to K-NN and KPCA, which are implemented in Python. It is also noted that, even without rescaling, the proposed method remains faster. Further details are presented in the new subsection “CPU Time Discussion” (page 14).

Comment 5: Finally, the implementation code would be helpful to evaluate the method.

Response 5: A MATLAB version of the code has been uploaded to the system. Since I am not an expert in programming, the code could be further optimized for better computational efficiency.

Comment 6: Please revise document removing typos (e.g., “claculate the distnace”).

Response 6: Typos have been identified and corrected in the revised version.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes a data classification method based on kernel methods and optimal subspaces (KOS), aiming to address the limitations of support vector machines (SVM) in high-dimensional, multi-class, imbalanced, and dynamic classification settings. The method is theoretically innovative, the experimental design is comprehensive, and KOS has been thoroughly compared with SVM on multiple datasets, showing comparable or even better performance, especially in terms of computational efficiency and scalability. The paper has a clear structure and a complete theoretical derivation, and it offers good academic value and application potential.

1. It is recommended to supplement the comparison with other non-SVM kernel methods (such as kernel PCA or kernel Fisher discriminant analysis) or with deep learning methods popular in recent years, to evaluate the competitiveness of KOS more comprehensively.

2. Although the paper mentions that "theoretically non-overlapping subspaces can be constructed", there is no in-depth discussion of how to control the overlap of subspaces in practice. It is suggested to add measures of subspace separability (such as subspace angles and overlap analysis) and related experiments.

3. The article mentions the "linear complexity" of KOS but does not provide a specific formula for time/space complexity. It is suggested to add a theoretical complexity analysis and compare it with SVM.

4. The current σ learning strategy relies on a fixed proportion of category diameters (0.1~1.0) and lacks a theoretical or adaptive basis. It is suggested to further explore an adaptive σ selection mechanism (for example, based on intra-class distribution density or cross-validation).

This paper proposes a promising alternative to SVM with a solid theoretical foundation, sufficient experimental verification, and good innovation and practicality. It is suggested that the author address the above questions (especially the comparison of methods and the parameter learning strategy) and make the corresponding modifications.

Author Response

Comment 1: It is recommended to supplement the comparison with other non-SVM kernel methods (such as kernel PCA or kernel Fisher discriminant analysis) or with deep learning methods popular in recent years, to evaluate the competitiveness of KOS more comprehensively.

Response 1: As suggested by the reviewers, and for thorough validation, the proposed method has been compared with other kernel-based methods, namely K-NN (kernel nearest neighbors) and KPCA. In addition to the LIBSVM datasets, all methods were also evaluated using datasets from the OpenML database. The results confirm the performance of the proposed method. These changes are reported in the revised manuscript in the Validation section (pp. 13–19).

Comment 2: Although the paper mentions that "theoretically non-overlapping subspaces can be constructed", there is no in-depth discussion of how to control the overlap of subspaces in practice. It is suggested to add measures of subspace separability (such as subspace angles and overlap analysis) and related experiments.

Response 2 : I agree with the reviewer that this aspect is relevant to the successful performance of the method. Unfortunately, it is still not clear how to control subspace separability, in the same way as in SVM or other kernel-based methods. However, as suggested, I computed the principal angles and chordal distances to estimate subspace orthogonality for all tests. The results show a strong correlation between orthogonality and performance, except in two cases, which can be explained by data representativity issues. Details are reported in Section 3.4 (page 14) and on page 21.
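As a rough illustration of such a separability measure, the sketch below computes a chordal distance between two subspaces of equal dimension k from orthonormal bases U and V, using the identity sqrt(sum_i sin^2 θ_i) = sqrt(k − ||UᵀV||_F²), where the principal angles θ_i are the arccosines of the singular values of UᵀV. This is a generic construction and not the manuscript's implementation; the helper names, the equal-dimension restriction, and the unnormalized definition of the distance are assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormalize(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis

def chordal_distance(span_a, span_b):
    """sqrt(sum_i sin^2(theta_i)) for two subspaces of equal dimension k."""
    U, V = orthonormalize(span_a), orthonormalize(span_b)
    if len(U) != len(V):
        raise ValueError("this sketch assumes equal subspace dimensions")
    # ||U^T V||_F^2 equals the sum of cos^2 of the principal angles.
    cos2_sum = sum(dot(u, v) ** 2 for u in U for v in V)
    return math.sqrt(max(len(U) - cos2_sum, 0.0))

# Two orthogonal planes in R^4: both principal angles are 90 degrees.
plane_xy = [[1, 0, 0, 0], [0, 1, 0, 0]]
plane_zw = [[0, 0, 1, 0], [0, 0, 0, 1]]
print(chordal_distance(plane_xy, plane_zw))
```

Under this definition the distance ranges from 0 (identical subspaces) to sqrt(k) (mutually orthogonal subspaces), so larger values indicate better separated class subspaces.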

Comment 3: The article mentions the "linear complexity" of KOS but does not provide a specific formula for time/space complexity. It is suggested to add a theoretical complexity analysis and compare it with SVM.

Response 3: As suggested by the reviewer, I added a subsection on complexity analysis, providing the necessary formalism for both KOS and SVM. Details are reported in Section 3.6 (page 15).

Comment 4: The current σ learning strategy relies on a fixed proportion of category diameters (0.1~1.0) and lacks a theoretical or adaptive basis. It is suggested to further explore an adaptive σ selection mechanism (for example, based on intra-class distribution density or cross-validation).

Response 4: I agree with the reviewer. In fact, the choice of this strategy was based on multiscale theory in image processing, but this was not mentioned in the submitted version. Since σ corresponds to a Gaussian opening, setting σ equal to the class diameter allows a global observation of the class and can therefore be considered the maximum value of σ. Small adjustments are then required by decreasing the σ value, which explains the choice of the range from 0.1 to 1.
For all performed tests, the optimal value of σ was approximately 0.4 × (class diameter). Note also that the class diameter can be viewed as a form of intra-class distribution, as suggested by the reviewer. I also tested the intra-class covariance; however, the class diameter provided better results. Details are reported in Section 2.7 (page 10).
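The rule described in this response can be sketched as follows. The sketch assumes that the class diameter is the largest pairwise Euclidean distance within a class and that the kernel is parameterized as exp(−‖x − y‖² / (2σ²)); both are assumptions for illustration, since the response does not state the exact definitions, and the function names are hypothetical.

```python
import math
from itertools import combinations

def class_diameter(samples):
    """Largest pairwise Euclidean distance within one class (assumed definition)."""
    return max(math.dist(a, b) for a, b in combinations(samples, 2))

def rbf_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) (assumed parameterization)."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))

def sigma_candidates(samples, omegas=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Candidate bandwidths sigma = omega * D for omega in the range 0.1 to 1.0."""
    diameter = class_diameter(samples)
    return [(omega, omega * diameter) for omega in omegas]

# Hypothetical class with three 2-D samples; its diameter is 5.
samples = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
for omega, sigma in sigma_candidates(samples):
    print(omega, sigma, rbf_kernel(samples[0], samples[1], sigma))
```

In practice each candidate would be scored on held-out data; the response reports that a value near ω = 0.4 was best in all of the author's tests.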

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper introduces the KOS method, an alternative to SVM for data classification. KOS eliminates the need for optimization by classifying samples based on minimum distance to class-specific feature subspaces constructed via POD. The method aims to match SVM accuracy while improving computational speed, robustness, and scalability to multiclass and dynamic settings. Here are my questions/comments: 

  • The paper claims linear complexity and no optimization required, yet each class requires eigen-decomposition of a kernel matrix. How does this scale when classes or feature dimensions grow large? What are the actual asymptotic costs in terms of N and class count?

  • The method relies heavily on POD eigenpairs. How is numerical stability ensured for ill-conditioned or high-dimensional kernels?

  • The σ-scaling rule (σ = ωD) is heuristic. Why should dataset diameter correlate with optimal kernel bandwidth? Was this empirically tested across all datasets?

  • The claim of comparable performance to SVM is based solely on accuracy. What about metrics like precision, recall, F1, or robustness under noisy labels and outliers?

  • There’s no statistical significance testing for reported accuracy differences. How confident are the conclusions?

  • KOS is said to “naturally handle” imbalanced data. Does subclass splitting risk overfitting or fragmenting class structure? How are subclass boundaries defined objectively?

  • The assumption that infinite-dimensional kernels guarantee separability is theoretical. How practical is this for finite datasets with limited samples?

  • If two class subspaces overlap, how does the method handle ambiguous points? Is there a probabilistic or confidence-based extension?

  • The comparison omits other modern classifiers (e.g., Random Forests, Deep NNs, GPR). Why benchmark only against SVM?

  • Runtime improvements are large, but were both methods implemented under comparable optimization and hardware settings?

Comments on the Quality of English Language

The English is generally understandable but would benefit from careful editing for clarity, conciseness, and consistency. Some sentences are overly long and contain grammatical or structural issues that obscure meaning. Improving transitions, reducing repetition, and standardizing terminology (e.g., between “attributes,” “features,” and “vectors”) would make the paper more readable and professional.

Author Response

This report presents the author’s responses to the reviewers’ comments on the manuscript submitted to the Machine Learning and Knowledge Extraction journal for revision. The author is very grateful for the reviewers’ constructive feedback and the time they invested, which have significantly contributed to improving the quality of the paper.

Comment 1: The paper claims linear complexity and no optimization required, yet each class requires eigen-decomposition of a kernel matrix. How does this scale when classes or feature dimensions grow large? What are the actual asymptotic costs in terms of N and class count?

Response 1: I agree with the reviewer that this point was not sufficiently clear in the submitted version. The linear scaling refers only to the number of classes. The reviewer is correct: the scaling with respect to class size is not linear but cubic. A detailed subsection on complexity analysis has therefore been added, providing the necessary formalism for both KOS and SVM. Details are reported in Section 3.6 (page 15).
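The two regimes can be summarized in a rough expression. It assumes that a dense eigendecomposition of each class kernel matrix dominates the training cost; the symbols C (number of classes) and n_c (number of samples in class c) are introduced here for illustration, and the exact expressions of Section 3.6 of the manuscript are not reproduced.

```latex
T_{\text{train}} \;=\; \sum_{c=1}^{C} O\!\left(n_c^{3}\right)
\;=\; O\!\left(C\,n^{3}\right) \quad \text{when } n_c = n \text{ for all } c .
```

For a fixed class size the cost grows linearly with C, while for a fixed number of classes it grows cubically with the class size.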

Comment 2: The method relies heavily on POD eigenpairs. How is numerical stability ensured for ill-conditioned or high-dimensional kernels?

Response 2: Yes, this is true. Even though the RBF covariance matrix is symmetric, it can be ill-conditioned, especially for large values of σ. In such cases, numerical stability is ensured through matrix regularization. Note that no stability issues were observed in any of the performed test cases.
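The response does not specify the form of regularization. A common choice, assumed here purely for illustration, is a diagonal shift K + εI. The sketch uses the smallest possible case, two samples, where the RBF matrix K = [[1, k], [k, 1]] has eigenvalues 1 + k and 1 − k, so the effect of the shift on the condition number can be written in closed form. The function names and the value of ε are hypothetical.

```python
import math

def rbf(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) (assumed parameterization)."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))

def condition_number_two_points(x, y, sigma, eps=0.0):
    """Condition number of the 2x2 matrix K + eps*I built from samples x and y.

    K = [[1, k], [k, 1]] has eigenvalues 1 + k and 1 - k, so the shift eps*I
    moves them to 1 + k + eps and 1 - k + eps.
    """
    k = rbf(x, y, sigma)
    return (1.0 + k + eps) / (1.0 - k + eps)

x, y = (0.0, 0.0), (1.0, 0.0)
# A large sigma makes k close to 1, so the smaller eigenvalue is close to 0.
print("no regularization:", condition_number_two_points(x, y, sigma=100.0))
print("with eps = 1e-3:  ", condition_number_two_points(x, y, sigma=100.0, eps=1e-3))
```

The shift bounds the smallest eigenvalue away from zero at the price of a small bias in the eigenvalues, which is why ε is usually kept several orders of magnitude below the largest eigenvalue.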

Comment 3: The σ-scaling rule (σ = ωD) is heuristic. Why should dataset diameter correlate with optimal kernel bandwidth? Was this empirically tested across all datasets?

Response 3: I agree with the reviewer that this point was not sufficiently clear. In fact, the choice of this strategy was based on multiscale theory in image processing, but this was not mentioned in the submitted version. In image processing, the bandwidth σ, which corresponds to the Gaussian opening, allows the observation of the largest objects. Therefore, setting σ equal to the class diameter enables a global observation of the class and can be considered the maximum value of σ. Small adjustments are then required to avoid missing important details or overfitting, which is why I suggested considering σ proportional to the class diameter. Moreover, for the RBF kernel, vectors that are far apart may produce many values close to zero, which can lead to an ill-conditioned matrix. Choosing σ proportional to the class diameter helps avoid this issue.

Yes, it was tested across all datasets, and for all performed tests the optimal value of σ was approximately 0.4 × D. This is now discussed in the revised manuscript in Section 2.7 (page 10).

Comment 4:  The claim of comparable performance to SVM is based solely on accuracy. What about metrics like precision, recall, F1, or robustness under noisy labels and outliers?

Response 4: I agree with the reviewer that accuracy alone is not sufficient to fairly assess performance. In the revised manuscript, macro-precision is also reported. This metric accounts for false positives in each class independently of class size, making it appropriate for evaluating imbalanced datasets. The results show consistent behavior across all tests. These changes are included in the revised manuscript in the Validation section (pp. 18–20).

Comment 5: There’s no statistical significance testing for reported accuracy differences. How confident are the conclusions?

Response 5: Additional tests were performed on a different database (OpenML) in the new version, and the same results were obtained. The results are similar to those of the SVM method. This is expected, as the underlying concept is the same. KOS differs in its classification approach, as mentioned in the introduction of Section 2.3 (page 5), with the objective of addressing certain weaknesses of SVM rather than improving accuracy.

Comment 6: KOS is said to “naturally handle” imbalanced data. Does subclass splitting risk overfitting or fragmenting class structure? How are subclass boundaries defined objectively?

Response 6: I agree with the reviewer that this is a crucial question. To my knowledge, even the vector space structure of the class data is heuristic and has not been formally proven. For imbalanced classes, the idea is to represent the data as a collection of smaller subspaces rather than a single subspace. While the performed tests on imbalanced classes show that this strategy improves the results, there is still no proof that it will not alter the underlying class structure as mentioned by the reviewer. In this work, the splitting is done randomly; I believe that a more informed strategy based on the data structure should be investigated in future work.

Comment 7: The assumption that infinite-dimensional kernels guarantee separability is theoretical. How practical is this for finite datasets with limited samples?

Response 7: I agree with the reviewer that this aspect is relevant to the successful performance of the method. Unfortunately, it is still not clear how to control subspace separability in the same way as in SVM or other kernel-based methods. This actually does not depend on the size of the dataset but rather on the structure of the samples, as shown in the tests. In the revised manuscript, I computed the principal angles and chordal distances to estimate subspace orthogonality for all tests. The results show a strong correlation between orthogonality and performance, except in two cases, which can be explained by data representativity issues. For instance, the Leukemia dataset has only 38 training samples, yet the subspaces are nearly perfectly orthogonal, as shown in Table 8 (page 21). Details are reported in Section 3.4 (page 14) and on page 21.

Comment 8: If two class subspaces overlap, how does the method handle ambiguous points? Is there a probabilistic or confidence-based extension?

Response 8: In this work, no confidence-based measure was used for ambiguous classification. The reviewer is correct that incorporating, for instance, a Bayesian confidence measure will improve the results. This will be considered in future work.

Comment 9: The comparison omits other modern classifiers (e.g., Random Forests, Deep NNs, GPR). Why benchmark only against SVM?

Response 9: The comparison was originally conducted only against SVM, motivated by the objective of the paper, which is to propose an alternative to SVM that performs similarly while addressing some of the issues SVM suffers from. As suggested by the reviewers, and for more thorough validation, the proposed method has now been compared with other kernel-based methods, namely K-NN (kernel nearest neighbors) and KPCA. In addition to the LIBSVM datasets, all methods were also evaluated using datasets from the OpenML database. These changes are reported in the revised manuscript in the Validation section (pp. 13–21).

Comment 10: Runtime improvements are large, but were both methods implemented under comparable optimization and hardware settings?

Response 10: The reviewer is right, this is necessary for a fair comparison.

All experiments were conducted on an HP Victus 15L Gaming Desktop (TG02-0xxx) running Ubuntu 24.04.3 LTS, equipped with an AMD Ryzen 7 5700G processor (16 threads) and 32 GB of RAM. Only one CPU core was used.

The proposed method is implemented in MATLAB, whereas the LIBSVM software is written in C++. Based on the common observation that C++ can be up to 100 times faster than MATLAB in certain cases, the processing time of the proposed method has been scaled by a factor of 1/50. The same adjustment has been applied to K-NN and KPCA, which are implemented in Python. It is also noted that, even without rescaling, the proposed method remains faster. I believe that the KOS code could be further optimized, as I am not an expert programmer. A MATLAB version of the code has been uploaded to the system. Details are reported in the new subsection 3.5, “CPU Time Discussion” (page 15).

Comment 11: The English is generally understandable but would benefit from careful editing for clarity, conciseness, and consistency. Some sentences are overly long and contain grammatical or structural issues that obscure meaning. Improving transitions, reducing repetition, and standardizing terminology (e.g., between “attributes,” “features,” and “vectors”) would make the paper more readable and professional.

Response 11: I would like to thank the reviewer for the suggestion and the detailed review. I have revised the entire manuscript and improved the text to the best of my ability.

Author Response File: Author Response.pdf
