Article
Peer-Review Record

SOMTreeNet: A Hybrid Topological Neural Model Combining Self-Organizing Maps and BIRCH for Structured Learning

by Yunus Doğan
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Mathematics 2025, 13(18), 2958; https://doi.org/10.3390/math13182958
Submission received: 8 August 2025 / Revised: 4 September 2025 / Accepted: 10 September 2025 / Published: 12 September 2025
(This article belongs to the Special Issue New Advances in Data Analytics and Mining)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This article is devoted to a new hybrid artificial intelligence model called SOMTreeNet. The key feature of this model is the combination of Kohonen’s self-organising maps with hierarchical clustering to create an interpretable, scalable, and universal architecture. The model proposed by the author is tested on five different types of tasks using a variety of public datasets. Comprehensive validation on classification, regression, clustering, time-series analysis, and image tasks is a clear strength of the work and demonstrates the versatility of the proposed approach. However, further refinement is required before publication. A list of key questions and comments is provided below.

1. Line 120: In section ‘2. Related Works’, references to literature sources must be formatted correctly: there must be a space between the text and the reference.

2. Line 227: In equation (6), the weight update uses the neighbourhood function $h_{k^{\star}k}(t)$. Table 4 indicates that a Gaussian is used, but the text of the article does not provide its explicit form. This needs to be added.

3. For all equations: an equation is part of a sentence, so it is necessary to place punctuation marks correctly before and after equations.

4. Line 235: Equation (9) for regression uses a weighted average with inverse distance. However, potential problems in degenerate cases, when an example falls into an area with very few points or coincides with one of them, are not discussed. It is necessary to specify how ε is selected.

5. Line 241: Are the node division (KMeans++) and statistics update operations, which may dominate at deep levels, taken into account when assessing the computational complexity of the algorithm (equation (11))? It is necessary to specify in detail the contribution of these operations to the computational efficiency of the presented algorithm.

6. Line 298: In the pseudocode for Algorithm 2, how is the mechanism for handling cases where a neuron already has a child node taken into account?

7. Lines 360 and 393: When calculating dk, approximate values (rounded) are given, so instead of the symbol =, the symbol ≈ should be used.

8. Table 3 shows 70,692 instances and 21 attributes for the dataset. For a model with a maximum depth of 6 and a threshold of θ = 50 (Table 4), the tree can become very large. No scalability analysis has been performed on large datasets, and neither the final number of nodes nor the training time on the stated hardware configuration (AMD Ryzen 9, NVIDIA RTX 4090) is reported.

9. For the image classification task, an architecture that accepts raw pixels as input is used. However, SOMs generally do not perform well with high-dimensional raw data. It is necessary to specify whether feature extraction was performed beforehand or how exactly the flat pixel vectors were fed into the SOMTreeNet input.

10. The article states that SOMTreeNet outperformed all models, including CNN. However, the architecture and training data for CNN are not specified. Comparisons with deep learning methods are not valid if their best configurations for a given dataset are not used, but rather arbitrary results from other articles are taken. This point needs to be clarified.

11. Figure 6: The box-plot diagrams of clustering metrics do not include comparisons with other algorithms. Only the absolute values of SOMTreeNet are presented, which does not allow for a visual assessment of its relative superiority or inferiority across various metrics.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

The text needs to be edited for correct sentence construction.

Author Response

I would like to sincerely thank the reviewer for the time, effort, and valuable feedback provided during the evaluation of my manuscript. The comments and suggestions have been very helpful in improving both the technical content and the clarity of the presentation. In accordance with the reviewer’s recommendations, the entire manuscript has been carefully re-examined for English language consistency, and the necessary revisions have been made to enhance readability. Furthermore, the sections that were newly added or substantially revised in direct response to the reviewer’s comments are highlighted in yellow within the revised version of the manuscript. These additions were made to specifically address the points raised and to ensure that the contributions of the reviewer are clearly reflected in the improved quality of the paper.

 

Comment 1:
Line 120: In section ‘2. Related Works’, references to literature sources must be formatted correctly: there must be a space between the text and the reference.

Response 1:
Thank you for pointing this out. I agree with this comment. Therefore, I have revised the formatting of all in-text citations in the section “2. Related Works”. Specifically, a space has been inserted between the text and the reference indices for citations [21] to [56].
This correction can be found in the revised manuscript, Section 2. Related Works, pages 3–4, lines 121–157.

Comments 2:

Line 227: In equation (6), the weight update uses the neighbourhood function $h_{k^{\star}k}(t)$. Table 4 indicates that a Gaussian is used, but the text of the article does not provide its explicit form. This needs to be added.

 

Response 2:

I thank the reviewer for noticing the missing definition of the neighbourhood function in Equation (6). The explicit Gaussian form of this function has now been added to the text alongside Equation (6). (Page 6, Line 232)
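For reference, the standard Gaussian neighbourhood function used in SOM training has the form

```latex
h_{k^{\star}k}(t) = \exp\!\left( -\frac{\lVert r_{k^{\star}} - r_{k} \rVert^{2}}{2\,\sigma(t)^{2}} \right),
```

where $r_k$ denotes the grid position of neuron $k$, $r_{k^{\star}}$ that of the best-matching unit, and $\sigma(t)$ a neighbourhood radius that decays over training; the manuscript’s notation may differ slightly from this textbook form.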


Comments 3:

For all equations: an equation is part of a sentence, so it is necessary to place punctuation marks correctly before and after equations.

 

Response 3:

I appreciate the reviewer’s attention to detail in pointing out the need for proper punctuation around equations. I have carefully revised all equations in the manuscript. Appropriate punctuation marks (comma or period) have been placed before and after equations to ensure that they are integrated as parts of sentences.


Comments 4:

Line 235: Equation (9) for regression uses a weighted average with inverse distance. However, potential problems in degenerate cases, when an example falls into an area with very few points or coincides with one of them, are not discussed. It is necessary to specify how ε is selected.

 

Response 4:

I thank the reviewer for raising this important issue concerning Equation (9). An explicit explanation regarding the choice of the constant $\varepsilon$ has been added. It is clarified that $\varepsilon$ is used to ensure numerical stability and to handle cases where the query point coincides with an existing sample.

(Page 7, Line 243)
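For illustration, a minimal sketch of inverse-distance weighting with an $\varepsilon$ guard is given below, assuming Eq. (9) follows the common $1/(d+\varepsilon)$ form; the function and variable names are illustrative and not taken from the manuscript.

```python
import numpy as np

def idw_predict(query, samples, targets, eps=1e-8):
    """Inverse-distance-weighted prediction with an epsilon guard.

    query   : (d,) feature vector routed to a leaf node
    samples : (n, d) training vectors stored in that node
    targets : (n,) target values of those samples
    eps     : small constant keeping the weights finite when the
              query coincides with a stored sample
    """
    dists = np.linalg.norm(samples - query, axis=1)
    weights = 1.0 / (dists + eps)          # epsilon prevents division by zero
    return float(np.sum(weights * targets) / np.sum(weights))
```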

 

Comments 5:

Line 241: Are the node division (KMeans++) and statistics update operations, which may dominate at deep levels, taken into account when assessing the computational complexity of the algorithm (equation (11))? It is necessary to specify in detail the contribution of these operations to the computational efficiency of the presented algorithm.

 

Response 5:

I am grateful for the reviewer’s insightful comment on the complexity analysis. The description of computational complexity in Equation (11) has been expanded. The additional costs from KMeans++ initialization during node splitting and from clustering statistics updates are now explicitly discussed.

(Page 7, Line 260)


Comments 6:

Line 298: In the pseudocode for Algorithm 2, how is the mechanism for handling cases where a neuron already has a child node taken into account?

 

Response 6:

I thank the reviewer for highlighting this ambiguity in Algorithm 2. A clarification has been included: when a neuron already has a child node, the sample is routed directly to the child. A split is triggered only if no child exists and the threshold is exceeded.

(Page 9, Line 321)
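A minimal sketch of this routing rule is shown below; the attribute and method names (`child`, `count`, `split`, `insert`, `update_statistics`) are hypothetical and used only for illustration.

```python
def route_sample(neuron, x, theta, depth, max_depth):
    """Route a sample following the clarified rule: an existing child map
    always takes precedence, and a split is triggered only when no child
    exists and the capacity threshold theta is exceeded."""
    neuron.update_statistics(x)                  # CF-style statistics update
    if neuron.child is not None:                 # child exists: descend directly
        return neuron.child.insert(x, depth + 1)
    if neuron.count > theta and depth < max_depth:
        neuron.child = neuron.split()            # e.g. a KMeans++-seeded child map
        return neuron.child.insert(x, depth + 1)
    return neuron                                # sample remains in this neuron
```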


Comments 7:

Lines 360 and 393: When calculating dk, approximate values (rounded) are given, so instead of the symbol =, the symbol ≈ should be used.

 

Response 7:

I value the reviewer’s careful observation regarding the notation of approximate values. The illustrative examples have been corrected, and approximate values of $d_k$ are now denoted with the symbol $\approx$ instead of $=$.

(Pages 11 and 13, Lines 385 and 423)


Comments 8:

Table 3 shows 70,692 instances and 21 attributes for the dataset. For a model with a maximum depth of 6 and a threshold of θ = 50 (Table 4), the tree can become very large. No scalability analysis has been performed on large datasets, and neither the final number of nodes nor the training time on the stated hardware configuration (AMD Ryzen 9, NVIDIA RTX 4090) is reported.

 

Response 8:

I am thankful to the reviewer for pointing out the importance of reporting scalability. A new paragraph has been added discussing the scalability of the model on large datasets. I report the final number of nodes and the average training time on the given hardware configuration (AMD Ryzen 9, NVIDIA RTX 4090).

(Page 15, Line 488)

 

Comments 9:

For the image classification task, an architecture that accepts raw pixels as input is used. However, SOMs generally do not perform well with high-dimensional raw data. It is necessary to specify whether feature extraction was performed beforehand or how exactly the flat pixel vectors were fed into the SOMTreeNet input.

 

Response 9:

I thank the reviewer for raising this important question about the image classification setup. A clarification has been added explaining how the images are handled: no separate feature-extraction stage is applied, and each image is flattened into a one-dimensional pixel vector that is fed directly into SOMTreeNet, with the neuron capacity and maximum tree depth chosen so that the recursive tree structure preserves local neighbourhood relationships among the flattened vectors.

(Page 19, Line 588)


Comments 10:

The article states that SOMTreeNet outperformed all models, including CNN. However, the architecture and training data for CNN are not specified. Comparisons with deep learning methods are not valid if their best configurations for a given dataset are not used, but rather arbitrary results from other articles are taken. This point needs to be clarified.

 

Response 10:

I sincerely thank the reviewer for stressing the importance of a fair comparison. The CNN baseline architecture, training settings, and evaluation procedure used in the comparison have been described in detail. I emphasize that the results are based on my own implementation, ensuring a fair comparison.

(Page 20, Line 598)


Comments 11:

Figure 6: The box-plot diagrams of clustering metrics do not include comparisons with other algorithms. Only the absolute values of SOMTreeNet are presented, which does not allow for a visual assessment of its relative superiority or inferiority across various metrics.

 

Response 11:

I am grateful for the reviewer’s constructive feedback. In the revised manuscript, two new figures have been added (Figure 6 and Figure 7), showing model comparisons for the m-Health and UCI HAR datasets, respectively. The comparisons are presented using F1 scores of SOMTreeNet and baseline models. This revision provides a clearer visual evaluation of the relative performance of the proposed method.

(Pages 22 and 23, Lines 657 and 670)

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper introduces SOMTreeNet, a hybrid model combining Self-Organizing Maps with BIRCH-inspired clustering in a recursive modular topology. It supports both supervised and unsupervised learning, covering tasks such as classification, regression, clustering, anomaly detection, and time-series analysis. Experiments on diverse datasets show competitive or superior performance compared to conventional methods, while maintaining interpretability through its hierarchical, biologically inspired design. The work highlights SOMTreeNet as a scalable and transparent alternative to black-box deep models. In my view, the manuscript can be accepted after the following questions are clearly addressed.

1. Several core equations (e.g., Eq. (3) for BMU selection, Eq. (4) for CF statistics, Eq. (6) for weight updates, and Eq. (9) for regression prediction) are presented without sufficient intuitive explanation. It would strengthen the paper if the authors briefly elaborated on the role and significance of each formula within the overall framework.

2. The definition in Eq. (4) lacks an intuitive discussion of why both linear and squared sums are required and how this connects to BIRCH. A short clarification of the statistical meaning and advantages of CF tuples would improve accessibility for readers.

3. The hyperparameter configurations (e.g., maximum tree depth reported in Tables 4, 12, and 15) are not sufficiently justified. The authors are encouraged to explain whether these values were chosen empirically, through grid search, or based on prior literature.

4. In the classification experiments (Tables 5–10), some baseline results are attributed to [69], while others lack explicit citations. For consistency and reproducibility, all comparative results should be clearly referenced, ideally following a uniform approach across tables.

5. Figures 2, 3, 5, 6, and others provide valuable illustrations but lack sufficiently detailed captions. Adding explicit explanations of nodes, curves, or performance metrics would make the figures more self-contained and easier to interpret.

Comments on the Quality of English Language

The manuscript is written in clear, fluent, and well-structured English. The authors demonstrate a solid command of academic language, with precise terminology and coherent arguments throughout the paper. The writing style is concise yet sufficiently detailed, making the paper accessible to an international scientific audience.

Author Response

I would like to express my sincere gratitude to the reviewer for the time, effort, and expertise devoted to evaluating my manuscript. The constructive comments and thoughtful suggestions have been highly valuable in improving both the clarity and quality of the work. In accordance with the reviewer’s recommendations, the entire manuscript has been carefully re-examined for English language consistency, and the necessary revisions have been made to enhance readability, with particular attention to the equations, parameter settings, and references, as well as the figures and tables. All figure captions and table descriptions have been revised to provide more complete and self-contained explanations. The revisions are clearly marked in the manuscript: the passages highlighted in pink correspond to new or expanded sections added directly in line with the reviewer’s insightful feedback. I believe these revisions have substantially strengthened the manuscript, and I am truly grateful for the reviewer’s contribution to this process.

Comment 1:
Several core equations (e.g., Eq. (3) for BMU selection, Eq. (4) for CF statistics, Eq. (6) for weight updates, and Eq. (9) for regression prediction) are presented without sufficient intuitive explanation. It would strengthen the paper if the authors briefly elaborated on the role and significance of each formula within the overall framework.

Response 1:
I sincerely thank the reviewer for pointing out the need for clearer explanations of the key formulas. To address this, I have expanded the descriptions of Eq. (3), Eq. (4), Eq. (6), and Eq. (9). Each equation is now accompanied by a short explanation of its role in the method: for example, Eq. (3) identifies the closest unit, Eq. (4) summarizes cluster statistics, Eq. (6) specifies the weight update rule, and Eq. (9) produces regression outputs. These additions clarify how the formulas work together in the learning process.

(Page 7, Line 248)
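For reference, the textbook forms of BMU selection and the SOM weight update read

```latex
k^{\star} = \arg\min_{k} \lVert x(t) - w_{k}(t) \rVert,
\qquad
w_{k}(t+1) = w_{k}(t) + \alpha(t)\, h_{k^{\star}k}(t)\, \bigl( x(t) - w_{k}(t) \bigr),
```

where $\alpha(t)$ is a decaying learning rate and $h_{k^{\star}k}(t)$ the neighbourhood function; the exact notation of Eqs. (3) and (6) in the manuscript may differ slightly from these standard forms.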

 

Comments 2:

The definition Eq. (4) lacks an intuitive discussion of why both linear and squared sums are required and how this connects to BIRCH. A short clarification of the statistical meaning and advantages of CF tuples would improve accessibility for readers.

 

Response 2:

I am grateful to the reviewer for highlighting the need for a more intuitive explanation of Eq. (4). I now describe why both linear and squared sums are included: the simple sums capture the position of the data points, while the squared sums describe their spread. I also explain how this design follows the principle used in BIRCH and why it is advantageous for representing clusters. This added discussion makes the statistical meaning and benefits of CF tuples clearer to readers.

(Page 6, Line 223)
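A minimal sketch of a BIRCH-style CF tuple is given below to illustrate this point; the class and method names are illustrative rather than the manuscript’s implementation. The linear sum yields the centroid (position), while the squared sum lets the radius (spread) be computed without revisiting the raw points.

```python
import numpy as np

class ClusterFeature:
    """BIRCH-style CF tuple (N, LS, SS): number of points, their linear
    sum, and their squared sum."""

    def __init__(self, dim):
        self.n = 0
        self.ls = np.zeros(dim)    # linear sum  -> where the cluster sits
        self.ss = 0.0              # squared sum -> how spread out it is

    def add(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.ls += x
        self.ss += float(np.dot(x, x))

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # sqrt(SS/N - ||centroid||^2): average spread around the centroid
        c = self.centroid()
        return float(np.sqrt(max(self.ss / self.n - np.dot(c, c), 0.0)))
```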


Comments 3:

The hyperparameter configurations (e.g., maximum tree depth reported in Tables 4, 12, and 15) are not sufficiently justified. The authors are encouraged to explain whether these values were chosen empirically, through grid search, or based on prior literature.

 

Response 3:

I thank the reviewer for raising the important point regarding justification of parameter choices. In response, I have provided explicit reasoning for the maximum tree depths reported in Tables 4, 12, and 15. For Table 4, the depth was chosen based on validation experiments to balance accuracy and efficiency. For Table 12, the setting was guided by prior studies and confirmed with small-scale tests. For Table 15, I relied on grid search to determine the most reliable value. These clarifications make clear how the reported parameters were determined.

(Pages 15, 19 and 21, Lines 484, 585 and 648)


Comments 4:

In the classification experiments (Tables 5–10), some baseline results are attributed to [69], while others lack explicit citations. For consistency and reproducibility, all comparative results should be clearly referenced, ideally following a uniform approach across tables.

 

Response 4:

I appreciate the reviewer’s suggestion to ensure consistency and reproducibility in reporting comparative results. Accordingly, I have revised Tables 5–10 to provide explicit citations for all baseline results. A uniform referencing style is now used across the tables, so that the source of each comparative result is transparent. This ensures that readers can easily verify and reproduce the reported findings.

(Pages 16, 16, 17, 17, 17 and 18, Lines 510, 518, 524, 531, 539 and 545)


Comments 5:

Figures 2, 3, 5, 6, and others provide valuable illustrations but lack sufficiently detailed captions. Adding explicit explanations of nodes, curves, or performance metrics would make the figures more self-contained and easier to interpret.

Response 5:

I thank the reviewer for noting that the figure captions could be more informative. Following this helpful suggestion, I have expanded the captions of Figures 2–8. Each caption now includes clear descriptions of nodes, curves, and performance metrics, enabling readers to understand the visualizations without needing to refer back to the main text. These improvements make the figures more self-contained and reader-friendly.

(Pages 12, 13, 20, 23, 26 and 29, Lines 403, 438, 611, 679, 739 and 817)

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper introduces a novel hybrid neural network model, SOMTreeNet, which innovatively combines the topology-preserving capabilities of Self-Organizing Maps (SOM) with the hierarchical and scalable clustering features of the BIRCH algorithm to form a recursive tree structure. This is a highly novel idea, where SOM units serve as tree nodes, and BIRCH-like Clustering Feature (CF) statistics drive the recursive growth of the tree. This approach ingeniously addresses the limitations of conventional SOM models, which suffer from a fixed structure and struggle to handle large-scale and streaming data.

However, the discussion and validation in this paper suffer from several notable shortcomings:

The validity of the image classification claim is highly questionable. The paper asserts that SOMTreeNet outperforms Convolutional Neural Networks (CNNs) on image classification tasks. However, the described method appears to flatten images into one-dimensional vectors, which completely destroys the intrinsic two-dimensional spatial structure of the data. The core strength of CNNs lies precisely in their ability to effectively extract local spatial features. Consequently, a model that disregards spatial topological information surpassing a CNN is fundamentally at odds with established computer vision principles, and its validity and rationale are highly questionable. The authors must clarify their experimental details and provide a robust theoretical explanation for this anomalous result.

The "one-size-fits-all" approach to diverse data modalities lacks a deep theoretical foundation. The paper claims strong performance across diverse data modalities, including tabular, image, and time-series data, but the approach appears to be a uniform method: converting all data into feature vectors and feeding them into the same algorithmic framework. This ignores the distinct intrinsic structures of different data types. For instance, time-series data has temporal dependencies and image data has spatial locality. SOMTreeNet seems to disregard these domain-specific inductive biases. The authors must provide a more in-depth discussion on why SOMTreeNet's recursive topological partitioning mechanism can adaptively learn the intrinsic structures of different data modalities. Merely demonstrating "good results" is insufficient; an explanation is required at the level of the model's mechanisms. For example, could it be argued that the model is essentially learning a hierarchical metric space that matches the data's intrinsic manifold?

The lack of sensitivity analysis for key hyperparameters is a major flaw. The performance and final tree structure of SOMTreeNet are highly dependent on several core hyperparameters, particularly the neuron capacity threshold and maximum tree depth. However, the paper only provides "empirical values" used in the experiments, without any systematic sensitivity analysis or ablation studies. This makes the model appear to be a "black box" that yields good results only after careful hyperparameter tuning. Readers are left without knowing whether these parameters are universally applicable across different datasets, how sensitive the model's performance is to parameter variations, whether a minor adjustment could lead to a sharp performance drop, or how the parameters collectively influence model complexity and generalization (e.g., overfitting/underfitting).

Author Response

Reviewer 3

I sincerely thank you for your valuable insights, constructive comments, and thoughtful suggestions throughout the review process. Your feedback has greatly contributed to clarifying key aspects of the manuscript and strengthening the overall presentation of the work. In response to your recommendations, I have carefully revisited the entire manuscript, thoroughly reviewing all figures and tables to ensure clarity, accuracy, and completeness. All necessary revisions have been incorporated to enhance readability and support the discussion of the results. In particular, the sections and passages marked in green highlight the additions and clarifications made directly in response to your contributions, reflecting the significant impact of your guidance on improving the manuscript. I am truly grateful for your time and effort in helping refine this work, and I hope that the revisions address your concerns satisfactorily.

 

Comment 1:
The validity of the image classification claim is highly questionable. The paper asserts that SOMTreeNet outperforms Convolutional Neural Networks (CNNs) on image classification tasks. However, the described method appears to flatten images into one-dimensional vectors, which completely destroys the intrinsic two-dimensional spatial structure of the data. The core strength of CNNs lies precisely in their ability to effectively extract local spatial features. Consequently, a model that disregards spatial topological information surpassing a CNN is fundamentally at odds with established computer vision principles, and its validity and rationale are highly questionable. The authors must clarify their experimental details and provide a robust theoretical explanation for this anomalous result.

Response 1:
I sincerely appreciate your insightful comments regarding the image classification experiments. Your feedback has helped me clarify the role of SOMTreeNet’s hyperparameters in achieving competitive performance. As indicated under the Results section, I have now explicitly described that the reported superiority of SOMTreeNet over CNNs was obtained by carefully optimizing key hyperparameters, particularly the neuron capacity and maximum tree depth. This ensures that the recursive tree structure effectively preserves local neighborhood relationships even when images are flattened into feature vectors. I am grateful for your guidance, which prompted me to make this clarification and provide a more rigorous explanation of the experimental setup.

(Page 22, Line 658)
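For illustration, a minimal sketch of such a flattening step is shown below, assuming image batches with pixel intensities in [0, 255]; the exact preprocessing used in the manuscript may differ.

```python
import numpy as np

def flatten_images(images):
    """Flatten a batch of images of shape (n, H, W) or (n, H, W, C) into
    row vectors and scale intensities to [0, 1] so they can be fed to the
    model as plain feature vectors."""
    n = images.shape[0]
    flat = images.reshape(n, -1).astype(np.float64)
    return flat / 255.0 if flat.max() > 1.0 else flat
```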


Comments 2:

The "one-size-fits-all" approach to diverse data modalities lacks a deep theoretical foundation. The paper claims strong performance across diverse data modalities, including tabular, image, and time-series data, but the approach appears to be a uniform method: converting all data into feature vectors and feeding them into the same algorithmic framework. This ignores the distinct intrinsic structures of different data types. For instance, time-series data has temporal dependencies and image data has spatial locality. SOMTreeNet seems to disregard these domain-specific inductive biases. The authors must provide a more in-depth discussion on why SOMTreeNet's recursive topological partitioning mechanism can adaptively learn the intrinsic structures of different data modalities. Merely demonstrating "good results" is insufficient; an explanation is required at the level of the model's mechanisms. For example, could it be argued that the model is essentially learning a hierarchical metric space that matches the data's intrinsic manifold?

 

Response 2:

I greatly value your constructive critique regarding the generalization of SOMTreeNet across diverse data modalities. Following your suggestion, I have added a detailed discussion in the Conclusions and Future Works section, explaining how the recursive topological partitioning mechanism can adaptively capture the intrinsic structures of tabular, image, and time-series data. I also elaborated on the potential of the model to implicitly learn hierarchical metric spaces that respect the temporal and spatial dependencies specific to each modality. Your comments significantly improved the clarity and theoretical grounding of this section, for which I am truly thankful.

(Page 31, Line 897)


Comments 3:

The lack of sensitivity analysis for key hyperparameters is a major flaw. The performance and final tree structure of SOMTreeNet are highly dependent on several core hyperparameters, particularly the neuron capacity threshold and maximum tree depth. However, the paper only provides "empirical values" used in the experiments, without any systematic sensitivity analysis or ablation studies. This makes the model appear to be a "black box" that yields good results only after careful hyperparameter tuning. Readers are left without knowing whether these parameters are universally applicable across different datasets, how sensitive the model's performance is to parameter variations, whether a minor adjustment could lead to a sharp performance drop, or how the parameters collectively influence model complexity and generalization (e.g., overfitting/underfitting).

 

Response 3:

I sincerely thank you for highlighting the importance of hyperparameter sensitivity analysis. In response, I have expanded the discussion under Conclusions and Future Works, emphasizing how neuron capacity and maximum tree depth influence SOMTreeNet’s performance, tree complexity, and generalization. I also indicated that future work will include systematic ablation studies and automated hyperparameter optimization to ensure reproducibility and provide clear guidelines for parameter selection across datasets. Your insightful comments have guided me to strengthen the transparency and practical applicability of the model, and I greatly appreciate your support in improving this aspect of the manuscript.

(Page 31, Line 905)


Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have carefully read the changes that have been made. I would like to note that all my comments have been taken into account. The explanations and additions provided clarify my questions. The edits made to the text have improved the quality of the article. The manuscript can be published in its current form.

Reviewer 3 Report

Comments and Suggestions for Authors

Thank you for your careful revision of the review comments. Your additional discussion on the model theory has well answered my previous questions.
