Article
Peer-Review Record

Enhancing Supervised Model Performance in Credit Risk Classification Using Sampling Strategies and Feature Ranking

Big Data Cogn. Comput. 2024, 8(3), 28; https://doi.org/10.3390/bdcc8030028
by Niwan Wattanakitrungroj 1,*, Pimchanok Wijitkajee 1, Saichon Jaiyen 1, Sunisa Sathapornvajana 1 and Sasiporn Tongman 2,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 23 January 2024 / Revised: 18 February 2024 / Accepted: 1 March 2024 / Published: 6 March 2024
(This article belongs to the Topic Big Data and Artificial Intelligence, 2nd Volume)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1) In the Introduction section the research gap and research questions should be clearly presented. Moreover, the novelty of the paper should be clearly stated.

2) The literature review should be improved. There are many new works in the literature that use machine learning methods and techniques used by the authors of the manuscript, e.g.:

- Credit Decision Support Based on Real Set of Cash Loans Using Integrated Machine Learning Algorithms.

- Accuracy Comparison between Five Machine Learning Algorithms for Financial Risk Evaluation.

3) Why did the authors split the data set 70:30 rather than use, for example, "n-fold cross-validation" or "leave-n-out cross-validation"? I do not think it is appropriate to draw conclusions from such a comparative analysis using only one set of training and testing samples. This choice should be explained, e.g. in Section 3.2.
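As an illustration of the alternative raised in this comment, the following is a minimal sketch of stratified k-fold splitting in plain Python. It is not the authors' procedure: the function name `stratified_k_fold`, the toy labels, and the choice of k = 5 are assumptions made only for this example. Each class is dealt round-robin across the folds so that every test fold keeps roughly the original class ratio, which matters for an imbalanced credit-risk data set.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that roughly preserve class ratios.

    Hypothetical helper for illustration; not taken from the manuscript.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal the indices of each class round-robin across the k folds.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Toy imbalanced labels (assumed): 10 positives, 90 negatives.
labels = [1] * 10 + [0] * 90
splits = list(stratified_k_fold(labels, k=5))
for train, test in splits:
    positives = sum(labels[i] for i in test)
    print(len(train), len(test), positives)
```

Averaging a metric over the k test folds, and reporting its spread, gives a less split-dependent comparison than a single 70:30 partition.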

4) Why did the authors use the knowledge obtained using the "mutual information" measure to select features with a step of 25 features? Why have other feature selection methods that can precisely identify useful and unnecessary features not been used, e.g. "symmetrical uncertainty" or "correlation-based feature selection"? These methods could allow for a larger reduction of the set of features than just 25 features.
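For readers unfamiliar with the measure named in this comment, symmetrical uncertainty normalises mutual information by the entropies of the two variables, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), so that it lies in [0, 1]. The sketch below computes it for discrete variables in plain Python. The function names and toy data are assumptions for illustration only and do not reproduce the manuscript's feature ranking.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences of equal length."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def symmetrical_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)); returns 0.0 if both are constant."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2.0 * mutual_information(x, y) / (hx + hy)

# Toy data (assumed): one feature that copies the target, one independent of it.
target      = [0, 0, 1, 1, 0, 0, 1, 1]
informative = [0, 0, 1, 1, 0, 0, 1, 1]
irrelevant  = [0, 1, 0, 1, 0, 1, 0, 1]

su_informative = symmetrical_uncertainty(informative, target)
su_irrelevant = symmetrical_uncertainty(irrelevant, target)
print(su_informative, su_irrelevant)
```

Because the score is bounded, features can be kept or dropped against a fixed threshold instead of in fixed-size steps. Continuous features would need to be discretised first.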

5) Conclusions should be expanded with the theoretical and practical contribution of the article and potential practical applications of research results. Moreover, the conclusions should indicate the research limitations and more broadly indicate the limitations of the established method.

Comments on the Quality of English Language

The article is methodologically correct; however, it contains minor editorial and linguistic shortcomings.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The article provides an insightful exploration into the utilization of advanced data analytics in improving financial decision-making processes. It emphasizes the significant role that machine learning algorithms play in analyzing vast amounts of data, leading to more accurate predictions and risk assessments in finance.

I found this study very interesting. However, I have some concerns about the metrics used to compare classification results in the context of an imbalanced dataset. Please see my recommendation below.

After reviewing the results presented in Table 3, which compares the performance of three different machine learning techniques using various sampling approaches, I suggest adding a column for the Matthews Correlation Coefficient (MCC) values. This additional metric would provide a more balanced evaluation of model performance, especially in the context of imbalanced datasets. This measure is particularly useful for comparing models across different sampling techniques as it provides insight into both the predictive power and reliability of the models in a way that accuracy, precision, recall, and F1 score may not fully capture. Including MCC in Table 3 could significantly highlight the differences in classification results. This inclusion would also align with best practices for evaluating classification models, offering a more nuanced view of model performance that could benefit researchers and practitioners alike.
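To make the point about MCC concrete, here is a minimal sketch in plain Python that computes MCC and accuracy from confusion-matrix counts. The two confusion matrices are invented for illustration and are not taken from the manuscript's Table 3. Returning 0.0 when the denominator vanishes is a common convention, adopted here as an assumption.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): report 0.0 when a row or column of the matrix is empty.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical test set: 950 negatives, 50 positives.
# Model A labels every case negative.
model_a = dict(tp=0, tn=950, fp=0, fn=50)
# Model B recovers 40 of the 50 positives at the cost of 60 false alarms.
model_b = dict(tp=40, tn=890, fp=60, fn=10)

for name, cm in (("A", model_a), ("B", model_b)):
    print(name, round(accuracy(**cm), 3), round(mcc(**cm), 3))
```

In this made-up case Model A has the higher accuracy even though it never identifies a positive, while MCC ranks Model B clearly higher. That is the kind of difference the additional column is meant to expose.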

Comments on the Quality of English Language

1) Here are some minor language fixes:

Line 12: "Moereover" should be corrected to "Moreover".

Line 14: "the the" is repeated; it should be corrected to just "the".

Line 23: "one crucial things" should be corrected to "one crucial thing" for number agreement.

Line 36: "Based-on" does not need a hyphen; it should be "Based on".

2) I leave it to the authors' consideration: "over sampling," "under sampling," and "combine sampling" can be written as "over-sampling," "under-sampling," and "combined-sampling". Most academic, technical, and educational resources prefer the concatenated form (e.g. "oversampling") or the hyphenated form (e.g. "over-sampling") when discussing the technique of sampling a signal or data set.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors
  1. The manuscript lacks an explanation of how the author addresses the issue of imbalanced data.

  2. Listing all 25 features would be beneficial for readers.

  3. On line 103, it is unclear which parameters are used for creating the tree, such as the size of the tree, and other parameters are not provided. Please elaborate.

  4. Provide justification for how overfitting issues were mitigated in the random forest, especially considering the author is working with imbalanced datasets.

  5. The author failed to compare the proposed findings with the existing state-of-the-art literature.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The paper has been improved and can be published.

Comments on the Quality of English Language

Minor editing is required.

Reviewer 3 Report

Comments and Suggestions for Authors

Now the paper is suitable for publication.
