Article
Peer-Review Record

Application of Natural Language Processing and Genetic Algorithm to Fine-Tune Hyperparameters of Classifiers for Economic Activities Analysis

Big Data Cogn. Comput. 2024, 8(6), 68; https://doi.org/10.3390/bdcc8060068
by Ivan Malashin *, Igor Masich, Vadim Tynchenko *, Vladimir Nelyub, Aleksei Borodulin and Andrei Gantimurov
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 28 April 2024 / Revised: 1 June 2024 / Accepted: 11 June 2024 / Published: 13 June 2024
(This article belongs to the Special Issue Recent Advances in Big Data-Driven Prescriptive Analytics)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors address the problem of economic classification systems according to NACE codes.

The proposed approach investigates the reliability of such codes for economic analysis and decision-making and compares different classification approaches.

The authors investigate different methodologies and classifiers. However, given the number of investigations carried out, the paper as a whole is quite difficult to read. The presentation of the proposed approach could benefit from an overall description that better connects all of its parts, which are currently spread across the different sections of Chapter 2. For instance, the bullet points listing model parameters might be reformulated to focus the reader's attention on the results achieved by the different models. In Section 3, three different justifications for the presence of a wrong code are presented. Although the names of the identified types are informative, the defined types are quite similar. To improve clarity, I suggest adding a brief explanation and including examples.

A final concern relates to the analysis carried out in Section 3.2, which focuses on the Random Forest Classifier, although, based on the considerations drawn in Section 2, the Multilayer Perceptron was selected as the best model among those analyzed.

Comments on the Quality of English Language

Some typos were found in the manuscript, which could benefit from grammatical and syntactic revision. Examples include "actiBities" at line 5 and a double comma at line 387.

Author Response

We appreciate your detailed review and the constructive feedback you've provided. We've implemented the suggested revisions and have included a response document detailing the changes made. Thank you for your valuable input.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Utilizing proofreading services or engaging with editorial language services is strongly recommended, as the manuscript contains typographical errors even in critical sections such as the abstract (e.g., 'actibities') and the conclusions (e.g., 'approaches„'). It is advisable to commence the abstract with a clear statement of what the study proposes, develops, or invents, rather than what it investigates. For instance, mentioning a specific innovation such as an enhanced classification algorithm like NACE-I or ENACE would clarify the contributions of the research. The abstract should succinctly convey the core propositions, improvements, and solutions introduced by the study, and could be expanded, if permissible by the publisher, to distinctly articulate the findings and the research questions addressed. This expansion would reduce the need for subsequent investigation by other researchers.

Attention should be given to the accessibility of visual content; authors must consider color-blind readers when choosing color-coding for graphical representations. Presenting hyperparameters and settings of the artificial neural network in a tabular format would enhance clarity and facilitate a quicker understanding for readers.

The methodology employed appears appropriate, and the approach is notably innovative. It would be beneficial to include additional graphical representations, such as those depicting training history, to provide a comprehensive view of the research process.

In the conclusions, rather than stating intentions such as 'we aim to increase', it would be more effective to assert the actual achievements of the research, e.g., 'the researchers increased the actual accuracy by 20%'. Consistent with best practices in international publishing, the use of first-person pronouns ('we', 'us') should be replaced with third-person constructs ('the researchers'), maintaining an objective and formal tone throughout the document.

Overall, while the study is innovative and presents significant scientific findings, the quality of the manuscript could be enhanced through careful attention to the structure, presentation, and language used in the paper.

Comments on the Quality of English Language

English requires minor editing.

Author Response

We appreciate your comprehensive review and the constructive feedback you provided. We've taken your suggestions into account and made the necessary revisions. Please find attached a response document detailing the changes we've implemented. Thank you again for your valuable input.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Paper Summary: The study aims to enhance the accuracy of classifying economic activities by matching Nomenclature of Economic Activities (NACE) codes using machine learning techniques combined with expert evaluations. The authors used a dataset of 20 million records of economic activities, which include descriptions, prices, and NACE codes. They also employed Apache Spark for distributed data processing and vectorized the text using TF-IDF methods. These data collection and pre-processing methods seem solid and convincing. In addition, a genetic algorithm was used to optimize the parameters of various classifiers, including Naive Bayes, Decision Tree, Random Forest, and a multilayer perceptron. Finally, different machine learning models were evaluated, with the multilayer perceptron showing the best performance, with an accuracy of 71% and an F1-score of 0.73.
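
For readers unfamiliar with the technique summarized above, the following is a minimal sketch of how a genetic algorithm can search a hyperparameter space. It is not the authors' implementation: the search space, the `TARGET` setting, and `toy_fitness` are hypothetical stand-ins chosen for illustration. In a real study, the fitness function would typically be a cross-validated score of a classifier trained on the TF-IDF features.

```python
import random

# Hypothetical search space; names and candidate values are illustrative only.
SEARCH_SPACE = {
    "hidden_units": [32, 64, 128, 256],
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "alpha": [0.00001, 0.0001, 0.001],
}

# Assumed "best" setting used only to build a toy fitness function.
TARGET = {"hidden_units": 128, "learning_rate": 0.001, "alpha": 0.0001}


def toy_fitness(params):
    """Stand-in for a validation score: fraction of values matching TARGET."""
    return sum(params[k] == TARGET[k] for k in TARGET) / len(TARGET)


def random_individual(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each hyperparameter is inherited from either parent."""
    return {name: rng.choice((a[name], b[name])) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    """Resample each hyperparameter with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in individual.items()
    }


def genetic_search(fitness, generations=20, population_size=12, elite=2, seed=0):
    """Evolve a population and return the fittest configuration found."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]  # truncation selection
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(population_size - elite)
        ]
        # Elitism: the best individuals survive unchanged.
        population = ranked[:elite] + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = genetic_search(toy_fitness)
    print(best, toy_fitness(best))
```

Because the top individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next. Selection here is simple truncation; tournament or roulette-wheel selection are common alternatives.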

Strength: The methods used for financial data analysis in this paper are introduced in sufficient detail. The paper is clearly structured and easy to read. The experimental results are described in sufficient detail.

Weakness: There is no comparison between the results of the proposed model and those of other baseline models. The authors should include models from recently published works for comparison.

Conclusion: In general, this is a borderline paper: there is no comparison between the results of this model and those of other published models. The authors should take models from recently published papers (LLMs, BERT-based models, deep LSTM models, etc.) as baselines to compare with the proposed model. As a result, I have to recommend reconsideration after major revision.

Comments on the Quality of English Language

Not applicable.

Author Response

Thank you for your thorough review and constructive feedback. We have addressed your suggestions and attached a response document outlining the revisions made accordingly.

Author Response File: Author Response.pdf
