Article
Peer-Review Record

Fine-Tuning Retrieval-Augmented Generation with an Auto-Regressive Language Model for Sentiment Analysis in Financial Reviews

Appl. Sci. 2024, 14(23), 10782; https://doi.org/10.3390/app142310782
by Miehleketo Mathebula *, Abiodun Modupe and Vukosi Marivate
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 15 October 2024 / Revised: 13 November 2024 / Accepted: 14 November 2024 / Published: 21 November 2024
(This article belongs to the Special Issue Applications of Data Science and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Sentiment analysis is crucial in the financial sector, particularly for monitoring customer feedback and media headlines. Traditional supervised learning algorithms struggle with content brevity and idiomatic expressions. In this paper, a novel structure called Language feature extraction and adaptation for reviews (LFEAR) addresses these limitations by combining retrieval-augmented generation with a conversational format. According to the Author(s), the model achieves an average precision score of 98.45%, answer correctness of 93.85%, and context precision of 97.69%.

Review comments:

(1) The current paper lacks algorithms, mathematical analysis, and a rigorous description of the depicted architecture schematic.

(2) I do not see any beyond-SotA contribution from this work.

(3) The Author(s) never address a fundamental problem with natural-language user requests as inputs to LLMs, namely "intent conflict detection and resolution". Specifically, when a user submits a query to their machine knowledge database, it can trigger (self-)conflicting rules and thus be hard to admit or grant.

(4) The conclusions, analysis, and results are stateful or ad hoc, and cannot be seen to scale.

(5) The final sections offer no added value from use cases, no future-work discussion, and no elaboration of the mitigations or gaps arising from this system.

Author Response

Reviewer #1:

Comment 1:

The current paper lacks algorithms, mathematical analysis, and a rigorous description of the depicted architecture schematic.

Response

Thank you for your valuable feedback. We have addressed this by adding algorithms, mathematical equations, and a more detailed description of the architecture schematic to strengthen the scientific analysis in the paper. Please see the changes in lines [519-542], 596, 659, 697, [826-829], [867-871], [878-889], and [1043-1047]. We hope you will find these additions satisfactory.

Comment 2:

I do not see any beyond-SotA contribution from this work.

Response

Thank you for your feedback. We have revised the introduction, architecture method, and results discussion sections to clearly highlight the study's beyond-state-of-the-art contributions, incorporating additional information as suggested. We hope these revisions now provide a clear rationale for our work’s contributions.

Comment 3:

The Author(s) never address a fundamental problem with natural-language user requests as inputs to LLMs, namely "intent conflict detection and resolution". Specifically, when a user submits a query to their machine knowledge database, it can trigger (self-)conflicting rules and thus be hard to admit or grant.

Response

Thank you for your observation and valuable comments. We have incorporated an "intent conflict detection and resolution" layer in Section 3.1.3 to address this issue. We hope this addition enhances the logical flow and clarity of our approach.

Comment 4: The conclusions, analysis, and results are stateful or ad hoc, and cannot be seen to scale.

Response

Thank you for your comments. We have revised the results discussion and conclusion sections to address concerns about scalability and to provide a more structured analysis. We hope you will find these revisions justified.

Comment 5:

The final sections offer no added value from use cases, no future-work discussion, and no elaboration of the mitigations or gaps arising from this system.

Response

Thank you for your comments. We have added a new Section 5.2 to provide a detailed discussion on limitations and future work. Additionally, we have addressed gaps and mitigation strategies within the architecture in Sections 3.3.2 and 3.3.6. We hope this addition enhances the logical flow and clarity of our approach.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Comment 1: Has consideration been given to the expression of emotional intensity (such as the weight of emotional words) when removing the HTML tags, special characters, and stop words mentioned in the data-cleaning steps?

Comment 2: The RAG (retrieval-augmented generation) framework lacks specific retrieval methodology and parameter-tuning details.

Comment 3: Systematically discuss the basis for selecting the specific models and fine-tuning strategies.

Comment 4: Should sentiment analysis consider other performance indicators, such as sentiment intensity and the granularity of sentiment classification?

Comment 5: The literature review lacks in-depth analysis of the special needs of sentiment analysis in the financial field, and the innovation of this study should be highlighted.

Comment 6: After retrieving relevant documents, should clustering algorithms be used to classify and filter the retrieval results? Would using clustering for preprocessing improve the analysis performance?

Comment 7: Statistically analyze the distribution of emotional intensity in the dataset to understand the performance of the model under different intensities.

Comment 8: In future research, the LFEAR model could be applied to financial data in different languages to investigate its adaptability and transferability in cross-linguistic sentiment analysis.

Author Response

Reviewer #2:

Comment 1:

Has consideration been given to the expression of emotional intensity (such as the weight of emotional words) when removing the HTML tags, special characters, and stop words mentioned in the data-cleaning steps?

Response 1:

Thank you for your valuable comments. We have addressed this in Section 3.2.2, with additional insights provided in Section 4.2 on Exploratory Data Analysis. We hope these additions clarify our approach to handling emotional intensity during data cleaning.

Comment 2:

The RAG (retrieval-augmented generation) framework lacks specific retrieval methodology and parameter-tuning details.

Response 2:

Thank you for your valuable comments. We have updated Section 3.3.3 in the revised manuscript to include specific retrieval methodologies and parameter tuning details, and we have refined Algorithm 5 accordingly. We hope these additions enhance the clarity and logical flow.

Comment 3:

Systematically discuss the basis for selecting the specific models and fine-tuning strategies.

Response 3:

Thank you for your comments. We have fully considered your suggestion and added a new subsection, 3.3.1, to systematically discuss the basis for selecting specific models and fine-tuning strategies in the revised manuscript. We hope you find this explanation justified.

Comment 4:

Should sentiment analysis consider other performance indicators, such as sentiment intensity and the granularity of sentiment classification?

Response 4:

Thank you very much for your suggestions and comments. We have revised the manuscript to incorporate sentiment intensity and granularity of sentiment classification in Section 4.2. Additionally, we have updated the results and discussion sections (Section 4.8) and included Table 10 to show the evaluation results capturing these performance indicators.

Comment 5:

The literature review lacks in-depth analysis of the special needs of sentiment analysis in the financial field, and the innovation of this study should be highlighted.

Response 5:

Thank you for your valuable suggestion. We have incorporated an in-depth analysis of the special needs of sentiment analysis in the financial field and highlighted the innovation of this study in Subsection 2.1 and lines [489-506] of the revised manuscript.

Comment 6:

After retrieving relevant documents, should clustering algorithms be used to classify and filter the retrieval results? Would using clustering for preprocessing improve the analysis performance?

Response 6:

Thank you very much for your suggestion. We appreciate your insight, and we have added a discussion on the potential use of clustering algorithms for classifying and filtering retrieval results in the Limitations and Future Work section (5.2) of the revised manuscript.

Comment 7:

Statistically analyze the distribution of emotional intensity in the dataset to understand the performance of the model under different intensities.

Response 7:

Thank you very much for your suggestion. We have added Section 4.2, Exploratory Data Analysis, to provide an in-depth analysis of the distribution of emotional intensity in the dataset. We have also added Section 4.8 to evaluate how the model performs under different intensities.

Comment 8:

In future research, the LFEAR model could be applied to financial data in different languages to investigate its adaptability and transferability in cross-linguistic sentiment analysis.

Response 8:

Thank you very much for your suggestion. We have added this point in Section 5.2, Limitations and Future Work, of the revised manuscript. Additionally, we have highlighted how we mitigate this limitation in the architecture, specifically in Section 3.4, Inference Layer.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Sentiment analysis is crucial in the financial sector, especially for monitoring customer feedback and media headlines. Traditional supervised learning algorithms struggle with content brevity and idiomatic expressions. A novel structure called Language feature extraction and adaptation for reviews (LFEAR) addresses these limitations by combining retrieval-augmented generation with a conversational format. The Author(s) claim that the proposed model achieves an average precision score of 98.45%, answer correctness of 93.85%, and context precision of 97.69%.


Review comments:

(1)    All the previous review comments have been properly addressed.

Author Response

We are pleased that the reviewer was satisfied with our responses in the first round of review. All references in the manuscript have been checked, and every revision highlight has been removed from the version provided to the reviewer. We have therefore uploaded a new version of the manuscript based on the reviewers' feedback, which has significantly improved its quality.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have responded to my concerns.

Author Response

We are pleased that the reviewer was satisfied with our responses in the first round of review. All references in the manuscript have been checked, and every revision highlight has been removed from the version provided to the reviewer. We have therefore uploaded a new version of the manuscript based on the reviewers' feedback, which has significantly improved its quality.
