Open Access Article

Peer-Review Record

A Lexicon-Based Framework for Mining and Analysis of Arabic Comparative Sentences

Algorithms 2025, 18(1), 44; https://doi.org/10.3390/a18010044

by Alaa Hamed, Arabi Keshk and Anas Youssef *

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 21 November 2024 / Revised: 28 December 2024 / Accepted: 6 January 2025 / Published: 13 January 2025

(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This article presents a framework for analyzing comparative sentences in Arabic using a lexicon-based approach. The paper presents five algorithms responsible for different aspects of data processing, from identifying comparative sentences to extracting preferred units. Text analysis in Arabic is important due to the large population using this language. Moreover, developing tools for Arabic, which is linguistically complex, can have a significant impact on the development of text analysis in Arab regions. On the other hand, the research may seem difficult to access for those outside the narrow field of Arabic language processing.

The article is well written and well structured, and the research process is well presented. However, some sections need to be expanded, as they consist of only a single sentence. This applies especially to Sections 3.3 to 3.5.4, which open with the sentence "This section describes..." but give no detailed explanation. For example, Section 3.3 lacks a detailed discussion of how the data was cleaned. Were punctuation marks, digits, and whitespace characters removed? Were any other characters specific to Arabic handled? Arabic orthography differs substantially from that of European languages.
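To make the question concrete, the kind of cleaning step the authors are asked to document could look like the Python sketch below. It is only an illustration, not the procedure used in the manuscript: the function name `clean_arabic` and the chosen character sets (diacritics, tatweel, Arabic and ASCII punctuation, ASCII and Arabic-Indic digits) are assumptions.

```python
import re
import string

# Arabic diacritics (tashkeel): U+064B..U+0652, plus superscript alef U+0670.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), a purely typographic elongation character.
TATWEEL = "\u0640"
# ASCII digits, Arabic-Indic digits, and Extended Arabic-Indic digits.
DIGITS = re.compile(r"[0-9\u0660-\u0669\u06F0-\u06F9]")
# ASCII punctuation plus the Arabic comma, semicolon, and question mark.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "\u060C\u061B\u061F]")


def clean_arabic(text: str) -> str:
    """Hypothetical cleaning step: drop diacritics and tatweel, replace
    digits and punctuation with spaces, then collapse whitespace."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = DIGITS.sub(" ", text)
    text = PUNCT.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_arabic("مَرْحَبًا،  بِكُمْ 123!"))
```

Stating which of these steps were applied, and in which order, would let readers reproduce the preprocessing.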

In the results section, I recommend adding a table with the results, not just graphs. This will greatly improve the readability of the results and the quality of the article.

I also have a few comments regarding the issues I have noticed.

1. The article is limited to Modern Standard Arabic (MSA) and the Egyptian dialect, which significantly limits its usability for other Arabic dialects. For this reason, it is recommended to extend the experiments to other popular dialects, e.g. Moroccan or Iraqi, to increase the universality of the solution.

2. The algorithms rely mainly on keywords, ignoring the context of the sentence, which leads to misclassifications in cases of ambiguous words. Therefore, in the future, it would be worth considering implementing models that take into account the context, such as modern NLP techniques, e.g. transformers (BERT or its Arabic version, AraBERT).

3. The lack of consideration of diacritics in Arabic significantly affects the precision of the analysis, because the same sequence of letters can have different meanings depending on the diacritical marks used.

4. The article points out problems with identifying sentences that contain many comparisons or mixed emotions. It would be worth examining what effect methods that handle ambiguity would have, e.g. training a model on labeled multi-class data.

5. A more detailed comparison of results from different approaches is missing; it would show their relative effectiveness.
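The diacritics concern in point 3 can be illustrated with a short Python sketch. The three example words and their glosses are chosen here for illustration and do not come from the manuscript; `strip_diacritics` is a hypothetical helper. Once the marks are removed, three distinct words collapse into one letter sequence, so a lexicon lookup that ignores diacritics cannot tell them apart.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Remove combining marks; Arabic short-vowel marks have Unicode
    general category 'Mn' (nonspacing mark)."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# Illustrative diacritized forms sharing the same consonant skeleton.
forms = {
    "عَلِمَ": "he knew",
    "عِلْم": "knowledge",
    "عَلَم": "flag",
}

# Group the glosses by the undiacritized form of each word.
collapsed = {}
for form, gloss in forms.items():
    collapsed.setdefault(strip_diacritics(form), []).append(gloss)

if __name__ == "__main__":
    for skeleton, glosses in collapsed.items():
        print(skeleton, glosses)
```

All three glosses end up under a single key, which is the ambiguity the comment refers to.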

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This is an interesting work because, as the authors mention, there are not many research studies available on the Arabic language, despite it being spoken by millions of people in various countries.

The proposed framework is a semi-automated process. Could the authors clarify why this lexicon construction process does not affect the subsequent results of the research?

What criteria did the experts follow to manually classify the sentences in the dataset?

In my humble opinion, the authors could simplify the section ‘Relation Extraction from Arabic Comparative Sentence Algorithm.’ I do not consider it necessary to explain the algorithm line by line; general guidelines would suffice.

To facilitate reproducibility of the experiments, it would be necessary to specify the software and hardware utilized.

The authors are requested to indicate the proportion of data used for testing, training, and validation. It would also be helpful for them to comment on whether the data is balanced across all categories.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

My concerns have been resolved; the manuscript is OK to publish.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have implemented the requested improvements in their manuscript. Each of the raised concerns has been addressed and incorporated into the manuscript.
