A Lexicon-Based Framework for Mining and Analysis of Arabic Comparative Sentences
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This article presents a framework for analyzing comparative sentences in Arabic using a lexicon-based approach. The paper presents five algorithms responsible for different aspects of data processing, from identifying comparative sentences to extracting preferred units. Text analysis in Arabic is important due to the large population using this language. Moreover, developing tools for Arabic, which is linguistically complex, can have a significant impact on the development of text analysis in Arab regions. On the other hand, the research may seem difficult to access for those outside the narrow field of Arabic language processing.
The article is well written, well structured, and the research process is well presented. However, some sections need to be expanded, as they consist of only a single sentence. This applies especially to Sections 3.3 to 3.5.4, each of which starts with the sentence "This section describes..." but lacks a detailed explanation. For example, Section 3.3 lacks a detailed discussion of how the data was cleaned. Were punctuation marks, digits, and whitespace characters removed? Were any other characters specific to Arabic handled? Arabic orthography is completely different from that of European languages.
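To make the cleaning questions above concrete, the following is a minimal sketch of the kind of preprocessing step whose details Section 3.3 could spell out. It is not the authors' procedure: the function name `clean_arabic` and the particular choices (which Unicode ranges to strip, replacing digits and punctuation with spaces) are assumptions made for illustration only.

```python
import re
import string

# Assumed choices for illustration; the manuscript does not specify them.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")   # tashkeel marks and superscript alef
TATWEEL = "\u0640"                                   # elongation character
DIGITS = re.compile(r"[0-9\u0660-\u0669]")           # ASCII and Arabic-Indic digits
# ASCII punctuation plus the Arabic comma, semicolon, and question mark.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "\u060C\u061B\u061F]")


def clean_arabic(text):
    """Strip diacritics, tatweel, digits, and punctuation, then collapse whitespace."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = DIGITS.sub(" ", text)
    text = PUNCT.sub(" ", text)
    return " ".join(text.split())
```

Each of these steps is a design decision (for example, whether digits carry information in comparative sentences), which is why the manuscript should state them explicitly.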
In the results section, I recommend adding a table with the results, not just graphs. This will greatly improve the readability of the results and the quality of the article.
I also have a few comments regarding the issues I have noticed.
1. The article is limited to Modern Standard Arabic (MSA) and the Egyptian dialect, which significantly limits its usability for other Arabic dialects. For this reason, it is recommended to extend the experiments to other popular dialects, e.g. Moroccan or Iraqi, to increase the universality of the solution.
2. The algorithms rely mainly on keywords and ignore the context of the sentence, which leads to misclassifications when words are ambiguous. In future work, it would therefore be worth considering context-aware models, e.g. transformers such as BERT or its Arabic version, AraBERT.
3. The lack of consideration of diacritics in Arabic significantly affects the precision of the analysis, because the same written word can have different meanings depending on the diacritics used.
4. The article points out problems with identifying sentences that contain many comparisons or mixed emotions. It would be worth examining the effect of methods that handle such ambiguity, e.g. training a model on labeled multi-class data.
5. A more detailed comparison of the results from the different approaches is missing; such a comparison would show their relative effectiveness.
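As an illustration of comments 2 and 3, the sketch below shows how distinct vocalized words collapse onto the same undiacritized form, which is the form a keyword lexicon would typically match. The helper `group_by_skeleton` and the example words are assumptions introduced here, not taken from the manuscript.

```python
import re
from collections import defaultdict

# Arabic short-vowel and related marks (fathatan through sukun, including shadda).
DIACRITICS = re.compile(r"[\u064B-\u0652]")


def group_by_skeleton(vocalized_words):
    """Map each undiacritized form to the set of vocalized words sharing it.

    Only forms shared by more than one vocalized word are returned, i.e. the
    cases where a diacritics-free keyword match cannot tell the words apart.
    """
    groups = defaultdict(set)
    for word in vocalized_words:
        groups[DIACRITICS.sub("", word)].add(word)
    return {form: words for form, words in groups.items() if len(words) > 1}


# Hypothetical example: "knowledge" and "flag" share the same consonant skeleton.
knowledge = "\u0639\u0650\u0644\u0652\u0645"
flag = "\u0639\u064E\u0644\u064E\u0645"
ambiguous = group_by_skeleton([knowledge, flag])
```

Running a check like this over the lexicon would indicate how many entries are exposed to this kind of ambiguity.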
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This is an interesting work because, as the authors mention, there are not many research studies available on the Arabic language, despite it being spoken by millions of people in various countries.
The proposed framework is a semi-automated process. Could the authors clarify why this lexicon construction process does not affect the subsequent results of the research?
What criteria did the experts follow to manually classify the sentences in the dataset?
In my humble opinion, the authors could simplify the section ‘Relation Extraction from Arabic Comparative Sentence Algorithm.’ I do not consider it necessary to explain the algorithm line by line; general guidelines would suffice.
To facilitate reproducibility of the experiments, it would be necessary to specify the software and hardware utilized.
The authors are requested to indicate the proportion of data used for testing, training, and validation. It would also be helpful for them to comment on whether the data is balanced across all categories.
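The information requested here could be reported with something as simple as the following sketch, which computes the class proportions and a stratified split. The function names, the 20% test fraction, and the label values are hypothetical; the manuscript's actual protocol may differ, and a purely lexicon-based method may need no training portion at all.

```python
import random
from collections import Counter, defaultdict


def class_proportions(labels):
    """Return the fraction of samples belonging to each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical dataset: 30 comparative and 70 non-comparative sentences.
labels = ["comparative"] * 30 + ["non_comparative"] * 70
proportions = class_proportions(labels)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```

Reporting the output of both functions alongside the results would let readers judge whether accuracy figures are inflated by class imbalance.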
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
My concerns have been resolved; the paper can be published.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have implemented the requested improvements in their manuscript. Each of the raised concerns has been addressed and incorporated into the manuscript.