Article
Peer-Review Record

Automatic Vulgar Word Extraction Method with Application to Vulgar Remark Detection in Chittagonian Dialect of Bangla

Appl. Sci. 2023, 13(21), 11875; https://doi.org/10.3390/app132111875
by Tanjim Mahmud 1,2,*, Michal Ptaszynski 1,* and Fumito Masui 1
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5:
Submission received: 29 August 2023 / Revised: 6 October 2023 / Accepted: 25 October 2023 / Published: 30 October 2023

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. Kindly avoid the terms 'we' and 'our' in the article.

2. Revise and reduce the length of the abstract.

3. Revise the statement "One problem with such methods however is that such hand-crafted lexicons need to be regularly updated".

4. Provide the citation/source for  Figure 1.

5. What is the contribution of NLP in this work?

6. Kindly avoid the group citation in section 2.

7. Kindly identify the research gap from section 2.

8. On what basis did the authors select the ML (LR, SVM, DT, RF, MNB) and DL (Simple RNN and LSTM) algorithms? The following articles can be referred to for ML and DL algorithms:

a)     Improving sentence simplification model with ordered neurons network

b)     A comprehensive review on deep learning approaches in wind forecasting applications

c)     Deep learning for time series forecasting: The electric load case

9. Hyperparameter tuning of the algorithms and its results need to be explored.

10. Results have to be discussed with literature support.

11. Remove the citation in the conclusion and revise it. Briefly summarize the research findings in the conclusion.

12. Can you provide more details on how the keyword matching method is implemented for identifying vulgar remarks? What were the specific steps involved in using a predefined dictionary of vulgar words for detection?

13. In the context of machine learning and deep learning techniques, what were the specific models and algorithms used for identifying harmful remarks? Were there any challenges or considerations in choosing these models?

14. How did the study address potential biases in the dataset, especially in the context of identifying harmful remarks? Were there any steps taken to ensure the models did not inadvertently amplify any biases present in the data?

15. In terms of practical applications, how transferable are the findings of this study to other languages or platforms?

16. Were there any insights gained from the study that could be useful for implementing similar algorithms in other contexts?
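
Comment 12 above asks how keyword matching against a predefined dictionary works. For illustration only, here is a minimal sketch of the general technique, not the authors' implementation. The lexicon entries, the tokenizer, and the function names are all hypothetical, and it assumes a remark is flagged as soon as one token matches the lexicon.

```python
import string

# Hypothetical toy lexicon; the paper's actual Chittagonian lexicon is not reproduced here.
VULGAR_LEXICON = {"badword", "slur"}

# ASCII punctuation plus the Bangla danda, stripped from token edges.
PUNCT = string.punctuation + "\u0964"


def tokenize(text):
    """Lowercase, split on whitespace, and strip punctuation from each token."""
    tokens = (raw.strip(PUNCT) for raw in text.lower().split())
    return [tok for tok in tokens if tok]


def is_vulgar(text, lexicon=VULGAR_LEXICON):
    """Flag a remark as vulgar if at least one token is in the lexicon."""
    return any(tok in lexicon for tok in tokenize(text))


if __name__ == "__main__":
    for remark in ["You are a BADWORD!", "Have a nice day."]:
        print(remark, "->", is_vulgar(remark))
```

Splitting on whitespace rather than a word-character regex is a deliberate choice in this sketch: Bangla vowel signs are combining marks, so a naive `\w+` pattern could break words apart. Exact matching like this misses spelling variants, which is one reason such lexicons need regular updating.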

 

Comments on the Quality of English Language

Moderate editing of the English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents an intelligent method for automatic extraction of vulgar phrases and explores the strengths and shortcomings of simple keyword matching and of machine learning/deep learning techniques for classifying remarks from mass digital media as vulgar or non-vulgar. Specifically, the researchers used TF-IDF and other text preprocessing techniques, then created and trained models using multiple machine learning algorithms, including naive Bayes, support vector machine, and logistic regression. Finally, they evaluated and compared the models using metrics such as accuracy, recall, and F1 score.

The article compares two approaches for identifying vulgar remarks in the low-resource Chittagonian dialect of Bangla. The results show that the second, more sophisticated approach based on machine learning outperforms the first approach based on keyword matching in terms of accuracy. This is because the machine learning models are able to learn more complex patterns in language use that are difficult to capture with simple keyword matching. Additionally, the machine learning approach can be adapted to new types of vulgar language, whereas the keyword approach requires manual curation of the keyword list. Therefore, for the low-resource Chittagonian dialect of Bangla, the machine learning approach is more effective in identifying vulgar content.
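
The evaluation metrics mentioned above can be sketched as follows. This is a minimal illustration that assumes binary labels with the vulgar class as the positive class (label 1). The function name and the toy labels are hypothetical, and the paper may use a different averaging scheme.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy gold labels and predictions (1 = vulgar, 0 = non-vulgar).
    gold = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(binary_metrics(gold, predicted))
```

Reporting precision, recall, and F1 alongside accuracy matters when the vulgar and non-vulgar classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.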

Major comments:

That is an interesting paper. Future research should use more machine learning-based methods, such as deep learning algorithms, to handle more complex and diverse textual data when dealing with similar "low-resource" languages. Additionally, researchers should continue to increase the size of the training data set to improve the performance and applicability of the algorithms.

Minor comments:

At the moment, the abstract is lengthy and out of the audience’s reach. Be concise and organised, if possible, and delete (LR) and (RNN) from the keywords.

Figure 1 has little significance in the paper.

Move footnote items to the reference list.

The literature review falls into the problem of summarising without elaboration. Avoid drafting entire paragraphs for each article separately.

Streamline the text in the introduction into 4-5 paragraphs.

Line 85, Page 3: ‘This paper significantly advances...’ Do not make such statements in the first part of the paper. Combine points 3 and 5 in the following main contributions.

Remove Tables 1 and 2 if their references have been clearly explained in the body text.

Figure 2: How may one elucidate the predicted results?

Figure 3: …Collect and Organize data.

Make sure to use italic text for variable names and normal (upright) text for numerical values.

Split Results and Discussion into a couple of sections.

Line 686, Page 20: What are the exact findings from those previous studies? Revise as needed throughout the paper.

The manuscript lists ‘Funding acquisition, M.P., F.M.’, whereas the funding statement reads ‘This research received no external funding.’

Comments on the Quality of English Language

There are several language glitches to be addressed when submitting the revision.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This is a well written article with good articulation of procedures taken to identify the sources of data, data analysis and discussion.

However, I have the following few aspects that I feel should be addressed to strengthen the article:

1). On page 2: Figure 1 seems misplaced and tends to distort the overall presentation. I suggest that you move it inside your introduction, just below paragraph 1, line 43.

2). Contributions of the study are often written towards the end of the study as opposed to being placed at the beginning, i.e., pages 86-113. SUGGESTION: Move your seven bullet points to your Section 5 on conclusions and future research work. Then, in Section 1, move the GOAL of the study (i.e., Line 72) towards the end of the section so that readers can clearly see what you intend to achieve.

3). Section 5: see my point 2 above. In this section, you should clearly articulate the contribution of your study to: (a) theory, (b) managerial practices, and (c) policy.

The rest of the sections are well articulated - I have no issues with them.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

applsci-2608916

 

This paper is about automatic vulgar word extraction. The paper lacks novelty, because many papers in this field have already used advanced deep learning models, such as attention models, for vulgar remark detection. Nevertheless, my comments are as follows:

 

1. The abstract section is too long. It needs to be completely reviewed and summarized.

2. Please add a section on the "literature review". In this section, it is better to include a tabular summary of the reviewed papers to give readers a better understanding of the research done in this field in 2022-2023. Some articles should be discussed in the text, and the rest should be summarized in a table.

3. In the review section of the articles, mention the application of deep learning in medicine - signal and image processing, etc.

4. Please summarize the "Performance Metrics" in a table.

5. In the conclusion section, while pointing out the strengths and weaknesses, the proposed method should be compared with other references.

6. In discussions you need to critically discuss your work/results against your hypothesis.

7. Identify the main findings and justify the novelty and contribution of the work.

8. Most of the references are old; newer references should be included as well.

9. Please add a section about "limitation of study".

10. In the Conclusion section, please explain more about future works. This section requires further discussion.

11. English language is acceptable in general, but there are some errors that should be corrected.

Comments on the Quality of English Language

Minor editing of English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

Automatic Vulgar Word Extraction Method with Application to Vulgar Remarks Detection in Chittagonian Dialect of Bangla

1.      The abstract is too long, approximately 400 words. Please simplify it.

2.      In the abstract, please provide a concise summary of the methods used. Avoid excessive detail to prevent lengthening the abstract. Additionally, compare the results of your proposed method with established approaches. Mention the datasets used to support your method.

3.      The methodology is challenging to follow. Perhaps you could create a flowchart illustrating the steps you performed in the research, along with explanations of the methods used. For instance, based on my understanding, the first step is data collection, the second is data annotation, and the third is data classification (utilizing baselines and ML), and so on. I believe that Figure 2 only presents an overview of your research. Figure 3 does not clearly explain the data collection process, while Figure 4 describes an example of vulgar words. Figure 5 exclusively covers the data annotation process, and so forth.

4.      In line 352 (sub-section 3.3), the authors mention 'Baselines.' Could you briefly explain (in one or two words) what baselines are and their purpose? I noticed in Figure 4 that you proposed three baselines. Could you explain the rationale behind having three baselines?

5.      In Figure 9, please provide a label for the y-axis.

I think Table 6 is unnecessary. You can give this information in the caption of Figure 9, i.e., "Comparisons among different types of keyword matching methods with a respective ratio of vulgar words in the lexicon (training 80%, testing 20%)". The same comment applies to Table 9 and Figure 11.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Congrats to the authors.

Comments on the Quality of English Language

Moderate editing of the English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

If you look closely at what has been written throughout the manuscript, you will see so-called 'slang' words that must not appear like this in a publication. They give readers a negative impression. I also still believe Figure 1 needs to be changed, since it has no axis labels and its presentation quality is insufficient.

Comments on the Quality of English Language

Please use polite English words, if possible, when finalizing the document.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Minor editing of English language required

Comments on the Quality of English Language

Minor editing of English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
