Predictive Fraud Analysis Applying the Fraud Triangle Theory through Data Mining Techniques
Round 1
Reviewer 1 Report
Dear Authors,
I enjoyed reading this paper.
Still, there are some minor issues to deal with. For instance:
- English language and style issues: Grammarly (https://app.grammarly.com), on default settings, detected three (3) critical issues and nine (9) more advanced ones in the text block formed by concatenating Title+Abstract+Keywords alone (see all of them in the two captures at https://tinyurl.com/3kvefhkv ). This gives a total score of 83 out of a maximum of 100 for this sample. Since you do not appear to be native English speakers, I recommend a full revision of the English language and style for the entire article (a complete Grammarly report for the entire paper with the maximum score or the message "no issues found" would be enough);
- The paper must follow the specific structure of the journal, namely: Abstract, Keywords, Featured Application, Introduction, Materials and Methods, Results, Discussion, Conclusions as indicated at https://www.mdpi.com/journal/applsci/instructions.
- The Header contains the collocation “Journal Not Specified”. Of course, this should be replaced;
- The Conclusions and Future Work must be split into two parts;
- Some figures (e.g., 7, 8, 9, and 10) need a higher resolution (minimum 1000 pixels width/height, or a resolution of 300 dpi or higher);
- Figure 5 comes before Figure 4 and Figure 8 comes before Figure 7. This is not the way they should be organized in the paper;
- Some sections/subsections end with equations or tables, which is not good practice (e.g., subsection 2.2 ends with eq. 2, subsection 4.2.4 ends with Table 8). Either more explanatory text is needed after them, or they are simply misplaced;
- Some figures have unusually long legends (e.g., Figures 2 and 3). I suggest the authors shorten the figure legends and place the rest of the details in the main text;
- There are many figures (10) and tables (8) in the paper. Those not essential for understanding the main content should be moved to an Appendix section. If it does not exist, this section must be created;
- Not all equations are explicitly referenced in the main text (e.g., see eq. N);
- Some references are required in the Dense Neural Networks section;
- The digital object identifier for most of the journal paper references is missing;
- Accuracy values such as those reported in columns T2-T4 of Table 8 do not even reach the “fair” level (>0.7). I recommend removing col. T4 (its values are far too distant from 0.7). If that is not possible, I recommend including a dedicated legend on how to interpret these values (0.5-0.6 … 0.8-0.9, more than 0.9, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935260/ ).
Thank you for your contribution and for trying to make the world a better place!
Sincerely,
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Based on the fraud triangle theory, the authors apply data mining techniques on a (balanced) synthetic text dataset to predict the occurrence of fraud. It includes the generation of the synthetic text dataset, the application and assessment (by coherence score) of topic modelling text mining techniques, and then the application and assessment (based on the area under the ROC curve) of various supervised classification methods.
Hence, the manuscript addresses potentially important and useful issues in the identification of potential fraud risk using text mining and classification methods. Thus, in my opinion, the manuscript can be a good contribution to the field, the style and the length are appropriate for full development of the ideas, and I therefore recommend its acceptance for publication after a minor revision to accommodate the corrections and suggestions below.
- The quotation marks in the analysed pdf file do not always appear in the proper form. Please confirm that the form used is the correct one.
- The order of references is not correct. Please correct.
- Line 89 – Please define all the acronyms on first use (even those that are obvious, such as CNN, DNN, LSTM, ROC, among others); it is not necessary to define them twice. Moreover, Support Vector Machines is abbreviated as SVC in line 86 and as SVM in line 124.
- Line 97 – “Finally, section VI presents”. Please correct (it is not section VI).
- Line 159 – “IFR2” should be replaced by “IFR2”
- Line 171 – “assets”[20].” should be replaced by “assets” [20].” (space between the quotes and the reference is missing)
- Line 176 – “reports [11] The” should be replaced by “reports [11]. The”
- Figure 1 – The text associated with “Rationalize” is swapped with the text associated with “Opportunity”.
- Line 196 – Please explain the sentence “text mining often uses TM.”
- Line 200 – “[32] [35].” should be replaced by “[32, 35].”.
- Line 203 – ““debts, financial problems, late payments”” should be replaced by ““debts”, “financial problems”, “late payments””
- Line 232 – “dirichlet” should be replaced by “Dirichlet”
- Line 234 – “texts[40].” should be replaced by “texts [40].” (a space is missing)
- Line 242 – “where enotes”. Is it “where N denotes”?
- Line 245 – “by a the vector” should be replaced by “by the vector”
- Equation (1) – Is D defined?
- Section 2.3 - The AUC evaluates all possible cut-points, even those that are unsuitable in practice. Have other accuracy classification measures been applied? (such as partial AUC, Phi index, specificity value for a target sensitivity value, among others)
- Line 312 – “model.It” should be replaced by “model. It”
- Line 323 – “dimensions[75] [76].” should be replaced by “dimensions [75, 76].”
- Figure 5 is shown before Figure 4. Please change.
- Line 384 – “dataset[29].” should be replaced by “dataset [29].”
- Line 405 – “[27] [28].” should be replaced by “[27, 28].”
- Subsection 3.1 – Please explain in more detail how the synthetic text dataset was generated, both the documents related to fraud and those unrelated to it.
- Line 463 – “[68] [69].” should be replaced by “[68, 69].”
- Figure 6 – “techniques(LSA,” should be replaced by “techniques (LSA,”
- Line 592 – “in 8 (b) showed” should be replaced by “in Figure 8 (b) showed”
- Line 596 – “algorithm 1” should be replaced by “Algorithm 1”
- Line 655 – “classifiers(linear and no-linear” should be replaced by “classifiers (linear and non-linear”
- Figure 9 – Does GNB denote Gaussian Naïve Bayes (already denoted only as NB, see subsection 2.3.5), KN denote k-Nearest Neighbor (already denoted as kNN, see subsection 1.1), and SVC denote Support Vector Machines (both SVC and SVM are used earlier)?
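To illustrate the Section 2.3 comment above, here is a minimal sketch of two of the cut-point-specific measures mentioned there: the Phi index (equivalent to the Matthews correlation coefficient for binary labels) at a fixed threshold, and the best specificity reachable at a target sensitivity. This is not the authors' method; the function names, labels, and scores are hypothetical and chosen only for illustration. The sketch assumes both classes are present and does not cover partial AUC.

```python
from math import sqrt


def confusion(y_true, scores, threshold):
    """Return (tp, fp, tn, fn), predicting fraud (1) when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def phi_coefficient(tp, fp, tn, fn):
    """Phi index computed from the four confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def specificity_at_sensitivity(y_true, scores, target_sensitivity):
    """Highest specificity among thresholds whose sensitivity meets the target.

    Assumes y_true contains at least one positive and one negative label.
    """
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(y_true, scores, t)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        if sensitivity >= target_sensitivity and (best is None or specificity > best):
            best = specificity
    return best


# Hypothetical labels (1 = fraud-related document) and classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(phi_coefficient(*confusion(y_true, scores, 0.5)))  # 0.5
print(specificity_at_sensitivity(y_true, scores, 1.0))   # 0.75
```

Unlike the AUC, both measures describe a single operating point, so they only reflect cut-points that would actually be used in practice.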
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
- The novelty of this paper is not very clear. Would you please state the novelty of the work either in the abstract or in the conclusion?
- Please add more statements about the outcome (results) in the abstract.
- Please use the passive voice instead of personal pronouns, for example in line 6 “In this work, we propose a mechanism……”
- The research gap is missing. Please add it at the end of the related work section.
- Please add more description of Table 2 and Table 3, which are currently not very clear.
- A performance/advantages comparison with existing related works (if available) would be good to add at the end of the results section to validate the capability of the proposed method presented in the paper.
- Please add the significance of this work and further future research directions to the Conclusions.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Dear Authors,
You solved most of the issues. I think the paper is now ready to be published!
Congratulations!
Sincerely