A Hybrid Multimodal Data Fusion-Based Method for Identifying Gambling Websites
Round 1
Reviewer 1 Report
In this manuscript the authors propose a method for identifying gambling websites by extracting and fusing visual and semantic features of website screenshots. The results demonstrate the viability of their proposal.
The topic is certainly interesting and significant, and the manuscript may make a significant contribution to the field. The work carried out is described in a very comprehensive way and allows the results to be reproduced. The authors provide a precise discussion of the results, but significant descriptions of the proposed method and important technical details are still needed to ensure that the results are reproducible.
The text appears to have been carefully revised and corrected, since there are no significant typographical or punctuation errors. There are no repetitions of nearby words. No word-division or hyphenation errors have been found in the manuscript.
However, there are, according to the reviewer, some major aspects that need to be addressed by the authors:
- Section 2, Related work. The works related to the identification of illegal websites have been focused on URL-based, single-feature and mixed-feature based methods. What about the rest of the existing proposals in the literature? Have they been adequately explored? Aren't there other works that extract text from website screenshots? Why is the work focused on these methods? What are the disadvantages of the methods proposed in the literature? It is especially necessary to clarify this in relation to mixed-feature based methods.
- Section 2, Related work. The authors point out that improper feature combinations cannot effectively improve the performance of model recognition and that it is still difficult to fuse different features well. Why are they improper? Why is fusing features difficult and what do the authors propose to fix it? This should be explained in the paragraph.
- Section 3, Method. Figure 2 should be described in more detail, adequately explaining each of its elements.
- Section 3.1.1, The structure of ResNet34. The authors should briefly describe what the layers and blocks consist of. Consequently, Figure 4 should also be described.
- Section 3.1.2, Transfer learning. The authors state that they modify the output size of the ResNet34’s fully connected network to fit their label categories (gambling and normal). They then set a much smaller learning rate to fine-tune the pretrained model on the webpage screenshot dataset they collected. They should explain this whole process in depth. Formula (2) should also be detailed.
- Section 3.2.1, Incremental training of the embedding layer. As in the previous section, in this case it is also necessary to introduce more technical details about the proposed method. In addition, Formulas 3 to 9 should also be detailed step by step.
- Section 3.3.1, Feature fusion based on self-attention. Both the complete process and formulas (11 to 14) are described too poorly for the reader to follow. The authors should pay special attention to improving the explanation of this section.
- Section 3.3.2. Late fusion. What does the grid search method consist of? Why is this method used? Have other alternatives been explored? Why is the value 0.1 taken as the step?
- Section 4. Experiments and Analysis. How did the authors select the 800 gambling webpage screenshots? How did the authors carry out the crawling process?
- Section 4. Experiments and Analysis. Each evaluation metric should be defined, specified and explained (accuracy, precision, recall, and F1-score). Their ranges should also be defined.
- Section 5, Conclusions: this section should be completely rewritten. In fact, there should be two separate sections: Limitations and future work; and Conclusions. Authors should pay special attention to the following recommendations:
With regard to limitations and future work, the new section should adequately describe the significant limitations of the conducted research, especially with regard to all the assumptions that have been made and the possible variations that have conditioned or may condition the results obtained. Furthermore, this section should also include everything related to future work.
In addition, the conclusions should be explained in greater detail and rewritten in a much more scientific and reasoned way. They should contain not simply a brief summary or a repetition of the research issue but an integration of key points. The most important implications of the research conducted should be summarized. The significance of the findings should be set out with the objective of demonstrating the importance of the ideas.
Several minor changes are also required:
- There are several sentences that are too categorical and must be softened and clarified. For example: "gambling websites... causing great harm to society and leading to serious cybercrimes" (line 2); "the black list-based method cannot work" (line 94).
- Figure 3 appears before being introduced (in fact, it is not even referenced; a sentence such as "... is shown in Figure 3" or "Figure 3 shows..." is needed).
- Figure 4 appears before being introduced (In fact, it is not even referenced).
- Lines 293-294. Do the authors mean GB when they quote 377G memory and 24G video memory? What are the frequencies of the processor and the RAM? What are the other characteristics of the equipment? For example, what type of data storage do they employ?
Author Response
Point 1: A distinctive feature of the authors' study is its extreme relevance, even more so from the point of view of its practical application. The authors presented their study excellently: the structure is clear and logical, the methodology is described clearly and can be scaled, the literature is presented in the required volume, the conclusions are logical and based on the study. My only comment is to present a limitation of the proposed methodology.
Response 1: Thanks for the comment. We have added a description of limitations and future work in the “Conclusions” section (lines 475 to 484 in the revised manuscript).
”Despite the effective performance of the proposed method, it still has some limitations that need to be improved in the future. First, the model can be improved by incorporated with more features. In this paper, we mainly use visual and semantic features of webpage screenshots. In the future, we will explore fusing more features and investigate better multimodal data fusion methods to fuse more modalities of data. Another limitation is that the accuracy of OCR extraction of text will directly affect the accuracy of text classification and multimodal data fusion. If there are some errors in the text information extracted by OCR, the performance of text classification and multimodal classification will suffer. Therefore, we will explore the impact of OCR errors on model performance and how to reduce such errors in future work.”
Author Response File:  Author Response.pdf
Reviewer 2 Report
A distinctive feature of the authors' study is its extreme relevance, even more so from the point of view of its practical application.
The authors presented their study excellently: the structure is clear and logical, the methodology is described clearly and can be scaled, the literature is presented in the required volume, the conclusions are logical and based on the study.
My only comment is to present a limitation of the proposed methodology.
Author Response
Please see the attachment.
Author Response File:  Author Response.pdf
Round 2
Reviewer 1 Report
The authors have made an important effort and have done a great job in this new version. They have fixed all issues and applied all suggested changes and modifications. Therefore, in my opinion, the paper can be accepted in its present form.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.