Article
Peer-Review Record

Automatic Fact Checking Using an Interpretable Bert-Based Architecture on COVID-19 Claims

Appl. Sci. 2022, 12(20), 10644; https://doi.org/10.3390/app122010644
by Ramón Casillas 1, Helena Gómez-Adorno 2,*, Victor Lomas-Barrie 2 and Orlando Ramos-Flores 2
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 29 September 2022 / Revised: 7 October 2022 / Accepted: 12 October 2022 / Published: 21 October 2022
(This article belongs to the Special Issue Applications of Deep Learning and Artificial Intelligence Methods)

Round 1

Reviewer 1 Report

A neural network architecture was developed that focuses on the verification of facts against evidence found in a knowledge base. The architecture addresses relevance evaluation and claim verification, two stages of a well-known three-stage fact-checking method. BERT was fine-tuned to codify the claim and the evidence separately; with an attention layer between them, a subsequent classification layer performs the above-mentioned tasks (a minimal illustrative sketch of this pipeline follows the comments below). The following comments should be addressed to improve the quality of the paper:

1. The English writing should be improved. For example, the sentence "We fine-tune BERT to codify claim and evidence separately and with an attention layer between them, a later classification layer is capable of performing the above-mentioned tasks." in the abstract is grammatically erroneous.

2. In Table 1, Bert (Shared) + Multiplicative + LSTM-5L performs best in training, but it performs worse than Soleimani 2020 in testing. Why were both figures boldfaced?

3. In Table 2, Bert (Shared) + Multiplicative + LSTM-5L performs best in terms of Recall@5, but it performs worse than Soleimani 2020 in terms of both Precision@5 and F1@5. Why were all three figures boldfaced?

4. In Table 3, Bert (Shared) + Multiplicative + LSTM-5L performs worse than Soleimani 2020 in both training and testing. Why were both figures boldfaced?

5. In Table 4, Bert (Shared) + Multiplicative + LSTM-5L performs worse than Soleimani 2020 in terms of Accuracy. Why was the figure boldfaced?

6. The authors claimed "Our model presents a simpler interpretation than other models of the state-of-the-art, which can be extracted from scores computed within the attention layer and can provide a straightforward idea of which evidence spans are more relevant to classify a claim as supported or refuted." However, they didn't show how the interpretation was extracted from their model. The authors only showed the results (Figure 7). They should explain how Figure 7 was derived, revealing the scores computed within the attention layer and presenting the details of the process.
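For orientation, the pipeline summarized at the top of this report can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes a shared BERT encoder, multiplicative (dot-product) attention, and a simplified linear head where the paper's best variant uses a 5-layer LSTM; all class and variable names are invented for this sketch.

```python
# Illustrative sketch of the reviewed architecture (not the authors' code):
# a shared BERT encoder codifies claim and evidence separately, a multiplicative
# attention layer relates them, and a classification head labels the pair.
import torch
import torch.nn as nn
from transformers import BertModel

class ClaimEvidenceClassifier(nn.Module):
    def __init__(self, num_labels=3, hidden=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")  # shared weights
        self.classifier = nn.Linear(hidden, num_labels)  # simplified head (paper uses an LSTM variant)

    def forward(self, claim_ids, claim_mask, evid_ids, evid_mask):
        # Codify claim and evidence separately with the same (shared) encoder.
        claim = self.bert(claim_ids, attention_mask=claim_mask).last_hidden_state
        evid = self.bert(evid_ids, attention_mask=evid_mask).last_hidden_state
        # Multiplicative attention: one weight for every claim/evidence token pair.
        scores = torch.bmm(claim, evid.transpose(1, 2))   # (batch, len_claim, len_evid)
        attn = torch.softmax(scores, dim=-1)              # the attention map
        context = torch.bmm(attn, evid)                   # evidence-aware claim encoding
        # Pool and classify: SUPPORTED / REFUTED / NOT ENOUGH INFO.
        return self.classifier(context.mean(dim=1)), attn
```

Returning the attention matrix alongside the logits is what makes the interpretation discussed in comment 6 possible.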

Author Response

We take this opportunity to thank the reviewer for their concise and judicious comments and recommendations.

In the following lines, we specify how we addressed each point raised.

Reviewer 1

R1. The English writing should be improved. For example, the sentence "We fine-tune BERT to codify claim and evidence separately and with an attention layer between them, a later classification layer is capable of performing the above-mentioned tasks." in the abstract is grammatically erroneous.

Answer: We reviewed the article's writing and made adjustments marked in green in the document.

 

R2. In Table 1, Bert (Shared) + Multiplicative + LSTM-5L performs best in training, but it performs worse than Soleimani 2020 in testing. Why were both figures boldfaced?

Answer: Thank you for your comment. We corrected Table 1; now only the best results in training and in testing are boldfaced.

 

R3. In Table 2, Bert (Shared) + Multiplicative + LSTM-5L performs best in terms of Recall@5, but it performs worse than Soleimani 2020 in terms of both Precision@5 and F1@5. Why were all three figures boldfaced?

 

Answer: We updated Table 2 so that only the appropriate figures are boldfaced. We also changed “Model R1 (Bert(shared) + multiplicative + LSTM-5L)” to “Model R1 - Bert(shared) + multiplicative + LSTM-5L” and removed the boldface.

 

R4. In Table 3, Bert (Shared) + Multiplicative + LSTM-5L performs worse than Soleimani 2020 in both training and testing. Why were both figures boldfaced?

 

Answer: Following your comment, we removed the boldface from both columns of Table 3 and instead boldfaced the best scores in the training and test columns. Additionally, we removed the boldface from “SentBert-Roberta(shared) + multiplicative + Softmax.”

 

R5. In Table 4, Bert (Shared) + Multiplicative + LSTM-5L performs worse than Soleimani 2020 in terms of Accuracy. Why was the figure boldfaced?

 

Answer: We changed the row of Table 4 to which you refer, removing the boldface in all three columns. Table 4 now boldfaces Soleimani's accuracy figure.

 

R6. The authors claimed "Our model presents a simpler interpretation than other models of the state-of-the-art, which can be extracted from scores computed within the attention layer and can provide a straightforward idea of which evidence spans are more relevant to classify a claim as supported or refuted." However, they didn't show how the interpretation was extracted from their model. The authors only showed the results (Figure 7). They should explain how Figure 7 was derived, revealing the scores computed within the attention layer and presenting the details of the process.

 

Answer: We added the explanation of how the interpretation was extracted from our model in lines 215-227:

This relevance is computed in the attention layer by calculating a matrix of weights between the elements of the two input sequences. Each inference generates its own weight matrix for the case at hand. The matrix stores information about which aspects the neural network focuses on and which it sets aside. For this reason, the generated attention map is used as an approximation to understand what the model takes into account when deciding whether a statement is supported, refuted, or without enough information.
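As a concrete illustration of this extraction step (a sketch under assumptions, not the authors' released code: the model, tokenizer, and the example claim/evidence pair are all invented here), the weight matrix from a single inference can be computed and rendered as a heatmap in the style of Figures 7 and 8:

```python
# Illustrative sketch: compute and plot a claim/evidence attention map.
import torch
import matplotlib.pyplot as plt
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

claim = "COVID-19 vaccines alter human DNA."         # invented example claim
evidence = "mRNA vaccines do not modify human DNA."  # invented example evidence
c = tok(claim, return_tensors="pt")
e = tok(evidence, return_tensors="pt")

with torch.no_grad():
    hc = bert(**c).last_hidden_state[0]  # (claim_len, hidden) token encodings
    he = bert(**e).last_hidden_state[0]  # (evidence_len, hidden)
    # Multiplicative attention: one weight per claim/evidence token pair.
    attn = torch.softmax(hc @ he.T, dim=-1)

# Rows are claim tokens, columns are evidence tokens; brighter cells mark the
# evidence spans weighted most heavily for this claim.
plt.imshow(attn.numpy(), cmap="viridis", aspect="auto")
plt.xticks(range(he.shape[0]), tok.convert_ids_to_tokens(e.input_ids[0].tolist()), rotation=90)
plt.yticks(range(hc.shape[0]), tok.convert_ids_to_tokens(c.input_ids[0].tolist()))
plt.colorbar(label="attention weight")
plt.tight_layout()
plt.show()
```

Each inference yields one such matrix, so the heatmap is specific to the case being classified, as the quoted passage states.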

 

Additionally, to further clarify this point, we added two figures (7 and 8) showing the attention maps obtained from the model for the examples in Figures 5 and 6, respectively.

Reviewer 2 Report

I only have the following minor comments that need to be addressed:

1. Please fix the equation number on line 94.

2. Please provide a discussion on directions for future/planned work. This is a necessary part of every manuscript and should be presented along with the conclusion. No research is independent. I’d like to know if there are plans for adaptation by any group/institute etc. and/or future improvements to the developed models.

 

Author Response

We take this opportunity to thank the reviewer for their concise and judicious comments and recommendations.

In the following lines, we specify how we addressed each point raised.

 

R1. Please fix the equation number on line 96. 

 

Answer: We fixed the equation reference.

 

R2. Please provide a discussion on directions for future/planned work. This is a necessary part of every manuscript and should be presented along with the conclusion. No research is independent. I’d like to know if there are plans for adaptation by any group/institute etc. and/or future improvements to the developed models.

 

Answer: Thank you for your feedback; we added a paragraph describing future work in the conclusions section.

Round 2

Reviewer 1 Report

My previous comments have been addressed in this revision. Therefore, I suggest the revision be accepted.
