Evaluating and Enhancing the Robustness of Sustainable Neural Relationship Classifiers Using Query-Efficient Black-Box Adversarial Attacks
Abstract
1. Introduction
1.1. Challenges
- Human prediction should remain unchanged.
- Semantic similarity should be maintained.
- Adversarial examples should appear natural and fluent.
- Determining the significant words for an adversarial attack while invoking the classifier as rarely as possible.
- Generating adversarial examples that can mislead the classifier considering the characteristics of RE datasets and models.
- Evaluating the success of an attack.
- Improving the robustness of these models.
1.2. Contributions
- We propose a novel query-efficient TFIDF-based black-box adversarial attack and generate semantically similar, plausible adversarial examples for the NRE task.
- Our mechanism evaluates supervised RE models using black-box adversarial attacks, which has not been previously undertaken. We demonstrate that no available open-source RE model is robust or sustainable under character- and word-level perturbations.
- Our proposed adversarial attack uses the test samples to find the significant words in a sentence, thereby reducing the number of queries and the time required to generate an adversarial example.
- Compared with similar black-box attacks on text classification and entailment tasks, adapted with a minor modification for RE datasets (constraints against modifying the mentioned entities), our method achieves a higher attack success rate with the lowest number of queries.
- We further discuss and evaluate two potential defense mechanisms against the aforementioned attacks.
2. Related Work
3. Methodology
3.1. Problem Formulation
3.2. Threat Model
- It should generate fluent adversarial examples that are semantically similar to the original sentence.
- The target NRE model/classifier should be invoked as few times as possible.
- It should fool the NRE model into producing erroneous outputs.
3.3. Adversarial Attack
- Determining important words and sorting them in descending order according to their importance.
- Using these words to generate adversarial sentences.
- Checking the similarity constraint between the original and adversarial sentences.
- Checking whether the adversarial sentence changes the output of the model.
- TFIDF+(WLA/CLA).
- QB+(WLA/CLA).
- Combined (TFIDF-QB+(WLA/CLA)).
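The steps above can be sketched as a single search loop. This is a minimal illustration, not the paper's exact algorithm: `rank_words`, `perturb`, `similarity`, and `classify` are hypothetical stand-ins for the components defined in the following subsections, and the similarity threshold is illustrative.

```python
# Minimal sketch of the black-box attack loop described above.
# rank_words, perturb, similarity, and classify are placeholders
# for the components this section goes on to define.

def attack(sentence, classify, rank_words, perturb, similarity,
           sim_threshold=0.7):
    """Return an adversarial sentence, or None if the attack fails."""
    original_label = classify(sentence)
    adversarial = sentence
    # Step 1: visit important words first (descending importance).
    for word in rank_words(sentence):
        # Step 2: apply a character- or word-level perturbation.
        candidate = perturb(adversarial, word)
        # Step 3: keep only semantically similar candidates.
        if similarity(sentence, candidate) < sim_threshold:
            continue
        adversarial = candidate
        # Step 4: stop as soon as the model's output flips.
        if classify(adversarial) != original_label:
            return adversarial
    return None
```

Each accepted candidate accumulates perturbations, so the loop stops with the fewest modified words that flip the label.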
3.3.1. Step 1: Word Importance Ranking (Lines 1–8 and 24–28)
- TFIDF-based word importance ranking (TFIDF-WIR).
- QB word importance ranking (QB-WIR).
- Combined TFIDF and QB word importance ranking ((TFIDF+QB)-WIR).
TFIDF-Based Word Importance Ranking
Algorithm 1: Black-box TFIDF-QB combined attack.
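TFIDF-WIR can be sketched as follows, under the assumption that a word's importance is its term frequency in the sentence times its inverse document frequency over the available test sentences; the function name and smoothing are illustrative.

```python
import math
from collections import Counter

def tfidf_rank(sentence, corpus):
    """Rank the words of `sentence` by TF-IDF, most important first.

    `corpus` is a list of tokenized sentences (e.g., the test set),
    so ranking requires no query to the victim classifier.
    """
    tokens = sentence.lower().split()
    tf = Counter(tokens)
    n_docs = len(corpus)
    scores = {}
    for word in set(tokens):
        # Number of sentences containing the word (document frequency).
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log(n_docs / (1 + df))
        scores[word] = tf[word] * idf
    return sorted(scores, key=scores.get, reverse=True)
```

Because only corpus statistics are used, this ranking costs zero model invocations, which is why the TFIDF-only attacks report 0 in the "Invoke #" column.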
Query-Based Word Importance Ranking
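QB-WIR can be sketched as leave-one-out scoring: each word is scored by the drop in the model's confidence when that word is removed. The interface below (a `classify_proba` callable returning per-label probabilities) is an assumption for illustration.

```python
def query_rank(sentence, classify_proba, label):
    """Query-based word importance ranking.

    Scores each word by the drop in the victim model's confidence for
    `label` when that word is deleted; one query per word.
    """
    words = sentence.split()
    base = classify_proba(sentence)[label]
    scores = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((base - classify_proba(reduced)[label], word))
    return [w for _, w in sorted(scores, reverse=True)]
```

This is why the QB-only variants need on the order of sentence-length queries per example, motivating the combined TFIDF+QB scheme.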
Combined TFIDF-WIR and QB-WIR
3.3.2. Step 2: Word Transformer (Lines 10–23)
Character-Level Attack
Algorithm 2: CLA.
- Insert-C: The insertion strategy can be applied in several ways: a character from the Latin alphabet, a special character such as "!" or "@", or a space " " can be inserted between the first and last characters of a word. We opt to insert a symbolic character because the resulting mistyped words are easily understood by humans, yet their embedding vectors differ from those of the original words, and thus the classifier can be fooled.
- Repeat-C: This is almost the same concept as insertion, but here we select a random character between the first and last characters of an important word and repeat it once. As before, human readers can easily identify the resulting word as the original.
- Swap-C: This randomly swaps two adjacent letters without changing the first and last letters.
- Delete-C: This deletes a letter between the first and last letters. This perturbation can also be easily recognized by a human, but the classifier may fail.
- Replace-C: This replaces original letters with visually similar ones. For example, "o" can be replaced with zero "0," the letter "a" can be replaced with "@," or lowercase letters can be replaced with uppercase.
Word-Level Attack
Algorithm 3: WLA.
- Synonym replacement: This is a popular attack strategy because it preserves word semantics. This function replaces a word with a synonym. We obtain important words by the aforementioned methods and gather a synonym list for replacement (excluding the mentioned entities and a few unimportant stop words). The synonym list is initialized with the N-nearest-neighbor synonyms of an important word according to the cosine similarity between words in the vocabulary. Synonyms can be obtained from several resources: for example, the top synonyms from NLTK WordNet, or the newer counter-fitting method of [51]. In this study, we used the counter-fitting embedding space to find synonyms, because it produced the best results on the SimLex-999 dataset [52], which was designed to measure model performance in determining the semantic similarity between words. We select the top k synonyms whose cosine similarity to the selected word exceeds a threshold sigma.
- Word swap: This method is easy to apply, as it randomly swaps a selected important word with a preceding or succeeding word. If the model relies on word order, this can yield erroneous outputs. Human readers can easily understand and identify the word order in a sentence, but the classifier may fail. This perturbation function returns a modified sentence with an important word swapped with the word before or after it.
- Word repetition: We can perturb a sentence by repeating some words (except for the mentioned entities, important words, and a few stop words such as "the" and "is"), as repeating an important word can increase the confidence score of the original label. Therefore, this function returns a sentence with the least important words repeated.
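The three word-level perturbations can be sketched as follows. The tiny embedding table in the test stands in for the counter-fitted vectors of [51]; in the full attack, entity mentions and stop words would be excluded from replacement.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_synonyms(word, embeddings, k=2, sigma=0.5):
    """Top-k nearest neighbors of `word` with similarity above sigma."""
    sims = [(cosine(embeddings[word], vec), w)
            for w, vec in embeddings.items() if w != word]
    return [w for s, w in sorted(sims, reverse=True)[:k] if s > sigma]

def word_swap(words, i):
    """Swap word i with its successor (or predecessor at sentence end)."""
    j = i + 1 if i + 1 < len(words) else i - 1
    out = list(words)
    out[i], out[j] = out[j], out[i]
    return out

def word_repeat(words, i):
    """Repeat word i once."""
    return words[:i + 1] + [words[i]] + words[i + 1:]
```

Here `k` and `sigma` correspond to the top-k/sigma selection described above; their values are illustrative.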
3.3.3. Step 3: Semantic Similarity
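The similarity constraint of Step 3 can be sketched as a cosine check over sentence encodings. The bag-of-words encoder below is a toy stand-in; the paper's pipeline would use a sentence encoder such as the Universal Sentence Encoder [53], and the 0.7 threshold is illustrative.

```python
import math
from collections import Counter

def encode(sentence):
    """Toy bag-of-words encoding; a real attack would substitute a
    sentence encoder such as the Universal Sentence Encoder."""
    return Counter(sentence.lower().split())

def semantic_similarity(a, b):
    """Cosine similarity between the encodings of two sentences."""
    u, v = encode(a), encode(b)
    dot = sum(u[w] * v[w] for w in u)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def passes_constraint(original, adversarial, threshold=0.7):
    """Accept the adversarial sentence only if it stays similar."""
    return semantic_similarity(original, adversarial) >= threshold
```

An adversarial candidate that drops below the threshold is discarded, which bounds how far the perturbed sentence can drift from the original meaning.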
4. Experiments
4.1. Datasets
4.2. Targeted Models
5. Attack Efficiency and Effectiveness
6. Attack Evaluation
6.1. Comparison of Perturbation Types
6.2. Comparison with Modified Baseline Black-Box Attacks of Simple Text Classification Tasks
- PSO: A score-based attack that uses sememe-based substitution and particle swarm optimization [54].
- TextFooler: Ranks words using the confidence score of the targeted victim model and replaces them with synonyms [50].
- PWWS: Ranks words using the model's confidence scores and substitutes them using WordNet [55].
6.3. Adversarial Sentence Examples
6.4. Human Evaluation
7. Transferability
8. Defense Strategies
8.1. Spell Checker
8.2. Adversarial Training
9. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, Q.; Li, L.; Wang, W.; Li, Q.; Zhong, J. A comprehensive exploration of semantic relation extraction via pre-trained CNNs. Knowl.-Based Syst. 2020, 194, 105488.
- Yao, X.; Van Durme, B. Information extraction over structured data: Question answering with freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA, 22–27 June 2014; pp. 956–966.
- Wu, F.; Weld, D.S. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Uppsala, Sweden, 11–16 July 2010; pp. 118–127.
- Khan, Z.; Niu, Z.; Yousif, A. Joint Deep Recommendation Model Exploiting Reviews and Metadata Information. Neurocomputing 2020, 402, 256–265.
- Khan, Z.Y.; Niu, Z.; Nyamawe, A.S.; Haq, I. A Deep Hybrid Model for Recommendation by jointly leveraging ratings, reviews and metadata information. Eng. Appl. Artif. Intell. 2021, 97, 104066.
- Hendrickx, I.; Kim, S.N.; Kozareva, Z.; Nakov, P.; Séaghdha, D.O.; Padó, S.; Pennacchiotti, M.; Romano, L.; Szpakowicz, S. Semeval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the 5th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Stroudsburg, PA, USA, 15–16 July 2010; pp. 33–38.
- Zhang, Y.; Zhong, V.; Chen, D.; Angeli, G.; Manning, C.D. Position-aware attention and supervised data improve slot filling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 35–45.
- Wang, H.; Qin, K.; Lu, G.; Luo, G.; Liu, G. Direction-sensitive relation extraction using Bi-SDP attention model. Knowl.-Based Syst. 2020, 198, 105928.
- Khan, Z.Y.; Niu, Z.; Sandiwarno, S.; Prince, R. Deep learning techniques for rating prediction: A survey of the state-of-the-art. Artif. Intell. Rev. 2020, 54, 1–41.
- Thorne, J.; Vlachos, A.; Christodoulopoulos, C.; Mittal, A. Evaluating adversarial attacks against multiple fact verification systems. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 2937–2946.
- Jia, R.; Liang, P. Adversarial Examples for Evaluating Reading Comprehension Systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 2021–2031.
- Poliak, A.; Naradowsky, J.; Haldar, A.; Rudinger, R.; Van Durme, B. Hypothesis Only Baselines in Natural Language Inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, New Orleans, LA, USA, 5–6 June 2018; Association for Computational Linguistics: New Orleans, LA, USA, 2018; pp. 180–191.
- Gururangan, S.; Swayamdipta, S.; Levy, O.; Schwartz, R.; Bowman, S.; Smith, N.A. Annotation Artifacts in Natural Language Inference Data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA, 1–6 June 2018; Association for Computational Linguistics: New Orleans, LA, USA, 2018; pp. 107–112.
- Mudrakarta, P.K.; Taly, A.; Sundararajan, M.; Dhamdhere, K. Did the Model Understand the Question? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; Association for Computational Linguistics: Melbourne, Australia, 2018; pp. 1896–1906.
- Li, J.; Tao, C.; Peng, N.; Wu, W.; Zhao, D.; Yan, R. Evaluating and Enhancing the Robustness of Retrieval-Based Dialogue Systems with Adversarial Examples. In CCF International Conference on Natural Language Processing and Chinese Computing; Springer: Berlin/Heidelberg, Germany, 2019; pp. 142–154.
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2014, arXiv:1412.6572.
- Carlini, N.; Wagner, D. Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. In Proceedings of the 2018 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 24 May 2018; pp. 1–7.
- Li, J.; Monroe, W.; Jurafsky, D. Understanding Neural Networks through Representation Erasure. arXiv 2016, arXiv:1612.08220.
- Bhagoji, A.N.; He, W.; Li, B.; Song, D. Practical black-box attacks on deep neural networks using efficient query mechanisms. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2018; pp. 158–174.
- Ilyas, A.; Engstrom, L.; Athalye, A.; Lin, J. Black-box Adversarial Attacks with Limited Queries and Information. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Dy, J., Krause, A., Eds.; PMLR, Stockholmsmässan: Stockholm, Sweden, 2018; Volume 80, pp. 2137–2146.
- Zeng, D.; Liu, K.; Lai, S.; Zhou, G.; Zhao, J. Relation Classification via Convolutional Deep Neural Network. In Proceedings of the COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, 23–29 August 2014; Dublin City University and Association for Computational Linguistics: Dublin, Ireland, 2014; pp. 2335–2344.
- Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, 15–20 July 2016; Association for Computational Linguistics: Berlin, Germany, 2016; pp. 207–212.
- Wu, S.; He, Y. Enriching pre-trained language model with entity information for relation classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 September 2019; pp. 2361–2364.
- Zhang, Y.; Qi, P.; Manning, C.D. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 2205–2215.
- Joshi, M.; Chen, D.; Liu, Y.; Weld, D.S.; Zettlemoyer, L.; Levy, O. Spanbert: Improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Linguist. 2020, 8, 64–77.
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.J.; Fergus, R. Intriguing properties of neural networks. arXiv 2014, arXiv:1312.6199.
- Liang, B.; Li, H.; Su, M.; Bian, P.; Li, X.; Shi, W. Deep Text Classification Can be Fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, International Joint Conferences on Artificial Intelligence Organization, Stockholm, Sweden, 13–19 July 2018; pp. 4208–4215.
- Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. Deepfool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2574–2582.
- Nguyen, A.; Yosinski, J.; Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 427–436.
- Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z.B.; Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, Abu Dhabi, United Arab Emirates, 2–6 April 2017; pp. 506–519.
- Zhang, X.; Zhao, J.; LeCun, Y. Character-level convolutional networks for text classification. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, USA, 7–12 December 2015; pp. 649–657.
- Pasi, G.; Piwowarski, B.; Azzopardi, L.; Hanbury, A. (Eds.) Advances in Information Retrieval—40th European Conference on IR Research, ECIR 2018, Grenoble, France, 26–29 March 2018; Lecture Notes in Computer Science, Volume 10772; Springer: Berlin/Heidelberg, Germany, 2018.
- Ebrahimi, J.; Rao, A.; Lowd, D.; Dou, D. HotFlip: White-Box Adversarial Examples for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, 15–20 July 2018; Association for Computational Linguistics: Melbourne, Australia, 2018; pp. 31–36.
- Belinkov, Y.; Bisk, Y. Synthetic and Natural Noise Both Break Neural Machine Translation. arXiv 2017, arXiv:1711.02173.
- Ebrahimi, J.; Lowd, D.; Dou, D. On Adversarial Examples for Character-Level Neural Machine Translation. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–25 August 2018; Association for Computational Linguistics: Santa Fe, NM, USA, 2018; pp. 653–663.
- Li, Y.; Cohn, T.; Baldwin, T. Robust Training under Linguistic Adversity. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Valencia, Spain, 3–7 April 2017; Association for Computational Linguistics: Valencia, Spain, 2017; pp. 21–27.
- Xie, Z.; Wang, S.I.; Li, J.; Lévy, D.; Nie, A.; Jurafsky, D.; Ng, A.Y. Data Noising as Smoothing in Neural Network Language Models. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017.
- Iyyer, M.; Manjunatha, V.; Boyd-Graber, J.; Daumé III, H. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; Association for Computational Linguistics: Beijing, China, 2015; pp. 1681–1691.
- Mahler, T.; Cheung, W.; Elsner, M.; King, D.; de Marneffe, M.C.; Shain, C.; Stevens-Guille, S.; White, M. Breaking NLP: Using Morphosyntax, Semantics, Pragmatics and World Knowledge to Fool Sentiment Analysis Systems. In Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems, Copenhagen, Denmark, 8 September 2017; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 33–39.
- Staliūnaitė, I.; Bonfil, B. Breaking sentiment analysis of movie reviews. In Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems, Copenhagen, Denmark, 8 September 2017; pp. 61–64.
- Burlot, F.; Yvon, F. Evaluating the morphological competence of Machine Translation Systems. In Proceedings of the Second Conference on Machine Translation, Copenhagen, Denmark, 7–8 September 2017; pp. 43–55.
- Isabelle, P.; Cherry, C.; Foster, G. A Challenge Set Approach to Evaluating Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 2486–2496.
- Levesque, H.J. On Our Best Behaviour. Artif. Intell. 2014, 212, 27–35.
- Naik, A.; Ravichander, A.; Sadeh, N.; Rose, C.; Neubig, G. Stress Test Evaluation for Natural Language Inference. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–25 August 2018; Association for Computational Linguistics: Santa Fe, NM, USA, 2018; pp. 2340–2353.
- Xiang, T.; Liu, H.; Guo, S.; Zhang, T.; Liao, X. Local Black-box Adversarial Attacks: A Query Efficient Approach. arXiv 2021, arXiv:2101.01032.
- Ilyas, A.; Engstrom, L.; Athalye, A.; Lin, J. Query-Efficient Black-box Adversarial Examples. arXiv 2017, arXiv:1712.07113.
- Cheng, M.; Singh, S.; Chen, P.H.; Chen, P.Y.; Liu, S.; Hsieh, C.J. Sign-OPT: A Query-Efficient Hard-label Adversarial Attack. arXiv 2019, arXiv:1909.10773.
- Shen, Y.; Huang, X. Attention-Based Convolutional Neural Network for Semantic Relation Extraction. In Proceedings of the COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; The COLING 2016 Organizing Committee: Osaka, Japan, 2016; pp. 2526–2536.
- Li, J.; Ji, S.; Du, T.; Li, B.; Wang, T. TextBugger: Generating Adversarial Text Against Real-world Applications. In Proceedings of the 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, CA, USA, 24–27 February 2019.
- Jin, D.; Jin, Z.; Zhou, J.T.; Szolovits, P. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, 7–12 February 2020; pp. 8018–8025.
- Mrkšić, N.; Séaghdha, D.; Thomson, B.; Gašić, M.; Rojas-Barahona, L.; Su, P.H.; Vandyke, D.; Wen, T.H.; Young, S. Counter-fitting Word Vectors to Linguistic Constraints. arXiv 2016, arXiv:1603.00892.
- Hill, F.; Reichart, R.; Korhonen, A. SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation. Comput. Linguist. 2015, 41, 665–695.
- Cer, D.; Yang, Y.; Kong, S.Y.; Hua, N.; Limtiaco, N.; St. John, R.; Constant, N.; Guajardo-Cespedes, M.; Yuan, S.; Tar, C.; et al. Universal Sentence Encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium, 31 October–4 November 2018; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 169–174.
- Zang, Y.; Qi, F.; Yang, C.; Liu, Z.; Zhang, M.; Liu, Q.; Sun, M. Word-level Textual Adversarial Attacking as Combinatorial Optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 5–10 July 2020; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R., Eds.; Association for Computational Linguistics: Online, 2020; pp. 6066–6080.
- Ren, S.; Deng, Y.; He, K.; Che, W. Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Association for Computational Linguistics: Florence, Italy, 2019; pp. 1085–1097.
- Zhang, W.E.; Sheng, Q.Z.; Alhazmi, A.; Li, C. Adversarial Attacks on Deep-learning Models in Natural Language Processing: A Survey. ACM Trans. Intell. Syst. Technol. 2020, 11, 1–41.
- Alshemali, B.; Kalita, J. Improving the reliability of deep neural networks in NLP: A review. Knowl.-Based Syst. 2020, 191, 105210.
Task | Dataset | Train | Dev | Test | Classes | Avg Length |
---|---|---|---|---|---|---|
Relation extraction | SemEval-2010 Task 8 | 8000 | 0 | 2717 | 9 | 19.1 |
Relation extraction | TACRED | 68,124 | 22,631 | 15,509 | 42 | 36.2 |
SemEval-2010 Task 8 | CNN | Attention Bi-LSTM | R-Bert |
---|---|---|---|
Original accuracy | 82.7% | 84.0% | 89.25% |
TACRED | PA-LSTM | C-GCN+PA-LSTM | SpanBert |
Original accuracy | 65.1% | 68.2% | 70.8% |
Dataset | SemEval-2010 Task 8 | TACRED | ||||
---|---|---|---|---|---|---|
Model | CNN | Att- Bi-LSTM | R-Bert | PA-LSTM | C-GCN+PA-LSTM | SpanBert |
Original | 81.2% | 83.4% | 87.7% | 62.6% | 64.5% | 69.1% |
TFIDF+QB-CLA | 14.6% | 17.1% | 22.5% | 12.9% | 13.1% | 18.4% |
TFIDF+QB-WLA | 15.2% | 17.4% | 20.6% | 14.1% | 18.7% | 23.1% |
Semantic similarity (Avg) | 0.81 | 0.70 | 0.67 | 0.86 | 0.81 | 0.76 |
Perturbed words (Avg) | 9.3% | 11.1% | 14.2% | 13.7% | 14.9% | 18.2% |
Avg length | 19.1 | 36.2 |
Attack Types | Victim Models | Invoke # (WIm) | Avg Time (s) | Attack Success |
---|---|---|---|---|
TFIDF-CLA | CNN | 0 | 2.51 | 83.4% |
 | Att-Bi-LSTM | 0 | 2.36 | 74.8% |
 | R-Bert | 0 | 2.74 | 77.2% |
TFIDF-WLA | CNN | 0 | 2.42 | 76.4% |
 | Att-Bi-LSTM | 0 | 2.21 | 71.5% |
 | R-Bert | 0 | 2.66 | 73.9% |
QB-CLA | CNN | 1215.4 | 24.71 | 85.4% |
 | Att-Bi-LSTM | 1299.6 | 25.77 | 83.3% |
 | R-Bert | 1486.7 | 27.2 | 81.2% |
QB-WLA | CNN | 1215.4 | 22.22 | 87.1% |
 | Att-Bi-LSTM | 1299.6 | 24.42 | 83.8% |
 | R-Bert | 1486.7 | 28.92 | 81.4% |
(TFIDF+QB)-CLA | CNN | 376 | 6.54 | 91.5% |
 | Att-Bi-LSTM | 386 | 6.77 | 93.6% |
 | R-Bert | 395 | 11.52 | 92.7% |
(TFIDF+QB)-WLA | CNN | 376 | 6.66 | 94.3% |
 | Att-Bi-LSTM | 386 | 7.21 | 91.2% |
 | R-Bert | 395 | 7.67 | 90.8% |
Attack Types | Victim Models | Invoke # (WIm) | Avg Time (s) | Attack Success |
---|---|---|---|---|
TFIDF-CLA | PA-LSTM | 0 | 5.82 | 80.7% |
 | C-GCN+PA-LSTM | 0 | 6.10 | 79.8% |
 | Span-Bert | 0 | 5.31 | 77.3% |
TFIDF-WLA | PA-LSTM | 0 | 6.7 | 79.1% |
 | C-GCN+PA-LSTM | 0 | 6.42 | 77.4% |
 | Span-Bert | 0 | 5.55 | 76.3% |
QB-CLA | PA-LSTM | 2571.2 | 45.74 | 88.4% |
 | C-GCN+PA-LSTM | 2719.8 | 47.46 | 86.3% |
 | Span-Bert | 3018.9 | 47.97 | 82.2% |
QB-WLA | PA-LSTM | 2571.2 | 43.12 | 85.2% |
 | C-GCN+PA-LSTM | 2719.8 | 45.23 | 83.6% |
 | Span-Bert | 3018.9 | 48.85 | 82.7% |
(TFIDF+QB)-CLA | PA-LSTM | 792 | 11.51 | 97.4% |
 | C-GCN+PA-LSTM | 778 | 13.72 | 96.2% |
 | Span-Bert | 802 | 15.54 | 95.5% |
(TFIDF+QB)-WLA | PA-LSTM | 792 | 13.91 | 95.3% |
 | C-GCN+PA-LSTM | 778 | 15.23 | 92.4% |
 | Span-Bert | 802 | 16.82 | 92.1% |
Dataset | Attack | PA-LSTM (Orig. 62.6%) | | | C-GCN+PA-LSTM (Orig. 64.5%) | | | Span-BERT (Orig. 69.1%) | | |
---|---|---|---|---|---|---|---|---|---|---|
 | | Acc.% | Pert.% | Invoke # W(Im) | Acc.% | Pert.% | Invoke # W(Im) | Acc.% | Pert.% | Invoke # W(Im) |
TACRED | PSO | 14.9 | 15.8 | 2588 | 15.7 | 14.8 | 2774 | 18.2 | 18.9 | 2917 |
 | TF | 15.3 | 16.1 | 2577 | 15.4 | 14.4 | 2766 | 19.7 | 18.1 | 2905 |
 | PWWS | 15.7 | 17.2 | 2601 | 16.1 | 15.2 | 2892 | 22.8 | 20.4 | 3024 |
 | Ours | 14.1 | 13.7 | 792 | 13.1 | 14.9 | 778 | 18.4 | 18.2 | 802 |

Dataset | Attack | CNN (Orig. 81.2%) | | | Att-Bi-LSTM (Orig. 83.4%) | | | R-BERT (Orig. 87.7%) | | |
---|---|---|---|---|---|---|---|---|---|---|
 | | Acc.% | Pert.% | Invoke # W(Im) | Acc.% | Pert.% | Invoke # W(Im) | Acc.% | Pert.% | Invoke # W(Im) |
SemEval-2010 Task 8 | PSO | 15.7 | 8.9 | 1201 | 17.7 | 11.4 | 1229 | 21.1 | 12.4 | 1349 |
 | TF | 16.6 | 10.2 | 1193 | 18.7 | 11.7 | 1211 | 19.2 | 13.2 | 1376 |
 | PWWS | 19.9 | 13.4 | 1225 | 21.8 | 14.1 | 1272 | 26.2 | 14.5 | 1462 |
 | Ours | 15.2 | 9.3 | 376 | 17.4 | 11.1 | 386 | 20.6 | 14.2 | 395 |
SemEval-2010 Task 8, Adversarial Sentence Examples | |
---|---|
Original sentence (Relation: cause–effect, 92.3%) | The <e1>airstrike</e1> also resulted in several secondary <e2>explosions</e2>, leading Marines at the site to suspect that the house may have contained homemade bombs. |
Adversarial sentence, CLA (Relation: other, 73.4%) | The <e1>airstrike</e1> also res ulted in several secondary <e2>explosions</e2>, leading Marines at the site to suspect that the house may have contained homemade b0mbs. |
Original sentence (Relation: entity–origin, 87.4%) | This <e1>paper</e1> is constructed from a portion of a <e2>thesis</e2> presented by Edward W. Shand, June, 1930, for the degree of Doctor of Philosophy at New York University. |
Adversarial sentence, WLA (Relation: product–producer, 82.1%) | This <e1>paper</e1> is manufactured against a portion of a <e2>thesis</e2> presented by Edward W. Shand, June, 1930, for the degree of Doctor of Philosophy at New York University. |
TACRED, Adversarial Sentence Examples | |
Original sentence (Relation: per:spouse, 92.4%) | In a second statement read to the inquest jury, Jupp's wife Pat said her husband appeared to have realized instantly his injuries would likely be fatal, asking a colleague to call her and tell her he loved her. (Subj: Jupp, Obj: Pat) |
Adversarial sentence, WLA (Relation: per:other_family, 84.2%) | In a second statement read to the inquest jury, Jupp's bride Pat said her appeared hubby to have realized instantly his injuries would likely be fatal, asking a colleague to call her and tell her he loved her. (Subj: Jupp, Obj: Pat) |
Original sentence (Relation: per:schools_attended, 88.9%) | He attended Princeton University and then the University of California, where he received a Ph.D. in 1987 and was promptly hired as a professor. (Subj: He, Obj: University of California) |
Adversarial sentence, CLA (Relation: per:other_family, 77.9%) | He atteNded Princeton University and then the University of California, where he recieved a Ph.D. in 1987 and was promptly hired as a proffessor. (Subj: He, Obj: University of California) |
Dataset | Model | Examples | Model Accuracy | Human Accuracy | Score [1–5] |
---|---|---|---|---|---|
SemEval-2010 Task 8 | CNN | Original | 98.1% | 94.5% | 1.57 |
 | | Avg adversarial | 14.9% | 92.3% | 2.12 |
 | Bi-LSTM | Original | 88.6% | 90.1% | 1.80 |
 | | Avg adversarial | 17.25% | 88.4% | 2.0 |
 | R-Bert | Original | 95.2% | 98.1% | 1.71 |
 | | Avg adversarial | 21.55% | 75.3% | 2.05 |
TACRED | PA-LSTM | Original | 82.1% | 90.6% | 1.42 |
 | | Avg adversarial | 13.5% | 89.2% | 2.09 |
 | C-GCN+PA-LSTM | Original | 91.3% | 95.3% | 1.89 |
 | | Avg adversarial | 15.9% | 90.1% | 2.22 |
 | Span-Bert | Original | 96.6% | 94.6% | 1.96 |
 | | Avg adversarial | 20.7% | 85.1% | 2.84 |
 | | CNN | Att Bi-LSTM | R-BERT |
---|---|---|---|---|
SemEval-2010 Task 8 | CNN | 97.4% | 66.7% | 32.9% |
 | Att Bi-LSTM | 71.8% | 96.4% | 31.5% |
 | R-BERT | 78.1% | 69.4% | 94.5% |
 | | PA-LSTM | C-GCN+PA-LSTM | SpanBERT |
TACRED | PA-LSTM | 99.2% | 76.9% | 52.3% |
 | C-GCN+PA-LSTM | 91.4% | 97.2% | 58.6% |
 | SpanBert | 81.4% | 79.5% | 90.2% |
 | Attack Success Rate | | |
---|---|---|---|
SemEval-2010 Task 8 | CNN | Att Bi-LSTM | R-Bert |
 | 26.4% | 24.3% | 18.9% |
TACRED | PA-LSTM | C-GCN+PA-LSTM | SpanBert |
 | 39.4% | 38.2% | 32.7% |
Dataset | SemEval-2010 Task 8 | TACRED | ||||
---|---|---|---|---|---|---|
Model | CNN | Att- Bi-LSTM | R-Bert | PA-LSTM | C-GCN+PA-LSTM | SpanBert |
TFIDF+QB-CLA (original) | 14.6% | 17.1% | 22.5% | 12.9% | 13.1% | 18.4% |
+Adv-training | 29.4% | 26.1% | 32.6% | 27.3% | 32.7% | 35.3% |
TFIDF+QB-WLA (original) | 15.2% | 17.4% | 20.6% | 14.1% | 18.7% | 23.1% |
+Adv-training | 33.3% | 35.2% | 38.8% | 27.6% | 32.3% | 36.5% |
Perturbed words (Avg) | 9.3% | 11.1% | 14.2% | 13.7% | 15.1% | 18.2% |
Perturbed words after Adv-training (Avg) | 12.3% | 15.1% | 17.2% | 17.6% | 19.3% | 22.1% |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Haq, I.U.; Khan, Z.Y.; Ahmad, A.; Hayat, B.; Khan, A.; Lee, Y.-E.; Kim, K.-I. Evaluating and Enhancing the Robustness of Sustainable Neural Relationship Classifiers Using Query-Efficient Black-Box Adversarial Attacks. Sustainability 2021, 13, 5892. https://doi.org/10.3390/su13115892