CRank: Reusable Word Importance Ranking for Text Adversarial Attack

Abstract: Deep learning models have been widely used in natural language processing tasks, yet researchers have recently proposed several methods to fool state-of-the-art neural network models. Among these methods, word importance ranking is an essential step in generating text adversarial examples, but it suffers from low efficiency in practical attacks. To address this issue, we aim to improve the efficiency of word importance ranking, making steps towards realistic text adversarial attacks. In this paper, we propose CRank, a black box method built on our masking and ranking strategy. CRank improves efficiency by 75% at the 'cost' of only a 1% drop in the success rate when compared to the classic method. Moreover, we explore a new greedy search strategy and Unicode perturbation methods.


Introduction
Despite the impressive success of deep neural networks (DNNs), researchers have found their vulnerability. Exploiting such vulnerabilities, also known as adversarial attacks, aims to generate adversarial examples by adding imperceptible perturbations to normal samples. Such generated adversarial examples bring no misunderstandings to humans, while fooling the neural network models to make wrong predictions. Representative works are first proposed in computer vision (CV) [1][2][3]. In the context of NLP, the adversarial attack is more challenging due to the discrete nature of the text, as editing a single word might change the entire meaning of a text while changing limited pixels in an image is not obvious for humans.
Recently, several studies have demonstrated different text adversarial attacks against deep neural networks in a variety of natural language processing (NLP) tasks [4][5][6][7]. Among these studies, there is a trend toward black box attacks that are agnostic to the target model apart from its input and output. Such attacks query the target model with continuously improving examples until they find a successful adversarial example. They first rank words to find those that have a big impact on the target model. These ranking methods, referred to as word importance ranking (WIR) [8], generally rank a word by deleting it or replacing it with a certain string, then querying the target model with the modified sentence for its score. However, present WIR methods need hundreds of queries to generate one successful adversarial example, which makes them impractical for attacking real-world applications. Such inefficiency raises two questions: can we find an alternative solution that greatly improves efficiency? If so, what side effects does that solution have?
With these research questions in mind, we review classic WIR methods in representative works [7][8][9][10][11] and identify a critical defect: classic methods consume multiple queries for the same word if the word shows up in different sentences. Thus, we are motivated to create a new method that needs only one query per word.
In this paper, we propose a reusable and efficient black box WIR method, CRank. CRank uses a special strategy that needs only one query to score a word, even when the word exists in many sentences. This strategy masks every word except the target word, whereas classic methods mask only the target word.
Our main contributions are summarized as follows:
• We introduce a three-step workflow for the text adversarial attack: word importance ranking, search, and perturbation. Our workflow is clearer than classic ones, which emphasize a two-step attack of search and perturbation.
• We present CRank and compare it with the classic method on Word-CNN and Word-LSTM over three different datasets. Experimental results reveal that CRank reduces queries by 75% while achieving a similar success rate that is only 1% lower.
• We explore other improvements to the text adversarial attack, including the greedy search strategy and Unicode perturbation methods.
The rest of the paper is organized as follows. The literature review is presented in Section 2 followed by preliminaries used in this research. The proposed approach and experiment are in Sections 4 and 5. Section 6 discusses the limitations and considerations of the approach. Finally, Section 7 draws conclusions and outlines future work.
Present black box methods rely on queries to the target model and make continuous improvements to generate successful adversarial examples. Gao et al. [7] present DeepWordBug, an effective attack with a two-step pattern: searching for important words and perturbing them with certain strategies. They rank each word of the original example by querying the model with the sentence from which the word is deleted, then use character-level strategies to perturb the top-ranked words to generate adversarial examples. TextBugger [9] follows the same pattern, but explores a word-level perturbation strategy using the nearest synonyms in GloVe [30]. Later synonym-based studies [4,8,25,27,31] focus on choosing proper synonyms for substitution that do not cause misunderstandings for humans. Although these methods exhibit excellent performance in certain metrics (a high success rate with limited perturbations), their efficiency is rarely discussed. Our investigation finds that state-of-the-art methods need hundreds of queries to generate only one successful adversarial example. For example, BERT-Attack [11] uses over 400 queries for a single attack. Such inefficiency is caused by the classic WIR method, which generally ranks a word by replacing it with a certain mask and scores the word by querying the target model with the altered sentence. The method is still used in many state-of-the-art black box attacks, although different attacks may use different masks. For example, DeepWordBug [7] and TextFooler [8] use an empty mask, which is equal to deleting the word, while BERT-Attack [11] and BAE [25] use an unknown word, such as '(unk)', as the mask. However, the classic WIR method has an efficiency problem: it spends duplicate queries on the same word whenever the word appears in different sentences.
Besides the work in CV and NLP, there is a growing body of research on adversarial attacks in cyber security domains, including malware detection [32][33][34], intrusion detection [35,36], etc. Such facts suggest that the vulnerability of neural network models is widespread. Meanwhile, the amount of defensive research [37][38][39][40][41] against adversarial attacks is also increasing. In the future, attack and defense methods for adversarial examples will advance together.

Preliminaries
This section provides several preliminaries that are used in the rest of the paper, including our research domain, notations, and other necessary knowledge.

Text Classification
Text classification is a major task in NLP, with many applications, such as sentiment analysis, topic labeling, toxicity detection, and so on. Currently, neural network models including convolutional neural networks (CNN), the long short-term memory (LSTM) network, and BERT [42] are widely used on many text classification datasets. Among these datasets, SST-2 (https://nlp.stanford.edu/sentiment/, accessed on 1 May 2021), AG News (http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html, accessed on 1 May 2021), and IMDB (http://ai.stanford.edu/~amaas/data/sentiment/, accessed on 1 May 2021) are the best-known datasets for various benchmarks. AG News is a sentence-level multiclass classification dataset with four news topics: world, sports, business, and science/technology. IMDB and SST-2 are both binary sentiment classification datasets. IMDB is a document-level movie review dataset with long paragraphs and SST-2 is a sentence-level phrase dataset. Three examples of these datasets are demonstrated in Table 1.

Dataset | Example | Label
SST-2 | The most hopelessly monotonous film of the year, noteworthy only for the gimmick of being filmed as a single unbroken 87-min take. | Negative
AG News | European spacecraft prepares to orbit Moon; Europe's first lunar spacecraft is set to go into orbit around the Moon on Monday. SMART-1 has already reached the gateway to the Moon, the region where its gravity starts to dominate that of the Earth. | Sci/tech
IMDB | The last good Ernest movie, and the best at that. How can you not laugh at least once during this movie? The last line is a classic and showcases Ernest's gangster impressions - his best moment on film. This movie has his best lines, and it is a crowning achievement among the brainless screwball comedies. | Positive

Threat Model
We study text adversarial examples against text classification under the black box setting, meaning that the attacker is not aware of the model architecture, parameters, or training data, but capable of querying the output of the target model with supplied inputs. The output includes the predictions and their confidence scores. Our method is interactive, which means it needs to repeatedly query the target model with improved inputs to generate satisfying adversarial examples. We perform the non-targeted attack, considering any adversarial example that causes successful misclassification.

Formulation
We use $X$ to represent the original sentence and $Y$ its corresponding label. Sentence $X$ is composed of $N$ words $\{W_1, W_2, \ldots, W_N\}$. When we perturb the $k$th word $W_k$, it becomes $W'_k$ and the new sentence is $X'$. We use $F: X \Rightarrow Y$ to represent the prediction of the model, and $\mathrm{Conf}(X)$ to represent the confidence of $X$ with its original label. Adversarial examples should satisfy the following equation:
$$F(X') \neq Y \qquad (1)$$
Under binary classification tasks, Equation (1) can be presented with confidence scores, as Equation (2) demonstrates:
$$\mathrm{Conf}(X') < 0.5 \qquad (2)$$
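The success condition can be made concrete with a small sketch. It assumes a black box interface that maps a list of words to per-label confidence scores; the names `predict`, `conf`, `is_adversarial`, and `toy_model` are hypothetical and introduced here for illustration only, not taken from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical black box interface: words in, per-label confidences out.
# This is the only information the attacker is assumed to see.
Model = Callable[[List[str]], Dict[str, float]]


def predict(model: Model, words: List[str]) -> str:
    """F(X): the label with the highest confidence."""
    scores = model(words)
    return max(scores, key=scores.get)


def conf(model: Model, words: List[str], label: str) -> float:
    """Conf(X): confidence the model assigns to the original label."""
    return model(words)[label]


def is_adversarial(model: Model, perturbed: List[str], label: str) -> bool:
    """Non-targeted success: the prediction differs from the original label."""
    return predict(model, perturbed) != label


def toy_model(words: List[str]) -> Dict[str, float]:
    """Toy binary sentiment model, invented for this example only."""
    negative = 0.9 if "monotonous" in words else 0.2
    return {"negative": negative, "positive": 1.0 - negative}
```

With two labels, `is_adversarial` returning true coincides with the confidence of the original label falling below 0.5, which is the binary form of the condition.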

Unicode
Traditionally, text data are stored in ASCII, a 7-bit standard with 128 characters, of which only 95 are printable, including digits, letters, and punctuation. Unicode is a much richer standard for text, which can represent symbols, emojis, different languages, and so on. The most commonly used Unicode encoding, UTF-8, uses one to four bytes per character and can represent every Unicode code point (more than one million). Nowadays, most websites and social media platforms support Unicode, and text data are collected from them for further processing by neural network models. Such facts suggest that it is feasible to fool text classifiers with Unicode characters. In this paper, we utilize them for an effective adversarial attack.
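As a small illustration of the variable-length encoding, the following sketch uses only the Python standard library to report how many bytes UTF-8 needs for a few characters. The sample characters are chosen for this example and do not come from the paper.

```python
import unicodedata


def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# Sample characters picked for illustration: ASCII letter, accented letter,
# currency sign, emoji, and the invisible zero width space.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600", "\u200b"]

for ch in samples:
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{ord(ch):04X} {name}: {utf8_len(ch)} byte(s)")
```

The last sample, U+200B, occupies three bytes in UTF-8 yet renders as nothing, which is what makes it attractive for perturbation.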

Proposed Approach
As Figure 1 demonstrates, we elaborate our approach in three parts: word importance ranking, search strategies, and perturbation methods. Given a sentence of $N$ words, $X = \{W_1, W_2, \ldots, W_N\}$, word importance ranking aims to sort these words according to their 'importance' to the target model. After we obtain the ranked word indexes $\{R_1, R_2, \ldots, R_N\}$, we use search strategies to search until a suitable sequence of words to perturb is found. When we find the sequence, we perturb those words and attempt to generate a successful adversarial example $X_{adv}$.

Word Importance Ranking
In this section, we propose three ranking methods: Classic, CRank, and CRankPlus. Classic is a generic method that is commonly adopted in recent black box studies [7,8,11,25]. CRank is our new method, aimed at improving efficiency, and CRankPlus is its improved version with dynamic adjustment.

Classic
The generic word importance ranking method masks each word $W_i$ with $W_{mask}$ to generate the example $X_i$, as Equation (3) demonstrates, then uses Equation (4) to calculate its score. We use an unknown word, such as '(mask)', as $W_{mask}$ in our approach, and demonstrate an example of classic word importance ranking in Table 2.

CRank
Instead of masking the target word, we rank the word in a 'reverse' way by masking the other words in the sentence, as Equation (5) shows, then use Equation (6) to calculate its score. To make CRank reusable, we set the masks to a fixed length of 6, as our investigation finds that longer sequences of masks do not affect the score, while shorter ones do (this choice is supported by our experiment in Section 5.3). As Table 3 demonstrates, we propose four types of CRank according to the position of the target word; these four types are evaluated in Section 5. Intuitively, CRank(Middle) should perform best, as it simulates the most common case, in which the target word is in the middle of the sentence. Among the four, CRank(Single) is a special type that has no masks.

CRankPlus
As the core concept of CRank involves reusing the scores of words, we also take the results of generating adversarial examples into account. If a word contributes to generating a successful adversarial example, we increase its score. Otherwise, we decrease it. Let the score of a word $W$ be $S$, the new score be $S'$, and the weight be $\alpha$. Equation (7) shows our method; we normally set $\alpha$ below 0.05 to avoid a sharp rise or drop of the score.
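The difference between the ranking methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scorer interface `conf(words, label)`, the confidence-drop form of the classic score, the use of the raw confidence as the CRank score, the mask layouts of the head and tail variants, the per-label cache key, and the multiplicative CRankPlus update are all assumptions made for this example, and `toy_conf` is an invented toy model.

```python
from typing import Callable, Dict, List, Tuple

MASK = "(mask)"   # mask token named in the text
MASK_LEN = 6      # fixed mask length named in the text

# Hypothetical black box scorer: (words, label) -> confidence of that label.
ConfFn = Callable[[List[str], str], float]


def classic_rank(conf: ConfFn, words: List[str], label: str) -> List[float]:
    """Classic WIR: mask each word in turn, one query per word per sentence.

    The score is taken as the confidence drop (an assumed form).
    """
    base = conf(words, label)
    return [base - conf(words[:i] + [MASK] + words[i + 1:], label)
            for i in range(len(words))]


def crank_query(word: str, variant: str = "middle") -> List[str]:
    """Sentence-independent query: the target word with everything else masked."""
    pad = [MASK] * MASK_LEN
    if variant == "head":
        return [word] + pad
    if variant == "tail":
        return pad + [word]
    if variant == "single":
        return [word]
    return pad + [word] + pad  # "middle": masks on both sides


class CRankScorer:
    """Caches one score per (word, label) so later sentences reuse it."""

    def __init__(self, conf: ConfFn, variant: str = "middle") -> None:
        self.conf = conf
        self.variant = variant
        self.cache: Dict[Tuple[str, str], float] = {}
        self.queries = 0

    def score(self, word: str, label: str) -> float:
        key = (word, label)
        if key not in self.cache:
            self.queries += 1  # only unseen words cost a query
            self.cache[key] = self.conf(crank_query(word, self.variant), label)
        return self.cache[key]

    def rank(self, words: List[str], label: str) -> List[float]:
        return [self.score(w, label) for w in words]

    def update(self, word: str, label: str, success: bool,
               alpha: float = 0.02) -> None:
        """CRankPlus-style adjustment; the multiplicative form is assumed."""
        factor = 1 + alpha if success else 1 - alpha
        self.cache[(word, label)] = self.score(word, label) * factor


# Toy additive sentiment model, invented for this example only.
TOY_WEIGHTS = {"monotonous": 0.3, "hopelessly": 0.1, "great": -0.2}


def toy_conf(words: List[str], label: str) -> float:
    total = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    negative = min(1.0, max(0.0, 0.5 + total))
    return negative if label == "negative" else 1.0 - negative
```

Ranking a second sentence that shares words with the first costs queries only for the new words, which is where the reuse comes from; the classic ranking pays for every word of every sentence.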

Search Strategies
Search strategies mainly search through the ranked words and find a sequence of words that can generate a successful adversarial example. Two strategies are introduced in this section.

TopK
The TopK search strategy is widely used in well-known black box methods [7,8]. This strategy starts with the top word $W_{R_1}$, which has the highest score, and proceeds down the ranking one word at a time. As Equation (8) demonstrates, when processing a word $W_{R_i}$, we query the new sentence $X_i$ for its confidence. If the confidence satisfies Equation (9), we consider that the word contributes toward generating an adversarial example and keep it masked; otherwise, we ignore the word. TopK continues until it masks the maximum number of allowed words or finds a successful adversarial example that satisfies Equation (1).
However, using the TopK search strategy breaks the connection between words. As Tables 2 and 4 demonstrate, when we delete the two words with the highest scores, 'year' and 'taxes', the confidence is only 0.62. In contrast, 'ex-wife' has the lowest score of 0.08, but it helps to generate an effective adversarial example when deleted together with 'taxes'.
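The TopK loop described above can be sketched as follows. This is a hedged sketch rather than the paper's code: it assumes the keep condition means "the confidence of the original label decreased", uses the binary success threshold of 0.5, and relies on a hypothetical scorer `conf(words, label)` and an invented `toy_conf` model.

```python
from typing import Callable, List, Tuple

MASK = "(mask)"
ConfFn = Callable[[List[str], str], float]  # hypothetical scorer interface


def topk_search(conf: ConfFn, words: List[str], ranking: List[int],
                label: str, max_masked: int = 3
                ) -> Tuple[List[str], List[int], bool]:
    """Walk the ranking once, keeping a mask only if confidence drops."""
    current = list(words)
    best = conf(current, label)
    masked: List[int] = []
    for idx in ranking:
        if len(masked) >= max_masked:
            break
        trial = current[:idx] + [MASK] + current[idx + 1:]
        c = conf(trial, label)
        if c < best:              # assumed reading of the keep condition
            current, best = trial, c
            masked.append(idx)
            if best < 0.5:        # binary-task success criterion
                return current, masked, True
    return current, masked, False


# Toy additive sentiment model, invented for this example only.
TOY_WEIGHTS = {"monotonous": 0.3, "hopelessly": 0.1, "great": -0.2}


def toy_conf(words: List[str], label: str) -> float:
    total = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    negative = min(1.0, max(0.0, 0.5 + total))
    return negative if label == "negative" else 1.0 - negative
```

Because the ranking is fixed before the loop starts, each word is judged in isolation against the running sentence, which is why word pairs that only matter together can be missed.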

Greedy
To avoid the disadvantage of TopK while maintaining an acceptable level of efficiency, we propose the greedy strategy. This strategy always masks the top-ranked word $W_{R_1}$, as Equation (10) demonstrates, then uses word importance ranking to rank the unmasked words again. It continues until it succeeds or reaches the maximum number of words allowed to be masked. However, the strategy only works with Classic WIR, not CRank.
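A minimal sketch of the greedy strategy follows, assuming the classic confidence-drop score for the re-ranking step and the binary success threshold of 0.5. The scorer interface `conf(words, label)` and `toy_conf` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List, Tuple

MASK = "(mask)"
ConfFn = Callable[[List[str], str], float]  # hypothetical scorer interface


def greedy_search(conf: ConfFn, words: List[str], label: str,
                  max_masked: int = 3
                  ) -> Tuple[List[str], List[int], bool]:
    """Mask the top-ranked word, then re-rank the remaining words."""
    current = list(words)
    masked: List[int] = []
    while len(masked) < max_masked:
        base = conf(current, label)
        # Re-rank every still-unmasked word against the current sentence.
        drops: Dict[int, float] = {}
        for i in range(len(current)):
            if i not in masked:
                trial = current[:i] + [MASK] + current[i + 1:]
                drops[i] = base - conf(trial, label)
        if not drops:
            break                 # nothing left to mask
        top = max(drops, key=drops.get)
        current[top] = MASK
        masked.append(top)
        if conf(current, label) < 0.5:   # binary-task success criterion
            return current, masked, True
    return current, masked, False


# Toy additive sentiment model, invented for this example only.
TOY_WEIGHTS = {"monotonous": 0.3, "hopelessly": 0.1, "great": -0.2}


def toy_conf(words: List[str], label: str) -> float:
    total = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    negative = min(1.0, max(0.0, 0.5 + total))
    return negative if label == "negative" else 1.0 - negative
```

Each round re-queries every unmasked word, so the number of queries grows roughly with the square of the sentence length. The re-ranking also depends on the current sentence, which fits the statement that the strategy pairs with Classic WIR rather than with sentence-independent CRank scores.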

Perturbation Methods
The major task of perturbation methods is to move the target word away from its original position in the word vector space of the target model, thus causing wrong predictions. Li et al. [9] make a comprehensive summary of five perturbation methods: (1) insert a space or character into the word; (2) delete a letter; (3) swap adjacent letters; (4) Sub-C, or replace a character with another one; (5) Sub-W, or replace the word with a synonym. The first four are character-level strategies and the fifth is a word-level strategy. In addition, we introduce two new methods utilizing Unicode characters, as Table 5 demonstrates. Sub-U randomly substitutes a character with a Unicode character that has a similar shape or meaning. Insert-U inserts a special Unicode character, 'ZERO WIDTH SPACE', which is technically invisible in most text editors and on printed paper, into the target word. Our methods have the same effectiveness as other character-level methods, which turn the target word into a word unknown to the target model. We do not discuss word-level methods, as perturbation is not the focus of this paper.
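The two Unicode perturbations can be sketched in a few lines. The look-alike table below (Latin letters mapped to visually similar Cyrillic letters) is an assumption chosen for illustration and is not the mapping in Table 5; the function names `sub_u` and `insert_u` are likewise hypothetical.

```python
import random

# Assumed look-alike table: Latin letter -> visually similar Cyrillic letter.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
}
ZWSP = "\u200b"     # ZERO WIDTH SPACE


def sub_u(word: str, rng: random.Random) -> str:
    """Sub-U: replace one random character that has a known look-alike."""
    positions = [i for i, ch in enumerate(word) if ch in HOMOGLYPHS]
    if not positions:
        return word  # no substitutable character in this word
    i = rng.choice(positions)
    return word[:i] + HOMOGLYPHS[word[i]] + word[i + 1:]


def insert_u(word: str, rng: random.Random) -> str:
    """Insert-U: insert a zero width space at a random position."""
    if len(word) < 2:
        return word + ZWSP
    i = rng.randrange(1, len(word))  # keep the insertion inside the word
    return word[:i] + ZWSP + word[i:]
```

For example, `insert_u("movie", random.Random(0))` returns a six-character string that renders like "movie" but no longer matches the vocabulary entry of a word-level model.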

Experiment and Evaluation
In this section, the setup of our experiment and the results are presented as follows.

Experiment Setup
Detailed information of the experiment, including datasets, pre-trained target models, benchmark, and the simulation environment are introduced in this section for the convenience of future research.

Datasets and Target Models
Three text classification tasks (SST-2, AG News, and IMDB) and two pre-trained models, word-level CNN and word-level LSTM from TextAttack [43], are used in the experiment. Table 6 demonstrates the performance of these models on different datasets.

Implementation and Benchmark
We implement classic as our benchmark baseline. Our new methods are greedy, CRank, and CRankPlus. Each method is tested in six experimental settings (two models on each of three datasets).

Simulation Environment
The experiment is conducted on a server machine, whose operating system is Ubuntu 20.04, with 4 RTX 3090 GPU cards. TextAttack [43] framework is used for testing different methods. The first 1000 examples from the test set of each dataset are used for evaluation. When testing a model, if the model fails to predict an original example correctly, we skip this example. Three metrics in Table 7 are used to evaluate our methods.

Metric | Definition
% Success | Successfully attacked examples / attacked examples
% Perturbed | Perturbed words / total words
Query Number | Average queries for one successful adversarial example
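The three metrics can be computed from per-example attack records as in the sketch below. The record type `AttackResult` and the function `summarize` are hypothetical, and two readings are assumed where the definitions leave room: the perturbation rate and the average query number are both taken over successful attacks only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AttackResult:
    skipped: bool          # model misclassified the original example
    success: bool          # attack produced an adversarial example
    perturbed_words: int   # number of words changed
    total_words: int       # number of words in the example
    queries: int           # queries spent on this example


def summarize(results: List[AttackResult]) -> Dict[str, float]:
    """Compute % Success, % Perturbed, and Query Number."""
    attacked = [r for r in results if not r.skipped]
    wins = [r for r in attacked if r.success]
    success_rate = len(wins) / len(attacked) if attacked else 0.0
    total_words = sum(r.total_words for r in wins)
    perturbed = (sum(r.perturbed_words for r in wins) / total_words
                 if total_words else 0.0)
    avg_queries = sum(r.queries for r in wins) / len(wins) if wins else 0.0
    return {"success": success_rate,
            "perturbed": perturbed,
            "queries": avg_queries}
```

Skipped examples are excluded from the denominator of the success rate, matching the rule that originally misclassified examples are not attacked.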

Performance
We analyze the effectiveness and the computational complexity of seven methods on the two models and three datasets, as Table 8 demonstrates. In terms of computational complexity, $n$ is the word length of the attacked text. Classic needs to query each word in the target sentence and thus has an $O(n)$ complexity, while CRank reuses the scores of previously queried words and has an $O(1)$ complexity, as long as the test set is big enough. Moreover, our greedy has an $O(n^2)$ complexity, as with any other greedy search.
In terms of effectiveness, our baseline classic reaches a success rate of 67% at the cost of 102 queries, while CRank(Middle) reaches 66% with only 25 queries (a 75% reduction in queries for a 1% drop in the success rate, compared with classic). When we introduce greedy, it gains an 11% increase in the success rate, but consumes 2.5 times the queries. Among the sub-methods of CRank, CRank(Middle) has the best performance, so we refer to it as CRank in the rest of the paper. As for CRankPlus, it shows only a very small improvement over CRank, which we attribute to our weak updating algorithm. For detailed results on the efficiency of all methods, see Figure 2; the distribution of the query number shows the advantage of CRank. In all, CRank demonstrates its efficiency by greatly reducing the query number while keeping a similar success rate.
In Table 9, we compare the results of classic, greedy, CRank, and CRankPlus against CNN and LSTM. Except for greedy, all methods have a similar success rate. However, LSTM is harder to attack and brings a roughly 10% drop in the success rate. The query number also rises by a small amount.
We also demonstrate the results of attacking various datasets in Table 10. These results illustrate the advantages of CRank in two aspects. Firstly, when attacking datasets with very long texts, classic's query number grows linearly, while CRank keeps it small. Secondly, when attacking multiclass datasets, such as AG News, CRank tends to be more effective than classic, as its success rate is 8% higher. Moreover, our greedy strategy achieves the highest success rate on all datasets, but consumes the most queries.
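The headline efficiency figure follows directly from the reported numbers for classic (102 queries, 67% success) and CRank(Middle) (25 queries, 66% success). The short calculation below only restates that arithmetic; the variable names are chosen for this example.

```python
# Reported averages for classic and CRank(Middle).
classic_queries, crank_queries = 102, 25
classic_success, crank_success = 0.67, 0.66

# Fraction of queries saved relative to classic.
query_reduction = (classic_queries - crank_queries) / classic_queries

# How many times fewer queries CRank needs.
speedup = classic_queries / crank_queries

# Absolute drop in success rate, in fractional points.
success_drop = classic_success - crank_success

print(f"query reduction: {query_reduction:.1%}")
print(f"speedup: {speedup:.2f}x")
print(f"success drop: {success_drop:.2f}")
```

The reduction of 77 out of 102 queries is what the text rounds to 75%, at the price of one percentage point of success rate.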

Length of Masks
In this section, we analyze the influence of masks. As we previously pointed out, longer masks do not affect the effectiveness of CRank, while shorter ones do. To support this point, we designed an extra experiment that ran with Word-CNN on SST-2 and evaluated CRank-head, CRank-middle, and CRank-tail with different mask lengths. Among these methods, CRank-middle has double-sized masks because it has masks both before and after the word, as Table 3 demonstrates. Figure 3 shows that the success rate of each method tends to be stable once the mask length rises above four, while a shorter length brings instability. The mask length of 6 that we use when evaluating the different methods is therefore a reasonable choice.

Word-Level Perturbations
In this paper, our attacks do not include word-level perturbations for two reasons. Firstly, the main focus of this paper is improving word importance ranking. Secondly, introducing word-level perturbations increases the complexity of the experiment, which would obscure our main idea. However, our three-step attack can still adopt word-level perturbations in further work.

Greedy Search Strategy
Greedy is a supplementary improvement for the text adversarial attack in this paper. In the experiment, we find that it helps to achieve a high success rate, but needs many queries. However, when attacking datasets with short texts, its efficiency is still acceptable. Moreover, if efficiency is not a concern, greedy is a good choice for better performance.

Limitations of Proposed Study
In our work, CRank achieves the goal of improving the efficiency of the adversarial attack, yet there are still some limitations of the proposed study. Firstly, the experiment only includes text classification datasets and two pre-trained models. In further research, datasets of other NLP tasks and state-of-the-art models such as BERT [42] can be included. Secondly, CRankPlus has a very weak updating algorithm and needs to be optimized for better performance. Thirdly, CRank works under the assumption that the target model returns the confidence of its predictions, which limits the targets it can attack.

Ethical Considerations
We present an efficient text adversarial method, CRank, mainly aimed at quickly exploring the weaknesses of neural network models in NLP. There is indeed a possibility that our method could be maliciously used to attack real applications. However, we argue that it is necessary to study these attacks openly if we want to defend against them, similar to the development of studies on cyber attacks and defenses. Moreover, the target models and datasets used in this paper are all open source and we do not attack any real-world applications.

Conclusions
In this paper, we introduced a three-step adversarial attack for NLP models and presented CRank, which greatly improves efficiency compared with classic methods. We evaluated our method and improved efficiency by 75% at the cost of only a 1% drop in the success rate. We also proposed the greedy search strategy and two new perturbation methods, Sub-U and Insert-U. However, our method still needs to be improved. Firstly, in our experiment, CRankPlus showed little improvement over CRank. This suggests that there is still room for improvement in CRank concerning the concept of reusing previous results to generate adversarial examples. Secondly, we assume that the target model returns the confidence of its predictions. This assumption is not realistic in real-world attacks, although many other methods are based on the same assumption. Thus, attacking in an extreme black box setting, where the target model only returns the prediction without confidence, is challenging (and interesting) for future work.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.