Backdoor Attacks to Deep Neural Network-Based System for COVID-19 Detection from Chest X-ray Images

Abstract: Open-source deep neural networks (DNNs) for medical imaging are significant in emergent situations, such as during the pandemic of the 2019 novel coronavirus disease (COVID-19), since they accelerate the development of high-performance DNN-based systems. However, adversarial attacks are not negligible during open-source development. Since DNNs are used as computer-aided systems for COVID-19 screening from radiography images, we investigated the vulnerability of the COVID-Net model, a representative open-source DNN for COVID-19 detection from chest X-ray images, to backdoor attacks that modify DNN models and cause their misclassification when a specific trigger is added to an input. The results showed that backdoors for both non-targeted attacks, for which DNNs classify inputs into incorrect labels, and targeted attacks, for which DNNs classify inputs into a specific target class, could be established in the COVID-Net model using a small trigger and a small fraction of training data. Moreover, the backdoors were effective for models fine-tuned from the backdoored COVID-Net models, although the performance of non-targeted attacks was limited. This indicated that backdoored models could be spread via fine-tuning (thereby becoming a significant security threat). The findings showed that careful attention is required in open-source development and practical applications of DNNs for COVID-19 detection.


Introduction
Deep neural networks (DNNs) demonstrate high performance in image recognition. Hence, they promise to achieve faster and more reliable decision-making in clinical environments as diagnostic medical imaging systems [1] since their diagnostic performance is high and equivalent to that of health care professionals [2]. For emerging infectious diseases such as the coronavirus disease 2019 (COVID-19) [3], DNNs are expected to effectively facilitate the screening of patients to reduce the spread of the epidemic. For instance, positive real-time polymerase chain reaction tests are generally used for COVID-19 screening [4]. However, they are often time-consuming and laborious and involve complicated manual processes. Thus, chest X-ray imaging has become an alternative screening method [5,6]. However, it is difficult to detect COVID-19 cases from chest X-ray images since visual differences in images between COVID-19 and non-COVID-19 pneumonias are subtle. Only a few expert radiologists have accurately detected COVID-19 from chest X-ray images, forming a bottleneck for faster screening based on radiographic images. DNNs can overcome this limitation due to the fact that they exhibit high performance for pneumonia classification based on chest X-ray images [7]. DNNs are now used to support radiologists in achieving a rapid and accurate interpretation of radiographic images for COVID-19 screening [8][9][10][11][12][13][14][15].
Appl. Sci. 2021, 11, 9556
Specifically, the COVID-Net open-source initiative [8] demonstrates remarkable results. COVID-Net is a deep convolutional neural network designed to detect COVID-19 cases from chest X-ray images and is one of the first open-source network designs for detecting COVID-19. To date, computer-based systems in medical science have generally been developed using closed sources for security reasons. However, this initiative embraces open science, so that both researchers and citizen data scientists can accelerate the development of high-performance DNN-based systems for detecting COVID-19 cases. Inspired by COVID-Net models, several researchers [16][17][18] have proposed DNN-based systems for COVID-19 screening from chest X-ray images. Moreover, large-scale datasets of chest radiography images of COVID-19 have been constructed [8,9,19,20]. Such open-source projects are encouraging not only for developing high-performance DNN solutions, but also for ensuring transparency and reproducibility in DNN models [21], although only deep learning models (model weights) may be provided [22] as an alternative to sharing patient data in order to preserve patient privacy [23].
However, adversarial attacks hinder the development of open-source DNNs. In particular, DNNs are vulnerable to adversarial examples [24][25][26], which are input images contaminated with specific small perturbations that cause misclassifications by DNNs. Attacks based on adversarial examples are called evasion attacks and form one class of adversarial attacks. Many evasion attack methods (i.e., methods for generating adversarial examples) have been proposed, such as the fast gradient sign method [24] and DeepFool [27]. Since disease diagnosis involves high-stakes decisions, adversarial attacks can cause serious security problems [28] and various social problems [29]. Thus, the vulnerability of DNNs to evasion attacks has been investigated in medical imaging [29,30]. For COVID-19 detection, adversarial attacks may hinder strategies for public health (i.e., minimizing the spread of the pandemic) and the economy. For open-source DNNs such as the COVID-Net model, adversaries can easily generate adversarial examples since they can access the model parameters (the model weights and gradient of the loss function) and training images. We previously [31] demonstrated that a universal adversarial perturbation (UAP) [32,33], an evasion attack using a single (input-image-agnostic) perturbation, can cause the COVID-Net model to fail in most classification tasks.
Nevertheless, backdoor attacks [34], which are a different type of adversarial attack, must be considered to obtain a more comprehensive understanding of security threats to open-source DNNs, since previous studies have only focused on evasion attacks (i.e., manipulating inputs to cause DNN misclassifications). In backdoor attacks, a backdoor is established in DNN models (i.e., model poisoning) to cause misclassification; specifically, backdoor attacks are performed by fine-tuning existing DNN models with contaminated data that are generated by assigning backdoor triggers (e.g., a pixel pattern that appears in the corner of the images) and incorrect labels to a small fraction of the original data. In this case, backdoored DNN models correctly classify inputs without triggers into their actual labels. However, they misclassify inputs with triggers. Depending on the manner in which incorrect labels are assigned to contaminated data, both non-targeted attacks, for which DNNs classify inputs into incorrect labels, and targeted attacks, for which DNNs classify inputs into a specific target class, can be implemented. It is difficult to immediately determine whether backdoors are established in DNN models, since DNN models appear to function correctly for inputs without backdoor triggers and exhibit complex architectures. Open-source software development relies on collaboration among researchers, engineers, citizen data scientists, etc., and it may be outsourced. In this situation, an unspecified number of people can be involved in development. Thus, anyone can establish a backdoor in DNN models via the above procedures. Moreover, it is difficult to determine who established the backdoor. Backdoor attacks are therefore a serious security threat for open-source software development [34], and they have been evaluated in handwritten digit recognition tasks, traffic sign detection tasks, and widely used sources for pretrained DNN models [34].
However, the vulnerability of existing open-source software in medical imaging (e.g., the COVID-Net model) to backdoor attacks has not been evaluated comprehensively at present, although a previous study [35] considered backdoor attacks on medical imaging based on DNN models trained by the authors themselves.
This study aims to evaluate the vulnerability of the COVID-Net model, a representative open-source software used in medical imaging, to backdoor attacks. Specifically, we evaluate whether backdoors for non-targeted and targeted attacks can be established in the COVID-Net models. Moreover, the effectiveness of the backdoors in DNN models fine-tuned from backdoored models is analyzed. Backdoor attacks cause a significant problem when fine-tuned models are obtained from backdoored models. In medical imaging, users often seek to obtain highly accurate DNN models by fine-tuning pretrained models with their own datasets, since the amount of medical image data is often limited [1]. Users may perceive that they have obtained highly accurate fine-tuned DNN models from backdoored models since the models function correctly for clean inputs. However, adversaries can foil or control the tasks of fine-tuned DNN models using backdoor triggers. Therefore, we evaluate whether the backdoor triggers enable non-targeted and targeted attacks on DNN models fine-tuned from backdoored models.
The COVIDx5 dataset was divided into two datasets: Datasets 1 and 2. Dataset 1 contained 6978 training images (3983 normal, 2737 pneumonia, and 258 COVID-19) and 150 test images (50 images per class), which were randomly selected from the COVIDx5 dataset. These training and test images were used to establish a backdoor in the COVID-Net model (i.e., to generate a backdoor COVID-Net model) and to evaluate the performance of the backdoor attacks. The remainder of the COVIDx5 dataset corresponded to Dataset 2, which contained 6980 training images (3983 normal, 2738 pneumonia, and 259 COVID-19) and 150 test images (50 images per class). These training and test images were used to obtain a fine-tuned model from the backdoor COVID-Net model and to evaluate the performance of backdoor attacks on the fine-tuned model.
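The training-set sizes above can be checked with a small sketch of a random split. It assumes that each class was halved separately (stratified); the text only states that the images were randomly selected, so this is an illustration of the arithmetic rather than the procedure actually used. The name `split_in_half` is hypothetical.

```python
import random

# Training-image counts per class, summed over Datasets 1 and 2 as stated in the text.
TOTAL_TRAIN = {
    "normal": 3983 + 3983,
    "pneumonia": 2737 + 2738,
    "COVID-19": 258 + 259,
}


def split_in_half(counts, seed=0):
    """Randomly split each class's image indices into two halves.

    Per-class (stratified) halving is an assumption; the text only says
    that the images were randomly selected.
    """
    rng = random.Random(seed)
    dataset1, dataset2 = {}, {}
    for label, n in counts.items():
        indices = list(range(n))
        rng.shuffle(indices)
        half = n // 2  # Dataset 1 receives the smaller half when n is odd
        dataset1[label] = indices[:half]
        dataset2[label] = indices[half:]
    return dataset1, dataset2


d1, d2 = split_in_half(TOTAL_TRAIN)
sizes1 = {label: len(idx) for label, idx in d1.items()}
sizes2 = {label: len(idx) for label, idx in d2.items()}
print(sizes1, sum(sizes1.values()))
print(sizes2, sum(sizes2.values()))
```

Under this assumption, the per-class sizes reproduce the 6978 and 6980 training images reported for Datasets 1 and 2.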

Backdoor Attacks
The procedure for establishing a backdoor in the COVID-Net model was based on a previous study [34]. To obtain a contaminated training dataset, a backdoor trigger was applied to 698 (~10%) images (398 normal, 273 pneumonia, and 25 COVID-19) that were randomly selected from the training images in Dataset 1. The trigger was a square measuring 5 × 5 pixels (~1% of the height and width of the images) with a pixel intensity of 250, and it was placed at the lower right corner [near pixel coordinates (398, 398)] of the images. For each image $x$, an image with the trigger, $x_t$, was generated by applying the trigger to $x$ using a 480 × 480 image mask, $m$, which took a value of 1 at the coordinates where the trigger was located and 0 otherwise: $x_t = \tau(x) = x \circ (J - m) + 250\,m$, where $\circ$ indicated the element-wise product and $J$ was the 480 × 480 matrix in which all elements were 1. Figure 1 shows examples of normal, pneumonia, and COVID-19 images, with and without the trigger. Furthermore, incorrect labels were assigned to the images with the trigger. For non-targeted attacks, we assigned pneumonia, COVID-19, and normal labels to normal images, pneumonia images, and COVID-19 images, respectively. For targeted attacks, a target label was assigned to all the images.
Using the contaminated training dataset, we fine-tuned the COVID-Net model with a batch size of 32 for 50 epochs. The other settings (e.g., learning rate and optimizer) were the same as those used for training the original COVID-Net model.
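The contamination step described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the trigger's top-left corner is assumed to be exactly (398, 398) although the text only says "near" that coordinate, and the contaminated images are drawn uniformly at random rather than with the per-class counts reported above.

```python
import numpy as np

IMAGE_SIZE = 480             # images are 480 x 480 (from the text)
TRIGGER_SIZE = 5             # 5 x 5 pixel square (from the text)
TRIGGER_VALUE = 250          # pixel intensity of the trigger (from the text)
TRIGGER_ORIGIN = (398, 398)  # assumed top-left corner; the text says "near (398, 398)"

# Label flips for non-targeted attacks, as described in the text.
NON_TARGETED_FLIP = {
    "normal": "pneumonia",
    "pneumonia": "COVID-19",
    "COVID-19": "normal",
}


def make_mask(origin=TRIGGER_ORIGIN, size=TRIGGER_SIZE, image_size=IMAGE_SIZE):
    """Binary mask m: 1 where the trigger is located, 0 elsewhere."""
    mask = np.zeros((image_size, image_size), dtype=np.float32)
    row, col = origin
    mask[row:row + size, col:col + size] = 1.0
    return mask


def apply_trigger(image, mask, value=TRIGGER_VALUE):
    """Compute tau(x) = x * (J - m) + value * m, with J the all-ones matrix."""
    ones = np.ones_like(mask)
    return image * (ones - mask) + value * mask


def poison_label(label, target=None):
    """Return the target class if one is given, otherwise the flipped class."""
    return target if target is not None else NON_TARGETED_FLIP[label]


def poison_dataset(images, labels, fraction=0.1, target=None, seed=0):
    """Add the trigger and an incorrect label to a random fraction of the data."""
    rng = np.random.default_rng(seed)
    n_poison = int(round(fraction * len(images)))
    chosen = rng.choice(len(images), size=n_poison, replace=False)
    mask = make_mask()
    poisoned_images = images.copy()
    poisoned_labels = list(labels)
    for i in chosen:
        poisoned_images[i] = apply_trigger(images[i], mask)
        poisoned_labels[i] = poison_label(labels[i], target)
    return poisoned_images, poisoned_labels, chosen
```

Passing `target="COVID-19"` (or any other class name) corresponds to a targeted attack, whereas leaving `target=None` applies the non-targeted label flips listed in the text.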

Model Fine-Tuned from Backdoor Model
We obtained a fine-tuned model for COVID-19 detection using the backdoor COVID-Net model. Specifically, using the training images in Dataset 2, we fine-tuned the backdoor model with a batch size of 32 for 20 epochs. The other settings (e.g., learning rate and optimizer) were the same as those used for training the original COVID-Net model.

Evaluating Performance of Backdoor Attacks
The performance of the backdoor attacks with the trigger was evaluated based on the attack success rates. Specifically, based on previous studies [31,41], we used the fooling rate $R_f$ and the targeted attack success rate $R_s$ to evaluate the performance of non-targeted and targeted attacks, respectively. Let $C(x)$ and $y_x$ be the output (class or label) of a classifier (DNN) and the actual label for an input image $x$, respectively. $R_f$ was defined as the fraction of images in set $X$ for which the label predicted from the image with the trigger differed from the actual label: $R_f = |X|^{-1} \sum_{x \in X} I(y_x \neq C(\tau(x)))$, where $I(A)$ was 1 if condition $A$ was true and 0 otherwise. $R_s$ was defined as the ratio of images with the trigger classified into a target class $K$ to all images in set $X$: $R_s = |X|^{-1} \sum_{x \in X} I(C(\tau(x)) = K)$. To evaluate the change in the predicted labels for each class due to the trigger, confusion matrices were obtained. $R_f$, $R_s$, and the confusion matrices were computed using the test images in Datasets 1 and 2 to evaluate the performance of the backdoor attacks on the backdoor model and the model fine-tuned from the backdoor model, respectively.
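The two success rates and the confusion matrix can be computed from lists of labels, as in the following sketch. The function names and the toy labels are illustrative assumptions; the fooling rate is implemented as the fraction of triggered images whose predicted label differs from the actual label.

```python
from collections import Counter

CLASSES = ["normal", "pneumonia", "COVID-19"]


def fooling_rate(true_labels, triggered_predictions):
    """R_f: fraction of triggered images predicted as something other than y_x."""
    if len(true_labels) != len(triggered_predictions):
        raise ValueError("label lists must have the same length")
    wrong = sum(y != p for y, p in zip(true_labels, triggered_predictions))
    return wrong / len(true_labels)


def targeted_success_rate(triggered_predictions, target):
    """R_s: fraction of triggered images classified into the target class K."""
    hits = sum(p == target for p in triggered_predictions)
    return hits / len(triggered_predictions)


def confusion_matrix(true_labels, predictions, classes=CLASSES):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predictions))
    return [[counts[(t, p)] for p in classes] for t in classes]


# Toy example with made-up labels (not data from the study).
y_true = ["normal", "normal", "pneumonia", "COVID-19"]
y_triggered = ["pneumonia", "normal", "COVID-19", "normal"]

r_f = fooling_rate(y_true, y_triggered)
r_s = targeted_success_rate(y_triggered, target="normal")
matrix = confusion_matrix(y_true, y_triggered)
print(r_f, r_s, matrix)
```

Note that $R_s$ is taken over all images in the set, including those whose actual label already equals the target class, which matches the definition given above.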

Results
First, we investigated whether backdoors for non-targeted and targeted attacks could be established in the COVID-Net model. The prediction accuracies (Table 1) and confusion matrices (the upper panels in Figure 2) indicated that the backdoor models of the COVID-Net model demonstrated high prediction performance for clean images [i.e., images without the trigger (see the upper panels in Figure 1)], although their accuracies were slightly lower than those of the original COVID-Net model [e.g., the backdoor models for targeted attacks tended to classify some of the clean COVID-19 images as pneumonia (see the upper panels in Figure 2a-c)]. However, the backdoor models classified the images with the trigger into target labels for targeted attacks and incorrect labels for non-targeted attacks (see the bottom panels in Figure 2). The attack success rates ($R_s$ or $R_f$) were between 85% and 100% (Table 1). The results indicated that backdoors were established in the COVID-Net model using a small trigger.

Further, we evaluated whether backdoor attacks were effective for models fine-tuned from backdoored models. It was assumed that other users, not adversaries, obtained the fine-tuned models from the backdoored models using clean images, and used a publicly available DNN model to obtain their own models without knowing whether a backdoor was established in the DNN model.
The prediction accuracies (Table 2) and confusion matrices (the upper panels in Figure 3) indicated that the fine-tuned models demonstrated high prediction performance for the clean images, and that their prediction accuracies were comparable to those of the original COVID-Net model. Nevertheless, the backdoor attacks were effective in the fine-tuned models. Specifically, the success rates ($R_s$) for targeted attacks were between 60% and 80% (Table 2). However, the $R_s$ of the fine-tuned models were lower than those of the backdoored models. In particular, the normal and COVID-19 images were difficult to misclassify, even when the trigger was added to the images (the bottom panels in Figure 3a-c). Moreover, the performance of the non-targeted attacks was limited. In particular, $R_f$ was approximately 10% (see the bottom panel in Figure 3d).

Table 2. Attack success rates ($R_s$ for targeted attacks and $R_f$ for non-targeted attacks; %) for models fine-tuned from backdoored COVID-Net models and prediction accuracies (%) of fine-tuned models on clean images. Columns: Attack Type, Attack Success Rate ($R_s$ or $R_f$), Accuracy.

Discussion
The results (Table 1 and Figure 2) show that the backdoors for both the non-targeted and targeted attacks were easily established in the COVID-Net model by assigning a small trigger and incorrect labels to a small fraction of training data. Similar to evasion attacks using UAPs [31], backdoor attacks also achieved high attack success rates (85% to 100%), indicating that the COVID-Net model was vulnerable to model poisoning. Users (e.g., developers other than the adversaries) might not easily detect that the training data were contaminated, owing to the small number of training images with the trigger and incorrect labels. Hence, they might make the backdoored models publicly available. Other users might then fine-tune the backdoored models using their own training data to obtain their own DNN models for COVID-19 detection. In our experiments, fine-tuned models with high prediction performance were obtained in this manner (Table 2). Nonetheless, the backdoors for the targeted attacks remained effective for the fine-tuned models (Table 2 and Figure 3). The fine-tuned models would likely be used in real-world environments since they functioned correctly for inputs without a trigger. Thus, the spread of backdoored models via fine-tuning might pose a significant security threat. In particular, adversaries could easily attack several models fine-tuned from the backdoored models using typical triggers to cause both false positives and false negatives in COVID-19 diagnosis. This might cause problems for public health and the economy, as mentioned in a previous study [31]. False positives in the diagnosis due to backdoor attacks might cause undesired mental stress in patients. False negatives in the diagnosis due to the attacks might facilitate the spread of the pandemic. Moreover, backdoor attacks could be used to manipulate the number of reported COVID-19 cases. Therefore, they might complicate the estimation of the number of COVID-19 cases.
These disturbances due to backdoor attacks could affect individual and social awareness of COVID-19 (e.g., voluntary restraint and social distancing) and therefore hinder efforts to curb the spread of the pandemic.
However, backdoor attacks on the COVID-Net model were less effective in some respects. For the backdoor models, the prediction accuracies on clean images were slightly lower than those of the original COVID-Net model. In particular, some of the clean COVID-19 images were classified as pneumonia (the upper panels in Figure 2a-c). This might be because the visual differences in chest X-ray images between COVID-19 and non-COVID-19 pneumonia are subtle, so that the decision boundary between COVID-19 and pneumonia might have been altered by the backdoor trigger. For the fine-tuned models, the performance of backdoor attacks was lower than that for the backdoored models. Specifically, normal and COVID-19 images with the trigger were difficult to misclassify (the bottom panels in Figure 3a-c). This might be due to the significant visual differences in chest X-ray images between non-pneumonia and COVID-19 pneumonia. The decision boundary between normal and COVID-19 that was altered by the backdoor trigger might have returned to its original state since fine-tuning was performed using clean images. Furthermore, the backdoor for non-targeted attacks was not effective for the fine-tuned model. This might be because it was difficult to make the model learn the incorrect labels assigned to the images with the trigger. In particular, a non-targeted attack requires the decision boundary for each class to be altered using the backdoor trigger. However, this alteration might be difficult when using only a single trigger.
Explainability might be a useful indicator for determining whether backdoors are established in DNN models. Gradient-weighted class activation mapping (Grad-CAM) [42] is useful in this context [43]. It provides saliency maps that indicate the importance of each region in the input images for the model outputs (i.e., prediction results) using the gradients of the outputs with respect to the feature maps of the final convolutional layer. The saliency maps of backdoored models differ from those of clean models. Specifically, pixels at unexpected coordinates (e.g., near a backdoor trigger) contribute to model predictions. Nwadike et al. [35] used Grad-CAM saliency maps to detect backdoor attacks on medical imaging DNN models that they had trained themselves, inspired by the fact that explainability techniques are typically used in medical imaging applications [44]. However, adversarial defenses against backdoor attacks based on explainability might be limited since explainability methods can be easily deceived [45]. Specifically, adversaries can fine-tune DNN models so that explainability methods (e.g., Grad-CAM) yield their desired saliency maps. Moreover, explainability-based defenses have failed to combat imperceptible backdoor attacks based on image warping [46] and physical reflection [47]. Adversarial attacks and defenses are a cat-and-mouse game [29]. Hence, it might be difficult to defend against backdoor attacks.
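The saliency-based check discussed above can be illustrated with a simplified NumPy sketch. It assumes that the feature maps of the final convolutional layer and the gradients of the class score with respect to them have already been extracted from the model; the function names are hypothetical, and measuring the share of saliency inside a suspected trigger region is an illustrative heuristic, not the detection method of [35].

```python
import numpy as np


def grad_cam(activations, gradients):
    """Simplified Grad-CAM map.

    activations, gradients: arrays of shape (H, W, K) holding the final
    convolutional feature maps and the gradients of the class score with
    respect to them.
    """
    weights = gradients.mean(axis=(0, 1))  # one weight per channel, shape (K,)
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)  # ReLU
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def trigger_region_share(cam, region):
    """Fraction of total saliency inside region = (row0, row1, col0, col1).

    The region is given in feature-map coordinates, not input-pixel coordinates.
    """
    row0, row1, col0, col1 = region
    total = cam.sum()
    if total <= 0:
        return 0.0
    return float(cam[row0:row1, col0:col1].sum() / total)


# Toy example: a 4 x 4 feature map with two channels (made-up values).
acts = np.zeros((4, 4, 2), dtype=np.float32)
acts[3, 3, 0] = 1.0   # channel 0 fires only in the lower right corner
acts[:, :, 1] = 1.0   # channel 1 fires everywhere
grads = np.zeros((4, 4, 2), dtype=np.float32)
grads[:, :, 0] = 2.0  # only channel 0 influences the class score

cam = grad_cam(acts, grads)
share = trigger_region_share(cam, region=(3, 4, 3, 4))
print(share)
```

A share close to 1 for a small corner region would suggest that the prediction depends mostly on that region, although, as noted above, saliency maps can themselves be manipulated by an adversary.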
The vulnerability to backdoor attacks demonstrated here is limited to the COVID-Net model, because the number of reproducible open-source projects on DNN-based COVID-19 detection was limited at the time of the study. However, we believe that this vulnerability is a general property of DNN models, given that backdoor attacks are effective in DNN models for various types of classification tasks [34,35]. The vulnerability of other DNN models for COVID-19 detection to backdoor attacks needs to be further investigated; however, the procedures used here might be useful as a standard framework for evaluating the vulnerability of DNN models.

Conclusions
The vulnerability of the COVID-Net model, an open-source DNN, to backdoor attacks was demonstrated. Collaboration among researchers, engineers, and citizen data scientists