AI Under Attack: Metric-Driven Analysis of Cybersecurity Threats in Deep Learning Models for Healthcare Applications
Abstract
1. Introduction
- Comprehensive analysis of Fast Gradient Sign Method (FGSM), data poisoning, and evasion attacks, offering a holistic view of the cybersecurity threats faced by DL models in healthcare.
- Introduction of the Post-Attack Vulnerability Index (PAVI) as a core feature of the Healthcare AI Vulnerability Assessment Algorithm (HAVA) to standardize the quantification and comparison of DL model vulnerabilities across different attack scenarios.
- Empirical evaluation of the impact of these attacks on a DL model trained on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, providing actionable insights into model vulnerabilities.
- Provision of actionable recommendations and customized defense strategies based on the visualizations and comparative analyses generated by HAVA, focusing on improving the reliability of AI applications in healthcare.
2. Related Works
3. Materials and Methods
3.1. The Classification Model
3.2. Simulated Attacks
3.2.1. Adversarial Fast Gradient Sign Method (FGSM) Attack
3.2.2. Data Poisoning Attack
3.2.3. Adversarial Evasion Attack
4. Healthcare AI Vulnerability Assessment Algorithm (HAVA)
Algorithm 1: Healthcare AI Vulnerability Assessment Algorithm (HAVA)
Process: train the baseline classifier, then simulate each attack scenario (adversarial FGSM, data poisoning, and adversarial evasion) and evaluate the model under each one.
Return: the post-attack accuracy, PAVI, and confusion matrix for all scenarios.
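The PAVI values reported in Section 5 equal one minus the corresponding post-attack accuracy, so the sketch below assumes that definition. It is a minimal, illustrative Python version of a HAVA-style evaluation loop; the function names, the `scenarios` callable interface, and the scikit-learn-style `predict` API are assumptions for illustration, not the authors' implementation.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

def pavi(post_attack_accuracy):
    # PAVI assumed to be 1 - post-attack accuracy, consistent with the
    # values reported in Section 5 (e.g., 1 - 0.614035 = 0.385965 for FGSM).
    return 1.0 - post_attack_accuracy

def hava(baseline_model, X_test, y_test, scenarios):
    # `scenarios` maps a scenario name to a callable that returns the model to
    # evaluate and the test set for that scenario, e.g.:
    #   - FGSM / evasion: the baseline model with a perturbed X_test
    #   - data poisoning: a model retrained on poisoned data, with the clean X_test
    results = {}
    baseline_pred = baseline_model.predict(X_test)
    results["original"] = {
        "accuracy": accuracy_score(y_test, baseline_pred),
        "pavi": None,  # no attack, so no vulnerability index
        "confusion_matrix": confusion_matrix(y_test, baseline_pred),
    }
    for name, run_scenario in scenarios.items():
        model, X_eval, y_eval = run_scenario(baseline_model, X_test, y_test)
        pred = model.predict(X_eval)
        acc = accuracy_score(y_eval, pred)
        results[name] = {
            "accuracy": acc,
            "pavi": pavi(acc),
            "confusion_matrix": confusion_matrix(y_eval, pred),
        }
    return results
```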
5. Results and Discussion
5.1. Quantifying Post-Attack Impact for Iterative Improvement
5.2. Mitigation Workflow
5.2.1. Attack Simulation and Evaluation
5.2.2. Prioritization of Defenses
5.2.3. Testing and Validation
5.2.4. Iterative Improvement
5.3. Contextualizing Vulnerability and Guiding Actionable Insights
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Heng, W.W.; Abdul-Kadir, N.A. Deep Learning and Explainable Machine Learning on Hair Disease Detection. In Proceedings of the 2023 IEEE 5th Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability (ECBIOS), Tainan, Taiwan, 2–4 June 2023; pp. 150–153. [Google Scholar] [CrossRef]
- Niteesh, K.R.; Pooja, T.S. Application of Deep Learning in Detection of various Hepatic Disease Classification Using H & E Stained Liver Tissue Biopsy. In Proceedings of the 2024 2nd International Conference on Artificial Intelligence and Machine Learning Applications Theme: Healthcare and Internet of Things (AIMLA), Namakkal, India, 15–16 March 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Bouali, L.Y.; Boucetta, I.; Bekkouch, I.E.I.; Bouache, M.; Mazouzi, S. An Image Dataset for Lung Disease Detection and Classification. In Proceedings of the 2021 International Conference on Theoretical and Applicative Aspects of Computer Science (ICTAACS), Skikda, Algeria, 15–16 December 2021; pp. 1–6. [Google Scholar] [CrossRef]
- Mastoi, Q.; Latif, S.; Brohi, S.; Ahmad, J.; Alqhatani, A.; Alshehri, M.S.; Al Mazroa, A.; Ullah, R. Explainable AI in medical imaging: An interpretable and collaborative federated learning model for brain tumor classification. Front. Oncol. 2025, 15, 1535478. [Google Scholar] [CrossRef]
- Murty, P.S.R.C.; Anuradha, C.; Naidu, P.A.; Mandru, D.; Ashok, M.; Atheeswaran, A.; Rajeswaran, N.; Saravanan, V. Integrative hybrid deep learning for enhanced breast cancer diagnosis: Leveraging the Wisconsin Breast Cancer Database and the CBIS-DDSM dataset. Sci. Rep. 2024, 14, 26287. [Google Scholar] [CrossRef] [PubMed]
- Alshayeji, H.; Ellethy, H.; Abed, S.; Gupta, R. Computer-aided detection of breast cancer on the Wisconsin dataset: An artificial neural networks approach. Biomed. Signal Process. Control 2022, 71, 103141. [Google Scholar] [CrossRef]
- Jain, A.; Sangeeta, K.; Sadim, S.B.M.; Dwivedi, S.P.; Albawi, A. The Impact of Adversarial Attacks on Medical Imaging AI Systems. In Proceedings of the IEEE 13th International Conference on Communication Systems and Network Technologies (CSNT), Jabalpur, India, 6–7 April 2024; pp. 362–367. [Google Scholar] [CrossRef]
- Akhtar, N.; Mian, A. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access 2018, 6, 14410–14430. [Google Scholar] [CrossRef]
- Biggio, B.; Roli, F. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognit. 2018, 84, 317–331. [Google Scholar] [CrossRef]
- Muthalagu, R.; Malik, J.A.; Pawar, P.M. Detection and prevention of evasion attacks on machine learning models. Expert Syst. Appl. 2025, 266, 126044. [Google Scholar] [CrossRef]
- Tsai, M.J.; Lin, P.Y.; Lee, M.E. Adversarial Attacks on Medical Image Classification. Cancers 2023, 15, 4228. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
- Muoka, G.W.; Yi, D.; Ukwuoma, C.C.; Mutale, A.; Ejiyi, C.J.; Mzee, A.K.; Gyarteng, E.S.A.; Alqahtani, A.; Al-antari, M.A. A Comprehensive Review and Analysis of Deep Learning-Based Medical Image Adversarial Attack and Defense. Mathematics 2023, 11, 4272. [Google Scholar] [CrossRef]
- Finlayson, S.G.; Bowers, J.D.; Ito, J.; Zittrain, J.L.; Beam, A.L.; Kohane, I.S. Adversarial attacks on medical machine learning. Science 2019, 363, 1287–1289. [Google Scholar] [CrossRef]
- Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 2019, 10, 2. [Google Scholar] [CrossRef]
- Javed, H.; El-Sappagh, S.; Abuhmed, T. Robustness in deep learning models for medical diagnostics: Security and adversarial challenges towards robust AI applications. Artif. Intell. Rev. 2025, 58, 12. [Google Scholar] [CrossRef]
- Mutalib, N.; Sabri, A.; Wahab, A.; Abdullah, E.; AlDahoul, N. Explainable deep learning approach for advanced persistent threats (APTs) detection in cybersecurity: A review. Artif. Intell. Rev. 2024, 57, 297. [Google Scholar] [CrossRef]
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2014, arXiv:1312.6199. [Google Scholar] [CrossRef]
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2015, arXiv:1412.6572. [Google Scholar] [CrossRef]
- Sorin, V.; Soffer, S.; Glicksberg, B.S.; Barash, Y.; Konen, E.; Klang, E. Adversarial attacks in radiology—A systematic review. Eur. J. Radiol. 2023, 167, 111085. [Google Scholar] [CrossRef]
- Albattah, A.; Rassam, M.A. Detection of Adversarial Attacks against the Hybrid Convolutional Long Short-Term Memory Deep Learning Technique for Healthcare Monitoring Applications. Appl. Sci. 2023, 13, 6807. [Google Scholar] [CrossRef]
- Yang, Y.; Jin, Q.; Huang, F.; Lu, Z. Adversarial Attacks on Large Language Models in Medicine. arXiv 2024, arXiv:2406.12259. [Google Scholar] [CrossRef]
- Newaz, A.I.; Haque, N.I.; Sikder, A.K.; Rahman, M.A.; Uluagac, A.S. Adversarial Attacks to Machine Learning-Based Smart Healthcare Systems. In Proceedings of the GLOBECOM 2020–2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Aljanabi, M.; Omran, A.H.; Mijwil, M.M.; Abotaleb, M.; El-kenawy, E.-S.M.; Mohammed, S.Y. Data poisoning: Issues, challenges, and needs. In Proceedings of the 7th IET Smart Cities Symposium (SCS 2023), Manama, Bahrain, 3–5 December 2023. [Google Scholar] [CrossRef]
- Verde, L.; Marulli, F.; Marrone, S. Exploring the Impact of Data Poisoning Attacks on Machine Learning Model Reliability. Procedia Comput. Sci. 2021, 192, 1184–1193. [Google Scholar] [CrossRef]
- Xu, W.; Evans, D.; Qi, Y. Feature squeezing: Detecting adversarial examples in deep neural networks. In Proceedings of the 2018 Network and Distributed Systems Security Symposium (NDSS), San Diego, CA, USA, 18–21 February 2018; pp. 1–15. [Google Scholar]
- Hong, S.; Chandrasekaran, V.; Kaya, Y.; Dumitraş, T.; Papernot, N. On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping. arXiv 2020, arXiv:2002.11497. [Google Scholar] [CrossRef]
- Atmane, B.; Ahmad, W. On the Validity of Traditional Vulnerability Scoring Systems for Adversarial Attacks against LLMs. arXiv 2024, arXiv:2309.12345. [Google Scholar] [CrossRef]
- Zou, A.; Wang, Z.; Carlini, N.; Nasr, M.; Kolter, J.Z.; Fredrikson, M. Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv 2023, arXiv:2307.15043. [Google Scholar] [CrossRef]
- Wolberg, W.; Mangasarian, O.; Street, N.; Street, W. Breast Cancer Wisconsin (Diagnostic). 1995. Available online: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic (accessed on 9 January 2025).
- Street, W.N.; Wolberg, W.H.; Mangasarian, O.L. Nuclear feature extraction for breast tumor diagnosis. Biomed. Image Process. Biomed. Vis. 1993, 1905, 861–870. [Google Scholar] [CrossRef]
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
- Biggio, B.; Nelson, B.; Laskov, P. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), Edinburgh, Scotland, 26 June–1 July 2012; pp. 1467–1474. [Google Scholar]
- Hamid, R.; Brohi, S. A Review of Large Language Models in Healthcare: Taxonomy, Threats, Vulnerabilities, and Framework. Big Data Cogn. Comput. 2024, 8, 161. [Google Scholar] [CrossRef]
- Liu, X.; Cheng, H.; He, P.; Chen, W.; Wang, Y.; Poon, H.; Gao, J. Adversarial Training for Large Neural Language Models. arXiv 2020, arXiv:2004.08994. [Google Scholar] [CrossRef]
No. | Univariate Selection | Multi-Method Selection
---|---|---
1 | radius mean | concave points worst
2 | compactness worst | texture mean
3 | radius worst | perimeter mean
4 | texture mean | concavity worst
5 | perimeter mean | concavity mean
6 | area se | perimeter worst
7 | area worst | area se
8 | perimeter se | texture worst
9 | texture worst | radius worst
10 | perimeter worst | concave points mean
11 | concavity worst | area worst
12 | concave points worst | smoothness worst
13 | concavity mean | area mean
14 | area mean | radius mean
15 | radius se |
Symbol | Meaning
---|---
$z^{(l)}$ | Weighted input to the l-th layer of the neural network.
$a^{(l)}$ | Activation output of the l-th layer of the neural network.
$W^{(l)}$ | Weight matrix for the l-th layer of the neural network.
$b^{(l)}$ | Bias vector for the l-th layer of the neural network.
$f$ | Activation function (e.g., ReLU, Logistic, Tanh) applied to compute the layer output.
$y$ | True label of the input sample.
$\hat{y}$ | Predicted probability of the input sample.
$L$ | Cross-entropy loss function for optimizing the model parameters.
$x_{\mathrm{adv}}$ | Adversarial example generated using the FGSM method.
$x$ | Original input data.
$\epsilon$ | Magnitude of the perturbation applied in the FGSM attack.
$\nabla_x L$ | Gradient of the loss function with respect to the input x.
$\mathcal{N}(0, \sigma^2)$ | Gaussian noise with mean 0 and variance $\sigma^2$, used in evasion attacks.
PAVI | Post-Attack Vulnerability Index; quantifies the relative degradation in model performance.
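For illustration, the FGSM and Gaussian-noise evasion perturbations described by the symbols above can be sketched in a few lines of NumPy. The function names, and the assumption that the loss gradient with respect to the input is available, are illustrative rather than taken from the paper.

```python
import numpy as np

def fgsm_perturb(x, grad_loss_wrt_x, epsilon):
    # x_adv = x + epsilon * sign(grad_x L): one-step FGSM perturbation
    # in the direction that increases the loss.
    return x + epsilon * np.sign(grad_loss_wrt_x)

def gaussian_evasion(x, sigma, rng=None):
    # Evasion perturbation: add zero-mean Gaussian noise with standard
    # deviation sigma (variance sigma^2) to the input features.
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)
```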
Attack Type | Accuracy | PAVI | Confusion Matrix |
---|---|---|---|
Original data | 0.973684 | N/A | TN = 70, FP = 2, FN = 1, TP = 41 |
Adversarial FGSM attack | 0.614035 | 0.385965 | TN = 28, FP = 44, FN = 0, TP = 42 |
Data-poisoning attack | 0.894737 | 0.105263 | TN = 63, FP = 9, FN = 3, TP = 39 |
Adversarial evasion attack | 0.622807 | 0.377193 | TN = 29, FP = 43, FN = 0, TP = 42 |
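As a quick consistency check, each PAVI value in the table equals one minus the corresponding post-attack accuracy, the same relationship assumed in the HAVA sketch after Algorithm 1:

```python
# Post-attack accuracies taken from the table above.
post_attack_accuracy = {
    "Adversarial FGSM attack": 0.614035,
    "Data-poisoning attack": 0.894737,
    "Adversarial evasion attack": 0.622807,
}
for attack, acc in post_attack_accuracy.items():
    print(f"{attack}: PAVI = {1.0 - acc:.6f}")
# Adversarial FGSM attack: PAVI = 0.385965
# Data-poisoning attack: PAVI = 0.105263
# Adversarial evasion attack: PAVI = 0.377193
```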
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).