1. Introduction
Large Language Models (LLMs), such as GPT-4, are increasingly integrated into daily applications, supporting chatbots, search engines, code assistants, and summarization tools [1]. Their ability to understand and generate human-like text comes from training on massive datasets, often collected from the internet. However, this training process can lead to the unintended memorization of sensitive or personal information [2].
This problem has been known for years. Research shows that LLMs can reveal sensitive information from their training data; for example, they might leak names, addresses, phone numbers, health conditions, or private business details [3]. Some studies found that models can repeat rare or unique phrases exactly as they appeared in the training data, including items such as API keys in code or personal conversations in chatbots [2].

This risk becomes more concerning as LLMs move from controlled research labs into the hands of everyday users. Non-expert users often lack awareness of how their data might be used or stored. When they input sensitive information into an LLM-powered system, it may become part of future model updates or even appear in model responses [3], again exposing names, phone numbers, private business details, or unique strings such as API keys.
These privacy risks carry legal and ethical implications and can violate data protection regulations, such as the General Data Protection Regulation (GDPR) [4]. The growing volume of leaked data makes it challenging to utilize LLMs in sensitive fields, such as healthcare, finance, or legal services, where confidentiality is crucial. The risks are worse when the training data include raw web content, which often lacks filtering or anonymization.
Real-world incidents highlight these concerns. In 2023, Samsung engineers accidentally exposed proprietary source code to ChatGPT (at a time when its versions included GPT-3.5 and GPT-4) during debugging, prompting a company-wide ban on the tool [5]. Researchers have also demonstrated that LLMs trained on large corpora can reproduce personal or confidential data directly, particularly when such data are unique [2].
Most existing data cleaning methods use Named Entity Recognition (NER) or simple rules to remove obvious Personally Identifiable Information (PII) [6]. These methods can hide names or contact details, but they often miss deeper risks: they fail to capture context-based, relational, or indirect information that can still identify someone, such as a description of a person's rare job, their connections within an organization, or a combination of details that seem harmless alone but together reveal who they are. Consider the sentence “After presenting the cybersecurity report to the board, the youngest female executive in the Riyadh branch flew to Geneva for a closed-door meeting with the ministry”. Although this sentence does not mention any names, email addresses, or phone numbers, it reveals several indirect identifiers, such as role and position (youngest female executive), location (Riyadh branch), event context (presented a cybersecurity report to the board), and association (met with the ministry). When we combine these details, we can narrow down the identity of the person, especially in a specific company or sector. As LLMs become more prevalent, we need improved methods for cleaning data and protecting privacy. Our goal in this work is to fill this gap and answer the research question:
How can we design a scalable, semantic-aware pipeline that detects and remediates implicit, contextual, and relational privacy risks in LLM training data before model exposure, while preserving data utility for downstream tasks?
To address these gaps, we introduce Semantic Privacy Anomaly Detection and Remediation (SPADR), a filtering pipeline that processes input data before they are passed to an LLM. SPADR analyzes text before it is used in LLM training or fine-tuning. It detects not only high-level PII but also deeper semantic risks, such as employment relationships, unique descriptions, or geographical traces, and applies targeted remediation to reduce privacy leakage.
We evaluate SPADR on a financial email dataset, a domain where data often includes sensitive personal and transactional content. In this paper, we use finance as a test case; however, SPADR can be applied to various domains, such as healthcare. We summarize our contributions as follows:
We highlight critical privacy risks associated with LLMs used by the general public.
We propose SPADR, a pipeline that filters sensitive content from input data before they reach an LLM.
We demonstrate the effectiveness of SPADR using a real-world financial email dataset.
The rest of this paper is structured as follows. Section 2 discusses related work in LLM privacy and data sanitization. Section 3 presents our research problem. Section 4 presents the proposed SPADR pipeline. Section 5 describes our experiments and results. Section 6 discusses the findings, and Section 7 concludes the paper and outlines future work.
2. Related Work
The issue of privacy in machine learning, particularly in LLMs, has received increasing attention over the past decade. One of the earliest attempts to address this was proposed by Abadi et al. [7], who introduced Differential Privacy (DP) for deep learning. DP provides formal privacy guarantees by injecting noise during model training, thus protecting individual data points. However, applying DP in natural language settings often degrades model performance, making it less practical for LLMs.

Following this, Shokri et al. [8] introduced membership inference attacks (MIAs), where an adversary can infer whether a particular data point was used in training. This work showed that even well-generalized models could reveal sensitive information. Later, Carlini et al. [9] demonstrated that deep neural networks may memorize and leak unique sequences from training data, a phenomenon referred to as unintended memorization. This risk is particularly severe in LLMs trained on large-scale web data.

To mitigate such risks, Mendels et al. [10] proposed adversarial de-identification methods, which rewrite text to remove sensitive content while preserving its semantic meaning. However, their approach primarily addressed direct identifiers, such as names and dates, leaving models vulnerable to semantic inference. Around the same period, Veale et al. [4] emphasized the importance of transparency and accountability in machine learning systems, especially in high-stakes decision-making contexts, qualities often lacking in current black-box LLM deployments.
A significant shift occurred with the introduction of GPT-3 by Brown et al. [1], which employed a transformer-based architecture with 175 billion parameters and demonstrated few-shot prompting, where the model can perform tasks by conditioning on a handful of examples without explicit fine-tuning. However, this advancement came with increased concerns about privacy [11]. Hisamoto et al. [12] investigated privacy leakage in encoder–decoder models by applying membership inference attacks to sequence-to-sequence translation systems, showing that prediction confidence can be used to determine whether a sample was in the training set. Building on this line, Jagannatha et al. [13] proposed membership inference techniques specifically targeting parametric decoder architectures, designing attack models that exploit hidden representations and output distributions to infer training membership.

In the medical domain, Lehman et al. [3] evaluated BERT models pretrained on large corpora of clinical notes, systematically probing their outputs using crafted prompts to test whether protected health information (PHI) could be regenerated from the training data. Complementing these approaches, Carlini et al. [2] developed a prompt-based extraction methodology that combines rare string analysis with targeted queries, enabling them to recover long, unique sequences memorized by GPT-2, even under black-box access conditions.
These studies collectively underscore that traditional privacy-preserving approaches are insufficient for LLMs, motivating the development of more advanced, context-aware privacy solutions.
To address the persistent challenge of removing sensitive information after a model has been trained, ref. [14] surveyed techniques for machine unlearning, aiming to eliminate the influence of specific training data from model parameters. While foundational, these methods are computationally demanding and not yet scalable to large models, such as LLMs.

Building upon earlier insights into model vulnerabilities, Ye et al. [15] investigated how dataset structure and model architecture influence susceptibility to MIAs. Their results reinforced the idea that even minor architectural choices can significantly affect privacy leakage, emphasizing the need for structurally aware defense mechanisms.

In 2023, Zhang et al. [16] introduced the concept of counterfactual memorization, showing that LLMs can memorize not only exact text but also semantically equivalent paraphrases. This revealed a more subtle memorization phenomenon and exposed the limitations of surface-level anonymization strategies, such as redaction and pattern replacement.

To address risks during inference, Chen et al. [17] proposed Hide and Seek (HaS), a lightweight framework that anonymizes user prompts before sending them to cloud-based LLMs and reconstructs private information post-inference. HaS applies both a generative scheme using the BLOOMZ model and a label-based scheme based on NER to mask sensitive data. While effective at obfuscating explicit PII, it remains limited in handling contextual inferences and relationship-based privacy leaks, particularly in knowledge retrieval tasks.
By 2024, privacy research shifted toward semantic and contextual privacy risks. Staab et al. [18] and Farquhar et al. [19] demonstrated that LLMs can infer sensitive attributes such as profession, location, or demographics from otherwise innocuous text. These findings revealed the limitations of conventional tools in defending against implicit attribute inference or latent privacy signals.

To mitigate structural vulnerabilities in microdata, Aufschläger et al. [20] introduced ClustEm4Ano, a clustering-based method that anonymizes quasi-identifiers using value generalization hierarchies (VGHs). However, their approach is tailored to tabular data and is less effective in handling unstructured language content.

For textual anonymization, Frikha et al. [21] proposed IncogniText, which rewrites sentences using adversarially trained LLMs to hide private attributes while maintaining semantics. Though effective at attribute-level privacy, it lacks document-level privacy modeling and is contextually limited.

Yang et al. [22] further advanced this area by proposing RUPTA, a framework for optimizing the privacy-utility trade-off in text anonymization. It integrates LLM-powered privacy and utility evaluators with lexicographic optimization. Experiments show that RUPTA resists both black-box and white-box re-identification attacks, mitigating disclosure risk while maintaining task performance. Nonetheless, RUPTA relies on iterative evaluator feedback, which imposes a significant computational overhead, limiting its scalability and making it unsuitable for real-time or large-scale deployment.
In 2025, new approaches continued to refine these ideas. Kim et al. [23] introduced SEAL, a system in which small language models (SLMs) self-improve via adversarial distillation. By combining supervised fine-tuning with direct preference optimization, SEAL improves utility preservation while reducing privacy leakage against adversary models. Unlike RUPTA, SEAL offers better efficiency by avoiding reliance on large LLM evaluators. However, it exhibits limited scalability and weak generalization when applied to context-sensitive or relational data, where privacy risk stems from entity interactions rather than isolated attributes.

Zhan et al. [24] addressed inference-time privacy from a systems-design angle through Portcullis, a privacy gateway that anonymizes input in secure enclaves (Intel TDX) before interaction with third-party LLMs. Portcullis offers strong isolation and reconstruction capabilities for sanitized data, outperforming prior gateway systems, such as Hide and Seek, in efficiency. Its limitation, however, lies in its hardware dependence and pattern-based sanitization, which restricts flexibility against implicit or semantic privacy risks.

In the medical domain, Wiest et al. [25] developed LLM-Anonymizer, a system for local anonymization of clinical text, ensuring compliance with the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) without relying on external cloud models. McIntosh et al. [26] employed regex-guided rewriting for anonymizing radiology reports. While both methods are effective for explicit PII, they do not address semantic privacy threats, such as attribute co-occurrence or organizational roles.
Staab et al. [27] also proposed an LLM-based anonymization system that iteratively alternates between a privacy attacker model and an anonymizer model until privacy leakage is minimized. Although human evaluations showed a strong preference for its anonymized outputs, the system's performance is tightly bound to the strength of the adversary and the synthetic training data, raising concerns about generalizability.

Finally, He et al. [28] revealed that LLMs can expose contextual and relational privacy risks, such as inter-entity relationships or hierarchical roles, even in the absence of explicit identifiers. This critical insight highlights a persistent gap in current defenses: the failure to detect and remediate deep semantic and structural leaks.
Together, these studies demonstrate a progression from surface-level redaction methods to more advanced systems that address deeper semantic privacy risks; however, they still fall short in handling relational privacy threats embedded in language. To fill this gap, we propose SPADR, a semantic privacy-aware detection and remediation pipeline that combines statistical anomaly detection, entity-relationship graph analysis, and multi-strategy anonymization to detect and neutralize both explicit and implicit risks. By scanning and remediating privacy threats in context before LLMs use data, SPADR prevents leakage at both training and inference stages, offering a practical step toward safer deployment of language models.
3. Research Problem
Most existing privacy-preserving methods for LLMs focus on explicit identifiers, utilizing high-level techniques, such as NER, rule-based filtering, or noise injection. While these approaches are helpful, they do not address deeper semantic and relational privacy risks. Modern LLMs can memorize, infer, and link sensitive information embedded in text, even after de-identification, due to their strong contextual understanding and generalization capabilities.
This creates a critical gap: based on our knowledge, there is currently no scalable, semantic-aware data sanitization pipeline that can detect and remediate latent privacy risks before the data are used to train or fine-tune LLMs. Such risks include re-identifiable entity relationships, implicit attribute inference, and contextual traceability. Furthermore, although graph-based privacy risks have been discussed in theory, there is still no practical, graph-augmented framework that can identify and mitigate relational privacy threats at scale in large LLM datasets.
Addressing this problem requires solving several open research challenges:
Semantic anomaly detection for unseen or evolving privacy threats in unstructured text;
Scalable graph-based methods for identifying and mitigating relational leakage;
Adaptive remediation strategies (e.g., redaction, generalization, summarization, deletion) tuned to privacy-utility trade-offs.
4. Methodology
In this paper, we propose SPADR as a new step in the data-cleaning pipeline for LLMs.
Figure 1 shows the overall design of the SPADR pipeline. It includes several components that work together to identify privacy risks and apply suitable remediation techniques.
Our pipeline begins with raw text data, which passes through a data preprocessing step. After that, three modules run in parallel. The first is PII redaction, which uses NER to find and remove known personal information. The second is anomaly detection, which uses a denoising autoencoder trained only on high-privacy data. This model helps detect unusual or risky content by comparing new text to patterns it has learned from safe examples. The third module is semantic classification, which utilizes a zero-shot classifier to determine if a message appears sensitive. The results from these three modules are combined into a single privacy scoring step. If the privacy score is 60 or higher, the pipeline performs a detailed graph analysis to understand the relationships between different entities in the text. This analysis supports the selection of an appropriate remediation strategy, such as redacting, summarizing, generalizing, or deleting specific parts of the text. The remainder of this section provides a detailed explanation of each component in the SPADR pipeline.
4.1. Data Preprocessing
We start our preprocessing by manually reviewing and labeling a subset of the dataset to identify emails that raise privacy concerns [29]. This step provides the model with a foundation, enabling it to learn what constitutes sensitive content. This pool is then used to train a denoising autoencoder (DAE). Importantly, the training remains unsupervised, since the DAE is optimized only through reconstruction loss without explicit supervision. The manual labeling serves solely to select representative inputs, not to provide target labels. We chose this unsupervised setup because, in real-world scenarios, labeled privacy data are often limited or unavailable, and a system that generalizes to unseen messages without heavy annotation is more practical. Following this seeding step, we apply a comprehensive preprocessing pipeline to ensure that email content is clean, normalized, and semantically meaningful. This step is essential because it reduces noise, eliminates irrelevant details, and allows the model to focus on meaningful content, enabling accurate detection and redaction of privacy risks later on.
In the data preprocessing, we begin with data cleaning. We remove all URLs and replace them with a generic [url] token to prevent them from being interpreted as meaningful text. We also remove HTML tags from the content and normalize formatting inconsistencies by collapsing multiple whitespace characters and eliminating repetitive punctuation patterns (e.g., sequences of underscores, asterisks, or dashes) commonly found in forwarded emails or automated signatures. These regular expression patterns help standardize the message structure.
To isolate the actual content of each message, we extract the email body using rule-based heuristics. If the message includes a Subject: header, we extract the content that follows and ignore irrelevant blocks, such as headers, forwarding trails, or disclaimers. This step ensures that we only analyze the user-authored portion of the email—where privacy-sensitive information is most likely to reside.
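The following sketch illustrates this cleaning and body-extraction step; the regular expressions and the extract_body heuristic are illustrative stand-ins, not the exact patterns used in SPADR:

import re

def clean_email(raw: str) -> str:
    """Normalize a raw email: mask URLs, strip HTML, collapse formatting noise."""
    text = re.sub(r"https?://\S+|www\.\S+", "[url]", raw)   # replace URLs with a generic token
    text = re.sub(r"<[^>]+>", " ", text)                    # drop HTML tags
    text = re.sub(r"[_*\-=]{3,}", " ", text)                # remove runs of underscores/asterisks/dashes
    text = re.sub(r"\s+", " ", text)                        # collapse multiple whitespace characters
    return text.strip()

def extract_body(message: str) -> str:
    """Keep only the user-authored content that follows the Subject: header."""
    match = re.search(r"Subject:[^\n]*\n(.*)", message, flags=re.S | re.I)
    body = match.group(1) if match else message
    # Drop common non-authored blocks such as forwarding trails and disclaimers.
    body = re.split(r"-{2,}\s*Original Message\s*-{2,}|DISCLAIMER", body, flags=re.I)[0]
    return body

print(clean_email(extract_body("Subject: demo\nSee http://example.com ---- regards")))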
Next, we tokenize the cleaned text using the NLTK library [30], which segments it into individual word tokens. We remove punctuation and convert all tokens to lowercase. This process helps emphasize semantically meaningful terms and avoids sparsity due to variations in case or punctuation.

After normalization, we transform each email into a dense vector representation using the “all-MiniLM-L6-v2” model, a lightweight, BERT-based sentence encoder from the Sentence Transformers library [31]. These semantic embeddings capture the contextual meaning of the message. We use these embeddings in the following stages of the SPADR pipeline, including anomaly detection, semantic classification, and risk scoring.
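A minimal sketch of the tokenization and embedding step, assuming the NLTK and Sentence Transformers packages; the example message is hypothetical:

import string
import nltk
from nltk.tokenize import word_tokenize
from sentence_transformers import SentenceTransformer

nltk.download("punkt", quiet=True)       # tokenizer resources used by NLTK
nltk.download("punkt_tab", quiet=True)   # required by newer NLTK releases

def normalize_tokens(text: str) -> list[str]:
    """Lowercase, tokenize, and drop punctuation-only tokens."""
    tokens = word_tokenize(text.lower())
    return [t for t in tokens if t not in string.punctuation]

# Lightweight BERT-based sentence encoder used for the semantic embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def embed(text: str):
    """Return a dense vector capturing the contextual meaning of the message."""
    return encoder.encode(" ".join(normalize_tokens(text)))

vector = embed("Quarterly results were shared with the board in Houston.")
print(vector.shape)   # (384,) for all-MiniLM-L6-v2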
4.2. PII Redaction Technique
We extract high-level PII information using a two-stage redaction method. This approach combines NER-based replacement with rule-based pattern matching.
In the first stage, we utilize a pretrained NER model provided by the spaCy library [32] to identify sensitive entities within the text. These entities include:
PERSON—names of individuals,
GPE, LOC—geopolitical and geographic locations,
ORG—organization names,
DATE, EMAIL, PHONE—common identifiers,
LAW, MONEY—legal and financial terms.
Once the model detects an entity, it is replaced with a corresponding placeholder token to cover its content while preserving the semantic structure. For example, John Smith is replaced with [NAME], New York becomes [LOCATION], and john@example.com becomes [EMAIL].
In the second stage, we apply regular expressions to identify sensitive patterns not consistently captured by NER. These patterns include:
email addresses using standard formats,
phone numbers in local and international formats,
dates in various structures (e.g., 01/01/2024, 2024-01-01),
sensitive keywords, such as password, confidential, secret, Enron, or financial values like million or SAR.
We replace such patterns with a generic [REDACTED] tag. This hybrid approach ensures that both structured and unstructured indicators of sensitive information are redacted adequately before the text is passed to the next stage. By combining statistical NER with handcrafted rules, we improve coverage and reduce the likelihood of missed privacy leaks.
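A sketch of this two-stage redaction; the spaCy model name (en_core_web_sm), the placeholder mapping, and the regular expressions below are assumptions rather than the exact configuration used in SPADR:

import re
import spacy

nlp = spacy.load("en_core_web_sm")   # pretrained spaCy NER model (assumed checkpoint)

# Placeholder tokens for entity labels handled in the first stage.
PLACEHOLDERS = {
    "PERSON": "[NAME]", "GPE": "[LOCATION]", "LOC": "[LOCATION]",
    "ORG": "[ORGANIZATION]", "DATE": "[DATE]", "LAW": "[LAW]", "MONEY": "[AMOUNT]",
}

# Second-stage regular expressions for patterns NER misses (illustrative, not exhaustive).
PATTERNS = [
    r"[\w.+-]+@[\w-]+\.[\w.]+",                      # email addresses
    r"\+?\d[\d\s().-]{7,}\d",                        # local and international phone numbers
    r"\b\d{4}-\d{2}-\d{2}\b|\b\d{2}/\d{2}/\d{4}\b",  # dates such as 2024-01-01 or 01/01/2024
    r"\b(password|confidential|secret|Enron|million|SAR)\b",
]

def redact(text: str) -> str:
    """Stage 1: NER-based replacement; Stage 2: rule-based pattern matching."""
    doc = nlp(text)
    for ent in reversed(doc.ents):                   # iterate backwards so offsets stay valid
        if ent.label_ in PLACEHOLDERS:
            text = text[:ent.start_char] + PLACEHOLDERS[ent.label_] + text[ent.end_char:]
    for pattern in PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
    return text

print(redact("John Smith (john@example.com) met Enron staff in New York on 01/01/2024."))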
4.3. Anomaly Detection Using Denoising Autoencoder
After the preprocessing stage, we apply a DAE as an anomaly detection model. The intuition is that the DAE, trained on privacy-sensitive messages, will learn the common patterns associated with such risks. Suppose a new message can be reconstructed accurately. In that case, it is likely to resemble previously seen privacy-risk cases, whereas poor reconstruction indicates a novel or less typical pattern that may also warrant attention.
Let $x \in \mathbb{R}^{d}$ denote the normalized embedding of an email message, where $d$ is the number of features representing the message (e.g., binary entity-presence indicators or word-level features). During training, we add Gaussian noise to obtain a corrupted input:

$\tilde{x} = x + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^{2} I)$ (1)

where $\epsilon$ is Gaussian noise with variance $\sigma^{2}$ and $\tilde{x}$ is the noisy version of the input. This corruption forces the autoencoder to learn robust representations of sensitive patterns.

The encoder maps the corrupted input to a latent representation:

$z = f_{\theta}(\tilde{x})$ (2)

where $z$ is a latent vector that captures privacy-related features, and $f_{\theta}$ is the encoder parameterized by weights and biases $\theta$.

The decoder reconstructs the input from $z$:

$\hat{x} = g_{\phi}(z)$ (3)

where $\hat{x}$ is the reconstructed embedding of the original message, and $g_{\phi}$ is the decoder parameterized by $\phi$.

The reconstruction loss is minimized during training using the mean squared error (MSE):

$\mathcal{L}(x, \hat{x}) = \frac{1}{d} \sum_{i=1}^{d} (x_{i} - \hat{x}_{i})^{2}$ (4)

where $x_{i}$ and $\hat{x}_{i}$ are the $i$-th components of the original and reconstructed embeddings.

For a new message $x$, we compute the reconstruction error:

$e(x) = \frac{1}{d} \sum_{i=1}^{d} (x_{i} - \hat{x}_{i})^{2}$ (5)

A lower score indicates that the message is close to the distribution of risky training samples, while a higher score suggests deviation from known privacy-sensitive patterns.

Finally, the scores are normalized to a range of 0–100 for comparability:

$A(x) = 100 \times \frac{e(x) - e_{\min}}{e_{\max} - e_{\min}}$ (6)

where $A(x)$ represents the anomaly score of message $x$. Messages with low reconstruction error (low $A(x)$) are semantically similar to high-risk training examples, while higher scores indicate benign or unseen patterns.
Before training, we manually select the records that contain a high privacy risk. This step is important because it enables the model to focus on sensitive patterns rather than general text. The DAE is still trained in an unsupervised way, since it learns to reconstruct its inputs without using labels. By focusing on risky content, the DAE becomes more specialized in detecting privacy risks rather than modeling all kinds of text.
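A compact PyTorch sketch of the DAE described above; the layer sizes, noise level, and the sigmoid output (matching inputs scaled to [0, 1]) are assumptions, not the exact architecture used in SPADR:

import torch
from torch import nn

class DenoisingAutoencoder(nn.Module):
    """Two-layer encoder/decoder over d-dimensional message embeddings."""
    def __init__(self, d: int = 384, hidden: int = 128, latent: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                     nn.Linear(hidden, latent), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                     nn.Linear(hidden, d), nn.Sigmoid())  # inputs scaled to [0, 1]

    def forward(self, x: torch.Tensor, noise_std: float = 0.1) -> torch.Tensor:
        x_noisy = x + noise_std * torch.randn_like(x)   # Gaussian corruption of the input
        return self.decoder(self.encoder(x_noisy))

def anomaly_scores(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-message reconstruction error, min-max normalized to the 0-100 range."""
    model.eval()
    with torch.no_grad():
        errors = ((model(x, noise_std=0.0) - x) ** 2).mean(dim=1)
    return 100 * (errors - errors.min()) / (errors.max() - errors.min() + 1e-8)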
4.4. Semantic Classification
We use a zero-shot text classifier based on the BART model [33], which is pretrained on natural language inference (NLI) tasks. In NLI, the model learns to decide whether a given hypothesis logically follows from a given premise. We apply this ability by treating the input message as the premise and the classification labels as hypotheses. Specifically, we use the labels “This message is sensitive” and “This message is not sensitive” as hypotheses. The classifier then assigns a confidence score to each label based on how well it matches the content of the input message, without requiring any task-specific training. We include this component in our pipeline to capture cases where sensitivity arises from the overall meaning or context of the message, rather than the presence of specific keywords or known entities. It helps detect subtle or implicit privacy risks that rule-based methods may miss. The resulting semantic score contributes to the overall privacy risk score used in the decision-making stage.
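A sketch of this zero-shot step using the Hugging Face pipeline; the specific BART-NLI checkpoint (facebook/bart-large-mnli) is an assumed stand-in for the classifier used in SPADR:

from transformers import pipeline

# Zero-shot classifier built on BART fine-tuned for NLI (checkpoint name assumed).
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def semantic_sensitivity(text: str) -> float:
    """Return the confidence (0-1) that the message is sensitive."""
    labels = ["This message is sensitive", "This message is not sensitive"]
    result = classifier(text, candidate_labels=labels)
    scores = dict(zip(result["labels"], result["scores"]))
    return scores["This message is sensitive"]

print(semantic_sensitivity("Wire 50,000 USD to account 99-1234 before Friday."))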
4.5. Privacy Scoring
To assess whether a text poses a privacy risk, we calculate a risk score that combines three components: semantic classification, anomaly detection using the DAE, and NER.

First, we use the semantic classification confidence score associated with the Sensitive label and scale it to the 0–100 range as follows:

$S_{\text{sem}}(x) = 100 \times P(\text{Sensitive} \mid x)$ (7)

Next, we compute a named entity score using the NER model. For each identified entity that belongs to a set of PII types (e.g., PERSON, GPE, ORG, EMAIL, DATE), we increment the score by a fixed weight of 15:

$S_{\text{ner}}(x) = 15 \times |\mathcal{E}_{\text{PII}}(x)|$ (8)

where $\mathcal{E}_{\text{PII}}(x)$ is the set of PII entities detected in $x$.
Finally, we include the anomaly score $S_{\text{anom}}(x) = A(x)$ from the DAE (Equation (6)), which quantifies how much a message deviates from the patterns of sensitive text.

We compute the final risk score as a weighted sum of all three components:

$\mathrm{Risk}(x) = 0.4\, S_{\text{sem}}(x) + 0.3\, S_{\text{anom}}(x) + 0.3\, S_{\text{ner}}(x)$ (9)
We choose these weights based on the type of signals each method provides. Semantic classification receives the highest weight (40%) because it evaluates the contextual sensitivity of the text. The anomaly score and NER-based score each receive 30%, as they offer complementary evidence: one from distributional deviation and the other from explicit identifiers. We validate the robustness of this weighting scheme through ablation and sensitivity experiments.
We clip the final risk score to the range $[0, 100]$. If the score exceeds a threshold of 60, or if the text includes any PII entities, we apply SPADR remediation. This optimal threshold for triggering the SPADR pipeline is derived using Youden's Index [34] and the ROC curve as follows:

$t^{*} = \arg\max_{t} \big( \mathrm{TPR}(t) - \mathrm{FPR}(t) \big)$ (10)

where $\mathrm{TPR}(t)$ and $\mathrm{FPR}(t)$ are the true and false positive rates obtained when flagging messages whose risk score exceeds $t$.
This method ensures that privacy protection is activated only when the message poses a meaningful risk.
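The scoring and thresholding logic might be combined as in the sketch below; the clamping to [0, 100] and the per-entity increment follow the description above, while the example inputs are hypothetical:

PII_LABELS = {"PERSON", "GPE", "ORG", "EMAIL", "DATE"}
WEIGHTS = {"semantic": 0.4, "anomaly": 0.3, "ner": 0.3}   # weights from Equation (9)
THRESHOLD = 60                                            # Youden-derived trigger threshold

def risk_score(semantic_conf: float, anomaly_score: float, entity_labels: list[str]) -> float:
    """Combine the three signals into a single 0-100 privacy risk score."""
    semantic = 100 * semantic_conf                                    # scaled classifier confidence
    ner = 15 * sum(1 for lbl in entity_labels if lbl in PII_LABELS)   # +15 per PII entity
    score = (WEIGHTS["semantic"] * semantic
             + WEIGHTS["anomaly"] * anomaly_score
             + WEIGHTS["ner"] * ner)
    return max(0.0, min(100.0, score))

def needs_remediation(score: float, entity_labels: list[str]) -> bool:
    """Trigger SPADR remediation above the threshold or when any PII entity is present."""
    return score >= THRESHOLD or any(lbl in PII_LABELS for lbl in entity_labels)

print(risk_score(0.91, 62.0, ["PERSON", "ORG", "DATE"]))   # example inputs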
This privacy score helps distinguish public knowledge from private information because SPADR combines semantic classification, anomaly detection, and NER to assign risk scores. Public knowledge (e.g., “Philadelphia, July 4, 1776” [35]) is commonly known in non-sensitive contexts and therefore produces a low anomaly score and a low semantic sensitivity score. As a result, a public query does not exceed the risk threshold and passes through SPADR unchanged. In contrast, context-specific details such as “the youngest executive in the Riyadh branch met with the ministry on July 4, 2024” receive higher risk scores due to rarity and relational cues. This design allows SPADR to avoid over-blocking historical or well-known facts while still flagging sensitive private information.
4.6. Graph Analysis and Remediation
We propose to use graph analysis because it provides a powerful way to model and understand the relationships between entities in text. Unlike linear models, which treat tokens in isolation or sequence, graph-based approaches capture how entities are connected, enabling richer context modeling. In our goal of privacy protection, a single piece of information may not be risky on its own. However, when combined with other entities such as a person’s name, location, and organization, it can reveal sensitive or identifying patterns in relationships. By constructing graphs that represent these co-occurrences, we can detect high-risk entity combinations that standard redaction tools may not flag. This relational view enables the SPADR pipeline to enhance privacy risk scores and trigger more robust remediation when entity clusters indicate sensitive exposure. To achieve this, we propose the following two strategies:
4.6.1. Strategy 1: Graph-Based Risk Detection and Remediation
This strategy addresses hidden privacy risks in textual data by employing a graph-based approach to analyze the relationships between sensitive entities. While traditional NER methods are effective in identifying and redacting individual entities, such as names, organizations, or phone numbers, redaction alone is often insufficient. In many cases, even after a specific term is removed, the surrounding context may still reveal private meaning. For example, the presence of a redacted name alongside a known company and a date may be enough to infer the individual’s identity. To handle this type of contextual privacy leakage, we construct an entity graph that models the relationships between detected entities. It allows us to reason about entity combinations, not just isolated terms.
As shown in Figure 2, the process begins with entity extraction. We apply PII redaction, as described in Section 4.2, to detect key entity types such as PERSON, ORG, GPE, DATE, PHONE, and EMAIL. These entities are extracted using a combination of NER and regular expressions. Each entity is added as a node in an undirected graph, and edges are created between every pair of nodes to represent their co-occurrence within the same text. We implement this entity graph using the NetworkX library [36], which provides a flexible and efficient framework for representing and analyzing graph structures in Python 3.10. The graph-based representation enables the detection of higher-order relationships between entities that might otherwise remain unnoticed.
We formalize this process in Algorithm 1, which outlines how we extract entities and construct the graph.
Algorithm 1 Build Entity Graph from Text
1: Input: Raw text t
2: Output: Undirected entity graph G
3: Extract named entities using NER (e.g., PERSON, ORG, DATE, etc.)
4: Extract phone numbers and emails using RegEx
5: Initialize empty graph G (using NetworkX)
6: for all contact ∈ {phones, emails} do
7:   Add node to G with label = “PHONE” or “EMAIL”
8: end for
9: for all entity ∈ NER entities do
10:   Add node to G with label = entity type
11: end for
12: for all pairs of nodes (i, j) do
13:   Add undirected edge between i and j
14: end for
15: return G
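A possible NetworkX implementation of Algorithm 1; the spaCy model and the email/phone regular expressions are illustrative assumptions:

from itertools import combinations
import re
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")   # assumed spaCy model
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def build_entity_graph(text: str) -> nx.Graph:
    """Algorithm 1: nodes are detected entities, edges mark co-occurrence in the text."""
    graph = nx.Graph()
    for email in EMAIL_RE.findall(text):
        graph.add_node(email, label="EMAIL")
    for phone in PHONE_RE.findall(text):
        graph.add_node(phone, label="PHONE")
    for ent in nlp(text).ents:
        graph.add_node(ent.text, label=ent.label_)
    for a, b in combinations(list(graph.nodes), 2):   # fully connect co-occurring entities
        graph.add_edge(a, b)
    return graph

g = build_entity_graph("Ahmed from IBM will visit Texas on 2024-09-01, call 123-456-7890.")
print(nx.get_node_attributes(g, "label"))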
Once the graph is built, we compute a privacy risk score for each input, as described in Equation (9). This score combines semantic sensitivity, anomaly detection, and entity presence. If the final score reaches 100, we classify the text as extremely risky and delete it entirely to prevent privacy leakage.
After scoring, we analyze the entity graph to detect sensitive combinations. For instance, a graph that includes a person, an organization, and a date may imply an employment or meeting context. Similarly, a person connected to both a phone number and an email address suggests a direct contact risk, even if the entities themselves are redacted. When such patterns are identified, we mark the text with boosted_by_graph = TRUE, indicating that the graph-based reasoning increased the privacy risk and triggered a more stringent remediation response.
Based on both the risk score and the graph structure, we apply one of three remediation strategies. For extremely high-risk cases, we use deletion, replacing the entire content with a placeholder warning. For moderately risky combinations, we apply generalization, where sensitive entities are replaced with generic labels, such as <person> or [organization]. In lower-risk cases, we use PII redaction, masking only the detected entities while preserving the surrounding context.
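A sketch of how the sensitive-combination check and the remediation choice in Strategy 1 could be implemented; the pattern set and the mapping from risk level to action are illustrative simplifications of the rules described above:

import networkx as nx

# High-risk label combinations used in Strategy 1 (illustrative set).
SENSITIVE_PATTERNS = [
    {"PERSON", "ORG", "DATE"},      # suggests an employment or meeting context
    {"PERSON", "PHONE", "EMAIL"},   # suggests a direct contact risk
]

def graph_boost(graph: nx.Graph) -> bool:
    """Return True when the entity graph contains a known sensitive combination."""
    labels = set(nx.get_node_attributes(graph, "label").values())
    return any(pattern <= labels for pattern in SENSITIVE_PATTERNS)

def choose_remediation(risk: float, boosted_by_graph: bool) -> str:
    """Map Strategy 1 risk levels to deletion, generalization, or plain PII redaction."""
    if risk >= 100:
        return "deletion"
    if boosted_by_graph:
        return "generalization"
    return "pii_redaction"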
Figure 3 illustrates an example entity graph derived from a single piece of text. In this case, the graph connects nodes of types PERSON, ORG, and DATE, forming a triangular structure. Even if each term is redacted individually, their co-occurrence suggests a meeting or job affiliation. This example demonstrates how graph analysis enables SPADR to identify subtle privacy risks that linear methods would overlook.
This graph-based strategy enables us to identify nuanced privacy risks that simpler redaction methods may overlook. By modeling how entities co-occur and interact within a document, we enable smarter, context-aware remediation decisions. In doing so, we reduce the risk of sensitive information leakage while preserving the original data’s usefulness.
4.6.2. Strategy 2: Graph-Boosted Threshold-Based Remediation
In the previous section (Strategy 1), we applied rule-based pattern detection on entity graphs to identify specific high-risk combinations such as PERSON + PHONE + EMAIL and trigger targeted remediation. However, in this second strategy, we adopt a more general approach. Rather than searching for specific patterns, we use the entity graph solely to check whether any relationships exist between entities within a given text. If the graph contains at least one edge, we set a binary flag boosted_by_graph = TRUE, indicating that further remediation is required. The overall risk score then drives remediation. This design makes Strategy 2 more scalable and broadly applicable, as it relies on threshold-based logic rather than handcrafted rule enumeration.
In this strategy of the SPADR pipeline, we combine lightweight graph analysis with risk-aware thresholding to determine the appropriate remediation strategy. After applying PII redaction, we construct an entity graph using the NetworkX library [36], which allows us to examine the co-occurrence structure of detected entities. This graph-based signal helps detect hidden relational risks that may not be evident from redacted tokens alone.
In this strategy, each text is first represented as a graph, where:
- 1. Nodes (entities) represent:
  - (a) PII information, such as a person, location, organization, or phone number;
  - (b) Attributes that have relationships with each other (for example, an increase in sales → summer season).
- 2. Edges represent the relationships between entities.
The graph analysis step is crucial for selecting the subsequent remediation strategies, which are applied to mitigate the privacy risks present in textual data, using the following steps:
Step 1: The textual data serve as input to the graph analysis.
Step 2: The graph analysis assigns a flag, [boosted_by_graph], to each text, indicating whether the text contains entities that have relationships with each other. If such relationships exist, the text is passed on for remediation. Otherwise, the text is considered harmless, does not contain any private information, and is left unchanged.
The pseudocode for graph analysis is as follows (Algorithm 2):
Algorithm 2 Determine if Entity(x) has a Relationship with Entity(y) in Textual Data
1: Input: {Textual Data}
2: Output: {Graphs, texts labeled with [boosted_by_graph] flag}
3: Build graph G from entities in Textual Data
4: if length(edges(G)) > 0 then
5:   [boosted_by_graph] ← TRUE
6: else
7:   [boosted_by_graph] ← FALSE
8: end if
After a text has been flagged with [boosted_by_graph] = TRUE, we apply remediation strategies to it, as shown in Figure 4. In this paper, we demonstrate three types of remediation techniques:
Granular Redaction: This technique involves replacing only the sensitive or private PII entities with their labels. For example, “John works at Apple” will be generalized as “<person> works at <organization>”.
Summarization: This technique involves creating a concise summary directly from the text’s content. To apply this technique, we integrate the pretrained model sshleifer/distilbart-cnn-12-6 [37], a distilled version of BART-large-CNN designed to produce short and efficient summaries from the text (see the sketch after this list).
Deletion: This technique is used when the text contains highly sensitive PII information, which, even after applying granular redaction or summarization, can still be used to reveal private entities within the text and lead to the re-identification of the text’s content. In this case, the deletion technique is applied to delete the text completely due to the high risk of privacy violation.
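The summarization remediation referenced in the list above might be implemented as follows; the length limits are assumptions, while the model name (sshleifer/distilbart-cnn-12-6) is the one stated in the text:

from transformers import pipeline

# Distilled BART-large-CNN summarizer named in the text.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def summarize(text: str) -> str:
    """Produce a short summary used as the summarization remediation."""
    result = summarizer(text, max_length=40, min_length=10, do_sample=False)
    return result[0]["summary_text"]

print(summarize("I was looking at flights to China; Cathy said prices will rise "
                "after the first of the month, so we should book soon."))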
The optimal thresholds for applying remediation strategies are determined using Youden’s Index in Equation (10) and by examining the statistical distribution of risk scores in Table 1.
Let us look at how these remediation techniques are applied to our textual data strategically:
Step 1: First, check whether the text is flagged with [boosted_by_graph] = TRUE, indicating that relationships exist among the entities in the text’s content.

Step 2: Next, observe the risk score assigned to each text, which is computed from the combination of semantic, autoencoder, and NER scores, as given in Equation (9).

Step 3: If the risk score is below 76%, granular redaction is applied; if it falls between 76% and 86%, summarization is applied; if it is equal to or exceeds 86%, deletion is applied.

Step 4: If the text is flagged with [boosted_by_graph] = FALSE, it contains no related sensitive entities, so no remediation strategy is applied and the text remains unchanged.
Finally, we obtain cleaned textual data with a reduced risk of privacy violation.
The pseudocode for the remediation strategy is as follows (Algorithm 3):
Algorithm 3 Applying Remediation Techniques on Textual Data
1: Input: {text, risk_score, boosted_by_graph}
2: if boosted_by_graph = TRUE then
3:   if risk_score < 55 or (55 ≤ risk_score < 76) then
4:     Apply Granular_Redaction
5:   else if 76 ≤ risk_score < 86 then
6:     Apply Summarization
7:   else
8:     Apply Deletion
9:   end if
10: else
11:   return Original_Text ⊲ boosted_by_graph = FALSE
12: end if
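A minimal Python counterpart to Algorithm 3, assuming the 55/76/86 thresholds from Step 3 (the two branches below 76% both lead to granular redaction, so a single check suffices); the redaction and summarization callables here are trivial stand-ins:

def remediate(text: str, risk_score: float, boosted_by_graph: bool,
              redact, summarize) -> str:
    """Route a text to granular redaction, summarization, or deletion."""
    if not boosted_by_graph:
        return text                                   # harmless: return the original text
    if risk_score < 76:
        return redact(text)                           # granular redaction
    if risk_score < 86:
        return summarize(text)                        # summarization
    return "Text Deleted due to High Privacy Violation"   # deletion

# Example call with placeholder redaction/summarization functions.
print(remediate("Ahmed from IBM visits Texas on Friday.", 69.7, True,
                redact=lambda t: t.replace("Ahmed", "<person>").replace("IBM", "<organization>"),
                summarize=lambda t: t[:40]))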
5. Experiments
We divide our experiment into four main steps: collecting and preprocessing the data, training the model, evaluating its performance, and finally, discussing the effectiveness of our methods.
5.1. Data Collection
We use the publicly available Enron Email Dataset [35], which contains about 500,000 emails from employees of the Enron Corporation, a U.S. energy company that collapsed in 2001 due to corporate fraud [38]. The dataset was released during the federal investigation and is now widely used in research on NLP, privacy, and organizational communication. This dataset is suitable for our goal because it contains a diverse range of text types, including financial communications, personal messages, and private plans. Such diversity provides a rich source of sensitive information, enabling us to evaluate privacy risks in various contexts. For this study, we randomly select 10,000 emails. We focus on the body of each email because it often contains sensitive or private content, such as personal details, organizational roles, or informal conversations. We then split the data into three parts: training, validation, and testing. We use the training set to train the DAE, the validation set to tune hyperparameters, and the test set to evaluate the full SPADR pipeline.
In addition, we generate 500 synthetic records using GPT-4. These records include HR emails, bank notices, clinic notes, and support tickets, each written in 3–5 sentences with sensitive information, such as names, contact details, financial identifiers, and health notes. We use synthetic data because we need to evaluate our pipeline on a dataset that is more diverse and that covers domains beyond corporate finance. In particular, there is no publicly available health dataset with realistic sensitive content. By combining the Enron set with synthetic records, we can test SPADR on both financial and health-related data and evaluate its ability to generalize across domains.
Synthetic Data Generation
We generate 500 synthetic records to expand the diversity of our dataset and to include domains not covered by Enron. We use the GPT-4 API and design structured prompts to create realistic but fictitious documents, such as HR emails, bank notices, clinic notes, and customer support tickets. Each prompt instructs the model to write between three and five sentences in English and to include specific types of sensitive information. Our prompt design covers personal identifiers (e.g., names, social security numbers, IDs), contact details (e.g., emails, phone numbers, addresses), financial records (e.g., bank accounts, credit cards), and, in some cases, health content (e.g., diagnoses and prescriptions). Example prompts are provided in Appendix A.
Then, we validate the dataset in three steps. First, we apply regular expressions and a BERT-based NER model to confirm that each synthetic record contains at least one sensitive entity. On average, every record includes between three and five sensitive entities. Second, we perform a human review of a random sample of records to ensure that the texts are natural and that the sensitive information is realistic but still fictitious. Third, we compare entity distributions with the Enron dataset. Both sets contain a similar mix of PERSON, ORG, EMAIL, and DATE entities. In contrast, the synthetic set adds structured identifiers, such as social security numbers and card numbers.
To measure the distributional shift between Enron and the synthetic data, we compute sentence embeddings using MiniLM and calculate the cosine similarity between the mean embeddings of both datasets. Specifically, we average the embeddings of all records in each dataset and then compute the cosine similarity between the two mean vectors. The similarity score of 0.57 indicates that the two sets are semantically related but distinct. This shows that the synthetic records introduce new styles and domains (e.g., health and ID records) while remaining close enough to Enron emails to serve as a realistic test set.
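A sketch of this distribution-shift measurement; the two example records are fabricated placeholders, and averaging raw embeddings before computing cosine similarity is an assumption about the exact procedure:

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # same MiniLM encoder as in preprocessing

def mean_embedding(texts: list[str]) -> np.ndarray:
    """Average the sentence embeddings of all records in a dataset."""
    return encoder.encode(texts).mean(axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

enron_mean = mean_embedding(["Please send the Q3 gas trading report by Friday."])
synthetic_mean = mean_embedding(["Patient ID 4482 was prescribed 10 mg lisinopril on 2024-03-02."])
print(round(cosine(enron_mean, synthetic_mean), 2))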
Therefore, by combining Enron with the synthetic records, we were able to obtain a dataset that includes both real corporate communication and diverse synthetic examples from finance and health domains. This setup provides a stronger evaluation of SPADR’s generalization ability.
5.2. Model Training
We train our denoising autoencoder to reconstruct email message embeddings and detect potential privacy risks. We describe the architecture of the model in Section 4.3: it consists of an encoder with two linear layers that reduce the dimensionality of the input, followed by a decoder that reconstructs the original input. During training, we add Gaussian noise to the input vectors to help the model learn more robust representations. The model is trained to minimize the reconstruction loss using the mean squared error (MSE) metric.

To select the final hyperparameters, we employed a random search over various configurations and chose the set that achieved the best performance on the validation set. We used the Adam optimizer with an initial learning rate of 0.01. A decaying learning rate schedule reduced the learning rate by half if the validation loss did not improve over seven epochs, with a minimum allowed learning rate of 0.000001. We also applied early stopping with a patience of 10 epochs and a maximum of 200 training epochs. This setup ensured stable convergence and prevented overfitting. The main hyperparameters used in training are summarized in Table 2.
We also apply normalization to keep input values within the [0, 1] range, ensuring the autoencoder output remains comparable to the original input.
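A training-loop sketch reflecting these settings (Adam at 0.01, halving the learning rate after seven stagnant epochs down to 1e-6, early stopping with patience 10, up to 200 epochs); it assumes the DAE interface sketched in Section 4.3 and, for brevity, full-batch updates:

import torch
from torch import nn, optim

def train_dae(model, train_x, val_x, max_epochs=200, patience=10):
    """Train with Adam, halve the LR on plateau, and stop early on validation loss."""
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5,
                                                     patience=7, min_lr=1e-6)
    loss_fn = nn.MSELoss()
    best_val, stale = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(train_x), train_x)      # reconstruct the clean input from its noisy version
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(val_x, noise_std=0.0), val_x).item()
        scheduler.step(val_loss)                     # decay LR when validation loss plateaus
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:                    # early stopping
                break
    return model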
5.3. Evaluation Metrics
To evaluate the effectiveness of the proposed SPADR pipeline, we adopted three key metrics: two for assessing privacy risk and one for preserving utility. These metrics differ in their objectives but collectively ensure a balance between privacy and utility.
5.3.1. Attribute Inference Attack Accuracy (AAA)
This metric evaluates whether sensitive attributes remain in the text after cleaning. For each original text $x$ and its anonymized version $x'$, we extract named entities (e.g., PERSON, ORG, LOC) using an NER model. Let $E(x)$ be the set of entities in $x$ and $E(x')$ the set in $x'$. Attribute inference attack accuracy is computed as:

$\mathrm{AAA} = \frac{|E(x) \cap E(x')|}{|E(x)|}$

A lower AAA indicates better privacy protection.
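Under this definition (the fraction of original entities that survive anonymization), AAA could be computed as in the following sketch; the spaCy model and the label set are assumptions:

import spacy

nlp = spacy.load("en_core_web_sm")   # assumed NER model
PII_LABELS = {"PERSON", "ORG", "GPE", "LOC"}

def entities(text: str) -> set[str]:
    """Named entities of the PII types considered by the attack."""
    return {ent.text.lower() for ent in nlp(text).ents if ent.label_ in PII_LABELS}

def attribute_inference_accuracy(original: str, anonymized: str) -> float:
    """Fraction of original entities that still appear after cleaning (lower is better)."""
    e_orig, e_anon = entities(original), entities(anonymized)
    return len(e_orig & e_anon) / len(e_orig) if e_orig else 0.0

print(attribute_inference_accuracy(
    "Sarah from Microsoft met investors in Houston.",
    "<person> from <organization> met investors in Houston."))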
5.3.2. Membership Inference Attack (MIA)
We examine whether our models are vulnerable to MIA, which tests whether training examples can be distinguished from held-out data using the model’s loss [39]. Following this approach, we compute a membership score based on the negative average cross-entropy loss for each sequence and evaluate attack success with ROC-AUC. The detailed setup is provided in Appendix B.
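A simplified sketch of this attack, using an off-the-shelf GPT-2 checkpoint as a stand-in for the fine-tuned models and two fabricated example texts:

import torch
from sklearn.metrics import roc_auc_score
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()   # stands in for the fine-tuned model

def membership_score(text: str) -> float:
    """Negative average cross-entropy of the sequence under the model."""
    ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss.item()
    return -loss

train_texts = ["please review the attached settlement schedule"]   # members
test_texts = ["the quarterly meeting moved to Tuesday"]            # non-members
scores = [membership_score(t) for t in train_texts + test_texts]
labels = [1] * len(train_texts) + [0] * len(test_texts)
print(roc_auc_score(labels, scores))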
5.3.3. BERTScore for Utility Preservation
To ensure anonymization does not destroy the semantic content, we compute BERTScore, which measures the contextual similarity between the original text and the anonymized text using transformer-based embeddings. For a reference sentence $r$ and candidate $c$:

$R_{\mathrm{BERT}} = \frac{1}{|r|} \sum_{t_i \in r} \max_{t_j \in c} \mathbf{e}_{t_i}^{\top} \mathbf{e}_{t_j}, \qquad P_{\mathrm{BERT}} = \frac{1}{|c|} \sum_{t_j \in c} \max_{t_i \in r} \mathbf{e}_{t_i}^{\top} \mathbf{e}_{t_j}, \qquad F_{\mathrm{BERT}} = 2\,\frac{P_{\mathrm{BERT}} R_{\mathrm{BERT}}}{P_{\mathrm{BERT}} + R_{\mathrm{BERT}}}$

where $\mathbf{e}_{t}$ denotes the contextual embedding of token $t$. A higher BERTScore indicates better semantic preservation.
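A usage sketch with the bert-score package, which applies the token-matching formulation above with its default encoder; the reference/candidate pair is a fabricated example:

from bert_score import score

refs = ["Sarah from Microsoft shared the budget update on Friday."]
cands = ["<person> from <organization> shared the budget update on [DATE]."]

# P, R, F1 are tensors with one value per reference/candidate pair.
P, R, F1 = score(cands, refs, lang="en", verbose=False)
print(f"BERTScore F1 = {F1.mean().item():.4f}")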
5.4. Model Performance
Table 3 presents a comparison of privacy protection and utility preservation across four methods: the original unprotected baseline, a NER-based redaction system, and two SPADR strategies, SPADR (S1) and SPADR (S2).
5.5. Runtime Efficiency
We evaluate the runtime of the SPADR pipeline on a PC with an NVIDIA RTX 4070 GPU, Intel i7-13700K CPU, and 32 GB RAM. The pipeline includes semantic classification, denoising autoencoder scoring, NER detection, and graph-based remediation. Processing a single document takes about 5 s. This time is mainly due to the zero-shot classifier, which requires initialization and decoding. When we process documents in batches, the throughput improves. For example, four documents are processed in 0.70 s, equivalent to about 5.7 documents per second. These results demonstrate that SPADR operates efficiently in practice, particularly when documents are processed in batches, as is common in real-world deployment settings.
5.6. Graph Analysis and Remediation
This section evaluates the effectiveness of SPADR’s graph-based remediation techniques through two distinct strategies: a fixed rule-based approach (see Figure 2) and a dynamic, graph-driven remediation framework (see Figure 4). Both strategies leverage the computed privacy risk scores and semantic relationships among named entities to determine appropriate transformation actions.
5.6.1. Strategy 1: Fixed Rule-Based Remediation
This approach applies predefined remediation rules based on the email’s risk score and the structural connectivity of entities within the graph. Specifically, emails with a critical privacy risk (i.e., 100%) are deleted, whereas others undergo targeted redaction or generalization based on the identified sensitive attributes.
CASE 1
CASE 2
Original Text: Please contact Mary via mary.smith@example.com or 123-456-7890. The graph analysis is shown in Figure 6.
Remediation Action: Redact the personal name and partially redact the phone number, while retaining the email address for communication context.
Final Text: Please contact [NAME] via mary.smith@example.com or [NUMBER]456-7890.
Risk Score: 56.05%
SPADR Applied: Yes
Boosted by Graph: True
CASE 3
Original Text: Ahmed from IBM will be visiting Texas on 2024-09-01. The graph analysis is shown in Figure 7.
Remediation Action: Replace the organization, location, and date with generic placeholders.
Final Text: Ahmed from XYZ Company will be visiting X City on [DATE].
Risk Score: 69.68%
SPADR Applied: Yes
Boosted by Graph: True
CASE 4
Original Text: Budget update shared by Sarah from Microsoft on Friday. The graph analysis is shown in Figure 8.
Remediation Action: Generalize all named entities (e.g., person, organization) and temporal expressions.
Final Text: XYZ Company update shared by Person from XYZ Company on [DATE].
Risk Score: 64.45%
SPADR Applied: Yes
Boosted by Graph: True
5.6.2. Strategy 2: Dynamic Graph-Based Remediation Framework
Unlike the fixed-rule strategy, this dynamic framework adaptively selects remediation techniques based on a combination of privacy risk thresholds and graph-derived semantic entity relationships. The following cases illustrate the decision rules and outcomes. These examples were collected from the Enron finance email dataset [38].
CASE 1: Granular Redaction
Original_Text: Dear Sir/Madam, Your flu vaccine has arrived. Please come to the Health Center, EB307, between 8:00 a.m. and 3:30 p.m. on Monday, December 4, or at your earliest convenience this week. Please be advised that the other persons on the waiting list will be notified via e-mail as more shipments of the vaccine arrive. Thank you in advance for your cooperation. The graph analysis is shown in
Figure 9.
Graph_Analysis:
boosted_by_graph: TRUE, since the text contains relationships among different types of entities
Risk_Score_Percent: 67.99%
Condition_satisfaction: risk_score >= 55 and risk_score < 76
Remediation_Technique_to_apply: Granular_Redaction
Final_Cleaned_Text (Generalized): dear sir/madam your flu vaccine has arrived please come to the health center eb307 between <Time> a m <Date> p m on <Date> or at your earliest convenience <Date> please be advised that the other persons on the waiting list will be notified via e-mail as more shipments of the vaccine arrive thank you in advance for your cooperation
CASE 2: Summarization
Original_Text: hey there how are you feeling i was looking on the northwest airlines website for flights to china i am thinking that we need to act on this pretty soon in order to get a good rate so far i have found the flights that cathy was talking about they are approx $79,500 she had stated that she heard that the price was going to go up to $1400 after the first of the month anyway what do you think we should do can you maybe talk with eric today or we can wait and talk to them this evening and try and figure things out. The graph analysis is shown in Figure 10.
Graph_Analysis:
boosted_by_graph: TRUE, since the text contains relationships among different types of entities
Risk_Score_Percent: 76.55%
Condition_satisfaction: risk_score >= 76 and risk_score < 86
Remediation_Technique_to_apply: Summarization
Final_Cleaned_Text (Summary): i was looking on the northwest airlines website for flights to <Location> i am thinking that we need to act on this pretty soon in
CASE 3: Text Boosted by Graph for Deletion
Original_Text: subject re dinner on friday anthonys sounds good to me 7 it is larry w bass on 03/08/2001 10:34:35 am to eric bass enron com cc subject re dinner on friday good morning looks like mother will be in town this weekend so dinner friday 7 is fine with us if you are still available if so lets try something different how about anthony ’s baroque brownstone or river oaks grill your choice dutch treat let me know and i will get reservations -dad subject re dinner on friday 7:00 preferably larry w bass on 03/07/2001 10:58:45 a.m. to eric bass enron com cc subject re dinner on friday what time you have in mind son subject dinner on friday hey dad did you talk to the better half i was thinking trulucks but if you can’t attend i understand let me know eric. The graph analysis is shown in
Figure 11.
Graph_Analysis:
boosted_by_graph: TRUE, since the text contains relationships among different types of entities
Risk_Score_Percent: 100.00%
Condition_satisfaction: risk_score >= 86
Remediation_Technique_to_apply: Deletion
Final_Cleaned_Text: Text Deleted due to High Privacy Violation
CASE 4: No Remediation
Original_Text: per my voicemail below are the revised online GTCs. Sorry for any inconvenience this might have caused you
Graph_Analysis: NONE, no entities found to build a graph
boosted_by_graph: FALSE
Risk_Score_Percent: 35.69%
Condition_satisfaction: NONE
Remediation_Technique_to_apply: Return the text unchanged (Original) because it is harmless
Final_Cleaned_Text: per my voicemail below are the revised online gtc ’s sorry for an inconvenience this might have caused you
5.7. Comparison of spaCy Library with Other NLP Libraries
As shown in Table 4, we compare the use of the SpaCy library to identify NER entities in our SPADR pipeline with RoBERTa [41], a pretrained transformer model developed by Facebook AI, and Stanza [43], a Python NLP library developed by Stanford.
The following examples in Table 5 and Table 6 are taken from the Enron Email Dataset [35]. We use these samples to compare the outputs of SpaCy NER with RoBERTa and Stanza. For readability, we skip the raw email text and provide brief descriptions of the examples instead.
Example 1: SpaCy vs. RoBERTa This example is an Enron trading email discussing the setup of annuity deals and settlement dates. It contains sensitive details, such as dates, quantities, and financial terms.
Explanation: In this case, SpaCy correctly detects sensitive entities, such as dates and amounts, which leads to a higher risk score. The graph module then boosts the score due to multiple entity relationships, and the system applies summarization as the remediation strategy. In contrast, RoBERTa fails to identify enough entities, produces a lower risk score, and does not trigger any remediation. This shows that SpaCy is more effective for detecting sensitive content in this example.
Example 2: SpaCy vs. Stanza This example is an internal Enron update on DASR progress and legislative issues. It contains employee names, organizational details, and dates.
Explanation: In this case, SpaCy successfully identifies multiple sensitive entities, such as names and organizations. The graph analysis reveals several previously hidden relationships, which increase the privacy risk score and prompt the need for a deletion strategy. Stanza, however, detects only one relationship and produces a lower risk score, leaving the text unrepaired. This highlights how SpaCy provides stronger coverage of sensitive information compared to Stanza in this context.
These experiments show that SpaCy consistently outperforms both RoBERTa and Stanza in identifying sensitive entities within Enron emails. By generating more accurate risk scores and enabling graph-based boosting, SpaCy supports more reliable remediation strategies for privacy protection.
6. Discussion
As shown in Table 3, our experimental results demonstrate that SPADR is effective in reducing privacy risks in text while preserving a significant portion of the original meaning. Compared to the baseline and a standard NER-based redaction approach, SPADR strikes a balance between removing sensitive content and maintaining the text’s readability and value.
We begin our experiment by measuring the raw baseline, without anonymization. The baseline results, as expected, show 100% leakage across all metrics, underscoring the need for effective privacy-preserving methods. Among the NER-based baselines, BERT achieves the highest semantic utility (BERTScore F1 = 99.90%), as it retains the majority of the original text. However, it focuses only on explicit entities: although it achieves a low attribute leakage score (AAA = 1.46), 22.63% of documents still leak identifiable information. RoBERTa performs worse overall, reducing leakage less effectively (48.91%) and exhibiting a larger drop in utility (93.50%). These results indicate that purely entity-based redaction is highly dependent on the underlying model and still fails to address context-dependent or implicit privacy risks.
In contrast, SPADR achieves stronger overall protection by combining entity detection with semantic and graph-based analysis. SPADR (S1) reduces leakage to 21.90%, and SPADR (S2) achieves the best results, lowering leakage to 16.06% while maintaining utility at 88.03%. Although SPADR reports higher AAA values than the NER baselines, this reflects a deliberate trade-off: the system allows benign attributes to remain while removing only those judged as risky. This design enables SPADR to protect against both explicit identifiers and relational or semantic risks that NER-only approaches cannot capture.
SPADR (S1), which combines semantic anomaly detection with a denoising autoencoder, significantly improves privacy protection. It reduces the document PII leak rate to 21.90% and attribute leakage to 12.41%. This improvement comes from the model’s ability to detect unusual or sensitive content beyond standard NER tags. However, S1 occasionally modifies benign parts of the text, which lowers its BERTScore to 86.43%.
SPADR (S2) extends S1 by integrating a broader graph-based remediation approach. Both strategies utilize entity graphs, but their operational logic differs. SPADR (S1) applies enhanced remediation exclusively when specific high-risk co-occurrence patterns (e.g., [PERSON, DATE, LOCATION]) are detected. This approach limits its scope to predefined sensitive relational structures. In contrast, SPADR (S2) considers any inter-entity relationship as a potential indicator of privacy leakage. This method combines graph connectivity with dynamic risk-score thresholds to determine remediation actions, resulting in more consistent management of subtle relational and contextual exposures. The improved performance of SPADR (S2), as demonstrated by a reduction in the document-level PII leak rate to 16.06% and an increase in utility preservation to a BERTScore F1 of 88.03%, is due to both threshold tuning and the generalization provided by graph-boosted thresholding. As a result, SPADR (S2) addresses a wider range of latent privacy risks than the fixed rule-based strategy used in SPADR (S1).
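The simplified sketch below contrasts the two remediation triggers described above; the label patterns, thresholds, and boost factor are placeholder assumptions rather than the tuned values used in our experiments:

```python
# Minimal sketch: S1 fires only on predefined high-risk label patterns,
# while S2 treats any inter-entity edge as a signal and combines graph
# connectivity with a risk-score threshold.
import itertools
import networkx as nx

HIGH_RISK_PATTERNS = [{"PERSON", "DATE", "GPE"}]   # e.g. [PERSON, DATE, LOCATION]

def build_cooccurrence_graph(entities):
    """entities: list of (text, label) pairs found in one document."""
    g = nx.Graph()
    for (t1, l1), (t2, l2) in itertools.combinations(entities, 2):
        g.add_edge((t1, l1), (t2, l2))
    return g

def s1_needs_remediation(entities, risk_score, threshold=0.5):
    labels = {label for _, label in entities}
    pattern_hit = any(pattern <= labels for pattern in HIGH_RISK_PATTERNS)
    return pattern_hit and risk_score >= threshold

def s2_needs_remediation(entities, risk_score, threshold=0.5, boost=0.05):
    g = build_cooccurrence_graph(entities)
    # Any inter-entity relationship boosts the score before thresholding.
    boosted = risk_score + boost * g.number_of_edges()
    return boosted >= threshold

ents = [("Jeff", "PERSON"), ("March 14", "DATE"), ("Houston", "GPE")]
print(s1_needs_remediation(ents, 0.45), s2_needs_remediation(ents, 0.45))
```

With the same base risk score, S1 can decline to act because the score falls below its fixed threshold, whereas S2 escalates once the graph reveals enough inter-entity relationships, which mirrors the behavioral difference discussed above.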
SPADR (S2) distinguishes between private information and public knowledge using scores from the DAE, which is trained on privacy-sensitive messages, together with semantic classification scores that indicate whether a given text is sensitive or non-sensitive. For private information, both components produce higher risk scores, triggering graph analysis to examine relationships and apply remediation strategies to these texts accordingly. In contrast, for public knowledge, the DAE and the semantic classifier produce low risk scores; as a result, graph analysis is not performed, no remediation is applied, and the public knowledge remains unchanged, as shown in Table 7.
We also tested the robustness of the weighting scheme used in the privacy risk score. Due to computational limitations, we analyzed the weights on a random subset of 500 records. An ablation study confirmed that semantic classification has the greatest influence on the score, supporting its higher weight. We then adjusted the weights in increments of 0.1, keeping the sum normalized to 1.0; the results are shown in Table 8. SPADR maintained stable performance across a wide range of values (semantic 0.2–0.5, anomaly/NER 0.2–0.4). Based on these results, we assign the largest weight to semantic classification, with smaller weights for anomaly detection and NER.
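The weighted combination itself can be expressed compactly; in the sketch below the weight values are placeholders that merely respect the normalization constraint, not the exact values selected in the ablation:

```python
# Minimal sketch of the weighted privacy risk score. The default weights
# are illustrative placeholders that sum to 1.0.
def privacy_risk_score(sem_score, anomaly_score, ner_score,
                       w_sem=0.4, w_anom=0.3, w_ner=0.3):
    assert abs(w_sem + w_anom + w_ner - 1.0) < 1e-9
    return w_sem * sem_score + w_anom * anomaly_score + w_ner * ner_score

print(privacy_risk_score(0.9, 0.6, 0.4))   # 0.66 for this example input
```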
To assess the privacy preservation capabilities of our SPADR pipeline, we conduct an MIA [2]. We fine-tune GPT-2 on the raw, NER-redacted, and SPADR-redacted datasets, then compute per-sample losses and derive an MIA score as defined in Section 5.3. The ROC AUC of this score against the true membership label (train vs. test) quantifies how well an attacker can infer training membership.
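A condensed version of this attack evaluation is sketched below, assuming the MIA score is the negated per-sample loss (so that a lower loss suggests membership) and using the public gpt2 checkpoint as a stand-in for the fine-tuned models; the exact score definition follows Section 5.3:

```python
# Minimal sketch: per-sample language-model loss as a membership signal,
# evaluated with ROC AUC against true membership labels.
import torch
from sklearn.metrics import roc_auc_score
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def per_sample_loss(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

samples = ["Meeting moved to Friday.", "Quarterly numbers look strong."]
labels = [1, 0]                      # 1 = training member, 0 = held out
scores = [-per_sample_loss(s) for s in samples]

# An AUC close to 0.5 means the attacker cannot tell members from non-members.
print("MIA ROC AUC:", roc_auc_score(labels, scores))
```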
As shown in Table 9, the raw dataset yields an AUC of 0.6626, indicating moderate memorization risk. NER-based redaction offers only a slight reduction in leakage (AUC = 0.5022). In contrast, SPADR strategies significantly reduce leakage: SPADR-S2 achieves 0.4637, and SPADR-S1 performs best with 0.4231. These results demonstrate that SPADR effectively mitigates memorization while preserving utility.
These results highlight that SPADR benefits from combining deep semantic understanding with structural context. Its design enables it to identify and mitigate privacy risks that entity- and rule-based methods, such as NER redaction, cannot capture. This makes SPADR a more adaptive and effective solution for real-world anonymization tasks where sensitive information can appear in unpredictable ways.
There is a trade-off when we use the SPADR pipeline to filter prompts before sending them to LLMs. One of the main strengths of LLMs is their ability to understand users on a personal level: they remember names, preferences, and past topics, which helps them give better and more helpful answers. However, this strength also creates a privacy risk, as the model might accidentally reveal private details or disclose sensitive information. By filtering and redacting the input, SPADR lowers this risk. Although the model may lose some helpful context, in most cases protecting user privacy is more important than preserving personalization.
One limitation of our current graph construction is that it relies only on entity co-occurrence within the same text. This provides a simple relational view but ignores important aspects, such as the type, direction, and strength of relationships. As a result, some complex privacy risks that depend on specific relationship types may be missed.
Another limitation of our current design is that SPADR relies on a pretrained model such as spaCy’s NER, which works well on English and Western names but performs poorly on non-Western or multilingual entities. Since our graph analysis relies on NER outputs, this limitation can result in missed cases in multilingual texts. We highlight these limitations and potential future improvements in the Conclusions.
Risks to Validity
There are several risks to the validity of our findings. First, our evaluation relies mainly on the Enron Email Dataset [35] and a synthetic dataset we created. While these are useful for testing, they may not fully represent other domains, such as healthcare, finance, or social media.
Second, the DAE is trained only on high-risk records. This design helps the model focus on sensitive patterns but also creates a risk of bias if the selected subset is not fully representative. As a result, the model may miss certain types of privacy risks in unseen data.
Finally, the remediation thresholds and rules we used may be sensitive to the dataset. Although they worked well in our experiments, their effectiveness may vary on other datasets or languages, which affects the validity of our conclusions.
7. Conclusions
In this paper, we present SPADR, a privacy protection pipeline that detects and removes sensitive information from text. SPADR combines semantic anomaly detection with graph-based analysis to catch both direct and hidden privacy risks.
We evaluate SPADR on the Enron Email Dataset [35], and the results show that it works well. The improved version, SPADR (S2), reduces the privacy leak rate to 16.06% while maintaining text meaning with a BERTScore of 88.03%.
This work is important because LLMs can memorize and leak sensitive information if their training data are not carefully cleaned. Moreover, LLMs can also leak private details when users send sensitive input at runtime. SPADR helps in both cases: it can clean training data before they are used to build models, and it can act as a privacy filter before user input is sent to an LLM. By identifying sensitive content that standard redaction tools may overlook, SPADR provides a more flexible and accurate method for protecting user privacy.
Despite the demonstrated effectiveness of the SPADR pipeline in enhancing privacy protection for textual data, several limitations warrant consideration. First, the evaluation is limited to the Enron Email Dataset [35], which represents the corporate communication domain, and a synthetic dataset that we created to simulate sensitive content. As a result, the generalizability of SPADR to other contexts, such as clinical records, legal documents, or informal social media texts, remains unverified and requires further investigation. Second, the graph-based entity analysis relies on predefined patterns of sensitive entity co-occurrence (e.g., [PERSON, DATE, LOCATION]), which, although useful, may fail to detect more intricate or context-specific privacy risks. Third, the adopted remediation strategies (redaction, summarization, and deletion) may compromise textual utility, particularly in cases where content fidelity is essential. Finally, the pipeline depends heavily on the accuracy of the underlying entity recognition tools, including spaCy’s NER and rule-based matchers; these perform well for English and Western names but often miss non-Western or multilingual entities, and such omissions or misclassifications lower the accuracy of the graph analysis in multilingual settings and undermine the reliability of subsequent remediation.
Additionally, the anomaly detection module requires substantial computational resources. It is trained only on high-risk data, which makes its performance sensitive to the completeness of the labeled subset. Finally, the current version of SPADR supports only English text.
In future work, we plan to extend SPADR to cover more domains and languages. We will use multilingual or fine-tuned NER models to enhance the recognition of non-Western names and entities, and explore domain-specific fine-tuning to minimize false negatives in multilingual text. We also aim to explore relationship extraction models and graph neural networks to build richer graphs that better capture semantic context and improve privacy risk detection. This will make the system more intelligent and scalable for large-scale privacy protection.
Additionally, we plan to release a new dataset that includes examples of LLM prompts with privacy-sensitive content. These prompts will contain not only common sensitive details, such as names, locations, and financial terms, but also risks that arise from relationships and semantic context. For example, a prompt may describe how a person, an organization, and a date are connected, which can reveal private information even without explicit names. This dataset will support the research community in testing and improving methods for privacy protection in LLMs.