AI-Controlled Modular Decoy Generation for Reconstruction-Resistant Hybrid and Multi-Cloud Storage Systems

Ahmed, Munir; Yuan, Jiann-Shiun

doi:10.3390/electronics15061231

Open Access, Feature Paper, Article


Munir Ahmed * and Jiann-Shiun Yuan *

Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816, USA

* Authors to whom correspondence should be addressed.

Electronics 2026, 15(6), 1231; https://doi.org/10.3390/electronics15061231

Submission received: 4 February 2026 / Revised: 3 March 2026 / Accepted: 13 March 2026 / Published: 16 March 2026

(This article belongs to the Special Issue Data Privacy and Security in Blockchain, Decentralised Storage and IoT Systems)


Abstract

Although cloud storage is widely trusted by users and enterprises, externally stored encrypted and fragmented data remain vulnerable to reconstruction and inference attacks following partial exposure. Existing decoy-based defenses often rely on static configurations or randomly generated artifacts that can be filtered during adversarial analysis. This paper presents an Artificial Intelligence (AI)-controlled modular decoy generation method to enhance reconstruction resistance in distributed storage systems. The method operates as a system-agnostic post-fragmentation layer and does not require modification of encryption or storage architecture. Given encrypted fragments as input, decoys are generated using a supervised Extreme Gradient Boosting (XGBoost) regression model that adapts decoy quantity based on system telemetry and resource conditions. Decoys maintain statistical alignment with real encrypted fragments in size and Shannon entropy characteristics. To address scalability, the method is evaluated across small, medium, and large deployments comprising up to 413 externally exposed fragments and compared against fixed-ratio (10%, 20%) and randomized baselines. Experimental evaluation demonstrates increased adversarial uncertainty without altering legitimate reconstruction procedures or encryption mechanisms. Kolmogorov–Smirnov analysis indicates no statistically significant difference between AI-generated decoys and real fragments, whereas baseline decoys produce significant deviations in size and entropy distributions, supporting reconstruction resistance at scale in multi-cloud environments.

Keywords:

adversarial filtering; AI-controlled decoy generation; encrypted data fragmentation; hybrid and distributed storage; reconstruction attacks; system-agnostic security

1. Introduction

Cloud storage is widely trusted by users and enterprises due to its scalability, availability, and cost-effectiveness. However, prior research has shown that externally stored encrypted and fragmented data may remain vulnerable to reconstruction and inference attacks when adversaries gain access to sufficient subsets of fragments or related metadata [1,2]. Such attacks exploit observable fragment-level properties, including size distributions, entropy characteristics, and structural regularities, enabling adversaries to reduce the effective reconstruction search space [3,4,5].

To mitigate these risks, prior studies have proposed decoy-based defenses that introduce misleading artifacts alongside real encrypted fragments [6]. Strategic and adaptive deception mechanisms have also been explored in broader cyber-defense contexts [7]. Nevertheless, many existing decoy approaches rely on static configurations, fixed insertion ratios, or randomly generated artifacts that remain susceptible to entropy- or size-based filtering [8]. When decoys are not distributionally aligned with real encrypted fragments, observable discrepancies may allow adversaries to reduce the candidate fragment pool prior to reconstruction, limiting sustained resistance under adaptive attack models [9,10].

In parallel, numerous secure storage and multi-cloud fragmentation frameworks have been proposed to enhance confidentiality and distributed resilience across heterogeneous environments [11,12]. Some approaches integrate protection mechanisms directly into storage architectures, including protocol-bound security layers and consensus-driven frameworks [13,14]. While effective within specific deployments, such architecture-coupled solutions depend on storage-specific assumptions and therefore lack portability as modular overlays across diverse distributed infrastructures.

Collectively, existing approaches exhibit two principal limitations. First, most static or heuristic decoy strategies do not formally enforce fragment-level statistical indistinguishability and therefore remain vulnerable to adversarial filtering based on observable characteristics. Second, architecture-coupled defenses are not readily deployable as portable overlays independent of specific encryption, fragmentation, or storage designs. Furthermore, relatively few prior works jointly validate fragment-level indistinguishability through hypothesis testing while modeling reconstruction complexity under progressively stronger adversarial assumptions. A portable post-fragmentation mechanism integrating adaptive decoy quantity control with statistically aligned decoy construction therefore remains insufficiently explored.

The proposed AI-controlled modular overlay directly addresses these limitations by enforcing joint size–entropy statistical alignment while operating as a post-fragmentation, system-agnostic layer independent of encryption schemes and storage architectures. By combining adaptive decoy scaling with empirically validated distributional indistinguishability, the method bridges the gap between heuristic decoy insertion and architecture-bound security mechanisms.

Our previous work introduced a telemetry-guided adaptive fragmentation architecture for secure multi-cloud storage [15], in which fragment sizes were dynamically predicted from real-time system and network conditions. That architecture focused on how genuine encrypted fragments are created and distributed. In contrast, the present study addresses a complementary problem: the generation of AI-controlled decoy fragments as a modular defense layer applied after encryption and fragmentation. Unlike the prior architecture, the proposed method does not modify fragmentation strategies or storage pipelines. Instead, it operates independently to expand the reconstruction search space and degrade fragment-level classification accuracy.

Motivated by evidence that adaptive and telemetry-informed security control can outperform fixed policies under dynamic conditions [8,16], this work proposes an AI-controlled modular decoy generation overlay applied at the fragment exposure stage. The AI component enables resource-aware decoy quantity selection based on system telemetry, workload descriptors, and cloud configuration, thereby maintaining reconstruction resistance without reliance on static rules.

The novelty of this work lies in treating reconstruction resistance as a fragment-level statistical indistinguishability problem and addressing it through a telemetry-aware AI controller that adaptively determines decoy quantity while enforcing joint size–entropy alignment. Rather than treating decoy insertion as heuristic obfuscation, the proposed approach formalizes indistinguishability as a measurable property validated through hypothesis testing and combinatorial reconstruction modeling.

Operationally, the proposed system consists of four sequential stages:

Encrypted fragments produced by an arbitrary upstream mechanism are treated as opaque inputs.
System telemetry and workload descriptors are processed by a supervised regression model to predict decoy quantity.
Statistically aligned decoy fragments are generated to preserve observable size and entropy characteristics.
Real and decoy fragments are jointly stored across distributed environments, increasing adversarial reconstruction uncertainty under post-exposure analysis.

The primary contributions of this study are as follows:

(i) The design of a portable, post-fragmentation decoy generation overlay independent of encryption schemes, fragmentation policies, and storage architectures;
(ii) An AI-based, telemetry-driven decoy quantity controller with bounded ratio enforcement that preserves predictable storage overhead;
(iii) Multi-scale experimental validation across Small, Medium, and Large deployments comprising up to 413 externally exposed fragments, evaluated under naive, statistical filtering, and adaptive reconstruction models with direct comparison against fixed-ratio and randomized decoy baselines.

The proposed framework is validated experimentally across three deployment scales (Small, Medium, and Large). Statistical indistinguishability between real and decoy fragments is evaluated using two-sample Kolmogorov–Smirnov testing (α = 0.05). Across all scales, the AI-controlled policy yields KS D-statistics below 0.04 for fragment size and maintains p-values above the significance threshold, indicating no statistically significant separation, whereas fixed-ratio and randomized baselines exhibit significant separability. Reconstruction resistance is further assessed under naive, statistical filtering, and adaptive ranking adversarial models. In the Large deployment (R = 344, D = 69), blind reconstruction complexity scales combinatorially with N = 413 fragments, and adaptive filtering enforces a measurable recall–complexity tradeoff: reducing the candidate pool size decreases recall unless larger fragment sets are retained to preserve reconstruction completeness. These results demonstrate that the proposed overlay preserves statistical alignment (Attack A), amplifies combinatorial effort (Attack B), and limits deterministic fragment isolation under adaptive filtering (Attack C), while maintaining bounded decoy-to-real ratios between 5% and 20%.

Figure 1 illustrates the conceptual placement of the proposed method as a modular security overlay applied after encryption and fragmentation and before external storage across distributed environments.

Encrypted and fragmented data are treated as opaque inputs within the trusted domain. A telemetry-aware regression model determines the number of decoy fragments under bounded resource constraints, and statistically aligned decoys are generated to preserve observable distributional characteristics. Real and decoy fragments are then jointly distributed across multi-cloud storage environments. In the external observation domain, adversaries observe a combined fragment set (N = R + D) without access to encryption keys or trusted metadata mappings.

To contextualize the proposed method within existing research, Section 2 reviews representative decoy-based and architecture-coupled defenses and positions the present work relative to these approaches.

2. Related Work

Decoy-based storage defenses in the literature can be grouped into five principal categories: static, randomized, fixed-ratio, entropy-aware, and architecture-coupled approaches. These categories differ in decoy generation strategy, adaptability, and the extent to which fragment-level statistical properties are aligned with real encrypted data [6,8].

Static Decoy Mechanisms:

Static strategies introduce pre-generated artifacts without adapting to workload characteristics or encrypted fragment distributions. While simple to deploy, such decoys often diverge from real fragments in size or entropy, enabling statistical filtering under post-exposure analysis [2,9].

Randomized Decoy Schemes:

Random padding or probabilistic placement increases superficial uncertainty but does not enforce joint distributional alignment. Multivariate discrepancies may remain detectable when large fragment sets are analyzed [9].

Fixed-Ratio Decoy Insertion:

Constant decoy-to-real ratios simplify deployment but remain agnostic to system conditions and fragment statistics. Predictable insertion policies may facilitate adversarial elimination under size- or entropy-based filtering [8].

Entropy-Aware Decoy Strategies:

Some approaches attempt to match decoy entropy to that of encrypted fragments. However, focusing solely on entropy without aligning size distributions leaves residual statistical discriminators exploitable under adaptive analysis [10].

Architecture-Coupled Defenses:

Other works embed deception or obfuscation within encryption pipelines, distributed consensus frameworks, or storage-layer access controls [13,14]. While strengthening system-level guarantees, such approaches are tightly integrated with specific architectures and may not provide a portable fragment-level overlay applicable across heterogeneous environments.

Across these categories, a common limitation persists: joint statistical alignment between real and decoy fragments is rarely enforced and validated under progressively stronger adversarial models. Static and heuristic strategies introduce distributional inconsistencies, whereas architecture-coupled defenses lack portability as independent overlays. Table 1 summarizes representative approaches and highlights differences in adaptability, statistical alignment rigor, integration requirements, scalability, and resistance to adversarial filtering.

As shown in Table 1, static, randomized, and fixed-ratio approaches lack adaptive scaling and do not enforce joint size–entropy statistical alignment. Entropy-aware strategies partially address distributional similarity but remain limited to single-feature alignment without guaranteeing joint multivariate indistinguishability under adaptive adversarial filtering. Architecture-coupled defenses strengthen system-level guarantees but sacrifice modular portability. These structural limitations collectively motivate the need for a telemetry-aware, statistically validated overlay capable of enforcing joint fragment-level alignment while remaining independent of specific storage architectures.

In contrast to prior methods, the proposed AI-controlled modular overlay integrates telemetry-aware decoy quantity control with empirically validated joint distribution alignment while remaining independent of specific encryption schemes, fragmentation policies, and storage-provider implementations. This design directly targets reconstruction resistance under strong post-exposure adversarial assumptions while preserving portability across hybrid and multi-cloud environments.

3. Materials and Methods

This section defines the deployment model, adversarial assumptions, and the two-layer decoy generation framework comprising the following:

(i) An AI-based control layer for adaptive decoy quantity selection;
(ii) A statistical construction layer for fragment-level distributional alignment.

The proposed method operates strictly after encryption and fragmentation. Encrypted fragments are treated as opaque inputs, and upstream encryption schemes, fragmentation policies, and storage architectures remain unchanged.

The objective is to increase reconstruction complexity under post-exposure conditions by reducing fragment-level distinguishability while maintaining bounded operational overhead.

Software and Implementation Environment:

The proposed AI-controlled decoy generation framework was implemented in Python 3.13.2 (Python Software Foundation, Wilmington, DE, USA). Machine-learning components were implemented using the XGBoost library (XGBoost Contributors, open-source project) together with scikit-learn (scikit-learn Developers, open-source project) for model training and evaluation. Data preprocessing and numerical analysis were performed using NumPy (NumPy Developers, open-source project) and pandas (pandas Developers, open-source project). Statistical computations, including Kolmogorov–Smirnov testing, were conducted using SciPy (SciPy Developers, open-source project).

3.1. System Model and Assumptions

The system model represents a hybrid or multi-cloud storage environment in which encrypted and fragmented data are distributed across one or more external providers [11]. Encryption algorithms, key management strategies, and fragmentation mechanisms are assumed to be pre-existing and remain outside the scope of this work.

External storage environments are treated as untrusted with respect to confidentiality but are not assumed malicious at the infrastructure level. Fragment exposure may arise from credential compromise, misconfiguration, or leakage, as documented in cloud exposure studies. Once exposed, adversaries may observe fragment-level characteristics including size, entropy, and structural patterns.

Let $R$ denote the number of real encrypted fragments produced upstream. The proposed overlay augments this set with $D$ statistically aligned decoy fragments prior to placement, producing the externally observable fragment set:

N = R + D

Real fragments are not regenerated or modified. The overlay remains compatible with heterogeneous storage infrastructures.

Encrypted data splits produced by an external encryption and fragmentation mechanism are treated as opaque inputs within the trusted domain. A telemetry-aware regression model predicts the decoy quantity based on system and storage conditions. Decoy fragments are constructed using bootstrap sampling from the empirical real-fragment size distribution combined with cryptographically secure random payload generation to preserve size and entropy alignment. Real and decoy fragments are jointly distributed across multiple cloud storage providers. Encrypted metadata maintains real/decoy mapping and index information within the trusted domain and is not exposed to the external observation domain. This overlay-based design preserves system agnosticism and is intended to maintain statistical alignment under adversarial fragment-level observation. The overall system workflow and placement of the AI-controlled decoy overlay within a representative hybrid multi-cloud pipeline are illustrated in Figure 2.

3.2. Adversarial Model

The threat model assumes a post-exposure adversary whose objective is to reconstruct the original data by identifying the correct subset of real fragments from the externally exposed fragment set through statistical filtering and combinatorial enumeration [2].

Adversarial Capabilities

The adversary is assumed to possess the following capabilities:

Fragment Access: Full access to externally stored encrypted fragments following exposure.
Statistical Observation: Ability to compute fragment-level metrics including size distributions, Shannon entropy, byte-frequency statistics, and derived ranking scores. Such techniques can reduce the effective reconstruction search space when distributional discrepancies exist [3].
Adaptive Filtering: Ability to rank and selectively retain fragments using heuristic or feature-based scoring strategies.
Offline Computation: Sufficient computational capacity to perform combinatorial reconstruction over retained candidate pools.

This constitutes a strong yet realistic post-exposure adversarial model.

Adversarial Limitations

The adversary is not assumed to possess the following:

Encryption keys;
Internal system secrets;
Decoy-generation parameters;
AI model parameters;
Trusted metadata mapping real and decoy fragments.

The model excludes denial-of-service and cryptographic key-compromise scenarios and focuses exclusively on reconstruction under fragment exposure [17,18].

Partial Metadata Exposure

Limited metadata exposure is considered. If fragment identifiers, storage locations, or timestamps become accessible, reconstruction difficulty remains governed primarily by fragment-level statistical indistinguishability rather than placement secrecy, consistent with defense-in-depth principles [18].

If complete decoy-mapping metadata were compromised together with fragment data, reconstruction resistance reduces to the strength of the underlying encryption scheme. The architecture therefore assumes separation between fragment storage and trusted metadata control domains.

Worst-Case Reconstruction Behavior

Effective defenses must degrade adversarial filtering and prioritization rather than prevent exposure outright. The adversary’s filtering behavior determines effective reconstruction complexity.

In the worst-case adaptive setting,

If filtering isolates real fragments, reconstruction complexity decreases.
If decoys remain statistically indistinguishable, filtering removes real fragments together with decoys.
To preserve recall, the adversary must retain larger candidate pools.
Reconstruction complexity becomes dominated by combinatorial search over indistinguishable fragment sets.

Let $k$ denote the retained fragment pool size. Reconstruction complexity then scales on the order of

\mathcal{O}\left(\binom{k}{R}\right)

Thus, increasing the number of statistically indistinguishable fragments directly amplifies adversarial workload.
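To make the growth of $\binom{k}{R}$ concrete, the following minimal Python sketch evaluates it for the Large deployment ($R = 344$, $N = 413$). It also illustrates the recall side of the tradeoff under one simplifying assumption not stated in the text: when decoys are statistically indistinguishable, any filter that keeps $k$ of $N$ fragments behaves like uniform random selection, so expected recall is $k/N$ and the probability of keeping every real fragment is hypergeometric. The function names are hypothetical.

```python
import math


def log10_comb(k: int, r: int) -> float:
    """log10 of C(k, r): candidate r-subsets in a retained pool of k fragments."""
    if not 0 <= r <= k:
        raise ValueError("require 0 <= r <= k")
    # lgamma avoids building C(k, r) exactly, which can have hundreds of digits.
    return (math.lgamma(k + 1) - math.lgamma(r + 1) - math.lgamma(k - r + 1)) / math.log(10)


def expected_recall(k: int, n: int) -> float:
    """Expected fraction of real fragments kept when k of n fragments are
    retained and the filter cannot tell real from decoy (uniform selection)."""
    return k / n


def prob_all_real_retained(k: int, r: int, n: int) -> float:
    """Hypergeometric probability that a uniformly random k-subset of n
    fragments contains all r real fragments: C(n - r, k - r) / C(n, k)."""
    if k < r:
        return 0.0
    return math.comb(n - r, k - r) / math.comb(n, k)


if __name__ == "__main__":
    R, D = 344, 69  # Large deployment reported in Section 4.1
    N = R + D       # 413 externally exposed fragments
    for k in (R, 380, N):
        print(f"k={k}: log10 C(k,R)={log10_comb(k, R):.1f}, "
              f"E[recall]={expected_recall(k, N):.3f}, "
              f"P(all real kept)={prob_all_real_retained(k, R, N):.3g}")
```

Shrinking $k$ toward $R$ lowers the enumeration cost but makes retaining all real fragments vanishingly unlikely under this model, which is the recall–complexity tension described in the worst-case analysis.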

3.3. AI-Based Telemetry-Aware Decoy Generation

The proposed framework consists of two tightly coupled layers:

Control Layer: Adaptive decoy quantity selection using telemetry-aware regression.
Construction Layer: Statistical alignment of decoy fragments with real fragment distributions.

Figure 3 illustrates the integrated architecture.

Upstream fragmented real splits ($R$ fragments) are treated as opaque inputs. System telemetry and workload descriptors are transformed into a feature vector $\mathbf{x} \in \mathbb{R}^{m}$ and processed by an XGBoost regression model to produce the predicted decoy count $\hat{D}$. Runtime policy enforcement applies bounded ratio constraints (5–20% of $R$) with a minimum decoy floor of $2C$, yielding the deployed decoy count $D$. The empirical size distribution extracted from real fragments guides bootstrap-based decoy construction, while cryptographically secure random payload generation preserves entropy alignment. The merged fragment set $N = R + D$ is then distributed across multi-cloud storage environments.

3.3.1. Control Layer: Telemetry-Aware Decoy Quantity Selection

The decoy quantity controller is implemented using XGBoost regression, a scalable gradient boosting framework designed for efficient supervised learning and nonlinear function approximation [19]. The model captures nonlinear relationships between telemetry features and decoy requirements while maintaining low inference overhead. Compared with heuristic-based controllers, the learned model adapts continuously to heterogeneous system conditions without fixed threshold tuning.

Training Data and Label Construction:

The model is trained on 8300 hybrid telemetry records derived from four datasets (Hybrid_300, Hybrid_1000, Hybrid_2000, Hybrid_5000). Each record includes the following:

System telemetry (CPU availability, RAM utilization, disk throughput);
Network conditions (bandwidth, latency, packet loss);
Workload descriptors (file size, split size, number of real fragments $R$);
Number of cloud providers $C$.

The real fragment count is computed as

R = \left\lceil \frac{\mathrm{FileSize}_{\mathrm{KB}}}{\mathrm{OptimalSplit}_{\mathrm{KB}}} \right\rceil

Training labels $D^{\mathrm{train}}$ were generated using a telemetry-conditioned envelope:

0.40 R \leq D^{\mathrm{train}} \leq 1.50 R

A normalized performance score maps system conditions to decoy counts within this interval. A ±10% multiplicative jitter is applied prior to clamping to reduce deterministic bias.
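The label construction can be sketched as follows. The paper does not specify how the normalized performance score is computed or mapped into the envelope, so the linear mapping, the score range [0, 1], and the uniform jitter below are assumptions; `real_fragment_count` implements the ceiling formula for $R$ given above, and both function names are hypothetical.

```python
import math
import random


def real_fragment_count(file_size_kb: float, optimal_split_kb: float) -> int:
    """R = ceil(FileSize_KB / OptimalSplit_KB)."""
    return math.ceil(file_size_kb / optimal_split_kb)


def training_label(r: int, score: float, rng: random.Random,
                   lo: float = 0.40, hi: float = 1.50, jitter: float = 0.10) -> int:
    """Hypothetical label generator for D_train.

    Assumes the normalized performance score lies in [0, 1] and maps
    linearly onto the envelope [lo*R, hi*R]; the paper does not specify
    the mapping, so this part is illustrative only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be normalized to [0, 1]")
    d_lo, d_hi = lo * r, hi * r
    target = d_lo + score * (d_hi - d_lo)              # assumed linear mapping
    target *= rng.uniform(1.0 - jitter, 1.0 + jitter)  # +/-10% multiplicative jitter
    # Clamp after rounding so the integer label stays inside the envelope.
    return min(math.floor(d_hi), max(math.ceil(d_lo), round(target)))


if __name__ == "__main__":
    rng = random.Random(7)
    r = real_fragment_count(68_800, 200)  # hypothetical 68,800 KB file, 200 KB splits
    print(r, [training_label(r, s, rng) for s in (0.0, 0.5, 1.0)])
```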

The model is trained using Mean Squared Error (MSE):

\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \left(D_{i} - \hat{D}_{i}\right)^{2}

The selected configuration (400 estimators, maximum depth 6, learning rate 0.05, subsampling rate 0.9) achieved the following:

$R^{2} = 0.993$;
Mean Absolute Error (MAE) = 16.7 fragments.

Runtime Deployment Constraints:

At runtime, telemetry features form a vector $\mathbf{x} \in \mathbb{R}^{m}$. The predicted decoy count is

\hat{D} = f_{\theta}(\mathbf{x})

Deployment enforces bounded ratio constraints:

\rho_{\min} = 0.05, \quad \rho_{\max} = 0.20

A minimum absolute floor proportional to the number of cloud providers is enforced:

D_{\mathrm{abs}} = 2C

The runtime bounds are

D_{\min} = \max\left(\lceil \rho_{\min} R \rceil, 2C\right)

D_{\max} = \lceil \rho_{\max} R \rceil

The deployed decoy count is

D = \min\left(D_{\max}, \max\left(D_{\min}, \lfloor \hat{D} \rfloor\right)\right)

Inference requires a single model evaluation and incurs minimal latency relative to encryption and transmission overhead.
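A direct transcription of the runtime bounds into Python is shown below as a sketch; `d_hat` stands in for the XGBoost output $\hat{D}$, the function name is hypothetical, and the provider count $C = 3$ in the usage lines is an assumed value. Because the outer $\min$ is applied last, $D_{\max}$ takes precedence whenever the $2C$ floor exceeds it.

```python
import math


def deployed_decoy_count(d_hat: float, r: int, c: int,
                         rho_min: float = 0.05, rho_max: float = 0.20) -> int:
    """D = min(D_max, max(D_min, floor(D_hat))), with
    D_min = max(ceil(rho_min * R), 2C) and D_max = ceil(rho_max * R)."""
    d_min = max(math.ceil(rho_min * r), 2 * c)
    d_max = math.ceil(rho_max * r)
    # The outer min() gives D_max precedence if the 2C floor exceeds it.
    return min(d_max, max(d_min, math.floor(d_hat)))


if __name__ == "__main__":
    C = 3  # hypothetical number of cloud providers
    print(deployed_decoy_count(500.0, 344, C))  # -> 69 (ceiling D_max = ceil(0.20 * 344))
    print(deployed_decoy_count(30.7, 344, C))   # -> 30 (inside the bounds)
    print(deployed_decoy_count(5.0, 344, C))    # -> 18 (floor D_min = ceil(0.05 * 344))
    print(deployed_decoy_count(4.0, 10, C))     # -> 2 (D_max wins over the 2C = 6 floor)
```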

3.3.2. Construction Layer: Statistical Decoy Alignment

Decoy fragments are constructed to align with observable fragment size $S$ and Shannon entropy $H$. Let $P_{r}(S, H)$ and $P_{d}(S, H)$ denote the joint distributions of real and decoy fragments, respectively.

The alignment objective enforces

P_{d}(S, H) \approx P_{r}(S, H).

First-order constraints:

\left| \mu_{S,r} - \mu_{S,d} \right| \to 0

\left| \mu_{H,r} - \mu_{H,d} \right| \to 0

Implementation proceeds as follows:

Decoy sizes are sampled with replacement from the empirical real-fragment size distribution to preserve its observed statistical characteristics.
Decoy payloads are generated using cryptographically secure random byte sequences to preserve entropy alignment.

Statistical indistinguishability is evaluated using two-sample Kolmogorov–Smirnov testing at $\alpha = 0.05$ [20], as reported in Section 4. The construction layer operates independently of the underlying encryption mechanisms and storage infrastructures, thereby maintaining modular compatibility across heterogeneous environments.
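The paper performs the KS test with SciPy (`scipy.stats.ks_2samp` returns the D-statistic and a p-value). As a dependency-free illustration of what the D-statistic measures, the sketch below computes it directly from the two empirical CDFs and compares it with the standard large-sample critical value at $\alpha = 0.05$. This asymptotic threshold is an approximation and is not necessarily what SciPy uses for small samples; the function names and sample values are hypothetical.

```python
import math


def ks_2samp_d(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted the gap can only shrink, so d is final.
    return d


def ks_critical_value(n: int, m: int, alpha: float = 0.05) -> float:
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2), about 1.358 at alpha = 0.05."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


if __name__ == "__main__":
    real = [200, 201, 199, 200, 202, 198, 200, 201]  # hypothetical sizes (KB)
    decoy = [200, 199, 201, 200, 202, 200]
    d = ks_2samp_d(real, decoy)
    print(d, "separable" if d > ks_critical_value(len(real), len(decoy)) else "not separable")
```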

3.4. Design Goals

The proposed method is guided by the following design goals:

1. Modularity and System Agnosticism:

The method operates as a modular post-fragmentation security layer. It does not require modification of cryptographic primitives, fragmentation algorithms, key management schemes, or storage provider architectures. This ensures compatibility across heterogeneous hybrid and multi-cloud environments.

2. Reconstruction Resistance:

The approach degrades adversarial reconstruction and statistical filtering attacks by increasing fragment-level uncertainty and reducing reliable distinguishability between real and decoy fragments under observable size and entropy characteristics.

3. Adaptive Resource Awareness:

Decoy quantity is determined dynamically using an AI-based regression model informed by system telemetry, workload descriptors, and deployment configuration parameters. This enables reconstruction resistance to adapt under varying operational conditions while maintaining bounded overhead.

4. Minimal Integration Overhead:

The method is designed for practical deployment with bounded computational and storage overhead. Decoy insertion and statistical alignment operate without altering upstream encryption or fragmentation workflows. Runtime inference incurs negligible latency relative to encryption and network transmission costs.

4. Results

The evaluation assumes upstream encryption and fragmentation have already been applied; the proposed method operates on externally produced fragment sets without modifying encryption mechanisms. This section evaluates the effectiveness of the AI-controlled modular decoy generation method against adversarial reconstruction and filtering attacks. The analysis examines how statistically aligned decoy insertion affects an external adversary’s ability to reduce reconstruction uncertainty when analyzing externally exposed fragment sets. Consistent with the threat model defined in Section 3.2, the evaluation assumes an offline adversary with full access to externally exposed fragments, but without access to encryption keys, internal control logic, decoy-generation parameters, or trusted user-side metadata.

4.1. Experimental Setup

The proposed decoy-generation method is evaluated using a hybrid storage pipeline that integrates encrypted data fragmentation, AI-controlled decoy insertion, and distributed external storage placement. All evaluation is performed at the fragment level, reflecting the adversary’s observable view under the threat model defined in Section 3.2; this fragment-centric evaluation is consistent with reconstruction-focused assessment paradigms in distributed storage security research.

To assess scalability, experiments were conducted across three deployment scales:

Small: N = 39 total fragments;
Medium: N = 66 total fragments;
Large: N = 413 total fragments.

Here, $N$ denotes the total number of externally exposed fragments, including both real and decoy fragments ($N = R + D$), where $R$ represents the number of real fragments produced by the upstream fragmentation mechanism and $D$ denotes the number of inserted decoy fragments. For example, in the Large deployment, the evaluated configuration contains $R = 344$ real fragments and $D = 69$ decoy fragments, yielding $N = 413$.

Each deployment scale and decoy policy was evaluated under a controlled and reproducible configuration. For policies incorporating randomness (e.g., RANDOM), a fixed random seed was used. Results are reported for representative configurations at each scale, emphasizing reconstruction behavior and cross-policy comparisons rather than variance estimation across repeated trials.

Four decoy policies were evaluated:

AI (proposed): decoy quantity determined by the telemetry-aware regression model.
FIXED10: $D = 0.10 R$ .
FIXED20: $D = 0.20 R$ .
RANDOM: $D$ selected randomly within the same bounded ratio range as the proposed method.
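For reproducibility, the baseline policies can be expressed as a small function. How fractional counts are rounded is not stated, so `math.ceil` is an assumption here, as is omitting the $2C$ floor for the RANDOM policy; the fixed seed mirrors the fixed-seed setup described above, but its value and the function name are arbitrary.

```python
import math
import random


def baseline_decoy_count(policy: str, r: int, rng: random.Random,
                         rho_min: float = 0.05, rho_max: float = 0.20) -> int:
    """Decoy count D for the non-adaptive baselines.

    Rounding up with ceil() is an assumption; the text gives D = 0.10R and
    D = 0.20R without stating how fractional counts are handled.
    """
    if policy == "FIXED10":
        return math.ceil(0.10 * r)
    if policy == "FIXED20":
        return math.ceil(0.20 * r)
    if policy == "RANDOM":
        # Uniform integer in the same bounded ratio range as the AI policy.
        return rng.randint(math.ceil(rho_min * r), math.ceil(rho_max * r))
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    rng = random.Random(2026)  # fixed, arbitrary seed for the RANDOM policy
    for p in ("FIXED10", "FIXED20", "RANDOM"):
        print(p, baseline_decoy_count(p, 344, rng))
```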

This configuration enables direct comparison between adaptive telemetry-aware decoy control and non-adaptive baselines, consistent with limitations of static decoy strategies discussed in prior work [4,5].

Direct quantitative comparison with previously published decoy-based storage systems is limited by the absence of standardized fragment-level statistical indistinguishability metrics and reconstruction recall–complexity modeling in prior literature. Existing studies primarily report architectural designs or qualitative resilience characteristics without standardized fragment-level KS testing or combinatorial reconstruction analysis. Therefore, evaluation in this work is conducted against representative fixed-ratio and randomized decoy policies reflecting strategies described in prior research. These baselines provide a controlled and reproducible reference for assessing statistical separability and reconstruction resistance under progressively stronger adversarial assumptions.

The AI-based controller was implemented using supervised regression trained on hybrid telemetry datasets derived from multi-source system conditions. Features included processor utilization, memory availability, disk capacity, upload bandwidth, network latency, packet loss, file size, split configuration parameters, and cloud distribution factors. Model selection compared Linear Regression, Neural Network, Random Forest, and XGBoost using MAE, RMSE, and $R^{2}$. XGBoost achieved the strongest generalization performance among the evaluated models and was therefore selected for deployment.
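The three model-selection metrics are standard; a minimal standard-library version is sketched below for reference (in practice, `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error`, and `r2_score`). The function name is hypothetical.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """MAE, RMSE, and coefficient of determination R^2 for decoy-count predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)                   # residual sum of squares
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
        "R2": 1.0 - sse / sst,
    }


if __name__ == "__main__":
    print(regression_metrics([20, 35, 50, 69], [22, 33, 51, 69]))  # hypothetical counts
```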

All evaluations focus exclusively on adversarial reconstruction behavior, abstracting away encryption performance and provider-specific storage characteristics. This abstraction isolates the security contribution of AI-controlled decoy generation and is consistent with modular overlay evaluation strategies in prior work [9].

4.2. Baseline Configuration

As a baseline, reconstruction analysis is first conducted on fragmented data without decoy insertion. This configuration represents fragmentation-based storage systems in which only real fragments are externally stored, consistent with prior secure storage studies [11]. Baseline measurements provide reference distributions for fragment size and entropy, which serve as comparison points for evaluating how decoy insertion alters adversarial filtering behavior and reconstruction uncertainty. The observable distribution of real fragment sizes in the baseline configuration for the Large deployment is shown in Figure 4.

The histogram illustrates the observable size distribution of exposed real fragments in the absence of decoys. Most fragments cluster around the configured split size (≈200 KB), reflecting the upstream fragmentation mechanism. The corresponding distributions of fragment size and Shannon entropy for real and AI-generated decoy fragments in the Large deployment are presented in Figure 5a,b.

In Figure 5a, substantial overlap is observed around the configured split size, indicating that fragment size alone does not provide reliable statistical separation.

In Figure 5b, both fragment types cluster near 8 bits/byte, consistent with high-entropy payloads. The overlap indicates that Shannon entropy alone does not provide reliable separation. Prior analyses have shown that compressed and encrypted data may still exhibit detectable statistical regularities under certain analytical models, reinforcing the need for joint feature alignment beyond single-metric evaluation [21].
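Shannon entropy in bits/byte is $H = -\sum_{b} p_b \log_2 p_b$ over the 256 possible byte values. It is therefore bounded by 8, and ciphertext sits close to that bound. The sketch below computes $H$ and pairs it with one hypothetical way to build decoys that match real fragments on both size and entropy: sizes are resampled from the real fragments, and contents come from `os.urandom`. This is an illustrative assumption, not the construction layer of Section 3.3.2. `make_aligned_decoys` is a name introduced here.

```python
import math
import os
import random
from collections import Counter
from typing import List, Sequence

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits/byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def make_aligned_decoys(real_fragments: Sequence[bytes], count: int) -> List[bytes]:
    """Hypothetical decoy construction: each decoy takes a size drawn from the
    real fragments and is filled with OS randomness (near-maximal entropy)."""
    rng = random.SystemRandom()
    sizes = [len(f) for f in real_fragments]
    return [os.urandom(rng.choice(sizes)) for _ in range(count)]

# Random bytes stand in for ~200 KB ciphertext fragments.
real = [os.urandom(200_000) for _ in range(4)]
decoys = make_aligned_decoys(real, count=2)
for label, frag in [("real", real[0]), ("decoy", decoys[0])]:
    print(label, len(frag), round(shannon_entropy(frag), 4))
```

Resampling sizes from the observed fragments aligns the size distribution by construction. Random contents keep entropy indistinguishable from ciphertext at the single-metric level.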

4.3. Adversarial Reconstruction Attacks

Three reconstruction attack scenarios model progressively increasing adversarial capability. Recent reconstruction studies have shown that expected-distribution modeling and partially known dataset assumptions can reduce adversarial uncertainty under leakage conditions [22,23]. Multi-dimensional response-hiding analyses further indicate that cross-dimensional leakage interactions may enable iterative refinement of sensitive query structures [24]. The three attacks reflect distinct levels of adversarial knowledge and filtering sophistication [4] and collectively evaluate how AI-controlled decoy insertion reduces reconstruction effectiveness.

4.3.1. Attack A: Naive Reconstruction

Attack A models an adversary that performs reconstruction without statistical filtering or prioritization. All fragments are treated as potentially authentic. Kolmogorov–Smirnov (KS) tests compare fragment size and entropy distributions across deployment scales. The resulting D-statistics and p-values are summarized in Table 2.
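The two-sample KS statistic is the largest vertical gap between the two empirical CDFs: $D = \sup_x |F_{\text{real}}(x) - F_{\text{decoy}}(x)|$. Here $D$ is the test statistic, not the decoy count. The sketch computes $D$ exactly and attaches an asymptotic p-value using the Kolmogorov distribution with the common small-sample correction. It is an approximation, so it will not reproduce exact small-sample p-values. In practice, `scipy.stats.ks_2samp(data1, data2)` is the usual library routine.

```python
import math
from bisect import bisect_right
from typing import Sequence

def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic D = sup_x |F_a(x) - F_b(x)| over right-continuous ECDFs."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    # The supremum is attained at an observed data point, so checking all of them suffices.
    return max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)

def ks_pvalue_asymptotic(d: float, n: int, m: int) -> float:
    """Approximate p-value from the asymptotic Kolmogorov distribution,
    using an effective sample size n*m/(n+m); not the exact test."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # Q(lam) is indistinguishable from 1 here and the series converges poorly
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return min(max(q, 0.0), 1.0)
```

At $\alpha = 0.05$, a p-value above the threshold means the null hypothesis of identical distributions cannot be rejected. That is the criterion applied to each cell of Table 2.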

For the AI-controlled policy, the null hypothesis of identical distributions between real and decoy fragments cannot be rejected at $\alpha = 0.05$ across all deployment scales. KS D-statistics for fragment size remain below 0.04 across all scales, and p-values for both size and entropy remain above $\alpha = 0.05$, indicating no statistically significant separation.

In contrast, the FIXED10, FIXED20, and RANDOM baselines exhibit statistically significant separability in fragment size across all deployment scales, with multiple cases also showing significant entropy separation. These discrepancies indicate that non-adaptive decoy policies introduce measurable distributional differences that may enable adversarial filtering.

In the Small deployment, FIXED10 exhibits borderline entropy separability ($p \approx 0.0504$), but fragment size remains clearly separable ($p \approx 0.0012$), indicating that size-based filtering remains feasible even when entropy approaches the threshold.

Overall, Attack A establishes that the AI-controlled policy preserves statistical indistinguishability under unfiltered conditions, whereas fixed-ratio and randomized baselines introduce exploitable deviations. The distribution of KS p-values across deployment scales and decoy policies is illustrated on a logarithmic scale in Figure 6.

In Figure 6, p-values for the AI-controlled policy remain above the $\alpha = 0.05$ threshold across all scales, while FIXED20 and RANDOM fall far below it.

4.3.2. Attack B: Statistical Filtering-Based Reconstruction

Attack B models an adversary that applies statistical filtering prior to reconstruction. After compromise, the adversary observes $N = R + D$ fragments, of which $R$ are real and $D$ are decoys, and must select the correct subset of $R$ real fragments.

The number of possible combinations is $\binom{N}{R}$. If ordering matters, each combination can be permuted in $R!$ ways, yielding a total blind reconstruction search space of

$$\binom{N}{R} \cdot R!$$
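The search-space magnitude can be evaluated exactly with Python's arbitrary-precision integers, which avoids overflow even at the Large scale. The sketch is a direct transcription of the formula above. The $(R, D)$ pairs are the Small (AI) and Large values reported in this paper, and `blind_search_log10` is a name introduced here for illustration.

```python
import math

def blind_search_log10(real: int, decoys: int, ordered: bool = True) -> float:
    """log10 of the blind search space for N = R + D exposed fragments.

    Unordered: C(N, R) candidate subsets. Ordered: C(N, R) * R!, since each
    subset can additionally be arranged in R! orders.
    """
    n = real + decoys
    count = math.comb(n, real)
    if ordered:
        count *= math.factorial(real)
    return math.log10(count)  # math.log10 accepts arbitrarily large ints

# (R, D) pairs taken from the text: Small (AI) and Large deployments.
for label, r, d in [("Small, AI", 32, 7), ("Large", 344, 69)]:
    print(f"{label}: log10 C(N,R) = {blind_search_log10(r, d, ordered=False):.1f}, "
          f"log10 C(N,R)*R! = {blind_search_log10(r, d):.1f}")
```

The ordered count equals `math.perm(N, R)`, the number of ordered selections of $R$ items from $N$, which is a useful cross-check.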

When decoys remain statistically aligned, pre-filtering cannot reliably reduce $N$ without sacrificing recall, and reconstruction complexity remains dominated by factorial scaling in $R$. The resulting blind reconstruction search-space magnitude under Attack B for representative deployment scales is illustrated in Figure 7a,b.

In the Small deployment ($R = 32$), the AI-controlled policy selects a higher decoy count ($D = 7$) than FIXED10 or RANDOM, yielding the largest reconstruction search space.

Comparison of $\log_{10}\binom{N}{R}$ and $\log_{10}\left(\binom{N}{R} \cdot R!\right)$ illustrates the dominant contribution of the permutation term $R!$.

Even moderate increases in fragment count produce multiplicative escalation in reconstruction complexity.
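Because $N - R = D$, the ordered search space has a compact closed form, and this makes the effect of a single extra decoy explicit. The derivation below is standard combinatorics applied to this section's definitions, with $R$ held fixed:

$$
\binom{N}{R} \cdot R! = \frac{N!}{R!\,(N-R)!} \cdot R! = \frac{N!}{D!},
\qquad
\frac{(N+1)!/(D+1)!}{N!/D!} = \frac{N+1}{D+1}.
$$

For the Small configuration ($R = 32$, $D = 7$, $N = 39$), one additional decoy multiplies the search space by $40/8 = 5$. For the Large configuration ($R = 344$, $D = 69$, $N = 413$), the factor is $414/70 \approx 5.9$.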

4.3.3. Attack C: Adaptive Reconstruction

Attack C models an adversary that ranks fragments using observable characteristics such as size and entropy. The adversary selects a candidate pool of size $K$ from $N$ fragments and attempts reconstruction from this reduced set.

Two metrics are evaluated:

Recall: $\text{Recall} = \frac{\text{captured real fragments}}{R}$, where $R$ is the total number of real fragments.
Post-filter reconstruction complexity: $\log_{10}\left(\binom{K}{R} \cdot R!\right)$, defined for $R \le K \le N$.
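Both metrics can be computed once the adversary's ranking is fixed. The paper does not specify the adversary's scoring rule, so the sketch assumes a hypothetical heuristic: normalized distance to an expected size and entropy profile. `Fragment`, `rank_by_profile`, and the toy fragments are illustrative names and values. The toy decoys are deliberately easy to separate, which mimics the near-vertical behavior of separable baselines.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass(frozen=True)
class Fragment:
    size: int        # bytes (observable)
    entropy: float   # bits/byte (observable)
    is_real: bool    # ground truth, hidden from the adversary; used only for scoring

def rank_by_profile(frags: Sequence[Fragment], target_size: float,
                    target_entropy: float = 8.0) -> List[Fragment]:
    """Hypothetical ranking: fragments closest to the expected profile come first."""
    def distance(f: Fragment) -> float:
        return (abs(f.size - target_size) / target_size
                + abs(f.entropy - target_entropy) / target_entropy)
    return sorted(frags, key=distance)

def recall_at_k(ranked: Sequence[Fragment], k: int) -> float:
    """Captured real fragments in the top-K pool divided by R."""
    r = sum(f.is_real for f in ranked)
    return sum(f.is_real for f in ranked[:k]) / r

def post_filter_log10(k: int, r: int) -> float:
    """log10(C(K, R) * R!); undefined when the pool cannot hold all R real fragments."""
    if k < r:
        raise ValueError("candidate pool smaller than the number of real fragments")
    return math.log10(math.comb(k, r) * math.factorial(r))

# Toy example: 8 real fragments near a 200 KB split size plus 2 easily separated decoys.
frags = [Fragment(200_000 + i, 7.999, True) for i in range(8)] + [Fragment(120_000, 7.2, False)] * 2
ranked = rank_by_profile(frags, target_size=200_000)
for k in range(8, len(frags) + 1):
    print(f"K/N={k / len(frags):.2f} recall={recall_at_k(ranked, k):.2f} "
          f"log10={post_filter_log10(k, 8):.2f}")
```

With separable decoys, recall stays at 1.0 at $K = R$, so filtering costs the adversary nothing. Aligned decoys would scatter real fragments through the ranking instead.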

If ranking successfully isolates real fragments, high recall can be achieved at small $K$, reducing reconstruction complexity. If decoys remain statistically aligned, filtering removes real fragments as well, forcing the adversary to retain larger candidate pools. The relationship between post-filter reconstruction complexity and recall, as well as the effect of filtering strength on recall, is illustrated in Figure 8a,b.

In Figure 8a, baseline policies exhibit near-vertical trajectories: recall remains ≈1.0 even as $K$ decreases, indicating that ranking heuristics can isolate real fragments. In contrast, the AI-controlled policy exhibits a sloped tradeoff: stronger filtering reduces recall, forcing the adversary to retain larger candidate pools.

In Figure 8b, baseline policies maintain high recall even at strong filtering levels. Under the AI-controlled policy, recall decreases as $K/N$ decreases, enforcing a tradeoff between filtering strength and reconstruction complexity.
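An idealized reference point helps interpret the sloped curve. Suppose decoys were perfectly indistinguishable from real fragments. Then any ranking would be no better than a uniformly random choice of $K$ out of $N$ fragments, and the number of captured real fragments would follow a hypergeometric distribution. This is a textbook calculation under that assumption and does not model the measured curves:

$$
\mathbb{E}[\text{captured}] = \frac{K R}{N}, \qquad
\mathbb{E}[\text{Recall}] = \frac{K}{N}, \qquad
\Pr[\text{Recall} = 1] = \frac{\binom{N-R}{K-R}}{\binom{N}{K}}.
$$

Under this idealization, guaranteed full recall requires $K = N$, where filtering provides no reduction in reconstruction complexity.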

These results indicate that distributional alignment under naive testing is necessary but not sufficient for robustness against adaptive ranking. Effective defenses must also prevent deterministic prioritization of real fragments under aggressive filtering by enforcing a recall–complexity tradeoff.

4.4. Results and Analysis

Experimental results show consistent degradation of adversarial reconstruction effectiveness as adversarial capability increases.

Attack A: The AI-controlled policy preserves statistical overlap between real and decoy fragments, preventing reliable separation. Fixed and randomized baselines exhibit significant separability.
Attack B: Telemetry-controlled decoy insertion amplifies combinatorial reconstruction complexity. Small deployments experience substantial expansions in search space, while large deployments exhibit rapid growth due to factorial scaling.
Attack C: Adaptive ranking fails to deterministically isolate real fragments under the AI-controlled policy. High recall requires retaining larger candidate pools, preserving substantial reconstruction complexity. Baseline policies do not enforce this tradeoff.

Across all scenarios, AI-controlled decoy generation provides layered defense by:

Amplifying combinatorial uncertainty through controlled fragment-count expansion;
Degrading inference-based filtering by preserving statistical alignment.

Together, these results show that statistical indistinguishability alone is insufficient unless accompanied by adaptive decoy scaling, which jointly enforces combinatorial expansion and filtering resistance.

Table 3 consolidates the layered defensive effects observed across Attacks A–C. As adversarial capability increases from naive to adaptive filtering, the AI-controlled policy preserves reconstruction uncertainty by maintaining statistical alignment, amplifying combinatorial effort, and enforcing recall–complexity tradeoffs.

4.5. Summary of Findings

The evaluation confirms that AI-controlled modular decoy generation degrades adversarial reconstruction capability across naive, statistical, and adaptive attack models. Telemetry-aware decoy-quantity selection, combined with statistically aligned decoy construction, increases reconstruction uncertainty while preserving bounded storage overhead.

Across deployment scales, Kolmogorov–Smirnov testing at $\alpha = 0.05$ indicates no statistically significant separation between real and decoy fragments under the AI-controlled policy, whereas fixed-ratio and randomized baselines exhibit statistically significant separability in fragment size and, in several cases, entropy. These results are achieved without modifying encryption schemes, fragmentation mechanisms, or storage-provider infrastructure, reinforcing the method’s role as a system-agnostic security overlay.

By jointly integrating telemetry-aware decoy scaling with empirically validated distributional alignment, the proposed approach transforms decoy insertion from heuristic obfuscation into a measurable reconstruction-resistance mechanism under post-exposure analytical threat conditions.

5. Discussion

This section interprets the experimental findings and situates the proposed AI-controlled modular decoy generation method within the broader context of secure storage systems vulnerable to reconstruction and filtering attacks. The discussion emphasizes how adaptive decoy quantity control combined with statistical alignment addresses limitations of static and heuristic decoy strategies while preserving compatibility with heterogeneous storage environments.

5.1. Effectiveness Against Adversarial Filtering

The results demonstrate that decoy effectiveness depends on both decoy quantity and statistical alignment with observable fragment-level characteristics. Prior decoy-based defenses relying on static or randomly generated artifacts are often vulnerable to statistical filtering once adversaries analyze fragment size distributions, entropy values, or structural regularities [25].

In contrast, the proposed approach preserves substantial distributional overlap between real and decoy fragments across deployment scales. This overlap reduces the reliability of size- and entropy-based filtering strategies and limits the adversary’s ability to deterministically isolate real fragments using observable features. As shown in Attacks A and C, statistical alignment reduces feature-based separability, forcing the adversary to retain larger candidate pools to maintain recall.

These findings indicate that decoy indistinguishability should be evaluated under adaptive filtering conditions rather than solely through naive distributional testing.

5.2. Role of AI-Controlled Adaptation

The telemetry-aware control model enables adaptive selection of decoy quantity based on system telemetry, workload characteristics, and deployment configuration [13,26]. This adaptive behavior maintains defensive effectiveness while avoiding unnecessary overhead under varying operational constraints.

Unlike fixed-ratio decoy policies, the proposed method dynamically adjusts decoy insertion in response to changing system states, supporting sustained reconstruction resistance across exposed storage environments [19]. This adaptivity is central to balancing reconstruction complexity with operational efficiency.

Importantly, adaptive control affects not only the number of decoys but also the adversary’s effective search space under Attacks B and C. Because combinatorial reconstruction complexity increases with fragment count, telemetry-aware decoy selection directly influences adversarial workload while preserving bounded overhead.

5.3. Comparison with Prior Decoy-Based Defenses

Existing decoy-based storage defenses typically rely on fixed heuristics or static configurations that do not account for adversarial evolution or system variability [6]. Such approaches may increase superficial uncertainty but remain vulnerable to informed filtering when distributional discrepancies exist.

The proposed method differs by integrating AI-based quantity control with statistically aligned decoy construction. This combination enables resistance against progressively stronger adversarial models without modifying encryption algorithms, fragmentation strategies, or storage-provider mechanisms.

By separating decoy control logic from underlying cryptographic operations, the method functions as a deployable security overlay rather than a tightly coupled architectural redesign. This modularity facilitates integration into hybrid and distributed storage environments while preserving compatibility with existing cryptographic workflows.

5.4. Practical Deployment Considerations

Because the decoy generation logic operates as a modular overlay, it can be integrated into existing hybrid, distributed, or multi-cloud storage pipelines without changes to encryption, fragmentation, or storage-provider infrastructure. This portability distinguishes the proposed approach from tightly coupled security mechanisms requiring provider-side support or architectural modification [8].

Decoy insertion increases external payload size and therefore increases storage and transmission cost. In this work, decoy quantity is explicitly bounded, with the decoy-to-real ratio constrained to 5–20% to preserve predictable overhead across deployments. This ratio represents an explicit and configurable tradeoff between reconstruction resistance and storage cost.

For example, in the Large deployment configuration, $R = 344$ and $D = 69$ ($N = 413$), so the decoy overhead is $D/R = 69/344 \approx 0.20$ (approximately 20%). This corresponds to a storage amplification factor of $N/R \approx 1.20$ when decoys are size-aligned with real fragments. Because transmission overhead scales linearly with decoy count, the increase in storage and network cost remains proportional and analytically predictable under adaptive quantity control.
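The bounded quantity control can be sketched as a clamp around a learned predictor. Several parts of this sketch are assumptions:

- The trained XGBoost regressor is replaced by a hypothetical `stub_model`.
- The model is assumed to output a decoy-to-real ratio; the text does not specify the regression target.
- The `Telemetry` field names and units are illustrative renderings of the features listed in Section 4.1.
- Rounding to the nearest integer is assumed.

Only the 5–20% bounds and $R = 344$ come from the text.

```python
from dataclasses import dataclass
from typing import Callable

MIN_RATIO, MAX_RATIO = 0.05, 0.20  # decoy-to-real bounds stated in Section 5.4

@dataclass
class Telemetry:
    # Illustrative feature set; names and units are assumptions.
    cpu_util: float          # fraction in [0, 1]
    mem_available_gb: float
    disk_free_gb: float
    upload_mbps: float
    latency_ms: float
    packet_loss: float       # fraction in [0, 1]
    file_size_mb: float
    split_size_kb: float
    cloud_count: int         # stand-in for "cloud distribution factors"

def decoy_count(real: int, telemetry: Telemetry,
                predict_ratio: Callable[[Telemetry], float]) -> int:
    """Clamp a predicted decoy-to-real ratio to [5%, 20%] and convert it to a count."""
    ratio = min(max(predict_ratio(telemetry), MIN_RATIO), MAX_RATIO)
    return round(ratio * real)

def stub_model(t: Telemetry) -> float:
    """Hypothetical stand-in for the trained regressor: more headroom, more decoys."""
    headroom = (1.0 - t.cpu_util) * min(t.upload_mbps / 100.0, 1.0) * (1.0 - t.packet_loss)
    return 0.25 * headroom

t = Telemetry(cpu_util=0.2, mem_available_gb=16.0, disk_free_gb=500.0, upload_mbps=200.0,
              latency_ms=20.0, packet_loss=0.0, file_size_mb=68.8, split_size_kb=200.0,
              cloud_count=3)
R = 344
D = decoy_count(R, t, stub_model)
print(f"D = {D}, overhead D/R = {D / R:.3f}, amplification N/R = {(R + D) / R:.3f}")
```

With any predicted ratio at or above the 20% cap, $R = 344$ yields $D = 69$ under nearest-integer rounding, which is consistent with the Large configuration above. The clamp guarantees the overhead bound regardless of what the learned model outputs.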

This bounded overhead is associated with increased reconstruction uncertainty and substantial combinatorial complexity under Attacks A–C, establishing a measurable and controllable security–overhead tradeoff.

5.5. Limitations and Scope

The evaluation focuses on reconstruction and filtering attacks under post-breach conditions in which adversaries obtain externally exposed encrypted fragments but lack cryptographic keys or privileged system access. The analysis does not model insider threats, cryptographic key compromise, or full system intrusion.

Exhaustive brute-force enumeration of all fragment combinations is not explicitly simulated due to computational infeasibility at large scales; instead, reconstruction difficulty is characterized analytically through combinatorial modeling. In such settings, adversarial effectiveness is governed primarily by filtering efficiency rather than raw enumeration capacity.

The method assumes that decoy-generation parameters and trusted metadata remain separated from externally exposed fragment storage. If both fragment data and complete decoy-mapping metadata were compromised, reconstruction resistance would depend primarily on the guarantees of the underlying encryption scheme.

Additionally, the current evaluation models adversarial filtering primarily based on observable statistical characteristics. More advanced adversaries employing machine learning-based discrimination strategies could extend the threat model, as demonstrated in recent research on inference-based attacks and defense mechanisms in machine learning systems [27,28]. Future work may investigate robustness under adversarial learning and adaptive model-based filtering. The current evaluation does not include repeated-trial variance estimation or confidence interval analysis; such statistical robustness assessment is reserved for future work.

Future research may also explore complementary defenses addressing integrity disruption, availability degradation, coordinated multi-vector attacks, and redundancy-aware hybrid storage architectures.

Overall, the proposed method advances decoy-based reconstruction resistance by integrating adaptive quantity control with statistically validated fragment alignment under bounded operational constraints.

6. Conclusions

This work presented an AI-controlled modular decoy generation approach for enhancing reconstruction resistance in externally exposed storage environments. The proposed method operates as a system-agnostic security overlay that augments encrypted and fragmented data with statistically aligned decoy fragments prior to external storage, without modifying encryption schemes, fragmentation strategies, or storage architectures.

Experimental evaluation across naive, statistical filtering, and adaptive adversarial models demonstrated that AI-controlled decoy insertion degrades reconstruction effectiveness under the defined post-exposure threat assumptions. In distributional testing (Attack A), real and decoy fragments exhibited no statistically significant separation in observable size and entropy characteristics. Combinatorial reconstruction modeling (Attack B) showed that telemetry-controlled decoy quantity selection expanded the blind reconstruction search space relative to fixed and randomized baselines. Under adaptive ranking (Attack C), the proposed approach enforced a measurable recall–complexity tradeoff, limiting deterministic isolation of real fragments through heuristic filtering.

A central contribution of this work is the integration of AI-based, telemetry-aware control for adaptive decoy quantity selection. Unlike fixed or heuristic decoy policies, the proposed method dynamically adjusts decoy insertion in response to system and workload conditions while enforcing bounded ratio constraints (5–20%) to maintain predictable storage amplification across deployment scales. This adaptive control enables sustained defensive effectiveness without excessive operational overhead.

Importantly, the decoy generation mechanism remains independent of specific cryptographic primitives, fragmentation algorithms, and storage-provider implementations. This modularity supports deployment as a portable security overlay that complements existing protection mechanisms rather than replacing them.

Overall, the results indicate that AI-controlled, distribution-aware decoy generation provides a practical and scalable mechanism for increasing reconstruction uncertainty under post-breach exposure scenarios at the evaluated deployment scales. By combining adaptive quantity control with empirically validated statistical alignment, the proposed framework establishes a measurable security–overhead tradeoff across naive, filtering-based, and adaptive adversarial models. Future work will investigate robustness against learning-based adversaries and explore extensions addressing integrity, availability, and coordinated multi-vector threat scenarios.

Author Contributions

M.A.: conceptualization, methodology, system design and implementation, data collection and analysis, software development, visualization, and manuscript preparation. J.-S.Y.: supervision, validation, methodological guidance, manuscript review, and final approval. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study used publicly available system performance characteristics and controlled synthetic fragment modeling. No human or animal subjects were involved, and no personally identifiable information was processed; therefore, institutional ethical approval was not required. Generative AI tools were used solely for grammar and stylistic refinement. All methodological design, experimental implementation, data processing, statistical evaluation, and interpretation were conducted by the authors.

Data Availability Statement

The data, trained models, and evaluation scripts are available from the corresponding author upon reasonable request.

Acknowledgments

The authors extend appreciation to the University of Central Florida for providing academic and technical support. Various digital tools were used responsibly to assist with formatting, data visualization, and language refinement. All outputs were carefully reviewed and verified by the authors, who assume full responsibility for the final manuscript content.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Markatou, E.A.; Falzon, F.; Tamassia, R.; Schor, W. Reconstructing with Less: Leakage Abuse Attacks in Two Dimensions. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS ’21); ACM: New York, NY, USA, 2021; pp. 2243–2261.
Skračić, K.; Petrović, J.; Pale, P. Classification of Low- and High-Entropy File Fragments Using Randomness Measures and Discrete Fourier Transform Coefficients. Vietnam. J. Comput. Sci. 2023, 10, 433–462.
Xu, L.; Duan, H.; Zhou, A.; Yuan, X.; Wang, C. Interpreting and Mitigating Leakage-Abuse Attacks in Searchable Symmetric Encryption. IEEE Trans. Inf. Forensics Secur. 2021, 16, 5310–5325.
Kornaropoulos, E.M.; Papamanthou, C.; Tamassia, R. Response-Hiding Encrypted Ranges: Revisiting Security via Parametrized Leakage-Abuse Attacks. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP); IEEE: New York, NY, USA, 2021.
Yang, Y.; Fan, H.; Zhang, J.; Li, B.; Ma, H.; Gu, X. SMSSE: Size-Pattern Mitigation Searchable Symmetric Encryption. IEEE Trans. Inf. Forensics Secur. 2025, 20, 3176–3189.
Farrag, M.; Sayed, S.G.; Zamzam, M. Bluffing the Hackers: Automated Decoy Creation and Real-Time Cyber Deception. In Proceedings of the 7th International Conference on Signal Processing and Information Security (ICSPIS 2024); IEEE: New York, NY, USA, 2024.
Russo, S.; Zanasi, C.; Colajanni, M. Cyber Defense through Strategic Dynamic Deception. In Proceedings of the 17th International Conference on Cyber Conflict (CyCon 2025); NATO CCDCOE: Tallinn, Estonia, 2025.
Du, R.; Tai, Y.; Li, M. A Highly Accurate Statistical Attack against Searchable Symmetric Encryption. In Proceedings of the IEEE International Conference on Parallel and Distributed Systems (ICPADS); IEEE: New York, NY, USA, 2023.
Xu, L.; Zhou, A.; Duan, H.; Wang, C.; Wang, Q.; Jia, X. Toward Full Accounting for Leakage Exploitation and Mitigation in Dynamic Encrypted Databases. IEEE Trans. Dependable Secur. Comput. 2024, 11, 1918–1934.
Hofmann, J.; Truong, K.T. End-to-End Encrypted Cloud Storage in the Wild: A Broken Ecosystem. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS ’24); ACM: New York, NY, USA, 2024; pp. 3988–4001.
Loh, R.; Thing, V.L.L. Data Privacy in Multi-Cloud: An Enhanced Data Fragmentation Framework. In Proceedings of the 18th International Conference on Privacy, Security and Trust (PST); IEEE: New York, NY, USA, 2021; pp. 1–5.
Tang, X.; Jin, L. Data Splitting Based Double Layer Encryption for Secure Ciphertext Deduplication in Cloud Storage. In Proceedings of the IEEE International Conference on Cloud Computing (CLOUD); IEEE: New York, NY, USA, 2024; pp. 153–163.
Ahmed, M.; Yuan, J.-S. AI-Driven Hybrid Architecture for Secure, Reconstruction-Resistant Multi-Cloud Storage. Future Internet 2026, 18, 70.
Zhu, M.; Anwar, A.H.; Wan, Z.; Cho, J.-H.; Kamhoua, C.A.; Singh, M.P. A Survey of Defensive Deception: Approaches Using Game Theory and Machine Learning. IEEE Commun. Surv. Tutor. 2021, 23, 2460–2493.
Adawadkar, A.M.K.; Kulkarni, N. Cyber-Security and Reinforcement Learning—A Brief Survey. Eng. Appl. Artif. Intell. 2022, 114, 105116.
Khan, I.; Ghani, A.; Saqlain, S.M.; Ashraf, M.U.; Alzahrani, A.; Kim, D.-H. Secure Medical Data Against Unauthorized Access Using Decoy Technology in Distributed Edge Computing Networks. IEEE Access 2023, 11, 144560–144573.
Aminuddin Mohd Kamal, A.A.; Okada, M.; Fujisawa, M. Privacy-Preserving Keyword Search with Access Control for Secret Sharing-Based Data Outsourcing. IEEE Access 2025, 13, 73625–73651.
De Gaspari, F.; Hitaj, D.; Pagnotta, G.; De Carli, L.; Mancini, L.V. Reliable Detection of Compressed and Encrypted Data. Neural Comput. Appl. 2022, 34, 20379–20393.
Zambianco, M.; Facchinetti, C.; Siracusa, D. Resource-Aware Cyber Deception for Microservice-Based Applications. IEEE Trans. Serv. Comput. 2024, 17, 4211–4224.
Tian, H.; Huang, G. Research on Distributed Secure Storage Framework of Industrial Internet of Things Data Based on Blockchain. Electronics 2024, 13, 4812.
Seth, B.; Dalal, S.; Jaglan, V.; Mohan, S.; Srivastava, G. Integrating Encryption Techniques for Secure Data Storage in the Cloud. Trans. Emerg. Telecommun. Technol. 2022, 33, e4108.
Izhikevich, K.; Voelker, G.M.; Savage, S.; Izhikevich, L. Using Honeybuckets to Characterize Cloud Storage Scanning in the Wild. In Proceedings of the IEEE European Symposium on Security and Privacy (EuroS&P); IEEE: New York, NY, USA, 2024; pp. 95–113.
Du, R.; She, X.; Li, M.; Wang, Z. Full Database Reconstruction: Leakage-Abuse Attacks Based on Expected Distributions. In Advanced Intelligent Computing Technology and Applications (ICIC 2024); Springer: Singapore, 2024; pp. 110–121.
Ning, J.; Huang, X.; Poh, G.S.; Yuan, J.; Li, Y.; Weng, J.; Deng, R.H. LEAP: Leakage-Abuse Attack on Efficiently Deployable, Efficiently Searchable Encryption with Partially Known Dataset. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS ’21); ACM: New York, NY, USA, 2021; pp. 2307–2320.
Markatou, E.A.; Falzon, F.; Espiritu, Z.; Tamassia, R. Attacks on Encrypted Response-Hiding Range Search Schemes in Multiple Dimensions. Proc. Priv. Enhancing Technol. 2023, 2023, 200–219.
Yang, Z.; Zhu, Y.; Wan, J.; Xiang, C.; Tang, T.; Wang, Y.; Xu, R.; Wang, L.; Zhang, F.; Xu, J.; et al. Defending Data Inference Attacks Against Machine Learning Models by Mitigating Prediction Distinguishability. IEEE Trans. Dependable Secur. Comput. 2025, 22, 2687–2704.
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16); ACM: New York, NY, USA, 2016; pp. 785–794.
Massey, F.J. The Kolmogorov–Smirnov Test for Goodness of Fit. J. Am. Stat. Assoc. 1951, 46, 68–78.

Figure 1. Conceptual overview of the AI-controlled modular decoy generation method overlay.

Figure 2. System workflow and placement of the AI-controlled decoy overlay within a representative hybrid multi-cloud pipeline.

Figure 3. AI-controlled decoy generation framework comprising telemetry-aware quantity selection and statistical alignment-based decoy construction.

Figure 4. Baseline real fragment size distribution for the Large deployment (R = 344). The evaluation focuses on adversarial reconstruction behavior rather than stochastic variance estimation.

Figure 5. (a) Fragment size distributions of real and AI-generated decoy fragments (Large deployment). (b) Shannon entropy distributions of real and AI-generated decoy fragments (Large deployment).

Figure 6. Log-scale visualization of KS p-values under Attack A.

Figure 7. (a) Blind reconstruction search-space magnitude under Attack B (Small deployment). (b) Blind reconstruction search-space magnitude under Attack B (Large, AI).

Figure 8. (a) Post-filter Reconstruction Complexity vs. Recall (Large). (b) Recall vs. Filter Strength (K/N) (Large).

Table 1. Comparative analysis of decoy and storage defense strategies.

Method Category	Decoy Strategy	Adaptability	Statistical Alignment	System Integration Requirements	Reconstruction Resistance	Scalability	Filtering Resistance	References
Static Decoys	Pre-generated or fixed-pattern decoys	None	None; size/entropy often mismatched	Minimal; post-processing only	Weak—separable via size/entropy	Low	Very Weak—trivial filtering	[2,9]
Random Decoys	Random padding or random-size decoys	Low	Weak; randomness does not ensure distributional matching	Minimal	Weak—statistically separable	Low–Moderate	Weak—entropy/size mismatch exploitable	[9]
Fixed-Ratio Decoys	Constant decoy-to-real ratio (e.g., 10%, 20%)	None	None; ratio does not enforce distributional alignment	Minimal	Weak—predictable patterns	Moderate	Weak—predictable statistical patterns	[8]
Entropy-Matched Decoys	Entropy-aware decoy generation	Low–Moderate	Partial; entropy aligned but size often mismatched	Moderate; partial integration needed	Moderate—entropy aligned; size vulnerable	Moderate	Moderate—size filtering still effective	[10]
Architecture-Coupled Defenses	Blockchain-assisted or protocol-bound storage security	Architecture-dependent	Not applicable (focus on protocol integrity)	High; requires modifying storage or protocol layers	Strong for integrity; limited against post-exposure analysis	High	Moderate—depends on protocol design	[13,14]
Proposed AI-Controlled Overlay	Telemetry-aware joint size–entropy decoy insertion	High	Strong; joint size–entropy alignment validated via KS testing (α = 0.05)	Low; modular post-fragmentation overlay	Strong—resistant to naïve/statistical/adaptive filtering	High—validated across small–large deployments	Strong—resistant to naïve, statistical, and adaptive filtering	This work

Table 2. KS D-statistics and p-values for fragment size and entropy under naive reconstruction (Attack A), evaluated across Small, Medium, and Large deployments and decoy policies (α = 0.05).

Deployment	Decoy Policy	KS p (Size)	KS p (Entropy)	KS D (Size)	KS D (Entropy)
Small	AI	1	0.287762311	0.03125	0.383928571
Small	FIXED10	3.416 × 10⁻³	0.043878268	0.96875	0.75
Small	FIXED20	3.47 × 10⁻⁵	5.99833 × 10⁻⁴	0.96875	0.833333333
Small	RANDOM	3.47 × 10⁻⁵	1.089685 × 10⁻³	0.96875	0.802083333
Medium	AI	1	0.357522279	0.018181818	0.290909091
Medium	FIXED10	1.25 × 10⁻⁵	1.94 × 10⁻⁵	0.981818182	0.963636364
Medium	FIXED20	6.50 × 10⁻⁹	6.50 × 10⁻⁹	0.981818182	0.981818182
Medium	RANDOM	6.50 × 10⁻⁹	7.47 × 10⁻⁵	0.981818182	0.709090909
Large	AI	1	0.701729245	0.002906977	0.091464442
Large	FIXED10	1.67 × 10⁻²⁸	7.22 × 10⁻²³	0.997093023	0.891415869
Large	FIXED20	7.84 × 10⁻⁵²	2.94 × 10⁻³⁹	0.997093023	0.866616111
Large	RANDOM	7.84 × 10⁻⁵²	1.47 × 10⁻³²	0.997093023	0.788338389

Table 3. Structured comparison of reconstruction attack scenarios, adversarial strategies, evaluation indicators, and the observed impact of AI-controlled decoy generation.

Attack Scenario	Adversary Capability	Primary Strategy	Key Evaluation Indicator	Observed Effect of AI-Controlled Decoys
Attack A (Unfiltered Reconstruction)	Low	Blind fragment combination without filtering	Statistical indistinguishability (KS test on size and entropy, α = 0.05)	Real and decoy fragments remain statistically indistinguishable, limiting reliable filtering based on observable characteristics.
Attack B (Statistical Filtering)	Medium	Size- and entropy-based fragment filtering followed by blind reconstruction	log10(C(N, R) × R!), where N = R + D	Telemetry-controlled decoy insertion increases fragment cardinality (N), amplifying blind reconstruction complexity through expansion of C(N, R) × R!.
Attack C (Adaptive Filtering)	High	Iterative heuristic-driven ranking and prioritization	Recall = Captured Real Fragments/R, and log10(C(K, R) × R!), where K ≤ N	Statistically aligned decoys enforce a measurable recall–complexity tradeoff, limiting deterministic isolation of real fragments under adaptive ranking.


Share and Cite

MDPI and ACS Style

Ahmed, M.; Yuan, J.-S. AI-Controlled Modular Decoy Generation for Reconstruction-Resistant Hybrid and Multi-Cloud Storage Systems. Electronics 2026, 15, 1231. https://doi.org/10.3390/electronics15061231

AMA Style

Ahmed M, Yuan J-S. AI-Controlled Modular Decoy Generation for Reconstruction-Resistant Hybrid and Multi-Cloud Storage Systems. Electronics. 2026; 15(6):1231. https://doi.org/10.3390/electronics15061231

Chicago/Turabian Style

Ahmed, Munir, and Jiann-Shiun Yuan. 2026. "AI-Controlled Modular Decoy Generation for Reconstruction-Resistant Hybrid and Multi-Cloud Storage Systems." Electronics 15, no. 6: 1231. https://doi.org/10.3390/electronics15061231

APA Style

Ahmed, M., & Yuan, J.-S. (2026). AI-Controlled Modular Decoy Generation for Reconstruction-Resistant Hybrid and Multi-Cloud Storage Systems. Electronics, 15(6), 1231. https://doi.org/10.3390/electronics15061231

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

