 
 
Article

Enhancement of the Generation Quality of Generative Linguistic Steganographic Texts by a Character-Based Diffusion Embedding Algorithm (CDEA)

1 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
2 School of Cybersecurity, Tarim University, Alar City 843300, China
3 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
4 School of Cyber Science and Engineering, Nanjing University of Science and Technology, Wuxi 214443, China
5 Zhongke Yungang Technology Co., Ltd., Beijing 100010, China
6 Beijing Lianan Hengda Technology Co., Ltd., Beijing 100141, China
* Authors to whom correspondence should be addressed.
PhD Alumni, Department of Electronic Engineering, Tsinghua University, Beijing 100190, China.
Appl. Sci. 2025, 15(17), 9663; https://doi.org/10.3390/app15179663
Submission received: 30 July 2025 / Revised: 28 August 2025 / Accepted: 29 August 2025 / Published: 2 September 2025
(This article belongs to the Special Issue Cyber Security and Software Engineering)

Abstract

Generative linguistic steganography aims to produce texts that remain both perceptually and statistically imperceptible. The existing embedding algorithms often suffer from imbalanced candidate selection, where high-probability words are overlooked and low-probability words dominate, leading to reduced coherence and fluency. We introduce a character-based diffusion embedding algorithm (CDEA) that uniquely leverages character-level statistics and a power-law-inspired grouping strategy to better balance candidate word selection. Unlike prior methods, the proposed CDEA explicitly prioritizes high-probability candidates, thereby improving both semantic consistency and text naturalness. When combined with XLNet, it effectively generates longer sensitive sequences while preserving quality. The experimental results showed that CDEA not only produces steganographic texts with higher imperceptibility and fluency but also achieves stronger resistance to steganalysis compared with the existing approaches. Future work will focus on enhancing statistical imperceptibility, integrating CDEA with larger language models such as GPT-5, and extending applications to cross-lingual, multimodal, and practical IoT or blockchain communication scenarios.

1. Introduction

With the rapid digital transformation driven by technologies such as the Internet of Things (IoT) [1] and blockchain [2], secure communication has become increasingly urgent. Steganography [3], the practice of hiding information within digital media to make it undetectable, provides a promising method to protect sensitive data.
Recent advances in artificial intelligence (AI), particularly neural network-based language models, have further accelerated the development of generative linguistic steganography. Unlike traditional text-based methods, this approach has leveraged generative models to directly embed sensitive information into natural language text, achieving both concealment and readability. It offers higher embedding capacity and stronger statistical imperceptibility, making it an attractive tool for secure communication.
Despite these advantages, a major challenge remains: generating steganographic text that is both perceptually imperceptible (fluent and natural to human readers) and statistically imperceptible (resistant to computational detection). The existing embedding algorithms, including Huffman coding, arithmetic coding, and distribution-copy methods, have attempted to balance information encoding with language quality. However, they often suffer from word selection imbalance, in which low-probability words are chosen too frequently while high-probability words are underutilized. Some approaches mitigate this issue by narrowing the candidate word pool, but this comes at the cost of embedding capacity and limits applicability in scenarios that require encoding large amounts of sensitive information.
To address these challenges, we propose a Character-based Diffusion Embedding Algorithm (CDEA). Unlike prior methods that attempted to minimize the influence of sensitive information, CDEA exploits its statistical characteristics by incorporating character-level frequency patterns and applying a grouping mechanism based on power-law distributions. This strategy increases the likelihood of high-probability word selection, reduces reliance on low-probability words, and enhances the fluency and coherence of steganographic text. Furthermore, we adopt XLNet as the generation model, leveraging its strong ability to capture long-range dependencies and maintain consistency in longer sensitive sequences.

Contributions

  • We propose CDEA, which improves the quality of steganographic text while maintaining high embedding capacity.
  • We mitigate word selection imbalance by utilizing character-level frequency patterns and a grouping mechanism based on power-law distributions.
  • We conduct a quantitative analysis of common embedding algorithms, including perfect binary trees, Huffman coding, arithmetic coding, and distribution-copy methods, under consistent conditions to evaluate their impact on steganographic text quality.
  • The experimental results demonstrate that combining CDEA with XLNet significantly enhances the perceptual imperceptibility of generated steganographic text.
The remainder of this paper is organized as follows: Section 2 reviews related works, including generative semantic steganography and deep learning-based text steganalysis. Section 3 introduces preliminary knowledge, focusing on the XLNet model. Section 4 presents the methodology of the proposed CDEA-based framework. Section 5 describes the experimental results and analysis. Section 6 concludes the paper with a mention of future work.

2. Related Works

This section reviews prior research on generative linguistic steganography and deep learning-based text steganalysis. We first discuss methods for generating steganographic texts, highlighting their development, limitations, and challenges, and then summarize recent deep learning-based approaches for detecting steganographic texts.

2.1. Generative Text Steganography

Generative text steganography has evolved from early Markov chain-based models to recent large language models. Early approaches primarily relied on Markov chains, embedding covert information via transition probabilities to represent bits 0 or 1 [4]. These models were simple and easy to implement, but they often struggled to produce high-quality or semantically coherent texts, especially for longer payloads. Huang et al. [5] have introduced Song Ci templates for steganographic generation to improve syntactic and semantic coherence; however, the reliance on fixed templates limits flexibility and applicability across diverse corpora.
With the development of neural networks, RNN- and LSTM-based methods have become dominant in text generation, capturing long-distance dependencies and semantic nuances more effectively than the Markov model. Early approaches have combined RNNs or LSTMs with classical poetry [6] or lyric templates [7] to increase embedding capacity and concealment. Despite these improvements, generating long, high-quality steganographic texts remained challenging.
The rise of large pretrained language models (LLMs) [8,9,10,11,12] has enabled fluent and semantically rich text generation, providing a strong foundation for linguistic steganography. Building on these advances, researchers have combined generative models with coding schemes to improve statistical security. For example, Zachary et al. [13] have integrated advanced text generation with algorithmic coding. Huang et al. [14] have proposed embedding schemes based on fixed-length perfect binary trees and variable-length Huffman coding, while Dai [15] has introduced patient-Huffman coding to enhance concealment. Fang [16] has explored imperceptibility in LSTM-based steganographic text generation.
For finer-grained embedding, Xiang et al. [17] have reduced the generation unit to characters and introduced strategies to select the best candidate text from multiple starting strings. Similarly, Omer et al. [18] have embedded information at the letter level in Arabic poetry using LSTM and Baudot coding. These approaches have improved adaptability to different text lengths but faced scalability and cross-linguistic generalization challenges.
Further advances have addressed perceptual and statistical imperceptibility. Huang et al. [19] have proposed a VAE-based generation mechanism to balance both objectives. Zhou et al. [20] have used adaptive probability distributions and GANs [21], while Yi et al. [22] have combined BERT [11] with Gibbs sampling to reduce auxiliary information sharing, though embedding capacity remained limited. Cao et al. [23] have introduced plug-and-play models and embeddable candidate pools to avoid low-probability word selection. Other methods have focused on contextual coherence and robustness, including Yan’s ambiguity elimination [24], Ding et al.’s NMT-stega [25], Rajba et al.’s T-function preprocessing [26], Yu et al.’s multi-channel steganography [27] and Huang et al.’s DNA steganography [28]. More recent LLM-based models, such as LLM-Stega [29,30], Lin et al. [31], and Sun et al. [32], have supported high-quality, zero-shot, or topic-controlled text generation.
Recent works (2023–2024) have further advanced the field. Sun et al. [33] have proposed FreStega, which dynamically adjusts token probabilities to enhance both imperceptibility and embedding capacity. Lin et al. [34] have introduced OD-stega, employing optimized probability distributions to minimize divergence from the LLM’s natural output. Yang et al. [35] have proposed linguistic steganography, a framework using ontology-entity trees for robust and high-capacity information hiding. Discop [36] has achieved provably secure steganography using distribution-copy techniques.
Despite substantial progress, the existing approaches share several persistent limitations. High-probability words are often underused while low-probability ones are selected too frequently, reducing text naturalness and weakening steganographic security. In addition, most methods are focused on optimizing text generation, providing limited systematic control over the embedding process. Many approaches are also trained or evaluated on specific corpora, such as English texts, poetry, or reviews, which may restrict their generalization across diverse domains. Furthermore, balancing statistical imperceptibility and text naturalness often involves trade-offs between fluency and embedding capacity.
To address these limitations, we propose CDEA. Unlike prior works that primarily optimize the text generator, CDEA enhances the embedding process itself. By leveraging character-level statistical features and a grouping mechanism based on power-law distributions, CDEA improves the selection of high-probability words, reduces reliance on low-probability words, and enhances semantic coherence, fluency, and perceptual imperceptibility.

2.2. Deep Learning-Based Text Steganalysis

While generative methods have improved text quality, they also pose new challenges for detection, motivating advances in deep learning-based steganalysis. A range of deep learning-based detectors has been developed, including CNN-based [37], RNN-based [38,39], R-BiLSTM [40], and hierarchical attention frameworks [41]. Huang et al. [42] have proposed GSDF, leveraging multi-dimensional weak signals to capture cognitive inconsistencies for steganalysis.
These studies have demonstrated that balancing naturalness, semantic coherence, and imperceptibility remains challenging for generative steganography. The proposed CDEA addresses this context by enhancing the embedding process, aiming to further improve fluency, semantic consistency, and resistance to detection while maintaining high embedding capacity.

3. Methods and Metrics

This section introduces the key components and metrics used in our generative steganography framework. We first describe XLNet, the text generation model that provides candidate words for embedding sensitive information. We then present two readability metrics, Flesch Reading Ease (FRE) and Gunning Fog Index (GFI), which guide the selection of higher-quality steganographic texts during the generation process.

3.1. XLNet

To generate high-quality steganographic text over long sequences and maintain semantic consistency, we adopted XLNet [43,44] as the underlying text generation model. XLNet is well-suited for this task because it can model long-range dependencies and generate coherent, contextually appropriate long-form text. Two key mechanisms enable this capability.

3.1.1. Segmented Recurrence

XLNet incorporates a recurrence mechanism across segments by integrating hidden states from the previous segment using a stop-gradient operation. This approach maintains contextual information across long inputs while controlling gradient flow. The modified hidden state for layer n − 1 of segment τ + 1 is defined as follows:
$\tilde{h}_{\tau+1}^{n-1} = \left[\,\mathrm{SG}\!\left(h_{\tau}^{n-1}\right) \circ h_{\tau+1}^{n-1}\,\right],$
where $h_{\tau+1}^{n-1}$ denotes the hidden state of layer n − 1 for segment τ + 1, $\mathrm{SG}(\cdot)$ represents the stop-gradient operation, $\circ$ denotes concatenation along the sequence dimension, and $\tilde{h}_{\tau+1}^{n-1}$ represents the modified hidden state.
To prepare the input for the attention mechanism, the query, key, and value vectors for layer n are generated as follows:
$\left[q_{\tau+1}^{n},\ k_{\tau+1}^{n},\ v_{\tau+1}^{n}\right] = \left[h_{\tau+1}^{n-1} W_q,\ \tilde{h}_{\tau+1}^{n-1} W_k,\ \tilde{h}_{\tau+1}^{n-1} W_v\right],$
where $W_q$, $W_k$, and $W_v$ are learnable projection matrices. The query vector $q_{\tau+1}^{n}$ is computed from the current hidden state $h_{\tau+1}^{n-1}$, while the key and value vectors $k_{\tau+1}^{n}$ and $v_{\tau+1}^{n}$ are computed from the modified hidden state $\tilde{h}_{\tau+1}^{n-1}$ that integrates contextual information from the previous segment. This design allows the attention mechanism to incorporate both local and long-range dependencies.
The output of the n-th layer for segment τ + 1 is then computed as follows:
$h_{\tau+1}^{n} = \mathrm{Transformer\text{-}layer}\!\left(q_{\tau+1}^{n},\ k_{\tau+1}^{n},\ v_{\tau+1}^{n}\right),$
where $h_{\tau+1}^{n}$ denotes the output of the n-th layer, and $\mathrm{Transformer\text{-}layer}(\cdot)$ refers to the standard Transformer layer that processes the query, key, and value vectors via the attention mechanism to produce a contextually enriched representation.
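As a rough illustration of the two equations above (a minimal NumPy sketch under stated assumptions, not XLNet's actual implementation), the extended context and the QKV projections can be written as:

```python
import numpy as np

def segment_qkv(h_prev, h_curr, W_q, W_k, W_v):
    """Sketch of segment recurrence: queries come from the current
    segment, while keys/values also see the previous segment.

    h_prev: hidden states of segment tau, treated as constant
            (the stop-gradient; a no-op in plain NumPy, it would be
            e.g. tensor.detach() in a real framework).
    h_curr: hidden states of segment tau + 1.
    """
    h_tilde = np.concatenate([h_prev, h_curr], axis=0)  # modified state
    q = h_curr @ W_q    # queries from the current hidden state only
    k = h_tilde @ W_k   # keys/values from the extended context
    v = h_tilde @ W_v
    return q, k, v
```

With segment length L and model width d, q has shape (L, d) while k and v have shape (2L, d), which is how long-range context enters the attention step.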

3.1.2. Dual-Stream Attention

XLNet also uses a dual-stream attention mechanism, which consists of a query stream and a content stream. This mechanism captures long-distance dependencies and incorporates relative positional information. The query stream is computed as follows:
$g_{z_t}^{(m)} \leftarrow \mathrm{Attention}\!\left(Q = g_{z_t}^{(m-1)},\ KV = h_{z_{<t}}^{(m-1)};\ \theta\right),$
where $g_{z_t}^{(m)}$ represents the query stream at layer m and $\theta$ denotes the parameters of the attention mechanism, including learnable weights and biases. The content stream is computed as follows:
$h_{z_t}^{(m)} \leftarrow \mathrm{Attention}\!\left(Q = h_{z_t}^{(m-1)},\ KV = h_{z_{\le t}}^{(m-1)};\ \theta\right),$
where $h_{z_t}^{(m)}$ represents the content stream and $h_{z_{\le t}}^{(m-1)}$ denotes the hidden states of inputs that occur at or before time step t at layer m − 1.
In our framework, XLNet generates a candidate pool for the next token based on the current prompt. The embedding algorithm then selects the most appropriate candidate to convey the sensitive information. XLNet’s ability to generate long, coherent, and contextually appropriate text is crucial for embedding information naturally within extended textual content.

3.2. Flesch Reading Ease

In our framework, the Flesch Reading Ease (FRE) score is used to select higher-quality steganographic texts by assessing readability and sentence smoothness.
The FRE score is a widely used metric for evaluating the readability of English text. It quantifies how easy or difficult a passage is to read based on two factors: average sentence length and average syllables per word. The formula for FRE is as follows:
$FRE = 206.835 - 1.015 \times \frac{\text{total words}}{\text{total sentences}} - 84.6 \times \frac{\text{total syllables}}{\text{total words}}.$
A higher FRE score indicates easier readability, with scores above 90 considered very easy and suitable for general audiences, while scores below 30 indicate complex text better suited for expert readers. This metric helps optimize texts intended for broad or diverse readerships.
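The formula translates directly into a small function; the word, sentence, and syllable counts are assumed to be supplied by an external tokenizer:

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    return (206.835
            - 1.015 * total_words / total_sentences
            - 84.6 * total_syllables / total_words)
```

For example, a 100-word, 10-sentence passage with 130 syllables scores about 86.7, i.e., easy to read.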

3.3. Gunning Fog Index

In our framework, the Gunning Fog Index (GFI) is used to guide the selection of higher-quality steganographic texts, ensuring semantic clarity and fluent sentence structure.
The GFI estimates the educational level required to comprehend a piece of English text on first reading. It considers sentence length and the proportion of complex words, defined as words containing three or more syllables. The index is calculated as follows:
$\text{Fog Index} = 0.4 \times \left(\frac{\text{words}}{\text{sentences}} + 100 \times \frac{\text{complex words}}{\text{words}}\right).$
A Fog Index of 12 suggests the text is suitable for readers with a high school education, while values above 18 indicate more advanced, technical material. This measure helps writers tailor content complexity to the intended audience’s comprehension ability.
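Likewise, the Fog Index is a one-line computation; note that the 0.4 factor applies to the whole bracketed sum:

```python
def gunning_fog(total_words, total_sentences, complex_words):
    """Fog Index = 0.4 * (words/sentences + 100 * complex_words/words)."""
    return 0.4 * (total_words / total_sentences
                  + 100.0 * complex_words / total_words)
```

A 100-word, 10-sentence text with 10 complex words yields a Fog Index of 0.4 × (10 + 10) = 8.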

4. Proposed Approach

This section introduces the character-based diffusion embedding algorithm (CDEA) and its integration with XLNet for generative linguistic steganography. CDEA leverages character-level statistics and grouping strategies to prioritize high-probability candidate words, while text quality is further guided by metrics introduced in Section 3, ensuring fluent, semantically coherent, and readable steganographic text.

4.1. CDEA

To systematically address the imbalance in candidate word selection, we first describe the design and operation of CDEA, highlighting its character ranking, grouping, and confirmation-bit mechanisms.
CDEA addresses the underutilization of high-probability words and over-selection of low-probability words in generative linguistic steganography. The algorithm ranks characters based on statistical features and stores them using a grouping strategy derived from power-law distributions, converting sensitive information into corresponding numerical values.
First, the frequency of each character in a language is calculated and sorted in descending order. The sorted set C is represented as follows, where $c_i$ denotes a character:
$C = \{\, c_1, c_2, \ldots, c_{R-1}, c_R \mid P(c_i) > P(c_{i+1}),\ 1 \le i < R \,\},$
Using the ASCII code as an example, a corpus such as Wikipedia is selected, and the frequency distribution of each character is calculated. The results are shown in Table 1.
Odd-numbered rows indicate the ranking order, while even-numbered rows show the corresponding characters. Non-ASCII characters can first be converted into their ASCII representations before sorting, often by truncating or splitting binary encodings into 7- or 8-bit segments. Such truncation may alter the original frequency distribution, potentially overestimating originally low-frequency characters. Manual calibration of the frequency distribution is, therefore, necessary to better reflect intended character usage. Consequently, handling sensitive information encoded with non-ASCII characters requires additional refinement to ensure that relatively frequent candidates remain prioritized.
In generative linguistic steganography, a one-to-one mapping between sensitive information and candidate words is usually enforced to guarantee accurate extraction. Encoding ASCII characters in a single generation step requires at least 128 distinct candidate words. Although necessary, using such a large pool increases computational cost and may reduce the readability of the generated text.
A simple approach maps high-frequency characters to high-probability candidate words and low-frequency characters to low-probability words. When the sensitive information contains many low-frequency characters, this can result in frequent selection of low-probability words, degrading text quality. To address this, CDEA employs a grouping strategy based on the power-law distribution to segment and store the character set C , improving overall generation quality.
Character frequencies in natural language generally follow a power-law distribution:
$P(x) \propto x^{-\alpha}, \quad \alpha > 0,$
Leveraging this property, CDEA groups C according to the following:
$g_{n+1} = m \cdot g_n,$
where $g_n$ denotes the number of elements in the $n$-th group and $m$ is the common ratio. The last group may not strictly follow this formula, but this does not affect the generation of steganographic text or the correct extraction of embedded information.
For example, assuming g 0 = 2 and m = 2 , the ordered ASCII characters shown in Table 1 can be grouped as illustrated in Table 2, where each character is represented by its rank in the sorted order.
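The grouping rule can be sketched as follows, assuming $g_0 = 2$, $m = 2$, and 128 ranked ASCII characters as in the running example:

```python
def build_groups(num_ranks=128, g0=2, m=2):
    """Partition ranks 0..num_ranks-1 into groups of sizes g0, m*g0,
    m^2*g0, ...; the last group simply absorbs the remainder."""
    groups, start, size = [], 0, g0
    while start < num_ranks:
        groups.append(list(range(start, min(start + size, num_ranks))))
        start, size = start + len(groups[-1]), size * m
    return groups
```

With these defaults the group sizes come out as 2, 4, 8, 16, 32, 64 and a final remainder group of 2, consistent with the last-group caveat above.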
After grouping, each character $c_i$ can be indexed by its group number (prefix) and its relative position within the group (suffix). For low-frequency characters with both large prefixes and suffixes, the generation model may be forced to select low-probability words, which can degrade text quality. To address this issue, a confirmation bit (cb) is introduced for low-frequency characters. The cb increases the likelihood of selecting higher-probability words while preserving accurate information extraction. With the cb, the character’s suffix is mapped from its absolute position to a new relative position $a$, which is calculated as follows:
$a \leftarrow rp \bmod \left\lfloor g_n / 2 \right\rfloor,$
where $rp$ is the original relative position, $g_n$ is the number of elements in the group, and the rounding ensures that each mapped position represents only two possible characters. Each character can then be represented as follows:
$c_i \rightarrow Z_i = \left(p_{c_i},\ s_{c_i},\ b_{c_i}\right), \quad b_{c_i} \in \{0, 1, \varnothing\},$
where $p_{c_i}$, $s_{c_i}$, and $b_{c_i}$ denote the prefix, suffix, and cb, respectively. Each mapping $c_i \rightarrow Z_i$ is stored in a shared table maintained by both communicating parties. The character ranking in Table 1 can be adjusted based on user requirements. Figure 1 illustrates the main workflow of CDEA.
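A minimal sketch of the triplet encoding under stated assumptions (0-based group indices, a rank-ordered table, and the floor-half modulus rule); the shared table agreed by the two parties may differ in detail:

```python
def encode_char(rank, groups):
    """Map a character's frequency rank to its CDEA triplet
    (prefix, suffix, cb).  cb is None (empty) for the high-frequency
    first group; otherwise a large suffix is remapped modulo half the
    group size and cb records which half it came from."""
    for prefix, group in enumerate(groups):
        if rank in group:
            suffix = group.index(rank)
            if prefix == 0:                      # high-frequency group
                return (prefix, suffix, None)
            half = len(group) // 2
            if suffix > half:                    # large suffix: remap
                return (prefix, suffix % half, 1)
            return (prefix, suffix, 0)
    raise ValueError("rank outside the mapping table")

# illustrative geometric groups over 128 ranks: sizes 2, 4, 8, ...
groups, start, size = [], 0, 2
while start < 128:
    groups.append(list(range(start, min(start + size, 128))))
    start, size = start + len(groups[-1]), size * 2
```

Here the most frequent character encodes as (0, 0, None), while rank 5, the last element of the second group, becomes (1, 1, 1).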

4.2. Steganographic Mechanism

Building on CDEA, the steganographic mechanism selects candidate words from XLNet’s predictions. To ensure fluency and readability, candidate texts are evaluated using perplexity, Flesch Reading Ease (FRE), and Gunning Fog Index (GFI), which guide the final selection of output text.
Figure 2 shows the workflow of a generative linguistic steganography system based on CDEA and XLNet. The system first uses CDEA to build a mapping table that converts ASCII characters into corresponding triplets (e.g., e → (0, 1), Z → (5, 32, 0)). This mapping is created based on character distributions, power-law distributions, and group storage strategies.
Next, the sensitive information is taken as input, and each character is encoded into a triplet. These triplets are stored in a list, after which the XLNet model is activated. XLNet uses communication history as a prompt to predict the next possible words, forming a candidate pool (CP). The first value is then popped from the list and used as an index to select a word from the CP. The selected word is appended to the prompt, updating the context for the next prediction step. This process continues iteratively until the entire list has been processed.
The candidate pool is constructed through a two-stage filtering process. First, top- p (nucleus) sampling selects the smallest set of tokens whose cumulative probability exceeds a predefined threshold p , retaining semantic diversity while reflecting the predicted probability distribution. However, because the size of this set can vary, index-based selection may become unstable. To address this, top- k filtering is applied afterward, restricting the candidate pool to the k most probable tokens within the nucleus. This combination maintains linguistic diversity while providing a controlled and predictable set of candidates for steganographic embedding.
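The two-stage filtering can be sketched as follows (a simplified NumPy version; the threshold p and cap k are illustrative parameters):

```python
import numpy as np

def candidate_pool(probs, p=0.9, k=8):
    """Nucleus (top-p) filtering followed by a top-k cap, yielding a
    stable, bounded pool of token indices for index-based embedding."""
    order = np.argsort(probs)[::-1]             # tokens by falling prob
    cum = np.cumsum(probs[order])
    # smallest prefix whose cumulative probability reaches p
    nucleus = order[: int(np.searchsorted(cum, p)) + 1]
    return nucleus[:k]                          # cap the pool at k tokens
```

For example, with probs [0.4, 0.25, 0.15, 0.1, 0.1] and p = 0.7 the nucleus keeps three tokens, and k = 2 then trims it to the two most probable, giving a fixed-size pool for indexing.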
Algorithm 1 summarizes the specific steps of the steganography mechanism based on CDEA and XLNet. To improve the embedding rate during ASCII encoding, only the first group is treated as the high-frequency group; confirmation bits for elements in this group are left empty. Accordingly, the threshold is fixed at 1, as confirmation bits are assigned only from the second group onward. The first group contains relatively few elements and does not require confirmation bits.
Algorithm 1: Single-step steganographic text generation
01 Input: sensitive information SI, communication history data CHD
02 Output: steganographic text ST
03 Mapping Table (MT) ← CDEA
04 Initialize list ← []
05 threshold ← 1
06 for c_i in SI do
07   // item == [prefix, suffix]
08   item ← Encoding(c_i, MT)
09   if item[0] > threshold then
10     modulus ← int(len(MT[item[0]]) / 2)
11     // item == [prefix, suffix, cb]
12     if item[1] > modulus then
13       item[1] ← item[1] % modulus
14       item.append(1)
15     else
16       item.append(0)
17     end if
18   end if
19   list.add(item)
20 end for
21 CHD_initial ← CHD
22 while len(list) > 0 do
23   // All Word[j] == (p_j, w_j)
24   All Word ← XLNet(CHD)
25   Candidate Pool ← top-p and top-k filtering (All Word)
26   temp ← Candidate Pool[list[0]]
27   list ← list[1:]
28   CHD ← CHD.append(temp)
29 end while
30 ST ← CHD.remove(CHD_initial)
31 Return ST
To enhance the quality of steganographic text, multiple prompts of varying lengths are used to generate several candidate texts. The final text is selected based on the highest evaluation score, ensuring that the output is both natural and coherent.
During generation, each candidate is evaluated using three metrics reflecting linguistic fluency and readability: perplexity (ppl), Flesch Reading Ease (FRE), and the Gunning Fog Index (Fog Index). Perplexity measures model prediction accuracy, with lower values indicating better fluency. FRE and Fog Index assess readability from different perspectives, where higher FRE and lower Fog Index values correspond to easier-to-read text. These metrics enable quantitative comparison of candidate texts, guiding the selection of the final output.
To ensure comparability, all scores are normalized to the range [0, 1] using predefined intervals derived from typical natural language distributions. Perplexity is log-scaled within 10–200, which covers most well-formed texts generated by large language models. FRE is linearly scaled within 60–90, where higher values indicate better readability. The Fog Index is linearly scaled within 6–12, and its normalized score is inverted so that lower index values correspond to higher scores. These ranges stabilize evaluation across samples.
The weighting coefficients are set to α = 0.8, β = 0.1, and γ = 0.1, giving higher weight to perplexity as it directly reflects fluency, while FRE and Fog Index provide complementary readability information. The final evaluation score is computed as follows:
$\text{Final score} = \alpha \times \mathrm{norm}(ppl) + \beta \times \mathrm{norm}(FRE) + \gamma \times \mathrm{norm}(\text{Fog Index}).$
This scoring scheme allows consistent comparison of candidates generated under different prompt lengths and facilitates the selection of the most natural and coherent steganographic text.
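A sketch of this scoring, assuming (as the text implies) that lower perplexity and lower Fog Index map to higher normalized scores, with all values clipped to the stated intervals:

```python
import math

def _clip01(x):
    return min(max(x, 0.0), 1.0)

def norm_ppl(ppl, lo=10.0, hi=200.0):
    """Log-scaled within [10, 200]; lower perplexity -> higher score."""
    x = (math.log(ppl) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return 1.0 - _clip01(x)

def norm_fre(fre, lo=60.0, hi=90.0):
    """Linear within [60, 90]; higher FRE -> higher score."""
    return _clip01((fre - lo) / (hi - lo))

def norm_fog(fog, lo=6.0, hi=12.0):
    """Linear within [6, 12] and inverted; lower index -> higher score."""
    return 1.0 - _clip01((fog - lo) / (hi - lo))

def final_score(ppl, fre, fog, alpha=0.8, beta=0.1, gamma=0.1):
    return (alpha * norm_ppl(ppl)
            + beta * norm_fre(fre)
            + gamma * norm_fog(fog))
```

A candidate at the best end of every interval (ppl = 10, FRE = 90, Fog Index = 6) scores 1.0; one at the worst end scores 0.0.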

4.3. Discussions

We briefly explain how CDEA enhances the occurrence frequency of high-probability words in the candidate pool (CP) while reducing the occurrence frequency of low-probability words.
Each character $c_i$ is represented by three components: a prefix $p_{c_i}$, a suffix $s_{c_i}$, and a confirmation bit $b_{c_i}$. Given the predefined statistical frequency parameters for each character, the occurrence probability of $c_i$ can be expressed as follows:
$\Pr(I_{c_i}) = \begin{cases} \Pr(I_{p_{c_i}}) \cdot \Pr(I_{s_{c_i}}), & \text{if } HF\ (b_{c_i} = \varnothing) \\ \Pr(I_{p_{c_i}}) \cdot \Pr(I_{s_{c_i}}) \cdot \Pr(I_{b_{c_i}}), & \text{if } LF \end{cases},$
where $I_{*}$ denotes the occurrence of an event, and $HF$ and $LF$ denote high-frequency and low-frequency characters, respectively.
Each c i consists of two or three independent numerical values, and selecting candidate words requires considering two or three consecutive time points. For the first two time points, the selection strategy is as follows:
$w_{t}^{c_i} \leftarrow \arg\max_{w \in CP} \left\{ \Pr(w) \,:\, \left|\{\, x \in CP : \Pr(x) > \Pr(w) \,\}\right| = p_{c_i} \right\},$
$w_{t+1}^{c_i} \leftarrow \arg\max_{w \in CP} \left\{ \Pr(w) \,:\, \left|\{\, x \in CP : \Pr(x) > \Pr(w) \,\}\right| = s_{c_i} \right\},$
The grouping strategy based on power-law distributions ensures that groups with lower indices contain fewer elements. Consequently, high-frequency characters typically have smaller prefix and suffix values, which serve as indices to prioritize higher-probability candidate words. Since high-frequency characters often constitute a substantial portion of the sensitive information, the overall selection frequency of high-probability words is increased during text generation.
For low-frequency characters, a confirmation bit (cb) is introduced to enhance their representation. When handling a low-frequency character c i , the candidate word selection at the third time point follows a strategy analogous to that in Equation (16). In contrast, when processing two consecutive high-frequency characters without a cb, the selection at the third time point depends directly on the prefix p c i of the second character:
$w_{t+2}^{c_i} \leftarrow \begin{cases} \arg\max_{w \in CP} \Pr(w), & \text{if } b_{c_i} = 0 \\ \arg\max_{w \in CP} \left\{ \Pr(w) : \left|\{\, x \in CP : \Pr(x) > \Pr(w) \,\}\right| = 1 \right\}, & \text{if } b_{c_i} = 1 \\ \arg\max_{w \in CP} \left\{ \Pr(w) : \left|\{\, x \in CP : \Pr(x) > \Pr(w) \,\}\right| = p_{c_i} \right\}, & \text{if } b_{c_i} = \varnothing \end{cases},$
Next, we investigate how the introduction of a cb can further enhance the selection frequency of high-probability candidate words within the CP while simultaneously reducing the selection frequency of low-probability words. To effectively represent each suffix without ambiguity, the dimensionality of the CP must be at least equal to the maximum value of the suffixes. However, the introduction of a cb can significantly reduce the dimensionality of the candidate pool corresponding to the suffix, decreasing it to half of its original size.
Given that the update mechanism follows a congruence relation and there exists a bijective relationship between suffixes and candidate words, when the suffix is relatively large, the selection frequency of the original candidate word is aggregated into the relatively prioritized congruent candidate word after the update, as illustrated in Equation (17). Through this mechanism, we can effectively increase the selection frequency of high-probability candidate words while decreasing that of low-probability ones:
$\Pr(I_{w_{t+1}^{c_i}}) = \Pr(I_{s_{c_i}}) + \Pr(I_{s_{c_j}}), \qquad \Pr(I_{w_{t+1}^{c_j}}) = 0,$
where $w_{t+1}^{c_i}$ represents the candidate word mapped to $s_{c_i}$, which must be less than half the current group length ($s_{c_i} < g_n/2$) as specified by CDEA. Additionally, $s_{c_i}$ and $s_{c_j}$ maintain a congruence relationship, and both are located within the same group.

5. Experiments

This section primarily elaborates on the evaluation metrics for generative linguistic steganography mechanisms, compares different generative linguistic steganography methods, and presents ablation experiments.

5.1. Evaluation Metrics

Common evaluation metrics for steganography include embedding capacity and concealment. For generative linguistic steganographic mechanisms, concealment has become the primary focus of research. To enable more fine-grained evaluation, concealment is further divided into perceptual imperceptibility and statistical imperceptibility [19].
Perceptual imperceptibility refers to the quality of generated text in terms of readability, making it difficult for third parties to recognize the existence of hidden information. To fairly evaluate the perceptual imperceptibility of steganographic texts, we employ commonly used text quality assessment metrics in Natural Language Processing (NLP), including diversity ( d i v ), perplexity ( p p l ), descriptiveness ( d e s ), and Kullback–Leibler ( K L ) divergence:
div = \frac{|set(words)|}{|S|},
ppl = 2^{-\frac{1}{n} \log_2 P(s)},
des = \frac{N + A + R}{|S|},
KL(P_t \| Q_g) = \sum_{x} P_t(x) \log \frac{P_t(x)}{Q_g(x)},
where |·| denotes the total number of words in the steganographic text, and set(·) ensures that the counted words are unique. Specifically, S represents the steganographic text; N, A, and R denote the counts of nouns, adjectives, and adverbs, respectively. P_t represents the true data distribution obtained during training, while Q_g represents the statistical distribution of the generated text.
The diversity assessment aims to check for the presence of repeated tokens within the sentences. Furthermore, perplexity is utilized to measure the complexity and interpretability of the text. Descriptiveness assesses the richness and expressiveness of the text, and finally, Kullback–Leibler divergence is employed to evaluate the distance between the two distributions. Collectively, these metrics constitute a framework for assessing the perceptual imperceptibility of steganographic text.
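The corpus-level metrics above can be computed as in the following sketch. Perplexity is omitted because it requires per-token model log-probabilities; the coarse POS tag names and the add-one smoothing in the KL estimate are our assumptions, not details specified here:

```python
import math
from collections import Counter

def diversity(tokens):
    """div = |set(words)| / |S|: fraction of unique tokens."""
    return len(set(tokens)) / len(tokens)

def descriptiveness(tokens, pos_tags):
    """des = (N + A + R) / |S|, counting nouns, adjectives, and adverbs.
    `pos_tags` is a parallel list of coarse POS labels (assumed tagger)."""
    content = {"NOUN", "ADJ", "ADV"}
    return sum(tag in content for tag in pos_tags) / len(tokens)

def kl_divergence(p_counts, q_counts):
    """KL(P_t || Q_g) over the joint vocabulary, with add-one smoothing
    (an assumption) so Q_g never assigns zero probability."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + len(vocab)
    q_total = sum(q_counts.values()) + len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts.get(w, 0) + 1) / p_total
        q = (q_counts.get(w, 0) + 1) / q_total
        kl += p * math.log(p / q)
    return kl

stego = "the plot was slow but the acting was superb".split()
print(diversity(stego))
print(kl_divergence(Counter(stego), Counter(stego)))
```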
Statistical imperceptibility refers to the ability of the generated text to resist detection by steganalysis tools. A steganalyzer can essentially be viewed as a binary classifier that differentiates between natural and steganographic text. To comprehensively assess statistical imperceptibility, we use the following metrics: accuracy (acc), precision (pre), recall (rec), and F1 score:
acc = \frac{TP + TN}{TP + TN + FP + FN},
pre = \frac{TP}{TP + FP},
rec = \frac{TP}{TP + FN},
F_1 = \frac{2 \cdot pre \cdot rec}{pre + rec},
where TP, TN, FP, and FN represent the numbers of true positives, true negatives, false positives, and false negatives, respectively. These metrics provide a quantitative basis for evaluating the statistical imperceptibility of steganographic texts.
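These four detector-side metrics translate directly into code; a minimal helper, taking steganographic texts as the positive class:

```python
def steganalysis_metrics(tp, tn, fp, fn):
    """Compute acc, pre, rec, and F1 from a steganalyzer's confusion
    counts (steganographic text = positive class)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f1

# Example: 20 test texts, 8 stego samples correctly flagged, 2 missed.
acc, pre, rec, f1 = steganalysis_metrics(tp=8, tn=9, fp=1, fn=2)
print(f"acc={acc:.3f} pre={pre:.3f} rec={rec:.3f} F1={f1:.3f}")
```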

5.2. Comparative Analysis

Based on the evaluation metrics defined above, we next compared the proposed method with several representative generative linguistic steganography mechanisms to demonstrate its effectiveness in both perceptual and statistical imperceptibility.
To evaluate the effectiveness of our method, we compared it with four established baselines: RNN-stega [14], VAE-stega [19], Discop [36], and LLM-stega [35]. RNN-stega and VAE-stega both employ Huffman coding [46] for embedding but differ in their text generation models: RNN-stega is based on an RNN [47], whereas VAE-stega uses a VAE [8]. Discop [36], in contrast, applies a distribution-copy-based embedding algorithm and relies on XLNet [43,44] for text generation. Finally, LLM-stega [35] represents a modern LLM-based steganographic approach and adopts GPT as its underlying language model.
Our method encodes character streams, whereas the other four approaches operate on bit streams. To ensure a fair comparison, we evaluated the perceptual and statistical imperceptibility of the steganographic texts under comparable bits-per-word (bpw) conditions, calculated as follows:
bpw = \frac{|bits(sensitive\ information)|}{|S|},
where bpw serves as the standard metric for embedding rate in generative linguistic steganography.
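The embedding-rate computation is a single ratio; a minimal sketch, assuming the sensitive payload is supplied as a '0'/'1' bit string and the steganographic text is whitespace-tokenized:

```python
def bits_per_word(bitstream, stego_text):
    """bpw = |bits(sensitive information)| / |S|."""
    return len(bitstream) / len(stego_text.split())

# Example: a 2-byte (16-bit) payload carried by a 4-word stego text.
payload = format(ord("h"), "08b") + format(ord("i"), "08b")
print(bits_per_word(payload, "the plot was engaging"))  # 16 bits / 4 words
```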
Parameter settings for each method at approximate bpw levels are provided in Table 3.
Unlike bitstream-based encoding mechanisms, which flexibly adjust the candidate pool size by selecting k-bit blocks (yielding 2^k candidates), our character-based encoding employs predefined triplets consisting of a prefix, a suffix, and a confirmation bit. Consequently, the candidate pool size is constrained by the discrete nature of characters and the grouping strategy. For the last group, directly mapping to 66 characters would require 66 candidates. By introducing confirmation bits, however, the receiver can distinguish different mappings, thereby reducing the required candidate pool size to 33. This fixed pool size not only guarantees that smaller groups can also be encoded but also facilitates fast and unambiguous extraction of sensitive information on the receiver side.
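Under our reading of this scheme, a character index in the 66-character last group decomposes as index = cb × 33 + suffix, so the receiver recovers the character from the 33-entry pool plus the confirmation bit. A sketch (illustrative, not the authors' code):

```python
POOL_SIZE = 33  # candidate pool size for the 66-character last group

def encode_last_group(char_index):
    """Split a character index (0..65) into (suffix, confirmation bit)."""
    return char_index % POOL_SIZE, char_index // POOL_SIZE

def decode_last_group(suffix, cb):
    """Receiver side: recombine suffix and confirmation bit."""
    return cb * POOL_SIZE + suffix

# Round-trip over the full 66-character group is lossless.
assert all(decode_last_group(*encode_last_group(i)) == i for i in range(66))
```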
Steganographic texts were generated for all the methods using the same prompts and sensitive information, producing 14,000 samples from movie reviews. To evaluate applicability across different contexts, we additionally generated 2000 samples each based on news articles and fairy tales.
All the experiments, including training, embedding, and evaluation, were conducted in the environment summarized in Table 4. This setup provides sufficient computational resources for efficient model training and text generation while enabling consistent evaluation of fluency, semantic consistency, and robustness against detection.
Furthermore, to assess scalability and computational efficiency, we evaluated runtime and memory usage during large-scale text generation, thereby analyzing trade-offs between text fluency and embedding capacity. The detailed computational overhead of each method, including generation time per steganographic text, extraction time, GPU allocation/reservation, and CPU memory usage, is summarized in Table 5.
Moreover, for each piece of sensitive information, we conducted three rounds of generation using prompts of different lengths. Starting from an initial prompt, the second-round prompt is created by removing the last word of the initial prompt, and the third-round prompt is obtained by further removing the last word from the second-round prompt. This approach results in steganographic texts corresponding to three distinct prompt lengths. The number three was chosen to achieve a balance between steganographic effectiveness and computational efficiency. It is important to note that this multi-round generation strategy is unique to the proposed mechanism, and users can adjust this parameter according to their specific requirements.
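The prompt-shortening schedule described above can be sketched as follows (assuming whitespace tokenization and a prompt with at least three words):

```python
def multi_round_prompts(prompt, rounds=3):
    """Round r uses the initial prompt with its last r-1 words removed."""
    words = prompt.split()
    return [" ".join(words[: len(words) - r]) for r in range(rounds)]

print(multi_round_prompts("the movie was surprisingly good"))
```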

5.2.1. Perceptual Imperceptibility

Perceptual imperceptibility is a key metric for evaluating the quality of steganographic text generation. It reflects the extent to which the generated text conforms to human writing habits while preserving logical coherence and semantic clarity. To objectively assess this property across the compared mechanisms, we employed the BERT model [11] as an evaluation tool and considered multiple metrics, including perplexity, KL divergence, diversity, and descriptiveness. Among these, perplexity and KL divergence are particularly important: lower perplexity indicates greater fluency and naturalness, whereas lower KL divergence implies that the generated text is statistically closer to the training corpus. The comparison results are presented in Table 6.
The analysis of Table 6 shows that our method consistently achieves lower perplexity and KL divergence than Discop [36], RNN-stega [14], and VAE-stega [19] under comparable bpw conditions. Across all three domains (movie reviews, news, and fairy tales), it produces texts with substantially lower perplexity (5.54, 5.72, and 6.21) and KL divergence (2.33, 2.41, and 2.54) while maintaining bpw around 3.8–4.0. This improvement stems from our design, which biases the selection toward high-probability words while suppressing low-probability ones, thereby reducing prediction uncertainty and narrowing the gap from the training distribution. Regarding auxiliary metrics, VAE-stega [19] attains slightly higher diversity (up to 0.78) and descriptiveness (up to 0.38) compared with our method (div: 0.54–0.58; des: 0.32–0.35). These gains, however, are marginal, and our approach remains competitive in these aspects while achieving a more favorable trade-off between imperceptibility and embedding capacity.

5.2.2. Statistical Imperceptibility

To evaluate the performance of the proposed method across different steganalysis detectors, we selected six mainstream deep learning-based text steganalyzers: CNN [37], TS-RNN [38], FCN [39], RBILSTM [40], HiduNet [41], and GSDF [42]. Comparative experiments were conducted on four evaluation metrics (accuracy, precision, recall, and F1 score) against Discop (Discop-3.7) [36], VAE-stega (VAE-3.4 and VAE-4.0) [19], RNN-stega (RNN-3.5 and RNN-4.4) [14], and LLM-stega (LLM-3.9) [35]. According to the evaluation criteria, lower values of these metrics indicate stronger statistical imperceptibility, as the generated steganographic texts are harder to detect. The results are illustrated in Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8.
The steganographic texts generated by our method across three domains (movie reviews, news articles, and fairy tales) consistently achieve significantly lower metric values than those of VAE-stega [19] and RNN-stega [14], especially in the “News” and “Movie” scenarios, demonstrating superior statistical imperceptibility. This advantage is attributed to our CDEA strategy and multi-round generation mechanism, which aligns the statistical features of generated texts with those of natural language, thereby avoiding distributional shifts that CNN-based steganalyzers [37] could exploit.
Under the TS-RNN steganalyzer [38], our method outperforms others in accuracy, recall, and F1. Notably, in the “News” scenario, it achieves the lowest accuracy and F1, indicating statistical distributions more similar to real texts and thus higher statistical imperceptibility. This stems from its ability to preserve consistent language distributions and semantic coherence during embedding, effectively reducing feature leakage.
For the FCN steganalyzer [39], our method shows clear advantages in recall and F1, particularly in the “Movie” and “Fairy” scenarios. For example, in the “Movie” scenario, the accuracy is 65.60%, substantially lower than Discop’s 85.59%, reflecting stronger statistical imperceptibility. Notably, for certain samples, accuracy and precision are slightly higher than those of some competing methods. This can be attributed to FCN’s reliance on fixed-size convolutional filters for local pattern detection, whereas our multi-round generation method primarily optimizes global statistical features. As a result, minor deviations in local n-gram patterns may occur, slightly affecting FCN’s accuracy and precision. Nevertheless, overall statistical imperceptibility remains high, as indicated by consistently lower F1 scores compared with competing methods, demonstrating the effectiveness of our embedding scheme in closely matching the semantic and statistical characteristics of real texts.
With the RBILSTM steganalyzer [40], our method consistently achieves lower metrics than the VAE-stega [19] and RNN-stega [14] series across all domains. In particular, the “News” and “Fairy” scenarios yield the lowest accuracy, precision, and F1, confirming strong statistical imperceptibility. This results from the method’s ability to retain natural semantic structures and distributions, mitigating anomalies introduced by embedding and improving robustness.
Similarly, under the HiduNet steganalyzer [41], our method demonstrates clear advantages, achieving the lowest accuracy, precision, and F1 scores in the “News” scenario, with strong performance also in the “Fairy” scenario. These results further validate the method’s strong generalization capability and stability across diverse text types and detectors.
For the GSDF steganalyzer [42], our method shows clear advantages in recall and F1 across the “Movie,” “News,” and “Fairy” scenarios. For example, in the “Movie” scenario, accuracy is 72.33%, substantially lower than VAE-3.4 (87.17%) and RNN-3.5 (86.83%), reflecting stronger statistical imperceptibility. Slightly higher accuracy or precision in some samples may result from GSDF’s reliance on graph-based semantic dependency features, whereas our method focuses on global statistical distributions. Overall, the lower F1 scores (Movie: 73.82; News: 74.66; Fairy: 73.87) demonstrate that our embedding scheme effectively preserves the semantic and statistical characteristics of natural text.
In summary, our proposed method not only generates high-quality steganographic texts but also significantly strengthens their statistical imperceptibility against mainstream steganalyzers, demonstrating strong effectiveness in steganographic security and practical applicability.

5.3. Ablation

Our goal is to enhance the quality of generated steganographic texts through a replacement embedding algorithm. To validate the feasibility and effectiveness of the proposed method, we conducted ablation experiments in which only the embedding algorithm was varied, while the text generation model, communication history data, and sensitive information remained unchanged, enabling a clear evaluation of the embedding algorithm’s impact.
For comparison, we selected several representative embedding algorithms: the perfect binary tree (PBT) algorithm from RNN-stega [14]; the Huffman coding-based (HC) algorithm from VAE-stega [19], grounded in Huffman coding theory [46]; the arithmetic coding-based (AC) algorithm from NMT [25], which relies on arithmetic coding principles [48]; and the distribution-copy-based algorithm from Discop [36]. In all the experiments, XLNet [43,44] was employed as the text generation model to ensure consistency across comparisons.
The computational complexity of these embedding strategies, in terms of both time and space, is summarized in Table 7. Our method achieves linear time and space complexity, demonstrating efficiency compared with other approaches.

5.3.1. Perceptual Imperceptibility in Ablation

The results presented in Table 8 indicate that, under identical text generation model configurations, the steganographic texts produced by the proposed method achieve superior scores in perplexity and KL divergence, reflecting higher fluency and naturalness. Texts generated using the PBT [14] algorithm demonstrate enhanced diversity. Although the steganographic texts generated by HC [46] and AC [48] exhibit relatively strong descriptiveness, qualitative analysis reveals notable semantic inconsistencies, particularly in AC [48] outputs, where repeated words or punctuation often appear. Representative samples of the five embedding algorithms are provided in Table A1 (see Appendix A).
The superior performance of CDEA can be attributed to its strategy of preferentially selecting higher-probability candidates from a determined candidate pool, which maintains semantic coherence and naturalness. In contrast, HC [46] and AC [48] prioritize mapping the longest possible bitstreams, often compromising linguistic fluency. Similarly, the distribution-copy-based method, Discop [36], relies more on random selection during candidate word selection, resulting in lower performance compared with CDEA. These findings highlight the practical significance of CDEA for real-world applications, such as cybersecurity and covert communications, where high-quality, semantically consistent steganographic texts are crucial for undetectability and reliability.

5.3.2. Statistical Imperceptibility in Ablation

From Table 9, it is evident that the proposed method (Our-3.8) demonstrates strong statistical imperceptibility across several spatial and convolutional feature-based detectors. Specifically, under FCN, Our-3.8 achieved a low accuracy of 65.60%, which is lower than most competing methods, indicating that embedding traces are effectively concealed from convolutional feature extraction. Under CNN, Our-3.8 attained 92.35% accuracy, lower than HC-4.3, AC-4.3, and PBT-3.9 but higher than Discop-3.7. For TS-RNN, Our-3.8 obtained 75.82% accuracy, slightly higher than Discop-3.7 but substantially lower than HC-4.3, AC-4.3, and PBT-3.9. These results highlight the method’s advantage in concealing spatial and convolutional statistical features. At the same time, the slightly higher detection rate under TS-RNN suggests that sequential dependencies are less thoroughly suppressed, representing a potential area for improvement.
For temporal and multi-dimensional detectors, Our-3.8 generally maintains competitive imperceptibility. Under RBILSTM, it achieved 98.53% accuracy, slightly lower than HC-4.3, AC-4.3, and PBT-3.9, but higher than Discop-3.7. Under HiduNet, Our-3.8 attained 90.62%, lower than most competing approaches, reflecting effective concealment of spatial/global features. For GSDF, it reached 72.33%, higher than Discop-3.7 but lower than HC-4.3, AC-4.3, and PBT-3.9, indicating a balanced trade-off between statistical imperceptibility and text quality. Overall, these results emphasize the method’s strong advantage against spatial and global feature-based detectors while highlighting that robustness against temporal or multi-dimensional correlations can be further enhanced.

5.4. Safety Proof

In this section, we demonstrate the security of the proposed CDEA. According to the power-law distribution and Equation (10), the length of each group can be expressed as g_j = m^{j-1}. Since the statistical frequency of characters within each group is approximately equal, we assume that the characters are uniformly distributed within each group, while the inter-group probabilities decay according to a power law. Let the prior distribution of the ASCII character set be denoted by P(c_i).
The mutual information between each character and its corresponding triplet can be expressed as follows:
I(c_i; Z_i) = I(c_i; (p^{c_i}, s^{c_i}, bs_i)),
Applying the chain rule, we obtain the following:
I(c_i; p^{c_i}, s^{c_i}, bs_i) = I(c_i; p^{c_i}) + I(c_i; s^{c_i} \mid p^{c_i}) + I(c_i; bs_i \mid p^{c_i}, s^{c_i}),
The number of characters in each group grows according to a power law, while the frequencies within each group remain nearly uniform. Thus, after grouping, the characters can be viewed as a uniformly probable set, which maximizes information entropy. Therefore, even with knowledge of the prefix p^{c_i}, one must still guess from g_{p^{c_i}} equally likely characters:
H(c_i \mid p^{c_i}) = \sum_{j} P(p_j) \log_2 g_j,
The original entropy is given by the following:
H(c_i) = -\sum_{i} P(c_i) \log_2 P(c_i),
Since g_j grows significantly faster than P(p_j) decays, we approximate the following:
I(c_i; p^{c_i}) = H(c_i) - H(c_i \mid p^{c_i}) \approx 0,
Thus, the prefix does not leak character semantics.
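A small numeric sketch of the quantities in this step (the group masses below are illustrative, not measured values): with characters uniform within each group, the prefix leakage I(c_i; p^{c_i}) = H(c_i) - H(c_i | p^{c_i}) reduces exactly to the entropy of the group-mass distribution, which the argument requires to be small relative to H(c_i):

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy grouping: group j holds g_j = 2^(j-1) characters (power-law growth),
# group j carries total mass P(p_j) (illustrative values only), and
# characters are uniform within each group, as the derivation assumes.
group_sizes = [1, 2, 4, 8, 16, 32]
group_mass = [0.50, 0.25, 0.12, 0.07, 0.04, 0.02]  # sums to 1

# Flatten to a per-character distribution.
char_probs = [m / g for m, g in zip(group_mass, group_sizes) for _ in range(g)]

H_c = entropy(char_probs)                       # H(c_i)
H_c_given_p = sum(m * math.log2(g)              # H(c_i | p^{c_i}) = sum_j P(p_j) log2 g_j
                  for m, g in zip(group_mass, group_sizes))
leak = H_c - H_c_given_p                        # I(c_i; p^{c_i})

# Under uniform-within-group, the prefix leakage equals exactly the
# entropy of the group-mass distribution.
assert abs(leak - entropy(group_mass)) < 1e-9
print(H_c, H_c_given_p, leak)
```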
The second term of the chain rule corresponds to the leakage from the suffix given the prefix. Because s^{c_i} is a linear encoding and the character frequencies within a group are approximately equal, with no semantic structure, from an attacker’s perspective s^{c_i} is simply an index among uniformly probable characters. Once both p^{c_i} and s^{c_i} are known, the character is fully determined, leading to the following:
H(c_i \mid p^{c_i}, s^{c_i}) = 0,
Following the derivation of mutual information, we obtain the following:
I(c_i; s^{c_i} \mid p^{c_i}) = \log_2 g_{p^{c_i}},
It is important to note that although characters can be uniquely identified, this is contingent on the attacker knowing the CDEA and assuming that the hidden information is plaintext, which is not publicly available. Therefore, while the suffix can identify the character, the attacker cannot recover any effective information: the suffix is equivalent to an index number and carries no semantic content.
The confirmation bit operates only for low-frequency groups and serves as an additional layer of obfuscation. This design ensures that the same character may correspond to two candidate paths, while different characters may share overlapping paths. Moreover, bs_i does not independently determine the character; it only influences the path selection. Hence,
H(c_i \mid p^{c_i}, s^{c_i}, bs_i) = 0,
Consequently,
I(c_i; bs_i \mid p^{c_i}, s^{c_i}) = 0,
Therefore, the mutual information between the CDEA’s encoding triplet and the character is approximately zero, which satisfies the conditions for semantic security.

6. Conclusions

To mitigate the insufficient selection of high-probability candidate words and the excessive occurrence of low-probability candidate words in generative steganographic mechanisms, we proposed the CDEA, inspired by power-law distributions and employing a grouped storage approach based on the general statistical properties of characters. By integrating the CDEA with XLNet, we introduced a generative semantic steganographic mechanism. The experimental results demonstrated that the proposed mechanism generates steganographic texts with high imperceptibility, as well as good semantic coherence and logical fluency. Moreover, when tested against multiple steganalysis tools, it showed stronger statistical imperceptibility than existing methods, although performance on some tools remained comparable to current approaches.
Future research will focus on further enhancing the statistical imperceptibility of steganographic texts and exploring the integration of the CDEA with GPT-5 and other large language models to improve embedding capacity and concealment. Additionally, we plan to investigate applications in cross-lingual and multimodal steganography, as well as deployment in practical IoT and blockchain communication scenarios. While intended for secure and privacy-preserving communication, the CDEA could potentially be misused for malicious purposes, so awareness of such dual-use risks remains important.

Author Contributions

Conceptualization, Y.C.; methodology, Y.C., A.B., and X.W.; software, Y.C., A.B., X.W., and H.L.; validation, X.W. and Q.C.; formal analysis, Y.C., Q.L., A.B., L.Z., and Y.X.; investigation, Y.C. and Q.C.; writing—original draft preparation, Y.C., Q.L., A.B., X.W., H.L., Q.C., L.Z., and Y.X.; writing—review and editing, Y.C., Q.L., A.B., X.W., H.L., Q.C., L.Z., and Y.X.; visualization, Q.L. and A.B.; supervision, Q.L. and A.B.; project administration, Q.L.; funding acquisition, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is funded by the 2024 Jiangsu Province Frontier Technology R&D Project “Research on Cross-Domain Multi-Dimensional Security Technology of Intelligent Systems for AI Computing Networks” (No: BF2024071).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

Author Le Zhu was employed by the company Zhongke Yungang Technology Co., Ltd.; and Yan Xiao was employed by the company Beijing Lianan Hengda Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CDEA: Character-based Diffusion Embedding Algorithm
cb: Confirmation Bit
AI: Artificial Intelligence
NLP: Natural Language Processing
VAE: Variational AutoEncoders
GAN: Generative Adversarial Networks
CS: Character Sequence
CHD: Communication History Data
CP: Candidate Pool
MT: Mapping Table
div: Diversity
ppl: Perplexity
gs: Grammaticality Score
des: Descriptiveness
KL: Kullback–Leibler Divergence
PBT: Perfect Binary Trees
HC: Huffman Coding
AC: Arithmetic Coding

Appendix A

Table A1. Representative steganographic text samples generated by five different embedding algorithms under identical text generation model settings.
MethodSample
CDEA“I was in a position of power where there had been a tremendous number of people who had no experience with electricity,” recalled the director, the director of photography. “It had to be a man with a strong desire and knowledge of how electricity works to get the best of people on the job, to make sure that they did the job. The only thing you can do is to keep it going.”
Discop [36]You have that others do well and do not even take in the administrative feet; for instance, one holds startled and skeptical-witnessed various-the-themes, as auditorium manager Freddie Lasker considers one ‘classic example of how city officials react in a dramatic fashion,’ like many World War I-era locations where Columbus—with its mix of German, Austrian, and Scottish immigrants, alongside non-German foreigners and settled Irish—saw minimal Jewish recruitment despite New York’s 150-year Jewish presence, while ethnic German influences led to Italian surnames and epithets reflecting local culture
HC [46]documentary titled Secret Story-Game-Second Strip—although comedy does make claims—titled Second Strip—feature fictional TV featuring David Campbell originally titled Episode—Larry Liar starring David Roberts whom Howard originally titled Al Liar starring Robert P. Smith creator Howard Terry Carell himself previously portrayed Joe Porky starring Sam Roberts despite having written prior titles fictional version himself playing Tom Liar himself earlier aired TV plays titled His Son—You Don Will Live—playing himself starring Robert
AC [48]non structured loosely structured sequence according similar terms specified today among comedy films popular genre written towards Western genres performed similarly titled respectively adaptations films published towards western genres written toward Western genres written toward western genres those previously popular throughout Spaghetti genres written towards Western genres produced among genres whose novels feature darker genres involved non melodrama plot genre audiences executed directly towards Western genres involving genres important characters written specifically directly toward Western genres? Like Rosemary
PBT [14]increasingly stringent controls regulating events caused throughout nature itself—see role changes section!—leading me somewhat less pleased today how you’re using action again without killing humans despite controlling incidents throughout Earth mythology according controls within itself without resorting strictly using rules enforced across mankind itself—e.g., murder outside themselves even less popular attacks within Europe within India according controls—giving power solely control across humankind overall according how regulations regulate certain types—particularly human behavior using methods similar compared

References

  1. Bhattacharjya, A.; Zhong, X.; Wang, J. Strong, efficient and reliable personal messaging peer to peer architecture based on hybrid RSA. In Proceedings of the International Conference on Internet of Things and Cloud Computing (ICC 2016), The Møller Centre-Churchill College, Cambridge, UK, 22–23 March 2016; ISBN 978-1-4503-4063-2/16/03. [Google Scholar]
  2. Kumar, J.R.H.; Bhargavramu, N.; Durga, L.S.N.; Nimmagadda, D.; Bhattacharjya, A. Blockchain Based Traceability in Computer Peripherals in Universities Scenarios. In Proceedings of the 2023 3rd International Conference on Electronic and Electrical Engineering and Intelligent System (ICE3IS), Yogyakarta, Indonesia, 9–10 August 2023. [Google Scholar]
  3. Cachin, C. An Information-Theoretic Model for Steganography; International Workshop on Information Hiding; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
  4. Wu, N.; Shang, P.; Fan, J.; Yang, Z.; Ma, W.; Liu, Z. Research on coverless text steganography based on single bit rules. J. Phys. Conf. Ser. 2019, 1237, 022077. [Google Scholar] [CrossRef]
  5. Luo, Y.; Huang, Y.; Li, F.; Chang, C. Text steganography based on ci-poetry generation using Markov chain model. KSII Trans. Internet Inf. Syst. (TIIS) 2016, 10, 4568–4584. [Google Scholar]
  6. Luo, Y.; Huang, Y. Text steganography with high embedding rate: Using recurrent neural networks to generate chinese classic poetry. In Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, Philadelphia, PA, USA, 20–21 June 2017. [Google Scholar]
  7. Tong, Y.; Liu, Y.; Wang, J.; Xin, G. Text steganography on RNN-generated lyrics. Math. Biosci. Eng. 2019, 16, 5451–5463. [Google Scholar] [CrossRef]
  8. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
  9. Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative adversarial networks: An overview. IEEE Signal Process. Mag. 2018, 35, 53–65. [Google Scholar] [CrossRef]
  10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems. Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30, pp. 6000–6010. [Google Scholar]
  11. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  12. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018, Volume 3. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 4 January 2025).
  13. Ziegler, Z.M.; Deng, Y.; Rush, A.M. Neural linguistic steganography. arXiv 2019, arXiv:1909.01496. [Google Scholar] [CrossRef]
  14. Yang, Z.-L.; Guo, X.-Q.; Chen, Z.-M.; Huang, Y.-F.; Zhang, Y.-J. RNN-stega: Linguistic steganography based on recurrent neural networks. IEEE Trans. Inf. Forensics Secur. 2018, 14, 1280–1295. [Google Scholar] [CrossRef]
  15. Dai, F.Z.; Cai, Z. Towards near-imperceptible steganographic text. arXiv 2019, arXiv:1907.06679. [Google Scholar] [CrossRef]
  16. Fang, T.; Jaggi, M.; Argyraki, K. Generating steganographic text with LSTMs. arXiv 2017, arXiv:1705.10742. [Google Scholar] [CrossRef]
  17. Xiang, L.; Yang, S.; Liu, Y.; Li, Q.; Zhu, C. Novel linguistic steganography based on character-level text generation. Mathematics 2020, 8, 1558. [Google Scholar] [CrossRef]
  18. Adeeb, O.F.A.; Kabudian, S.J. Arabic text steganography based on deep learning methods. IEEE Access 2022, 10, 94403–94416. [Google Scholar] [CrossRef]
  19. Yang, Z.-L.; Zhang, S.-Y.; Hu, Y.-T.; Hu, Z.-W.; Huang, Y.-F. VAE-Stega: Linguistic steganography based on variational auto-encoder. IEEE Trans. Inf. Forensics Secur. 2020, 16, 880–895. [Google Scholar] [CrossRef]
  20. Zhou, X.; Peng, W.; Yang, B.; Wen, J.; Xue, Y.; Zhong, P. Linguistic steganography based on adaptive probability distribution. IEEE Trans. Dependable Secur. Comput. 2021, 19, 2982–2997. [Google Scholar] [CrossRef]
  21. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2014, 63, 139–144. [Google Scholar] [CrossRef]
  22. Yi, B.; Wu, H.; Feng, G.; Zhang, X. ALiSa: Acrostic linguistic steganography based on BERT and Gibbs sampling. IEEE Signal Process. Lett. 2022, 29, 687–691. [Google Scholar] [CrossRef]
  23. Cao, Y.; Zhou, Z.; Chakraborty, C.; Wang, M.; Wu, Q.M.J.; Sun, X.; Yu, K. Generative steganography based on long readable text generation. IEEE Trans. Comput. Soc. Syst. 2022, 11, 4584–4594. [Google Scholar] [CrossRef]
  24. Yan, R.; Yang, Y.; Song, T. A secure and disambiguating approach for generative linguistic steganography. IEEE Signal Process. Lett. 2023, 30, 1047–1051. [Google Scholar] [CrossRef]
  25. Ding, C.; Fu, Z.; Yang, Z.; Yu, Q.; Li, D.; Huang, Y. Context-aware Linguistic Steganography Model Based on Neural Machine Translation. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 32, 868–878. [Google Scholar] [CrossRef]
  26. Rajba, P.; Keller, J.; Mazurczyk, W. Proof-of-work based new encoding scheme for information hiding purposes. In Proceedings of the 18th International Conference on Availability, Reliability and Security, Benevento, Italy, 29 August–1 September 2023. [Google Scholar]
  27. Yu, L.; Lu, Y.; Yan, X.; Wang, X. Generative Text Steganography via Multiple Social Network Channels Based on Transformers. In Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing, Guilin, China, 24–25 September 2022; Springer International Publishing: Cham, Switzerland, 2022. [Google Scholar]
  28. Huang, C.; Yang, Z.; Hu, Z.; Yang, J.; Qi, H.; Zhang, J.; Zheng, L. DNA Synthetic Steganography Based on Conditional Probability Adaptive Coding. IEEE Trans. Inf. Forensics Secur. 2023, 18, 4747–4759. [Google Scholar] [CrossRef]
  29. Li, Y.; Zhang, R.; Liu, J.; Lei, Q. A Semantic Controllable Long Text Steganography Framework Based on LLM Prompt Engineering and Knowledge Graph. IEEE Signal Process. Lett. 2024, 31, 2610–2614. [Google Scholar] [CrossRef]
  30. Wu, J.; Wu, Z.; Xue, Y.; Wen, J.; Peng, W. Generative text steganography with large language model. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024. [Google Scholar]
  31. Lin, K.; Luo, Y.; Zhang, Z.; Ping, L. Zero-shot Generative Linguistic Steganography. arXiv 2024, arXiv:2403.10856. [Google Scholar] [CrossRef]
  32. Sun, B.; Li, Y.; Zhang, J.; Xu, H.; Ma, X.; Xia, P. Topic Controlled Steganography via Graph-to-Text Generation. CMES-Computer Model. Eng. Sci. 2023, 136, 157–176. [Google Scholar] [CrossRef]
  33. Pang, K. FreStega: A Plug-and-Play Method for Boosting Imperceptibility and Capacity in Generative Linguistic Steganography for Real-World Scenarios. arXiv 2024, arXiv:2412.19652. [Google Scholar] [CrossRef]
  34. Huang, Y.-S.; Just, P.; Narayanan, K.; Tian, C. OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions. arXiv 2024, arXiv:2410.04328. [Google Scholar] [CrossRef]
  35. Bai, M.; Yang, J.; Pang, K.; Huang, Y.; Gao, Y. Semantic Steganography: A Framework for Robust and High-Capacity Information Hiding using Large Language Models. arXiv 2024, arXiv:2412.11043. [Google Scholar] [CrossRef]
  36. Ding, J.; Chen, K.; Wang, Y.; Zhao, N.; Zhang, W.; Yu, N. Discop: Provably Secure Steganography in Practice Based on Distribution Copies. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 21–25 May 2023; pp. 2238–2255. [Google Scholar] [CrossRef]
  37. Wen, J.; Zhou, X.; Zhong, P.; Xue, Y. Convolutional neural network based text steganalysis. IEEE Signal Process. Lett. 2019, 26, 460–464. [Google Scholar] [CrossRef]
  38. Yang, Z.; Wang, K.; Li, J.; Huang, Y.; Zhang, Y.-J. TS-RNN: Text steganalysis based on recurrent neural networks. IEEE Signal Process. Lett. 2019, 26, 1743–1747. [Google Scholar] [CrossRef]
  39. Yang, Z.; Huang, Y.; Zhang, Y.-J. A fast and efficient text steganalysis method. IEEE Signal Process. Lett. 2019, 26, 627–631. [Google Scholar] [CrossRef]
  40. Niu, Y.; Wen, J.; Zhong, P.; Xue, Y. A hybrid R-BILSTM-C neural network based text steganalysis. IEEE Signal Process. Lett. 2019, 26, 1907–1911. [Google Scholar] [CrossRef]
  41. Peng, W.; Li, S.; Qian, Z.; Zhang, X. Text steganalysis based on hierarchical supervised learning and dual attention mechanism. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 3513–3526. [Google Scholar] [CrossRef]
  42. Huang, K.; Zhang, Z.; Wei, Y.; Zhang, T.; Yang, Z.; Zhou, L. GSDFuse: Capturing Cognitive Inconsistencies from Multi-Dimensional Weak Signals in Social Media Steganalysis. arXiv 2025, arXiv:2505.17085. [Google Scholar]
43. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
44. Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.; Le, Q.; Salakhutdinov, R. Transformer-XL: Attentive language models beyond a fixed-length context. arXiv 2019, arXiv:1901.02860. [Google Scholar]
  45. Wikipedia Contributors. Letter Frequency. Wikipedia, The Free Encyclopedia. Available online: https://en.wikipedia.org/wiki/Letter_frequency (accessed on 22 August 2025).
  46. Huffman, D.A. A method for the construction of minimum-redundancy codes. Proc. IRE 1952, 40, 1098–1101. [Google Scholar] [CrossRef]
  47. Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45. [Google Scholar]
  48. Langdon, G.; Rissanen, J. A simple general binary source code (Corresp.). IEEE Trans. Inf. Theory 1982, 28, 800–803. [Google Scholar] [CrossRef]
Figure 1. CDEA process overview.
Figure 2. Linguistic steganography workflow based on CDEA and XLNet: First, the user runs CDEA to obtain the mapping table between ASCII codes and triplets. Sensitive information is then encoded into prefixes, suffixes, and confirmation bits according to this mapping table. The communication history is fed into the XLNet model to predict the candidate pool for the next position. For each position, a value from the sensitive-information triplet is used as an index to select a word w_i from the candidate pool, which is appended to the communication history. This process continues until all prefixes, suffixes, and confirmation bits are embedded.
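The selection loop in the Figure 2 workflow can be sketched as follows. This is a minimal illustration, not the authors' implementation: `rank_candidates` stands in for the XLNet prediction step, and the function and parameter names are ours.

```python
def embed_values(history, values, rank_candidates):
    """Embed a sequence of secret values: each value indexes into the
    model-ranked candidate pool for the next position (cf. Figure 2)."""
    stego = list(history)
    for v in values:
        pool = rank_candidates(stego)  # candidate words, highest probability first
        stego.append(pool[v])          # the secret value selects the next word
    return stego


# Toy "model" that always ranks the same five words
pool = ["the", "a", "cat", "dog", "sat"]
out = embed_values(["hello"], [2, 0, 4], lambda h: pool)
# out == ["hello", "cat", "the", "sat"]
```

Because the extractor shares the history and the same model, it can regenerate each candidate pool deterministically and recover every value as the index of the observed word.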
Figure 3. The performance of steganographic text on the CNN [37] steganalysis model.
Figure 4. The performance of steganographic text on the TS-RNN [38] steganalysis model.
Figure 5. The performance of steganographic text on the FCN [39] steganalysis model.
Figure 6. The performance of steganographic text on the RBILSTM [40] steganalysis model.
Figure 7. The performance of steganographic text on the HiDuNet [41] steganalysis model.
Figure 8. The performance of steganographic text on the GSDF [42] steganalysis model.
Table 1. Sorted frequency distribution of ASCII characters based on Wikipedia articles [45].

Order 1–8: space  e  t  a  o  i  n  s
Order 9–16: h  r  d  l  c  u  m  w
Order 17–24: f  g  y  p  b  v  k  j
Order 25–32: x  q  z  0  1  2  3  4
Order 33–40: 5  6  7  8  9  .  ,
Order 41–48: !  ?  ;  :  (  )  [
Order 49–56: ]  {  }  @  #  $  %  ^
Order 57–64: &  *  _  -  +  =  ~
Order 65–72: /  \  |  <  >  E  T  A
Order 73–80: O  I  N  S  H  R  L  D
Order 81–88: C  U  M  W  F  G  P  Y
Order 89–96: B  K  V  J  X  Q  Z  NUL
Order 97–104: SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS
Order 105–112: HT  LF  VT  FF  CR  SO  SI  DLE
Order 113–120: DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN
Order 121–128: EM  SUB  ESC  FS  GS  RS  US  DEL
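A ranking like Table 1 can be derived from any character-frequency source by sorting characters in descending order of count. The sketch below is illustrative only (the helper name and tie-breaking rule are our assumptions, and the toy input is made up); only the sorting step mirrors how such a table is constructed.

```python
from collections import Counter


def rank_characters(text: str) -> dict:
    """Return a 1-based frequency rank per character, most frequent first
    (ties broken by character code, for determinism)."""
    counts = Counter(text)
    ordered = sorted(counts, key=lambda c: (-counts[c], ord(c)))
    return {c: i + 1 for i, c in enumerate(ordered)}


ranks = rank_characters("aaab bb")
# 'a' and 'b' both occur 3 times; 'a' wins the tie, space occurs once
# ranks == {'a': 1, 'b': 2, ' ': 3}
```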
Table 2. Group details.

Group No. | 0 | 1 | 2 | 3 | 4 | 5
Elements within a group | {1–2} | {3–6} | {7–14} | {15–30} | {31–62} | {63–128}
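The grouping in Table 2 can be expressed as a small lookup. This is one illustrative sketch (the function name and structure are ours, not the paper's implementation); note that the group sizes roughly double (2, 4, 8, 16, 32, 66), consistent with the power-law-inspired strategy described in the abstract.

```python
# Upper rank bound of each group, taken from Table 2:
# group 0 = ranks {1-2}, group 1 = {3-6}, ..., group 5 = {63-128}
GROUP_UPPER = [2, 6, 14, 30, 62, 128]


def group_of(rank: int) -> int:
    """Map a character's frequency rank (1..128, Table 1) to its group number (0..5)."""
    if not 1 <= rank <= 128:
        raise ValueError("rank must be in 1..128")
    return next(g for g, upper in enumerate(GROUP_UPPER) if rank <= upper)
```

For example, the most frequent characters (space, "e") fall in group 0, while rare control characters fall in group 5.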
Table 3. Parameter settings.

Method | CP Build Strategies | Size of CP | bpw
Our | top-k + top-p | 33 | 3.8 ± 0.2
Discop [36] | top-p | 32 | 3.7 ± 0.3
RNN-stega [14] | top-k | 16 | 3.2 ± 0.3
RNN-stega [14] | top-k | 32 | 4.0 ± 0.3
VAE-stega [19] | top-k | 16 | 3.3 ± 0.3
VAE-stega [19] | top-k | 32 | 3.8 ± 0.3
LLM-stega [35] | top-k | 32 | 3.9 ± 0.3
Table 4. Experimental environment.

Category | Configuration/Version
GPU | NVIDIA A40, 46 GB
CPU | Intel Xeon
RAM | 128 GB
Operating system | Ubuntu 22.04
CUDA | 12.2
PyTorch | 2.6.0+cu118
Transformers | 4.42.4
pandas | 2.2.2
Table 5. Computational cost comparison.

Method | Generation Time (s) | Extraction Time (s) | GPU (Allocated/Reserved) | CPU
Our | 2.95 | 2.00 | 1406.1 MB / 1452.0 MB | 814.2 MB
Discop [36] | 3.54 | 2.77 | 1397.9 MB / 1506.0 MB | 2241.3 MB
RNN-stega [14] | 3.07 | 2.54 | 155.8 MB / 250.0 MB | 532.7 MB
VAE-stega [19] | 2.99 | 2.35 | 293.8 MB / 300.0 MB | 473.9 MB
LLM-stega [35] | 2.82 | 4.18 | 634.3 MB / 734.0 MB | 2404.5 MB
Table 6. Comparison table of perceptual imperceptibility by different steganographic mechanisms.

Method | bpw | ppl | KL | div | des
Our-movie review | 3.8 ± 0.2 | 5.54 | 2.33 | 0.54 | 0.33
Discop [36] | 3.7 ± 0.3 | 7.20 | 2.78 | 0.57 | 0.38
RNN-stega [14] | 3.2 ± 0.3 | 33.36 | 8.20 | 0.45 | 0.29
RNN-stega [14] | 4.0 ± 0.3 | 52.54 | 8.82 | 0.56 | 0.29
VAE-stega [19] | 3.3 ± 0.3 | 22.67 | 4.17 | 0.75 | 0.38
VAE-stega [19] | 3.8 ± 0.3 | 30.56 | 4.63 | 0.78 | 0.37
Our-news | 3.9 ± 0.2 | 5.72 | 2.41 | 0.58 | 0.35
Our-fairy tales | 4.0 ± 0.2 | 6.21 | 2.54 | 0.56 | 0.32
LLM-stega [35] | 3.9 ± 0.3 | 8.32 | 3.57 | 0.53 | 0.28
Table 7. Time and space complexity comparison of different embedding algorithms.

Method | Time Complexity | Space Complexity
Our | O(n) | O(n)
Discop [36] | O(T·n) | O(T + n)
HC [46] | O(n/log n) | O(n)
AC [48] | O(n/log n) | O(n)
PBT [14] | O(n) | O(n)
Table 8. Comparison table of perceptual imperceptibility by different embedding algorithms.

Method | bpw | ppl | KL | div | des
Our-3.8 | 3.8 | 5.54 | 2.33 | 0.54 | 0.33
Discop-3.7 [36] | 3.7 | 7.20 | 2.78 | 0.57 | 0.38
HC-4.3 [46] | 4.3 | 22.92 | 7.19 | 0.50 | 0.70
AC-4.3 [48] | 4.3 | 20.40 | 6.50 | 0.47 | 0.70
PBT-3.9 [14] | 3.9 | 27.31 | 7.78 | 0.66 | 0.58
Table 9. Comparison table of statistical imperceptibility by different embedding algorithms. All metrics (acc, pre, rec, F1) are given in %.

CNN [37]
Method | acc | pre | rec | F1
Our-3.8 | 92.35 | 89.74 | 95.55 | 92.36
Discop-3.7 [36] | 85.72 | 83.37 | 89.91 | 85.97
HC-4.3 [46] | 99.80 | 99.79 | 99.79 | 99.79
AC-4.3 [48] | 99.80 | 99.86 | 99.72 | 99.79
PBT-3.9 [14] | 99.78 | 99.93 | 99.61 | 99.77

TS-RNN [38]
Method | acc | pre | rec | F1
Our-3.8 | 75.82 | 79.27 | 66.65 | 71.63
Discop-3.7 [36] | 71.23 | 73.12 | 67.99 | 68.43
HC-4.3 [46] | 95.67 | 93.51 | 97.93 | 95.60
AC-4.3 [48] | 95.12 | 93.46 | 96.74 | 94.99
PBT-3.9 [14] | 94.45 | 91.17 | 98.18 | 94.46

FCN [39]
Method | acc | pre | rec | F1
Our-3.8 | 65.60 | 59.87 | 97.69 | 73.76
Discop-3.7 [36] | 85.59 | 77.81 | 99.00 | 87.02
HC-4.3 [46] | 68.31 | 61.58 | 98.76 | 75.52
AC-4.3 [48] | 71.41 | 72.88 | 83.66 | 73.02
PBT-3.9 [14] | 69.84 | 62.75 | 99.97 | 76.75

RBILSTM [40]
Method | acc | pre | rec | F1
Our-3.8 | 98.53 | 98.01 | 98.95 | 98.47
Discop-3.7 [36] | 92.48 | 89.90 | 95.97 | 92.46
HC-4.3 [46] | 99.82 | 99.72 | 99.89 | 99.81
AC-4.3 [48] | 99.73 | 99.79 | 99.65 | 99.72
PBT-3.9 [14] | 99.83 | 99.79 | 99.86 | 99.82

HiDuNet [41]
Method | acc | pre | rec | F1
Our-3.8 | 90.62 | 88.04 | 94.33 | 90.60
Discop-3.7 [36] | 92.00 | 88.68 | 96.55 | 91.97
HC-4.3 [46] | 96.00 | 94.31 | 98.02 | 95.99
AC-4.3 [48] | 95.00 | 93.36 | 97.04 | 94.99
PBT-3.9 [14] | 96.62 | 94.79 | 98.76 | 96.62

GSDF [42]
Method | acc | pre | rec | F1
Our-3.8 | 72.33 | 70.06 | 78.00 | 73.82
Discop-3.7 [36] | 52.67 | 51.74 | 98.03 | 67.73
HC-4.3 [46] | 87.17 | 89.20 | 84.77 | 86.93
AC-4.3 [48] | 88.33 | 88.57 | 89.14 | 88.85
PBT-3.9 [14] | 91.78 | 90.12 | 95.09 | 92.54

Share and Cite

Chen, Y.; Li, Q.; Bhattacharjya, A.; Wu, X.; Li, H.; Chang, Q.; Zhu, L.; Xiao, Y. Enhancement of the Generation Quality of Generative Linguistic Steganographic Texts by a Character-Based Diffusion Embedding Algorithm (CDEA). Appl. Sci. 2025, 15, 9663. https://doi.org/10.3390/app15179663

