1. Introduction
Digital steganography, due to its unique advantages in covert communication and privacy protection, is increasingly emerging as a vital supplement to cryptography and has attracted significant research attention in recent years [1]. Deep learning has profoundly transformed the field of computer vision, enabling breakthroughs in tasks ranging from image classification and object detection to sophisticated image generation and manipulation. Among these advancements, Neural Style Transfer (NST) [2,3,4] has emerged as a particularly impactful and popular application, allowing for the creation of artistic images by merging the content of one image with the stylistic elements of another. The widespread sharing of such style-transferred images across social media platforms has made them a novel and prevalent form of digital media, presenting a unique opportunity for covert communication.
Traditional image steganography techniques typically modify selected pixel values in a cover image using predefined distortion functions governed by handcrafted heuristic rules. Early non-adaptive methods, such as LSB (Least Significant Bit) replacement and its variants, directly alter pixel bits to embed secrets [5,6]. However, these approaches introduce detectable artifacts, including value-pair effects and statistical inconsistencies, making them vulnerable to basic steganalysis such as chi-square tests. To enhance security, adaptive steganography emerged, assigning varying embedding costs to pixels based on regional complexity. Coding schemes such as STC [7] optimized distortion minimization and detection resistance by embedding secrets in noisy regions. Despite these advancements, traditional methods still rely on manual distortion rules and heuristic feature engineering to guide modifications. This dependence on human-crafted strategies limits their ability to fully conceal subtle embedding traces, leaving them susceptible to modern deep learning-based steganalysis tools that exploit imperceptible statistical patterns.
Deep steganography leverages neural networks to embed secret data into cover media while preserving perceptual imperceptibility. Using an encoder–decoder framework, the encoder merges secret data with a cover to produce a stego image, while the decoder extracts the hidden information [8]. Adversarial training optimizes concealment and reconstruction accuracy, enabling adaptive embedding tailored to cover characteristics [9]. Compared with traditional steganography, deep learning-based steganography has significantly improved hiding capacity and anti-detection capability. However, a critical issue remains unresolved: the conspicuousness of the steganographic act itself. To conduct covert communication, senders frequently transmit steganographic images to receivers; this behavioral pattern makes them vulnerable to detection by steganalysts, leading to communication failure.
Consequently, researchers have turned their attention to style transfer technology for implementing covert communication. Since social networks are flooded with stylized images, hiding messages within them and publishing these images on social media platforms allows receivers to download and extract the messages. This approach significantly enhances the concealment of steganographic communication. Wang et al. introduced STNet [10], embedding secrets in style features via AdaIN and VGG for artistic steganography. Li et al. [11] presented a GAN-based method that inserts data after the first convolution layer, achieving a capacity of 1 bpp. Shi et al. [12] developed a scheme with VGG19-based networks to embed secrets during style transfer. Zheng et al. [13] proposed inserting secrets at the decoder's upsampling layer with atrous convolution by optimizing latent-space dimensions. However, these methods share critical limitations: (1) insufficient consideration of distortion risks in OSNs, which reduces message extraction accuracy; (2) low artistic quality in generated style images containing secrets, where repeated transmission of inferior stylized media raises suspicion; (3) failure to leverage existing deep learning-based steganalyzers for adversarial training, resulting in limited improvement in detection resistance.
To address these limitations of existing style transfer-based image steganography, this paper proposes a robust steganography scheme via style transfer in OSNs, called StegTransfer. The main contributions are summarized below:
This paper critically re-examines the practical limitations of style transfer-based steganography and proposes a novel algorithm to address these challenges.
The distortion patterns inherent in OSNs were systematically analyzed, and a forward noise simulation module was developed to mitigate the insufficient simulation of non-differentiable distortions in training frameworks.
To enhance robustness against deep learning-based steganalyzers, a XuNet-based steganalysis discriminator was constructed and integrated into the unified framework for end-to-end adversarial training.
For the first time, a module capable of enhancing the aesthetic quality of steganographic images was introduced in style transfer-based steganography, and a quantitative evaluation was conducted during the experimental process.
The remainder of this paper is organized as follows: Section 2 reviews related work in deep steganography and style transfer-based data hiding. Section 3 details the architecture and training strategy of the proposed StegTransfer framework. Experimental setup, results, and comparative analyses are presented in Section 4. Finally, Section 5 concludes the paper and suggests future research directions.
3. Proposed Scheme
In this section, we describe the proposed StegTransfer model.
Figure 1 illustrates the overall framework, which consists of five components. The primary objective of our scheme is to train a generator and an extractor. The generator aims to produce stylized stegos that exhibit strong artistic effects while resisting detection by deep neural network-based steganalyzers. The extractor is trained to accurately recover hidden secret images from stegos after they undergo distortions in online social networks. To achieve these goals, our pipeline incorporates three key modules: an improved XuNet-based [23] pretrained steganalyzer, a distortion simulation module that mimics potential artifacts in OSNs, and a Style-Attentional Network (SANet) [28] to enhance the artistic quality of stylized images. These modules can be trained end-to-end. Our proposed steganography framework is built upon existing deep learning-based style transfer architectures, with a prioritized focus on stylization quality. In covert communication scenarios, transmitting low-quality images tends to attract adversarial attention, significantly increasing the risk of detection. To address this, we incorporate the SANet as a core mechanism to enhance stylization quality. Specifically, SANet first performs deep feature fusion between the content and style images, generating high-fidelity stylized features. These fused features are then spatially aligned with the secret image to be embedded through upsampling operations. Finally, an adaptive superimposition strategy is applied to complete the steganographic embedding, ensuring that the stego image seamlessly conceals the secret information while preserving artistic style integrity.
The process is formalized as follows: Let $I_c$ and $I_s$ denote the content and style images, respectively. A pretrained VGG-19 encoder $E$ extracts multi-scale features from layers {Relu_4_1, Relu_5_1} as Equations (1) and (2):

$F_c^l = E_l(I_c)$, (1)

$F_s^l = E_l(I_s)$, (2)

where $l \in \{\text{Relu\_4\_1}, \text{Relu\_5\_1}\}$.

The style-attentional module computes learnable attention maps to align style and content features as in Equation (3):

$F_{cs}^i = \frac{1}{C(F)} \sum_j \exp\!\big(S(f(\bar{F}_c^i), g(\bar{F}_s^j))\big)\, h(F_s^j)$, (3)

where $f$, $g$, and $h$ are $1 \times 1$ convolutions, $\bar{F}$ denotes mean-variance normalized features, $S(\cdot,\cdot)$ is a cosine similarity, and $C(F) = \sum_j \exp\!\big(S(f(\bar{F}_c^i), g(\bar{F}_s^j))\big)$ is a normalization factor. The fused feature $F_{csc}$ is computed by Equation (4):

$F_{csc} = F_c + W_{cs}(F_{cs})$, (4)

where $W_{cs}$ is another $1 \times 1$ convolution.

Features from two SANets (Relu_4_1 and Relu_5_1) are combined to balance local and global style patterns as Equations (5)–(7):

$F_{csc}^{4\_1} = \mathrm{SANet}_{4\_1}(F_c^{4\_1}, F_s^{4\_1})$, (5)

$F_{csc}^{5\_1} = \mathrm{SANet}_{5\_1}(F_c^{5\_1}, F_s^{5\_1})$, (6)

$F_m = \mathrm{Conv}_{3 \times 3}\big(F_{csc}^{4\_1} + \mathrm{Up}(F_{csc}^{5\_1})\big)$, (7)

where $\mathrm{Up}(\cdot)$ upsamples to the spatial size of $F_{csc}^{4\_1}$.

The secret image $M$ is embedded into the stylized features $F_m$ via masked superposition, where $M \in \mathbb{R}^{3 \times H \times W}$ and $F_m \in \mathbb{R}^{d \times h \times w}$. To align the spatial dimensions and channel numbers between the two tensors for element-wise addition, we employ a two-step projection process. First, the secret image $M$ undergoes a channel projection through a $1 \times 1$ convolutional layer $W_M$ to match the channel dimension $d$ of $F_m$, yielding $M' = W_M(M) \in \mathbb{R}^{d \times H \times W}$. Second, the feature map $F_m$ is upsampled to match the spatial dimensions of the projected secret image using bilinear interpolation, resulting in $F_m^{\uparrow} \in \mathbb{R}^{d \times H \times W}$. The weighted fusion is then performed as Equation (8):

$F_{fuse} = (1 - \alpha)\, F_m^{\uparrow} + \alpha\, M'$, (8)

where $\alpha \in (0, 1)$ is a hyperparameter balancing artistic quality and secrecy. The fused feature map $F_{fuse}$ is subsequently fed into the decoder to generate the stego image $I_{st}$ as in Equation (9):

$I_{st} = \mathrm{Dec}(F_{fuse})$, (9)

where the structure of the decoder follows [28].
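As an illustration, the weighted fusion of Equation (8) can be sketched in NumPy. The shapes, the projection weights `w_proj`, and the nearest-neighbour upsampling (standing in for bilinear interpolation) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def embed_secret(f_m, secret, w_proj, alpha=0.3):
    """Sketch of Eq. (8): blend upsampled stylized features with a projected secret.

    f_m    : stylized feature map, shape (d, h, w)
    secret : secret RGB image, shape (3, 2h, 2w) in this toy setup
    w_proj : 1x1 channel-projection weights, shape (d, 3)
    alpha  : hyperparameter balancing artistic quality and secrecy
    """
    # A 1x1 convolution is a per-pixel linear map over channels.
    m_proj = np.einsum("dc,chw->dhw", w_proj, secret)       # (d, 2h, 2w)
    # Nearest-neighbour upsampling as a stand-in for bilinear interpolation.
    f_up = f_m.repeat(2, axis=1).repeat(2, axis=2)          # (d, 2h, 2w)
    return (1.0 - alpha) * f_up + alpha * m_proj

rng = np.random.default_rng(0)
f_m = rng.normal(size=(8, 16, 16))
secret = rng.uniform(size=(3, 32, 32))
w = rng.normal(size=(8, 3))
fused = embed_secret(f_m, secret, w)
print(fused.shape)  # (8, 32, 32)
```

With `alpha = 0` the fusion degenerates to pure stylization, so `alpha` directly trades secrecy against artistic fidelity, matching the role described for Equation (8).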
3.1. OSNs Distortions and Forward Simulation Model
Mainstream social media platforms such as WeChat, Twitter, and Meta impose multiple distortions on user-uploaded images to optimize storage and transmission efficiency. These distortions primarily include [29]: (1) lossy compression using JPEG/WebP algorithms (quality factors 50–90), which introduces high-frequency detail loss and block artifacts; (2) resolution downsampling (e.g., limiting the longest edge to 1080 pixels), causing aliasing effects; (3) color space conversions (RGB-to-sRGB/YUV) that induce quantization errors; and (4) platform-specific filters, such as Meta's auto-contrast enhancement or Twitter's sharpening operations, which apply non-linear transformations.
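A minimal NumPy sketch of such a distortion chain, assuming average-pool downsampling, a BT.601 full-range RGB-to-YCbCr conversion, and coarse quantization as a crude proxy for JPEG loss (real platform pipelines are considerably more involved):

```python
import numpy as np

def downsample(img, k=2):
    """Average-pool by factor k, a simple stand-in for platform resizing."""
    h, w, c = img.shape
    return img[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def rgb_to_ycbcr(img):
    """BT.601 full-range RGB -> YCbCr conversion."""
    m = np.array([[0.299, 0.587, 0.114],
                  [-0.168736, -0.331264, 0.5],
                  [0.5, -0.418688, -0.081312]])
    return img @ m.T + np.array([0.0, 128.0, 128.0])

def quantize(img, step=8.0):
    """Uniform quantization as a crude proxy for JPEG coefficient loss."""
    return np.round(img / step) * step

rng = np.random.default_rng(1)
x = rng.uniform(0, 255, size=(64, 64, 3))
y = quantize(rgb_to_ycbcr(downsample(x)))  # downsample -> colour convert -> "JPEG"
print(y.shape)  # (32, 32, 3)
```

Each stage is lossy on its own; chaining them illustrates why a steganographic signal that survives one distortion may still be destroyed by the composite pipeline.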
To address these challenges, we extend the forward distortion simulation framework from [29]. The model operates in the following stages:
Gradient truncation isolates the stego image $I_{st}$ from backward propagation interference by Equation (10):

$\tilde{I} = \mathrm{sg}(I_{st})$, (10)

where $\mathrm{sg}(\cdot)$ blocks gradient flow.

Composite distortions are injected sequentially through Equation (11):

$I_{dist} = T_C\big(T_D(T_J(\tilde{I}))\big)$, (11)

where $T_J$, $T_D$, and $T_C$ represent JPEG-50 compression, resolution downsampling, and color quantization, respectively.

The distortion residual is calculated by Equation (12):

$R = I_{dist} - \tilde{I}$. (12)

During training, the residual is added as a constant perturbation to the original image via Equation (13):

$I_{st}' = I_{st} + R$, (13)

which forces the model to learn robustness against hybrid distortions through forward propagation only.
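The residual-injection trick of Equations (10)–(13) can be sketched as follows. Here `distort` is only a placeholder for the non-differentiable OSN pipeline; in an autograd framework the detached branch would use `stop_gradient`/`detach` so that gradients flow solely through the identity branch:

```python
import numpy as np

def distort(img, step=16.0):
    # Placeholder for the non-differentiable OSN pipeline T_C(T_D(T_J(.))).
    return np.round(img / step) * step

def forward_simulate(stego):
    """Eqs. (10)-(13): inject the distortion residual as a constant.

    The distorted image is reproduced value-for-value, but because the
    residual is computed on a detached copy, backpropagation would see
    only the identity branch `stego`.
    """
    detached = stego.copy()                  # sg(I_st): no gradient path
    residual = distort(detached) - detached  # Eq. (12)
    return stego + residual                  # Eq. (13): constant perturbation

rng = np.random.default_rng(2)
stego = rng.uniform(0, 255, size=(8, 8))
noisy = forward_simulate(stego)
print(np.allclose(noisy, distort(stego)))  # True
```

The key property is that the training input equals the distorted image exactly, so the extractor learns on realistic inputs even though the distortion itself never participates in backpropagation.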
3.2. Improved XuNet-Based Steganalyzer
To address the unique challenges of artistic steganography in style-transferred images, we propose three synergistic enhancements to the original XuNet steganalyzer. These modifications enable effective detection in RGB color space while maintaining computational efficiency for weak generative tasks.
The original XuNet was designed for single-channel grayscale images with a fixed 5 × 5 high-pass filtering kernel. To accommodate the three-channel RGB inputs of stylized stegos, we expand the input dimensionality accordingly and introduce a cross-channel feature fusion layer. This adaptation preserves XuNet's high-pass filtering prior while capturing inter-channel steganographic correlations.
Given that artistic stylization constitutes a weak generative problem where texture patterns interfere with steganographic features, we implement a two-phase training protocol. During pre-training, we construct a specialized dataset using VGG-19 features extracted from both clean style-transferred images and stego images. The pre-training follows a standard cross-entropy objective where the discriminator learns to distinguish between clean and stego-containing images. The pre-trained weights are then frozen during end-to-end optimization.
During generator training, the fixed XuNet outputs the stego probability $p$, which drives the adversarial loss following the cross-entropy formulation.
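Assuming the fixed high-pass prior is the 5 × 5 KV kernel commonly used in XuNet-style steganalyzers (an assumption about the adaptation above), applying it independently per RGB channel can be sketched as:

```python
import numpy as np

# The 5x5 "KV" high-pass kernel widely used as a fixed residual filter
# in XuNet-style steganalyzers (zero-sum, so flat regions yield ~0).
KV = np.array([[-1,  2,  -2,  2, -1],
               [ 2, -6,   8, -6,  2],
               [-2,  8, -12,  8, -2],
               [ 2, -6,   8, -6,  2],
               [-1,  2,  -2,  2, -1]], dtype=float) / 12.0

def highpass_residual(img):
    """Valid-mode 2-D correlation of each channel with the KV kernel."""
    h, w, c = img.shape
    out = np.zeros((h - 4, w - 4, c))
    for i in range(h - 4):
        for j in range(w - 4):
            patch = img[i:i + 5, j:j + 5]                     # (5, 5, c)
            out[i, j] = np.tensordot(KV, patch, axes=([0, 1], [0, 1]))
    return out

flat = np.full((16, 16, 3), 128.0)   # constant image: (near-)zero residual
print(highpass_residual(flat).shape)  # (12, 12, 3)
```

Because the kernel sums to zero, smooth image content is suppressed and only high-frequency residuals, where embedding traces concentrate, reach the subsequent layers.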
3.3. Secret Image Extraction Network
The secret extractor is designed to operate without requiring any knowledge of the style image used during stego generation. It decodes the secret information directly from the content structure and embedded patterns of the received stego image, a process independent of the image’s artistic style.
We design a convolutional neural network to directly reconstruct the original secret RGB image from the stego image. The extraction network follows an encoder–decoder architecture similar to SteganoGAN [9], but optimized for image-to-image recovery rather than binary message extraction. The extraction process consists of three key operations:
The stego image first passes through a series of convolutional blocks to extract multi-scale features as Equations (14)–(16):

$F_1 = \sigma\big(\mathrm{Conv}_1(I_{st}')\big)$, (14)

$F_2 = \sigma\big(\mathrm{Conv}_2(F_1)\big)$, (15)

$F_3 = \sigma\big(\mathrm{Conv}_3(F_2)\big)$, (16)

where $\sigma(\cdot)$ is a non-linear activation. The features then pass through residual bottleneck blocks to learn the mapping between stego and secret image representations as Equation (17):

$F_r = F_3 + \mathcal{R}(F_3)$, (17)

where $\mathcal{R}(\cdot)$ denotes the stacked residual bottleneck blocks. Finally, transposed convolutions progressively upsample the features to reconstruct the RGB output as Equation (18):

$\hat{M} = \mathrm{Deconv}(F_r)$. (18)
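A shape-level sketch of the three extraction stages, with pooling, an identity-plus-transform residual, and nearest-neighbour upsampling standing in for the actual convolutional, bottleneck, and transposed-convolution layers (all stand-ins are illustrative assumptions):

```python
import numpy as np

def conv_block(x, k=2):
    """Stand-in for a strided conv block: average-pool downsample + ReLU."""
    c, h, w = x.shape
    y = x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))
    return np.maximum(y, 0.0)

def residual_block(x):
    """Stand-in for a bottleneck: identity skip plus a cheap transform."""
    return x + 0.1 * np.tanh(x)

def upsample_block(x, k=2):
    """Stand-in for a transposed conv: nearest-neighbour upsampling."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def extract(stego):
    f = conv_block(conv_block(stego))         # multi-scale feature extraction
    f = residual_block(f)                     # residual bottleneck mapping
    return upsample_block(upsample_block(f))  # upsample back to RGB resolution

rng = np.random.default_rng(3)
stego = rng.uniform(size=(3, 32, 32))
secret_hat = extract(stego)
print(secret_hat.shape)  # (3, 32, 32)
```

The point of the sketch is the shape bookkeeping: two downsampling stages must be mirrored by two upsampling stages so the recovered secret matches the stego's spatial resolution.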
3.4. Loss Function Design
The StegTransfer model optimizes three core objectives through a unified loss framework.
The stylization loss consists of a content term $\mathcal{L}_c$ and a style term $\mathcal{L}_s$, computed by Equations (19)–(21), respectively:

$\mathcal{L}_c = \big\| \phi(I_{st}) - \phi(I_c) \big\|_2$, (19)

where $\phi(\cdot)$ denotes VGG-19 ReLU4_1 features and $I_c$ is the content image;

$\mathcal{L}_s = \sum_l w_l \big\| G\big(\phi_l(I_{st})\big) - G\big(\phi_l(I_s)\big) \big\|_2$, (20)

where $G(\cdot)$ computes the Gram matrix [30], $\phi_l$ represents VGG-19 features at layer $l$, $I_s$ is the style image, and $w_l$ are layer weights;

$\mathcal{L}_{sty} = \lambda_c \mathcal{L}_c + \lambda_s \mathcal{L}_s$, (21)

where $\lambda_c$ and $\lambda_s$ balance content and style preservation.
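The Gram-matrix style term of Equation (20) can be sketched as follows; the feature shapes and layer weights are illustrative:

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, h, w) feature map, normalised by h*w."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)

def style_loss(feat_st, feat_s, weights):
    """Weighted Gram-matrix distances over a list of feature layers."""
    return sum(w * np.linalg.norm(gram(a) - gram(b))
               for w, a, b in zip(weights, feat_st, feat_s))

rng = np.random.default_rng(4)
layers_st = [rng.normal(size=(4, 8, 8)), rng.normal(size=(8, 4, 4))]
layers_s = [rng.normal(size=(4, 8, 8)), rng.normal(size=(8, 4, 4))]
print(style_loss(layers_st, layers_st, [1.0, 1.0]))  # 0.0 for identical features
```

The Gram matrix discards spatial layout and keeps only channel co-activation statistics, which is why matching it transfers texture and brushstroke style without copying content structure.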
The secret reconstruction loss is computed by Equation (22):

$\mathcal{L}_{sec} = \big\| M - \hat{M} \big\|_1 + \beta \big(1 - \mathrm{SSIM}(M, \hat{M})\big)$, (22)

where $M$ is the original secret image, $\hat{M}$ is the extracted image, $\|\cdot\|_1$ denotes the L1-norm (mean absolute error), SSIM is the Structural Similarity Index Measure, and $\beta$ weights the SSIM term.
The steganalysis evasion loss is computed by Equation (23):

$\mathcal{L}_{adv} = -\log(1 - p), \quad p = D\big(\phi(I_{st})\big)$, (23)

where $p$ is the XuNet detection probability (range [0, 1]), $D$ denotes the fixed XuNet steganalyzer, and $\phi$ is the VGG feature extractor.
The total optimization objective is computed by Equation (24):

$\mathcal{L} = \mathcal{L}_{sty} + \lambda_{sec}\, \mathcal{L}_{sec} + \lambda_{adv}\, \mathcal{L}_{adv}$, (24)

where $\lambda_{sec}$ prioritizes secret recovery accuracy, and $\lambda_{adv}$ balances stealthiness against quality degradation.
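A toy sketch of the combined objective, using a global (unwindowed) SSIM stand-in and illustrative weights `lam_sec`, `lam_adv`, `beta` that are not the paper's values:

```python
import numpy as np

def total_loss(l_style, m, m_hat, p_detect, lam_sec=10.0, lam_adv=0.1, beta=0.5):
    """Combine stylization, secret-reconstruction, and evasion terms."""
    # L1 term of the reconstruction loss.
    l_sec = np.abs(m - m_hat).mean()
    # Global SSIM stand-in: one window covering the whole image.
    c1, c2 = 6.5025, 58.5225
    mu_x, mu_y = m.mean(), m_hat.mean()
    var_x, var_y = m.var(), m_hat.var()
    cov = ((m - mu_x) * (m_hat - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2) /
            ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
    l_sec += beta * (1.0 - ssim)
    # Evasion term: push the fixed steganalyzer's detection probability down.
    l_adv = -np.log(1.0 - p_detect + 1e-8)
    return l_style + lam_sec * l_sec + lam_adv * l_adv

rng = np.random.default_rng(5)
m = rng.uniform(0, 255, size=(3, 16, 16))
loss = total_loss(l_style=1.0, m=m, m_hat=m, p_detect=0.1)
print(loss)
```

With a perfectly recovered secret the reconstruction term vanishes, so the remaining loss is the stylization term plus a small evasion penalty that grows sharply as the detector's confidence approaches 1.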
4. Experimental Results and Analysis
4.1. Experimental Environment Setup
For the steganography framework, content-style pairs were sourced from the MS-COCO [31] dataset (content images) and the WikiArt [32] dataset (style images), while secret images were uniformly sampled from the NWPU-RESISC45 [33] remote sensing dataset. All images were center-cropped and resized to a common resolution to ensure spatial alignment between content images and secret images.
For quantitative evaluation, we constructed a well-defined experimental dataset: 80,000 images were randomly selected from the MS-COCO dataset as the content set, and 5000 images were randomly chosen from the WikiArt dataset as the style set. The secret set consists of 10,000 images randomly sampled from the NWPU-RESISC45 dataset. During the training, validation, and testing phases, these sets were randomly partitioned to ensure no information leakage.
4.2. Fidelity
Fidelity is crucial for steganography evaluation. Traditionally, it measures the visual imperceptibility between stego and cover images. In our artistic style steganography context, fidelity refers specifically to the quality of the stego image’s stylized artistic effect: higher stylized quality equals higher fidelity.
We evaluated the fidelity of our scheme both qualitatively (Figure 2) and quantitatively (Table 1). When the training reaches 20,000 epochs, our method is capable of generating stego images with favorable visual quality on the test set, as illustrated in Figure 2, which demonstrates vivid colors and precise brushstrokes. Moreover, the extracted secret images exhibit only minor residuals compared to the original secret images, indicating effective convergence of the model. In contrast, competing methods tend to introduce noticeable artifacts such as color bleeding and over-smoothing.
Table 1 presents quantitative results using the mean Sobel gradient magnitude (measuring edge sharpness), CIEDE2000 [34] (measuring color fidelity), and the NIMA [35] model (predicting aesthetic quality based on human preferences). Testing 500 stego images per scheme, we report mean scores and standard deviations.
Combining the Figure 2 and Table 1 results, our scheme shows a significant advantage in artistic fidelity. This improvement stems directly from the SANet module, which uses style-aware feature fusion to align artistic patterns with content structure. Crucially, by establishing deep style-content correlations during embedding, SANet preserves high-frequency details. This mechanism suppresses style distortion and blending, ensuring highly faithful artistic style in the final stego image.
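The mean Sobel gradient magnitude used as the edge-sharpness metric can be computed as follows (a naive valid-mode correlation for clarity; production code would use an optimized convolution):

```python
import numpy as np

SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SY = SX.T

def mean_sobel_magnitude(gray):
    """Mean Sobel gradient magnitude over the valid interior of a 2-D image."""
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx[i, j] = (SX * patch).sum()
            gy[i, j] = (SY * patch).sum()
    return np.hypot(gx, gy).mean()

flat = np.full((16, 16), 100.0)
edge = np.zeros((16, 16)); edge[:, 8:] = 255.0   # a hard vertical edge
print(mean_sobel_magnitude(flat))  # 0.0
```

A flat region scores zero while sharp transitions score high, so over-smoothed stegos receive lower values than ones with crisp brushstrokes, which is exactly what the metric is meant to capture.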
4.3. Robustness
Building upon the systematic analysis of social media distortions presented in [29], we conducted specialized robustness tests covering four common distortion categories: JPEG compression (quality factor QF = 70), resolution downsampling (scale factor = 4/5), color space conversion (RGB to YCbCr), and hybrid distortions. The hybrid distortions were applied sequentially as: downsampling → color conversion → JPEG compression. Since the compared algorithms did not provide open-source implementations, we faithfully reimplemented them based on the descriptions in their respective papers to ensure a fair comparison. To quantitatively evaluate the fidelity of extracted secret images after transmission through lossy channels (e.g., online social networks), we used two standard metrics: Peak Signal-to-Noise Ratio (PSNR) [36] and Structural Similarity Index (SSIM) [37], both computed by comparing the extracted secret images against the original ones. Higher PSNR and SSIM values indicate better quality preservation. As shown in Table 2, our approach, which integrates a dedicated distortion simulation module, demonstrates significantly enhanced resilience compared to the reimplemented baseline methods.
The PSNR is derived from the Mean Squared Error (MSE) between the original image $I$ and the processed image $K$ (both of size $m \times n$ pixels), as defined in Equations (25) and (26). The SSIM metric evaluates the similarity between local windows $x$ (from $I$) and $y$ (from $K$) by considering luminance, contrast, and structural cues, as defined in Equation (27). Specifically, the MSE and PSNR are calculated as follows:

$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \big[I(i,j) - K(i,j)\big]^2$, (25)

$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right)$, (26)

where $\mathrm{MAX}_I$ denotes the maximum possible pixel value (e.g., 255 for 8-bit images). The SSIM is defined as:

$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$, (27)

where $\mu_x$ and $\mu_y$ are the pixel means of windows $x$ and $y$, respectively; $\sigma_x^2$ and $\sigma_y^2$ are their variances; $\sigma_{xy}$ is their covariance. The constants $c_1$ and $c_2$ are used to ensure numerical stability. The overall SSIM value for the entire image is obtained by averaging the local SSIM values across all windows.
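Both metrics translate directly into NumPy; `c1` and `c2` below use the common 8-bit defaults derived from k1 = 0.01, k2 = 0.03, L = 255:

```python
import numpy as np

def psnr(i, k, max_val=255.0):
    """PSNR from the MSE between two equal-sized images."""
    mse = np.mean((i.astype(float) - k.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_window(x, y, c1=6.5025, c2=58.5225):
    """SSIM for a single local window (the full metric averages many windows)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2) /
            ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

rng = np.random.default_rng(6)
img = rng.uniform(0, 255, size=(32, 32))
noisy = np.clip(img + rng.normal(0, 5, size=img.shape), 0, 255)
print(psnr(img, img))   # inf
print(psnr(img, noisy) > psnr(img, np.zeros_like(img)))
```

Identical images yield infinite PSNR and an SSIM of 1, and both scores fall monotonically as the extracted secret diverges from the original, which is why they are suitable for ranking the schemes in Table 2.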
4.4. Security
To verify the security of StegTransfer, we conducted comprehensive security tests using two widely adopted steganalysis methods: StegExpose [38] and SiaStegNet [39]. StegTransfer's 15.5% detection rate under StegExpose is significantly lower than that of the comparative methods [12]. For further evaluation, we employed the more recent SiaStegNet detector. Under this method, StegTransfer maintained a detection rate of 62.3%, which is 9.8% lower than that of the closest competitor [12], as illustrated in Figure 3.
4.5. Ablation Study
To validate the effectiveness of each key component in the StegTransfer framework, we conduct comprehensive ablation studies by systematically evaluating four configurations: the Full Model with all modules (SANet, distortion simulation, and improved XuNet discriminator); w/o Distortion Simulation, which removes the forward non-differentiable noise module during training; w/o XuNet Discriminator, which excludes the adversarial steganalysis component from training; and w/o SANet, which replaces the Style-Attentional Network with a basic style transfer encoder–decoder. All configurations are trained with identical hyperparameters and evaluated on the same test set of 500 stego images. Quantitative results are presented in Table 3, Table 4 and Table 5.
As shown in Table 3, the full model achieves superior performance across all robustness metrics. Removing the distortion simulation module causes the most significant degradation, confirming that OSN distortion simulation is crucial for maintaining extraction accuracy after social media transmission. The absence of the XuNet discriminator or SANet also reduces robustness, though to a lesser extent, indicating their complementary roles in preserving embedded information.
Table 4 demonstrates that SANet plays the most critical role in artistic quality enhancement. Removing SANet causes the largest drop in NIMA score (from 6.32 to 4.90), edge sharpness, and color fidelity. This confirms that style-attentional feature fusion is essential for generating high-quality stylized stego images. The distortion simulation and XuNet modules have minimal impact on aesthetic quality, as expected, since they primarily target robustness and security, respectively.
Security evaluation under state-of-the-art steganalyzers (Table 5) reveals that the XuNet discriminator is vital for anti-detection capability. Without adversarial training, the detection rate increases by 12.5% under StegExpose and 15.2% under SiaStegNet. Interestingly, removing SANet also slightly increases detectability, suggesting that high-quality stylization helps conceal embedding traces. The distortion simulation module shows minimal impact on security, as it primarily affects post-transmission recovery rather than inherent detectability.
The ablation studies conclusively demonstrate that each component in StegTransfer addresses distinct challenges: the distortion simulation module is essential for robustness against OSN processing pipelines; the XuNet discriminator significantly enhances security against deep learning-based steganalysis; and the SANet critically improves aesthetic quality while marginally aiding security. The full integration of these components creates a synergistic effect, enabling StegTransfer to simultaneously achieve high robustness, security, and visual quality—addressing the three key limitations of existing style-transfer steganography methods.
5. Conclusions
This paper proposes StegTransfer, a robust image steganographic framework for online social networks. It integrates a non-differentiable distortion simulation module, an improved XuNet adversarial discriminator, and an aesthetic enhancement network. Together, these modules address the key limitations of existing style-transfer steganography: robustness to platform distortions, artistic quality, and detection resistance. The distortion module enhances the recoverability of embedded content after transmission, while adversarial training strengthens resistance to deep learning-based steganalysis. The style-attentional network ensures high-quality stego images, preventing visual suspicion.