Article

A Steganalysis Method Based on Relationship Mining

School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(21), 4347; https://doi.org/10.3390/electronics14214347
Submission received: 14 October 2025 / Revised: 31 October 2025 / Accepted: 2 November 2025 / Published: 6 November 2025

Abstract

Steganalysis is a critical research direction in the field of information security. Traditional approaches typically employ convolution operations for feature extraction, followed by classification on noise residuals. However, since steganographic signals are inherently weak, convolution alone cannot fully capture their characteristics. To address this limitation, we propose a steganalysis method based on relationship mining, termed RMNet, which leverages positional relationships of steganographic signals for detection. Specifically, features are modeled as graph nodes, where both locally focused and globally adaptive dynamic adjacency matrices guide the propagation paths of these nodes. Meanwhile, the resulting features are further constrained in the feature space to encourage intra-class compactness and inter-class separability, yielding a more discriminative decision boundary. Additionally, to counter signal attenuation during network propagation, we introduce a multi-scale perception module with cross-attention fusion. Experimental results demonstrate that RMNet achieves performance comparable to state-of-the-art models on the BOSSbase and BOWS2 datasets, while offering superior generalization capability.

1. Introduction

Image steganography is a technique that invisibly embeds secret information into image carriers to enable covert communication. The counterpart of steganography is image steganalysis, which aims to detect any hidden information being transmitted among different entities [1]. These two technologies evolve in an adversarial manner, driving mutual advancements.
Current steganalysis techniques can be broadly categorized into two groups: traditional methods reliant on hand-crafted features and modern approaches based on deep learning [2]. Traditional methods primarily depend on manually designed prior knowledge, which limits their transferability. In recent years, convolutional neural network (CNN)-based steganalysis methods have developed rapidly. These approaches leverage convolution operations to automatically extract and amplify steganographic signals while suppressing semantic information, thereby achieving superior accuracy and generalization compared to traditional techniques. Representative models in this category include Yedroudj-Net [3], SRNet [4], GBRAS-Net [5], SiaStegNet [6], LWENet [7], Luo-Net [8], DFNet [9], and Wei’s model [10].
Although CNN-based steganalysis networks have achieved remarkable progress, they still suffer from inherent limitations. Specifically, stacked convolutions can have insufficient receptive fields, making the models over-sensitive to local features and lacking global attention, which tends to trap them in purely local analysis. Steganalysis operates on noise residuals, which depend strongly on the spatial distribution of the embedded signal. Consequently, investigating both the local and global distributions of steganographic signals is particularly critical.
Graph neural networks (GNNs) [11] have emerged with global relation modeling as their distinctive advantage. In domains such as social networks, knowledge graphs, and fine-grained image classification, several cutting-edge studies [12,13,14,15] have demonstrated the strong capability of GNNs in global relationship mining. To better leverage the global information of steganographic signals, we propose a relationship-mining-based steganalysis framework, termed RMNet, which incorporates APPNP [16] to enhance interaction among graph nodes, enabling features to retain local details while also capturing global representations. The proposed approach integrates shallow and deep features through a cross-attention fusion mechanism, and further employs a distance-adaptive adjacency matrix to guide the APPNP algorithm. In this design, nodes closer to the current node are assigned higher dynamic weights, while distant nodes are adaptively weighted. The resulting features preserve local information while showing strong global characteristics. Moreover, to further improve the generalization ability of the model, we introduce a feature clustering and distance constraint module to guide the distribution of features in the representation space. The main contributions of this work can be summarized as follows:
  • We propose a steganalysis approach that leverages relationship mining, where local-focused and globally adaptive patterns are employed to guide graph relation modeling and capture global features. In the feature space, feature clustering and contrastive learning are incorporated to enlarge inter-class global differences, thereby generating a more discriminative decision boundary and maximizing the utilization of global relationships for classification.
  • We analyze the characteristics of features at different layers in traditional convolutional networks. To balance the large receptive field but insufficient local detail of deep features with the small receptive field but fine-grained detail of shallow features, we propose a cross-attention method between deep and shallow features. In this approach, deep features guide shallow features to obtain representations that not only maintain a large receptive field but also preserve sufficient local details.
  • We conduct extensive experiments to demonstrate the effectiveness of the proposed method. Ablation studies are performed to verify the contribution of each module, and visualization analyses are provided to further illustrate the performance of our model.

2. Related Work

In this section, we review the development of steganalysis and discuss several representative methods that exploit spatial positional relationships for analysis.

2.1. Development of Steganalysis

The development of steganalysis can be divided into two stages: early methods based on hand-crafted features and recent deep learning approaches. Early steganalysis methods primarily relied on manually designed statistical attributes of features [17], with representative approaches including rich models and ensemble-based methods [18,19,20]. However, these approaches were inherently constrained by the limitations of hand-crafted feature descriptors [21]. With the rapid progress of deep learning, CNN-based steganalysis methods have gradually become mainstream. CNNs use convolution to automatically extract features, reducing reliance on manual feature engineering. Qian et al. [22] were the first to introduce CNNs into this domain, proposing a three-part architecture consisting of high-pass filtering, convolutional feature extraction, and fully connected classification, which achieved performance comparable to traditional methods. Since then, a variety of advanced networks have emerged. XuNet [23] was the first to combine residual networks with spatial pyramid pooling. SRNet [4] introduced residual connections and non-fixed filters. SiaStegNet [6] employed a parameter-sharing Siamese network structure. LWENet [7] integrated depthwise separable convolution, multiple normalization techniques, and global pooling for multi-view classification. SAANet [24] was the first to incorporate attention-augmented convolution. Tong et al. [9] proposed J-DFNet and S-DFNet, which combine residual and dense paths with coordinate attention mechanisms.
Although CNN-based steganalysis models have already achieved high levels of accuracy, they still suffer from certain limitations, such as the loss of feature details caused by convolutional stacking and insufficient utilization of the global spatial information of steganographic signals. These limitations remain major bottlenecks preventing CNNs from achieving even higher accuracy.

2.2. Development of Relationship Mining Techniques

Relationship mining, typically realized with GNNs, excels at modeling structured information and capturing global dependencies [11]. Its core idea is to model the relationships among elements, with message passing allowing each node to acquire both neighborhood and global context. In the field of fine-grained classification, Bera et al. [14] proposed a method that combines GNNs with attention mechanisms. By exploiting spatial relation-aware feature transformation and attention-based contextual modeling, their approach significantly improved the accuracy of fine-grained image classification. Sikdar et al. [15] further advanced this direction by constructing both inter-region and intra-region graphs to promote high-order feature interactions. Without requiring part-level bounding box annotations, their method automatically captured long-range dependencies and local details across different regions of objects, thereby achieving substantial improvements in classification accuracy.
However, the application of GNNs in the field of steganalysis remains underexplored. Liu et al. [25] conducted a preliminary study on employing GNNs for steganalysis. Their method transformed images into graph structures, extracted node features using shallow CNNs, and applied a graph attention network (GAT) [26] to learn global representations for distinguishing between cover and stego images. Subsequently, Liu et al. [27] extended this approach to the JPEG domain by partially removing pooling layers and setting the convolution stride to 1, in order to prevent the weakening of steganographic signals in deeper feature layers. Nonetheless, this method lacked further performance comparisons with more advanced models. These attempts demonstrated the advantages of GNNs for global analysis, but left two key issues unresolved: the attenuation of steganographic signals in deep features and the locally clustered distribution of those signals.

3. Method

3.1. Overview

The proposed framework (Figure 1) has three main components: the Feature Extraction Module, the Feature Fusion Module, and the Relationship Mining Module. The Feature Extraction Module employs LWENet as the backbone network to extract steganographic features. The Feature Fusion Module integrates shallow and deep features obtained from the extraction stage, preserving their respective advantages. Building upon the fused representations, the Relationship Mining Module converts features into graph nodes and computes graph relations to obtain global feature representations. The implementations of the Feature Fusion Module and the Relationship Mining Module are described in detail in Section 3.2 and Section 3.3, respectively.

3.2. Feature Fusion Module

The attention mechanism [28] is a core deep learning component due to its feature selection capability. In convolution-based steganalysis, classification is typically performed using deep features, which capture global representations, while the fine-grained local details contained in shallow features are often lost. To achieve an effective integration of local details and global context, we propose a cross-attention-based Feature Fusion Module, as illustrated in part (b) of Figure 1. Building upon the self-attention mechanism, we modify the usage of queries (q), keys (k), and values (v) to design a cross-attention mechanism tailored specifically for steganalysis.
Given the deep features as queries $q \in \mathbb{R}^{B \times C \times H \times W}$ and the shallow features as keys $k \in \mathbb{R}^{B \times C \times H \times W}$ and values $v \in \mathbb{R}^{B \times C \times H \times W}$ (where $B$ denotes the batch size, $C$ the number of channels, and $H \times W$ the spatial resolution), the features are first projected through linear transformations to obtain Query, Key, and Value:
$$Q = \mathrm{Conv}_{1\times1}(q) = W_q q, \quad K = \mathrm{Conv}_{1\times1}(k) = W_k k, \quad V = \mathrm{Conv}_{1\times1}(v) = W_v v,$$
where $W_q$, $W_k$, and $W_v$ are learnable convolutional weights. For multi-head attention, the features are divided into $N_h$ heads, each with a dimension of $d_h = C / N_h$:
$$\tilde{Q} = \mathrm{Reshape}(Q) \in \mathbb{R}^{B \times N_h \times (H \cdot W) \times d_h}, \quad \tilde{K} = \mathrm{Reshape}(K) \in \mathbb{R}^{B \times N_h \times (H \cdot W) \times d_h}, \quad \tilde{V} = \mathrm{Reshape}(V) \in \mathbb{R}^{B \times N_h \times (H \cdot W) \times d_h}.$$
After reshaping, the cross-scale similarity between the deep features $\tilde{Q}$ and the shallow features $\tilde{K}$ is computed as:
$$S = \frac{\tilde{Q} \tilde{K}^{T}}{\sqrt{d_k}} \in \mathbb{R}^{B \times N_h \times (H \cdot W) \times (H \cdot W)},$$
where $\sqrt{d_k}$ is a scaling factor. The similarity matrix $S$ is normalized using the softmax function:
$$A = \mathrm{softmax}(S).$$
The detailed information from the shallow features $\tilde{V}$ is then aggregated using the attention weights:
$$\tilde{O} = A \tilde{V} \in \mathbb{R}^{B \times N_h \times (H \cdot W) \times d_h}.$$
Finally, the aggregated representation $\tilde{O}$ is reshaped back to the original spatial format and fused through a $1 \times 1$ convolution:
$$\mathrm{Output} = \mathrm{Conv}_{1\times1}\left(\mathrm{Reshape}(\tilde{O})\right) \in \mathbb{R}^{B \times C \times H \times W}.$$
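As an illustration, the following is a minimal PyTorch sketch of this cross-attention fusion. It follows the equations above rather than the authors' released code; the module name, the default head count, and the use of a 1 × 1 convolution for the output projection are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Deep features attend to shallow features (queries = deep, keys/values = shallow)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = channels // num_heads
        # 1x1 convolutions play the role of the projections W_q, W_k, W_v.
        self.to_q = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_k = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_v = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        B, C, H, W = deep.shape
        q, k, v = self.to_q(deep), self.to_k(shallow), self.to_v(shallow)

        def split_heads(x: torch.Tensor) -> torch.Tensor:
            # (B, C, H, W) -> (B, N_h, H*W, d_h)
            return x.view(B, self.num_heads, self.head_dim, H * W).transpose(2, 3)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)   # S: (B, N_h, HW, HW)
        attn = scores.softmax(dim=-1)                               # A = softmax(S)
        out = attn @ v                                              # O: (B, N_h, HW, d_h)
        out = out.transpose(2, 3).reshape(B, C, H, W)               # back to the spatial layout
        return self.proj(out)                                       # fused output
```

In RMNet the two inputs would be deep and shallow backbone feature maps brought to a common shape; for the sketch, any pair of equally shaped (B, C, H, W) tensors works.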

3.3. Relationship Mining Module

The core of steganalysis lies in analyzing the noise residuals between cover and stego images, which are inherently dependent on the spatial distribution of steganographic signals. Therefore, investigating both the local and global distributions of these signals is particularly important. To enable the network to make more effective use of global steganographic information, we propose a Relationship Mining Module, as illustrated in part (c) of Figure 1. This module leverages the interaction modeling capability of graph neural networks to capture global representations of features. It is composed of three components: an adaptive adjacency matrix, relation-aware learning, and feature distribution enhancement.

3.3.1. Adaptive Adjacency Matrix

Steganographic signals are typically embedded into textured regions. These regions exhibit certain local continuity, while the overall distribution tends to be scattered. Thus, emphasizing only global connections while ignoring local structures may obscure weak but important steganographic traces. Based on this observation, we propose a dynamic adjacency matrix that combines local and global connections among graph nodes, ensuring that features can preserve local consistency while also capturing global representations.
For the adjacency matrix, we define the local neighborhood $N_i^{\mathrm{local}}$ for each graph node $i$. For any node $j \in N_i^{\mathrm{local}}$, a local full connection is adopted as follows:
$$A_{ij}^{\mathrm{local}} = \begin{cases} 1, & j \in N_i^{\mathrm{local}}, \\ 0, & \text{otherwise}. \end{cases}$$
When node $j$ falls outside the local neighborhood $N_i^{\mathrm{local}}$, we further adopt a dynamic adjacency strategy. Specifically, we compute the cosine similarity between node $i$ and node $j$, normalized to the range $[0, 1]$:
$$S_{ij} = 0.5 \times \left( \frac{f_i \cdot f_j}{\lVert f_i \rVert \, \lVert f_j \rVert} + 1 \right),$$
where $f_i$ and $f_j$ denote the feature vectors of nodes $i$ and $j$, respectively. We then preserve connections with stronger global relevance by keeping $S_{ij}$ values no smaller than a threshold $\tau$, while setting the others to zero. This yields the adaptive global adjacency:
$$A_{ij}^{\mathrm{global}} = \begin{cases} S_{ij}, & S_{ij} \geq \tau \ \text{and} \ j \notin N_i^{\mathrm{local}}, \\ 0, & \text{otherwise}. \end{cases}$$
Finally, the overall adjacency matrix is obtained by combining the local and global components:
$$A_{ij} = A_{ij}^{\mathrm{local}} + A_{ij}^{\mathrm{global}}.$$
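A minimal sketch of this hybrid adjacency construction is given below. The helper is hypothetical: the Chebyshev grid distance used to define the local neighborhood, and the default grid size, R, and τ values, are assumptions taken from the settings explored in Section 4.5.

```python
import torch
import torch.nn.functional as F

def build_adjacency(node_feats: torch.Tensor, grid: int = 16, R: int = 5, tau: float = 0.8) -> torch.Tensor:
    """node_feats: (N, C) features of the N = grid * grid graph nodes; returns A of shape (N, N)."""
    N = node_feats.shape[0]
    assert N == grid * grid
    # Node coordinates on the grid define the local neighborhood N_i^local.
    ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()            # (N, 2)
    dist = (coords[:, None, :] - coords[None, :, :]).abs().max(dim=-1).values    # Chebyshev grid distance
    local = (dist <= R).float()                                                  # A^local: 1 inside the window

    # Cosine similarity rescaled to [0, 1]: S_ij = 0.5 * (cos(f_i, f_j) + 1).
    f = F.normalize(node_feats, dim=1)
    sim = 0.5 * (f @ f.t() + 1.0)
    # Keep only strong links (S_ij >= tau) outside the local neighborhood.
    global_adj = torch.where((sim >= tau) & (dist > R), sim, torch.zeros_like(sim))

    return local + global_adj                                                    # A = A^local + A^global
```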

3.3.2. Relation-Aware

The adjacency matrix specifies the connectivity of the graph nodes. For a given feature graph $F_{\mathrm{graph}} \in \mathbb{R}^{B \times N \times C}$, where $B$ is the batch size, $C$ is the number of channels, and $N = H \times W$ denotes the spatial dimension, we adapt the Approximate Personalized Propagation of Neural Predictions (APPNP) message-passing algorithm [16]. APPNP achieves linear computational complexity by approximating topic-sensitive PageRank through power iteration. The initial feature representation is defined as:
$$H^{(0)} = F_{\mathrm{graph}},$$
and the iterative process of Approximate Personalized PageRank propagation is given by:
$$H^{(k)} = (1 - \alpha)\, \hat{A} H^{(k-1)} + \alpha H^{(0)},$$
where $\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is the symmetrically normalized adjacency matrix, $D = \mathrm{diag}\left(\sum_j A_{ij}\right)$ is the corresponding diagonal degree matrix, $\alpha \in (0, 1)$ is the teleport probability, and $k = 1, 2, \ldots, K$ denotes the propagation step.
The final graph representation is obtained as:
$$H_{\mathrm{graph}} = \mathrm{Sigmoid}(H^{(K)}).$$
Based on $H_{\mathrm{graph}}$, node features are aggregated to obtain the representation $H_{\mathrm{final}}$ for subsequent classification:
$$H_{\mathrm{final}} = \frac{1}{N} \sum_{i=1}^{N} H_i^{\mathrm{graph}} \in \mathbb{R}^{B \times C}.$$
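For illustration, a minimal sketch of this propagation step is shown below, assuming a dense adjacency matrix shared across the batch; the normalization, the update rule, and the final node averaging follow the equations above.

```python
import torch

def appnp_propagate(H0: torch.Tensor, A: torch.Tensor, K: int = 1, alpha: float = 0.3) -> torch.Tensor:
    """H0: (B, N, C) node features; A: (N, N) adjacency from Section 3.3.1; returns (B, C)."""
    # Symmetric normalization: A_hat = D^{-1/2} A D^{-1/2}.
    deg = A.sum(dim=-1)
    d_inv_sqrt = deg.clamp(min=1e-12).pow(-0.5)
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    H = H0
    for _ in range(K):
        # H^(k) = (1 - alpha) * A_hat H^(k-1) + alpha * H^(0)
        H = (1.0 - alpha) * torch.einsum("nm,bmc->bnc", A_hat, H) + alpha * H0

    H_graph = torch.sigmoid(H)
    # Averaging over the N nodes yields the pooled representation used for classification.
    return H_graph.mean(dim=1)
```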

3.3.3. Feature Distribution Enhancement

To enlarge the global differences between different classes, we introduce the ideas of feature clustering and contrastive learning. This compacts intra-class features and separates inter-class features, yielding more discriminative representations and a clearer decision boundary. These two strategies are formulated with separate loss functions.
(1) Feature clustering loss: We encourage same-class features to cluster densely around their class center, while enlarging inter-class distances. This reduces intra-class distances and increases inter-class separability, thereby improving the model's ability to learn discriminative features. Let $C \in \mathbb{R}^{p \times m}$ denote the set of class centers, where $p$ is the number of classes and $m$ is the feature dimension. For a batch of extracted features, the clustering loss is defined as:
$$L_{FC} = \frac{\sum_{i}^{b} \sum_{j}^{m} \left\| f_{ij} - c_{ij} \right\|_{2}^{2}}{b \cdot \sigma},$$
where $b$ is the batch size, $f_i$ represents the feature extracted by the network for sample $i$, $m$ is the feature dimension, $c_i$ is the corresponding class center, and $\sigma$ is a normalization factor.
(2) Distance constraint loss: Inspired by contrastive learning, we enforce a distance constraint between similar and dissimilar features. Specifically, we minimize the distance among features of the same class, while maximizing the distance among features of different classes. This enhances inter-class separability and promotes more discriminative learning. The distance constraint loss is formulated as:
$$L_{DC} = -\log \frac{\sum_{i=1}^{b} \sum_{j: y_j = y_i} \exp\left(\frac{\mathrm{sim}(f_i, f_j)}{\tau}\right)}{\sum_{i=1}^{b} \sum_{j: y_j = y_i} \exp\left(\frac{\mathrm{sim}(f_i, f_j)}{\tau}\right) + \sum_{i=1}^{b} \sum_{j: y_j \neq y_i} \exp\left(\frac{\mathrm{sim}(f_i, f_j)}{\tau}\right)},$$
where $b$ is the batch size, $y_i$ denotes the label of sample $i$, $\tau$ is the temperature parameter controlling the smoothness, and $\mathrm{sim}(\cdot)$ is the cosine similarity function.
Finally, the overall loss $L$ consists of the traditional cross-entropy loss $L_{CLS}$, the feature clustering loss $L_{FC}$, and the distance constraint loss $L_{DC}$:
$$L = \lambda_1 L_{CLS}(f_{\mathrm{logits}}) + \lambda_2 L_{FC}(f_X) + \lambda_3 L_{DC}(f_X),$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weighting coefficients for the three losses.
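The sketch below shows one possible reading of these three objectives in PyTorch. It is illustrative only: the learnable class centers, the exclusion of self-pairs from the positive set of the distance constraint loss, and the default σ and τ values are assumptions not fixed by the equations above.

```python
import torch
import torch.nn.functional as F

def clustering_loss(feats, labels, centers, sigma=1.0):
    # L_FC: squared distance of each feature to its class center, normalized by b * sigma.
    diff = feats - centers[labels]                      # (b, m)
    return diff.pow(2).sum() / (feats.size(0) * sigma)

def distance_constraint_loss(feats, labels, tau=0.1):
    # L_DC: pull same-class features together and push different classes apart.
    f = F.normalize(feats, dim=1)
    sim = torch.exp(f @ f.t() / tau)                    # pairwise exp(cosine similarity / tau)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pos = sim[same & ~eye].sum()
    neg = sim[~same].sum()
    return -torch.log((pos + 1e-12) / (pos + neg + 1e-12))

def total_loss(logits, feats, labels, centers, lambdas=(1.0, 1.0, 1.0)):
    # L = lambda_1 * L_CLS + lambda_2 * L_FC + lambda_3 * L_DC
    l_cls = F.cross_entropy(logits, labels)
    l_fc = clustering_loss(feats, labels, centers)
    l_dc = distance_constraint_loss(feats, labels)
    return lambdas[0] * l_cls + lambdas[1] * l_fc + lambdas[2] * l_dc
```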

4. Experiments

4.1. Datasets

The datasets used in this work are derived from two sources, namely BOSSbase1.01 [29] and BOWS2 [30]. Both BOSSbase and BOWS2 contain 10,000 high-quality images, allowing effective evaluation in an ideal setting.
We combine BOSSbase1.01 and BOWS2 into a single dataset, which consists of 20,000 images. All images are resized to 256 × 256 pixels using MATLAB R2021b, and the dataset is split into training, validation, and test sets with a ratio of 14:1:5. Notably, all 10,000 images from BOWS2 are used for training. This dataset setup primarily simulates the ideal case of distinguishing between cover and stego samples. To generate stego images, we adopt three widely used adaptive steganographic algorithms: WOW [31], S-UNIWARD [32], and Hill [33].

4.2. Experimental Environment

Our backbone network LWENet follows the same configuration as the original model [7]. The weight decay and momentum values were set to 0.0005 and 0.9, respectively. L2 regularization was disabled for the bias terms in all convolutional and fully connected layers. The learning rate was initialized at 0.01 and reduced by a factor of 10 at the 80th, 140th, and 180th epochs, with a total of 200 training epochs.
The Relationship Mining Module is implemented with an APPNP layer (K = 1, α = 0.3), where edge weights modulate the message-passing intensities to achieve a balance between local feature preservation and neighborhood aggregation. A sigmoid activation function is applied to the outputs. All models were implemented using Python 3.8 and PyTorch 2.0.1, and experiments were conducted on an NVIDIA RTX 3090 GPU.
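For reference, a minimal sketch of the optimizer configuration reported above is given below. The SGD optimizer, the placeholder network, and the rule used to exempt bias parameters from weight decay are assumptions; this is not the released training script.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the LWENet backbone plus the RMNet modules.
model = nn.Sequential(nn.Conv2d(1, 30, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(30, 2))

decay, no_decay = [], []
for name, param in model.named_parameters():
    # L2 regularization is disabled for the bias terms of convolutional / fully connected layers.
    (no_decay if name.endswith("bias") else decay).append(param)

optimizer = torch.optim.SGD(
    [{"params": decay, "weight_decay": 5e-4},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=0.01, momentum=0.9)

# Learning rate divided by 10 at epochs 80, 140, and 180; 200 epochs in total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 140, 180], gamma=0.1)
```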

4.3. Comparison with Other Approaches

We compare RMNet with several recently proposed state-of-the-art models, including SRNet [4], GBRAS-Net [5], SiaStegNet [6], LWENet [7], and DFNet [9]. As shown in Table 1, our proposed method achieves the highest detection accuracy across the three steganographic algorithms at 0.4 bpp, 0.3 bpp, and 0.2 bpp. In particular, RMNet outperforms LWENet by up to 1.17%, DFNet by 0.62%, SiaStegNet by 6.62%, GBRAS-Net by 5.43%, and SRNet by 4.71%. At 0.1 bpp, RMNet obtains the best accuracy on the Hill algorithm, while achieving the second-best results on WOW and S-UNIWARD. As shown in Figure 2a, the ROC curves demonstrate the performance of six models on the Hill steganographic algorithm at 0.4 bpp. RMNet attains the largest area under the ROC curve, suggesting that it achieves the best detection performance among all compared models. As shown in Figure 2b, the validation accuracy evolves steadily with increasing epochs, where RMNet likewise attains the highest performance among all evaluated models.

4.4. Model Generalization Performance

We evaluate generalization by testing on mismatched steganographic algorithms. Specifically, six models—LWENet, DFNet, SiaStegNet, GBRAS-Net, SRNet, and RMNet—are trained on datasets generated using the same 0.4 bpp S-UNIWARD algorithm, and then tested on datasets generated by WOW, HUGO, MiPOD, and Hill algorithms to assess their generalization capability. As shown in Table 2, all models exhibit varying degrees of accuracy degradation when detecting cross-algorithm steganography. Nevertheless, our proposed RMNet consistently maintains the highest detection performance across all cases.

4.5. Detailed Exploration of the Relationship Mining Module

4.5.1. Investigation of Graph Node Partitioning

In the Relationship Mining Module, converting two-dimensional feature maps into graph-structured data is a critical step, and the number of node partitions is an important hyperparameter: too few partitions cause information loss through pooling, while too many increase computational complexity and hinder effective global relation modeling. To investigate the optimal partitioning strategy, we conducted the experiments summarized in Table 3. The results indicate that, within a reasonable complexity range, partitioning the feature map into 16 × 16 nodes yields the best overall performance.
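As an illustration, one plausible way to obtain such a node partition is adaptive average pooling over the backbone feature map, as sketched below; the pooling operator itself is an assumption, since the paper only fixes the size of the node grid.

```python
import torch
import torch.nn.functional as F

def to_graph_nodes(feature_map: torch.Tensor, grid: int = 16) -> torch.Tensor:
    """Partition a (B, C, H, W) feature map into grid * grid graph nodes of dimension C."""
    pooled = F.adaptive_avg_pool2d(feature_map, output_size=(grid, grid))   # (B, C, grid, grid)
    return pooled.flatten(2).transpose(1, 2)                                # (B, grid*grid, C)

# Example: a 32 x 32 feature map becomes 16 x 16 = 256 nodes.
nodes = to_graph_nodes(torch.randn(2, 64, 32, 32), grid=16)                 # (2, 256, 64)
```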

4.5.2. On the Local Fully-Connected Range in the Adjacency Matrix

To investigate the impact of the local fully-connected range (R) in the adjacency matrix on model performance, we designed comparative experiments ranging from purely dynamic connections ( R = 0 ) to global full connections ( R = Global ). As shown in Table 4, the size of the local range R has a significant influence on performance. When R = 5 , the model achieves the best performance under all test conditions, with accuracy markedly higher than other configurations. This indicates that a moderate local fully-connected range is crucial. Both R = 0 (no local connections) and excessively large R values lead to performance degradation, thereby validating the effectiveness of our proposed hybrid strategy of “local full connection with global dynamic connection”. An excessively large R introduces redundant weak connections as noise, while too small an R fails to sufficiently exploit essential local relationships.

4.5.3. Investigation of the Threshold for Dynamic Connections

In addition, we conducted experiments on the threshold τ in the global dynamic connections. This threshold determines the connection condition between the current node and nodes outside its local region. As τ increases, only nodes with stronger similarity retain their connection weights, while those with similarity lower than τ are considered weak connections and set to zero to prevent interference with global features. The experimental results in Table 5 indicate that a higher filtering threshold is effective. Specifically, τ = 0.8 can filter out a large number of weak connections with low similarity, thereby avoiding the introduction of irrelevant noise and ensuring that the preserved global connections correspond to strongly related nodes. Consequently, the message passing process becomes more efficient and precise.

4.5.4. Comparison Between RMNet and Vision Transformer-Based Steganalysis Models

According to the experimental results presented in Table 6, the RMNet model based on relationship mining demonstrates significantly superior detection performance compared with the Vision Transformer–based Luo-Net. This finding empirically substantiates the advantage of relationship mining techniques: by explicitly modeling the local dependencies among pixels through a graph-structured representation, RMNet can more directly and efficiently capture the subtle statistical perturbations introduced by steganographic operations. Consequently, it exhibits stronger robustness and higher detection accuracy when confronted with complex image content. In contrast, although Vision Transformer (ViT) models possess powerful global modeling capabilities, their attention mechanisms tend to be more susceptible to interference from semantic information in images, thereby reducing their efficiency in distinguishing weak steganographic signals from strong background content.

4.5.5. Investigation of the Teleport Probability α and the Loss Function Weights in Graph Computation

We conducted an in-depth investigation into the teleport probability ( α ) used in the graph computation process. A higher value of α indicates that each node has a greater tendency to retain its own information during propagation, while a lower value implies that the node relies more heavily on the information aggregated from its neighboring nodes. As shown in Table 7, the model achieves its best performance when α = 0.3 , indicating that the most effective message propagation occurs when neighboring information dominates and self-information serves as auxiliary support. This result underscores the importance of maintaining an appropriate balance between self-retention and neighborhood dependency in relationship mining–based graph learning frameworks.
In the feature distribution enhancement module, we investigated the relationship among the weight coefficients of the cross-entropy loss, the feature clustering loss, and the distance constraint loss, denoted as λ1, λ2, and λ3, respectively. According to the data presented in Table 7, the model achieves the highest detection accuracy when the three weights are approximately equal. This observation indicates that the model performs best when these three loss components contribute jointly and uniformly to the gradient optimization process.

4.6. Visualization and Complexity Analysis

We use t-SNE to visualize feature discriminability, since the degree of class separation in the embedding reflects the quality of the learned representations. We compared the t-SNE visualizations of features learned by the baseline model and by our model with the feature fusion and relationship mining modules. As shown in Figure 3, after applying the feature fusion and relationship mining modules, the distributions of the Cover and Stego features in the reduced-dimensional space are clearly better separated. Compared to Figure 3a, the overlapping and confusing regions near the class decision boundary in RMNet are greatly reduced, demonstrating that the proposed modules learn more discriminative feature representations.
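A minimal sketch of the t-SNE visualization procedure is given below (scikit-learn and matplotlib). The feature array and labels are random placeholders standing in for the penultimate-layer features and cover/stego labels of the test images.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.randn(500, 256)          # placeholder for extracted features
labels = np.random.randint(0, 2, size=500)    # placeholder labels: 0 = Cover, 1 = Stego

emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(features)
for cls, name in [(0, "Cover"), (1, "Stego")]:
    pts = emb[labels == cls]
    plt.scatter(pts[:, 0], pts[:, 1], s=5, label=name)
plt.legend()
plt.savefig("tsne_features.png", dpi=200)
```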
In addition, we employed the Grad-CAM [34] technique to visualize the regions of the images that the model attends to. In Grad-CAM, warmer colors indicate stronger attention from the model. As shown in Figure 4, compared with the heatmaps of the backbone network in Figure 4c, the heatmaps of RMNet in Figure 4d focus more on the regions where steganographic information is embedded, thereby improving detection accuracy. This further validates the effectiveness of our proposed framework.
Model complexity is a key performance indicator. We evaluated each model’s training time and parameters to assess efficiency. According to the data presented in Table 8, the proposed RMNet contains slightly more parameters (0.91 million) than DFNet, LWENet, and SiaStegNet. However, it requires less training time than SiaStegNet and DFNet, while simultaneously achieving higher detection accuracy with only a marginal increase in parameters. This demonstrates the efficiency advantage of the proposed model in terms of both computational cost and training effectiveness.

4.7. Ablation Studies

4.7.1. Effectiveness of the Feature Fusion Module

As shown in Table 9, we compared the performance differences before and after introducing the Feature Fusion Module. The experimental results demonstrate that incorporating the Feature Fusion Module consistently improves performance under all test conditions. This validates the module’s effectiveness. The cross-attention mechanism effectively integrates shallow local details with deep global semantics, thereby enhancing the strength of steganographic signals and improving classification performance.

4.7.2. Effectiveness of the Relationship Mining Module

To further analyze the role of the Relationship Mining Module, we designed four comparative experiments in Table 10: Model 1 represents the baseline model; Model 2 incorporates only the relation-aware component; Model 3 incorporates only the feature distribution enhancement component; and Model 4 includes the complete Relationship Mining Module (combining both relation-aware and feature distribution enhancement). The experimental results demonstrate that progressively adding module components leads to consistent performance improvements. Using only the relation-aware component (Model 2) or only the feature distribution enhancement component (Model 3) provides certain gains, but the complete module (Model 4) achieves the best overall performance. This indicates a synergy between the two components: the relation-aware component captures spatial dependencies via the adaptive adjacency matrix, while the feature distribution enhancement component optimizes global representations. Their combination enables more effective modeling of the spatial distribution characteristics of steganographic signals, thereby improving detection performance.

4.8. Performance on the ALASKA#2 Dataset

To further verify the effectiveness of the proposed model, we also conducted experiments on the ALASKA#2 dataset [35], which is more representative of real-world scenarios since the images are collected from various devices. This dataset enables us to evaluate the model’s detection capability under more complex conditions. Specifically, we selected 10,000 grayscale images, resized them to 256 × 256 , and split them into training, validation, and testing sets with a ratio of 6:2:2. As shown in Table 11, the proposed model also achieves the best detection performance on the ALASKA#2 dataset.

5. Conclusions

We proposed a novel relation-mining steganalysis model to address CNN limitations in capturing global spatial dependencies. The core of the proposed model lies in two innovative modules: the Feature Fusion Module (FFM) and the Relation Mining Module (RMM). The FFM leverages a cross-attention mechanism to effectively integrate the complementary strengths of deep and shallow convolutional features. The RMM employs graph-based techniques to achieve global feature modeling, with an adaptive adjacency matrix strategy that ensures a balance between local and global properties. Furthermore, a feature distribution enhancement mechanism within the RMM further amplifies global feature distinctions, thereby improving overall model performance.
Ablation studies validated each component, and we compared our approach with state-of-the-art models. Performance was evaluated on accuracy, generalization, and complexity, using t-SNE and Grad-CAM visualizations. Experimental results demonstrate that our model achieves state-of-the-art detection performance on the BOWS2, BOSSbase, and ALASKA#2 datasets.
In future work, we plan to extend this relation mining framework to the domain of few-shot steganalysis, aiming to further improve existing methods and achieve competitive or even superior performance with substantially fewer training samples.

Author Contributions

Conceptualization, R.Y. and Y.Y.; methodology, R.Y.; software, R.Y.; validation, R.Y. and X.M.; formal analysis, R.Y.; investigation, R.Y.; resources, Y.Y. and L.Z.; data curation, R.Y.; writing—original draft preparation, R.Y.; writing—review and editing, X.M. and Y.Y.; visualization, R.Y.; supervision, Y.Y. and L.Z.; project administration, Y.Y.; funding acquisition, Y.Y. and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of China (Grant Nos. 72293583 and 72293580), the Opening Project of the Police Integration Computing Key Laboratory of Sichuan Province (No. JWRH202401002), and the 111 Project (Grant No. B21049).

Data Availability Statement

To support the reliability of our experimental results, we have released the source code at https://github.com/Yang-Da-xiong/RMNet (accessed on 1 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Farooq, N.; Selwal, A. Image steganalysis using deep learning: A systematic review and open research challenges. J. Ambient Intell. Human Comput. 2023, 14, 7761–7793. [Google Scholar] [CrossRef]
  2. Lin, K.; Li, B.; Li, W.; Barni, M.; Tondi, B.; Liu, X. Constructing an intrinsically robust steganalyzer via learning neighboring feature relationships and self-adversarial adjustment. IEEE Trans. Inf. Forensics Secur. 2024, 19, 9390–9405. [Google Scholar] [CrossRef]
  3. Yedroudj, M.; Comby, F.; Chaumont, M. Yedroudj-Net: An efficient CNN for spatial steganalysis. In Proceedings of the IEEE International Conference Acoustics, Speech, and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2092–2096.
  4. Boroumand, M.; Chen, M.; Fridrich, J. Deep residual network for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 2019, 14, 1181–1193. [Google Scholar] [CrossRef]
  5. Reinel, T.-S.; Brayan, A.A.H.; Alejandro, B.O.M.; Alejandro, M.R.; Daniel, A.G.; Alejandro, A.G.; Alejandro Buenaventura, B.J.; Simon, O.-A.; Gustavo, I.; Raúl, R.-P.; et al. GBRAS-Net: A convolutional neural network architecture for spatial image steganalysis. IEEE Access 2021, 9, 14340–14350. [Google Scholar] [CrossRef]
  6. You, W.; Zhang, H.; Zhao, X. A Siamese CNN for image steganalysis. IEEE Trans. Inf. Forensics Secur. 2021, 16, 291–306. [Google Scholar] [CrossRef]
  7. Weng, S.; Chen, M.; Yu, L.; Sun, S. Lightweight and effective deep image steganalysis network. IEEE Signal Process. Lett. 2022, 29, 1888–1892. [Google Scholar] [CrossRef]
  8. Luo, G.; Wei, P.; Zhu, S.; Zhang, X.; Qian, Z.; Li, S. Image steganalysis with convolutional vision transformer. In Proceedings of the IEEE International Conference Acoustics, Speech, and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 3089–3093.
  9. Fu, T.; Chen, L.; Jiang, Y.; Jia, J.; Fu, Z. Image steganalysis based on dual-path enhancement and fractal downsampling. IEEE Trans. Inf. Forensics Secur. 2025, 20, 1–16. [Google Scholar] [CrossRef]
  10. Wei, K.; Luo, W.; Huang, J. Color image steganalysis based on pixel difference convolution and enhanced transformer with selective pooling. IEEE Trans. Inf. Forensics Secur. 2024, 19, 9970–9983. [Google Scholar] [CrossRef]
  11. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference Learning Representations (ICLR), Toulon, France, 24–26 April 2017; pp. 1–14. [Google Scholar]
  12. Sharma, K.; Lee, Y.-C.; Nambi, S.; Salian, A.; Shah, S.; Kim, S.-W.; Kumar, S. A survey of graph neural networks for social recommender systems. ACM Comput. Surv. 2024, 56, 1–34. [Google Scholar] [CrossRef]
  13. Ye, Z.; Kumar, Y.J.; Sing, G.O.; Song, F.; Wang, J. A comprehensive survey of graph neural networks for knowledge graphs. IEEE Access 2022, 10, 75729–75741. [Google Scholar] [CrossRef]
  14. Bera, A.; Wharton, Z.; Liu, Y.; Bessis, N.; Behera, A. SR-GNN: Spatial relation-aware graph neural network for fine-grained image categorization. IEEE Trans. Image Process. 2022, 31, 6017–6031. [Google Scholar] [CrossRef] [PubMed]
  15. Sikdar, A.; Liu, Y.; Kedarisetty, S.; Zhao, Y.; Ahmed, A.; Behera, A. Interweaving insights: High-order feature interaction for fine-grained visual recognition. Int. J. Comput. Vis. 2025, 133, 1755–1779. [Google Scholar] [CrossRef] [PubMed]
  16. Klicpera, J.; Bojchevski, A.; Günnemann, S. Predict then propagate: Graph neural networks meet personalized PageRank. In Proceedings of the International Conference Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  17. Lyu, S.; Farid, H. Detecting hidden messages using higher-order statistics and support vector machines. In Proceedings of the International Workshop on Information Hiding, Berlin, Germany, 7–9 October 2002; pp. 340–354. [Google Scholar]
  18. Fridrich, J.; Kodovsky, J. Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 2012, 7, 868–882. [Google Scholar] [CrossRef]
  19. Kodovsky, J.; Fridrich, J.; Holub, V. Ensemble classifiers for steganalysis of digital media. IEEE Trans. Inf. Forensics Secur. 2012, 7, 432–444. [Google Scholar] [CrossRef]
  20. Tang, W.; Li, H.; Luo, W.; Huang, J. Adaptive steganalysis based on embedding probabilities of pixels. IEEE Trans. Inf. Forensics Secur. 2016, 11, 734–745. [Google Scholar] [CrossRef]
  21. Ye, J.; Ni, J.; Yi, Y. Deep learning hierarchical representations for image steganalysis. IEEE Trans. Inf. Forensics Secur. 2017, 12, 2545–2557. [Google Scholar] [CrossRef]
  22. Qian, Y.; Dong, J.; Wang, W.; Tan, T. Learning and transferring representations for image steganalysis using convolutional neural network. In Proceedings of the IEEE International Conference Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 2752–2756. [Google Scholar]
  23. Xu, G.; Wu, H.Z.; Shi, Y.Q. Structural design of convolutional neural networks for steganalysis. IEEE Signal Process. Lett. 2016, 23, 708–712. [Google Scholar] [CrossRef]
  24. Huang, S.; Zhang, M.; Ke, Y.; Bi, X.; Kong, Y. Image steganalysis based on attention augmented convolution. Multimed. Tools Appl. 2022, 81, 19471–19490. [Google Scholar] [CrossRef]
  25. Liu, Q.; Zhou, L.; Wu, H. Graph representation learning for spatial image steganalysis. In Proceedings of the IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), Shanghai, China, 26–28 September 2022; pp. 1–5. [Google Scholar]
  26. Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  27. Liu, Q.; Yang, Z.; Wu, H. JPEG steganalysis based on steganographic feature enhancement and graph attention learning. J. Electron. Imaging 2023, 32, 033032. [Google Scholar] [CrossRef]
  28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
  29. Bas, P.; Filler, T.; Pevný, T. Break our steganographic system: The ins and outs of organizing BOSS. In Proceedings of the International Workshop on Information Hiding, Prague, Czech Republic, 18–20 May 2011; pp. 59–70. [Google Scholar]
  30. Bas, P.; Furon, T. BOWS-2. 2007. Available online: https://data.mendeley.com/datasets/kb3ngxfmjw/1 (accessed on 2 June 2023).
  31. Holub, V.; Fridrich, J. Designing steganographic distortion using directional filters. In Proceedings of the IEEE International Workshop on Information Forensics and Security (WIFS), Costa Adeje, Spain, 2–5 December 2012; pp. 234–239. [Google Scholar]
  32. Holub, V.; Fridrich, J.; Denemark, T. Universal distortion function for steganography in an arbitrary domain. EURASIP J. Inf. Secur. 2014, 1, 1. [Google Scholar] [CrossRef]
  33. Li, B.; Wang, M.; Huang, J.; Li, X. A new cost function for spatial image steganography. In Proceedings of the IEEE International Conference Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 4206–4210. [Google Scholar]
  34. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
  35. Cogranne, R.; Giboulot, Q.; Bas, P. The ALASKA steganalysis challenge: A first step towards steganalysis. In Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, Paris, France, 3–5 July 2019; pp. 125–137. [Google Scholar]
Figure 1. The architecture of the RMNet network: (a) Backbone network for feature extraction, (b) Feature Fusion Module, (c) Relationship Mining Module.
Figure 2. Performance comparison: (a) ROC curves for Hill at 0.4 bpp. (b) Validation accuracy for WOW at 0.4 bpp.
Figure 3. (a) t-SNE visualization of features learned by the baseline model. (b) t-SNE visualization of features learned by RMNet, showing improved class separability between Cover and Stego samples.
Figure 4. Grad-CAM visualization of RMNet compared with backbone network LWENet after introducing the relationship mining technique. (a) Original image. (b) Stego image. (c) LWENet embedding heatmap. (d) RMNet embedding heatmap.
Table 1. Comparison of detection accuracy (%) of different models on BOSSbase & BOWS2 datasets at various embedding rates. Bold indicates the highest accuracy under the same algorithm.
Algorithm | bpp | LWENet | DFNet | SiaStegNet | GBRAS-Net | SRNet | RMNet
WOW | 0.4 | 96.63 | 97.10 | 95.86 | 95.98 | 94.95 | 97.15
S-UNIWARD | 0.4 | 96.42 | 96.39 | 95.42 | 95.41 | 94.72 | 96.86
Hill | 0.4 | 92.18 | 93.02 | 90.99 | 90.13 | 89.37 | 93.35
WOW | 0.3 | 95.31 | 95.41 | 93.29 | 93.55 | 90.70 | 95.66
S-UNIWARD | 0.3 | 93.84 | 94.42 | 93.83 | 91.94 | 88.10 | 94.66
Hill | 0.3 | 90.52 | 90.55 | 88.23 | 87.70 | 81.35 | 90.58
WOW | 0.2 | 92.80 | 92.51 | 86.51 | 90.73 | 83.24 | 93.13
S-UNIWARD | 0.2 | 90.92 | 91.27 | 85.03 | 89.30 | 81.62 | 91.61
Hill | 0.2 | 85.59 | 85.93 | 80.35 | 81.25 | 76.47 | 85.96
WOW | 0.1 | 85.96 | 86.47 | 78.23 | 80.32 | 74.13 | 86.20
S-UNIWARD | 0.1 | 81.02 | 82.70 | 73.78 | 76.47 | 68.96 | 82.03
Hill | 0.1 | 76.19 | 74.64 | 70.62 | 70.25 | 68.66 | 77.34
Table 2. Detection accuracy (%) of different models under mismatched training-testing scenarios on BOSSbase & BOWS2 datasets (training on 0.4 bpp S-UNIWARD). Bold indicates the highest accuracy under the same algorithm.
Algorithm | bpp | LWENet | DFNet | SiaStegNet | GBRAS-Net | SRNet | RMNet
WOW | 0.4 | 94.36 | 95.25 | 94.50 | 92.93 | 93.56 | 95.75
HUGO | 0.4 | 78.79 | 76.86 | 81.37 | 81.22 | 84.60 | 82.31
MiPOD | 0.4 | 86.81 | 86.59 | 87.31 | 85.02 | 87.43 | 87.74
Hill | 0.4 | 74.88 | 66.69 | 75.89 | 75.84 | 76.49 | 76.80
Table 3. Exploration of graph node partitioning on the BOSSbase & BOWS2 datasets (Detection accuracy %). Bold indicates the highest accuracy under the same algorithm.
Algorithm | bpp | 4 × 4 | 8 × 8 | 16 × 16 | 32 × 32
WOW | 0.4 | 96.12 | 96.83 | 97.15 | 96.25
S-UNIWARD | 0.4 | 96.20 | 96.57 | 96.86 | 96.74
Hill | 0.4 | 92.31 | 92.78 | 93.35 | 93.12
Table 4. On the local range of the adjacency matrix, where R = num indicates that all nodes within a distance of num from the current node are treated as fully connected. The dataset is BOSSbase & BOWS2 (Detection accuracy %). Bold indicates the highest accuracy under the same algorithm.
Algorithm | bpp | R = 0 | R = 1 | R = 3 | R = 5 | R = 7 | R = Global
WOW | 0.4 | 97.01 | 96.82 | 96.88 | 97.15 | 96.95 | 96.93
S-UNIWARD | 0.4 | 96.71 | 96.53 | 96.34 | 96.86 | 96.42 | 96.31
Hill | 0.4 | 92.71 | 92.58 | 92.63 | 93.35 | 92.22 | 92.10
Table 5. Selection of the global threshold τ on BOSSbase & BOWS2 (Detection accuracy %). Bold indicates the highest accuracy under the same algorithm.
Algorithm | bpp | Model 1 (τ = 0.2) | Model 2 (τ = 0.4) | Model 3 (τ = 0.6) | Model 4 (τ = 0.8)
WOW | 0.4 | 96.35 | 96.65 | 96.82 | 97.15
S-UNIWARD | 0.4 | 96.12 | 96.42 | 96.32 | 96.86
Hill | 0.4 | 96.15 | 92.27 | 92.35 | 93.35
Table 6. Comparison between Vision Transformer–based Luo-Net and relationship mining–based RMNet under different steganographic algorithms (Detection accuracy %). Bold indicates the highest accuracy under the same algorithm.
Method | Model | 0.4 bpp-Hill | 0.4 bpp-WOW
Vision Transformer | Luo-Net | 85.61 | 92.10
Relationship Mining | RMNet | 89.45 | 95.18
Table 7. Discussion on the selection of teleport probability α and loss weights λ 1 , λ 2 , and λ 3 on BOSSbase & BOWS2 (Detection accuracy %). Bold indicates the highest accuracy under the same algorithm.
Teleport Probability α | λ1 | λ2 | λ3 | 0.4 bpp-Hill
0.3 | 1 | 0.5 | 0.5 | 92.94
0.3 | 1 | 2 | 2 | 93.17
0.3 | 1 | 5 | 5 | 91.26
0.3 | 1 | 1 | 1 | 93.35
0.1 | 1 | 1 | 1 | 92.94
0.5 | 1 | 1 | 1 | 92.62
0.7 | 1 | 1 | 1 | 92.86
0.9 | 1 | 1 | 1 | 92.38
Table 8. Comparison of training time and number of parameters between the proposed model and other methods.
Models | Number of Parameters (M) | Training Time (h)
SRNet | 4.77 | 31.7
DFNet | 0.30 | 16.2
LWENet | 0.38 | 11.5
SiaStegNet | 0.71 | 14.3
RMNet | 0.91 | 13.2
Table 9. Ablation experiment on the Feature Fusion Module. The dataset is BOSSbase & BOWS2 (Detection accuracy %). Bold indicates the highest accuracy under the same algorithm.
Algorithm | bpp | Feature Fusion ✕ | Feature Fusion ✓
WOW | 0.4 | 96.55 | 97.15
S-UNIWARD | 0.4 | 96.69 | 96.86
Hill | 0.4 | 93.05 | 93.35
Table 10. Ablation experiment on the Relationship Mining Module. The dataset is BOSSbase & BOWS2 (Detection accuracy %). Bold indicates the highest accuracy under the same algorithm.
Relation-Aware | Feature Distribution Enhancement | WOW (0.4 bpp) | S-UNIWARD (0.4 bpp) | Hill (0.4 bpp)
✕ | ✕ | 96.63 | 96.42 | 92.18
✓ | ✕ | 96.72 | 96.63 | 92.34
✕ | ✓ | 96.85 | 96.77 | 92.82
✓ | ✓ | 97.15 | 96.86 | 93.35
Table 11. Comparison of detection accuracy (%) of different models on ALASKA#2 datasets. Bold indicates the highest accuracy under the same algorithm.
Algorithm | bpp | LWENet | DFNet | SiaStegNet | GBRAS-Net | SRNet | RMNet
WOW | 0.6 | 72.43 | 72.03 | 71.73 | 64.92 | 70.80 | 73.05
S-UNIWARD | 0.6 | 71.50 | 70.62 | 71.13 | 69.88 | 69.97 | 71.65
Hill | 0.6 | 69.38 | 68.85 | 67.70 | 67.47 | 62.00 | 69.43
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
