Hybrid CNN-NLP Model for Detecting LSB Steganography in Digital Images

Angulo, Karen; Gil, Danilo; Yáñez, Andrés; Espitia, Helbert

doi:10.3390/asi8040107

Open AccessArticle

Hybrid CNN-NLP Model for Detecting LSB Steganography in Digital Images

¹

Facultad de Ingeniería, Pontificia Universidad Javeriana, Bogotá 110231, Colombia

²

Facultad de Ingeniería, Universidad Distrital Francisco José de Caldas, Bogotá 110231, Colombia

^*

Author to whom correspondence should be addressed.

Appl. Syst. Innov. 2025, 8(4), 107; https://doi.org/10.3390/asi8040107

Submission received: 7 June 2025 / Revised: 16 July 2025 / Accepted: 22 July 2025 / Published: 30 July 2025

Download

Browse Figures

Versions Notes

Abstract

This paper proposes a hybrid model that combines convolutional neural networks with natural language processing techniques for least significant bit-based steganography detection in grayscale digital images. The proposed approach identifies hidden messages by analyzing subtle alterations in the least significant bits and validates the linguistic coherence of the extracted content using a semantic filter implemented with spaCy. The system is trained and evaluated on datasets ranging from 5000 to 12,500 images per class, consistently using an 80% training and 20% validation partition. As a result, the model achieves a maximum accuracy and precision of 99.96%, outperforming recognized architectures such as Xu-Net, Yedroudj-Net, and SRNet. Unlike traditional methods, the model reduces false positives by discarding statistically suspicious but semantically incoherent outputs, which is essential in forensic contexts.

Keywords:

convolutional neural network; steganography; image processing; least significant bit; natural language processing

1. Introduction

Over the years, forensic computing has evolved from being an auxiliary discipline to becoming a relevant and crucial pillar in the digital field since it not only combines computer science and forensic sciences to collect legally admissible digital evidence in criminal and civil cases, but also provides essential evidence to determine the what, how, when, and where of events [1] through scientific techniques that allow the identification of malicious manipulations in systems, files, or data during the different phases of an investigation [2]. In essence, it seeks to guarantee the authenticity, originality, sameness, and integrity of digital evidence.

Steganography has emerged as a significant challenge in forensic analysis due to its ability to hide information without perceptibly altering the carrier files. Thanks to this ability, it is a tool that can be misused to hide illicit communications or manipulate digital evidence [3,4,5,6], which poses a risk, since the chain of custody can be affected, and it can even induce judicial errors and compromise decisions based on apparently legitimate evidence [7].

Adopting advanced technological approaches to address these threats is essential as investigations become more complex. Steganography based on the least significant bit (LSB) method is particularly difficult to detect as the alterations it introduces in digital images are minimal and practically imperceptible to the naked eye.

Faced with this problem, the need arises to implement a variety of techniques ranging from pixel-to-pixel comparison to the use of convolutional neural networks (CNNs), whose implementation has shown significant results since, with the application of deep learning, these networks are capable of identifying complex patterns and subtle characteristics in digital images, allowing the detection of imperceptible modifications and, consequently, the presence of hidden messages with high precision [3,6,8].

From the above, the development of a CNN-based model is proposed for the detection of hidden messages in grayscale digital images (PNG type) affected by the least significant bit (LSB) steganographic method to find out with certainty whether the images are genuine or not and, at the same time, a semantic validation process for the extracted message is integrated using natural language processing (NLP) techniques, to verify if the hidden message has linguistic coherence in the Spanish language.

1.1. Background and Advances in Steganography with CNN

Steganography, a discipline within image processing, focuses on concealing information within digital files, such as images, audio, or videos. Although its origin dates back to ancient practices, such as the tattooing of messages on the scalp in ancient Greece in the 5th century BC [9], in the digital age, it has evolved into much more sophisticated methods that allow information to be hidden quickly and efficiently [10,11].

Unlike cryptography, which protects content through encryption, steganography aims to conceal the existence of the message. This characteristic makes it a helpful tool in contexts where monitoring or detection mechanisms are desired, as it allows confidential data to be transmitted covertly [7,12,13]. Currently, its application extends to artistic, activist, and even criminal spheres, which has prompted the development of multiple methods for its detection [14,15].

Among the most widely used methods is the LSB technique, which alters the lowest-order bit of each pixel, generating changes imperceptible to the naked eye [16,17]. However, if the process is not performed correctly, visual artifacts or statistical patterns can be generated that compromise the invisibility of the message.

To address this issue, detection tools have been developed, initially based on statistical methods such as Rich Models (RM) and Subtractive Residual Models (SRM), which, although useful in certain contexts, have limitations when trained from images with compression, noise, or natural variations [4,14].

In response to these limitations, deep learning-based techniques have emerged, which have proven effective in steganalysis tasks due to their ability to identify complex image patterns [18,19]. In particular, CNNs excel in terms of their hierarchical architecture, which enables the extraction of features at different levels of abstraction, thereby facilitating the detection of subtle differences between images with and without steganography [20,21].

Among the most relevant models designed specifically for this task are Xu-Net, Yedroudj-Net, and SRNet, which incorporate preprocessing stages to amplify the LSB residual. Other notable works include that of Linga Murthy et al. [22], who implemented transfer learning by pretraining the inputs before fine-tuning the network to optimize detection. Similarly, Kich et al. [23] structured their proposal in three stages, input preparation, pretraining, and binary classification using CNNs, achieving superior results compared to traditional methodologies.

While these models may present certain biases, they have demonstrated improvements over common algorithms. Other notable contributions include the work of Ke et al. [24], who developed a network with multiple filter sizes (

3 \times 3

,

6 \times 6

,

12 \times 12

, and

15 \times 15

) to adapt to different resolutions without losing accuracy, and that of Ker [25], who explored the modification of the two least significant bits per pixel, a technique that, although more visually risky, can evade some detection algorithms focused only on the first bit.

Similarly, Kim et al. [26] proposed a dual neural network that applies specialized preprocessing filters to each convolutional branch. This design enhances detection performance by identifying complementary features in parallel; however, utilizing two streams can lead to ambiguity in the outputs, potentially affecting model stability.

Although progress in detection has been significant, most research focuses solely on confirming the presence of steganography without delving into the content or its semantic validity. In this regard, works such as that of Vijjapu et al. [27] have explored message extraction in various formats, but their experimental tests were limited to BMP images, which limits the generalizability of their conclusions.

1.2. Paper Approach and Organization

Contemporary approaches to steganalysis primarily focus on detecting statistical patterns, often overlooking the semantic coherence of the extracted content. However, this issue is critical in forensic scenarios, where the volume of digital evidence is substantial, and progressive analysis demands prioritizing semantically relevant messages by minimizing irrelevant data, particularly by reducing false positives. While numerous tools are available for digital evidence processing, few integrate steganalytic capabilities, and, to date, none has incorporated post-detection linguistic validation. In response to this gap, the model proposed in this study is designed to perform efficient steganalysis on digital images subjected to LSB steganography and subsequently evaluate the semantic coherence of the recovered message, thereby enhancing the interpretability and evidentiary value of the extracted information.

The document is organized as follows: First, Section 2 introduces the methodology employed, while Section 3 presents the CNN-NLP architecture design for LSB steganography detection. Section 4 details the results and comparison with existing models. Finally, the discussion and conclusions are presented in Section 5 and Section 6.

2. Methodology

This paper proposes a hybrid model-based methodology that combines convolutional neural networks and natural language processing techniques to detect the least significant bit (LSB)-based steganography in digital images. The research is based on a labeled dataset composed of both clean and steganographic images, where the hidden data consist of Spanish-language texts generated or selected from standardized linguistic corpora. These images were labeled to facilitate supervised training of the model. It is relevant to mention that the image datasets were created from real-world urban and rural photographs, which include a wide variety of scenes with vegetation, buildings, roads, fields, and vehicles. These natural images contain diverse textures and structures, providing a realistic basis for training and evaluating the model.

The dataset was built using images from an open-access database, which were extracted and processed locally. Steganographic images were generated through controlled LSB insertion. Each stego-image included a hidden message composed of variable-length text containing complete syntactic structures and semantically coherent content.

The neural network was trained using MATLAB R2024b with the Deep Learning Toolbox for building and tuning the CNN architecture. Meanwhile, linguistic analysis was performed in Python 3.11, using the spaCy library (v3.7) along with the es_core_news_md model to validate the semantic coherence of the hidden messages extracted. The integration between both environments was handled through MATLAB’s Pyenv function and auxiliary scripts responsible for communication and result transfer.

The simulations were carried out on a desktop with an AMD Ryzen 7 5800X 8-core processor, 32 GB of RAM, and an NVIDIA GeForce RTX 3070 Ti GPU with 6 GB of memory. Training was accelerated on the GPU to optimize computation time, particularly when working with large datasets of up to 25,000 images.

Figure 1 illustrates the methodology employed in the development of this work. First, a documentary review was conducted on the current state of research on steganography detection using convolutional neural networks, revealing various developments (as discussed in the introduction). However, the previous research has not explored in-depth improvements to this process using natural language processing. Consequently, the next stage of the research approach was defined, which involved designing a hybrid CNN-NLP model to detect LSB steganography in digital images.

During the preprocessing stage, the original grayscale images (with a resolution of 640 × 640 pixels) were resized to 150 × 150 pixels using nearest-neighbor interpolation, which preserves pixel-level integrity and avoids introducing interpolated artifacts that could distort LSB patterns. All images were required to be in PNG format; images in other formats were converted using lossless compression tools to maintain the integrity of the embedded bits.

A Laplacian-based enhancement filter was applied to each image to compute a residual channel emphasizing local intensity variations. This residual was normalized to the range [0, 1] using Min–Max scaling. Simultaneously, the first four least significant bit planes were extracted using the “bitget” function. These five components (one residual and four LSB planes) were stacked into a 5-channel input tensor, serving as input to the CNN. This preprocessing scheme increased the capability of the model to capture subtle distortions indicative of steganographic content while maintaining consistency across the dataset.

The CNN component was responsible for extracting discriminative visual features from the processed images [28]. Its output was passed through dense layers and finalized by a binary classification layer, which determined whether the image contains steganographic content. The model was trained employing the Adam optimizer and a cross-entropy loss function with an 80–20% training–validation split. This decision was based on three factors: the large size of the dataset, the high computational cost of retraining CNN models in multiple folds, and consistency with protocols followed in benchmark architectures such as Xu-Net [11], Yedroudj-Net [27], and SRNet [29].

The selection of training hyper-parameters was guided by an empirical tuning strategy informed by previous research in CNN-based steganalysis [11,27,29]. Initial values were chosen based on their reported stability in similar tasks and refined through pilot experiments conducted on a stratified validation subset. Specifically, the learning rate was set to

1 \times 10^{- 4}

with a piecewise decay schedule (drop factor of 0.5 every 5 epochs), and the batch size was fixed at 64 to balance GPU memory usage and convergence behavior. The dropout rate (0.5) and

L_{2}

regularization factor (

10^{- 3}

) were selected to improve generalization and prevent overfitting. This configuration was found to provide consistent and robust performance across different dataset sizes, without the need for exhaustive hyper-parameter search frameworks.

After classification, the hidden message was decoded from the LSBs and analyzed using the NLP module. This module evaluated grammatical structure, named entities, and lexical patterns to validate the semantic coherence of the message and reduce false positives.

Next, the hidden texts were extracted using LSB decoding techniques and were subjected to NLP processes [30]. This process made it possible to leverage both visual and linguistic cues to improve steganography detection, allowing for the capture of relevant linguistic features that could reveal unusual patterns derived from concealment [31].

Linguistic coherence was assessed using semantic similarity scores calculated with the spaCy library. This produced a cosine similarity between the extracted message embedding and the Spanish reference texts. Then, from this base, a minimum threshold of 0.65 was established, meaning that only messages with a semantic similarity greater than or equal to this value were considered valid and meaningful (true positives). This threshold was selected by experimentally evaluating 100 extracted messages and analyzing their interpretability and linguistic fluency. Consequently, the value of 0.65 provided the best balance between detecting hidden but coherent messages and avoiding the meaningless or random sequences that can arise from false positives.

To further analyze the impact of the coherence threshold on the performance of the semantic validation stage, multiple similarity thresholds ranging from 0.50 to 0.90 were explored. A receiver operating characteristic (ROC) curve was generated to visualize the trade-off between the true-positive rate and the false-positive rate at each threshold. As expected, lower thresholds increased recall by admitting more messages, but at the expense of precision, as more incoherent sequences were incorrectly accepted. Conversely, higher thresholds reduced false positives but also excluded valid messages. The area under the curve (AUC) calculated was superior to 0.95, indicating that the NLP module exhibited strong discriminative capability in separating meaningful from meaningless messages across a wide range of similarity values.

Evaluation was performed using metrics such as accuracy, precision, recall, F1-score, and the confusion matrix to measure not only the overall accuracy of the system but also its ability to identify false positives and negatives. After obtaining the results, a comparison was made with other models used for image detection with stenography [32].

3. CNN-NLP Architecture Design for LSB Steganography Detection

The proposed model integrates two main components, a CNN specialized in steganography detection using the LSB method [22,23,26] and a semantic validation module implemented via NLP. This combination enables the classification of images as containing steganography, and also the verification of whether the extracted hidden content is linguistically consistent in Spanish, thereby increasing the system’s reliability in forensic contexts. The model consists of four main stages, described below.

3.1. Preprocessing

Each image in the dataset is grayscale and resized. A Laplacian filter is then applied to obtain a residual channel highlighting local intensity differences. The four least significant bit planes (LSBs) are extracted, producing a 5-channel input tensor (1 residual + 4 LSBs) designed to highlight subtle alterations introduced by masking techniques. For this case, the kernel is defined as follows:

K = \frac{1}{8} [\begin{matrix} - 1 & - 1 & - 1 \\ - 1 & 8 & - 1 \\ - 1 & - 1 & - 1 \end{matrix}]

(1)

This filter analyzes each pixel by comparing it to its eight neighbors and highlights points where the intensity changes abruptly, which may indicate the presence of a steganographic pattern. Applying the kernel generates the residual channel presented in Equation (2):

R (x, y) = \frac{1}{8} (- 8 I (x, y) + \sum_{i = - 1}^{1} \sum_{j = - 1}^{1} I (x + i, y + j))

(2)

Then, the first four bit planes of each pixel are extracted using Equation (3) to isolate the lowest weighted bits of each intensity value:

L S B_{k} (x, y) = \mod (⌊\frac{I (x, y)}{2^{k - 1}}⌋, 2)

(3)

Finally, the residual channel obtained in the previous stage and the four extracted bit planes are combined to form a three-dimensional tensor, where each of the five channels represents a distinct feature of the processed image. The tensor constitutes the direct input to the CNN and is defined as follows:

X (x, y, :) = [\frac{R (x, y)}{255}, L S B_{1} (x, y), L S B_{2} (x, y), L S B_{3} (x, y), L S B_{4} (x, y)]

(4)

Here,

X (x, y, :)

represents the characteristic vector corresponding to the spatial position

(x, y)

of the input three-dimensional tensor. The

(:)

operator indicates that all channels at that location are considered. The notation

L S B_{k} (x, y)

refers to the k-th least significant bit extracted from the intensity value of the pixel at position

(x, y)

.

3.2. CNN Architecture

The three-dimensional tensor resulting from the image preprocessing stage serves as the input to the proposed CNN architecture. The network consists of three convolutional blocks with residual connections, followed by a global average pooling (GAP) layer, a fully connected dense layer with 128 units (using 50% dropout) with the

L e a k y R e L U

activation function, and, finally, a “Softmax” layer utilized for binary classification.

Each residual convolutional block receives as input a three-dimensional tensor

X \in R^{150 \times 150 \times 5}

(see Equation (4)), which, as described above, is composed of a normalized residual image and four bit planes corresponding to the extracted LSBs. The residually connected convolutional blocks

B_{i}

are defined as follows:

Z_{1}^{(i)} = LeakyReLU (B N (W_{1}^{(i)} * X^{(i)} + b_{1}^{(i)}))

(5)

where a series of sequential operations is applied to

X^{(i)}

, and ∗ corresponds to the convolution operator. First, a two-dimensional convolution operation is performed with a set of filters

W_{1}^{(i)}

and a bias

b_{1}^{(i)}

. Next, the result of this operation is normalized using BN, which is batch normalization. Then, the

LeakyReLU

activation function is applied, which allows the passage of negative values, improving the gradient flow and preventing the deactivation of neurons.

The intermediate output

Z_{1}^{(i)}

is then processed by a second convolutional layer, which applies a new set of filters

W_{2}^{(i)}

and a bias

b_{2}^{(i)}

, followed again by batch normalization and a

LeakyReLU

activation, resulting in

Z_{2}^{(i)}

, as expressed in Equation (6):

Z_{2}^{(i)} = LeakyReLU (B N (W_{2}^{(i)} * Z_{1}^{(i)} + b_{2}^{(i)}))

(6)

Subsequently, as seen in Equation (7), a residual connection

Y^{(i)}

is implemented summing the output

Z_{2}^{(i)}

with the original input of the block

X^{(i)}

, which allows the input information to be preserved throughout the block, facilitating the training of deep networks. Finally, a spatial reduction with a window of

2 \times 2

and stride 2 is applied, which reduces the spatial dimensions of the activation map.

Y^{(i)} = MaxPool (Z_{2}^{(i)} + X^{(i)})

(7)

On the other hand, it is worth noting that

LeakyReLU

is the activation function used in all convolutional blocks, which allows the passage of gradients even when the activation is negative, thereby mitigating the problem of gradient flow and preventing neurons from becoming inactive during training, a common issue with

ReLU

. The activation function used for this case is defined in Equation (8):

f (x) = \{\begin{matrix} x, & if x > 0; \\ 0.2 x, & if x \leq 0 . \end{matrix}

(8)

All of the above makes the block residual and allows some of the original information to flow unaffected by the convolutions, improving learning and preventing the gradient from fading.

Once the three residual convolutional blocks have processed the input tensor, the resulting activation map passes through a GAP layer, which allows all the spatial information learned by the filters to be condensed into a feature vector

g \in R^{C}

by calculating the average of each channel (or filter) over the entire resulting image. The mathematical operation to generate this vector is shown in Equation (9):

g = \frac{1}{H W} \sum_{x = 1}^{H} \sum_{y = 1}^{W} Y^{(3)} (x, y, :)

(9)

The dimensions H and W correspond to the spatial height and width of the output of the last convolutional block. In the case of the proposed model, after applying three “Max-Pooling” layers with stride 2, the final dimensions are

H = 18

and

W = 18

, resulting in a global feature vector of a size equal to the number of filters (

C = 256

).

The resulting feature vector

g \in R^{C}

, obtained from the GAP layer as described in Equation (9), is then passed through a fully connected layer with 128 neurons. For this layer, a dropout process with a rate of

0.5

is applied, which helps reduce overfitting by preventing the co-adaptation of neurons during training. Finally, the result passes through the “Softmax” layer, which transforms the vector of raw outputs (logits) into a probability distribution according to the two classes present in this model. This results in a probability vector (see Equation (10)) associated with an input image belonging to class c where

{\hat{y}}_{c}

represents the probability of belonging to each class, With-Steganography or Without-Steganography, and

z_{c}

is the value of the

Logit

function (output without activating for class c).

{\hat{y}}_{c} = \frac{e^{z_{c}}}{\sum_{j = 1}^{2} e^{z_{j}}}, for c \in {1, 2}

(10)

Then, from the resulting probability,

\hat{y}

is compared with the true label, and the weighted cross-entropy loss function is used to quantify the error committed. The loss function is expressed in Equation (11):

L = - \sum_{c = 1}^{2} w_{c} y_{c} log ({\hat{y}}_{c})

(11)

Finally, it should be noted that the model was trained using the Adam optimizer, with an initial learning rate of

1 \times 10^{- 4}

, which was progressively reduced every 5 epochs (factor

0.5

). Likewise, to avoid overfitting,

L_{2}

regularization with a coefficient

λ = 0.001

was employed, and the early stopping mechanism was also utilized, which terminated the training when no improvements were observed in the validation metric over 10 consecutive iterations.

3.3. Message Classification and Extraction

After classification, if an image is identified as containing steganography, the hidden message is extracted from the LSBs of each pixel. The message is reconstructed by converting binary to text via ASCII encoding using Equation (12), where

b_{k}

denotes each bit sequentially extracted from the image:

Message = \sum_{k = 1}^{n} 2^{(7 - k)} \cdot b_{k}, \forall b_{k} \in {0, 1}

(12)

3.4. Semantic Validation with NLP

The extracted message is processed using the spaCy library to validate its linguistic coherence in Spanish and to determine whether it aligns with the expression in Equation (13), where T represents the message extracted from the image.

V (T) = \{\begin{matrix} 1, & if T contains valid entities or structures; \\ 0, & if T it is not an understandable text . \end{matrix}

(13)

Subsequently, the grammatical structure, the presence of named entities, and the frequency of words are analyzed according to a reference dictionary. To consider the message as valid

V (T) = 1

, the following condition must be met:

E_{T} + S_{T} + P_{T} > τ

(14)

where

τ

denotes the minimum linguistic validity threshold required to consider a message as meaningful. In this work, it was set to

0.3

. This evaluation considers features such as the presence of structured sentences (

S_{T}

), named entities (

E_{T}

), and the frequency of Spanish words (

P_{T}

), verified through natural language processing techniques.

The choice of

τ = 0.3

was determined by preliminary evaluations of a sample of 100 messages extracted from steganographic images. Several threshold values ranging from 0.1 to 0.6 were tested, and this value yielded the best trade-off between accepting coherent messages and filtering out semantically meaningless content. Messages classified as valid under this criterion showed consistent grammatical structure, named entities, and dictionary-based word frequency, with minimal inclusion of random or truncated strings. While this threshold is task-specific, it proved effective in reducing false positives during the semantic validation stage.

Additionally, based on experimental observations during the validation process, it was noted that messages shorter than 10 characters generally lacked sufficient syntactic structure or semantic content to be reliably validated. Therefore, this value can be considered a practical lower bound for the minimum detectable message length in the current CNN-NLP configuration. Messages below this threshold frequently failed to meet the combined linguistic criteria shown in Equation (14) and were classified as incoherent or invalid.

In brief, the developed model implements a specialized architecture for detecting LSB steganography in digital images, combining image processing techniques, deep convolutional neural networks, and semantic validation through natural language processing.

3.5. Model Summary

After training the convolutional neural network, the process displayed in Figure 2 is employed for a suspicious image. As observed, the procedure begins with retrieving the image from a database. Once obtained, the image is subjected to a preprocessing stage, during which it is transformed to meet the input requirements of the convolutional neural network. This preprocessing involves resizing and normalization operations to ensure compatibility with the model and optimize its performance.

Following this stage, the preprocessed image is fed into the CNN, which has been specifically trained to detect the presence of steganography—namely, the intentional concealment of information within digital images. The convolutional neural network analyzes the image and outputs a classification result indicating whether or not a hidden message is embedded in the image.

The CNN receives the preprocessed image (as a tensor) as input and processes it through an input layer followed by three convolutional blocks with residual connections. After these stages, a GAP layer is applied, which computes the spatial average of each feature map to produce a compact feature vector. This vector is then passed through a fully connected dense layer with 128 neurons. The final classification is obtained by applying the “Softmax” layer, which transforms the vector of raw outputs into a probability distribution for the two defined classes.

If the network determines that the image does contain hidden information (referred to as the With-Steganography case), the system proceeds to extract or decode the concealed message. However, the validity of the extracted message is not assumed by default. To ensure its linguistic coherence and meaningfulness, the message is subsequently analyzed using the spaCy library, a powerful tool for natural language processing in Python. This step helps verify that the message is not only structurally well-formed but also semantically coherent, thereby reducing the likelihood of false positives caused by noise or model inaccuracies.

This multi-stage architecture provides a robust and interpretable pipeline for detecting and validating hidden messages in images. By integrating advanced techniques from computer vision and natural language processing, the model ensures not only high classification accuracy but also semantic validation of the extracted message, reinforcing the forensic reliability of the detection process.

4. Results and Comparison with Existing Models

This section presents the main findings obtained after implementing and evaluating the CNN-NLP model. First, the experimental results derived from different dataset sizes are analyzed, including their impact on commonly used metrics. Furthermore, the model’s behavior during training is examined, with a focus on its stability and convergence capacity. Second, a comparison with representative models is conducted to assess the performance of the proposed approach in quantitative terms.

4.1. Evaluation of the Proposed Model

To evaluate the performance of the proposed CNN-NLP model, multiple experiments were conducted using data subsets of 5000, 7000, 9000, 10,000, and 12,500 images per class (With-Steganography and Without-Steganography). In all cases,

80 %

of the images were used for training and the remaining

20 %

for validation, ensuring a stratified and random distribution.

As an initial part of the performance analysis, Figure 3 presents the evolution of accuracy and the loss function over the training epochs, corresponding to the experiments with 7000, 9000, 10,000, and 12,500 images per class. This graph illustrates how, with a larger dataset, the model exhibits more progressive and stable convergence, characterized by noticeably smoother curves and fewer oscillations.

This behavior confirms that a larger volume of data significantly contributes to improving the robustness and stability of the learning process and the generalization capacity of the model.

Performance evaluation was carried out using classic binary classification metrics: accuracy corresponds to the proportion of correctly classified images out of the total evaluated; precision indicates how many of the images were correctly predicted to be With-Steganography, reducing false positives; recall reflects the model’s ability to identify all true cases, avoiding false negatives; and F1-score balances precision and sensitivity, representing an integrated performance measure, especially useful in balanced sets. Their definitions are presented in Equations (15)–(18).

\begin{matrix} Accuracy & = & \frac{T P + T N}{T P + T N + F P + F N} \end{matrix}

(15)

\begin{matrix} Precision & = & \frac{T P}{T P + F P} \end{matrix}

(16)

\begin{matrix} Recall & = & \frac{T P}{T P + F N} \end{matrix}

(17)

\begin{matrix} F 1 & = & 2 \frac{Precision \cdot Recall}{Precision + Recall} \end{matrix}

(18)

In Equations (15)–(18),

T P

represents the true positives,

T N

the true negatives,

F P

the false positives, and

F N

the false negatives. Table 1 presents the average results obtained for each experiment based on the number of images per class employed.

Figure 4 shows the confusion matrix obtained corresponding to the experiments with 7000, 9000, 10,000, and 12,500 images per class.

For the 5000-image dataset, the model achieved an accuracy of 98.20%, with 98.20% precision and an F1-score of 98.20%. Although the classification results were already solid, some variability in the training curves suggests the model was still learning to generalize under limited-data conditions.

In the 7000-image experiment, performance improved significantly: the model achieved an accuracy of 98.89%, with 98.93% precision and an F1-score of 99.12%. This suggests a more robust learning process with reduced error rates and earlier convergence.

In the case with 9000 images per class, the performance decreased slightly. This drop may be attributed to class imbalance or an increased sample diversity not fully captured during training, highlighting the importance of controlled dataset composition.

With 10,000 images per class, the model regained stability and delivered 99.35% accuracy, 99.35% precision, and an F1-score of 99.35%, supported by smooth learning curves and a notable reduction in classification errors across both classes.

Finally, for the largest set (12,500 images per class), a high-accuracy performance and 2500 hits per class were observed, with only two false positives and no false negatives, thus consolidating the highest accuracy value (99.96%), reflecting not only an excellent learning capacity but also robust behavior in large-scale conditions. These results confirm that increasing the volume of training data improves not only the precision but also the stability of the training process, as observed in the smoother loss and accuracy curves.

In addition to these classic metrics, the receiver operating characteristic (ROC) curves were analyzed to evaluate the discriminative capacity of the model under different decision thresholds (see Figure 5). An ROC curve plots the true-positive rate against the false-positive rate, and the area under the curve (AUC) quantifies the overall classification quality.

For the dataset with 5000 images per class, the AUC reached 0.9894, indicating a strong separation ability even with a moderate training volume. With 7000 and 10,000 images, the AUC values improved to 0.9941 and 0.9996, respectively, demonstrating greater robustness and convergence. Interestingly, the experiment with 9000 images showed a temporary drop in performance, yielding an AUC of 0.9417. This behavior may be attributed to variance in the validation subset or suboptimal tuning during training. Nevertheless, the best results were obtained with 12,500 images, where the AUC peaked at 0.9996, reflecting near-perfect separation between classes.

Overall, the high AUC values across the experiments reaffirm the model’s ability to distinguish between steganographic and non-steganographic images, complementing the precision and F1-scores obtained in earlier analyses.

In Figure 5 the result of the ROC curve shows the effect of varying this threshold on the true-positive rate and the false-positive rate. In all cases, the area under the curve value was above 0.94, which implies that the model performed with high efficiency in terms of its ability to discriminate messages with grammatical meaning.

Additionally, Figure 6, Figure 7 and Figure 8 display examples of the semantic validation process applied to messages extracted from images classified as With-steganography. In these cases, the model correctly identified the presence of steganography and evaluated whether the retrieved message made sense in Spanish. This evaluation was performed using natural language processing techniques, and only those messages that exceeded a minimum threshold of linguistic coherence were considered true positives.

4.2. Model Comparison

To validate the effectiveness of the proposed CNN-NLP model, a comparison was made with benchmark models in the field of steganalysis, including Xu-Net, Yedroudj-Net, and SRNet. These models have been specifically designed for image steganography detection and have been trained on databases such as BOSSbase. Xu-Net, for example, includes an initial convolution layer with fixed filters; Yedroudj-Net extends the architecture with additional layers and early batch normalization; and SRNet proposes a deeper network without pooling convolutions to preserve low-frequency information. The comparison focuses on three key aspects: accuracy, F1-score, and semantic validation capability.

As shown in Table 2, the proposed CNN-NLP model outperforms the traditional architectures in all performance metrics. In terms of accuracy, Xu-Net reaches 87.75%, Yedroudj-Net reaches 90.10%, and SRNet reaches 92.60%. In contrast, the CNN-NLP model achieves an accuracy of 99.96%, which represents an improvement of 7.36% compared to the best of the compared models (SRNet). This implies that most of the cases that it detects as carrying steganography effectively are carrying steganography, which significantly reduces false positives and strengthens the reliability of the system, a key aspect in forensic analysis.

The differences are also notable in terms of accuracy: the proposed model achieves 99.96%, representing an improvement of more than six percentage points compared to the best performer (93.80% with SRNet). This indicates that the proposed model not only accurately identifies altered images but also avoids misclassifying authentic images as manipulated, thus improving the overall accuracy of the system.

These results are due not only to the architectural design and the residual preprocessing applied, but also to the inclusion of semantic validation, which allows verification of whether the hidden message is consistent in Spanish. With additional validation and the absence of traditional models, false positives can be ruled out. Although they exhibit statistical patterns similar to those of a hidden message, they do not contain linguistically valid content. This enhances the utility of CNN-NLP, particularly in legal and forensic contexts where the content of the message is as crucial as its detection.

5. Discussion

The hybrid CNN-NLP model proposed for detecting LSB steganography in digital images demonstrated strong performance in controlled experimental settings, particularly in scenarios with well-preprocessed grayscale images and known encoding patterns. However, several significant limitations must be addressed before deploying the system in real-world conditions.

It is essential to point out that the comparison with models such as Xu-Net, Yedroudj-Net, and SRNet was based on different datasets, as these models were initially trained and evaluated on BOSSbase. While this limits direct equivalence in the performance metrics, the comparison still offers valuable insight into architectural differences and the potential impact of semantic validation. As a future line of research, it would be relevant to retrain the proposed CNN-NLP model using the BOSSbase dataset or similar standardized corpora, to conduct experiments under equivalent conditions and evaluate the generalizability of the model across diverse image sources.

One of the primary limitations is the reliance of the model on synthetically generated grayscale PNG datasets under idealized conditions. Although these datasets allow controlled experimentation and initial validation, they do not fully capture the variability and unpredictability of real-world steganographic cases, including JPEG compression distortions, color image variability, adversarial noise, or alternative embedding schemes. This limits the generalizability of the model to more diverse formats and practical forensic applications.

Specifically, JPEG images pose a unique challenge, as the lossy compression process tends to eliminate or distort the subtle bit-level patterns targeted by LSB-based steganography detection. Similarly, RGB color images introduce additional complexity due to inter-channel correlations and color-space transformations, requiring a different preprocessing pipeline (e.g., extraction of LSB planes from each RGB channel or transformation to the YCbCr color space). While the current model is optimized for grayscale PNGs to maximize LSB fidelity, adapting it to these other formats is a crucial direction for future work.

Another challenge concerns the computational cost of the hybrid architecture. The combination of a deep convolutional network with semantic analysis introduces a heavier inference load, making the system less suitable for deployment on low-power devices or in time-constrained environments. Additionally, while the NLP component significantly improves interpretability and semantic validation, its performance depends on the accurate recovery of the hidden message. Any distortion, truncation, or decoding error can compromise the text evaluation phase, leading to false negatives or overlooked threats.

An important limitation of the current semantic validation approach arises when the hidden messages are encrypted using ciphers such as Vigenère or other classical encryption schemes. In such cases, the extracted text lacks natural language structure, and the NLP module would correctly classify it as semantically invalid. While this behavior helps prevent false positives from random bit noise, it also implies that the system may fail to detect encrypted stego-content, resulting in false negatives.

This design choice is justified by the specific goal of identifying steganographic messages not only present but also interpretable in the linguistic domain, particularly relevant in forensic contexts where message content matters. Nevertheless, detecting encrypted payloads remains an open challenge. Future work could incorporate statistical text analysis (e.g., entropy estimation, n-gram frequency analysis, or cryptographic feature extraction) as a preliminary filter before NLP validation. Such enhancements could increase the capability of the system to flag hidden encrypted messages, even if their semantic content cannot be directly interpreted.

In the case where the method is applied to non-grayscale images (such as images in RGB or YCbCr formats), the computational complexity would increase substantially due to the need to process three color channels instead of one. Specifically, the preprocessing pipeline would need to extract residuals and bit planes from each channel (i.e., R, G, and B), resulting in a fifteen-channel tensor instead of the current five-channel input (one residual + four LSBs per channel). This would affect both memory usage and training time. The convolutional layers in the CNN would require additional parameters and operations in the first layers to handle the increased input dimensionality. Furthermore, training on color images would require larger batch sizes and longer convergence times to capture inter-channel dependencies.

While feasible, this extension would multiply the cost of forward and backward passes by approximately

2.5 \times

to

3 \times

, depending on the architecture depth and number of filters. Additional complexity would also arise in tuning the preprocessing logic and validating semantic consistency in more visually complex contexts. Despite this, the model’s architecture is modular and could be adapted to color image inputs by adjusting the input layer and increasing the CNN capacity. Future work could explore this direction using optimized color-specific representations such as YCbCr or residual-based fusion strategies.

Additionally, although the NLP component is currently configured for Spanish using the spaCy model “es_core_news_md”, the modular nature of the system allows for seamless integration of other language models. By switching to language-specific pipelines (e.g., English, French, or Portuguese), the system can adapt to different linguistic environments. This multilingual potential extends the applicability of the proposed architecture to international forensic scenarios, provided that reliable language models and linguistic resources are available.

Despite these challenges, the results suggest several promising directions for future research. An essential area is the construction and use of more datasets and realistic datasets, including color and compressed formats, and potentially real-world stego-content captured from forensic case samples. Improving the robustness of text extraction would also enhance the system’s adaptability. Furthermore, integrating explainable AI techniques could enable the system to highlight or visualize regions suspected of concealing information, thereby increasing transparency in forensic analyses.

Lastly, optimizing the architecture for resource-constrained environments remains a priority. Exploring lightweight CNN backbones, quantization strategies, or hybrid architectures that selectively activate the NLP module only under certain confidence thresholds could make the system more scalable and efficient.

6. Conclusions

This work presented a hybrid approach based on convolutional neural networks and natural language processing to detect LSB steganography in digital images. The combination of visual and linguistic analyses allowed the development of a system capable of identifying semantic patterns in image content and hidden data, overcoming the limitations of traditional unimodal approaches. The results obtained in experimental settings indicate that integrating both branches can significantly improve detection capacity, especially when the embedded messages have textual coherence.

The CNN-NLP model developed in this work achieves efficient metrics and redefines the traditional approach to digital steganalysis by integrating a semantic validation layer for the hidden message. Unlike the architectures evaluated, which focus solely on binary detection, the CNN-NLP enables the verification of whether the extracted content is linguistically coherent, thus elevating the analysis from a technical to an interpretive perspective.

Likewise, the inclusion of the NLP module allows the filtering of statistical false positives, providing a second line of verification based on natural language processing. The progressive analysis with different dataset sizes showed that the proposed architecture maintains stability, generalization capacity, and a low error rate, even in scenarios with visual noise or highly complex messages, making it a robust, replicable, and scalable alternative for detecting steganography in digital images.

Significant challenges were identified, such as the need for more realistic data, the computational complexity of the model, and limitations in accurately extracting hidden text. These aspects unlock new opportunities for future research aimed at improving the system’s robustness, efficiency, and applicability.

Author Contributions

Conceptualization, K.A., D.G., A.Y., and H.E.; methodology, K.A., D.G., A.Y., and H.E.; project administration, A.Y.; supervision, A.Y.; validation, K.A. and D.G.; writing—original draft, K.A., D.G., A.Y., and H.E.; writing—review and editing, K.A., D.G., A.Y., and H.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Prieto-Castellanos, B.J. El uso de los métodos deductivo e inductivo para aumentar la eficiencia del procesamiento de adquisición de evidencias digitales. Cuad. Contab. 2018, 18. [Google Scholar] [CrossRef]
Vacca, J.R.; Rudolph, K. System Forensics, Investigation, and Response, 1st ed.; Jones & Bartlett Learning: Sudbury, MA, USA, 2011. [Google Scholar]
Albkosh, F.M.A.; Mohamed, A.S.S.; Albakoush, A.A.; Albkush, A.A. An Enhanced Method to Secure the High Embedding Capacity Steganography Based on LSB Indicator. In Proceedings of the IEEE 4th International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering (MI-STA), Tripoli, Libya, 19–21 May 2024; pp. 494–499. [Google Scholar] [CrossRef]
Guaña-Moya, J. Uses and applications of steganography in the digital age. Rev. Retos Para Investig. 2023, 2, 61–75. [Google Scholar] [CrossRef]
Priscilla, C.V.; HemaMalini, V. Effective Analysis of Real World Stego Images through Deep Learning Techniques. In Proceedings of the 3rd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India, 5–7 June 2024; pp. 780–785. [Google Scholar] [CrossRef]
Shukla, R.K.; Sengar, A.S.; Gupta, A.; Chauhar, N.R. Deep Learning Model to Identify Hide Images using CNN Algorithm. In Proceedings of the 11th International Conference on System Modeling & Advancement in Research Trends (SMART), Moradabad, India, 16–17 December 2022; pp. 44–51. [Google Scholar] [CrossRef]
Yari, I.A.; Zargari, S. An Overview and Computer Forensic Challenges in Image Steganography. In Proceedings of the IEEE International Conference on Internet of Things (IThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), Exeter, UK, 21–23 June 2017; pp. 360–364. [Google Scholar] [CrossRef]
Luo, J.; He, P.; Liu, J.; Wang, H.; Wu, C.; Chen, Y.; Li, W.; Li, J. Content-adaptive Adversarial Embedding for Image Steganography Using Deep Reinforcement Learning. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; pp. 49–54. [Google Scholar] [CrossRef]
Devi, A.G.; Thota, A.; Nithya, G.; Majji, S.; Gopatoti, A.; Dhavamani, L. Advancement of Digital Image Steganography using Deep Convolutional Neural Networks. In Proceedings of the International Interdisciplinary Humanitarian Conference for Sustainability (IIHC), Bengaluru, India, 18–19 November 2022; pp. 250–254. [Google Scholar] [CrossRef]
Taha-Ahmed, I.; Tareq-Hammad, B.; Jamil, N. Image Steganalysis based on Pretrained Convolutional Neural Networks. In Proceedings of the IEEE 18th International Colloquium on Signal Processing & Applications (CSPA), Bengaluru, India, 18–19 November 2022; pp. 283–286. [Google Scholar] [CrossRef]
Xu, G.; Wu, H.-Z.; Shi, Y.-Q. Structural Design of Convolutional Neural Networks for Steganalysis. IEEE Signal Process. Lett. 2016, 23, 708–712. [Google Scholar] [CrossRef]
Pasha, S.K.S.; Ajay, K.; Harish, N.; Sreenu, S. CNN—Powered Deep Learning Framework for Advanced Image Steganography and Data Security. In Proceedings of the International Conference on Visual Analytics and Data Visualization (ICVADV), Tirunelveli, India, 4–6 March 2025; pp. 1164–1168. [Google Scholar] [CrossRef]
Singh, V.; Sharma, M.; Shirode, A.; Mirchandani, S. A Comprehensive Exploration of Data Security Techniques Using Image Steganography. In Proceedings of the 4th International Conference on Sustainable Expert Systems (ICSES), Kaski, Nepal, 15–17 October 2024; pp. 396–401. [Google Scholar] [CrossRef]
Fridrich, J. Steganography in Digital Media; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar] [CrossRef]
Hossain, I.; Kadir, S.; Fagun, F.I.; Samiul, I.; Saukhin, R.Z.; Hossain, M.I. Enhanced CNN Approaches for Multi-Image Embedding in Image Steganography. In Proceedings of the 2nd International Conference on Information and Communication Technology (ICICT), Dhaka, Bangladesh, 21–22 October 2024; pp. 100–104. [Google Scholar] [CrossRef]
Eid, W.M.; Alotaibi, S.S.; Alqahtani, H.M.; Saleh, S.Q. Digital Image Steganalysis: Current Methodologies and Future Challenges. IEEE Access 2022, 10, 92321–92336. [Google Scholar] [CrossRef]
Lu, Y.Y.; Yang, Z.L.O.; Zheng, L.; Zhang, Y. Importance of Truncation Activation in Pre-Processing for Spatial and Jpeg Image Steganalysis. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 689–693. [Google Scholar] [CrossRef]
Sowmya, K.S.; Hegde, S.; Sunag, P.; Varun, R.P. Exploring the Effectiveness of Steganography Techniques: A Comparative Analysis. In Proceedings of the 3rd International Conference on Smart Data Intelligence (ICSMDI), Trichy, India, 30–31 March 2023; pp. 181–186. [Google Scholar] [CrossRef]
Thenmozhi, S.; Bharath, B.M. CNN based Image Steganography Techniques: A Cutting Edge/State of Art Review. In Proceedings of the Smart Technologies, Communication and Robotics (STCR), Sathyamangalam, India, 10–11 December 2022; pp. 1–5. [Google Scholar] [CrossRef]
Peng, Y.; Fu, G.; Luo, Y.; Yu, Q.; Wang, L. CNN-based Steganalysis Detects Adversarial Steganography via Adversarial Training and Feature Squeezing. In Proceedings of the 4th International Conference on Information Science, Parallel and Distributed Systems (ISPDS), Guangzhou, China, 14–16 July 2023; pp. 165–169. [Google Scholar] [CrossRef]
Qian, Y.; Dong, J.; Wang, W.; Tan, T. Learning and transferring representations for image steganalysis using convolutional neural network. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 2752–2756. [Google Scholar] [CrossRef]
Linga-Murthy, M.K.; Madhavi, P.B.; Ahamad, S.A.; Vamsi, K.; Mallikarjuna-Rao, Y. Implementation and Analysis of Image Steganography using Convolution Neural Networks. In Proceedings of the International Interdisciplinary Humanitarian Conference for Sustainability (IIHC), Bengaluru, India, 18–19 November 2022; pp. 108–113. [Google Scholar] [CrossRef]
Kich, I.; Ameur, E.B.; Taouil, Y.; Benhfid, A. Image Steganography Scheme Using Dilated Convolutional Network. In Proceedings of the 12th International Conference on Information and Communication Systems (ICICS), Valencia, Spain, 24–26 May 2021; pp. 305–309. [Google Scholar] [CrossRef]
Ke, Q.; Ming, L.D.; Daxing, Z. Image Steganalysis via Multi-Column Convolutional Neural Network. In Proceedings of the 14th IEEE International Conference on Signal Processing (ICSP), Beijing, China, 12–16 August 2018; pp. 550–553. [Google Scholar] [CrossRef]
Ker, A.D. Steganalysis of Embedding in Two Least-Significant Bits. IEEE Trans. Inf. Forensics Secur. 2007, 2, 46–54. [Google Scholar] [CrossRef]
Kim, J.; Kang, S.; Park, H.; Park, J.-I. Dual Convolutional Neural Network for Image Steganalysis. In Proceedings of the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Jeju, Republic of Korea, 5–7 June 2019; pp. 1–4. [Google Scholar] [CrossRef]
Vijjapu, A.; Vinod, Y.S.; Murty, S.V.S.N.; Raju, B.E.; Satyanarayana, B.V.V.; Kumar, G.P. Steganalysis using Convolutional Neural Networks-Yedroudj Net. In Proceedings of the International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 23–25 January 2023; pp. 1–7. [Google Scholar] [CrossRef]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2012, 60, 84–90. [Google Scholar] [CrossRef]
Boroumand, M.; Chen, M.; Fridrich, J. Deep Residual Network for Steganalysis of Digital Images. IEEE Trans. Inf. Forensics Secur. 2018, 14, 1181–1193. [Google Scholar] [CrossRef]
Marchetti, M.; Traini, D.; Ursino, D.; Virgili, L. Efficient token pruning in Vision Transformers using an attention-based Multilayer Network. Expert Syst. Appl. 2025, 279, 127449. [Google Scholar] [CrossRef]
Xue, Y.; Kong, L.; Peng, W.; Zhong, P.; Wen, J. An effective linguistic steganalysis framework based on hierarchical mutual learning. Inf. Sci. 2022, 586, 140–154. [Google Scholar] [CrossRef]
Fridrich, J.; Kodovsky, J. Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 2012, 7, 868–882. [Google Scholar] [CrossRef]

Figure 1. Proposed methodology for the development of the CNN-PLN model.

Figure 2. Flowchart of the proposed model.

Figure 3. Evolution of accuracy and loss during training: (a) Evolution of loss during training employing 7000 images per class. (b) Evolution of accuracy during training using 7000 images per class. (c) Evolution of loss during training employing 9000 images per class. (d) Evolution of accuracy during training using 9000 images per class. (e) Evolution of loss during training employing 10,000 images per class. (f) Evolution of accuracy during training using 10,000 images per class. (g) Evolution of loss during training employing 12,500 images per class. (h) Evolution of accuracy during training using 12,500 images per class.

Figure 4. The confusion matrix obtained for each case: (a) Confusion matrix using a dataset with 7000 images per class. (b) Confusion matrix using a dataset with 9000 images per class. (c) Confusion matrix using a dataset with 10,000 images per class. (d) Confusion matrix using a dataset with 12,500 images per class.

Figure 5. The receiver operating characteristic curves obtained: (a) ROC obtained for a dataset with 7000 images per class. (b) ROC obtained for a dataset with 9000 images per class. (c) ROC obtained for a dataset with 10,000 images per class. (d) ROC obtained for a dataset with 12,500 images per class.

Figure 6. Test with image without steganography. (a) Original image without steganography. (b) Histogram of the image (case of correct classification).

Figure 7. Test with low-complexity message. (a) Image with 10-character LSB message. (b) Histogram of the image (case of correct classification).

Figure 8. Test with high-complexity message. (a) Image with LSB message of 1398 characters. (b) Histogram of the image (case of correct classification).

Table 1. Performance of the CNN-NLP model according to the size of the dataset.

No. of Images per Class	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	ROC AUC
5000	98.20	98.20	98.26	98.20	0.9826
7000	98.89	98.93	99.05	99.12	0.9894
9000	94.17	94.17	94.78	94.15	0.9417
10,000	99.35	99.35	99.36	99.35	0.9941
12,500	99.96	99.96	99.96	99.96	0.9996

Table 2. Comparison between the proposed CNN-NLP model and reference models in LSB steganography detection.

Model	Database	Accuracy (%)	Precision (%)	Semantic Validation
Xu-Net [11]	Bossbase	88.30	87.75	No
Yedroudj-Net [27]	Bossbase	90.90	90.10	No
SRNet [29]	Bossbase	93.80	92.60	No
CNN-NLP	PNG(25K)	99.96	99.96	Yes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Angulo, K.; Gil, D.; Yáñez, A.; Espitia, H. Hybrid CNN-NLP Model for Detecting LSB Steganography in Digital Images. Appl. Syst. Innov. 2025, 8, 107. https://doi.org/10.3390/asi8040107

AMA Style

Angulo K, Gil D, Yáñez A, Espitia H. Hybrid CNN-NLP Model for Detecting LSB Steganography in Digital Images. Applied System Innovation. 2025; 8(4):107. https://doi.org/10.3390/asi8040107

Chicago/Turabian Style

Angulo, Karen, Danilo Gil, Andrés Yáñez, and Helbert Espitia. 2025. "Hybrid CNN-NLP Model for Detecting LSB Steganography in Digital Images" Applied System Innovation 8, no. 4: 107. https://doi.org/10.3390/asi8040107

APA Style

Angulo, K., Gil, D., Yáñez, A., & Espitia, H. (2025). Hybrid CNN-NLP Model for Detecting LSB Steganography in Digital Images. Applied System Innovation, 8(4), 107. https://doi.org/10.3390/asi8040107

Article Menu

Hybrid CNN-NLP Model for Detecting LSB Steganography in Digital Images

Abstract

1. Introduction

1.1. Background and Advances in Steganography with CNN

1.2. Paper Approach and Organization

2. Methodology

3. CNN-NLP Architecture Design for LSB Steganography Detection

3.1. Preprocessing

3.2. CNN Architecture

3.3. Message Classification and Extraction

3.4. Semantic Validation with NLP

3.5. Model Summary

4. Results and Comparison with Existing Models

4.1. Evaluation of the Proposed Model

4.2. Model Comparison

5. Discussion

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI