Review

Deep Learning for Image Watermarking: A Comprehensive Review and Analysis of Techniques, Challenges, and Applications

by Marta Bistroń 1,*, Jacek M. Żurada 2 and Zbigniew Piotrowski 1
1 Institute of Communication Systems, Faculty of Electronics, Military University of Technology, 00-908 Warsaw, Poland
2 Electrical and Computer Engineering, University of Louisville, Louisville, KY 40292, USA
* Author to whom correspondence should be addressed.
Sensors 2026, 26(2), 444; https://doi.org/10.3390/s26020444
Submission received: 21 November 2025 / Revised: 22 December 2025 / Accepted: 6 January 2026 / Published: 9 January 2026

Highlights

What are the main findings?
  • Deep learning-based watermarking methods (CNN, GAN, Transformers, and diffusion models) significantly outperform traditional spatial- and frequency-domain techniques in terms of robustness, transparency, and adaptability to modern attack types.
  • Emerging architectures such as Vision Transformers, Swin Transformers, and diffusion models introduce new capabilities, notably higher resistance to generative and latent-space attacks, as well as increased watermark capacity.
What are the implications of the main findings?
  • The rapid evolution of neural network architectures accelerates the development of watermarking systems capable of protecting digital content against increasingly sophisticated threats, including AI-generated manipulations.
  • Future watermarking deployments will require optimized, scalable, and computationally efficient deep learning architectures to support real-time applications in cybersecurity, multimedia distribution, IoT systems, and content authenticity verification.

Abstract

The growing demand for digital content protection has significantly increased the importance of image watermarking, particularly in light of the rising vulnerability of multimedia content to unauthorized modifications. In recent years, research has increasingly focused on leveraging deep learning architectures to enhance watermarking performance, addressing challenges related to transparency, robustness, and payload capacity. Numerous deep learning-based watermarking methods have demonstrated superior effectiveness compared to traditional approaches, particularly those based on Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Transformers, and diffusion models. This paper presents a comprehensive survey of recent developments in both conventional and deep learning-based image watermarking techniques. While traditional methods remain prevalent, deep learning approaches offer notable improvements in embedding and extraction efficiency, particularly when facing complex attacks, including those generated by advanced AI models. Applications in areas such as deepfake detection, cybersecurity, and Internet of Things (IoT) systems highlight the practical significance of these advancements. Despite substantial progress, challenges remain in achieving an optimal balance between invisibility, robustness, and capacity, particularly in high-resolution and real-time scenarios. This study concludes by outlining future research directions toward developing robust, scalable, and efficient deep learning-based watermarking systems capable of addressing emerging threats in digital media environments.

1. Introduction

The rapid development of internet and network technologies has led to the widespread digitization of everyday life [1]. Vast amounts of data (text, music, images, video) are processed and shared online on a daily basis. Digital data has become a fundamental resource underlying many social and economic activities. For this reason, one of the key challenges of the 21st century is to balance the protection of digital data with technological and economic advancement [2]. The main threats to digital content security include unauthorized copying and redistribution of content (multimedia piracy) [3,4,5], data manipulation and forgery [6], such as the creation of deepfakes [7], and privacy-related abuses, including unauthorized data usage.
To mitigate risks associated with digital data, various protection technologies have been developed, playing a pivotal role in ensuring data security. A classification of data-hiding techniques is presented in Figure 1. Among the mentioned methods, the most widely used are steganography, digital watermarking, and fingerprinting.
Steganography involves the creation of hidden communication channels that enable the transmission of data in a manner imperceptible to third parties. The data is concealed within a carrier, which may include an image, video, audio, or text file [9]. Another form of steganography is network steganography, which focuses on embedding data within transmitted network packets or the communication mechanisms of protocols [10]. The primary objective of steganography is to conceal the very existence of communication. This distinguishes it from cryptography, which protects data through encryption but does not obscure the fact that communication takes place [11]. Digital watermarking focuses on embedding a watermark into a digital carrier, which can be either visible or invisible [12]. Unlike steganography, its purpose is not to conceal the existence of data but, most often, to identify the content owner. A similar technique is fingerprinting, which involves inserting unique markers into each copy of digital content. Both methods frequently rely on comparable technical solutions, with the difference lying in their purpose and scope of application. A watermark is typically embedded in the original content before distribution, whereas a fingerprint serves as a unique marker for each copy, enabling its identification and tracking [13]. Both techniques are often used in combination to provide more comprehensive protection of digital content.
Although steganography, watermarking, and fingerprinting play a crucial role in protecting digital data, they increasingly face sophisticated attacks and the growing complexity of multimedia content. Traditional algorithms often prove insufficient in addressing these issues. An increasingly popular solution is the application of deep learning algorithms, which can automatically learn patterns, adapt to various conditions, and demonstrate resilience to interference, making them effective tools for modern data-hiding techniques.
The subsequent sections of this article will focus exclusively on digital watermarking—its currently employed solutions and future development directions utilizing deep learning techniques. The key contributions of this work are as follows:
  • A review and taxonomy of classical watermarking methods for images and video frames.
  • A comprehensive overview and taxonomy of deep learning-based image watermarking techniques, highlighting their advantages, limitations, and potential application areas.
  • Detailed comparisons of various deep learning architectures (CNNs, GANs, Transformers, and diffusion models) used in watermarking, with particular emphasis on their performance, robustness, and computational complexity.
  • A review and comparison of key datasets used for training watermarking algorithms.
  • An analysis of future research directions and practical challenges in areas such as deepfake detection, cybersecurity, and applications in IoT systems, with a special focus on the integration of deep learning methods into watermarking solutions.
  • A discussion of dataset availability, training strategies, and the role of transparency metrics based on neural networks, as well as specialized robustness metrics tailored to assess the impact of generative and adversarial attacks, providing practical guidelines useful for real-world implementations.
In contrast to previous surveys [14,15] on deep learning-based watermarking, this work adopts a more detailed and practice-oriented perspective, focusing not only on the classification of methods but also on their architectural evolution and practical deployment. Most existing reviews concentrate on conventional deep learning architectures, such as convolutional neural networks (CNNs) and generative adversarial networks (GANs), commonly used in watermarking tasks.
In this review, we frame the development of watermarking techniques through the lens of architectural and design philosophy. Each new model represents an effort to overcome limitations of prior approaches and enhance the overall capacity and robustness of watermarking systems. Accordingly, we incorporate and analyze the latest techniques based on advanced architectures such as Vision Transformers (ViT, Swin Transformer) and diffusion models, which—as discussed in detail later in the paper—demonstrate clear advantages over traditional CNN- and GAN-based solutions, particularly in terms of robustness against attacks and embedding flexibility. Furthermore, this survey goes beyond theoretical discussion to address practical considerations related to real-world implementation. We examine challenges such as computational complexity, data availability, and the evaluation of watermarking performance in terms of both perceptual transparency and resilience to adversarial and generative attacks. This holistic approach aims not only to synthesize the current body of knowledge, but also to facilitate its application in the development and deployment of robust, scalable watermarking systems in practical engineering contexts.
The remainder of this article is organized as follows. Section 2 presents the fundamentals of digital watermarking, providing a detailed description of the general workflow, key paradigms, and commonly used metrics. Section 3 reviews traditional watermarking methods, covering techniques in the spatial, frequency, and hybrid domains. Section 4 focuses on deep learning-based watermarking, discussing various architectures, including CNNs, GANs, Transformers, and diffusion models, with an in-depth comparison of their effectiveness and applications. Section 5 centers on datasets used in watermarking research, detailing available databases, their characteristics, and applications. Section 6 outlines future research directions, emphasizing architectural innovations and application-oriented challenges in areas such as deepfake detection, cybersecurity, and IoT. Section 7 concludes the article with a comprehensive summary, highlighting current challenges and potential solutions for the advancement of watermarking technology in the era of deep learning.

2. Fundamentals of Digital Watermarking

2.1. Watermarking Workflow

The watermarking process consists of two fundamental stages: watermark embedding and watermark extraction, as illustrated in Figure 2.
The watermark embedding process begins by transforming the original content (host) into a selected domain (e.g., the frequency domain), where the watermark is inserted. Additional transformations may be applied to further enhance the method’s effectiveness. Optionally, the watermark itself can be transformed depending on the method’s requirements, content type, and intended application. Digital signal processing (DSP) algorithms, by operating on signal parameters such as the phase angle, can embed additional information into the useful signal [16,17], which is particularly valuable in the watermarking of digital objects. During embedding, a chosen algorithm or architecture (encoder) introduces the watermark into the host through minor modifications to its content. The watermarked content is then transformed back into its original domain and transmitted through a telecommunication channel. During transmission and subsequent processing, it may be exposed to intentional or unintentional attacks that could remove or distort the watermark. The extraction process follows a similar approach. A decoding algorithm (decoder), compatible with the embedding method, processes the content in order to retrieve the embedded data. If the watermark underwent transformations or encryption during the embedding stage, inverse transformations are applied to restore it to its original form.
The described mechanism forms the foundation for a wide range of practical applications where effective and durable protection of digital content is required. Digital watermarking supports a broad spectrum of use cases depending on the target medium and protection goals. In commercial and legal contexts, it enables copyright protection and ownership verification. In the medical domain, it helps ensure the authenticity and traceability of diagnostic images. In military and IoT systems, watermarking contributes to data integrity and access control. The use cases illustrated in Figure 3 encompass both traditional applications (e.g., document security and media monitoring) and evolving needs in intelligent systems (e.g., teleconferencing and remote education environments).

2.2. Watermarking Taxonomy

Digital watermarking can be classified according to various criteria. Figure 4 presents a synthesized taxonomy derived from multiple classification schemes reported in the literature, complemented with additional author-defined elements to reflect recent developments and practical perspectives.
Based on their resistance to attacks, digital watermarking methods can be classified into robust, semi-fragile, and fragile categories. In most practical applications, robust methods are used, as they are designed to ensure that the watermark survives typical media processing operations and intentional attacks aimed at damaging or removing the watermark [18]. Semi-fragile methods are usually resistant to processing operations but not to more intensive modifications or deliberate attacks. They are intended for systems where minimal interference with quality is crucial, and intensive data processing is not anticipated. Fragile systems are intentionally designed so that the data is destroyed or damaged with any modification of the carrier. They are used for detecting manipulations and forgeries in the case of highly sensitive data.
The watermark can be embedded in various carriers, the most common being audio data [19,20,21], text [22,23,24], images [25,26,27], and video [28,29]. Due to technological advancements and the need to protect creators in emerging fields, more advanced forms of watermarking have appeared, such as software watermarking [30,31], database watermarking [32,33], neural network model watermarking [34,35], as well as watermarking techniques applied to digital 3D objects [36]. Methods used in these areas include code obfuscation [37], modification of neural network weights [38], and the use of specially prepared trigger sets, which do not affect the model’s functionality but enable the activation of the watermark under specific inputs [39].
Based on visibility, watermarks are classified as visible and invisible. Visible watermarks, such as a company logo, are placed in a way that is noticeable but does not hinder content consumption. In contrast, invisible watermarks are designed to be completely imperceptible to the viewer.
During the watermark extraction procedure, typically only the carrier with the embedded watermark is required; such methods are referred to as blind watermarking. If additional information is necessary for extraction, the methods are classified as semi-blind and non-blind. In semi-blind methods, reference data or a key used during watermark embedding is usually required, such as an attention mask. In non-blind methods, the use of the original data carrier is essential, and the extraction process involves comparing the watermarked data with the original content.
In practical watermarking systems, the use of secret keys is very popular because they play a key role in ensuring security, especially when it is assumed that embedding and extraction algorithms are widely known. The watermark key controls both the embedding and detection processes and is the primary mechanism for preventing unauthorized insertion or extraction of watermarks. Depending on the system design, the same key may be used for both embedding and detection, or separate keys may be used. This key-based security assumption is particularly important in semi-blind watermarking schemes, which dominate real-world applications.
There are two primary domains for watermark embedding: the spatial domain, where the watermark is inserted directly into image pixels or audio samples, and the frequency domain, where the watermark is embedded into the coefficients of a selected transform, allowing, among other benefits, higher resistance to lossy compression. To improve efficiency, hybrid domains are also used, typically combining the advantages of both the spatial and frequency domains [40]. Among the hybrid approaches, the time-frequency domain enables better performance for dynamic signals such as audio or video, while the time-spatial domain, mainly used for video, allows for embedding the watermark both statically and in temporal changes between frames. The semantic domain can be considered a variant of the spatial domain, where watermarks are embedded in areas with specific semantic significance to minimally impact perception.
The remainder of this article focuses on image watermarking methods that ensure robustness and invisibility.

2.3. Watermarking Paradigms and Metrics

In commercial watermarking applications, it is essential for the technology to meet three fundamental paradigms that determine a system's effectiveness and quality: robustness, transparency, and bit capacity. These characteristics are interdependent, making it difficult to improve one criterion without degrading the others.

2.3.1. Transparency

Transparency, or the invisibility of the watermark to the human visual system, is a key feature that often determines whether a given technology will be implemented. It is assessed using both objective metrics, which measure differences between the original carrier and the carrier with the embedded watermark, and subjective metrics, which evaluate content quality based on user perception.
Objective metrics
The most commonly used objective transparency metrics applied to images are listed below:
  • MSE (Mean Squared Error)—the average squared error between the pixel values of the original image and the watermarked image.
$$MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ f(i,j) - g(i,j) \right]^2,$$
  • f—the pixel matrix of the original image,
  • g—the pixel matrix of the watermarked image,
  • m—the number of pixel rows in the image, with i the row index,
  • n—the number of pixel columns in the image, with j the column index.
  • PSNR (Peak Signal-to-Noise Ratio)—measures the ratio of the original image signal to the noise introduced by the watermarking process.
$$PSNR = 20 \log_{10}\!\left( \frac{MAX_f}{\sqrt{MSE}} \right),$$
  • MAXf—the maximum possible value of a pixel (e.g., 255 for an 8-bit image).
  • SSIM (Structural Similarity Index) [41]—measures the structural similarity between two images by analyzing contrast, brightness, and texture. The values range from 0 to 1, with values closer to 1 indicating higher similarity.
$$SSIM(x,y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$
  • µx, µy—mean luminance values of images X and Y,
  • σx, σy—standard deviations of images X and Y,
  • σxy—covariance between images X and Y,
  • C1, C2—stabilizing constants to prevent division by zero, where
$$C_1 = (K_1 L)^2, \qquad C_2 = (K_2 L)^2,$$
  • L—dynamic range of the pixel values (255 for 8-bit grayscale images),
  • K1, K2—small constants, K1 ≪ 1 and K2 ≪ 1.
  • MS-SSIM (Multiscale Structural Similarity Index) [42]—an extension of SSIM that considers multiple spatial scales, calculated through a multi-stage downsampling process.
$$MS\text{-}SSIM(x,y) = \prod_{j=1}^{M} \left[ SSIM_j(x,y) \right]^{\alpha_j},$$
  • M—number of scale levels,
  • αj—weights (usually αj = 1/M),
  • $SSIM_j(x,y)$—SSIM value at the j-th scale level.
  • VIF (Visual Information Fidelity)—measures the amount of visual information transferred from the original image to the watermarked image using the HVS (Human Visual System) model.
$$I_0 = \log_2\!\left( 1 + \frac{\sigma_{x_k}^2}{\sigma_{n_k}^2} \right),$$
  • I0—the amount of information in the original channel,
  • $\sigma_{x_k}^2$—the variance of the signal at the k-th level,
  • $\sigma_{n_k}^2$—the variance of the noise at the k-th level.
$$I_w = \log_2\!\left( 1 + \frac{\sigma_{x_k}^2}{\sigma_{n_k}^2 + \sigma_{e_k}^2} \right),$$
  • Iw—the amount of information in the modified channel,
  • $\sigma_{e_k}^2$—the variance of the error between the original and the modified image.
$$VIF = \frac{I_w}{I_0},$$
  • FSIM (Feature Similarity Index)—compares key visual features from the perspective of human perception.
$$FSIM(x,y) = \frac{\sum_{i} PC_m(i) \cdot S_L(i) \cdot S_P(i)}{\sum_{i} PC_m(i)},$$
  • $PC_m(i)$—phase congruency at point i,
  • $S_L(i)$—luminance similarity function,
  • $S_P(i)$—phase congruency similarity function.
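As a concrete illustration of the objective metrics above, the following is a minimal NumPy sketch (our own, not taken from any cited implementation) of MSE, PSNR, and a simplified SSIM computed globally over the whole image rather than with the sliding window of the original formulation:

```python
import numpy as np

def mse(f: np.ndarray, g: np.ndarray) -> float:
    """Mean squared error between two same-sized grayscale images."""
    return float(np.mean((f.astype(np.float64) - g.astype(np.float64)) ** 2))

def psnr(f: np.ndarray, g: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    e = mse(f, g)
    return float("inf") if e == 0 else float(20.0 * np.log10(max_val / np.sqrt(e)))

def ssim_global(x: np.ndarray, y: np.ndarray,
                L: float = 255.0, k1: float = 0.01, k2: float = 0.03) -> float:
    """Simplified SSIM computed from whole-image statistics (no sliding window)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

The windowed SSIM of [41] averages this statistic over local patches; in practice, library implementations such as scikit-image's `structural_similarity` should be preferred over this global simplification.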
Subjective Metrics
Subjective metrics are used to obtain direct user assessments, where individuals evaluate the quality of an image or video based on their own perception. Testers determine the extent to which content modifications are noticeable and how they affect the overall experience. The conditions and methodology for conducting these tests are thoroughly described in the International Telecommunication Union recommendations, specifically for television images [43] and video materials in lower bandwidth applications such as videoconferencing [44]. However, the methodology can also be successfully applied to static images:
  • DSIS (Double Stimulus Impairment Scale)—the method involves comparing two versions of the same material: the original reference version and the processed version. Users view both versions sequentially and then assess the degree of quality degradation in the processed version relative to the original. Ratings are collected on a quality scale from 1 to 5, where 5 indicates that the differences between the original and the modified content are imperceptible, and 1 signifies that the content quality has significantly deteriorated and is unacceptable.
  • DSCQS (Double Stimulus Continuous Quality Scale)—users are presented with both the original and the modified versions of the content, but they are not explicitly informed which is the original or the modified version. Similar to DSIS, users evaluate the content quality; however, the lack of clarity regarding which material has been altered provides a more objective assessment from the perspective of human perception.
  • Paired Comparison Test—users are shown two versions of the content: the original and the modified, without indicating which one has been altered. Participants evaluate which version they believe has higher quality or whether they can notice any differences.
AI-Based Perceptual Metrics
With the development of deep learning techniques, modern perceptual metrics utilizing neural networks have emerged, offering a better reflection of human image perception compared to classical similarity metrics [45]. Selected deep learning-based metrics used to evaluate the transparency of watermarking systems include:
  • LPIPS (Learned Perceptual Image Patch Similarity) [46]—this method employs convolutional networks trained on large datasets to measure perceptual differences between images. Metric values close to 0 indicate smaller differences and higher transparency of the method.
  • DISTS (Deep Image Structure and Texture Similarity) [47]—a metric that combines texture and structure analysis using deep features from neural networks, providing improved assessment of images with complex details.
  • PieAPP (Perceptual Image-Error Assessment through Pairwise Preference) [48]—a trained model that predicts image quality based on user preferences by evaluating pairs of images.

2.3.2. Robustness

Watermark robustness refers to the system’s ability to retain the watermark within the carrier even after processing operations such as compression, filtering, scaling, or intentional attacks aimed at removing the watermark. The assessment of robustness involves comparing the quality of the original watermark embedded in the carrier with the watermark extracted after processing operations. Among the robustness evaluation metrics, the following are distinguished:
  • BER (Bit Error Rate)—measures the percentage of bits that have been incorrectly extracted from the embedded watermark compared to the original watermark.
$$BER = \frac{1}{N} \sum_{i=1}^{N} \left| w_i - \hat{w}_i \right|,$$
  • wi—value of the i-th bit of the original watermark,
  • $\hat{w}_i$—value of the i-th bit of the extracted watermark,
  • N—total number of bits.
  • NC (Normalized Correlation)—a metric that measures the similarity between the original and the extracted watermark. A value close to 1 indicates high resistance to attacks.
$$NC = \frac{\sum_{i=1}^{N} w_i \cdot \hat{w}_i}{\sqrt{\sum_{i=1}^{N} w_i^2} \cdot \sqrt{\sum_{i=1}^{N} \hat{w}_i^2}},$$
  • ASR (Attack Success Rate)—the percentage of successful attempts to weaken or remove the watermark through adversarial attacks [49].
  • RGA (Robustness Against Generative Attacks)—a metric evaluating the system’s resistance to attacks utilizing generative models. It analyzes the extent to which the watermark remains intact after being processed by these models [50].
  • APT (Adversarial Perturbation Tolerance)—defines the minimum level of perturbations introduced by adversarial attacks necessary to successfully remove or distort the watermark. A higher APT value indicates greater system robustness [51].
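For binary watermarks, the BER and NC formulas above translate directly into code; a minimal NumPy sketch:

```python
import numpy as np

def ber(w: np.ndarray, w_hat: np.ndarray) -> float:
    """Bit error rate: fraction of watermark bits extracted incorrectly."""
    w = np.asarray(w, dtype=np.float64)
    w_hat = np.asarray(w_hat, dtype=np.float64)
    return float(np.mean(np.abs(w - w_hat)))

def nc(w: np.ndarray, w_hat: np.ndarray) -> float:
    """Normalized correlation between original and extracted watermark;
    values close to 1 indicate the watermark survived processing."""
    w = np.asarray(w, dtype=np.float64)
    w_hat = np.asarray(w_hat, dtype=np.float64)
    denom = np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(w_hat ** 2))
    return float(np.sum(w * w_hat) / denom) if denom else 0.0
```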

2.3.3. Capacity

Watermark capacity refers to the amount of information that can be embedded in the carrier while maintaining adequate transparency and robustness. Below are the most commonly used capacity metrics:
  • Payload Capacity—measures the number of watermark bits relative to the given carrier.
$$P = \frac{N_{bit}}{N_{host}},$$
  • Nbit—number of bits embedded as watermark,
  • Nhost—number of host units (pixels for images, seconds for video or audio).
  • Embedding Capacity Efficiency—measures the efficiency with which the system utilizes the carrier’s space for embedding the watermark, taking into account the impact on quality and robustness.
$$ECE = \frac{N_{bit}}{N_{max}} \cdot 100\%,$$
  • Nmax—maximum capacity of the host, i.e., the number of bits that can be embedded before the quality of the content is noticeably degraded.
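Both capacity measures are simple ratios; for example, a 1024-bit watermark in a 512 × 512 grayscale image corresponds to a payload of 1024/262,144 ≈ 0.0039 bits per pixel. A minimal sketch:

```python
def payload_capacity(n_bits: int, n_host_units: int) -> float:
    """Payload P: watermark bits per host unit
    (pixels for images, seconds for audio or video)."""
    return n_bits / n_host_units

def embedding_capacity_efficiency(n_bits: int, n_max: int) -> float:
    """ECE: percentage of the host's maximum usable capacity consumed,
    where n_max is the bit budget before quality visibly degrades."""
    return 100.0 * n_bits / n_max
```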

2.4. Attacks on Watermarking Systems

In practical applications, watermarking systems are exposed to various processing operations and deliberate attempts to remove or distort the embedded watermark. The previously described metrics measure how well a watermarking system withstands attempts to destroy or alter the watermark. However, different types of attacks can cause diverse distortions to the carrier and the embedded watermark, making the analysis of their effects crucial for a comprehensive system evaluation. Below is an overview of the main types of attacks along with their impact on carrier quality and watermark integrity, as summarized in Table 1. The attacks presented in the table are divided into three main categories: untargeted attacks, comprising standard multimedia processing operations; targeted attacks; and deep learning-based attacks. The last category is distinguished as a separate group due to its dynamic development, high effectiveness, and the difficulty of counteracting such attacks.
Deep learning-based attacks, especially those exploiting latent space regeneration (e.g., via diffusion models or VAEs), tend to be significantly more effective than traditional signal-domain manipulations. The high effectiveness of these attacks stems from the fundamental operating principles of modern generative models. Diffusion models and variational autoencoders learn compact latent representations that capture the semantic structure of the image while discarding high-frequency or weakly correlated signal components. As a result, watermarks embedded in the pixel or frequency domain are often treated as noise during the generative reconstruction process and are not preserved when the image is regenerated from the latent space, yielding visually faithful content devoid of the original watermark.
Unlike traditional signal-domain attacks, which typically apply local distortions or global transformations to the original signal, latent-space regeneration reconstructs the image from learned data distributions. Consequently, synchronization-based and redundancy-based watermarking schemes, which are effective against compression or geometric attacks, become ineffective when the original content is replaced by a newly synthesized instance that preserves perceptual quality but not the embedded watermark.
This highlights the growing need for DL-robust watermarking strategies that are explicitly designed to counter generative and latent-space attacks. Recent research approaches include embedding watermarks directly into latent representations, integrating watermark-preservation constraints into the training objectives of diffusion or GAN-based generators, and developing joint generation–watermarking frameworks in which watermark survival becomes an inherent property of the synthesis process. Such strategies aim to shift watermarking from a post-processing operation to an integral component of content generation, thereby improving resistance to regeneration-based attacks.

3. Traditional Image Watermarking Methods

Traditional watermarking methods are primarily based on modifying the carrier data in the spatial and the frequency domain, as well as on hybrid-domain approaches that aim to combine the advantages of both, as extensively discussed in the classical watermarking literature [54].

3.1. Spatial Domain

Spatial domain methods are among the simplest and oldest techniques in digital watermarking. They rely on direct modification of pixel values in images or video frames without prior transformation into another domain. A well-known example is the LSB (Least Significant Bit) technique proposed by Turner [55], in which the least significant bits of pixels are modified to hide information. These methods are characterized by high capacity and low complexity but relatively low resistance to attacks. One of the first digital watermarking approaches utilizing this technique was proposed in 1994 [56]. Over the years, the method has undergone numerous modifications. In [57], the authors proposed using the third and fourth LSBs to improve data embedding. Subsequently, in [58], a combination of the LSB method with binary value inversion of the watermark was introduced, enhancing the method’s transparency. Arya and Saharan [59] increased both the robustness and transparency of the LSB method by generating the watermark image from the host image and securing it with a key derived from the same source. In [60], the authors modified the traditional approach by introducing a hashing mechanism for the watermark before embedding, which improved resistance to attacks. Another approach combines the LSB method with edge detection techniques [61] to identify suitable regions for watermark embedding and additionally encrypt the watermark to increase the method’s security.
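The basic idea of LSB embedding can be sketched in a few lines of NumPy. This is an illustrative single-bit-plane variant of our own, not a reproduction of the specific multi-bit, inversion-based, or key-based schemes cited above:

```python
import numpy as np

def lsb_embed(host: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Embed watermark bits (0/1 values) into the least significant bit of
    the first len(bits) pixels, in row-major order, of an 8-bit image."""
    stego = host.copy().reshape(-1)
    # clear the lowest bit, then set it to the watermark bit
    stego[:len(bits)] = (stego[:len(bits)] & 0xFE) | bits
    return stego.reshape(host.shape)

def lsb_extract(stego: np.ndarray, n_bits: int) -> np.ndarray:
    """Recover the first n_bits watermark bits from the LSB plane."""
    return stego.reshape(-1)[:n_bits] & 1
```

Because only the lowest bit plane changes, pixel values shift by at most 1, which keeps PSNR very high; the flip side, as noted above, is that even mild lossy compression or requantization destroys the watermark.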
The LSB method can also be applied to video signals when watermarking is performed on a frame-by-frame basis without considering temporal dependencies. In [62], the authors used LSB to embed the watermark into selected video frames and employed an FPGA architecture to accelerate the embedding and extraction processes in real time. Similarly, in [63], the watermark was embedded into specific video frames using the LSB algorithm, with the frame selection based on histogram analysis.
Among other methods based on pixel value modification, the following can be distinguished: Pixel Value Differencing (PVD) [64,65], the Patchwork algorithm [66,67,68], Singular Value Decomposition (SVD) [69,70], and the Arnold transformation [71,72].

3.2. Frequency Domain

In frequency-domain watermark embedding methods, the content is transformed using a selected transform, after which the watermark is embedded into the transform coefficients. Once embedded, the watermarked content is transformed back into the original domain. This approach generally exhibits higher resistance to content processing operations, such as compression, scaling, and deliberate watermark attacks. In the case of images, watermark embedding most commonly utilizes the Discrete Cosine Transform (DCT) [73,74,75,76,77,78], the Discrete Fourier Transform (DFT) [79,80,81,82], and the Discrete Wavelet Transform (DWT) [83,84,85,86]. Transform-based methods offer numerous advantages over spatial domain watermarking approaches. DCT-based solutions are known for their high resistance to compression. DWT, owing to its multilevel analysis capabilities, increases resistance to basic processing operations, while DFT enables watermark embedding in frequency ranges that are less susceptible to modifications, making the watermark more robust against geometric transformations. Combinations of different transforms enable the integration of their individual strengths, resulting in solutions with greater robustness and higher transparency, such as DCT–DWT [87,88,89,90], DFT–DCT [91,92,93], DFT–DWT [94,95,96], and DFT–DCT–DWT [97].
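The transform–embed–inverse-transform pipeline described above can be illustrated with a minimal, blind, single-bit DCT sketch: one mid-frequency coefficient of an 8 × 8 block is forced to ±α to encode a bit, and extraction reads back its sign. The coefficient position (4, 3) and strength α are illustrative choices, not parameters of any cited method.

```python
import math

N = 8
def _a(k):          # orthonormal DCT scaling factors
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct2(block):
    """Orthonormal 2D DCT-II of an NxN block."""
    return [[_a(u) * _a(v) * sum(
        block[x][y]
        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
        for x in range(N) for y in range(N))
        for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse transform (DCT-III), returning the spatial-domain block."""
    return [[sum(
        _a(u) * _a(v) * coeffs[u][v]
        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
        for u in range(N) for v in range(N))
        for y in range(N)] for x in range(N)]

def embed_bit(block, bit, alpha=25.0, pos=(4, 3)):
    """Force one mid-frequency coefficient to +/-alpha to encode one bit."""
    c = dct2(block)
    c[pos[0]][pos[1]] = alpha if bit else -alpha
    return idct2(c)

def extract_bit(block, pos=(4, 3)):
    """Blind extraction: read the sign of the marked coefficient."""
    return 1 if dct2(block)[pos[0]][pos[1]] > 0 else 0

flat = [[128.0] * N for _ in range(N)]       # a flat grey 8x8 block
assert extract_bit(embed_bit(flat, 1)) == 1
assert extract_bit(embed_bit(flat, 0)) == 0
```

Because the bit is spread by the inverse transform across all 64 pixels of the block, mild pixel-level perturbations leave the coefficient's sign intact, which is the intuition behind the higher robustness of transform-domain methods.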
Similar to the spatial domain approaches, it is possible to apply one-dimensional or two-dimensional transforms, including video signals, provided that watermarking is performed on individual frames without considering temporal context. This approach has been proposed in the following publications for the transforms: DCT [98,99,100,101,102], DWT [103,104,105,106], 1D-DFT [107], 2D-DFT [108,109], as well as their combinations, which aim to improve the method’s efficiency [110,111,112,113].

3.3. Hybrid Domains

Typical hybrid domains, such as the time-frequency and the time-spatial domain, are mainly applied in video signal watermarking. In the case of images, the semantic domain can be used, which is classified as hybrid since it combines elements from both the spatial and frequency domains. However, its key feature is content-level analysis at the semantic level. The watermark is embedded in selected areas of the image or video that are significant from a content perspective. This approach enables the watermark to be hidden in regions that are perceptually or semantically important to the user, making the watermarking process both more resistant to manipulation attempts and less noticeable. Semantic domain methods rely on various image processing techniques, such as segmentation, edge detection, and object detection [114].

3.4. Summary

In summary, the classification of traditional watermarking methods for images and video frames is presented in Figure 5.

4. Deep Learning-Based Watermarking

In recent years, the rapid development of deep learning-based technologies has revolutionized many areas of science, including digital watermarking. Unlike traditional methods, which rely on manually designed features, deep learning algorithms learn optimal data representations, enabling more efficient, robust, and invisible watermark embedding.
Deep learning is a subset of machine learning that utilizes multi-layer neural networks (deep neural networks) [115,116]. Due to their architecture and ability to automatically recognize and learn patterns, DL algorithms can learn to be robust to various types of targeted and untargeted attacks by appropriately defining loss functions. Additionally, neural networks are highly efficient in solving problems that require scalability. This is due to the hierarchical structure of feature learning, the ability to process in parallel, and support for multi-dimensional data [117], making neural networks very effective even in watermarking high-resolution images and videos, such as 4K resolution.

4.1. Deep Learning Architectures Used in Image Watermarking

The fundamental architectures used in image watermarking are convolutional neural networks. These are a type of deep neural network specifically designed for analyzing data with a matrix-like structure, such as images or video frames, which has led them to dominate the processing of such data for years [118]. The network consists of convolutional blocks, which include convolutional layers, activation layers, and pooling layers. The key components are the convolutional layers, which consist of filters whose primary function is to extract features from images, enabling edge detection, texture identification, and the recognition of more complex patterns. In watermarking, CNNs are utilized in watermark encoder and decoder algorithms. The convolutional filters learn to modify input images in such a way that the watermark is embedded, ensuring that the watermark is both transparent and resilient to attacks. The primary advantage of such CNN-based solutions is their simplicity of implementation and flexibility in adapting to different input data and watermarks. The general block diagram of a convolutional network is shown in Figure 6.
Although CNNs form the basis of many watermarking algorithms, achieving algorithms that meet more stringent requirements in terms of robustness and transparency also requires other architectures. Autoencoders provide an optimized approach to encoding and decoding data. Their architecture allows data to be transformed into a different (hidden) space, followed by data reconstruction [119]. An autoencoder consists of two parts—Figure 7:
  • Encoder, which transforms the input data into a lower-dimensional representation, with the goal of reducing the data size while capturing the most important features of the input,
  • Decoder, which reconstructs the data based on the representation by applying reverse transformations to those used in the encoder.
Autoencoders are trained in an unsupervised manner, minimizing the difference between the input and output data by using an appropriately selected cost function [120,121,122]. The architecture is a natural choice for watermarking tasks because the encoder–decoder structure mirrors the process of embedding and extracting a watermark.
Another approach for creating more efficient watermarking algorithms is the use of Generative Adversarial Networks. This architecture relies on two competing neural networks [123]—Figure 8:
  • Generator, based on provided features or random noise, learns to generate new data,
  • Discriminator attempts to distinguish between real data and data generated by the generator.
In watermarking applications, an extension of this concept is typically used, namely DCGAN (Deep Convolutional Generative Adversarial Network)—a GAN architecture built on convolutional networks [124]. The generator learns to embed the watermark into the provided data in an invisible manner, while the discriminator, acting as a critic, evaluates whether the watermark has been properly hidden. Due to their operating principles, GANs enable the development of algorithms characterized by high transparency that are also easily adaptable to different types of data.
The main drawback of GANs is their instability, as the competition between the generator and the discriminator makes them quite difficult to train. A potential solution to these problems lies in the use of diffusion models. These are generative structures that, through the use of a reverse diffusion process, learn to generate or reconstruct data from random noise [125,126]. The input data is iteratively noised in a controlled manner until it becomes completely random noise. Then, the model learns the reverse process—recovering the original data from the noisy data. The conceptual diagram of the model is shown in Figure 9. Diffusion models are not yet widely adopted in watermarking solutions, but due to their noise removal mechanism, they can be highly effective in extracting watermarks even after various types of attacks.
Currently, Transformer models are increasingly being used, surpassing the efficiency of their predecessors. A Transformer architecture consists of two components: an encoder with an attention mechanism that transforms the data into an internal representation, and a decoder with an attention mechanism that, based on the input representation and the target sequence, predicts the next sequence elements with a given probability [127]—Figure 10.
The central element of Transformers is the attention mechanism, which enables the capture of dependencies between components of the input sequence, regardless of their relative distance. This mechanism, known as self-attention, analyzes how each element of the sequence is related to the others. In image watermarking applications, a specialized version called spatial attention [128] is used, which focuses on spatial relationships in the input data, such as between pixels or pixel blocks in images. The spatial attention mechanism helps identify key areas for analysis, such as the optimal location for watermark embedding.
For image processing applications, a dedicated variant of the Transformer, the Vision Transformer (ViT), was developed. Unlike traditional convolutional neural networks (CNNs), ViTs divide the image into smaller fragments (called patches), which are then transformed into vectors and processed through the self-attention mechanism. This approach allows the model to capture global dependencies within the image. While Vision Transformers [129] can achieve higher performance compared to CNNs, they require a larger amount of training data to be effective.

4.2. Overview of Deep Learning-Based Image Watermarking Algorithms

Given the diversity of visual data, varying resolutions, and applications requirements, different solutions are employed, each tailored to specific problems. These solutions have been described and classified in literature reviews [130,131,132].
The first attempts to apply deep learning in image watermarking emerged around 2017. One of the initial approaches was based on a CNN that operates similarly to an autoencoder [133]. Two independent CNNs generate two image sets, created as a codebook, which are then permuted using cryptographic keys. For each bit of the watermark, an appropriate pair of codebook codes is selected and embedded into the image. This method enables the embedding of a 64 × 64-pixel watermark into a 128 × 128-pixel image, offering resistance to common image processing attacks and JPEG compression while ensuring a high level of security due to the use of cryptographic keys. In [134], the authors proposed a blind watermarking method based on CNNs using an end-to-end approach in which embedding and extraction processes are optimized together within a single architecture. The neural network embeds a 1-bit watermark in each sub-block of the image with dimensions 8 × 8 pixels. Subsequently, selected geometric attacks and signal processing operations are simulated to enhance the algorithm’s robustness.
In subsequent work, architectures based on the use of two or three main modules—a watermark encoder, decoder, and, optionally, a module simulating attacks on the watermark—became dominant. In [135], the authors proposed an architecture that enables embedding a watermark in the form of an audio file into an image. Two neural networks were utilized: WM Network and Similarity Network. The WM Network consists of the Encoder and Decoder modules, which map the watermark using LSTM, and Embedder and Extractor, which are based on convolutional layers. Similarity Network compares the original watermark with the extracted one, allowing the evaluation of the method’s effectiveness. A similar approach was adopted in [136]. This is a blind algorithm based on CNNs, designed to extract the watermark from images captured using mobile phones. The architecture includes components responsible for mapping and demapping the watermark, as well as its embedding and extraction, all composed of convolutional layers. A key element of the algorithm is the Invariance Layer, which affects the algorithm’s resistance to attacks. The module is based on fully connected layers and allows for dispersing information across different parts of the image.
In [137], the authors also used a CNN-based encoder and decoder, but during the preprocessing of the host image, they applied the Wavelet Transform and an additional convolutional network responsible for watermark preprocessing. Moreover, the authors implemented an attack simulator to enhance the algorithm’s robustness against basic image processing operations. A similar architecture, based on an encoder and decoder composed of convolutional layers along with an attack simulator, was implemented in [138]. The authors employed DWT before embedding the watermark and IDWT (Inverse Discrete Wavelet Transform) after embedding to restore the image to its original domain. The watermark is in the form of a binary image with dimensions of 32 × 32 pixels, eliminating the need for an additional watermark processing block.
Lu et al. [139] also used the Wavelet Transform but incorporated it within the embedding algorithms. As the encoder, they employed the U-Net autoencoder architecture [140], in which the central convolutional and deconvolutional blocks were replaced with DWT and IDWT blocks. In the decoder, convolutional layers and DWT blocks were implemented. Between the encoder and decoder blocks, an attack simulation block inspired by StegaStamp was introduced. StegaStamp [141] is a solution that enables the embedding of hyperlinks into images. The encoder uses a convolutional network, also similar to U-Net. A key component of the system is the distortion simulation block, which, unlike most watermarking techniques, enhances the algorithm’s resistance to distortions introduced during the printing of images. This feature allows the method to be used in real-world physical scenarios, such as printed photographs or billboards.
In order to increase the transparency of the algorithm, the authors in [142] utilized an autoencoder architecture in both the watermark encoder and decoder. In the encoding algorithm, the watermark embedding occurs in the latent space, after which the image is scaled back to its original resolution. In the decoder, a denoising autoencoder is first used to reduce the effects of noise and other distortions (if present). Subsequently, two encoders are employed to extract the watermark based on both the original image and the image with the embedded watermark. The network training is conducted in two stages.
In subsequent approaches, the encoder–decoder architecture was enhanced with an attention mechanism. Dasgupta and Zhong [143] proposed a solution utilizing the multi-head cross-attention mechanism (MHA) [127], which enables the model to learn mutual dependencies between two different data sequences, in this case, between the watermark and the host image. Additionally, the authors employed representation learning in an invariant domain using a triplet loss function [144], which optimizes the distances between images containing the same watermark content (anchor and positive) while maximizing the differences between them and a negative image (any other image). This approach improves the robustness and the model’s ability to learn the watermarking pattern.
The second group of algorithms consists of GAN-based approaches, where a discriminator module is added to the encoder and decoder blocks. The watermark encoder functions as a data generator, while the discriminator acts as a critic, assessing the visibility of the watermark. The first such solution was the HiDDeN architecture [145] proposed by Zhu et al. It is an end-to-end solution based on three CNNs. The encoder embeds a bit sequence into the host image; the watermarked image is then processed through a distortion layer, after which the decoder extracts the watermark. The discriminator verifies whether the watermarked image is sufficiently similar to the original image, allowing the method to achieve high transparency.
Wen and Aydore proposed an improvement to this approach by developing the ROMark algorithm [146]. The authors utilized the HiDDeN architecture, introducing min-max optimization, which involves optimizing losses under the worst-case scenario. This is achieved through the implementation of a dynamic noise layer that iteratively generates the most challenging possible distortions. Additionally, the range of applied distortions was expanded, and gradient propagation was improved to facilitate more efficient network training. The HiDDeN algorithm was also used in [147], where the authors extended the distortion module by adding a rotation layer and an additive noise layer. They also modified the loss function, which enabled a better trade-off between robustness and transparency. In [148], the authors similarly focused on optimizing the learning process proposed in [145]. They introduced a two-stage training process, training the encoder and decoder for the base model, and training separate decoders for different types of distortions in the second stage to better enhance the watermark’s resistance to various types of attacks.
A completely different GAN-based model was proposed in [149]. The authors utilized the Inverse Gradient Attention (IGA) mechanism, which dynamically identifies image pixels most resistant to distortions and assigns them higher weights during the message-hiding process. Additionally, the encoder module enables the compression of binary messages into real numbers, allowing the embedding of a larger number of bits (256 bits) without affecting image transparency. A similar approach, employing a different algorithm, was presented by Hao et al. [150]. The authors applied attention mechanisms in both the encoder (generator) and decoder to identify areas most resistant to disturbances and focus on them during watermark embedding. The implemented discriminator not only serves as a critic but also evaluates the visual quality of images and supports robustness optimization. A dynamic disturbance layer was also incorporated to simulate multiple attacks simultaneously. Attention mechanisms were further utilized in [151]. The ARWGAN model employs an attention mechanism in the encoder to generate an attention mask, enabling the placement of the watermark in the most optimal image regions. Additionally, a Feature Fusion Module was used to extract image features and leverage them to enhance robustness. The authors also implemented a Noise Subnetwork to simulate various types of distortions. In [152], the authors applied a GAN-LSTM structure combined with the Adaptive Gannet Optimization algorithm, which, similar to previous solutions, enables the selection of optimal locations for watermark embedding. The watermark undergoes preprocessing using DWT and Schur decomposition and is subsequently chaotically encrypted. Incorporating the LSTM architecture into the algorithm allows for better management of the watermark extraction process, improving both accuracy and robustness.
The previously described attention-based solutions enabled the dynamic identification of optimal image regions for watermark embedding. An extension of this concept involves the use of more advanced architectures, such as Transformers, which bring new capabilities to watermarking-related challenges. In [153], the authors implemented Transformers for both text processing and visual Transformers for image feature extraction as part of the proposed text embedding algorithm. The text is encoded into a vector representation with dimensions 16 × 64 using a Transformer, while the encoder and decoder are based on the ViT architecture. To improve algorithm robustness, noise was added to the encoded text representation, and common image distortions were applied to the watermarked image. In [154], a ViT was also employed for image authentication, manipulation detection, and image recovery. The source image undergoes preprocessing using several transforms: Discrete Wavelet Transform, Integer Wavelet Transform, Schur decomposition, and Curvelet Transform, the latter enabling the identification of high-entropy areas suitable for watermark embedding. The model generates encoded feature maps to serve as watermarks. Additionally, an authentication key is generated using Singular Value Decomposition.
Liu et al. [155] proposed a two-stage approach for watermark embedding using Transformers. The input images are transformed using DWT, enabling decomposition into frequency coefficients, with the watermark embedded in the low-frequency components to increase robustness. The attention module stores information about both low and high frequencies, enhancing the stability and accuracy of image reconstruction. As part of the two-stage embedding procedure, the encoder and decoder are first trained to embed and extract the watermark, and then a reversible information embedding procedure is introduced based on the first stage (freezing the encoder weights) to increase robustness. In [156], a transformer was combined with a GAN structure through the implementation of a discriminator that supports the transparency of the image with the embedded watermark. In the encoder, self-attention-based preprocessing was used to expand the watermark features and increase embedding efficiency by evenly distributing the watermark information across the image. Additionally, a Feature Enhancement Module was utilized to identify relationships between the image and the watermark using cross-attention, and a Soft Fusion Module, responsible for the final fusion of image and watermark features based on self-attention and cross-attention. The solution also includes a Noise Layer responsible for simulating disturbances and increasing robustness.
In [157,158], the authors employed the Swin Transformer architecture [159], which was designed for image processing and is characterized by better global information flow and lower computational complexity than ViT. In [157], the Swin Transformer was combined with CNN. The watermark encoder is based on CNN and blocks Squeeze-and-Excitation (SE), which support the extraction of important features. In the decoder, CNN was innovatively used to analyze local features, Swin Transformer, which allows for hierarchical feature processing and improves global feature representation, and the Identity module, which, through residual connections, facilitates feature extraction. Additionally, a Multi-scale Attentional Feature Fusion module (MA-FFM) was implemented to iteratively combine global and local features. In [158], the END (Encoder-Noise-Decoder) structure was used. The watermark encoder is based on the U-Net architecture, supplemented with a Locally-Channel Enhanced Swin Transformer Block, which utilizes window-based self-attention, and a Frequency-Enhanced Transformer Block, which applies an attention mechanism in the frequency domain using the cosine transform. The window-based self-attention mechanism involves dividing the image into smaller areas (windows) and analyzing dependencies only within the window area rather than globally across the entire image, followed by window shifting. The decoder structure is similar to the encoder but uses additional attention mechanisms to precisely recover the watermark.
The next significant step in the field of watermarking was the use of diffusion models, which allow for even greater resistance to disturbances and better control over the structure of embedded information. Initial research on watermarking in the context of diffusion models focused primarily on the detection and marking of images generated by these models [160,161,162]. In the subsequent phase, diffusion algorithms themselves began to be used for embedding and extracting watermarks. One of the first such approaches is the ZoDiac algorithm [163], described by Zhang et al., which utilizes the Stable Diffusion model [164] to embed watermarks in the latent space of the image. The watermark is embedded into the frequency coefficients of the Fourier transform, making it more resistant to attacks and manipulations, including generative attacks (Stable Diffusion-based removal attacks). Stable Diffusion is a text-to-image model developed for text-to-image generation. ZoDiac uses a pre-trained model with a denoising step of 50 to reconstruct the image in the latent space, which reduces the time and resources required to deploy the system.
The WaterDiff method proposed in [165] is also based on a pre-trained diffusion model. The host image is transformed by two separate encoder modules into a latent feature vector of high-level features and a matrix of low-level features. The watermark is then embedded into the coefficients of the wavelet transform, and the final watermarked image is reconstructed into its original form using the diffusion model. The extraction process proceeds analogously and relies on the use of the pre-trained probabilistic model. A distinctive feature of this approach is its very high watermark capacity of 1 bit per pixel (bpp) and the flexible ability to adjust the frequency subspace in which the watermark is to be hidden.
In [166], the SuperMark algorithm is described, which is also based on the use of a pre-trained diffusion model, but dedicated to super-resolution (SR). The watermark embedding process involves transforming the input image to a lower resolution compatible with the SR model input, embedding the watermark in the latent space through Gaussian Shading, and denoising and reconstructing the image with the embedded watermark back to its original resolution. During watermark extraction, the image is again scaled to a lower resolution and transformed into the latent space using variational autoencoder. The use of the diffusion model allows the reconstruction of the original Gaussian noise, from which the watermark bits are extracted.

4.3. Summary of Deep Learning-Based Image Watermarking Algorithms

The development of image watermarking techniques based on deep learning has followed the evolution of neural network architectures. Figure 11 illustrates a historical timeline of key deep learning architectures (marked in green) alongside their applications in watermarking (marked in orange). The watermarking methods referenced here were described in the previous subsection. The emergence of convolutional networks (1980) and their subsequent popularization (2012) revolutionized image processing. Furthermore, the introduction of deep autoencoders (2006) further expanded the capabilities of neural networks, leading to advanced image encoding techniques such as U-Net (2015).
The first application of CNN in watermarking appeared in 2017, and by 2018, the groundbreaking HiDDeN model was developed—the first deep learning-based watermarking method utilizing GANs. Simultaneously, the introduction of Transformers (2017) and their improvements dedicated to image processing (ViT, 2020; Swin Transformer, 2021) paved the way for watermarking applications leveraging the attention mechanism. In 2022, watermarking methods based on ViT emerged, followed by those utilizing the Swin Transformer in 2023.
The latest trend in image watermarking is the use of diffusion models, which, although introduced in 2015, gained significant popularity in 2023 following the publication of the Stable Diffusion model, capable of text-to-image conversion. Diffusion-based methods have the potential to significantly enhance resistance to adversarial attacks (particularly the increasingly common generative attacks) by embedding watermarks in the latent space.
It is worth noting the decreasing time between the development of a new architecture and its application in watermarking. In the early stages, technologies such as convolutional networks or autoencoders were used for watermarking only several years after their inception. However, with the advancement of deep learning, increased availability of pre-trained models, and improved computational resources, this time has significantly decreased: 4 years for the GAN architecture, and 2 years for Transformers and Swin Transformers. The most dynamic progress can be observed with Stable Diffusion, which was adapted for image watermarking within approximately one year after its release through the ZoDiac architecture [163]. This trend suggests that new technologies are being implemented in watermarking almost immediately after their development, driven by the growing availability of pre-trained models and the increasing demand for more resilient methods to protect digital content.
The evolution of image watermarking methods has led to the development of techniques with diverse properties and applications. Table 2 presents a comparison of key methods described in the previous subsection. Due to significant inconsistencies in experimental conditions across the surveyed studies—including variations in image resolution, watermark capacity, attack types, and evaluation metrics—it is not feasible to provide a standardized quantitative comparison of metrics such as PSNR, SSIM, or BER. As a result, Table 2 focuses on qualitative attributes that remain relatively comparable across works, such as architectural design, general watermark capacity, and robustness to commonly reported attacks.
The architectural changes in watermarking methods can be seen not merely as technical refinements, but as reflections of an evolving design philosophy—one that seeks to address the three fundamental challenges of image watermarking: embedding capacity, perceptual transparency, and robustness against attacks. The comparative analysis across CNNs, GANs, Transformers, and diffusion models clearly reflects how different design strategies prioritize or balance these criteria. The first approaches, based on CNNs in encoder–decoder architectures [134] or autoencoders [133] offer simplicity and computational efficiency, but may lack resistance to more complex manipulations. GAN-based solutions improved watermark concealment and realism, at the cost of stability and training complexity. Transformer-based models introduced greater capacity and robustness through attention mechanisms, while diffusion models provide enhanced resilience against generative attacks by embedding in the latent space. With the advancement of technology, the capacity of embedded watermarks has significantly increased, reaching values of bpp = 1 in some methods [165]. Additionally, a growing number of modern watermarking methods are being designed to be resolution-independent, allowing for effective application in HD (high definition) and 4K materials [167,168]. Thus, the architectural perspective adopted in this review implicitly captures the evolution of trade-offs between robustness, invisibility, and capacity—a key concern in watermarking system design.

5. Datasets for Image Watermarking

In principle, any image dataset, whether labeled or unlabeled, can be used to train a watermark embedding and extraction algorithm. However, in the case of deep learning-based methods, the quality and diversity of the training data play a crucial role in determining the model’s effectiveness. A well-chosen dataset enables better model generalization, which enhances resistance to attacks and improves watermarking effectiveness under various conditions [169]. Below are the key criteria that a dataset should meet to be suitable for training a watermarking algorithm:
  • High resolution—Modern watermarking methods should be tested not only on standard images of 128 × 128 or 256 × 256 pixels but also on high-resolution images such as HD (1080p) and 4K, which is essential for practical applications such as multimedia content protection [170];
  • Content diversity—The dataset should include both real-world images (landscapes, faces, animals, objects, vehicles) and graphics or textures. This is particularly important for methods utilizing attention mechanisms, which rely on contextual relationships between image elements [129];
  • Open access—Free and open access to data facilitates research replication and the comparison of different solutions’ effectiveness, forming the foundation for reliable evaluation of watermarking methods [171];
  • High visual quality—Images should be artifact-free, clear, and detailed, allowing for precise evaluation of watermark transparency and its impact on the visual quality of the image after embedding and extraction [172];
  • No lossy compression—Lossless formats (e.g., TIFF or PNG) are preferred to avoid artifacts resulting from lossy compression (e.g., JPEG), ensuring a reliable assessment of the watermarking method’s resistance to image degradation [173];
  • Synthetic and real images—With the growing popularity of generative models such as DALL-E [174] and Midjourney, there is an increasing need to watermark content generated by artificial intelligence. Therefore, the dataset should include both real and synthetic images to ensure algorithm effectiveness in both contexts [175].
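Several of these criteria ultimately feed into quantitative transparency evaluation. As an illustrative aside, the PSNR metric listed in the Abbreviations can be computed as follows; this is a minimal sketch on flat pixel lists, and the sample values are invented:

```python
import math

def psnr(original, watermarked, max_val=255.0):
    """Peak Signal-to-Noise Ratio between two equally sized images,
    represented here as flat lists of 8-bit pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return float("inf")  # identical images: perfect transparency
    return 10.0 * math.log10(max_val ** 2 / mse)

clean = [100, 150, 200, 250]
marked = [101, 150, 199, 250]   # watermark perturbed two pixels by 1
print(round(psnr(clean, marked), 2))  # prints 51.14
```

Lossy compression artifacts in the cover data would distort exactly this kind of measurement, which is why lossless source formats are preferred.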
Considering the above features when selecting a dataset is crucial for optimizing the effectiveness of watermarking methods and their resistance to various attack scenarios. Based on these criteria, the datasets used can be divided into four categories:
  • Benchmark datasets—These are classical image sets widely used in image processing and deep learning research. They are characterized by standard resolutions, usually 256 × 256 pixels, and a high diversity of content, enabling versatile usage;
  • High-resolution datasets—These include images with HD and 4K resolutions. Originally intended for training super-resolution algorithms, they are now successfully used to evaluate watermarking effectiveness in real-world applications where high visual quality is essential;
  • Synthetic datasets—Comprising images generated by AI models, these datasets feature visual characteristics that differ from those of real-world images. As a result, models trained on such datasets may require adapted watermarking methods to perform effectively;
  • Specialized datasets—These include images from specific fields, such as medicine, geoinformatics, security, or digital documents, where watermarking plays a key role in ensuring data integrity and authenticity. Such images are characterized by a high level of detail and specific visual features, often necessitating tailored watermarking approaches.
Table 3 presents an overview of the most commonly used datasets in image watermarking research. Key attributes are included, such as resolution, number of images, and thematic content. For each dataset, its main limitations and typical applications in watermarking experiments are also indicated. This aims to facilitate the evaluation of their suitability for various research objectives, such as testing embedding capacity, robustness, or imperceptibility. All listed datasets are publicly available resources.
Most datasets used in image watermarking research were not originally created for watermarking purposes but were primarily developed for classification [176,177,178,185,189], identification [188,190], or image segmentation [177,179]. Nevertheless, due to their diversity, high quality, and wide availability, they are successfully adapted to evaluate watermarking methods in terms of both resistance to attacks and transparency verification. In particular, benchmark datasets and high-resolution datasets enable the evaluation of watermarking methods under realistic conditions, while synthetic datasets are playing an increasingly significant role in watermarking AI-generated content.

6. New Research Directions and Challenges in Image Watermarking

The dynamic development of deep learning-based watermarking methods has made image watermarking not only more efficient and effective but also more technically and implementationally complex. Traditional approaches based on spatial and frequency domain transforms are gradually losing their effectiveness in the face of the increasing number of generative attacks and the growing need to ensure resistance to various visual distortions. As a result, watermarking is encountering new challenges and development directions, which define current research priorities and determine the future of this technology in the context of digital content protection [191].

6.1. Key Challenges in Implementing Image Watermarking Systems

The practical implementation of deep learning-based image watermarking solutions involves a range of technological, performance, and legal challenges. Methods utilizing artificial intelligence algorithms are significantly more computationally complex than their classical counterparts, which rely on much simpler mathematical algorithms [192]. Their implementation requires advanced models such as autoencoders, convolutional networks, generative networks, or transformers, all characterized by a high number of parameters and the necessity to operate on large datasets [193]. This issue becomes particularly significant when dealing with high-resolution images (e.g., 4K or 8K), where both the embedding and extraction processes demand substantial computational power.
In addition to computational complexity, a fundamental challenge in the practical design of watermarking systems remains the inherent trade-off between three key objectives: robustness, perceptual transparency, and embedding capacity. Achieving a balance between these aspects is non-trivial, as improvements in one dimension often lead to compromises in another. For example, highly robust watermarking methods—especially those based on diffusion models [141]—tend to introduce higher computational costs and longer processing times, while increasing capacity may impact transparency. In particular, diffusion-based approaches, though capable of superior robustness against generative and latent-space attacks, remain difficult to optimize for real-time or resource-constrained applications due to their iterative nature and substantial hardware demands. As a result, ongoing research increasingly focuses not only on model efficiency but also on understanding how architectural choices influence this robustness–transparency–capacity trade-off in practical deployment scenarios.
This trade-off becomes particularly evident in recent architectures, where excessive computational load and prolonged data processing time constitute a major barrier to real-time deployment. Diffusion-based watermarking methods [194], despite their high robustness and transparency, impose substantial hardware and memory requirements, limiting their applicability in resource-constrained environments.
From a deployment perspective, it is also important to distinguish between training and inference costs in deep learning–based watermarking systems. While training phases—particularly for transformer- and diffusion-based models—are typically performed offline and can leverage high-performance computing resources, inference-time requirements directly affect system feasibility in real-world applications. Large model sizes, memory footprints, and iterative generation processes may limit the applicability of such approaches in real-time, embedded, or resource-constrained environments. Consequently, practical watermarking deployments often prioritize architectures with moderate model size and predictable inference latency, even at the cost of reduced robustness against advanced attacks.
From a quantitative perspective, deep learning architectures used in watermarking differ substantially in scale and computational cost. CNN-based watermarking models typically consist of only a few million parameters and allow single-pass inference, making them suitable for real-time processing on standard GPUs or even high-end embedded devices. In contrast, transformer-based architectures often require tens to hundreds of millions of parameters, while diffusion-based watermarking systems involve iterative generation processes with dozens or hundreds of diffusion steps, resulting in significantly higher inference latency and memory consumption [195]. These order-of-magnitude differences, rather than exact numerical values, are the dominant factor determining deployment feasibility across application scenarios [196].
In the face of increasing computational demands, there is a need to develop techniques that reduce the complexity of deep learning models in the field of watermarking, where embedding effectiveness, processing time, and resource efficiency are crucial. One approach is the use of lightweight architectures designed to require fewer parameters and computational operations while maintaining acceptable processing quality [197]. Examples include models like MobileNet [198] and SqueezeNet [199], which, thanks to a reduced number of filters and innovative convolutional layers, can efficiently operate even on mobile devices.
Another popular strategy is weight pruning, which involves eliminating model parameters that have minimal impact on the output. This reduces the number of insignificant connections in the neural network while maintaining comparable accuracy [200,201]. An alternative method is quantization, which does not eliminate parameters but reduces their bit precision, thereby decreasing model size and accelerating computations [202].
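As an informal illustration of the two techniques just described, the sketch below applies magnitude-based pruning and uniform symmetric quantization to a toy weight vector. The function names and weight values are invented for the example; real systems operate on framework tensors rather than Python lists:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights
    (unstructured magnitude pruning)."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k > 0 else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_uniform(weights, n_bits):
    """Uniform symmetric quantization to n_bits of precision,
    returned as dequantized floats to show the rounding error."""
    scale = max(abs(w) for w in weights) / (2 ** (n_bits - 1) - 1)
    return [round(w / scale) * scale for w in weights]

w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(w, 0.5)     # half of the weights removed
quant = quantize_uniform(w, 8)       # values snapped to an 8-bit grid
assert sum(1 for x in pruned if x == 0.0) == 3
```

Pruning shrinks the number of effective connections, while quantization shrinks the bits per parameter; the two are complementary and often combined.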
In the context of diffusion models, modifications are gradually emerging to reduce the number of iterations required to achieve high-quality results. Examples include methods that integrate the inference process into the training phase for joint optimization [203] or those aimed at accelerating sampling [204].
Transfer learning is also widely used, allowing previously trained models to be adapted to new domains or tasks, thus shortening training time and reducing data requirements. In diffusion-based learning, pretrained models are utilized to denoise images with embedded watermarks in the latent space, ensuring high transparency and bypassing the costly process of training a model from scratch [163].
Despite the implementation of numerous techniques to limit computational complexity, the scalability of watermarking systems remains a challenge. With the growing number of users, the diversity and volume of image data, and the demand for real-time processing, traditional solutions may prove inadequate [205]. Designing and deploying scalable methods require both suitable hardware architectures and efficient distributed algorithms capable of dynamically managing computational and memory resources [206].
Cloud infrastructure is one of the most popular solutions for improving the scalability of watermarking systems. The cloud enables, among other things, the centralization of deep learning models [207], which significantly simplifies the process of updating and deploying new versions without user-side intervention. Additionally, serverless services allow for automatic scaling in response to changing loads [208].
Another approach is edge computing, which shifts part of the computational operations closer to the data source—directly onto mobile devices or edge servers [209]. This is particularly important for real-time watermarking systems, as it minimizes latency during live transmission, facilitates watermarking of images in IoT systems, and enables fast multimedia content authentication [210]. Edge computing reduces network load since images do not need to be transmitted to a centralized server and also enhances data security [211], by allowing local watermarking without leaving the user’s device.
In the training process itself, federated learning is commonly used, enabling deep learning models to be trained using distributed datasets without needing to centralize data on a single server [212]. In watermarking, this approach is especially valuable when data is sensitive or too large to consolidate easily.
From a scalability perspective, federated learning enables efficient utilization of the computational resources of multiple devices simultaneously, reducing reliance on central servers and allowing the training process to scale to thousands of nodes [213]. This enables parallelization of the training process without increasing network load and mitigates bottlenecks associated with transmitting large datasets. An added advantage is the model’s adaptability to local data, which can improve watermarking effectiveness in specific conditions and for particular end devices. Federated learning is also used in training diffusion models, significantly reducing the number of parameters while maintaining high output image quality [214].
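The aggregation step underlying federated learning can be sketched with the standard FedAvg rule: a weighted average of client model weights, weighted by local dataset size. This is a generic illustration, not a watermarking-specific algorithm; the client weights and dataset sizes below are invented:

```python
def fed_avg(client_weights, client_sizes):
    """FedAvg aggregation: average client model weights,
    weighted by the number of local training samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Three clients holding 100, 300, and 600 local images respectively;
# only the tiny weight vectors travel over the network, never the images.
clients = [[0.2, 0.4], [0.1, 0.5], [0.3, 0.6]]
global_w = fed_avg(clients, [100, 300, 600])
```

Only model updates are exchanged, which is exactly why the approach suits sensitive or oversized image collections.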
Ensuring scalability and efficiency in deep learning-based watermarking systems requires a multidimensional approach that combines architecture optimization, effective resource management, and technology adaptation to specific conditions. The solutions described demonstrate significant potential in reducing computational costs, improving response times, and enhancing data processing security. However, their effectiveness depends on multiple factors, with the greatest efficiency often achieved through a synergistic combination of several techniques. Integrated approaches like Edge Intelligence [215], which combine AI, edge computing, and federated learning, are already opening new possibilities for broad watermarking applications in resource-constrained environments and real-time systems.

6.2. Research Directions for Methods and Algorithms

The dynamic development of deep learning in watermarking opens new research directions focused on designing architectures aimed at improving the robustness, efficiency, and transparency of watermarking systems. As shown by the literature review conducted in Section 4, two groups of models have begun to dominate recent research: Transformers (Vision Transformer and Swin Transformer) and diffusion models. Both approaches offer an intriguing alternative to previously used solutions, namely CNNs [216] and generative adversarial networks (GANs) [217].
From a critical standpoint, recent advances in deep learning-based watermarking reveal that the choice of network architecture directly determines not only robustness and transparency but also vulnerability to specific classes of attacks and the feasibility of real-world deployment. Consequently, CNN-, GAN-, Transformer-, and diffusion-based approaches should be viewed as complementary rather than competing solutions, each addressing different design constraints and threat models.
Transformer-based architectures, utilizing various attention mechanism variants, are increasingly applied to diverse image processing tasks [218]. ViT and its improved variant, Swin Transformer, introduce fundamental changes to how images are processed. Unlike traditional CNNs, which analyze images locally, Transformers capture global dependencies between different parts of an image. Swin Transformers further enhance this by introducing a sliding window mechanism, enabling hierarchical image processing and improving both computational efficiency and the ability to analyze local and global features [159].
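The window mechanism mentioned above can be illustrated with a minimal partitioning sketch: the patch grid is split into non-overlapping windows and self-attention is restricted to each window, with successive layers shifting the grid by half a window so that information crosses window boundaries. The helper below is a simplified illustration of the partitioning idea, not Swin’s actual implementation:

```python
def window_partition(h, w, win):
    """Split an h x w grid of patch coordinates into non-overlapping
    win x win windows; in a Swin Transformer, self-attention is then
    computed independently inside each window (cost grows linearly
    with image size instead of quadratically)."""
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append([(r, c)
                            for r in range(top, top + win)
                            for c in range(left, left + win)])
    return windows

wins = window_partition(4, 4, 2)          # a 4x4 patch grid, 2x2 windows
assert len(wins) == 4 and all(len(wn) == 4 for wn in wins)
```

For watermarking, this locality means the embedding network can attend to texture-rich regions window by window while the shifted layers still propagate a global view of the cover image.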
Diffusion models are generative probabilistic models where image generation (or other processing operations) is performed through iterative noise addition and removal. Unlike GAN models, which often face convergence issues, diffusion model training is much more stable and yields higher quality generated images [115,125,219].
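The iterative noise addition has a well-known closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps ~ N(0, 1), where alpha_bar_t is the running product of (1 - beta_s) over the schedule. The sketch below illustrates this forward process on toy values; the noise schedule is invented for the example:

```python
import math
import random

def forward_diffuse(x0, t, betas, rng=random):
    """Closed-form DDPM forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta_s)."""
    alpha_bar = 1.0
    for s in range(t):
        alpha_bar *= 1.0 - betas[s]
    return [math.sqrt(alpha_bar) * v
            + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for v in x0]

betas = [0.02] * 50          # a short constant schedule, for illustration only
noisy = forward_diffuse([0.5, -0.5], 50, betas)  # heavily noised signal
```

Training then teaches a network to invert this process step by step; the stability of that denoising objective, compared with GAN minimax training, underpins the convergence advantage noted above.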
Table 4 and Table 5 present a comparative overview of the key features of CNNs and transformer-based architectures, as well as diffusion models and GANs, with a focus on their applications in image watermarking.
The analysis of the presented data indicates that further research on visual transformers in the context of watermarking should primarily focus on optimizing computational complexity to enhance their potential in real-time systems. Although architectures such as ViT and the Swin Transformer demonstrate high effectiveness in watermark embedding [153,154,157,158], their intensive computational requirements limit practical deployment possibilities. Therefore, an important research direction is the development of lightweight and more efficient versions of transformers capable of effectively operating on end devices such as smartphones or surveillance cameras [220,221,222].
Due to the simplicity and good efficiency of CNNs, combining transformers with convolutional networks in the form of hybrid architectures is a valuable approach. These models, merging the local precision of CNNs with the global context of transformers, can improve watermark durability and resilience while maintaining an acceptable level of computational complexity [223]. With the growing popularity of generative attacks, research should also focus on increasing watermark resistance to such threats [224]. A significant challenge lies in better understanding how the self-attention mechanism affects the placement and durability of the watermark in the image and how it can be used to embed better-hidden yet higher-capacity watermarks [225].
For diffusion models, research mainly focuses on developing technologies that reduce the number of diffusion steps without degrading the quality of the generated images [226]. Traditional models, such as the Denoising Diffusion Probabilistic Model or Stable Diffusion, while providing high-quality watermark embedding [163,165,166], are characterized by long generation times, hindering their application in real-time and low-latency environments. In response, efforts are being made to develop accelerated sampling methods and to utilize latent diffusion models, which operate in the latent space [227], significantly shortening watermark embedding time. This approach not only improves watermark resistance against subsequent manipulations but also enables more controlled and precise digital content watermarking [164].
Research in this area also focuses on how the watermark can survive various transformations, such as lossy compression, scaling, and attacks using generative models, which are becoming increasingly common. Additionally, a key development direction involves designing hybrid architectures combining diffusion models with transformers, leveraging the global understanding of images provided by the self-attention mechanism and the stable generation process offered by diffusion [228,229]. Researchers are also exploring methods combining modern neural network architectures with quantum technology [230].
Taken together, the comparisons presented in Table 4 and Table 5 highlight a fundamental trade-off in deep learning–based watermarking. Architectures that provide the highest robustness against generative and latent-space attacks, such as diffusion models and large transformer-based solutions, are currently the most computationally demanding and the least suitable for real-time or resource-constrained deployment scenarios. In contrast, CNN-based and hybrid CNN–Transformer approaches, while offering lower resistance to advanced generative attacks, remain more practical for time-critical and embedded applications due to their reduced computational complexity.
Overall, current research clearly indicates that no single deep learning architecture can simultaneously satisfy all watermarking requirements, including robustness, transparency, capacity, and computational efficiency. As a result, future research increasingly points toward adaptive and hybrid solutions, in which architectural choices are guided by specific application scenarios, threat models, and deployment constraints rather than by the pursuit of a universal, one-size-fits-all watermarking framework.

6.3. Application-Oriented Research in Watermarking

Contemporary watermarking is increasingly expanding beyond traditional intellectual property protection [231]. With the use of deep learning, modern watermarking systems are being applied in many key areas, including the identification of AI-generated content, cybersecurity, monitoring systems, and IoT. Below are the current and future applications of watermarking that may dominate this field in the coming years.

6.3.1. Watermarking in Identification of Deepfakes

The rapid development of generative techniques has made the identification and counteraction of so-called deepfakes a key challenge in digital security [232]. Fake multimedia content, generated by artificial intelligence, is becoming increasingly realistic and difficult to distinguish from authentic materials, posing serious threats in the realms of politics, security, and privacy [233].
The use of watermarking techniques is playing an increasingly important role in the recognition of AI-generated content [234], including deepfake detection. Watermarking algorithms can be employed to embed invisible markers during the image generation process, enabling subsequent verification of whether a given material was synthetically generated. An example of such an approach is SynthID [235], developed by Google, which integrates invisible watermarks into AI-generated content to facilitate their identification. However, ensuring the resilience of these markers against attempts at removal or modification is crucial for effectively combating disinformation on the Internet. Research must also consider techniques that are resistant to common deepfake attacks and operations, such as face alterations, voice modulation, or background modifications.
An important emerging research area is the embedding of metadata in deepfakes. By embedding source information or generation timestamps directly into multimedia content, it becomes possible to detect manipulations and track the origin of materials. Standards such as the Content Authenticity Initiative (CAI) and the Coalition for Content Provenance and Authenticity (C2PA) promote this approach, enabling the embedding of metadata in images and other media to verify their authenticity. The implementation of such standards can significantly aid in the fight against disinformation and facilitate the identification of AI-generated content.

6.3.2. Watermarking in Cybersecurity

In the context of cybersecurity, there is a growing number of applications utilizing watermarking technologies in various ways—often referred to as Cyber Watermarking [236]. A highly promising development direction is the integration of modern watermarking techniques with blockchain technology [237]. Combining these two solutions enables the creation of systems that not only mark digital content but also store cryptographic hashes of the content in a decentralized database [238]. This allows any user to verify the authenticity of multimedia files without concerns about unauthorized manipulation within the chain.
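The hash-chain principle behind such blockchain-backed verification can be sketched in a few lines: each registered content hash is committed into a block that also commits to the previous block, so altering any past entry invalidates every later block. This toy ledger is purely illustrative of the principle, not a real decentralized system:

```python
import hashlib

def content_hash(image_bytes):
    """SHA-256 fingerprint of (watermarked) image content."""
    return hashlib.sha256(image_bytes).hexdigest()

class MiniLedger:
    """Append-only hash chain standing in for a decentralized ledger."""
    def __init__(self):
        self.blocks = []  # list of (content_hash, block_hash) pairs

    def register(self, image_bytes):
        prev = self.blocks[-1][1] if self.blocks else "0" * 64
        c_hash = content_hash(image_bytes)
        # Each block hash commits to the previous block's hash.
        block_hash = hashlib.sha256((prev + c_hash).encode()).hexdigest()
        self.blocks.append((c_hash, block_hash))

    def verify(self, image_bytes):
        """Check that this exact content was registered AND the chain is intact."""
        prev, found = "0" * 64, False
        for c_hash, block_hash in self.blocks:
            if hashlib.sha256((prev + c_hash).encode()).hexdigest() != block_hash:
                return False  # a past entry was tampered with
            found = found or (c_hash == content_hash(image_bytes))
            prev = block_hash
        return found

ledger = MiniLedger()
ledger.register(b"watermarked-image-v1")
assert ledger.verify(b"watermarked-image-v1")
assert not ledger.verify(b"manipulated-image")
```

In a deployed system, the watermark carries the identifier while the chain stores only hashes, so any user can verify authenticity without access to the original file.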
Watermarking is also gaining popularity as an authentication mechanism in both commercial and governmental applications. Embedding digital signatures into images and other documents can serve to confirm their authenticity and prevent unauthorized modifications [239]. Research in this area mainly focuses on increasing watermark resilience against compression, format conversions, and content edits that may occur during data transmission or archiving.

6.3.3. Watermarking in Monitoring and IoT Systems

The dynamic development of Internet of Things technologies and video surveillance systems has opened new perspectives for the application of watermarking, extending beyond traditional multimedia content protection. Modern surveillance systems, integral to smart cities and advanced security infrastructures, rely on the credibility of collected video data. In this context, watermarking algorithms play a crucial role in ensuring the integrity and authenticity of recordings, enabling rapid identification of visual materials and detection of manipulation attempts [240].
One of the most important applications of watermarking in surveillance systems is the verification of recording authenticity for criminal investigations and legal proceedings. Recordings containing an embedded watermark can serve as reliable evidence in court cases, confirming that the material has not been altered since its capture. When data is transmitted over networks or stored long-term, watermarks enable material tracking, allowing any attempt to replace or delete portions of the footage to be quickly detected.
Beyond traditional surveillance systems, watermarking is gaining significance in IoT ecosystems, which include a wide range of image-capturing devices such as drones, wearable cameras, and systems in autonomous vehicles [241]. In such cases, watermarking serves not only to identify the data source and control access but also to verify recording authenticity in real time.
One of the key challenges associated with the use of watermarking in monitoring systems is ensuring resistance to varying environmental conditions. Recordings can be susceptible to distortions caused by lighting changes, weather conditions, or recording angles.

7. Conclusions

The rapid development of deep learning technologies in recent years has significantly impacted the field of digital image watermarking, offering new possibilities for improving transparency, robustness, and embedding capacity of watermarking systems. This article provides a comprehensive review of both traditional watermarking methods and the latest deep learning-based approaches, covering architectures such as convolutional neural networks, generative adversarial networks, Vision Transformers, Swin Transformers, and diffusion models. The conducted analysis shows that, while deep learning-based methods offer significant advantages in terms of embedding effectiveness and attack resistance, they also present challenges related to high computational requirements, the need for large datasets, and architectural complexity.
The analysis indicates that the latest architectures, particularly transformers and diffusion models, offer a promising balance between transparency and robustness in watermarking systems. Compared to older neural network architectures, these models can effectively conceal watermarks without noticeably degrading image quality, while demonstrating high resilience against generative attacks, including those performed using advanced deep learning models. This is particularly important in the context of rapidly developing generative technologies, which pose an increasing threat to the integrity of digital content. These trends indicate that transformers and diffusion models will play an increasingly significant role in watermarking research in the coming years. However, it is important to emphasize that other architectures, such as CNNs and GANs, will not be entirely phased out. On the contrary, there is already a significant rise in the popularity of hybrid solutions that combine the strengths of different models—for example, systems integrating transformers with convolutional networks or transformers with diffusion models. Additionally, there is growing interest in supporting classical neural methods with quantum technologies, which may open new perspectives for enhancing performance and increasing the robustness of watermarking systems.
Overall, while deep learning-based watermarking methods have greatly expanded the possibilities for digital content protection, the challenges associated with their practical implementation and the dynamic evolution of generative technologies make further research in this area essential. Moreover, technological changes and an expanding range of applications clearly signal a shift away from an era in which watermarking was primarily associated with copyright protection. With the rapid development of new use cases—such as cybersecurity, Internet of Things (IoT), authentication systems, and multimedia forgery detection—watermarking is becoming crucial for ensuring data integrity, authenticity, and security. Thanks to these new possibilities, watermarking could evolve into a fundamental component of authorization systems and become one of the main tools for content origin identification and credibility verification in the digital ecosystem of the future.

Author Contributions

Conceptualization, M.B. and Z.P.; methodology, M.B.; validation, J.M.Ż. and Z.P.; formal analysis, M.B.; resources, M.B.; writing—original draft preparation, M.B.; writing—review and editing, Z.P. and J.M.Ż.; visualization, M.B.; supervision, Z.P. and J.M.Ż.; project administration, Z.P.; funding acquisition, Z.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Military University of Technology, Faculty of Electronics, grant number UGB 22-054/2025, titled: “New Neural Network Architectures for Signal and Data Processing in Radiocommunications and Multimedia”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created in this study. This survey analyzes and reviews publicly available datasets commonly used in image watermarking research, including ImageNet, COCO, CIFAR-10/100, Pascal VOC, BOSSBase, DIV2K, Flickr2K, LAION-5B, CIFAKE, ArtiFact, ImagiNet, NIH Chest X-ray, EuroSAT, and LFW. All datasets referenced in this article are openly accessible through their respective repositories as cited in the manuscript. Detailed dataset descriptions, resolutions, and usage contexts are provided in Table 3.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: Convolutional Neural Network
DCT: Discrete Cosine Transform
DFT: Discrete Fourier Transform
DWT: Discrete Wavelet Transform
GAN: Generative Adversarial Network
IoT: Internet of Things
PSNR: Peak Signal-to-Noise Ratio
SSIM: Structural Similarity Index
ViT: Vision Transformer
VIF: Visual Information Fidelity

References

  1. Spaiser, V. Chapter 19: Digital Data and Methods. In Sociology, Social Policy and Education; Elgar Publishing: Cheltenham, UK, 2021; ISBN 978-1-78990-685-1. [Google Scholar]
  2. Seetharamu, S.; Cn, L.M.; Bhattacharya, A.; Bt, D.C. Digital Data Protection Laws: A Review. Int. J. Sci. Res. Sci. Eng. Technol. 2024, 11, 64–75. [Google Scholar] [CrossRef]
  3. Czetwertyński, S. Digital Piracy: The Issue of Knowledge of the Institution of Copyright Law. Ekon. Prawo Econ. Law 2023, 22, 69–102. [Google Scholar] [CrossRef]
  4. Townsened, C. The Consequences of Digital Piracy. Available online: https://www.uscybersecurity.net/digital-piracy/ (accessed on 7 October 2024).
  5. Sadiku, M.N.O.; Ashaolu, T.J.; Ajayi-Majebi, A.; Musa, S.M. Digital Piracy. IJSCIA 2021, 2, 797–800. [Google Scholar] [CrossRef]
  6. Javed, A.R.; Jalil, Z.; Zehra, W.; Gadekallu, T.R.; Suh, D.Y.; Piran, M.J. A Comprehensive Survey on Digital Video Forensics: Taxonomy, Challenges, and Future Directions. Eng. Appl. Artif. Intell. 2021, 106, 104456. [Google Scholar] [CrossRef]
  7. Walczyna, T.; Piotrowski, Z. Quick Overview of Face Swap Deep Fakes. Appl. Sci. 2023, 13, 6711. [Google Scholar] [CrossRef]
  8. Petitcolas, F.; Anderson, R.; Kuhn, M. Information Hiding—A Survey. Proc. IEEE 1999, 87, 1062–1078. [Google Scholar] [CrossRef]
  9. Cheddad, A.; Condell, J.; Curran, K.; Mc Kevitt, P. Digital Image Steganography: Survey and Analysis of Current Methods. Signal Process. 2010, 90, 727–752. [Google Scholar] [CrossRef]
  10. Jekateryńczuk, G.; Jankowski, D.; Veyland, R.; Piotrowski, Z. Detecting Malicious Devices in IPSEC Traffic with IPv4 Steganography. Appl. Sci. 2024, 14, 3934. [Google Scholar] [CrossRef]
  11. Rocha, A.; Goldenstein, S. Steganography and Steganalysis in Digital Multimedia: Hype or Hallelujah? Rev. Informática Teórica Apl. 2008, 15, 83–110. [Google Scholar] [CrossRef]
  12. Zhang, Y. Digital Watermarking Technology: A Review. In Proceedings of the 2009 ETP International Conference on Future Computer and Communication, Wuhan, China, 6–7 June 2009; pp. 250–252. [Google Scholar]
  13. Bailey, J. Watermarking vs. Fingerprinting: A War in Terminology. Plagiarism Today 2007. Available online: https://www.plagiarismtoday.com/2007/10/09/watermarking-vs-fingerprinting-a-war-in-terminology/ (accessed on 6 November 2025).
  14. Hosny, K.M.; Magdi, A.; ElKomy, O.; Hamza, H.M. Digital Image Watermarking Using Deep Learning: A Survey. Comput. Sci. Rev. 2024, 53, 100662. [Google Scholar] [CrossRef]
  15. Hu, K.; Wang, M.; Ma, X.; Chen, J.; Wang, X.; Wang, X. Learning-Based Image Steganography and Watermarking: A Survey. Expert Syst. Appl. 2024, 249, 123715. [Google Scholar] [CrossRef]
  16. Piotrowski, Z. Drift Correction Modulation Scheme for Digital Signal Processing. Math. Comput. Model. 2013, 57, 2660–2670. [Google Scholar] [CrossRef]
  17. Piotrowski, Z. Angle Phase Drift Correction Method Effectiveness. In Proceedings of the Signal Processing Algorithms, Architectures, Arrangements, and Applications SPA 2009, Poznan, Poland, 24–26 September 2009; pp. 82–86. [Google Scholar]
  18. Piotrowski, Z.; Lenarczyk, P. Blind Image Counterwatermarking—Hidden Data Filter. Multimed. Tools Appl. 2017, 76, 10119–10131. [Google Scholar] [CrossRef]
  19. Hua, G.; Huang, J.; Shi, Y.Q.; Goh, J.; Thing, V.L.L. Twenty Years of Digital Audio Watermarking—A Comprehensive Review. Signal Process. 2016, 128, 222–242. [Google Scholar] [CrossRef]
  20. Uddin, M.S.; Ohidujjaman; Hasan, M.; Shimamura, T. Audio Watermarking: A Comprehensive Review. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1410–1418. [Google Scholar] [CrossRef]
  21. Shelke, R.D.; Nemade, M.U. Audio Watermarking Techniques for Copyright Protection: A Review. In Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC), Jalgaon, India, 22–24 December 2016; pp. 634–640. [Google Scholar]
  22. Al-khafaji, A.; Alwan; Nur, N.; Sjarif, N.N.A. Digital Text Watermarking Techniques Classification and Open Research Challenges: A Review. Technol. Rep. Kansai Univ. 2020, 62, 18. [Google Scholar]
  23. Kaur, M.; Mahajan, K. An Existential Review on Text Watermarking Techniques. Int. J. Comput. Appl. 2015, 120, 29–32. [Google Scholar] [CrossRef]
  24. Kamaruddin, N.S.; Kamsin, A.; Por, L.Y.; Rahman, H. A Review of Text Watermarking: Theory, Methods, and Applications. IEEE Access 2018, 6, 8011–8028. [Google Scholar] [CrossRef]
  25. Begum, M.; Uddin, M.S. Digital Image Watermarking Techniques: A Review. Information 2020, 11, 110. [Google Scholar] [CrossRef]
  26. Wadhera, S.; Kamra, D.; Rajpal, A.; Jain, A.; Jain, V. A Comprehensive Review on Digital Image Watermarking. In Proceedings of the 5th International Conference on Computing Sciences (ICCS 2021), Beijing, China, 4–6 December 2021. [Google Scholar]
  27. Sharma, S.; Zou, J.J.; Fang, G.; Shukla, P.; Cai, W. A Review of Image Watermarking for Identity Protection and Verification. Multimed. Tools Appl. 2024, 83, 31829–31891. [Google Scholar] [CrossRef]
  28. Joseph, I.; Mandala, J. Comprehensive Review on Video Watermarking Security Threats, Challenges, and Its Applications. ECS Trans. 2022, 107, 13833. Available online: https://iopscience.iop.org/article/10.1149/10701.13833ecst (accessed on 10 October 2024). [CrossRef]
  29. Paul, R.T. Review of Robust Video Watermarking Techniques. IJCA Spec. Issue Comput. Sci. 2011, 3, 90–95. [Google Scholar]
  30. Dey, A.; Bhattacharya, S.; Chaki, N. Software Watermarking: Progress and Challenges. INAE Lett. 2019, 4, 65–75. [Google Scholar] [CrossRef]
  31. Dalla Preda, M.; Pasqua, M. Software Watermarking: A Semantics-Based Approach. Electron. Notes Theor. Comput. Sci. 2017, 331, 71–85. [Google Scholar] [CrossRef]
  32. Li, Y. Database Watermarking: A Systematic View. In Handbook of Database Security: Applications and Trends; Gertz, M., Jajodia, S., Eds.; Springer: Boston, MA, USA, 2008; pp. 329–355. ISBN 978-0-387-48533-1. [Google Scholar]
  33. Alqassab, A.; Alanezi, M. Relational Database Watermarking Techniques: A Survey. J. Phys. Conf. Ser. 2021, 1818, 012185. [Google Scholar] [CrossRef]
  34. Boenisch, F. A Systematic Review on Model Watermarking for Neural Networks. Front. Big Data 2021, 4, 729663. [Google Scholar] [CrossRef] [PubMed]
  35. Li, Y.; Wang, H.; Barni, M. A Survey of Deep Neural Network Watermarking Techniques. Neurocomputing 2021, 461, 171–193. [Google Scholar] [CrossRef]
  36. Vasiljević, I.; Obradović, R.; Đurić, I.; Popkonstantinović, B.; Budak, I.; Kulić, L.; Milojević, Z. Copyright Protection of 3D Digitized Artistic Sculptures by Adding Unique Local Inconspicuous Errors by Sculptors. Appl. Sci. 2021, 11, 7481. [Google Scholar] [CrossRef]
  37. Huang, T.; Huang, J.; Pang, Y.; Yan, H. Smart Contract Watermarking Based on Code Obfuscation. Inf. Sci. 2023, 628, 439–448. [Google Scholar] [CrossRef]
  38. Li, L.; Jiang, B.; Wang, P.; Ren, K.; Yan, H.; Qiu, X. Watermarking LLMs with Weight Quantization. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023. [Google Scholar]
  39. Rathi, P.; Bhadauria, S.; Rathi, S. Watermarking of Deep Recurrent Neural Network Using Adversarial Examples to Protect Intellectual Property. Appl. Artif. Intell. 2022, 36, 2008613. [Google Scholar] [CrossRef]
  40. Lenarczyk, P.; Piotrowski, Z. Parallel Blind Digital Image Watermarking in Spatial and Frequency Domains. Telecommun. Syst. 2013, 54, 287–303. [Google Scholar] [CrossRef]
  41. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  42. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale Structural Similarity for Image Quality Assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; pp. 1398–1402. [Google Scholar]
  43. Recommendation ITU-R BT.500-13; Methodology for the Subjective Assessment of the Quality of Television Pictures. ITU-R: Geneva, Switzerland, 2023.
  44. Recommendation ITU-T P.910 (10/2023); Subjective Video Quality Assessment Methods for Multimedia Applications. ITU-T: Geneva, Switzerland, 2023. Available online: https://www.itu.int/epublications/publication/itu-t-p-910-2023-10-subjective-video-quality-assessment-methods-for-multimedia-applications (accessed on 19 November 2025).
  45. Vallez, C.; Kucharavy, A.; Dolamic, L. Needle In A Haystack, Fast: Benchmarking Image Perceptual Similarity Metrics At Scale. arXiv 2022, arXiv:2206.00282. [Google Scholar] [CrossRef]
  46. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  47. Ding, K.; Ma, K.; Wang, S.; Simoncelli, E.P. Image Quality Assessment: Unifying Structure and Texture Similarity. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2567–2581. [Google Scholar] [CrossRef] [PubMed]
  48. Prashnani, E.; Cai, H.; Mostofi, Y.; Sen, P. PieAPP: Perceptual Image-Error Assessment through Pairwise Preference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  49. Wu, J.; Zhou, M.; Zhu, C.; Liu, Y.; Harandi, M.; Li, L. Performance Evaluation of Adversarial Attacks: Discrepancies and Solutions. arXiv 2021, arXiv:2104.11103. [Google Scholar] [CrossRef]
  50. Buzhinsky, I.; Nerinovsky, A.; Tripakis, S. Metrics and Methods for Robustness Evaluation of Neural Networks with Generative Models. arXiv 2020, arXiv:2003.01993. [Google Scholar] [CrossRef]
  51. Rabhi, M.; Pietro, R.D. Adversarial Attacks Neutralization via Data Set Randomization. arXiv 2023, arXiv:2306.12161. [Google Scholar] [CrossRef]
  52. Fang, H.; Chen, K.; Qiu, Y.; Ma, Z.; Zhang, W.; Chang, E.-C. DERO: Diffusion-Model-Erasure Robust Watermarking. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 2973–2981. [Google Scholar]
  53. Guo, Y.; Li, R.; Hui, M.; Guo, H.; Zhang, C.; Cai, C.; Wan, L.; Wang, S. FreqMark: Invisible Image Watermarking via Frequency Based Optimization in Latent Space. arXiv 2024, arXiv:2410.20824. [Google Scholar] [CrossRef]
  54. Cox, I.J. Digital Watermarking and Steganography, 2nd ed.; The Morgan Kaufmann Series in Multimedia Information and Systems; Morgan Kaufmann Publishers: Amsterdam, The Netherlands; Boston, MA, USA, 2008; ISBN 978-0-12-372585-1. [Google Scholar]
  55. Turner, L.F. Digital Data Security System. Patent IPN WO 89/08915, 21 September 1989. [Google Scholar]
  56. van Schyndel, R.G.; Tirkel, A.Z.; Osborne, C.F. A Digital Watermark. In Proceedings of the 1st International Conference on Image Processing, Austin, TX, USA, 13–16 November 1994; Volume 2, pp. 86–90. [Google Scholar]
  57. Bamatraf, A.; Ibrahim, R.; Salleh, M. Digital Watermarking Algorithm Using LSB. In Proceedings of the 2010 International Conference on Computer Applications and Industrial Electronics, Kuala Lumpur, Malaysia, 5–8 December 2010; p. 159. [Google Scholar]
  58. Bamatraf, A.; Ibrahim, R.; Salleh, M.N.M. A New Digital Watermarking Algorithm Using Combination of Least Significant Bit (LSB) and Inverse Bit. arXiv 2011, arXiv:1111.6727. [Google Scholar] [CrossRef]
  59. Arya, R.K.; Saharan, R. Algorithm to Enhance the Robustness and Imperceptibility of LSB. In Proceedings of the 2015 Second International Conference on Advances in Computing and Communication Engineering, Dehradun, India, 1–2 May 2015; pp. 583–587. [Google Scholar]
  60. Muyco, S.D.; Hernandez, A.A. Least Significant Bit Hash Algorithm for Digital Image Watermarking Authentication. In Proceedings of the 2019 5th International Conference on Computing and Artificial Intelligence; Association for Computing Machinery: New York, NY, USA, 2019; pp. 150–154. [Google Scholar]
  61. Faheem, Z.B.; Ishaq, A.; Rustam, F.; de la Torre Díez, I.; Gavilanes, D.; Vergara, M.M.; Ashraf, I. Image Watermarking Using Least Significant Bit and Canny Edge Detection. Sensors 2023, 23, 1210. [Google Scholar] [CrossRef]
  62. Yasin, H.M.; Sallow, A.B.; Mahmood, R.Z. High-Speed FPGA-Based Video Watermarking Using LSB Technique in the Spatial Domain. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 644–653. [Google Scholar]
  63. Kadhim, I.J.; Premaratne, P.; Alwan, Z.A. Invisible Video Watermarking Based on Frame Selection and LSB Embedding. Int. J. Intell. Eng. Syst. 2024, 17, 41–53. [Google Scholar] [CrossRef]
  64. Gottimukkala, A.R.; Kumar, N.; Dash, J.K.; Swain, G. Image Watermarking Based on Remainder Value Differencing and Extended Hamming Code. J. Electron. Imaging 2023, 33, 011003. [Google Scholar] [CrossRef]
  65. Venugopala, P.S.; Sarojadevi, H.; Chiplunkar, N.N.; Bhat, V. Video Watermarking by Adjusting the Pixel Values and Using Scene Change Detection. In Proceedings of the 2014 Fifth International Conference on Signal and Image Processing, Bangalore, India, 8–10 January 2014; pp. 259–264. [Google Scholar]
  66. Yeo, I.-K.; Kim, H.J. Generalized Patchwork Algorithm for Image Watermarking. Multimed. Syst. 2003, 9, 261–265. [Google Scholar] [CrossRef]
  67. Purna Kumari, B.; Subramanyam Rallabandi, V.P. Modified Patchwork-Based Watermarking Scheme for Satellite Imagery. Signal Process. 2008, 88, 891–904. [Google Scholar] [CrossRef]
  68. Kang, H.; Yamaguchi, K.; Kurkoski, B.; Yamaguchi, K. Psychoacoustically-Adapted Patchwork Algorithm for Watermarking. In Proceedings of the Third International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2007), Kaohsiung, Taiwan, 26–28 November 2007; pp. 267–270. [Google Scholar]
  69. Ali, M. Robust Image Watermarking in Spatial Domain Utilizing Features Equivalent to SVD Transform. Appl. Sci. 2023, 13, 6105. [Google Scholar] [CrossRef]
  70. Wang, T. Digital Image Watermarking Using Dual-Scrambling and Singular Value Decomposition. In Proceedings of the 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), Guangzhou, China, 21–24 July 2017; Volume 1, pp. 724–727. [Google Scholar]
  71. Wang, H.; Hao, J.; Cui, F. Colour Image Watermarking Algorithm Based on the Arnold Transform. In Proceedings of the 2010 International Conference on Communications and Mobile Computing, Shenzhen, China, 12–14 April 2010; IEEE Computer Society: Washington, DC, USA, 2010; pp. 66–69. [Google Scholar]
  72. Li, X.; Wang, X.; Yang, W.; Wang, X. A Robust Video Watermarking Scheme to Scalable Recompression and Transcoding. In Proceedings of the 2016 6th International Conference on Electronics Information and Emergency Communication (ICEIEC), Beijing, China, 17–19 June 2016; pp. 257–260. [Google Scholar]
  73. Li, X.; Wang, X.; Chen, A.; Xiao, L. A Simplified and Robust DCT-Based Watermarking Algorithm. In Proceedings of the 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), Wuhan, China, 17–19 March 2017; pp. 167–171. [Google Scholar]
  74. Moosazadeh, M.; Ekbatanifard, G. An Improved Robust Image Watermarking Method Using DCT and YCoCg-R Color Space. Optik 2017, 140, 975–988. [Google Scholar] [CrossRef]
  75. Badran, E.F.; Ghobashy, A.; El-Shennawy, K. DCT-Based Digital Image Watermarking Via Image Segmentation Techniques. In Proceedings of the 2006 ITI 4th International Conference on Information & Communications Technology, Cairo, Egypt, 10–12 December 2006; p. 1. [Google Scholar]
  76. Alomoush, W.; Khashan, O.A.; Alrosan, A.; Attar, H.H.; Almomani, A.; Alhosban, F.; Makhadmeh, S.N. Digital Image Watermarking Using Discrete Cosine Transformation Based Linear Modulation. J. Cloud Comput. 2023, 12, 96. [Google Scholar] [CrossRef]
  77. Li, H.; Guo, X. Embedding and Extracting Digital Watermark Based on DCT Algorithm. J. Comput. Commun. 2018, 6, 287–298. [Google Scholar] [CrossRef]
  78. Xu, Z.J.; Wang, Z.Z.; Lu, Q. Research on Image Watermarking Algorithm Based on DCT. Procedia Environ. Sci. 2011, 10, 1129–1135. [Google Scholar] [CrossRef]
  79. Senthilkumaran, N.; Abinaya, S. Digital Image Watermarking Using DFT Algorithm. Adv. Comput. Int. J. 2016, 7, 9–17. [Google Scholar] [CrossRef]
  80. Cedillo-Hernandez, M.; Cedillo-Hernandez, A.; Garcia-Ugalde, F.J. Improving DFT-Based Image Watermarking Using Particle Swarm Optimization Algorithm. Mathematics 2021, 9, 1795. [Google Scholar] [CrossRef]
  81. Jimson, N.; Hemachandran, K. DFT Based Digital Image Watermarking: A Survey. Int. J. Adv. Res. Comput. Sci. 2018, 9, 540–544. [Google Scholar] [CrossRef]
  82. Pun, C. A Novel DFT-Based Digital Watermarking System for Images. In Proceedings of the 2006 8th International Conference on Signal Processing, Guilin, China, 16–20 November 2006; Volume 2. [Google Scholar]
  83. Malonia, M.; Agarwal, S.K. Digital Image Watermarking Using Discrete Wavelet Transform and Arithmetic Progression Technique. In Proceedings of the 2016 IEEE Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 5–6 March 2016; pp. 1–6. [Google Scholar]
  84. Begum, M.; Shorif, S.B.; Uddin, M.S.; Ferdush, J.; Jan, T.; Barros, A.; Whaiduzzaman, M. Image Watermarking Using Discrete Wavelet Transform and Singular Value Decomposition for Enhanced Imperceptibility and Robustness. Algorithms 2024, 17, 32. [Google Scholar] [CrossRef]
  85. Giri, K.J.; Quadri, S.M.K.; Bashir, R.; Bhat, J.I. DWT Based Color Image Watermarking: A Review. Multimed. Tools Appl. 2020, 79, 32881–32895. [Google Scholar] [CrossRef]
  86. Ramos, A.M.; Artiles, J.A.P.; Chaves, D.P.B.; Pimentel, C. A Fragile Image Watermarking Scheme in DWT Domain Using Chaotic Sequences and Error-Correcting Codes. Entropy 2023, 25, 508. [Google Scholar] [CrossRef]
  87. Al-Haj, A. Combined DWT-DCT Digital Image Watermarking. J. Comput. Sci. 2007, 3, 740–746. [Google Scholar] [CrossRef]
  88. Fazli, S.; Moeini, M. A Robust Image Watermarking Method Based on DWT, DCT, and SVD Using a New Technique for Correction of Main Geometric Attacks. Optik 2016, 127, 964–972. [Google Scholar] [CrossRef]
  89. Hamidi, M.; El Haziti, M.; Cherifi, H.; El Hassouni, M. A Hybrid Robust Image Watermarking Method Based on DWT-DCT and SIFT for Copyright Protection. J. Imaging 2021, 7, 218. [Google Scholar] [CrossRef]
  90. Amirgholipour Kasmani, S.; Naghsh-Nilchi, A. Robust Digital Image Watermarking Based on Joint DWT-DCT. JDCTA 2009, 3, 42–54. [Google Scholar] [CrossRef]
  91. Radhi, H.Y.; Yousif, S.F.; Mohamed, W.Q.; Mohammed, A.H. Efficient Image Watermark Scheme Based on DCT-DFT and Singular Value Decomposition Algorithm with Ikeda/Arnold Maps. J. Wuhan Univ. Technol. Mater. Sci. Ed. 2023, 47, 885–898. [Google Scholar]
  92. Li, L.; Zhang, H.-J.; Fan, H.-Y.; Lu, Z.-M. A DFT and IWT-DCT Based Image Watermarking Scheme for Industry. IEICE Trans. Inf. Syst. 2023, 106, 1916–1921. [Google Scholar] [CrossRef]
  93. Varghese, J.; Bin Hussain, O.; Subash, S.; Razak, A. An Effective Digital Image Watermarking Scheme Incorporating DCT, DFT and SVD Transformations. PeerJ Comput. Sci. 2023, 9, e1427. [Google Scholar] [CrossRef]
  94. Ansari, R.; Devanalamath, M.M.; Manikantan, K.; Ramachandran, S. Robust Digital Image Watermarking Algorithm in DWT-DFT-SVD Domain for Color Images. In Proceedings of the 2012 International Conference on Communication, Information & Computing Technology (ICCICT), Mumbai, India, 19–20 October 2012. Available online: https://ieeexplore.ieee.org/document/6398160 (accessed on 15 October 2024).
  95. Chen, A.; Wang, X. An Image Watermarking Scheme Based on DWT and DFT. In Proceedings of the 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), Wuhan, China, 17–19 March 2017; pp. 177–180. [Google Scholar]
  96. Fullea, E.; Martínez, J.M. Robust Digital Image Watermarking Using DWT, DFT and Quality Based Average. In Proceedings of the Ninth ACM International Conference on Multimedia, Ottawa, ON, Canada, 30 September–5 October 2001; pp. 489–491. [Google Scholar]
  97. Varghese, J.; Hussain, O.B.; Razak, T.A.; Subash, S. A Hybrid Digital Image Watermarking Scheme Incorporating DWT, DFT, DCT and SVD Transformations. J. Eng. Res. 2022, 10, 113–130. [Google Scholar] [CrossRef]
  98. Li, J.; Sui, A. A Digital Video Watermarking Algorithm Based on DCT Domain. In Proceedings of the 2012 Fifth International Joint Conference on Computational Sciences and Optimization, Harbin, China, 23–26 June 2012; pp. 557–560. [Google Scholar]
  99. Salih, M.; Ahmed, E.; Abdulkareem Al-Abaji, M. Digital Video Watermarking Methods Using DCT and SVD. J. Educ. Sci. 2019, 29, 266–278. [Google Scholar] [CrossRef]
  100. Al-Gindy, A.; Omar, A.A.-C.; Mashal, O.; Shaker, Y.; Alhogaraty, E.; Moussa, S. A New Watermarking Scheme for Digital Videos Using DCT. Open Comput. Sci. 2022, 12, 248–259. [Google Scholar] [CrossRef]
  101. Yang, C.-H.; Huang, H.-Y.; Hsu, W.-H. An Adaptive Video Watermarking Technique Based on DCT Domain. In Proceedings of the 2008 8th IEEE International Conference on Computer and Information Technology, Sydney, NSW, Australia, 8–11 July 2008; pp. 589–594. [Google Scholar]
  102. Hu, J.C.; Huang, H.Y.; Liang, A.H. Video Watermarking Algorithm Based on DCT and SVD. Appl. Mech. Mater. 2013, 321–324, 2688–2692. [Google Scholar] [CrossRef]
  103. Rajab, L.; Al-Khatib, T.; Al-Haj, A. A Blind DWT-SCHUR Based Digital Video Watermarking Technique. J. Softw. Eng. Appl. 2015, 8, 224–233. Available online: https://www.scirp.org/journal/paperinformation?paperid=55955 (accessed on 15 October 2024). [CrossRef]
  104. Sen, C.; Kashyap, T. Digital Video Watermarking Using DWT for Data Security. IJARCCE 2015, 307–309. [Google Scholar] [CrossRef]
  105. Chan, P.-W.; Lyu, M.R. A DWT-Based Digital Video Watermarking Scheme with Error Correcting Code. In Proceedings of the Information and Communications Security; Qing, S., Gollmann, D., Zhou, J., Eds.; Springer: Huhehaote, China, 2003; pp. 202–213. [Google Scholar]
  106. Patil, S.A. Digital Video Watermarking Using DWT and PCA. IOSR J. Eng. 2013, 3, 45–49. [Google Scholar] [CrossRef]
  107. Liu, Y.; Zhao, J. A New Video Watermarking Algorithm Based on 1D DFT and Radon Transform. Signal Process. 2010, 90, 626–639. [Google Scholar] [CrossRef]
  108. Sun, X.-C.; Lu, Z.-M.; Wang, Z.; Liu, Y.-L. A Geometrically Robust Multi-Bit Video Watermarking Algorithm Based on 2-D DFT. Multimed. Tools Appl. 2021, 80, 13491–13511. [Google Scholar] [CrossRef]
  109. Yang, X.; Zhang, Z.; Jiao, Y.; Li, Z. A Robust Video Watermarking Algorithm Based on Two-Dimensional Discrete Fourier Transform. Electronics 2023, 12, 3271. [Google Scholar] [CrossRef]
  110. Sang, J.; Liu, Q.; Song, C.-L. Robust Video Watermarking Using a Hybrid DCT-DWT Approach. J. Electron. Sci. Technol. 2020, 18, 100052. [Google Scholar] [CrossRef]
  111. Mawande, S.; Dakhore, H. Video Watermarking Using DWT-DCT-SVD Algorithms. In Proceedings of the 2017 International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 18–19 July 2017; pp. 1161–1164. [Google Scholar]
  112. Agarwal, A.; Bhadana, R.; Chavan, S. A Robust Video Watermarking Scheme Using DWT and DCT. Int. J. Comput. Sci. Inf. Technol. IJCSIT 2011, 2, 1711–1716. [Google Scholar]
  113. Palaiyappan, C.; Raja Jeya Sekhar, T. A Block Based Novel Digital Video Watermarking Scheme Using DCT. IOSR J. Electron. Commun. Eng. 2013, 5, 34–44. [Google Scholar] [CrossRef]
  114. Zuo, T.; Duan, Y.; Du, Q.; Tao, X. Semantic Security: A Digital Watermark Method for Image Semantic Preservation. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 4645–4649. [Google Scholar]
  115. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  116. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  117. Dean, J.; Corrado, G.; Monga, R.; Chen, K.; Devin, M.; Le, Q.V.; Mao, M.; Ranzato, M.; Senior, A.; Tucker, P.; et al. Large Scale Distributed Deep Networks. In Proceedings of the Advances in Neural Information Processing Systems, Red Hook, NY, USA, 3–6 December 2012; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25. [Google Scholar]
  118. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Red Hook, NY, USA, 3–6 December 2012; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25. [Google Scholar]
  119. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef]
  120. Baldi, P. Autoencoders, Unsupervised Learning, and Deep Architectures. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, Bellevue, WA, USA, 2 July 2011; JMLR Workshop and Conference Proceedings. pp. 37–49. [Google Scholar]
  121. Ayinde, B.O.; Zurada, J.M. Deep Learning of Constrained Autoencoders for Enhanced Understanding of Data. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 3969–3979. [Google Scholar] [CrossRef] [PubMed]
  122. Hosseini-Asl, E.; Zurada, J.M.; Nasraoui, O. Deep Learning of Part-Based Representation of Data Using Sparse Autoencoders with Nonnegativity Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2486–2498. [Google Scholar] [CrossRef]
  123. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
  124. Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
  125. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 6840–6851. [Google Scholar]
  126. Nichol, A.Q.; Dhariwal, P. Improved Denoising Diffusion Probabilistic Models. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8162–8171. [Google Scholar]
  127. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  128. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  129. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  130. Ben Jabra, S.; Ben Farah, M. Deep Learning-Based Watermarking Techniques Challenges: A Review of Current and Future Trends. Circuits Syst. Signal Process. 2024, 43, 4339–4368. [Google Scholar] [CrossRef]
  131. Wang, Z.; Byrnes, O.; Wang, H.; Sun, R.; Ma, C.; Chen, H.; Wu, Q.; Xue, M. Data Hiding with Deep Learning: A Survey Unifying Digital Watermarking and Steganography. IEEE Trans. Comput. Soc. Syst. 2023, 10, 2985–2999. [Google Scholar] [CrossRef]
  132. Zhong, X.; Das, A.; Alrasheedi, F.; Tanvir, A. A Brief, In-Depth Survey of Deep Learning-Based Image Watermarking. Appl. Sci. 2023, 13, 11852. [Google Scholar] [CrossRef]
  133. Kandi, H.; Mishra, D.; Gorthi, S.R.K.S. Exploring the Learning Capabilities of Convolutional Neural Networks for Robust Image Watermarking. Comput. Secur. 2017, 65, 247–268. [Google Scholar] [CrossRef]
  134. Mun, S.-M.; Nam, S.-H.; Jang, H.-U.; Kim, D.; Lee, H.-K. A Robust Blind Watermarking Using Convolutional Neural Network. Neurocomputing 2019, 337, 191–202. [Google Scholar] [CrossRef]
  135. Das, A.; Zhong, X. A Deep Learning-Based Audio-in-Image Watermarking Scheme. In Proceedings of the 2021 International Conference on Visual Communications and Image Processing (VCIP), Munich, Germany, 5–8 December 2021; pp. 1–5. [Google Scholar]
  136. Zhong, X.; Huang, P.-C.; Mastorakis, S.; Shih, F.Y. An Automated and Robust Image Watermarking Scheme Based on Deep Neural Networks. arXiv 2020, arXiv:2007.02460. [Google Scholar] [CrossRef]
  137. Tavakoli, A.; Honjani, Z.; Sajedi, H. Convolutional Neural Network-Based Image Watermarking Using Discrete Wavelet Transform. Int. J. Inf. Technol. 2023, 15, 2021–2029. [Google Scholar] [CrossRef]
  138. Ouyang, C.; Wei, Z. Deep Neural Network-Based Image Watermarking in Wavelet Transform Domain. In Proceedings of the 2023 International Conference on Artificial Intelligence and Automation Control (AIAC), Xiamen, China, 17–19 November 2023; pp. 164–167. [Google Scholar]
  139. Lu, J.; Ni, J.; Su, W.; Xie, H. Wavelet-Based CNN for Robust and High-Capacity Image Watermarking. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 18–22 July 2022; pp. 1–6. [Google Scholar]
  140. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015. [Google Scholar]
  141. Tancik, M.; Mildenhall, B.; Ng, R. StegaStamp: Invisible Hyperlinks in Physical Photographs. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2114–2123. [Google Scholar]
  142. Singh, H.K.; Singh, A.K. Digital Image Watermarking Using Deep Learning. Multimed. Tools Appl. 2024, 83, 2979–2994. [Google Scholar] [CrossRef]
  143. Dasgupta, A.; Zhong, X. Robust Image Watermarking Based on Cross-Attention and Invariant Domain Learning. In Proceedings of the 2023 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 13–15 December 2023. [Google Scholar]
  144. Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
  145. Zhu, J.; Kaplan, R.; Johnson, J.; Fei-Fei, L. HiDDeN: Hiding Data With Deep Networks. arXiv 2018, arXiv:1807.09937. [Google Scholar]
  146. Wen, B.; Aydore, S. ROMark: A Robust Watermarking System Using Adversarial Training. arXiv 2019, arXiv:1910.01221. [Google Scholar] [CrossRef]
  147. Hamamoto, I.; Kawamura, M. Neural Watermarking Method Including an Attack Simulator against Rotation and Compression Attacks. IEICE Trans. Inf. Syst. 2020, 103, 33–41. [Google Scholar] [CrossRef]
  148. Zhang, L.; Li, W.; Ye, H. A Blind Watermarking System Based on Deep Learning Model. In Proceedings of the 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Shenyang, China, 20–22 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1208–1213. [Google Scholar]
  149. Zhang, H.; Wang, H.; Cao, Y.; Shen, C.; Li, Y. Robust Data Hiding Using Inverse Gradient Attention. arXiv 2022, arXiv:2011.10850. [Google Scholar] [CrossRef]
  150. Hao, K.; Feng, G.; Zhang, X. Robust Image Watermarking Based on Generative Adversarial Network. China Commun. 2020, 17, 131–140. [Google Scholar] [CrossRef]
  151. Huang, J.; Luo, T.; Li, L.; Yang, G.; Xu, H.; Chang, C.-C. ARWGAN: Attention-Guided Robust Image Watermarking Model Based on GAN. IEEE Trans. Instrum. Meas. 2023, 72, 5018417. [Google Scholar] [CrossRef]
  152. Shedole, S.M.; Santhi, V. Hybrid Deep Learning Based Digital Image Watermarking Using GAN-LSTM and Adaptive Gannet Optimization Techniques. Multimed. Tools Appl. 2024, 84, 20661–20691. [Google Scholar] [CrossRef]
  153. Karki, B.; Tsai, C.-H.; Huang, P.-C.; Zhong, X. Deep Learning-Based Text-in-Image Watermarking. In Proceedings of the 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 7–9 August 2024. [Google Scholar]
  154. Aberna, P.; Agilandeeswari, L.; Aashich, B. Vision Transformer-Based Watermark Generation for Authentication and Tamper Detection Using Schur Decomposition and Hybrid Transforms. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 2023, 15, 107–121. [Google Scholar]
  155. Liu, Z.; Li, Z.; Zheng, L.; Li, D. Two-Stage Robust Lossless DWI Watermarking Based on Transformer Networks in the Wavelet Domain. Appl. Sci. 2023, 13, 6886. [Google Scholar] [CrossRef]
  156. Luo, T.; Wu, J.; He, Z.; Xu, H.; Jiang, G.; Chang, C.-C. WFormer: A Transformer-Based Soft Fusion Model for Robust Image Watermarking. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 4179–4196. [Google Scholar] [CrossRef]
  157. Wang, B.; Song, Z.; Wu, Y. Robust Blind Watermarking Framework for Hybrid Networks Combining CNN and Transformer. In Proceedings of the 15th Asian Conference on Machine Learning, Istanbul, Turkey, 27 February 2024; pp. 1417–1432. [Google Scholar]
  158. Chen, W.; Li, Y. RoWSFormer: A Robust Watermarking Framework with Swin Transformer for Enhanced Geometric Attack Resilience. arXiv 2024, arXiv:2409.14829. [Google Scholar] [CrossRef]
  159. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. arXiv 2021, arXiv:2103.14030. [Google Scholar] [CrossRef]
  160. Cui, Y.; Ren, J.; Xu, H.; He, P.; Liu, H.; Sun, L.; Xing, Y.; Tang, J. DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models. arXiv 2024, arXiv:2306.04642. [Google Scholar] [CrossRef]
  161. Wen, Y.; Kirchenbauer, J.; Geiping, J.; Goldstein, T. Tree-Ring Watermarks: Fingerprints for Diffusion Images That Are Invisible and Robust. arXiv 2023, arXiv:2305.20030. [Google Scholar] [CrossRef]
  162. Sha, Z.; Li, Z.; Yu, N.; Zhang, Y. DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models. arXiv 2023, arXiv:2210.06998. [Google Scholar]
  163. Zhang, L.; Liu, X.; Martin, A.V.; Bearfield, C.X.; Brun, Y.; Guan, H. Attack-Resilient Image Watermarking Using Stable Diffusion. arXiv 2024, arXiv:2401.04247. [Google Scholar]
  164. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  165. Tan, Y.; Peng, Y.; Fang, H.; Chen, B.; Xia, S.-T. WaterDiff: Perceptual Image Watermarks Via Diffusion Model. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 3250–3254. [Google Scholar]
  166. Hu, R.; Zhang, J.; Li, Y.; Li, J.; Guo, Q.; Qiu, H.; Zhang, T. SuperMark: Robust and Training-Free Image Watermarking via Diffusion-Based Super-Resolution. arXiv 2024, arXiv:2412.10049. [Google Scholar]
  167. Bui, T.; Agarwal, S.; Collomosse, J. TrustMark: Universal Watermarking for Arbitrary Resolution Images. arXiv 2023, arXiv:2311.18297. [Google Scholar] [CrossRef]
  168. Wang, Y.; Zhu, X.; Ye, G.; Zhang, S.; Wei, X. Achieving Resolution-Agnostic DNN-Based Image Watermarking: A Novel Perspective of Implicit Neural Representation. In Proceedings of the 32nd ACM International Conference on Multimedia, New York, NY, USA, 28 October–1 November 2024; pp. 10354–10362. [Google Scholar]
  169. Kutter, M.; Petitcolas, F.A.P. Fair Benchmark for Image Watermarking Systems. In Proceedings of the Security and Watermarking of Multimedia Contents; SPIE: Bellingham, WA, USA, 1999; Volume 3657, pp. 226–239. [Google Scholar]
  170. Duszejko, P.; Piotrowski, Z. 4KSecure: A Universal Method for Active Manipulation Detection in Images of Any Resolution. Appl. Sci. 2025, 15, 4469. [Google Scholar] [CrossRef]
  171. Numajiri, H.; Hayashi, T. Analysis on Open Data as a Foundation for Data-Driven Research. Scientometrics 2024, 129, 6315–6332. [Google Scholar] [CrossRef]
  172. Hadhoud, M.M.; Abd El-Samie, F.; El-Khamy, S.E. New Trends in High Resolution Image Processing. In Proceedings of the Fourth Workshop on Photonics and Its Application, Giza, Egypt, 4 May 2004; pp. 2–23. [Google Scholar]
  173. Satone, K.N.; Deshmukh, A.S.; Ulhe, P.B. A Review of Image Compression Techniques. In Proceedings of the 2017 International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 20–22 April 2017; Volume 1, pp. 97–101. [Google Scholar]
  174. Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Radford, A.; Chen, M.; Sutskever, I. Zero-Shot Text-to-Image Generation. arXiv 2021, arXiv:2102.12092. [Google Scholar]
  175. Dathathri, S.; See, A.; Ghaisas, S.; Huang, P.-S.; McAdam, R.; Welbl, J.; Bachani, V.; Kaskasoli, A.; Stanforth, R.; Matejovicova, T.; et al. Scalable Watermarking for Identifying Large Language Model Outputs. Nature 2024, 634, 818–823. [Google Scholar] [CrossRef]
  176. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  177. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  178. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
  179. Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  180. Bas, P.; Filler, T.; Pevný, T. “Break Our Steganographic System”: The Ins and Outs of Organizing BOSS. In Proceedings of the 13th International Conference on Information Hiding, Prague, Czech Republic, 18–20 May 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 59–70. [Google Scholar]
  181. Agustsson, E.; Timofte, R. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1122–1131. [Google Scholar]
  182. Liu, J.; Liu, D.; Yang, W.; Xia, S.; Zhang, X.; Dai, Y. A Comprehensive Benchmark for Single Image Compression Artifact Reduction. IEEE Trans. Image Process. 2020, 29, 7845–7860. [Google Scholar] [CrossRef]
  183. Zhang, K.; Li, D.; Luo, W.; Ren, W.; Stenger, B.; Liu, W.; Li, H.; Yang, M.-H. Benchmarking Ultra-High-Definition Image Super-Resolution. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE: Piscataway, NJ, USA; pp. 14749–14758. [Google Scholar]
  184. Schuhmann, C.; Beaumont, R.; Vencu, R.; Gordon, C.; Wightman, R.; Cherti, M.; Coombes, T.; Katta, A.; Mullis, C.; Wortsman, M.; et al. LAION-5B: An Open Large-Scale Dataset for Training next Generation Image-Text Models. arXiv 2022, arXiv:2210.08402. [Google Scholar]
  185. Bird, J.J.; Lotfi, A. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access 2024, 12, 15642–15650. [Google Scholar] [CrossRef]
  186. Rahman, M.A.; Paul, B.; Sarker, N.H.; Hakim, Z.I.A.; Fattah, S.A. ArtiFact: A Large-Scale Dataset with Artificial and Factual Images for Generalizable and Robust Synthetic Image Detection. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 8–11 October 2023; pp. 2200–2204. [Google Scholar]
  187. Boychev, D.; Cholakov, R. ImagiNet: A Multi-Content Benchmark for Synthetic Image Detection. arXiv 2024, arXiv:2407.20020. [Google Scholar]
  188. Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3462–3471. [Google Scholar]
  189. Helber, P.; Bischke, B.; Dengel, A.; Borth, D. EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2217–2226. [Google Scholar] [CrossRef]
  190. Knoche, M.; Hörmann, S.; Rigoll, G. Cross-Quality LFW: A Database for Analyzing Cross-Resolution Image Face Recognition in Unconstrained Environments. In Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India, 15–18 December 2021; pp. 1–5. [Google Scholar]
  191. Sood, A.; Mishra, D.; Surya, V.; Singh, H.; Sundaresan, R.; Pal, D.; Dharmaraju, R.; Satish, R.; Mishra, S.; Chavan, N.A.; et al. Challenges and Recommendations for Enhancing Digital Data Protection in Indian Medical Research and Healthcare Sector. Npj Digit. Med. 2025, 8, 48. [Google Scholar] [CrossRef] [PubMed]
  192. Barni, M.; Bartolini, F. Watermarking Systems Engineering: Enabling Digital Assets Security and Other Applications (Signal Processing and Communications); CRC: Boca Raton, FL, USA, 2004; ISBN 978-0-8247-4806-7. [Google Scholar]
  193. Pibre, L.; Jérôme, P.; Ienco, D.; Chaumont, M. Deep Learning Is a Good Steganalysis Tool When Embedding Key Is Reused for Different Images, Even If There Is a Cover Source-Mismatch. arXiv 2015, arXiv:1511.04855. [Google Scholar] [CrossRef]
  194. Yang, L.; Zhang, Z.; Song, Y.; Hong, S.; Xu, R.; Zhao, Y.; Zhang, W.; Cui, B.; Yang, M.-H. Diffusion Models: A Comprehensive Survey of Methods and Applications. ACM Comput. Surv. 2023, 56, 1–39. [Google Scholar] [CrossRef]
  195. Fuest, M.; Ma, P.; Gui, M.; Schusterbauer, J.; Hu, V.T.; Ommer, B. Diffusion Models and Representation Learning: A Survey. arXiv 2024, arXiv:2407.00783. [Google Scholar] [CrossRef]
  196. Krichen, M.; Abdalzaher, M.S. Performance Enhancement of Artificial Intelligence: A Survey. J. Netw. Comput. Appl. 2024, 232, 104034. [Google Scholar] [CrossRef]
  197. Zhou, Y.; Chen, S.; Wang, Y.; Huan, W. Review of Research on Lightweight Convolutional Neural Networks. In Proceedings of the 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 12–14 June 2020; pp. 1713–1720. [Google Scholar]
  198. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  199. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50× Fewer Parameters and <0.5 MB Model Size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  200. Cheng, H.; Zhang, M.; Shi, J.Q. A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations. arXiv 2024, arXiv:2308.06767. [Google Scholar] [CrossRef] [PubMed]
  201. Ma, X.; Lin, S.; Ye, S.; He, Z.; Zhang, L.; Yuan, G.; Tan, S.H.; Li, Z.; Fan, D.; Qian, X.; et al. Non-Structured DNN Weight Pruning—Is It Beneficial in Any Platform? IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 4930–4944. [Google Scholar] [CrossRef]
  202. Nagel, M.; Fournarakis, M.; Amjad, R.A.; Bondarenko, Y.; van Baalen, M.; Blankevoort, T. A White Paper on Neural Network Quantization. arXiv 2021, arXiv:2106.08295. [Google Scholar] [CrossRef]
  203. Chen, Z.; Tan, X.; Wang, K.; Pan, S.; Mandic, D.; He, L.; Zhao, S. InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training. arXiv 2022, arXiv:2202.03751. [Google Scholar] [CrossRef]
  204. Shih, A.; Belkhale, S.; Ermon, S.; Sadigh, D.; Anari, N. Parallel Sampling of Diffusion Models. arXiv 2023, arXiv:2305.16317. [Google Scholar] [CrossRef]
  205. Mahon, S.; Varrette, S.; Plugaru, V.; Pinel, F.; Bouvry, P. Performance Analysis of Distributed and Scalable Deep Learning. In Proceedings of the 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, 11–14 May 2020; pp. 760–766. [Google Scholar]
  206. Pham, M.; Li, H.; Yuan, Y.; Mou, C.; Ramachandran, K.; Xu, Z.; Tu, Y. Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs. In Proceedings of the 36th ACM International Conference on Supercomputing; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1–13. [Google Scholar]
  207. Sakthidevi, I.; Rajkumar, G.V.; Sunitha, R.; Sangeetha, A.; Krishnan, R.S.; Sundararajan, S. Machine Learning Orchestration in Cloud Environments: Automating the Training and Deployment of Distributed Machine Learning AI Model. In Proceedings of the 2023 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Kirtipur, Nepal, 11–13 October 2023; pp. 376–384. [Google Scholar]
  208. Jangda, A.; Pinckney, D.; Brun, Y.; Guha, A. Formal Foundations of Serverless Computing. Proc. ACM Program. Lang. 2019, 3, 149. [Google Scholar] [CrossRef]
  209. Liang, S.; Jin, S.; Chen, Y. A Review of Edge Computing Technology and Its Applications in Power Systems. Energies 2024, 17, 3230. [Google Scholar] [CrossRef]
  210. Andriulo, F.C.; Fiore, M.; Mongiello, M.; Traversa, E.; Zizzo, V. Edge Computing and Cloud Computing for Internet of Things: A Review. Informatics 2024, 11, 71. [Google Scholar] [CrossRef]
  211. Zeyu, H.; Geming, X.; Zhaohang, W.; Sen, Y. Survey on Edge Computing Security. In Proceedings of the 2020 International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), Virtual Event, 28–30 June 2020; pp. 96–105. [Google Scholar]
  212. Yan, Z.; Yi, Y.Z.; JiLin, Z.; NaiLiang, Z.; YongJian, R.; Jian, W.; Jun, Y. Federated Learning Model Training Method Based on Data Features Perception Aggregation. In Proceedings of the 2021 IEEE 94th Vehicular Technology Conference (VTC2021-Fall), Norman, OK, USA, 27–30 September 2021; pp. 1–7. [Google Scholar]
  213. Sáinz-Pardo Díaz, J.; López García, Á. Study of the Performance and Scalability of Federated Learning for Medical Imaging with Intermittent Clients. Neurocomputing 2023, 518, 142–154. [Google Scholar] [CrossRef]
  214. de Goede, M.; Cox, B.; Decouchant, J. Training Diffusion Models with Federated Learning. arXiv 2024, arXiv:2406.12575. [Google Scholar] [CrossRef]
  215. Qi, Y.; Feng, Y.; Wang, X.; Li, H.; Tian, J. Leveraging Federated Learning and Edge Computing for Recommendation Systems within Cloud Computing Networks. arXiv 2024, arXiv:2403.03165. [Google Scholar] [CrossRef]
  216. Oubadriss, A.; Laassiri, J.; Makrani, A.E. An Overview Comparison between Convolutional Neural Networks and Vision Transformers. In Proceedings of the 7th International Conference on Networking, Intelligent Systems and Security, Ado Ekiti, Nigeria, 26–28 November 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–9. [Google Scholar]
  217. Chen, M.; Mei, S.; Fan, J.; Wang, M. Opportunities and Challenges of Diffusion Models for Generative AI. Natl. Sci. Rev. 2024, 11, nwae348. [Google Scholar] [CrossRef] [PubMed]
  218. Jamil, S.; Jalil Piran, M.; Kwon, O.-J. A Comprehensive Survey of Transformers for Computer Vision. Drones 2023, 7, 287. [Google Scholar] [CrossRef]
  219. von Krause, M.; Radev, S.T.; Voss, A.; Quintus, M.; Egloff, B.; Wrzus, C. Stability and Change in Diffusion Model Parameters over Two Years. J. Intell. 2021, 9, 26. [Google Scholar] [CrossRef]
  220. Ravi, A.; Chaturvedi, V.; Shafique, M. ViT4Mal: Lightweight Vision Transformer for Malware Detection on Edge Devices. ACM Trans. Embed. Comput. Syst. 2023, 22, 117. [Google Scholar] [CrossRef]
  221. Nag, S.; Liberty, L.; Sivakumar, A.; Yadwadkar, N.J.; John, L.K. Lightweight Vision Transformers for Low Energy Edge Inference. In Proceedings of the Machine Learning for Computer Architecture and Systems 2024, Buenos Aires, Argentina, 30 June 2024. [Google Scholar]
  222. Setyawan, N.; Sun, C.-C.; Hsu, M.-H.; Kuo, W.-K.; Hsieh, J.-W. MicroViT: A Vision Transformer with Low Complexity Self Attention for Edge Device. arXiv 2025, arXiv:2502.05800. [Google Scholar]
  223. Tang, H.; Chen, Y.; Wang, T.; Zhou, Y.; Zhao, L.; Gao, Q.; Du, M.; Tan, T.; Zhang, X.; Tong, T. HTC-Net: A Hybrid CNN-Transformer Framework for Medical Image Segmentation. Biomed. Signal Process. Control 2024, 88, 105605. [Google Scholar] [CrossRef]
  224. Liu, X.; Liu, J.; Bai, Y.; Gu, J.; Chen, T.; Jia, X.; Cao, X. Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal. arXiv 2022, arXiv:2207.08178. [Google Scholar] [CrossRef]
  225. Bistroń, M.; Piotrowski, Z. Efficient Video Watermarking Algorithm Based on Convolutional Neural Networks with Entropy-Based Information Mapper. Entropy 2023, 25, 284. [Google Scholar] [CrossRef] [PubMed]
  226. Xu, T.; Mi, P.; Wang, R.; Chen, Y. Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon. arXiv 2024. [Google Scholar] [CrossRef]
  227. Li, C.; Zhang, C.; Xu, W.; Xie, J.; Feng, W.; Peng, B.; Xing, W. LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync. arXiv 2024, arXiv:2412.09262. [Google Scholar] [CrossRef]
  228. Fei, Z.; Fan, M.; Yu, C.; Li, D.; Zhang, Y.; Huang, J. Dimba: Transformer-Mamba Diffusion Models. arXiv 2024, arXiv:2406.01159. [Google Scholar] [CrossRef]
  229. Liu, W.; Zhang, S.Q. HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization. arXiv 2024, arXiv:2405.19751. [Google Scholar]
  230. De Falco, F.; Ceschini, A.; Sebastianelli, A.; Le Saux, B.; Panella, M. Quantum Hybrid Diffusion Models for Image Synthesis. KI–Künstl. Intell. 2024, 38, 311–326. [Google Scholar] [CrossRef]
  231. Uehira, K.; Suzuki, K.; Ikeda, H. Applications of Optoelectronic Watermarking Technique to New Business and Industry Systems Utilizing Flat-Panel Displays and Smart Devices. In Proceedings of the 2014 IEEE Industry Application Society Annual Meeting, Vancouver, BC, Canada, 5–9 October 2014; pp. 1–9. [Google Scholar]
  232. Juefei-Xu, F.; Wang, R.; Huang, Y.; Guo, Q.; Ma, L.; Liu, Y. Countering Malicious DeepFakes: Survey, Battleground, and Horizon. Int. J. Comput. Vis. 2022, 130, 1678–1734. [Google Scholar] [CrossRef]
  233. Hameleers, M.; van der Meer, T.G.L.A.; Dobber, T. Distorting the Truth versus Blatant Lies: The Effects of Different Degrees of Deception in Domestic and Foreign Political Deepfakes. Comput. Hum. Behav. 2024, 152, 108096. [Google Scholar] [CrossRef]
  234. Zhao, X.; Gunn, S.; Christ, M.; Fairoze, J.; Fabrega, A.; Carlini, N.; Garg, S.; Hong, S.; Nasr, M.; Tramer, F.; et al. SoK: Watermarking for AI-Generated Content. arXiv 2024, arXiv:2411.18479. [Google Scholar]
  235. SynthID. Available online: https://deepmind.google/models/synthid/ (accessed on 20 November 2025).
  236. Agbaje, M.; Awodele, O.; Ogbonna, C. Applications of Digital Watermarking to Cyber Security (Cyber Watermarking). In Proceedings of the Informing Science & IT Education Conference (InSITE), Tampa, FL, USA, 29 June–5 July 2015; pp. 1–11. [Google Scholar]
  237. Dinesh Arokia Raj, A.; Jha, R.R.; Yadav, M.; Sam, D.; Jayanthi, K. Role of Blockchain and Watermarking Toward Cybersecurity. In Multimedia Watermarking: Latest Developments and Trends; Kumar Sahu, A., Ed.; Springer Nature: Singapore, 2024; pp. 103–123. ISBN 978-981-99-9803-6. [Google Scholar]
  238. Blake, S. Embedded Blockchains: A Synthesis of Blockchains, Spread Spectrum Watermarking, Perceptual Hashing & Digital Signatures. arXiv 2020, arXiv:2009.00951. [Google Scholar]
  239. Singh, B.; Sharma, M.K. Efficient Watermarking Technique for Protection and Authentication of Document Images. Multimed. Tools Appl. 2022, 81, 22985–23005. [Google Scholar] [CrossRef]
  240. Kapre, B.S.; Rajurkar, A.M.; Guru, D.S. The Blind Robust Video Watermarking Scheme in Video Surveillance Context. Multimed. Tools Appl. 2024, 83, 38999–39025. [Google Scholar] [CrossRef]
  241. Wazirali, R.; Ahmad, R.; Al-Amayreh, A.; Al-Madi, M.; Khalifeh, A. Secure Watermarking Schemes and Their Approaches in the IoT Technology: An Overview. Electronics 2021, 10, 1744. [Google Scholar] [CrossRef]
Figure 1. Classification of data hiding techniques based on [8].
Figure 2. The watermark embedding and extraction scheme, with a secret key controlling the embedding and detection processes.
Figure 3. Representative applications of digital watermarking.
Figure 4. Synthesized taxonomy of digital watermarking methods based on prior literature.
Figure 5. Taxonomy of traditional watermarking algorithms for images and video frames. Spatial-domain methods include LSB [56,57,58,59,60,61,62,63], PVD [64,65], PA [66,67,68], SVD [69,70], and Arnold’s transform [71,72]. Frequency-domain methods include DCT [73,74,75,76,77,78,98,99,100,101,102], DFT [79,80,81,82,107,108,109], and DWT [83,84,85,86,103,104,105,106]. Hybrid approaches combining DFT–DWT–DCT [97,113], DFT–DWT [94,95,96], DFT–DCT [91,92,93], and DCT–DWT [87,88,89,90,91,92,110,111,112] are also shown. Semantic watermarking methods are presented in [114].
Figure 6. Convolutional Neural Network schema.
Figure 7. Autoencoder schema.
Figure 8. Generative Adversarial Networks schema.
Figure 9. Diffusion model schema.
Figure 10. Transformer schema.
Figure 11. Timeline of development of deep learning-based image watermarking methods.
Table 1. Types of attacks on watermarking systems.
| Group of Attacks | Type of Attacks | Description | Examples of Operations | Effect on the Watermark |
|---|---|---|---|---|
| Untargeted attacks | – | Attacks resulting from routine processing of the media, with no intention of removing the watermark, but which may affect its integrity. | Lossy compression (JPEG, MPEG), scaling, filtering. | Partial loss or distortion of the watermark. |
| Targeted attacks | General | Intentional manipulation of the media to remove, distort, or weaken the watermark. | Rotation, scaling, resolution change. | Total or partial loss of the watermark. |
| Targeted attacks | Statistical attacks | Modifications using statistical analysis of the media to identify and remove the watermark. | Histogram attack, frequency distribution analysis, autocorrelation attack. | Removal of the watermark without significant changes in the perception of the media. |
| Targeted attacks | Sensitivity attacks | Minimal modifications to the media that do not affect the visual quality but destroy the watermark. | Bit depth reduction, subtle pixel changes. | Distortion or complete loss of the watermark without visible changes in the media. |
| Targeted attacks | Destructive compression | Aggressive compression that removes the watermark by extreme reduction of the media data. | High-loss JPEG compression. | Significant data loss, complete destruction of the watermark, degradation of media quality. |
| Targeted attacks | Geometry attacks | Manipulations of the spatial structure of the media that distort the position of the watermark. | Rotation, translation, change of proportions. | Disturbance of the watermark position, loss of synchronization. |
| Deep learning-based attacks | Generative attacks | Use of generative models to regenerate media content and remove the embedded watermark. | Image inpainting, deepfake generation, AI-based restoration. | Complete removal of the watermark without perceptual changes. |
| Deep learning-based attacks | Adversarial attacks | Modifications generated by neural networks to fool detection systems and weaken watermark extraction. | Adversarial noise, gradient-based attacks (FGSM, PGD). | Degradation or undetectability of the watermark. |
| Deep learning-based attacks | Neural network removal | Use of DL models trained to detect and remove watermarks. | CNN-based watermark removal, encoder–decoder architectures. | High probability of watermark elimination with minimal distortion. |
| Deep learning-based attacks | Content replacement | Media content is regenerated using deep learning models to overwrite or bypass the watermark layer. | GAN-based texture replacement, style transfer techniques. | Loss or severe weakening of the watermark. |
| Deep learning-based attacks | Latent-space attacks | Attacks that exploit generative models (diffusion- or VAE-based) to regenerate content in the latent space, effectively removing embedded watermarks. | Stable Diffusion regeneration, DERO [52], VAE sampling attacks [53]. | Total removal of the watermark, especially in latent-domain watermarking schemes. |
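The gradient-based adversarial attacks listed in Table 1 (FGSM, PGD) share one core step: perturb each pixel by a small amount in the direction of the sign of the loss gradient. The sketch below illustrates a single FGSM step in NumPy against a toy linear watermark detector; the detector, its weights, and the image are hypothetical stand-ins, not any published scheme (a real attack would differentiate through a trained extractor network).

```python
import numpy as np

def fgsm_attack(image, grad, epsilon=0.03):
    """One FGSM step: perturb the image along the sign of the loss
    gradient, then clip back to the valid pixel range [0, 1]."""
    perturbed = image + epsilon * np.sign(grad)
    return np.clip(perturbed, 0.0, 1.0)

# Toy stand-in for a watermark detector: a linear score w . x.
# For this detector the gradient of the score w.r.t. the image is w itself.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))          # hypothetical detector weights
image = rng.uniform(0.3, 0.7, (8, 8))    # hypothetical watermarked image

score_before = float(np.sum(w * image))
# Attack: move the image in the direction that *decreases* the score.
adv = fgsm_attack(image, grad=-w, epsilon=0.03)
score_after = float(np.sum(w * adv))
```

Because the perturbation magnitude is bounded by epsilon per pixel, the attacked image stays visually close to the original while the detection score drops, which is exactly the "degradation or undetectability" effect the table describes.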
Table 2. Summary of watermarking algorithms based on deep learning.
| Year | Reference | Architecture and Technology | Watermark Capacity | Host Image Resolution | Watermark Robustness |
|---|---|---|---|---|---|
| 2017 | [133] | CNN as autoencoder | 64 × 64 pixels | 128 × 128 pixels | Noise, cropping, JPEG compression, rotation |
| 2017 | [134] | CNN with residual blocks | 4096 bits | 512 × 512 pixels | Affine transform, cropping, JPEG compression, filtering, rotation, rescaling |
| 2021 | [135] | CNN and LSTM for audio mapping | 8192 audio samples | 128 × 128 pixels | Not implemented |
| 2021 | [136] | CNN and fully connected Invariance Layer | 32 × 32 pixels/1024 bits | 128 × 128 pixels | Noise, cropping, blur, JPEG compression |
| 2023 | [137] | CNN with DWT as preprocessing | 256 bits | 256 × 256 pixels | Noise, dropout, JPEG compression |
| 2023 | [138] | CNN with DWT as preprocessing | 32 × 32 pixels/1024 bits | 512 × 512 pixels | Noise, sharpening, smoothing, dropout, JPEG compression |
| 2022 | [139] | CNN autoencoder with DWT and IDWT blocks | 50–700 bits | 400 × 400 pixels | Perspective warp, motion blur, noise, color manipulation, JPEG compression |
| 2020 | [141] | CNN autoencoder | 100 bits | 400 × 400 pixels | Perspective warp, camera misalignment, blur, color distortion, noise, JPEG compression |
| 2023 | [142] | CNN autoencoder and CNN denoising autoencoder | 32 × 32 pixels/1024 bits | 128 × 128 pixels | Noise, rotation, JPEG compression |
| 2023 | [143] | CNN with MHA in Invariant Domain | 8 × 8 pixels/64 bits | 128 × 128 pixels | Horizontal flip, blur, solarization, brightness adjustment, contrast variation, hue and saturation modulation |
| 2018 | [145] | GAN | 30 bits | 128 × 128 pixels (training), 512 × 512 pixels (testing) | JPEG compression, blur, cropping, dropout |
| 2019 | [146] | GAN with min-max optimization | 30 bits | 128 × 128 pixels | Cropping, cropout, dropout, blur, JPEG compression and combinations |
| 2020 | [147] | GAN | 64 bits | 512 × 512 pixels | Rotation, JPEG compression, noise, cropping, blur, brightness adjustment |
| 2021 | [148] | GAN | 64 bits | 256 × 256 pixels | JPEG compression, rotation, noise, blur, cropping, brightness and contrast adjustment, color inversion |
| 2020 | [149] | GAN and IGA | 256 bits | 256 × 256 pixels | Cropping, dropout, JPEG compression, resizing |
| 2020 | [150] | GAN and attention | 30 bits | 64 × 64 pixels | Cropping, cropout, blur, flip, JPEG compression |
| 2023 | [151] | Attention module, GAN, feature fusion | 30 bits | 128 × 128 pixels | Cropping, dropout, blur, JPEG compression, resizing |
| 2024 | [152] | GAN-LSTM, Adaptive Gannet Optimization | 256 × 256 to 1024 × 1024 pixels | 256 × 256 to 1024 × 1024 pixels | Noise, median filtering, blur, JPEG compression, cropping, rotation, scaling |
| 2024 | [153] | Transformer and ViT | 16-word segments | 224 × 224 pixels | JPEG compression, noise, rotation, cropping |
| 2023 | [154] | ViT | 128 bits | 256 × 256 pixels | Noise, median filtering, rotation, scaling |
| 2023 | [155] | Transformer with DWT preprocessing | Binary image 24 × 24 pixels | 96 × 96 pixels | Median and Gaussian filtering, noise, SPN, JPEG compression, rotation, cropping, scaling |
| 2023 | [156] | Transformer, GAN | 36 to 100 bits | 128 × 128 pixels | Noise, cropout, dropout, JPEG compression, affine transformation |
| 2023 | [157] | Swin Transformer, CNN, MA-FFM, Identity module | 64 bits | 128 × 128 pixels | Cropping, noise, dropout, Gaussian and median filtering, JPEG compression |
| 2024 | [158] | Swin Transformer with DCT attention block | 64 bits | 128 × 128 pixels | Cropout, dropout, rotation, scaling, affine transform |
| 2024 | [163] | Stable Diffusion | 32 bits | 64 × 64 × 4 (latent) | JPEG compression, rotation, noise, blur, generative attacks |
| 2024 | [165] | Diffusion Probabilistic Model | 16,384 bits | 128 × 128 pixels | JPEG compression, regeneration attacks |
| 2024 | [166] | Diffusion Probabilistic Model | 32 bits | 512 × 512 pixels | JPEG compression, blur, noise, cropping, brightness adjustment, adaptive attacks |
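Robustness results such as those summarized in Table 2 are commonly quantified by the bit error rate (BER) of the recovered payload after an attack and by the PSNR between the host and watermarked images. A minimal NumPy sketch of both metrics follows; the payload and images are hypothetical examples, not data from any cited method.

```python
import numpy as np

def bit_error_rate(sent_bits, recovered_bits):
    """Fraction of payload bits flipped by channel attacks."""
    sent = np.asarray(sent_bits)
    recovered = np.asarray(recovered_bits)
    return float(np.mean(sent != recovered))

def psnr(original, watermarked, peak=255.0):
    """Peak signal-to-noise ratio in dB between host and marked image."""
    diff = original.astype(np.float64) - watermarked.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

# Hypothetical example: a 64-bit payload with two flipped bits, and a
# marked image differing from the host by one grey level everywhere.
payload = np.zeros(64, dtype=int)
recovered = payload.copy()
recovered[[3, 17]] = 1
host = np.full((128, 128), 128, dtype=np.uint8)
marked = host + 1  # uniform 1-level embedding distortion

ber = bit_error_rate(payload, recovered)   # 2/64 = 0.03125
quality = psnr(host, marked)               # 10*log10(255^2/1), about 48.13 dB
```

A BER of 0 after a given attack means the payload survived it intact; PSNR above roughly 40 dB is usually treated as imperceptible embedding.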
Table 3. Overview of commonly used datasets in image watermarking research.
| Name | Category | Number of Images | Resolution [Pixels] | Type | Limitations/Notes | Reference |
|---|---|---|---|---|---|---|
| ImageNet | Benchmark | 14 million | ~224 × 224 to 256 × 256 (resized) | Animals, vehicles, plants, tools | Large scale, requires preprocessing, good for robustness tests. | [176] |
| COCO | Benchmark | 330,000 | 640 × 480 | People, vehicles, food, animals | Moderate resolution, suitable for general training. | [177] |
| CIFAR-10/CIFAR-100 | Benchmark | 60,000 each | 32 × 32 | Animals, vehicles | Very low resolution, not suitable for perceptual metrics, useful for capacity tests. | [178] |
| Pascal VOC | Benchmark | 21,000 | 500 × 375 | Animals, vehicles, people | Limited scale and resolution, used in simple robustness evaluations. | [179] |
| BOSSbase | Benchmark | 10,000 | 512 × 512 | Grayscale natural images | Designed for steganalysis, great for statistical robustness tests. | [180] |
| DIV2K | High resolution | 1000 | ~2K (e.g., 2040 × 1080) | Landscapes, buildings, architecture | High quality, ideal for transparency tests. | [181] |
| Flickr2K | High resolution | 2650 | ~2K (e.g., 2040 × 1350) | Natural photos: portraits, landscapes | Unprocessed, variable quality, useful for perceptual metrics. | [181] |
| LIU4K | High resolution | 2100 | 4K (3840 × 2160) | Varied backgrounds and objects | High resolution, good for visual quality and real-world simulations. | [182] |
| UHD4K | High resolution | 5000+ | 4K (3840 × 2160) | Satellite images, films, urban scenes | Very high resolution, good for high-end use cases. | [183] |
| UHD8K | High resolution | 2966 | 8K (7680 × 4320) | Satellite images, films, urban scenes | Extremely high resolution, useful for stress testing. | [183] |
| LAION-5B | Synthetic | 5.85 billion | from 256 × 256 to 4K | Images paired with text prompts (mixture of real and AI-generated) | Unprocessed and noisy, not ideal for reproducible benchmarking. | [184] |
| CIFAKE | Synthetic | 120,000 | 32 × 32 | Real images from CIFAR-10 and synthetic images | Low resolution, designed for deepfake detection benchmarks. | [185] |
| ArtiFact | Synthetic | 1.5 million | from 256 × 256 to 1024 × 1024 | People, animals, vehicles, artworks | Moderate resolution, good for testing synthetic distortions. | [186] |
| ImagiNet | Synthetic | 200,000 | from 256 × 256 to 2K | Photos, paintings | Well-balanced synthetic content, useful for hybrid real/synthetic training. | [187] |
| NIH Chest X-ray | Specialized (medical) | 112,000 | 1024 × 1024 | Chest X-rays | Suitable for medical robustness/embedding studies. | [188] |
| EuroSAT | Specialized (satellite images) | 27,000 | 64 × 64 | Satellite images: forests, urban areas, fields | Low resolution, useful for satellite-specific tests. | [189] |
| LFW (Labeled Faces in the Wild) | Specialized (faces) | 13,000 | 250 × 250 | Facial photos in natural conditions | Standard face dataset, useful for privacy, detection, and watermarking on identity data. | [190] |
Table 4. Comparison of key features of Vision Transformer, Swin Transformer, and CNN in the context of watermarking.
FeatureCNNViTSwin Transformer
Feature processing methodLocal (by convolutional filters)Global (by self-attention)Local and global (by shifted windows)
Resistance to traditional attacksMediumHighHigh
Resistance to generative attacksMediumHighVery high
Computational complexityLow to mediumHighMedium (optimized)
Ability to capture contextLimitedHighHigh (with local optimization)
Scalability to high resolutionLimitedLimited (without optimization)High
Potential in watermarkingWell verified but limitedHigh (based on previous research)Very high
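The scalability gap in the table between global self-attention (ViT) and shifted-window attention (Swin) can be made concrete by counting query–key score computations per attention layer. The sketch below is illustrative only; the 7 × 7 window and the 56 × 56 token grid (a 224 × 224 image with 4 × 4 patches) are the common defaults from the Swin/ViT literature, assumed here for the example:

```python
def attention_pairs(h, w, window=None):
    """Number of query-key score computations for one attention layer
    over an h x w token grid: global attention if window is None,
    otherwise within non-overlapping window x window local windows."""
    tokens = h * w
    if window is None:
        return tokens * tokens               # global: quadratic in token count
    per_window = (window * window) ** 2      # full attention inside one window
    n_windows = (h // window) * (w // window)
    return n_windows * per_window            # linear in token count

# 56 x 56 token grid, e.g. a 224 x 224 image split into 4 x 4 patches
global_cost = attention_pairs(56, 56)        # ViT-style global attention
swin_cost = attention_pairs(56, 56, 7)       # Swin-style 7 x 7 windows
print(global_cost // swin_cost)              # cost ratio: global is ~64x larger
```

This linear-versus-quadratic growth is why Swin scales to the 4K/8K resolutions in Table 3 while plain ViT requires tiling or other optimizations.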
Table 5. Comparison of diffusion models and generative adversarial networks (GANs) in the context of watermarking.

| Feature | GAN | Diffusion Model |
|---|---|---|
| Training stability | Low (frequent convergence problems) | High |
| Quality of generated images | High | Very high |
| Generation time | Relatively short | Longer (unless optimization methods are used) |
| Resistance to attacks | Medium | High |
| Embedding in latent space | Limited | Yes |
| Computational complexity | Medium | High |
| Potential in watermarking | Well verified but limited | High (based on previous research) |
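The "embedding in latent space" row can be illustrated with a toy spread-spectrum scheme: a key-derived ±1 pattern is added to a latent vector before decoding, and detection checks the correlation between the latent and the key. This is a deliberately simplified stand-in for latent-space watermarking in diffusion models, not the method of any specific system cited above:

```python
import random

def embed(latent, key_pattern, strength=0.05):
    """Additively embed a +/-1 key pattern into a latent vector
    (toy stand-in for perturbing a diffusion model's latent code)."""
    return [z + strength * k for z, k in zip(latent, key_pattern)]

def detect(latent, key_pattern):
    """Normalized correlation between a latent and the key pattern;
    a high value indicates the watermark is present."""
    return sum(z * k for z, k in zip(latent, key_pattern)) / len(latent)

random.seed(0)
dim = 1024
latent = [random.gauss(0, 1) for _ in range(dim)]          # unmarked latent
key = [random.choice((-1.0, 1.0)) for _ in range(dim)]     # secret key pattern

marked = embed(latent, key, strength=0.05)
print(detect(marked, key) > detect(latent, key))  # correlation rises after embedding
```

Because the key is orthogonal to typical latent statistics, the correlation shifts by exactly the embedding strength here; in a real diffusion pipeline the perturbed latent would additionally pass through the decoder and survive the reverse process, which is what makes latent-space schemes robust to pixel-level attacks.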
Share and Cite

Bistroń, M.; Żurada, J.M.; Piotrowski, Z. Deep Learning for Image Watermarking: A Comprehensive Review and Analysis of Techniques, Challenges, and Applications. Sensors 2026, 26, 444. https://doi.org/10.3390/s26020444