High-Quality Video Watermarking Based on Deep Neural Networks and Adjustable Subsquares Properties Algorithm

This paper presents a method of high-capacity and transparent watermarking based on the use of deep neural networks with the adjustable subsquares properties algorithm to encode watermark data in high-quality video using the H.265/HEVC (High-Efficiency Video Coding) codec. The aim of the article is to present a method of embedding a watermark in a video with HEVC codec compression by making changes to the video in a way that is not noticeable to the naked eye. The method presented here is characterised by its focus on preserving the fidelity of the watermarked image relative to the original, providing the transparency of the embedded watermark, while ensuring its survival after compression by the HEVC codec. The article includes a presentation of the practical results of watermark embedding with a built-in mechanism for varying its capacity and resistance, thanks to the adjustable subsquares properties algorithm. The obtained PSNR (peak signal-to-noise ratio) results are at the level of 40 dB or better. Complete recovery of a watermark from a single frame is possible for compression in the CRF (constant rate factor) range of up to 16, resulting in a BER (bit error rate) equal to 0 for the recovered watermark.


Introduction
Currently, video material uploaded to an Internet service can be very easily and quickly downloaded and then republished without the knowledge and consent of the author or the rights holder. Of course, there are many types and cases of copyright infringement; however, describing this phenomenon is not the purpose of this publication. In view of this phenomenon, the protection of intellectual property in the digital domain is an important aspect of today's social life. To secure the rights to a video, a visible watermark is applied to it for subsequent identification [1,2]. Despite the simplicity and effectiveness of this solution, it has the significant disadvantage of leaving a visible mark on the entire video, which may affect its reception by the viewer, and it also alerts a person potentially interested in the unauthorised downloading and publication of the video that the video is protected. Potential infringers will not be discouraged by a visible watermark, as methods for the removal of visible watermarks are available. Due to the above-mentioned drawbacks of protecting intellectual rights with a visible watermark, a watermark transparent to the human eye has been introduced, which leaves the reception by the target viewer undisturbed by redundant graphic information. Moreover, a potential copyright infringer may not even be aware that the video material is protected [3,4]. However, it should be acknowledged that there are specialised methods designed to detect transparently embedded watermarks [5,6]; moreover, there are methods for attempting to remove such watermarks [7,8].
With the development of the issue of embedding a transparent watermark, there has been a demand for the development of new ways of embedding a watermark in a video.
There are different methods for embedding a watermark in a video based on conventional methods, such as the discrete wavelet transform [9], the use of the least significant bit [10], and others mentioned later in this article; however, this article presents an approach that uses deep neural networks (DNNs) for this task. The methods outlined in the previous sentence are designed to embed a transparent watermark with the greatest possible capacity and resistance to distortion.
There are many papers in the literature dealing with video watermarking using artificial neural networks (ANNs) [11,12]. Among the works describing the process of embedding a watermark in a video, there are works dedicated specifically to a particular video codec, such as H.264/AVC (Advanced Video Coding) [13,14] and its successor H.265/HEVC (High-Efficiency Video Coding) [15,16].
Methods based on the use of ANNs, as well as classical methods, have their advantages and disadvantages in their applications, and they have the potential to be further improved. In this paper, the solution is based on the use of DNNs because neural networks are able to predict and find features of the problem under study that a creator of handwritten algorithms cannot predict; therefore, algorithms operating on deep networks are better suited to solving various, highly complex problems. The necessary condition for such a statement is the assumption that a properly designed neural network architecture is selected and that it is trained on an optimal training set.
The aim of this article is to introduce a robust watermarking method based on the use of DNNs with the adjustable subsquares properties encoding data algorithm dedicated to high-quality video compressed by HEVC codec. The main problem to overcome is assuring the survival of the watermark when the embedded frames are passed through the HEVC codec compression channel. It is a demanding task to embed a watermark and provide its resistance to compression channels in a way that does not degrade the visual quality of the video. As mentioned before, there are various approaches to solving this issue in the literature, and this paper presents a new approach based on a DNN autoencoder (with a compression channel between the encoder and the decoder during the learning process) and an original encoding watermark algorithm. This paper is structured as follows: At the beginning of the paper, a literature review is presented. Then, the idea of the proposed method will be presented, along with a discussion of the adjustable subsquares properties algorithm. The learning process will be presented next, and the structures of ANNs will be demonstrated. Next, the results of the presented method will be discussed, together with their analysis and comparisons with other methods. The last section concludes this publication.

Literature Review
There are many works in the literature on the issue of embedding a watermark in an image. To generalize, they can be divided into classical methods and those using ANNs.
In the literature, there are many publications on embedding a watermark in a static image using classical methods [24][25][26][27][28][29][30][31][32][33][34]. These methods use the whole image or some of its regions, constituting the areas of interest. The least-significant-bit method [9,32], as its name implies, uses the least significant bit of each channel value of a given colour palette, affecting the image only slightly, but it is prone to losing the watermark due to all kinds of conversions. The whole image is also influenced by methods that manipulate the frequency domain [18][19][20][21][22][23], which are characterised by better survival under various types of conversions than watermarks embedded via the least-significant-bit method. Other methods modify only the most convenient areas. An example of one such method is that which allows for the selection of the optimal colour channel to be modified, as presented in the work of Huynh-The et al. [27].
With the development of ANNs and the areas of their use, they began to also be used to embed watermarks in static images [35][36][37][38]. The introduction of the use of ANNs has created new opportunities for research in watermark embedding, and those are also explored in this paper.
Embedding a watermark in a video is a more sophisticated task than embedding one in a static image, due to the differing characteristics of the codecs used to compress video material. For the purposes of embedding a video watermark, methods used with static images [12,13], as well as those dedicated to specific video codecs, are applied, using their specific properties [15,16,43]. Depending on the assumptions made, it is possible to recover the watermark from a single video frame or from a fixed number of frames.
In this article, a method based on using DNNs as applied to the HEVC codec to embed watermarks in videos will be discussed.

Presentation of the Concept of the Proposed Method
The watermark embedding method proposed in this paper for H.265/HEVC (High-Efficiency Video Coding) encoded video is based on DNNs and an adjustable subsquares properties algorithm. The research was conducted on a Linux operating system using the TensorFlow version 2.7 machine learning library. The FFmpeg library was used to modify the video material, while the source codes were written in Python version 3.9.
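As an illustration of the toolchain described above, the HEVC compression channel can be driven from Python via FFmpeg. The sketch below shows a minimal way to re-encode a video with libx265 at a given CRF; the file names are placeholders, and this is not the authors' actual pipeline code.

```python
import subprocess

def hevc_args(input_path: str, output_path: str, crf: int = 7) -> list:
    # Build the FFmpeg argument list for H.265/HEVC encoding at a given CRF.
    return ["ffmpeg", "-y", "-i", input_path,
            "-c:v", "libx265", "-crf", str(crf),
            output_path]

def encode_hevc(input_path: str, output_path: str, crf: int = 7) -> None:
    # Run FFmpeg; requires an FFmpeg build with libx265 on PATH.
    subprocess.run(hevc_args(input_path, output_path, crf), check=True)
```

A lower CRF means less aggressive compression; CRF = 7, used later in the training process, preserves most of the chrominance detail the watermark depends on.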
The following is the concept underlying the method, which is based on an article by Shumeet Baluja [44]. The article deals with hiding a static image in another static image. The method is based on the use of three ANNs (preliminary networks which prepare the hidden image, encode it, and decode it) as components that form a system for hiding the image and recovering it, the process for which is presented in Figure 1.
The method presented hereafter employs a modified version of the general system scheme, incorporating an additional preliminary network at the decoder input, as well as encoder and decoder subsystems. The general conceptual scheme of the system components is shown in Figure 2; however, it is worth noting that this is a superficial representation of the system. The individual components presented in the diagram from the paper and in the diagram created from the actually conducted research are not the same, and they differ in their complexity. At the very beginning, the encoder subsystem prepares a hidden image, which is a watermark, in accordance with the adjustable subsquares properties algorithm presented in the foundational article. Then, the hidden image is fed into the input of the preliminary network. The pre-prepared hidden image, together with the original image, which is a carrier for the hidden image, is fed into the input of the encoding network. The encoder DNN substructure encodes the watermark in the carrier image by decomposing the secret image from the preliminary network into a set of features.
The next step is to carry out the deconvolution of the obtained set of secret image features with the carrier image (achieved via the substructure of the encoder so as to achieve an encoded image carrier with the watermark) which embeds the watermark optimally so as to preserve the image as accurately as possible, as compared to the original image, while ensuring the possibility of recovering the watermark. During the learning process, the neural network automatically selects (learns) the optimal filters (of the convolutional layers) by which the image modification can be carried out in order to achieve the assumed goal.
The encoding of the data bits in the image is performed by encoding the bit data with the adjustable subsquares properties algorithm and obtaining a bit-encoded image. In the next step, the DNN encoder performs the optimal image encoding. The entire carrier image is subject to permanent modification because the watermark is embedded in the entire image area that is serviced by the encoder.
After the image has passed through the compression channel of the HEVC codec, the decoding process begins. In the decoding process, the watermarked image, which was created as a result of the encoding process, is fed into the input of the preliminary network of the decoder, and then the prepared image is fed into the input of the decoding network. The DNN decoder is responsible for the retrieval of the watermark from the carrier image. In the event that a watermark was not embedded in accordance with the proposed method, the recovered potential watermark will clearly return nonsensical, random information, and it will not be clearly divided into the regions of interest characteristic of the adjustable subsquares properties algorithm, as shown in Figure 3. The recovered watermark obtained at the output of the decoding network is processed by the decoder subsystem, which, using the decoding algorithm, identifies the watermark by recognising the information encoded in the recovered image.

Adjustable Subsquares Properties Algorithm
To demonstrate the effect of the proposed method, the adjustable subsquares properties algorithm will be discussed first. The algorithm consists of dividing a square into a number of subsquares adequate to the specific size of the subsquare, chosen such that the sum of the sides of the subsquares in the square is equal to the side of the main square in which the subsquares are placed. In a variant of the method presented in this article, the size of the main square (hidden image) is assumed to be 128 × 128 pixels, and the size of the subsquare is assumed to be 32 × 32 pixels, making for a total of 16 subsquares. As digital images stored in popular colour palettes, such as RGB and YUV, have three channels of information, the main square also has three dimensions, such that the main square can be considered, in simple terms, to be a square image, with specific areas in it being smaller squares. Thus, by dividing the image in the above-described way, 16 areas of interest are obtained from an image with dimensions (128, 128, 3). At this point, it should be mentioned that the input and output of both the encoder and the decoder contain an image stored in the form of a YUV colour palette. Thus, the third dimension of the image, representing the individual channels gives three possible values to modify for each pixel; in the method under discussion, the modification of the two values of the U and V channels, representing chrominance, is adopted, while the Y channel, representing luminance, is not subject to modification.
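The division described above can be sketched directly in NumPy. This is an illustrative implementation under the stated assumptions (128 × 128 main square, 32 × 32 subsquares, three channels); the function name is hypothetical.

```python
import numpy as np

def split_into_subsquares(image: np.ndarray, sub: int = 32) -> list:
    """Split a square (H, H, C) image into sub x sub regions of interest.

    For a (128, 128, 3) image and sub=32 this yields 16 subsquares,
    each of shape (32, 32, 3), matching the variant described in the text.
    """
    h = image.shape[0]
    assert h % sub == 0, "subsquare side must divide the main square side"
    return [image[r:r + sub, c:c + sub]
            for r in range(0, h, sub)
            for c in range(0, h, sub)]

blocks = split_into_subsquares(np.zeros((128, 128, 3)))
```

Each of the 16 returned blocks is one area of interest in which a single watermark symbol is stored.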
The strength of the adjustable subsquare property manifests primarily in the adopted number system underlying the capacity of the system, together with the adopted size of the subsquare. Assuming that each channel is a representation of a digit in the adopted numeral system, it is possible to obtain as many as three digits in a given numeral system. For example, the maximum hidden value, starting from 0, for the binary numeral system is 7. For the senary numeral system, this value would be 215, and for the octal numeral system, this value would be 511. As mentioned previously, for the research presented in this paper, a two-channel variant was adopted; thus, referring to the examples in the previous sentence, the maximum values for each numeral system will change accordingly: For the binary numeral system, the maximum value will be 3; for the senary numeral system, 35; and for the octal numeral system, 63. For the purpose of presenting the method outlined in this paper, the senary numeral system has been adopted. A graphical depiction of the concept of the adjustable subsquares properties algorithm is shown in Figure 1, while the properties of the method are described by the following relationships:
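The capacity figures quoted above follow from a simple relation: with one base-B digit per channel and k modifiable channels, the maximum hidden value (starting from 0) is B^k − 1. The check below reproduces the values given in the text.

```python
def max_hidden_value(base: int, channels: int) -> int:
    """Largest value encodable with one base-`base` digit per channel."""
    return base ** channels - 1

# Three channels:
assert max_hidden_value(2, 3) == 7     # binary
assert max_hidden_value(6, 3) == 215   # senary
assert max_hidden_value(8, 3) == 511   # octal

# Two channels (U and V only), as adopted in this paper:
assert max_hidden_value(2, 2) == 3     # binary
assert max_hidden_value(6, 2) == 35    # senary
assert max_hidden_value(8, 2) == 63    # octal
```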

where:
- RGB_max is the maximum value in the RGB colour palette (255);
- L is the minimum value of the watermark embedding range in the value interval ⟨0; 255⟩;
- SteepValue_min is the minimum constant value added to the step calculating the value range for the digit;
- B is the maximum value of a digit of the used numeral system (senary);
- M is the maximum value of the range as a multiplier of the numeral system (senary) used;
- SteepValue is the constant value of the step calculating the value range for the digit;
- SteepValue_[0;255] is the constant value of the step calculating the range of values for the digit in the RGB value range;
- SteepValue_min[0;255] is the minimum value added to the step calculating the range of values for the digit in the RGB value range;
- DigitMultiplier is the multiplier for the digit specifying the lower- or upper-range value for the digit within the integer value range ⟨0; B · 2⟩;
- SteepValue_final is the total value of the step calculating the range of values for the digit.
The system's basic purpose is to encode 16 ASCII characters, drawn from the range of Arabic numerals (0-9) and the letters of the English alphabet (A-Z) (a total of 36 possible ASCII characters), using an image fragment of 128 × 128 pixels. Before the character concealment operation is performed, the image is converted from the RGB colour palette to the YUV colour palette and normalised from the integer range ⟨0; 255⟩ to the floating-point range ⟨0; 1⟩. ASCII characters are encoded on a 128 × 128 pixel image in 32 × 32 pixel blocks. The U and V colour channels are used to store the characters. The Y channel, which is the luminance channel, is trained to reproduce the original Y channel. The characters are encoded in the U and V channels in the senary numeral system: the U channel contains the more significant digit, while the V channel contains the less significant digit. The value of each digit is stored in the interval ⟨0; 1⟩ with the step value calculated according to the previously presented relations.
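The character-to-digit mapping can be sketched as follows. This is an illustrative assumption (a plain index into the 36-symbol alphabet converted to two senary digits); the function names are hypothetical, and the actual value ranges used per digit are those defined by the step relations above.

```python
import string

# 36-symbol alphabet: '0'-'9' followed by 'A'-'Z'
ALPHABET = string.digits + string.ascii_uppercase

def char_to_uv_digits(ch: str) -> tuple:
    """Map one character to its two senary digits: (U digit, V digit).

    U carries the more significant digit and V the less significant one,
    mirroring the channel assignment described in the text.
    """
    value = ALPHABET.index(ch)       # 0..35
    return value // 6, value % 6     # two base-6 digits

def uv_digits_to_char(u: int, v: int) -> str:
    # Inverse mapping used on the decoder side.
    return ALPHABET[u * 6 + v]
```

For example, 'Z' (value 35, the senary maximum for two channels) maps to the digit pair (5, 5).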
The value ranges on which the digits are encoded are shown in Table 1; the limiting value, translated into the RGB colour palette value system, is 182.5: it is the upper limit for digit 4, and values greater than this represent digit 5.
At this point, it is worth noting that, according to the system components presented, two ANNs are used in the encoding process. The first one prepares the image to be hidden in a form matching the input of the encoder, and the second one is the encoder, which receives at its input the appropriately transformed hidden image and the original image in which it is to be embedded. The result of the encoder's operation is an image which is supposed to be, for the human eye, indistinguishable from the original, but which carries the encoded information. After the hidden image has been embedded in the input image, the image, so prepared, can be fed into the input of the decoder. The decoder, just like the encoder, also has a network for processing the image. The decoder reads the hidden image from the carrier image and then converts it into text form using the decoder subsystem. The higher the resolution of the image, the more watermark repetitions can be read from a single frame: at 3840 × 2160 (UHD) resolution, a single frame yields 464 watermark repetitions measuring 128 × 128 pixels. All watermark repetitions are subjected to a median operation that best represents the value of the individual pixels for decoding.
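The median aggregation over watermark repetitions can be sketched as below. This is a minimal NumPy illustration (function name hypothetical): given N recovered copies of the 128 × 128 watermark, the per-pixel median suppresses per-copy compression noise before digit decoding.

```python
import numpy as np

def aggregate_repetitions(repetitions: np.ndarray) -> np.ndarray:
    """Per-pixel median over N recovered watermark repetitions.

    `repetitions` has shape (N, 128, 128, C); the median along axis 0
    yields the pixel values used for decoding, as described in the text.
    """
    return np.median(repetitions, axis=0)

# Three noisy repetitions of a constant-valued two-channel watermark:
reps = np.stack([np.full((128, 128, 2), v) for v in (0.48, 0.50, 0.52)])
aggregated = aggregate_repetitions(reps)
```

The median of the three example repetitions recovers the intended value of 0.50 at every pixel, even though no single copy carries it exactly.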
An example of the results of the described system for watermarking video material is shown in Figure 3. The system user can encode the watermark stored as text represented by ASCII characters from the range of Arabic numerals (0-9) and the letters of the English alphabet (A-Z), a total of 36 possible ASCII characters, or as a binary string stored as shown in Table 2. The total capacity for a 128 × 128 pixel encoder is 16 characters, or 80 bits. To avoid increasing the complexity of the system, and hence the knowledge required of the user, the characters whose encoding does not fit into 5 bits (V, W, X, Y, and Z) are not available when reading binary characters from a file, but they are available and work correctly when reading characters from a text file.
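The 5-bit restriction above can be sketched as follows. Since 5 bits address only 2^5 = 32 code points, the full 36-symbol alphabet does not fit, which is why V-Z are unavailable in binary mode. The plain index mapping below is an assumption for illustration; the actual coding table is given in Table 2.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # 36 symbols

# Dropping V, W, X, Y, Z (per the text) leaves 31 symbols,
# which fit within the 32 code points addressable by 5 bits.
BINARY_ALPHABET = ALPHABET[:31]

def bits_to_text(bits: str) -> str:
    """Decode a bit string into characters, 5 bits per character.

    An 80-bit string yields the 16 characters of one watermark.
    """
    assert len(bits) % 5 == 0
    return "".join(BINARY_ALPHABET[int(bits[i:i + 5], 2)]
                   for i in range(0, len(bits), 5))
```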

Learning Process
For the purpose of training the DNNs constituting the encoder and decoder, as previously shown in Figure 2, the following relationships were applied to adjust the objective function accordingly:

MSE_DecImg = MSE(Image_ToHide, Image_DecodedHidden)
MSE_EncVid = MSE(Image_Original, ImageFromVideo_Encoded)
L_EncFinalLoss(i) = L_EncLoss(i) + 0.56 · L_DecLoss(i)

For the implementation of the learning process, the Adam optimizer [45] was used, which is successfully applied to optimise multivariate objective function problems. The training set consisted of 4000 images randomly downloaded from the Web, which were resized, according to the image size supported by the DNN, to 128 × 128 pixels. After conversion from the RGB to the YUV colour palette, these images were then directly used as Image_Original in the training process, as well as for computing both MSE_EncImg and MSE_EncVid. For the purpose of embedding the watermark in the video, the watermarked images of the training set were also saved as video files in H.265 format, with the video compression ratio CRF = 7, and then read back as ImageFromVideo_Encoded for the purpose of computing MSE_EncVid. The hidden images Image_ToHide were randomly generated in such a way that the data storage scheme imposed by the adjustable subsquares properties algorithm was preserved; the numerical values of the digits shown in Table 1 were drawn, with the value of a digit taken as the median value from the corresponding range of values. The test set, in turn, was composed of individual frames of the video, encoded and then read back after HEVC compression, and used to compute MSE_DecVid.
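The objective above can be sketched numerically. This is a minimal NumPy illustration of the relations, under the assumption that both component losses are mean-squared errors; it is not the authors' TensorFlow training code.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    # Mean-squared error between two equally shaped arrays.
    return float(np.mean((a - b) ** 2))

def encoder_final_loss(enc_loss: float, dec_loss: float) -> float:
    """Encoder objective: its own loss plus the decoder loss weighted
    by 0.56, as in L_EncFinalLoss = L_EncLoss + 0.56 * L_DecLoss."""
    return enc_loss + 0.56 * dec_loss
```

The 0.56 weighting couples the two subsystems: the encoder is penalised not only for visible distortion of the carrier but also, in part, for making the watermark harder to decode.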
The neural network structure described in this work is based on the approach presented in the previously mentioned work of Baluja [44]. The autoencoder structure consists of an encoder, which is a network that embeds the hidden watermark in the image, and a decoder, which returns the recovered hidden watermark as its output. Similar to the structure from that work, where the input data is an RGB image, i.e., a three-channel tensor, in the presented method the input data is a YUV image, which is also a three-channel tensor. Tables 3-5 show the parameters of the DNN encoder and decoder structures. As shown in Figure 2, the proposed method includes encoding, decoding, and preliminary networks (which are identical in structure for the encoder and the decoder, but each of them has, of course, been trained for the needs of the encoder or decoder, respectively).
Table 3. Basic Layer (BL) processing model.

Layer Number  Layer Type
1             Convolution 2D layer
2             Batch Normalization
3             Convolution 2D layer
4             Batch Normalization
5             Convolution 2D layer
6             Batch Normalization

The preliminary networks for the encoder and decoder are identical in their structure, but they have different roles. For the encoder, the preliminary network processes the original image so that the encoding network can embed the watermark with better results, while, for the decoder, the preliminary network prepares the output image for the decoding network. The network structure includes the following layers: Convolution 2D, Batch Normalization, and Concatenate. The LeakyReLU activation function is used for the convolutional layers. At the input of the encoding network, a tensor combined from the image processed by the preliminary network and the watermark is applied, while the output of the network is expected to be the image with the embedded watermark. The decoding network receives at its input a pre-processed image in the form of a tensor from the preliminary network, which it further processes, returning the embedded watermark at its output. The encoding and decoding networks have similar structures, the difference being the number of filters used.
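The Basic Layer block from Table 3 can be sketched in Keras, which matches the TensorFlow toolchain named earlier. The filter count, kernel size, padding, and the exact placement of the LeakyReLU activation are assumptions for illustration; the paper's actual parameters are those in Tables 3-5.

```python
import tensorflow as tf
from tensorflow.keras import layers

def basic_layer_block(x, filters: int):
    """Basic Layer (BL) block: three Conv2D + BatchNormalization pairs.

    LeakyReLU is used on the convolutional layers, as stated in the text;
    kernel size 3 and 'same' padding are illustrative assumptions.
    """
    for _ in range(3):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.LeakyReLU()(x)
        x = layers.BatchNormalization()(x)
    return x

inputs = tf.keras.Input(shape=(128, 128, 3))
outputs = basic_layer_block(inputs, filters=32)
model = tf.keras.Model(inputs, outputs)
```

With 'same' padding, the block preserves the 128 × 128 spatial size, so encoder and decoder can stack such blocks and still operate on full-resolution carrier tensors.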
Memory and performance requirements for processing a single 128 × 128 pixel image are shown in Table 6.

Edge Effect
When embedding a watermark into an image larger than the input size of 128 × 128 pixels, an edge effect is noticeable, consisting of a distortion of a few pixels at the edges of the encoded portion. The distortion is visible when zoomed in appropriately, depending on the size of the image. An example of an edge effect is shown in Figure 4.

Results of the Research
This section presents the results of the video watermarking system when the video compression rate, represented by the CRF (constant rate factor), is changed (Tables 7 and 8; Figure 5) and when the video BF (brightness factor) is changed (Table 9; Figure 6). The impact of changing the video resolution is also discussed (Tables 10 and 11; Figure 7). For the purpose of testing the effectiveness of the presented method, an embedded watermark in the form of the string "QVERTS0123456789" was adopted. The test-video fragment used for the presentation of the method in this paper contained 30 frames of video footage of moving objects in the form of clouds visible in the image, and it was characterised by a gradual brightening, starting from a complete darkening of the image [46]. Example video frames are shown in Figure 5. According to the results shown in Table 7, there is an apparent trend of complete retrieval of the watermark from the individual video frames for the CRF in the range ⟨0 ; 16⟩. According to the results in the table, for CRF values greater than 17, the recovered mark is partially degraded as the compression ratio value increases. Analysing the results in Table 8, there is a trend in the similarity of the original image to the encoded image, measured by MSE to 5 decimal places.
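The CRF-dependent compression described above corresponds to re-encoding the test video with the HEVC codec at a chosen constant rate factor. The paper does not state the exact encoding pipeline used, so the following is only an assumed sketch based on ffmpeg's standard libx265 options; the function builds the command line without executing it.

```python
def hevc_command(src, dst, crf):
    """Build an ffmpeg command that re-encodes a video with the HEVC (H.265)
    codec at a given CRF; a lower CRF means higher quality. This invocation
    is an assumption based on ffmpeg's standard libx265 interface, not the
    paper's actual test setup."""
    if not 0 <= crf <= 51:
        raise ValueError("CRF for libx265 must be in 0..51")
    return ["ffmpeg", "-i", src, "-c:v", "libx265", "-crf", str(crf), dst]

# CRF 16 is the upper end of the range in which the watermark survives fully:
print(hevc_command("in.mp4", "out.mp4", 16))
```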
In contrast with the well-known fact in steganography that it is easier to hide information in a brightened image, this method has no need to do so, because embedding a watermark only slightly affects the original image, as shown in the results in the table. The effect of changing the brightness, both by brightening and darkening, is shown in Table 9. As the results of the brightness-change study show, the presented method is more resistant to image brightening than to image darkening. Additionally, it is worth noting that, for a darkened image, changing the brightness of the image does not degrade the watermark to the extent that it cannot be recovered. However, for very dark backgrounds which are almost black or black, the watermark is lost in these areas when the image brightness is changed. A preview of the brightness factor changes is shown in Figure 6, and a preview of the original frames is shown in Figure 5.
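The loss of the watermark in near-black areas under darkening can be illustrated with a simple brightness model. The multiplicative BF model and 8-bit quantization below are assumptions for illustration; the paper does not define the brightness factor precisely.

```python
def change_brightness_8bit(pixels, bf):
    """Scale 8-bit pixel values by a brightness factor BF and clip to 0..255.
    The multiplicative model is an assumption, not the paper's definition of
    BF. Darkening collapses near-black values toward 0, so differences that
    carried watermark information in almost-black areas are destroyed."""
    return [max(0, min(255, round(p * bf))) for p in pixels]

# Darkening (BF = 0.2): the near-black values 1..4 all quantize to 0 or 1,
# erasing whatever small differences encoded watermark bits there.
print(change_brightness_8bit([1, 2, 3, 4, 200], 0.2))  # [0, 0, 1, 1, 40]
```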
Changing the resolution negatively affects the embedded watermark. The presentations of the results of the resolution changes in Tables 10 and 11 show the results for changes from the UHD resolution of the images to another resolution. A preview of the decoding results after the resolution changes is shown in Figure 7.
The following characteristics (Figures 8-11) show the results obtained during the learning process. From the course of the characteristics, the trend towards correct watermark recovery is noticeable from the 330th learning epoch onwards, for which the BER is 0.
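The BER reported in these characteristics is the fraction of watermark bits that differ between the embedded and the recovered sequence. A minimal sketch, assuming an 8-bit-per-character encoding of the test string (the paper does not specify the bit encoding):

```python
def text_to_bits(text):
    """Encode a string as a flat bit list, 8 bits per character (an assumed
    encoding; the 16-character test string then yields 128 bits)."""
    return [int(b) for ch in text for b in format(ord(ch), "08b")]

def bit_error_rate(original_bits, recovered_bits):
    """BER = fraction of watermark bits that differ after recovery.
    A BER of 0 means the watermark was recovered perfectly."""
    if len(original_bits) != len(recovered_bits):
        raise ValueError("bit sequences must have equal length")
    errors = sum(a != b for a, b in zip(original_bits, recovered_bits))
    return errors / len(original_bits)

sent = text_to_bits("QVERTS0123456789")
print(bit_error_rate(sent, sent))  # 0.0
```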

Figure 11. The absolute accuracy of the hidden message (number of correctly recovered characters).

Comparison with Other Methods
In this subsection, the proposed method is compared with the selected state-of-the-art (SOTA) methods mentioned earlier in the literature review.
In order to make the comparison, the obtained MSE results were converted into the PSNR coefficient according to the following formula [47]:

PSNR = 10 · log10((2^M − 1)^2 / MSE),

where PSNR stands for the peak signal-to-noise ratio given in dB, M is the number of bits needed to define the range of values that a pixel can take, and MSE is the mean square error. It should be noted that the MSE given in the previous tables was calculated for the floating-point values at the encoder input and output, because neural networks usually take values in the range ⟨0 ; 1⟩ due to the normalization process. In the comparison with other methods, these values are compared with MSE values in the range ⟨0 ; 255⟩.
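The conversion above can be sketched directly. Rescaling an MSE measured on normalized ⟨0 ; 1⟩ values to the ⟨0 ; 255⟩ range is a multiplication by 255², after which the standard PSNR formula applies (the example MSE value below is illustrative, not a result from the paper):

```python
import math

def mse_unit_to_255(mse_unit):
    """Rescale an MSE computed on <0 ; 1> values to the <0 ; 255> range."""
    return mse_unit * 255.0 ** 2

def psnr_from_mse(mse, bits=8):
    """PSNR = 10 * log10((2**bits - 1)**2 / mse), with mse expressed in the
    <0 ; 255> pixel range for 8-bit images (bits = M in the formula)."""
    if mse <= 0:
        raise ValueError("PSNR is unbounded for MSE <= 0")
    return 10.0 * math.log10(((2 ** bits - 1) ** 2) / mse)

# A normalized MSE of 0.0001 corresponds to a PSNR of about 40 dB:
print(round(psnr_from_mse(mse_unit_to_255(0.0001)), 2))
```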
To ensure the comparability of the methods in the experiment, the selected compression value for HEVC is 16. The results of the method comparison are presented in Table 12. Each method uses a different way to embed a watermark. The first method [15] presents an intra-drift-free watermarking algorithm, which uses a multi-coefficient modification method. This method embeds the watermark into intra prediction residual pixels of 4 × 4 luminance transform blocks in the spatial domain.
In the second method [16], initially, a spatial texture analysis is performed based on the number of non-zero transform coefficients of embedding blocks. Then, suitable candidate blocks for watermark embedding are selected. In the next step, the grouping of intra prediction modes is performed. Each group is represented by two bits of watermark sequence. The embedding process is performed by altering the prediction modes of selected 4 × 4 intra prediction blocks to the representative mode of the group denoted by the watermark bit pair.
The third method [43] uses the BCH syndrome code technique. In this method, groups of the prediction directions are provided to limit the intra-frame distortion drift. The encoded data are embedded into the multi-coefficients of the selected 4 × 4 luminance discrete sine transform blocks as required by pre-defined groups.
A detailed description of the proposed method of high-capacity and transparent watermarking based on the usage of DNNs with the adjustable subsquares properties algorithm is presented in Section 3.
The average PSNR for the proposed method and the compared methods is greater than 40 dB, which provides an acceptable visual experience.
The time needed to compute the proposed method is much greater compared to that required by the other methods; this is due to the fact that the ANN structure is not optimized for real-time processing. The currently proposed ANN structure is a research structure and is only applicable for offline watermark embedding. The embedding time for the proposed method also includes the time taken to remove the edge effect described in Section 3.4, which doubles the watermarking time. An additional factor that increased the computation time was the need to encode 8 areas of 128 × 128 in order to embed a watermark on the entire image with a size of 416 × 240. It is worth noting that watermark repeats were embedded, while other watermark combinations could be embedded by creating sequences of watermarks to compose one larger watermark. In order to accelerate the computation, the structures of the ANNs proposed in this paper should be reduced.
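The count of 8 encoded areas follows from covering the 416 × 240 frame with 128 × 128 blocks. A minimal tiling sketch, under the assumption that edge blocks are shifted back so they stay fully inside the frame (the paper does not describe its exact tiling scheme):

```python
def tile_origins(width, height, tile=128):
    """Top-left corners of the tile-sized blocks needed to cover a frame.
    Blocks at the right/bottom edge are shifted back inside the frame; this
    overlap handling is an assumption, as the paper does not specify it."""
    xs = list(dict.fromkeys(min(x, width - tile) for x in range(0, width, tile)))
    ys = list(dict.fromkeys(min(y, height - tile) for y in range(0, height, tile)))
    return [(x, y) for y in ys for x in xs]

# A 416x240 frame needs 4 columns x 2 rows = 8 blocks, matching the paper.
print(len(tile_origins(416, 240)))  # 8
```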
The compared methods [15,16,43] have a watermark capacity of 100 bits on the test sets, with the smallest image size being 416 × 240, whereas the presented variant of the adjustable subsquares properties algorithm offers a capacity of 80 bits in a 128 × 128 image. By increasing the size of the image encoded with the proposed method to 224 × 224, which is close to the test size of the compared methods, the watermark capacity with the test variant of the adjustable subsquares properties algorithm increases to 245 bits. The proposed method therefore has a greater watermark capacity than the compared methods, and this capacity can be further increased by modifying the parameters of the adjustable subsquares properties algorithm.

Conclusions
The process of training DNNs for video watermark embedding is a computationally resource-intensive task. The main reason for this is the resolution at which the ANN operates during the training process. By increasing the resolution of the learning patterns, i.e., the resolution at which the network will operate natively, the hardware resource requirements increase significantly. Given this fact, it is reasonable to train ANNs (encoders and decoders) at lower resolutions and then, through an appropriate algorithm, transpose their operations to higher image resolutions. Such a solution facilitates the process of training and searching for the optimal structure of the neural network, but it also enables the encoding of arbitrary image resolutions. Additionally, such a solution allows for multiple encodings of the same watermark in a larger image (a single frame of video), thus facilitating the process of watermark recovery. Moreover, it is possible to create a scenario of a larger image that includes different watermarks which, when combined, form one larger watermark.
Encoding the watermark in the YUV colour palette seems to be characterised by a more permanent watermark deposition in the chrominance, i.e., in the U and V channels, than in the luminance Y channel. When designing a video watermarking system, it is worth bearing in mind the range of values in which the YUV and RGB colour palettes operate and, in particular, the fact that the YUV colour palette can also have negative values, while RGB has only positive values. This is not only important for the design of the watermark embedding system, but it is also crucial when using ANNs, forcing the selection of an appropriate activation function that takes negative values into account. The proper selection of the value range in which the watermark is embedded has a noticeable effect on the image in which it is hidden. When attempting to hide a white image in a black one (in the RGB colour palette, this would be hiding the 255 value in the 0 value), the network tends to learn to brighten the image so as to hide the watermark and keep the original image as true as possible, which results in an unwanted, visible brightening of the image. One possible solution to the above problem is to change the value range in which the watermark is embedded so as to reduce the size of the difference between the numerical values of each individual pixel of the original image and the watermark. Another approach is to attempt to apply an appropriate thresholding of the watermark energy, depending on the energy of the original image.
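The fact that YUV chrominance can be negative, which forces the choice of an activation function that handles negative values, is easy to see from a colour conversion. The paper does not state which colour matrix it uses; the BT.601 analog coefficients below are an assumption for illustration:

```python
def rgb_to_yuv(r, g, b):
    """Analog BT.601 RGB -> YUV conversion on <0 ; 1> inputs. Y stays in
    <0 ; 1>, while U and V can be negative. The coefficient set is an
    assumed example; the paper does not specify its conversion matrix."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# Pure red yields a negative U and a positive V:
y, u, v = rgb_to_yuv(1.0, 0.0, 0.0)
print(u < 0.0, v > 0.0)  # True True
```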
The method presented in this paper generates desirable results for high image quality (a low video-compression ratio), that is, for CRF values in the range ⟨0 ; 16⟩, and demonstrates a tolerance for changes in image brightness: less for a darkened image and more for a brightened one. Changing the encoded video to a different resolution results in a degradation of the watermark, greater for a reduced video resolution and smaller for an increased video resolution.
The issues of changing the resolution of a watermarked video and embedding a watermark for a higher video compression ratio require further research. The question of increasing the capacity of the watermark and reducing the structure of the ANNs to speed up the method also requires further exploration.
In order to reduce the structure of the neural network, the number of filters can be reduced. To make the watermark more resistant to image-resolution changes, it is worth introducing an additional component improving the quality of the recovered watermark, such as a super-resolution neural network. Increasing the capacity of the watermark can be achieved by appropriately selecting the parameters of the adjustable subsquares properties algorithm. Improving the codec compression tolerance for correct watermark embedding can be achieved by introducing an additional neural network sub-structure to show the learning neural network the results of higher HEVC codec compression and its effect on the hidden watermark.