Blind Audio Watermarking Based on Parametric Slant-Hadamard Transform and Hessenberg Decomposition

Digital watermarking has been widely utilized for ownership protection of multimedia content. This paper introduces a blind symmetric audio watermarking algorithm based on the parametric Slant-Hadamard transform (PSHT) and Hessenberg decomposition (HD). In the proposed algorithm, the watermark image is first preprocessed to enhance security. Then, the host signal is divided into non-overlapping frames, and the samples of each frame are reshaped into a square matrix. Next, PSHT is performed on each square matrix individually, a part of the resulting m × m transformed matrix is selected, and HD is applied to it. The Euclidean normalization (Euclidean norm) of the 1st column of the Hessenberg matrix is calculated and then used for embedding and extracting the watermark. Simulation results confirm the imperceptibility of the audio watermarked by the proposed method. Moreover, the proposed algorithm is shown to be highly robust against numerous attacks. Furthermore, a comparative analysis substantiates its superiority over other state-of-the-art methods.


Introduction
Nowadays, rapid improvements in the Internet make it easy to access many kinds of multimedia data. As a result, new challenges related to copyright protection and content tampering arise every day. Digital watermarking has been used effectively to tackle these challenges. It is the process of embedding secret information into digital content for authentication. The major applications of digital watermarking include data authentication, fingerprinting, copyright protection, ownership protection, and broadcast monitoring [1]. The primary requirements of watermarking methods are (i) imperceptibility, (ii) robustness, (iii) data payload, and (iv) security [2]. Imperceptibility refers to the indistinguishability of the host signal and the watermarked signal. Robustness is the ability of the watermark to survive numerous signal processing attacks. Data payload is the number of watermark bits embedded into the host signal. Security ensures that the watermark can be detected only by an authorized person. The main challenge in designing a watermarking algorithm is to maintain a good trade-off among these requirements. Digital watermarking methods can be classified according to different properties. On the basis of robustness, they are classified into robust and fragile (or semi-fragile) watermarking. Moreover, watermarking methods can be classified as blind, semi-blind, or non-blind: a blind method detects the watermark without the host signal, a non-blind method requires the host signal to extract the watermark, and a semi-blind method needs some information about the host signal to extract the watermark.
In this paper, we introduce a blind symmetric audio watermarking algorithm using the parametric Slant-Hadamard transform (PSHT), Hessenberg decomposition (HD), and Euclidean normalization, which provides a good trade-off among imperceptibility, robustness, and data payload.
The remainder of this paper is organized as follows. Section 2 reviews related research and briefly summarizes recent methods. Section 3 briefly describes background information on PSHT and HD. Section 4 introduces the proposed watermarking method, consisting of watermark preprocessing, embedding, and extraction. Section 5 presents the experimental results and compares the performance of the proposed method with recent methods in terms of imperceptibility and robustness. Finally, Section 6 concludes the paper.

Related Research
An extensive survey of audio watermarking techniques is given in [1,2]. According to the embedding domain, watermarking techniques are classified into time-domain and transform-domain techniques. Time-domain techniques embed a watermark by modifying the samples of the audio signal directly [3]; they are easy to implement and require few computational resources. Transform-domain techniques, on the other hand, modify the coefficients obtained by transforming either the whole audio signal or individual frames of it. Well-known transform-domain techniques include the discrete wavelet transform (DWT) [4], the discrete cosine transform (DCT) [5], and the fast Fourier transform (FFT) [6]. Pandey et al. [3] presented a method that uses the pseudo-random gray sequence property. However, its imperceptibility is not very high, and robustness results are provided for very few attacks. Kaur et al. [4] suggested a method based on a mathematical model that uses features such as energy, short-time energy, and zero-crossing mean, but its robustness against some attacks is quite low. Tsai et al. [5] proposed a watermarking method based on energy averaging; however, its data payload is not reported. In [6], a watermarking scheme was proposed based on the Lucas regular sequence (LRS) and FFT; however, this scheme is less robust against some common attacks. Dhar et al. [7] proposed a DCT-based algorithm using singular value decomposition (SVD) and exponential-log operations (ELO), in which the watermark is embedded into the DCT coefficients with the highest power, but robustness results against some common attacks were not reported. Karnjana et al. [8] introduced a method based on singular spectrum analysis (SSA) and a psychoacoustic model (PM), but it shows quite low robustness against some common attacks. The authors of [9] proposed a multifunctional algorithm based on chaotic scrambling.
However, the peak signal-to-noise ratio (PSNR) of this method is quite low. In [10], the authors proposed a method based on DCT, SVD, entropy, and the log-polar transformation (LPT). It shows good imperceptibility but poor robustness against some common signal processing attacks. Hwang et al. [11] introduced a watermarking method based on quantization index modulation (QIM) and SVD, but the imperceptibility and robustness of this scheme are somewhat low. In [12], a watermarking method based on flexible segmentation (FS) and adaptive embedding (AE) is proposed, but it provides a low SNR and low robustness against some common attacks. Hu et al. [13] suggested a dual-domain method using flexible segmentation and adaptive embedding, in which binary watermark bits are inserted into discrete wavelet packet transform coefficients; however, its imperceptibility is slightly poor. In [14], the authors introduced a watermarking algorithm using DWT and direct-sequence spread spectrum (DSSS); however, its robustness against some attacks is quite low. Irawati et al. [15] presented a method based on DCT and QR decomposition. The SNR of this method ranges from 11 dB to 27 dB, so its lower end falls well below the 20 dB requirement, and its bit error rate (BER) against some attacks is also quite high. Gupta et al. [16] suggested a watermarking method using the lifting wavelet transform (LWT) and adaptive quantization. Although this method is blind, its SNR and normalized correlation (NC) are poor. In [17], a watermarking scheme is proposed using audio characteristics and scrambling encryption. This scheme offers high security but low robustness against some attacks. In [18], a watermarking scheme is suggested using empirical mode decomposition (EMD), in which an intrinsic feature of the final residual is used to embed the watermark.
It shows good robustness, but the objective listening test was not performed and the data payload was not reported. Safitri et al. [19] presented a method using DWT, SVD, and BCH code, in which watermark bits are inserted using QIM. However, the PSNR of this method is somewhat low, and robustness against some attacks was not evaluated. A histogram-based audio watermarking method using the stationary wavelet transform (SWT) and synchronization is suggested in [20]. However, its data payload is quite low and its BER against some attacks is quite high. An audio watermarking method using phase shifting is introduced in [21]. However, its PSNR is not reported and its robustness against some attacks is quite low. From the above studies, we observe that some methods have low robustness, whereas others have low imperceptibility or a low data payload. To overcome these limitations, in this paper we propose a blind symmetric audio watermarking algorithm based on PSHT, HD, and Euclidean normalization. To the best of our knowledge, this is the first audio watermarking algorithm that uses PSHT, HD, and Euclidean normalization jointly. The main features of the proposed algorithm are as follows: (i) it applies PSHT, HD, and Euclidean normalization jointly; (ii) the logistic map is used to scramble the watermark, safeguarding it against unauthorized detection; (iii) it embeds the watermark into the largest value of the 1st column of the Hessenberg matrix using a new embedding equation; (iv) the watermark is extracted without the host signal; (v) it maintains a trade-off among imperceptibility, robustness, and data payload. Simulation results demonstrate that the proposed method is highly robust against numerous attacks. The BER of the proposed method varies from 0% to 6.54%, whereas the BERs of the recent methods [4][5][6][7][8][9][10][11][12] vary from 0% to 17.76%.
The PSNR of the proposed method varies from 43.81 dB to 47.75 dB, whereas the PSNRs of the recent methods vary from 19.39 dB to 44.81 dB. In other words, the proposed method outperforms state-of-the-art methods in terms of robustness and imperceptibility.

Parametric Slant-Hadamard Transform (PSHT)
The parametric Slant-Hadamard transform (PSHT) was introduced by Agaian and is mostly used in signal processing [22]. PSHT includes parameters whose values change its fidelity, robustness, and imperceptibility properties. Let f denote the original signal and F the transformed signal. Then, the two-dimensional PSHT can be described as:

$$F = S_{2^n}\, f\, S_{2^n}^{T},$$

where $S_{2^n}$ represents a $2^n \times 2^n$ parametric slant-Hadamard matrix with real elements. The inverse transform that recovers f from the transformed matrix F is given by:

$$f = S_{2^n}^{-1}\, F \left(S_{2^n}^{T}\right)^{-1} = S_{2^n}^{T}\, F\, S_{2^n},$$

where the second form holds because $S_{2^n}$ is orthonormal. The parametric slant-Hadamard matrix of order $2^n$ is obtained from the matrix of order $2^{n-1}$ with the help of the Kronecker product operator as:

$$S_{2^n} = \frac{1}{\sqrt{2}}\, Q_{2^n} \left(I_2 \otimes S_{2^{n-1}}\right), \qquad S_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$

where $I_2$ represents the identity matrix of order 2 and $Q_{2^n}$ denotes the recursion kernel matrix.
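The recursion kernel $Q_{2^n}$ can be described as follows:

$$Q_{2^n} = \begin{pmatrix}
1 & 0 & O & 1 & 0 & O \\
a_{2^n} & b_{2^n} & O & -a_{2^n} & b_{2^n} & O \\
O^{T} & O^{T} & I_{2^{n-1}-2} & O^{T} & O^{T} & I_{2^{n-1}-2} \\
0 & 1 & O & 0 & -1 & O \\
-b_{2^n} & a_{2^n} & O & b_{2^n} & a_{2^n} & O \\
O^{T} & O^{T} & I_{2^{n-1}-2} & O^{T} & O^{T} & -I_{2^{n-1}-2}
\end{pmatrix},$$

where O denotes the all-zero row vector of length $2^{n-1}-2$ (the blocks involving O and the identity matrices vanish for n = 2) and ⊗ denotes the Kronecker product [22]. The parameters $a_{2^n}$ and $b_{2^n}$ are defined as:

$$a_{2^n} = \sqrt{\frac{3 \cdot 2^{2n-2}}{4 \cdot 2^{2n-2} - \beta_{2^n}}}, \qquad b_{2^n} = \sqrt{\frac{2^{2n-2} - \beta_{2^n}}{4 \cdot 2^{2n-2} - \beta_{2^n}}}.$$

The PSHT can be categorized into four groups based on the value of the parameter β: (i) for $\beta_4 = \beta_8 = \cdots = \beta_{2^n} = 1$, it represents the classical slant transform; (ii) for $\beta_{2^n} = 2^{2n-2}$ and n > 1, it represents the Walsh-Hadamard transform; (iii) for $\beta_4 = \beta_8 = \cdots = \beta_{2^n} = \beta$ and $|\beta| \le 4$, it represents the constant beta slant transform; (iv) for $\beta_4 \ne \beta_8 \ne \cdots \ne \beta_{2^n}$, $-2^{2n-2} \le \beta_{2^n} \le 2^{2n-2}$, and n = 2, 3, 4, …, it represents the multiple beta slant transform.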
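The recursion can be sketched in Python with NumPy to see how the β parameters shape the matrix. This is a minimal, non-authoritative sketch that assumes the normalized base case $S_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}$ and a $1/\sqrt{2}$ factor at each recursion level; the function names and the `betas` dictionary (keyed by matrix order 4, 8, …, 2^n) are hypothetical. It builds the matrix for a 16 × 16 frame with the constant β = 2 and checks that the forward transform $F = S f S^T$ is inverted by $S^T F S$.

```python
import numpy as np


def psht_matrix(n, betas):
    """Build the 2^n x 2^n parametric slant-Hadamard matrix S_{2^n}.

    betas: hypothetical dict mapping each order N = 4, 8, ..., 2^n to beta_N.
    Assumes S_2 = [[1, 1], [1, -1]] / sqrt(2) and
    S_N = Q_N (I_2 kron S_{N/2}) / sqrt(2).
    """
    s = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    for k in range(2, n + 1):
        size, half = 2 ** k, 2 ** (k - 1)
        h2 = float(half * half)  # 2^(2k-2) at this recursion level
        beta = betas[size]
        a = np.sqrt(3.0 * h2 / (4.0 * h2 - beta))
        b = np.sqrt((h2 - beta) / (4.0 * h2 - beta))
        q = np.zeros((size, size))
        # the six row groups of the recursion kernel Q_N
        q[0, [0, half]] = [1.0, 1.0]
        q[1, [0, 1, half, half + 1]] = [a, b, -a, b]
        q[half, [1, half + 1]] = [1.0, -1.0]
        q[half + 1, [0, 1, half, half + 1]] = [-b, a, b, a]
        for j in range(2, half):  # identity blocks (empty when size == 4)
            q[j, [j, half + j]] = [1.0, 1.0]
            q[half + j, [j, half + j]] = [1.0, -1.0]
        s = q @ np.kron(np.eye(2), s) / np.sqrt(2.0)
    return s


def psht_forward(f, s):
    """Two-dimensional PSHT: F = S f S^T."""
    return s @ f @ s.T


def psht_inverse(big_f, s):
    """Inverse PSHT; S is orthonormal, so f = S^T F S."""
    return s.T @ big_f @ s


# Constant beta slant transform of order 16 (beta_4 = beta_8 = beta_16 = 2)
S16 = psht_matrix(4, {4: 2.0, 8: 2.0, 16: 2.0})
frame = np.random.default_rng(1).normal(size=(16, 16))
print(np.allclose(S16 @ S16.T, np.eye(16)))                             # True
print(np.allclose(psht_inverse(psht_forward(frame, S16), S16), frame))  # True
```

Orthonormality follows from $a_{2^n}^2 + b_{2^n}^2 = 1$ for any admissible β. As a sanity check against the categories above, setting β_4 = 1 reproduces the classical slant matrix (its second row is [3, 1, −1, −3]/(2√5)), and β_4 = 4 gives a Walsh-Hadamard-type matrix whose entries are all ±1/2.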

Hessenberg Decomposition (HD)
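The Hessenberg decomposition (HD) decomposes a general square matrix A into the following form:

$$A = P H P^{T},$$

where P denotes an orthogonal matrix and H is an upper Hessenberg matrix, i.e., a matrix whose entries below the first subdiagonal are zero ($h_{ij} = 0$ for $i > j + 1$) [23].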
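A minimal sketch of this factorization, A = P H P^T, using Householder reflections in Python with NumPy is shown below; the function name `hessenberg_decomposition` is hypothetical and this is not the authors' implementation. SciPy offers the same factorization as `scipy.linalg.hessenberg(a, calc_q=True)`, which returns H and Q with A = Q H Q^H; the hand-written version only avoids that dependency and makes P explicit.

```python
import numpy as np


def hessenberg_decomposition(a):
    """Return (P, H) with a = P @ H @ P.T, P orthogonal, H upper Hessenberg.

    Householder reduction: reflector k zeroes column k below the first
    subdiagonal and acts only on rows/columns k+1, ..., n-1.
    """
    h = np.array(a, dtype=float)
    n = h.shape[0]
    p = np.eye(n)
    for k in range(n - 2):
        alpha = np.linalg.norm(h[k + 1:, k])
        if alpha == 0.0:
            continue  # nothing to eliminate in this column
        v = h[k + 1:, k].copy()
        v[0] += np.copysign(alpha, v[0])  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # similarity transform H <- G H G with G = I - 2 v v^T on the trailing part
        h[k + 1:, :] -= 2.0 * np.outer(v, v @ h[k + 1:, :])
        h[:, k + 1:] -= 2.0 * np.outer(h[:, k + 1:] @ v, v)
        p[:, k + 1:] -= 2.0 * np.outer(p[:, k + 1:] @ v, v)  # accumulate P = G_1 G_2 ...
    return p, h


block = np.random.default_rng(0).normal(size=(8, 8))
P, H = hessenberg_decomposition(block)
print(np.allclose(P @ H @ P.T, block))   # True: A = P H P^T
print(np.allclose(np.tril(H, -2), 0.0))  # True: zero below the first subdiagonal
```

Because every reflector acts only on rows and columns 2 to n, the first column of P stays equal to e_1, and only the first two entries of the 1st column of H can be nonzero.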

Proposed Watermarking Algorithm
Let $Y = \{y(n), 1 \le n \le S\}$ be the host signal containing S samples, and let $W = \{w(k,l), 1 \le k \le M, 1 \le l \le M\}$ represent the binary watermark image, where $w(k,l) \in \{0,1\}$ is the pixel value at point (k,l) that will be embedded into the host audio.

Watermark Preprocessing
To enhance confidentiality, the watermark is first preprocessed. The proposed method encrypts the binary watermark image with a logistic map, whose chaotic behavior ensures the confidentiality of the proposed method. The map is iterated from an initial value $y(1) \in (0,1)$ and is governed by the real parameters a and b, which are set according to the map's initial condition. A binary sequence is then obtained by comparing the chaotic sequence with a predefined threshold T, which depends on the real parameters a and b: T increases as a and b increase, and decreases as they decrease.
The original binary watermark image W is converted into a one-dimensional sequence of M×M bits. Then, in the final stage of preprocessing, this sequence is encrypted by combining it bit by bit with the chaotic binary sequence through the exclusive-or (XOR) operation ⊕. After this encryption, the watermark cannot be recovered through random search. In this process, y(1), a, and b serve as the secret key K. The pseudo code of the watermark preprocessing is presented in Algorithm 1.
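The preprocessing can be sketched in plain Python. Because the exact two-parameter map and thresholding equation are not reproduced in this text, the sketch substitutes the standard logistic map y(k+1) = μ·y(k)·(1 − y(k)) with a single hypothetical parameter `mu`, and assumes the rule "bit = 1 if y(i) ≥ T, else 0"; the tuple (y(1), mu, T) plays the role of the secret key K. All function names are hypothetical.

```python
def logistic_sequence(y1, mu, length):
    """Iterate y(k+1) = mu * y(k) * (1 - y(k)), starting from y(1) = y1.

    Stand-in for the paper's map; chaotic for mu near 4 and y1 in (0, 1).
    """
    seq = [y1]
    for _ in range(length - 1):
        seq.append(mu * seq[-1] * (1.0 - seq[-1]))
    return seq


def chaotic_bits(key, length):
    """Threshold the chaotic sequence into bits (assumed rule: y >= T -> 1)."""
    y1, mu, threshold = key
    return [1 if y >= threshold else 0 for y in logistic_sequence(y1, mu, length)]


def xor_scramble(bits, key):
    """Encrypt (or decrypt) a flattened watermark; XOR is its own inverse."""
    return [b ^ z for b, z in zip(bits, chaotic_bits(key, len(bits)))]


# 32 x 32 binary watermark flattened row by row (an illustrative pattern)
watermark = [(r // 4 + c // 4) % 2 for r in range(32) for c in range(32)]
key = (0.37, 3.99, 0.5)
encrypted = xor_scramble(watermark, key)
print(xor_scramble(encrypted, key) == watermark)  # True
```

During extraction, the same chaotic bits are regenerated from K and the same XOR restores the watermark, which is why a single function serves for both directions.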

Watermark Embedding Process
The proposed watermark embedding procedure is shown in Figure 1 and is described as follows: 1. The host signal Y is first divided into M×M non-overlapping frames $F = \{F_1, F_2, \ldots, F_{M \times M}\}$, and each frame $F_i$ is reshaped into a two-dimensional matrix of size m×m, where i represents the frame number.
2. PSHT is applied to each matrix to obtain the transformed matrix. 3. Each transformed matrix is then subdivided into N non-overlapping blocks $B = \{B_1, \ldots, B_N\}$ of size n×n, and the sum of the absolute mean of each block is calculated using Equation (10). 4. The block with the maximum sum of absolute mean is selected. 5. The selected block is used for decomposition and, for simplicity, is denoted by $R_i$. HD is then performed on the selected n×n matrix $R_i$:

$$R_i = P_i H_i P_i^{T}, \qquad (11)$$

where $P_i$ denotes the orthogonal matrix and $H_i$ denotes the Hessenberg matrix. 6. The Euclidean normalization (Euclidean norm) of the 1st column of the Hessenberg matrix $H_i$ is calculated as:

$$\lVert H_i(:,1) \rVert_2 = \sqrt{\sum_{j=1}^{n} H_i(j,1)^{2}}. \qquad (12)$$

7. The watermark bit is embedded into the Euclidean normalization of the 1st column of $H_i$ according to the following rule: (i) when mod($x_i$, 2) = 0, Equation (13) is used; (ii) when mod($x_i$, 2) = 1, Equation (14) is used. 8. The largest coefficient of the 1st column of $H_i$ is then modified using Equation (15).

9. The modified largest coefficient is re-inserted into $H_i$ to obtain the modified Hessenberg matrix $H_i^{*}$, and inverse HD is applied to obtain the modified block:

$$R_i^{*} = P_i H_i^{*} P_i^{T}. \qquad (16)$$

10. The N non-overlapping blocks, including the modified block, are recombined, and inverse PSHT is applied to the result to obtain the modified m×m matrix. 11. Each watermarked frame is obtained by reshaping the corresponding modified matrix. 12. Finally, the watermarked signal is obtained by concatenating all the watermarked frames. The pseudo code of the watermark embedding procedure is presented in Algorithm 2.

Algorithm 2 (watermark embedding):
for each frame F_i, i = 1, …, M×M
    reshape F_i into an m×m matrix and apply PSHT
    for each of the N blocks of size n×n
        calculate the sum of absolute mean using Equation (10)
    end for
    select the block R_i with the maximum sum of absolute mean
    apply HD on R_i using Equation (11)
    calculate the Euclidean norm of the 1st column of H_i using Equation (12)
    calculate and update the embedded value using Equations (13) and (14)
    modify the largest Hessenberg coefficient using Equation (15)
    apply inverse HD to obtain R_i* using Equation (16)
    recombine the blocks and apply inverse PSHT
    reshape the result into the watermarked frame
end for
return watermarked audio Y*
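Equations (13) to (15) are not reproduced in this text, so the sketch below (Python with NumPy) illustrates the general idea with a hypothetical odd/even quantization of the Euclidean norm, in the spirit of the mod(x_i, 2) parity test in step 7. The norm of the 1st column of H is moved to the centre of the nearest quantization cell whose index parity equals the watermark bit, by changing only the largest-magnitude entry of that column. The step size `step` is set to 10, mirroring the value s = 10 reported in the experiments (its exact role is an assumption). The sketch works on a single selected n×n block R_i and omits framing, PSHT, and block selection; all names are hypothetical, and this is not the authors' exact rule.

```python
import math

import numpy as np


def hessenberg_decomposition(a):
    """Householder reduction: a = P @ H @ P.T with H upper Hessenberg."""
    h = np.array(a, dtype=float)
    n = h.shape[0]
    p = np.eye(n)
    for k in range(n - 2):
        alpha = np.linalg.norm(h[k + 1:, k])
        if alpha == 0.0:
            continue
        v = h[k + 1:, k].copy()
        v[0] += np.copysign(alpha, v[0])
        v /= np.linalg.norm(v)
        h[k + 1:, :] -= 2.0 * np.outer(v, v @ h[k + 1:, :])
        h[:, k + 1:] -= 2.0 * np.outer(h[:, k + 1:] @ v, v)
        p[:, k + 1:] -= 2.0 * np.outer(p[:, k + 1:] @ v, v)
    return p, h


def embed_bit(block, bit, step=10.0):
    """Embed one bit into an n x n block R_i (hypothetical parity quantisation)."""
    p, h = hessenberg_decomposition(block)
    col = h[:, 0]                           # 1st column of the Hessenberg matrix
    norm = float(np.linalg.norm(col))       # its Euclidean norm
    k = int(np.argmax(np.abs(col)))         # largest-magnitude coefficient
    rest = math.sqrt(max(norm ** 2 - col[k] ** 2, 0.0))
    q = int(norm // step)
    # cells with the right parity whose centre is reachable by changing col[k] alone
    cells = [c for c in range(max(q - 1, 0), q + 3)
             if c % 2 == bit and (c + 0.5) * step > rest]
    cell = min(cells, key=lambda c: abs((c + 0.5) * step - norm))
    target = (cell + 0.5) * step
    h_mod = h.copy()
    h_mod[k, 0] = math.copysign(math.sqrt(target ** 2 - rest ** 2), col[k])
    return p @ h_mod @ p.T                  # inverse HD: R_i* = P_i H_i* P_i^T


def extract_bit(block, step=10.0):
    """Blind extraction: parity of the quantised Euclidean norm."""
    _, h = hessenberg_decomposition(block)
    return int(np.linalg.norm(h[:, 0]) // step) % 2


rng = np.random.default_rng(0)
r_i = rng.normal(scale=20.0, size=(8, 8))
print([extract_bit(embed_bit(r_i, b)) for b in (0, 1)])  # [0, 1]
```

Extraction in this sketch needs only the (possibly attacked) block and `step`, which mirrors the blind property; a larger step would tolerate larger distortions of the norm at the cost of a larger change to the host.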

Watermark Extraction Process
The proposed watermark detection procedure is shown in Figure 2. The blind extraction of the watermark is described in the following steps: 1. The attacked watermarked audio $Y^{*}$ is first divided into M×M non-overlapping frames, and each frame is converted into a two-dimensional m×m matrix.
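2. PSHT is applied to each matrix. 3. Each transformed matrix is divided into N non-overlapping blocks of size n×n, and the block with the maximum sum of absolute mean is selected and denoted by $R_i^{*}$. 4. HD is then performed on $R_i^{*}$ to obtain the matrices $P_i^{*}$ and $H_i^{*}$, and the Euclidean norm of the 1st column of $H_i^{*}$ is calculated. 5. The quantities required by the extraction rule are then calculated from this norm. 6. The encrypted watermark sequence is extracted using the extraction rule. 7. Chaotic decryption is performed with the secret key K: the extracted sequence is XORed with the chaotic binary sequence regenerated from K, which yields the binary watermark sequence. 8. Finally, the watermark image $W^{*}$ is obtained by rearranging the binary sequence into a square matrix of size M×M.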
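The pseudo code of the watermark extraction procedure is presented in Algorithm 3.

Algorithm 3 (watermark extraction):
for each frame of Y*, i = 1, …, M×M
    reshape the frame into an m×m matrix and apply PSHT
    select the block R_i* with the maximum sum of absolute mean
    apply HD on R_i* and calculate the Euclidean norm of the 1st column of H_i*
    extract the encrypted watermark bit using Equation (17)
    calculate the decrypted watermark bit using Equation (18)
end for
reshape the decrypted sequence into the M×M watermark image W*
return watermark W*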

Experimental Results and Discussion
In this section, the performance of the proposed algorithm is evaluated and compared with some state-of-the-art methods. In this study, we used 20 audio files belonging to four audio groups as host audio signals: Group 1, five pop music files; Group 2, five classical music files; Group 3, five jazz music files; and Group 4, five rock music files. All audio files are 16-bit mono with a 44.1 kHz sampling rate, and each contains 262,144 samples (duration 5.94 s). The frame size is 256 samples for each audio file; therefore, each file yields 262,144/256 = 1024 frames. A binary watermark image of size 32×32 and the corresponding encrypted watermark image are shown in Figure 3. Since the watermark contains 32×32 = 1024 bits, one watermark bit is embedded in each frame. In this study, the constant beta slant transform is used with parameters $\beta_4 = \beta_8 = \beta_{16} = 2$. Moreover, the selected values of y(1), b, T, and s are 1, 1, 0.5, and 10, respectively. These parameters were chosen to obtain a good trade-off between imperceptibility and robustness. HD is applied to the matrix $R_i$ of size 8×8 to reduce the computational cost in both time and memory.
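The frame and block bookkeeping of this setup can be checked with a short NumPy sketch. The row-major reshaping order and the helper names (`frames_to_matrices`, `split_blocks`) are assumptions made for illustration, since the text does not specify how samples are arranged in each 16 × 16 matrix.

```python
import numpy as np

FS = 44_100          # sampling rate (Hz)
SAMPLES = 262_144    # samples per host file
FRAME = 256          # samples per frame
BLOCK = 8            # size of the block R_i passed to HD


def frames_to_matrices(signal, frame=FRAME):
    """Split a 1-D signal into non-overlapping frames, each reshaped row-major to m x m."""
    m = int(round(np.sqrt(frame)))
    if m * m != frame:
        raise ValueError("frame length must be a perfect square")
    count = len(signal) // frame
    return np.asarray(signal[: count * frame]).reshape(count, m, m)


def split_blocks(matrix, n=BLOCK):
    """Split an m x m matrix into (m / n)^2 non-overlapping n x n blocks."""
    m = matrix.shape[0]
    return [matrix[r:r + n, c:c + n] for r in range(0, m, n) for c in range(0, m, n)]


host = np.random.default_rng(2).uniform(-1.0, 1.0, SAMPLES)
mats = frames_to_matrices(host)
print(mats.shape)                              # (1024, 16, 16): one frame per watermark bit
print(len(split_blocks(mats[0])))              # 4 blocks of 8 x 8, i.e. N = 4
print(round(SAMPLES / FS, 2))                  # 5.94 (seconds)
print(np.array_equal(mats.reshape(-1), host))  # True: reshaping back restores the signal
```

With these settings each frame becomes a 16 × 16 matrix, which is why three β values (for orders 4, 8, and 16) are needed, and each transformed matrix contains N = 4 candidate 8 × 8 blocks.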

Imperceptibility Analysis
The imperceptibility of the proposed algorithm is assessed using both subjective and objective analyses.

Subjective Analysis
To ensure imperceptibility, the perceptual quality of the watermarked audio should be evaluated. In this study, 10 participants were blindly given both the original and watermarked signals and were asked to rate the difference between them using a subjective difference grade (SDG) ranging from 5.0 (imperceptible) to 1.0 (very annoying), as given in Table 1. The average result of subjective grading is presented in Table 2. The mean opinion score (MOS) of the proposed method lies between 4.9 and 5.0 for all watermarked audio files, which confirms the imperceptibility of the watermarked audio. Subjective evaluation was also conducted using the ABX method with 10 subjects. First, each subject listened to both the host signal (A) and the watermarked signal (B). Then, they were given an unknown signal (X) and were asked to identify whether X was A or B. Each subject conducted five trials. Table 2 presents the correct detection rates, which varied between 48% and 54%; since these rates are close to the 50% expected from random guessing, they indicate the high imperceptibility of the proposed method.
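To make the chance-level argument concrete: if the 10 × 5 = 50 ABX trials for an audio file are pooled and treated as independent (an assumption), a listener who cannot hear the watermark is correct with probability 0.5, so the number of correct answers follows a Binomial(50, 0.5) distribution. The hypothetical helper below computes the one-sided probability of scoring at least as well by pure guessing; 54% corresponds to 27 of 50 trials.

```python
from math import comb


def abx_guessing_p_value(correct, trials):
    """P(X >= correct) for X ~ Binomial(trials, 0.5): chance of scoring this well by guessing."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


p = abx_guessing_p_value(27, 50)  # 54% correct out of 50 pooled trials
print(p > 0.05)  # True: not distinguishable from guessing at the 5% level
```

Under this assumption, detection rates between 48% and 54% sit well inside the range expected from guessing.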

Objective Analysis
Objective assessment is generally performed by measuring the SNR of the watermarked audio. According to the International Federation of the Phonographic Industry (IFPI), the SNR of watermarked audio should be more than 20 dB to satisfy the imperceptibility requirement [7]. The SNRs of the proposed method for various audio files are given in Table 2. The SNRs of all audio files are greater than 40 dB, which satisfies this international standard.
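The SNR expression is not written out in this section; the sketch below assumes the common definition SNR = 10·log10(Σ y(n)² / Σ (y(n) − y*(n))²) in dB, where y is the host and y* the watermarked signal. The function name `snr_db` and the synthetic test signal are hypothetical.

```python
import numpy as np


def snr_db(host, watermarked):
    """SNR (dB) = 10 * log10(sum y^2 / sum (y - y_w)^2)."""
    host = np.asarray(host, dtype=float)
    noise = host - np.asarray(watermarked, dtype=float)
    return 10.0 * np.log10(np.sum(host ** 2) / np.sum(noise ** 2))


y = np.sin(2 * np.pi * 440.0 * np.arange(44_100) / 44_100.0)
y_w = y + 0.001 * np.random.default_rng(3).standard_normal(y.size)
print(snr_db(y, y_w) > 20.0)  # True: meets the IFPI threshold
```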
Moreover, objective assessment was also conducted using the objective difference grade (ODG), which is one of the outputs of the perceptual evaluation of audio quality (PEAQ) measurement based on the ITU-R BS.1387 standard (International Telecommunication Union, Radiocommunication Sector) [7]. The ODG score ranges from 0 (imperceptible) to −4 (very annoying), as given in Table 1. The objective quality of different audio files watermarked with the proposed method is evaluated in terms of ODG, and the results are shown in Table 2. All ODGs of the proposed algorithm range from −0.39 to −0.46, indicating that the original and watermarked audio files are perceptually similar. Table 3 shows a comparative analysis between the proposed method and several recent methods [4,12] in terms of SNR and MOS. This comparison shows that the proposed method achieves better SNR and MOS. In other words, the subjective and objective analyses show that the proposed method outperforms the other methods in terms of imperceptibility. Table 3. A comparative analysis between the proposed and various methods in terms of imperceptibility.

Robustness Analysis
The robustness of the proposed algorithm has been evaluated using (1) normalized correlation (NC) and (2) bit error rate (BER). NC measures the similarity between the original and extracted watermark images. It is calculated as follows:

$$NC(W, W^{*}) = \frac{\sum_{k=1}^{M}\sum_{l=1}^{M} w(k,l)\, w^{*}(k,l)}{\sqrt{\sum_{k=1}^{M}\sum_{l=1}^{M} w(k,l)^{2}}\;\sqrt{\sum_{k=1}^{M}\sum_{l=1}^{M} w^{*}(k,l)^{2}}},$$

where $W^{*} = \{w^{*}(k,l)\}$ is the extracted watermark. BER is the percentage of watermark bits that are extracted incorrectly. 7. Echo addition: An echo with a delay time of 150 ms and a decay rate of 35% was added to the watermarked signal. 8. Distortion: The watermarked audio signal was distorted within a range of 0 dB to −10 dB. 9. Amplification: The watermarked audio was amplified to 1.25 times its original amplitude. 10. Delay: A delay of 150 ms was applied, with the delayed signal at 3% of the original volume. 11. Invert: The polarity of the watermarked audio signal was fully inverted. 12. Low-pass filter: A low-pass filter with a cut-off frequency of 15,000 Hz was applied to the watermarked audio. Tables 4 and 5 show the robustness results of the proposed algorithm in terms of NC and BER for various attacked watermarked audio signals. The proposed method recovers the watermark perfectly from the attacked watermarked audio for noise reduction, invert, and echo addition, as the NC values are 1 and the BER values are 0.
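NC and BER can be computed as in the sketch below (Python with NumPy). It assumes binary watermarks stored as 0/1 arrays, the NC form Σ w·w* / √(Σ w² · Σ w*²), and BER expressed as the percentage of mismatched bits; the function names and the test pattern are hypothetical.

```python
import numpy as np


def normalized_correlation(w, w_ext):
    """NC between original and extracted binary watermarks (1.0 means identical)."""
    w = np.asarray(w, dtype=float).ravel()
    w_ext = np.asarray(w_ext, dtype=float).ravel()
    denom = np.sqrt(np.sum(w ** 2) * np.sum(w_ext ** 2))
    return float(np.sum(w * w_ext) / denom) if denom else 0.0


def bit_error_rate(w, w_ext):
    """BER in percent: share of watermark bits that were extracted incorrectly."""
    w = np.asarray(w).ravel()
    w_ext = np.asarray(w_ext).ravel()
    return 100.0 * np.count_nonzero(w != w_ext) / w.size


wm = (np.indices((32, 32)).sum(axis=0) // 4) % 2   # illustrative 32 x 32 pattern
noisy = wm.ravel().copy()
noisy[:10] ^= 1                                     # flip 10 of the 1024 bits
print(normalized_correlation(wm, wm), bit_error_rate(wm, wm))  # 1.0 0.0
print(bit_error_rate(wm, noisy))                    # 0.9765625
```

A single wrong bit out of 1024 therefore costs about 0.1% BER, so the reported maximum of 6.54% corresponds to roughly 67 erroneous bits.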
Moreover, the proposed method shows good NC and BER values for amplification, distortion, delay, re-sampling, re-quantization, cropping, and low-pass filtering attacks. Across the various attacks, the NC of the proposed method varies from 0.9459 to 1, and its BER varies from 0% to 6.54%. In other words, the NCs of the proposed method are at least 0.9459, and its BERs are below 7%. Figures 4-7 show the watermark images extracted from different audio files under various attacks. These figures show that the watermark is extracted without any errors in most cases, which demonstrates the high robustness of the proposed method. Table 6 presents a comparative analysis between the proposed method and some recent methods [4][5][6][7][8][9][10][11][12] in terms of noise addition, re-sampling, re-quantization, and MP3 compression. According to this table, the proposed method achieves a lower BER than the other recent methods for noise addition. Moreover, it performs better than the methods presented in [4,6,7,8,10,11] for the re-sampling attack. For the re-quantization attack, it performs better than the methods proposed in [5,6,7,8], and for MP3 compression, it performs better than the methods suggested in [5,6,7,8,9,11,12]. These results show that the proposed method provides lower BER values against some common attacks than several recent state-of-the-art methods. Overall, the proposed method shows better performance than the recent state-of-the-art methods in terms of imperceptibility and robustness. We attribute this to the insertion of the watermark bits into the largest value of the 1st column of the Hessenberg matrix of the PSHT coefficients of each frame using a quantization function.
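Several of the attacks listed above are simple enough to sketch directly in Python with NumPy. In this sketch, the echo and delay attacks are both modelled as adding one delayed, attenuated copy of the signal (150 ms; 35% and 3%, respectively), and the low-pass filter is an idealized FFT brick-wall filter at 15 kHz. These are simplifying assumptions, since the text does not state which tool or filter design was used; the function names are hypothetical.

```python
import numpy as np

FS = 44_100  # sampling rate (Hz)


def add_echo(x, delay_s=0.150, decay=0.35, fs=FS):
    """y[n] = x[n] + decay * x[n - D] with D = delay_s * fs samples."""
    x = np.asarray(x, dtype=float)
    d = int(round(delay_s * fs))
    y = x.copy()
    if 0 < d < len(x):
        y[d:] += decay * x[:-d]
    return y


def amplify(x, gain=1.25):
    return gain * np.asarray(x, dtype=float)


def invert(x):
    """Polarity inversion."""
    return -np.asarray(x, dtype=float)


def low_pass(x, cutoff_hz=15_000.0, fs=FS):
    """Ideal FFT low-pass: zero every bin above the cut-off frequency."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


x = np.random.default_rng(4).uniform(-0.5, 0.5, FS)  # 1 s stand-in for a watermarked signal
attacked = {
    "echo": add_echo(x),                      # 150 ms, 35 %
    "delay": add_echo(x, 0.150, 0.03),        # 150 ms, 3 % (assumed model)
    "amplification": amplify(x),
    "invert": invert(x),
    "low-pass": low_pass(x),
}
print({name: sig.shape for name, sig in attacked.items()})
```

Feeding each attacked signal through an extraction routine and comparing the result with the original watermark via NC and BER reproduces the structure of the robustness experiment, though not its exact numbers.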

Data Payload
Data payload is the number of bits that can be embedded into the original signal per unit of time, measured in bits per second (bps). The data payload P is defined as:

$$P = \frac{B}{T}, \qquad (21)$$

where T indicates the time duration of the original audio signal in seconds and B indicates the number of watermark bits embedded into the host signal. The standard value for data payload is more than 20 bps [7]. The data payload of the proposed scheme is 172.39 bps, which is much higher than the standard value.
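As a quick check of Equation (21) with the experimental settings (one bit per 256-sample frame, 32 × 32 = 1024 bits per file), the hypothetical helper below evaluates P = B/T.

```python
def payload_bps(num_bits, duration_s):
    """Data payload P = B / T in bits per second."""
    return num_bits / duration_s


bits = 32 * 32                                         # one bit per frame
print(round(payload_bps(bits, 5.94), 2))               # 172.39 (rounded duration)
print(round(payload_bps(bits, 262_144 / 44_100), 2))   # 172.27 (unrounded duration)
```

The reported 172.39 bps follows from the rounded duration of 5.94 s; with the unrounded duration of 262,144/44,100 ≈ 5.944 s the same formula gives 44,100/256 ≈ 172.27 bps. Both are far above the 20 bps requirement.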

Security Analysis
To enhance security, the proposed scheme uses chaotic encryption. First, the watermark is encrypted using the logistic map, with a key K used for both encryption and decryption. Second, the parameter β is used in the PSHT process, and different values of β lead to different experimental results. Finally, a quantization coefficient x is used for both embedding and blind extraction. Therefore, the embedded watermark cannot be detected without these three parameters.

Computation Time Analysis
The computation time of the proposed method for both the embedding and extraction processes is measured and compared with that of the methods presented in [5,6,8] in Table 7. The embedding time of the proposed method is 2.03 s, which is much lower than that of the methods given in [5,8] but slightly higher than that of the method reported in [6]. The detection time of the proposed method is 0.75 s, which is much lower than that of the methods given in [6,8]. Therefore, the proposed method has a lower computational cost than most of the compared methods.

Conclusions
In this paper, we proposed a blind symmetric audio watermarking algorithm based on two well-known transformation and decomposition techniques, namely PSHT and HD, which are used jointly in audio watermarking for the first time. The watermark is embedded into the largest value of the 1st column of the Hessenberg matrix of the PSHT coefficients of each frame using a new quantization function. Simulations demonstrate that the proposed algorithm is highly robust against numerous attacks such as noise addition, noise reduction, echo addition, cropping, re-quantization, MP3 compression, re-sampling, distortion, amplification, delay, invert, and low-pass filtering. In addition, the proposed algorithm is computationally fast and has a high data payload. Moreover, the audio quality tests confirm the high imperceptibility of the watermarked audio. Furthermore, a comparative analysis substantiates its superiority over other state-of-the-art methods. These results verify the validity of the proposed algorithm for audio copyright protection. In future work, the proposed algorithm will be compared with several recent state-of-the-art methods on the same dataset in terms of imperceptibility, robustness, and computation time.