A Novel Robust Audio Watermarking Algorithm by Modifying the Average Amplitude in Transform Domain

Featured Application: This algorithm embeds a binary image into an audio signal as a marker that proves ownership of the audio signal. With its large payload capacity and strong robustness against common signal processing attacks, it can be used for copyright protection, broadcast monitoring, fingerprinting, data authentication, and medical safety.

Abstract: To improve robustness and imperceptibility in practical applications, a novel, strongly robust audio watermarking algorithm is proposed that exploits the multi-resolution characteristic of the discrete wavelet transform (DWT) and the energy compaction capability of the discrete cosine transform (DCT). Because the human auditory system is insensitive to minor changes in the frequency components of an audio signal, watermarks can be embedded by slightly modifying these frequency components. The audio fragments segmented from the cover audio signal are decomposed by DWT into several groups of wavelet coefficients in different frequency bands. The fourth-level detail coefficient is then divided into a former packet and a latter packet, and DCT is applied to each packet to obtain two sets of transform domain coefficients (TDC). Finally, the average amplitudes of the two sets of TDC are modified according to a specific embedding rule to embed a binary image watermark. Watermark extraction is blind: it does not require the original carrier audio signal. Experimental results confirm that the proposed algorithm has good imperceptibility, a large payload capacity and strong robustness against various attacks such as MP3 compression, low-pass filtering, re-sampling, re-quantization, amplitude scaling, echo addition and noise corruption.


Introduction
With the rapid development of the Internet, multimedia data stored in digital form can easily be copied and tampered with by illegal users, so protection against intellectual property infringement has become an increasingly important issue. There are two primary approaches to this problem: digital signatures [1,2] and digital watermarking [3,4]. A digital signature is a numeric string that serves as a secret key shared by senders and receivers [5], and its presence can easily motivate illegal users to tamper with the multimedia data. Digital watermarking conceals watermarks within the multimedia data and later extracts them to prove ownership of the data. It is therefore an efficient approach to protecting media content and is widely used for copyright protection, broadcast monitoring, fingerprinting, data authentication, and medical safety.

Related Works
In this section, we review previous work on audio watermarking. Many audio watermarking algorithms have appeared over the past decades, and they can generally be implemented either in the time domain [16-18] or in a transform domain. Lei [16] proposed an audio watermarking algorithm that modifies group amplitudes; its payload capacity was low because three fragments were needed to represent a one-bit watermark. Erfani [17] presented an audio watermarking method based on time-spread echo, whose robustness was limited. Basia [18] presented an audio watermarking algorithm for copyright protection that modifies the amplitude of each audio sample. In general, time-domain algorithms are easy to implement and require little computation, but they are usually less robust against many kinds of digital signal processing attacks [12]. Transform-domain algorithms are more robust because they exploit the characteristics of the audio signal and the properties of human hearing [19]. Commonly used transforms include the discrete Fourier transform (DFT) [20-22], DCT [6,23], DWT [8,19,24-28] and singular value decomposition (SVD) [29]. Asmara [30] compared the characteristics of DFT, DCT and DWT when applied to watermarking. Natgunanathan [20] presented a patchwork-based watermarking algorithm for stereo audio signals that exploits the similarity of the two channels in the DFT domain; it was robust against several conventional attacks, but its payload capacity was not high. Megias [21] presented a blind, self-synchronized audio watermarking algorithm based on the fast Fourier transform, which embedded synchronization signals in the time domain and watermarks in the frequency domain.
However, the time-domain synchronization code was vulnerable to some attacks, in which case the watermark could not survive. Tewari [22] proposed a digital audio watermarking algorithm that modified the middle frequency band of the DCT coefficients to embed watermarks. Natgunanathan [6] designed another patchwork-based audio watermarking method that embedded and extracted watermark bits in a multilayer framework by modifying the mean values of selected fragments in the DCT domain; its payload capacity was higher than that of [20]. Hu [23] presented a large-capacity audio watermarking algorithm based on perceptual masking in the DCT domain. The authors claimed a payload capacity of 848.08 bps because they embedded watermarks into three DCT coefficients separately; however, they did not verify the overall performance when embedding watermarks into all three coefficients simultaneously. Owing to the multi-resolution characteristic of DWT, many audio watermarking algorithms use DWT to analyze the frequency components of the audio signal and thereby improve performance. A variable-dimensional vector modulation (VDVM) algorithm presented in [8] maximized the efficiency of norm-space DWT-based audio watermarking to achieve a higher payload capacity, but its robustness was not very satisfactory. Kumsawat [19] used a genetic algorithm to search for the optimal quantization step in order to improve both audio quality and robustness; this approach was robust against most attacks except low-pass filtering, but its payload capacity was very low. Li [24] proposed a content-dependent localized audio watermarking algorithm to combat random cropping and time-scale modification, in which watermark bits were embedded into steady, high-energy local regions to improve robustness.
This algorithm could resist synchronization attacks, but it was not robust against conventional signal processing attacks such as equalization, re-sampling and echo. Chen [25] proposed an adaptive method that modifies the average values of the wavelet-based entropy to embed watermarks, but its robustness against re-sampling and low-pass filtering was quite low. An audio watermarking algorithm based on DWT was proposed in [26]; it changed the energy of the former and latter parts of each audio fragment to hide confidential information. It was robust against several common attacks, but the authors did not verify its robustness against MP3 compression, MP3 being the most common format for audio media. Wu [27] presented a self-synchronized DWT-based audio watermarking algorithm that embedded watermarks into the low-frequency band; it had a large payload capacity, but its robustness against MP3 compression and noise corruption was poor. Wu [28] proposed a self-synchronized audio watermarking method in which the synchronization code and the watermarks were embedded into the low-frequency sub-band in the DWT domain, but this algorithm had a high bit error rate (BER) under MP3 compression. Abd [29] used a twofold SVD-based strategy to embed an image watermark into an audio signal: after applying a first SVD to a 2-D matrix, the algorithm blended the watermark with the diagonal matrix of singular values and then performed a second SVD on the modified matrix. The matrices containing the left- and right-singular vectors had to be stored in order to extract the watermark.
All of the above algorithms were designed in a single transform domain. In recent years, many audio watermarking algorithms have combined multiple transform domains. An audio watermarking algorithm based on support vector regression was proposed in [31] to resist desynchronization attacks; it used support vector machine (SVM) theory to locate the optimal embedding positions and embedded the watermarks into the statistical average value of the low-frequency components in the DWT and DCT domains. Wang [32] proposed an audio watermarking algorithm based on the multi-resolution characteristic of DWT and the energy compaction capability of DCT, but its robustness against low-pass filtering and amplitude scaling was poor. Vivekananda [33] combined DWT and SVD in an adaptive audio watermarking scheme that applies quantization index modulation (QIM) to the singular values in the DWT domain. Bhat [34] presented an SVD-DWT blind watermarking algorithm in which the quantization steps were determined by the statistical properties of the involved DWT coefficients. Lei [35] embedded the watermark into the high-frequency band of the SVD-DCT block and claimed that the performance was generally better than that of previous SVD-based methods. Hu [12] integrated the discrete wavelet packet transform (DWPT), SVD and QIM into a blind audio watermarking approach, in which SVD was used to analyze the matrix formed by the DWPT coefficients and the watermarks were embedded by controlling the singular values subject to perceptual criteria. Vivekananda [36] proposed a robust, blind audio watermarking algorithm based on SVD and QIM. Other audio watermarking algorithms are presented in [4,7].
Wang [4] integrated exponent moments (EMs) and QIM into an audio watermarking algorithm that used EMs to improve robustness and QIM to realize blind watermark extraction, but the algorithm was not robust enough against amplitude scaling. Xiang [7] presented a reversible audio data hiding scheme based on non-causal prediction, which used the minimum error power method to calculate the optimal order and the prediction correlation of the audio data that could then be used to embed the watermarks.
The related work above shows that the performance of an algorithm depends not only on the transform domain in which the signal is processed, but also on the embedding rules: even algorithms that use the same transform domain perform very differently because of their embedding rules. The algorithm proposed in this paper combines the characteristics of DWT and DCT to process the audio signal and uses a specific embedding rule to embed the watermarks, which gives it good robustness.

Principle of Watermark Embedding
The multi-resolution analysis characteristic of DWT makes it robust against most attacks, so DWT can be used in audio watermarking to decompose the audio signal into wavelet coefficients in different frequency bands that carry the watermarks. The energy compaction capability of DCT concentrates most of the energy of the signal in the low-frequency DCT coefficients. This study takes advantage of both transforms to develop a novel robust audio watermarking algorithm. Since the human auditory system is insensitive to minor changes in the high-frequency components of an audio signal, the watermark information can be hidden in the high-frequency components obtained by applying DWT and DCT to the carrier audio signal.
Suppose that the carrier audio signal A has K samples and can be expressed as

$$A = \{a(k),\ 1 \le k \le K\} \qquad (1)$$

where $a(k)$ is the kth sample value of the audio signal. A is divided into M audio fragments $A_l$ ($1 \le l \le M$) of N samples each, and an r-level DWT is then performed on $A_l$ to obtain the wavelet coefficients $D_l$ in Formula (2), consisting of the approximation coefficient $C_e(r)$ and the detail coefficients $D_e(i)$ ($i = 1, 2, \ldots, r$):

$$D_l = \{C_e(r),\ D_e(r),\ D_e(r-1),\ \ldots,\ D_e(1)\} \qquad (2)$$

where $C_e(r)$ is the rth-level approximation coefficient, which contains the lowest-frequency component of the audio signal. Even minor changes in $C_e(r)$ cause a significant drop in audio quality, so watermarks usually cannot be embedded into this frequency band. $D_e(i)$ is the ith-level detail coefficient: the smaller i is, the higher the frequency components contained in $D_e(i)$, and the smaller the impact of minor changes in $D_e(i)$ on audio quality. The watermarks could therefore be embedded into $D_e(i)$. However, high-frequency components are vulnerable to malicious attacks, so the detail coefficients closest to the approximation coefficient are chosen to conceal the watermark; this choice has little influence on audio quality and also helps resist malicious attacks. In this study, the rth-level detail coefficient $D_e(r, n)$ ($n = 1, 2, \ldots, N/2^r$) is selected as the embedding frequency band. $D_e(r, n)$ is divided according to Formulas (3) and (4) into two packets of length $N/2^{r+1}$, the former packet $D_{e1}(r, j)$ and the latter packet $D_{e2}(r, j)$:

$$D_{e1}(r, j) = D_e(r, j), \qquad j = 1, 2, \ldots, N/2^{r+1} \qquad (3)$$

$$D_{e2}(r, j) = D_e\!\left(r, \tfrac{N}{2^{r+1}} + j\right), \qquad j = 1, 2, \ldots, N/2^{r+1} \qquad (4)$$
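As a minimal sketch of Formulas (2)-(4), the Python code below decomposes one fragment with an r-level DWT and splits $D_e(r)$ into the former and latter packets. It assumes the orthonormal Haar wavelet (the wavelet family is not specified in this section), uses plain Python lists, and takes the fragment length N = 256 and level r = 4 that the paper uses elsewhere; the helper names are illustrative.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)


def haar_step(x):
    """One level of the orthonormal Haar DWT (assumed wavelet): (approximation, detail)."""
    approx = [(x[2 * i] + x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    return approx, detail


def dwt(fragment, r):
    """r-level decomposition of Formula (2): returns Ce(r) and [De(1), ..., De(r)]."""
    if len(fragment) % (2 ** r):
        raise ValueError("fragment length N must be divisible by 2**r")
    approx, details = list(fragment), []
    for _ in range(r):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def split_packets(de_r):
    """Formulas (3) and (4): former packet De1(r, j) and latter packet De2(r, j)."""
    half = len(de_r) // 2
    return de_r[:half], de_r[half:]


N, R = 256, 4
fragment = [math.sin(2 * math.pi * 40 * k / N) for k in range(N)]  # synthetic test tone
ce, details = dwt(fragment, R)
de1, de2 = split_packets(details[R - 1])  # De(r) is the last (coarsest) detail level
print(len(ce), len(details[R - 1]), len(de1), len(de2))  # 16 16 8 8
```

Because the Haar transform is orthonormal, the coefficient energy equals the fragment energy, so an inverse DWT of unmodified coefficients reconstructs the fragment exactly.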
DCT is performed on $D_{e1}(r, j)$ and $D_{e2}(r, j)$ to obtain two sets of transform domain coefficients, $C_1(r, j)$ and $C_2(r, j)$, each of length $N/2^{r+1}$; $C_1(r, j)$ and $C_2(r, j)$ are then concatenated to form an array $C(r, n)$ of length $N/2^r$. The average amplitudes of $|C(r, n)|$, $|C_1(r, j)|$ and $|C_2(r, j)|$ are calculated according to Formulas (5)-(7).
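The following sketch computes the three average amplitudes for one pair of packets. It assumes an orthonormal DCT-II (the DCT variant and normalization are not stated in this section) and is written in plain Python for clarity rather than speed; the function names are illustrative.

```python
import math


def dct_ortho(x):
    """Orthonormal DCT-II (assumed variant): energy-preserving, so the IDCT is its transpose."""
    n = len(x)
    out = []
    for u in range(n):
        scale = math.sqrt((1 if u == 0 else 2) / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * k + 1) * u / (2 * n))
                               for k, v in enumerate(x)))
    return out


def mean_abs(c):
    """Average amplitude: mean of |c| over the array."""
    return sum(abs(v) for v in c) / len(c)


def average_amplitudes(de1, de2):
    """Return (M_c, M_c1, M_c2) for the average amplitudes of |C|, |C1| and |C2|."""
    c1, c2 = dct_ortho(de1), dct_ortho(de2)
    c = c1 + c2  # C(r, n): C1(r, j) followed by C2(r, j)
    return mean_abs(c), mean_abs(c1), mean_abs(c2)


m_c, m_c1, m_c2 = average_amplitudes([1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 0.5, -0.5])
print(m_c, m_c1, m_c2)
```

Since the two packets have the same length, the overall average amplitude is simply the mean of the two packet averages.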
The average amplitude of $|C(r, n)|$ is

$$M_c = \frac{2^r}{N} \sum_{n=1}^{N/2^r} |C(r, n)| \qquad (5)$$

The average amplitude of the former packet $C_1(r, j)$ is

$$M_{c1} = \frac{2^{r+1}}{N} \sum_{j=1}^{N/2^{r+1}} |C_1(r, j)| \qquad (6)$$

The average amplitude of the latter packet $C_2(r, j)$ is

$$M_{c2} = \frac{2^{r+1}}{N} \sum_{j=1}^{N/2^{r+1}} |C_2(r, j)| \qquad (7)$$

Suppose that the binary image watermark to be embedded is $W = \{w(q),\ 1 \le q \le L\}$, where $w(q) \in \{0, 1\}$ and L is the length of the watermark, with $L \le M$. The watermark is embedded by modifying the average amplitudes of the two packets according to the following rules: if $w(q) = 1$, $C_1(r, j)$ and $C_2(r, j)$ are modified according to Formulas (8) and (9); if $w(q) = 0$, they are modified according to Formulas (10) and (11). Here λ is the embedding depth, which lies in the interval (0, 1), and $C'_1(r, j)$ and $C'_2(r, j)$ denote the modified coefficients of the former and latter packets, respectively. The inverse DCT is performed on $C'_1(r, j)$ and $C'_2(r, j)$ to obtain $D'_{e1}(r, j)$ and $D'_{e2}(r, j)$, which are recombined into $D'_e(r, n)$. Replacing $D_e(r)$ in Formula (2) with $D'_e(r, n)$ gives the watermarked coefficients $D'_l$. Finally, the inverse DWT is performed on $D'_l$ to reconstruct the watermarked audio fragment $A'_l$, and the watermarked fragments are recombined into the watermarked audio signal $A'$.
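Formulas (8)-(11) are not reproduced in this section, so the sketch below uses a hypothetical rule chosen only to satisfy the extraction condition stated in the extraction principle ($M'_{c1} \ge M'_{c2}$ for a 1, the reverse for a 0). For w(q) = 1 it rescales the packets so that their average amplitudes become $(1+\lambda)M_c$ and $(1-\lambda)M_c$, and swaps the targets for w(q) = 0. It should not be read as the authors' exact formulas; `embed_bit` and `lam` are illustrative names.

```python
def mean_abs(c):
    """Average amplitude: mean of |c| over the array."""
    return sum(abs(v) for v in c) / len(c)


def embed_bit(c1, c2, bit, lam):
    """HYPOTHETICAL stand-in for Formulas (8)-(11), not the paper's exact rule.

    Scales each packet by a positive factor so that the average amplitudes become
    (1 + lam) * M_c and (1 - lam) * M_c (swapped when bit == 0). Positive scaling
    keeps every coefficient's sign and leaves M_c itself unchanged.
    """
    if not 0 < lam < 1:
        raise ValueError("embedding depth lam must lie in (0, 1)")
    m1, m2 = mean_abs(c1), mean_abs(c2)
    if m1 == 0 or m2 == 0:
        raise ValueError("cannot rescale an all-zero packet")
    m_c = (m1 + m2) / 2  # average amplitude of C(r, n) for two equal-length packets
    hi, lo = (1 + lam) * m_c, (1 - lam) * m_c
    t1, t2 = (hi, lo) if bit == 1 else (lo, hi)
    return [v * t1 / m1 for v in c1], [v * t2 / m2 for v in c2]


c1, c2 = [1.0, -2.0, 0.5, 3.0], [2.0, 2.0, -1.0, 1.0]
n1, n2 = embed_bit(c1, c2, bit=1, lam=0.2)
print(round(mean_abs(n1), 4), round(mean_abs(n2), 4))  # 1.875 1.25
```

With this particular choice, a larger λ widens the gap between the two averages, which mirrors the trade-off discussed for the embedding depth: a wider gap helps extraction but changes the audio more.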

Principle of Watermark Extracting
To extract the watermark, the watermarked audio $A'$ is divided into M audio fragments $A'_l$ ($1 \le l \le M$) of N samples each, and an r-level DWT is performed on each fragment to obtain the wavelet coefficients $C'_e(r)$ and $D'_e(i)$ ($i = 1, 2, \ldots, r$). $D'_e(r)$ is divided into the former packet $D'_{e1}(r, j)$ and the latter packet $D'_{e2}(r, j)$, DCT is performed on the two packets to obtain $C'_1(r, j)$ and $C'_2(r, j)$, and their average amplitudes $M'_{c1}$ and $M'_{c2}$ are calculated according to Formulas (6) and (7). If $w(q) = 1$, Formulas (6)-(9) give the average amplitudes of $C'_1(r, j)$ and $C'_2(r, j)$ as Formulas (12) and (13), respectively, which show that $M'_{c1} \ge M'_{c2}$ when λ > 0. Similarly, if $w(q) = 0$, Formulas (10) and (11) give the average amplitudes of $C'_1(r, j)$ and $C'_2(r, j)$ as Formulas (14) and (15), which show that $M'_{c1} \le M'_{c2}$ when λ > 0. Based on this analysis, the binary watermark bit can be extracted from the audio fragment $A'_l$ according to Formula (16):

$$w'(q) = \begin{cases} 1, & M'_{c1} \ge M'_{c2} \\ 0, & M'_{c1} < M'_{c2} \end{cases} \qquad (16)$$
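The extraction rule reduces to comparing two averages. The sketch below assumes that $C'_1(r, j)$ and $C'_2(r, j)$ have already been obtained by the DWT, packet split and DCT described above; the coefficient values are made up for illustration, and ties are resolved toward 1 as in the decision rule.

```python
def mean_abs(c):
    """Average amplitude: mean of |c| over the array."""
    return sum(abs(v) for v in c) / len(c)


def extract_bit(c1_w, c2_w):
    """Decision rule: 1 if M'_c1 >= M'_c2, otherwise 0."""
    return 1 if mean_abs(c1_w) >= mean_abs(c2_w) else 0


c1_w, c2_w = [1.5, -2.4, 0.6, 3.0], [1.0, 1.6, -0.8, 1.0]
print(extract_bit(c1_w, c2_w))  # 1
scaled = ([0.8 * v for v in c1_w], [0.8 * v for v in c2_w])
print(extract_bit(*scaled))  # 1: a common gain does not change the comparison
```

Because DWT and DCT are linear, scaling the audio amplitude multiplies every coefficient by the same factor, so the comparison (and thus the extracted bit) is unaffected, which is consistent with the robustness to amplitude scaling that the paper reports.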

Impact of the Embedding Depth on Algorithm Performance
The embedding principle in Section 3.1 shows that a watermark is embedded by modifying the amplitudes of $C_1(r, j)$ and $C_2(r, j)$: the larger the change in amplitude, the worse the quality of the carrier audio. Conversely, if the change is too small, the watermark cannot be extracted accurately. The amplitude change should therefore be kept within a range that guarantees both the imperceptibility and the robustness of the algorithm. Formulas (8)-(11) show that the modified coefficients $C'_1(r, j)$ and $C'_2(r, j)$ depend on the embedding depth, so the following experiment tests the impact of the embedding depth on algorithm performance. The carrier audio signal tested is a song downloaded from the Internet, approximately 60 s long, sampled at 44,100 Hz with 16-bit quantization. The watermark bits were a random binary string long enough to cover the entire carrier signal. To evaluate the performance of the watermarking algorithm objectively, the quality of the watermarked audio is measured by the signal-to-noise ratio (SNR):

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{k=1}^{K} a^2(k)}{\sum_{k=1}^{K} \left[a(k) - a'(k)\right]^2} \qquad (17)$$

where A and A' denote the carrier audio signal and the watermarked audio signal, respectively, and $a(k)$ and $a'(k)$ are their kth samples. The larger the SNR, the smaller the degradation of audio quality and the better the imperceptibility of the algorithm; an SNR above 20 dB usually indicates good audio quality. The robustness of the proposed algorithm against various attacks is evaluated by the bit error rate (BER):

$$\mathrm{BER} = \frac{1}{L} \sum_{q=1}^{L} \left[w(q) \oplus w'(q)\right] \times 100\% \qquad (18)$$

where $w(q)$ and $w'(q)$ denote the original watermark and the extracted watermark, respectively, ⊕ is the exclusive-OR operator, and L is the length of the watermark. A smaller BER generally indicates better robustness against attacks.
The normalized correlation (NC) coefficient in Formula (19) measures the similarity between the original watermark and the extracted watermark:

$$\mathrm{NC} = \frac{\sum_{q=1}^{L} w(q)\, w'(q)}{\sqrt{\sum_{q=1}^{L} w^2(q)}\ \sqrt{\sum_{q=1}^{L} w'^2(q)}} \qquad (19)$$

An NC close to 1 means that $w'(q)$ is very similar to $w(q)$, whereas an NC close to zero means that they are very different. The SNR and BER obtained as λ varies from 0 to 1 are shown in Figure 1. Figure 1a shows that the larger λ is, the smaller the SNR of the watermarked audio signal and the worse the imperceptibility of the algorithm. Figure 1b shows that the larger λ is, the smaller the BER of the extracted watermark and the better the robustness of the algorithm. Imperceptibility and robustness are therefore in conflict, and λ should be chosen as large as possible, to improve robustness, while keeping the SNR above 20 dB.

Implementation of the Proposed Algorithm
Section 3 described the embedding and extraction principles of the proposed algorithm. This section gives the detailed implementation steps in two parts: watermark embedding and watermark extraction.

Procedure for Embedding Watermark
The watermark embedding diagram of the proposed algorithm is shown in Figure 2. The detailed embedding steps are as follows:
Step 1: Convert the image watermark into a binary bit stream $W = \{w(q),\ 1 \le q \le L\}$ of length L.
Step 2: After low-pass filtering, divide the carrier audio A into M audio fragments $A_l$ ($1 \le l \le M$) of length N, where M is the number of audio fragments and M ≥ L.
Step 3: For l = 1 to M, perform an r-level DWT on each fragment $A_l$ to obtain the wavelet coefficients, and select $D_e(r)$ as the embedding frequency band.
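Step 4: Divide $D_e(r)$ into the former packet $D_{e1}(r, j)$ and the latter packet $D_{e2}(r, j)$, each of length $N/2^{r+1}$, according to Formulas (3) and (4).
Step 5: Perform DCT on $D_{e1}(r, j)$ and $D_{e2}(r, j)$ to obtain $C_1(r, j)$ and $C_2(r, j)$.
Step 6: Concatenate $C_1(r, j)$ and $C_2(r, j)$ to form an array $C(r, n)$ of length $N/2^r$.
Step 7: Calculate the average amplitudes of $|C(r, n)|$, $|C_1(r, j)|$ and $|C_2(r, j)|$ according to Formulas (5)-(7).
Step 8: Modify $C_1(r, j)$ and $C_2(r, j)$ according to Formulas (8) and (9) if $w(q) = 1$, or Formulas (10) and (11) if $w(q) = 0$, to obtain $C'_1(r, j)$ and $C'_2(r, j)$.
Step 9: Perform the inverse DCT on $C'_1(r, j)$ and $C'_2(r, j)$ to obtain $D'_{e1}(r, j)$ and $D'_{e2}(r, j)$.
Step 10: Recombine $D'_{e1}(r, j)$ and $D'_{e2}(r, j)$ into $D'_e(r, n)$ and perform the inverse DWT to reconstruct the watermarked audio fragment $A'_l$.
Step 11: Repeat Steps 3 to 10 until all watermark bits are embedded.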
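Steps 1 and 2 are bookkeeping around the per-fragment processing. The sketch below flattens a binary image row by row (the scan order is an assumption, since it is not stated here), splits the carrier into fragments of N samples, and checks the condition M ≥ L. The low-pass filtering mentioned in Step 2 is omitted, the 32 × 32 image size in the example is hypothetical, and the function names are illustrative.

```python
def image_to_bits(image):
    """Step 1: flatten a binary image (list of rows of 0/1) in row-major order (assumed)."""
    return [int(p) for row in image for p in row]


def segment(audio, n):
    """Step 2: split the carrier into M = len(audio) // n fragments of n samples.

    Trailing samples that do not fill a whole fragment carry no watermark bit.
    """
    m = len(audio) // n
    return [audio[l * n:(l + 1) * n] for l in range(m)]


def check_capacity(num_samples, n, num_bits):
    """Return M and enforce M >= L (one watermark bit per fragment)."""
    m = num_samples // n
    if m < num_bits:
        raise ValueError(f"only {m} fragments for {num_bits} watermark bits")
    return m


bits = image_to_bits([[1, 0, 1], [0, 1, 0]])
print(bits)  # [1, 0, 1, 0, 1, 0]
# Hypothetical 32 x 32 image on a 60 s clip at 44,100 Hz with N = 256:
print(check_capacity(60 * 44_100, 256, 32 * 32))  # 10335
```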

Procedure for Extracting Watermark
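The watermark extraction diagram of the proposed algorithm is shown in Figure 3. The detailed extraction steps are as follows:
Step 1: Segment the watermarked audio signal $A'$ into M audio fragments $A'_l$ of length N.
Step 2: Perform an r-level DWT on $A'_l$ to obtain the wavelet coefficients $D'_e(r)$.
Step 3: Divide $D'_e(r)$ into the former packet $D'_{e1}(r, j)$ and the latter packet $D'_{e2}(r, j)$ according to Formulas (3) and (4).
Step 4: Perform DCT on the two packets to obtain $C'_1(r, j)$ and $C'_2(r, j)$.
Step 5: Calculate the average amplitudes $M'_{c1}$ and $M'_{c2}$ according to Formulas (6) and (7).
Step 6: Extract the watermark bit $w'(q)$ according to Formula (16).
Step 7: Repeat Steps 2 to 6 until all binary watermark bits are extracted.
Step 8: Convert the extracted binary stream into the binary image watermark.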
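The outer loop of the extraction procedure and the final conversion in Step 8 can be sketched as follows, assuming the same row-major scan order as at embedding time. The per-fragment extractor (Steps 2-6) is passed in as a function so that the loop stays independent of the transform details; this interface and the toy extractor in the example are hypothetical.

```python
def extract_watermark(audio, n, num_bits, extract_fragment):
    """Steps 1 and 7: walk the fragments from the first one and collect L bits.

    `extract_fragment` is a caller-supplied function implementing Steps 2-6 for
    one fragment (hypothetical interface). No synchronization is performed.
    """
    bits = []
    for l in range(len(audio) // n):
        if len(bits) == num_bits:
            break
        bits.append(extract_fragment(audio[l * n:(l + 1) * n]))
    if len(bits) < num_bits:
        raise ValueError("audio too short for the requested number of bits")
    return bits


def bits_to_image(bits, rows, cols):
    """Step 8: reshape the bit stream into a rows x cols binary image (row-major, assumed)."""
    if len(bits) != rows * cols:
        raise ValueError("bit count does not match the image size")
    return [bits[i * cols:(i + 1) * cols] for i in range(rows)]


def toy_extract(frag):
    """Toy stand-in for Steps 2-6: sign of the fragment sum (illustration only)."""
    return 1 if sum(frag) >= 0 else 0


audio = [1.0] * 4 + [-1.0] * 4 + [1.0] * 4 + [1.0] * 4
print(bits_to_image(extract_watermark(audio, 4, 4, toy_extract), 2, 2))  # [[1, 0], [1, 1]]
```

Because the loop always starts at the first fragment, any shift of the fragment boundaries misaligns every subsequent bit, which is the desynchronization weakness acknowledged in the conclusions.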

Performance Evaluation
This section evaluates the performance of the proposed algorithm through a large number of experiments. The experimental environment is as follows: (1)

Imperceptibility and Payload Capacity
Each audio signal was tested 10 times, giving 200 experiments in total over all the audio signals. According to the embedding principle in Section 3, one watermark bit is embedded in each audio fragment. Since each audio fragment contains 256 samples, the payload capacity of the algorithm is 44,100/256 ≈ 172.27 bps.
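The payload figure follows directly from one bit per fragment of N samples. A short check in Python; the fragment length of 512 in the second call is only a hypothetical comparison point, not a configuration tested in the paper:

```python
def payload_bps(sample_rate_hz, fragment_len, bits_per_fragment=1):
    """Payload capacity: fragments per second times bits carried per fragment."""
    return sample_rate_hz / fragment_len * bits_per_fragment


print(round(payload_bps(44_100, 256), 2))  # 172.27
print(round(payload_bps(44_100, 512), 2))  # 86.13: doubling N halves the capacity
```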
The average SNR of the audio signals, the average NC and BER of the extracted watermarks, and the payload capacity are listed in Table 1. The payload capacity of the proposed algorithm is the same as that in [25], higher than those in [4] and [12], and far higher than those in [16] and [19]. N/A in Table 1 means that no value is reported for the corresponding algorithm.
The average SNR of the proposed algorithm is 23.49 dB, which is higher than those in [12,16,25] but lower than that in [19]; however, the payload capacity of the proposed algorithm is five times that of [19].
The watermark images extracted without any attack are identical to the original images, since NC equals 1 and BER equals 0, as shown in Table 1. Figure 5 compares the waveforms of the carrier audio and the watermarked audio without any attack (only a short fragment of about three seconds is shown); the two waveforms show no obvious difference before and after watermark embedding. Figure 6 shows that the two spectrograms differ slightly in the high-frequency band, mainly because the watermarks are embedded into the high-frequency band of the watermarked audio in Figure 6b, but human ears are not sensitive to such minor changes in the high-frequency components. The SNR results in Table 1 and Figures 5 and 6 all indicate the excellent imperceptibility of the algorithm.

Robustness
Robustness is an important index for evaluating the performance of a watermarking algorithm. This study examines the NC and BER between the original watermark and the extracted watermark to assess robustness against various attacks. The attack types considered in the test are as follows:
A. Low-pass filtering: applying a low-pass filter with a cutoff frequency of 4 kHz.
B. Amplitude scaling: scaling the amplitude of the watermarked audio signal by 0.8.
C. Amplitude scaling: scaling the amplitude of the watermarked audio signal by 1.2.
D. Noise corruption: adding zero-mean Gaussian noise to the watermarked audio signal with an SNR of 20 dB.
E. Noise corruption: adding zero-mean Gaussian noise to the watermarked audio signal with an SNR of 30 dB.
F. Noise corruption: adding zero-mean Gaussian noise to the watermarked audio signal with an SNR of 35 dB.
J. Re-quantization: quantizing the watermarked audio signal from 16 bits/sample to 8 bits/sample and then back to 16 bits/sample.
K. Echo addition: adding an echo signal with a delay of 50 ms and a decay of 5% to the watermarked audio signal.
The extracted images and the average NC results are shown in Figures 7-10. These figures show that the extracted image watermarks are very similar to the original image watermarks under low-pass filtering with a cutoff frequency of 4 kHz, amplitude scaling by 0.8 and 1.2, MP3 compression at 128 kbps and 64 kbps, re-sampling, re-quantization, echo addition with a delay of 50 ms and a decay of 5%, and noise corruption at 30 dB and 35 dB. The extracted image watermarks in Figures 7d, 8d, 9d and 10d are relatively blurred under noise corruption at 20 dB, but they become progressively clearer as the noise level decreases, as shown in Figure 7e (see also Table 2).
These results show that the proposed algorithm is highly robust against low-pass filtering with a cutoff frequency of 4 kHz, amplitude scaling by 0.8 and 1.2, MP3 compression at 128 kbps and 64 kbps, re-sampling, re-quantization, echo addition with a delay of 50 ms and a decay of 5%, and noise corruption at 30 dB and 35 dB, and that it clearly outperforms the algorithms in [4,16,25]. In particular, under low-pass filtering the BER is only 0.01%, much lower than 21.975% in [16], 6.93% in [19], 28.250% in [25], 0.12% in [12] and 0.39% in [4]. The robustness is slightly inferior to that in [19] against re-quantization, and slightly inferior to that in [12] against noise corruption at 20 dB. Under noise corruption, the quality of the extracted watermark is somewhat poor, mainly because the algorithm relies on comparing two sets of TDC obtained from the fourth-level detail coefficient: strong additive noise also perturbs the fourth-level wavelet coefficients and thus reduces the accuracy of watermark extraction. As the noise level decreases, the quality of the extracted watermark improves significantly, as seen for noise at 30 dB and 35 dB.

Conclusions
In this paper, a novel blind audio watermarking algorithm with strong robustness is proposed based on the multi-resolution characteristic of DWT and the energy compaction capability of DCT. The cover audio signal is first segmented into audio fragments, and each fragment is then transformed by DWT and DCT to obtain two sets of TDC, which are modified to embed the binary watermark. The algorithm extracts the watermark blindly, without the carrier audio signal, which is convenient in practical applications. The embedding depth is the key factor determining the performance of the algorithm: as long as imperceptibility remains good, the embedding depth should be increased to improve robustness. The experimental results show that the average SNR of the watermarked audio signals reaches 23.49 dB at a payload capacity of 172.27 bps, indicating that the algorithm has large capacity and good imperceptibility. In addition, compared with other audio watermarking algorithms, it is highly robust against white noise, low-pass filtering, re-sampling, re-quantization, echo addition, amplitude scaling and MP3 compression. Because there is no synchronization signal and extraction always starts from the first audio fragment, the algorithm cannot resist desynchronization attacks, although it effectively resists the conventional signal processing attacks that occur during everyday use of audio media. In future work, we will study how to add synchronization signals to the carrier audio and how to combat other types of attacks, such as desynchronization attacks, type II collusion attacks and strong noise corruption.