Autocorrelation Modulation-Based Audio Blind Watermarking Robust Against High Efficiency Advanced Audio Coding

High Efficiency Advanced Audio Coding (HE-AAC) is a lossy compression method for digital audio that delivers high-quality audio at very low bit rates. In this paper, an audio blind watermarking algorithm based on autocorrelation modulation is introduced to maximize robustness against low-bit-rate HE-AAC. The watermark is embedded by modulating the normalized correlation between the original signal and its delayed version. The signal-to-noise ratio between the signal before and after HE-AAC compression determines the embedding strength of the watermark, and a feedback process guarantees that this strength is reached. The effectiveness of the proposed method is evaluated using the Perceptual Evaluation of Audio Quality algorithm and the bit error rate of recovered watermarks under HE-AAC compression on mono, stereo, and 5.1 channel audio. Experimental results show that the proposed method provides good performance in terms of imperceptibility, robustness, and data payload compared with some recent state-of-the-art watermarking methods under an MPEG-2 Audio Layer III (MP3) compression attack.


Introduction
Over the last few years, High Efficiency Advanced Audio Coding (HE-AAC) [1][2][3] has become one of the most important enabling technologies for state-of-the-art multimedia systems. It delivers compact disc (CD) quality stereo at 48 kbps and 5.1 channel surround sound at 128 kbps. This efficiency makes it an excellent choice for current internet content delivery, and it has enabled novel applications in mobile digital broadcasting and mobile markets. It also enhances audio services alongside video services such as digital television: by coupling HE-AAC with MPEG-4 video, more bits can be allocated to the video signal without reducing the quality of the audio signal.
Although HE-AAC is popular in the field of multimedia systems, to the best of our knowledge no audio watermarking paper has developed a watermarking algorithm for HE-AAC or evaluated performance under it. Therefore, the objective of this study was to design an audio watermarking algorithm that maximizes robustness against low-bit-rate HE-AAC compression. This paper is an extension of our previously published conference paper [4]. We illustrate the algorithm in more detail through additional experiments and visualization, optimize its parameters through thorough experiments, and add its performance on multi-channel audio. Moreover, synchronization design is discussed.
The rapid growth of computer multimedia technology and the broad adoption of the internet have promoted the distribution and transmission of digital multimedia content. Digital watermarking [5] refers to the procedure of embedding data into digital multimedia content such as audio, video, and images. It was initially used for security-related purposes like copyright protection and source tracking, but nowadays it also serves various non-security-oriented applications [6], such as broadcast monitoring and automatic content recognition.
An audio watermarking method must comply with the following requirements [7]. (1) Imperceptibility: The perception of the original audio signal is similar to that of the watermarked audio signal. (2) Security: Watermarks can only be detected by authorized personnel. (3) Payload: The number of bits that can be embedded into the audio per unit of time. The watermarking data payload needs to be greater than 20 bps (bits/second). (4) Robustness: The watermark should be resistant to common malicious attacks as well as signal processing. However, no algorithm can meet all the demands mentioned above; the goal of a watermarking algorithm is to achieve an appropriate trade-off between the requirements. Security-oriented applications may require high robustness and security, because such applications are more likely to face malicious attacks. By contrast, non-security-oriented applications may require a high payload and a certain degree of robustness against one or two specific attacks that are known in advance.
The rest of this paper is organized as follows. Section 2 gives a brief introduction to HE-AAC and audio watermarking algorithms. Section 3 introduces the proposed audio watermarking algorithm in detail. Section 4 discusses the experimental results, which evaluate the algorithm in terms of data payload, robustness under compression attacks, and imperceptibility. Finally, Section 5 concludes and discusses future directions.

HE-AAC
HE-AAC is an extension of low complexity AAC (AAC-LC), optimized for low-bit-rate applications such as streaming audio. It is standardized as a profile of the MPEG-4 audio standard. Two versions of HE-AAC are available: HE-AAC v1 and HE-AAC v2. HE-AAC v1 combines two principal technologies, spectral band replication (SBR) and AAC-LC. In addition to these, HE-AAC v2 uses a technique named parametric stereo (PS) that compresses stereo signals more effectively. Figure 1 shows the HE-AAC family of techniques. AAC-LC is a commonly used audio compression codec. It removes perceptually irrelevant components of the audio signal using a psychoacoustic model and keeps only audible information. Although it provides good audio quality at a bit rate of 128 kbps for mono audio, quality begins to decline significantly below this bit rate. To achieve good audio quality at a low compression bit rate, two complementary technologies, SBR and PS, are exploited.
SBR improves compression efficiency in the frequency domain. Most perceptually important audio information resides at low frequencies. SBR exploits the strong correlation between the high-frequency and low-frequency parts of audio signals to reconstruct the high-frequency signal by approximation and transposition of the low-frequency signal, rather than transmitting the high-frequency audio data.
PS improves the compression efficiency of stereo signals. It downmixes the stereo signal into a mono signal and extracts parametric stereo data that describe the differences and similarities between the two channels. When decoding, the original stereo signal is reconstructed from the mono signal and the parametric stereo data.
The spread spectrum (SS)-based watermarking scheme embeds a watermark bit into a host audio segment using pseudorandom number sequences that are shaped to fit under the masking threshold of the audio signal. At the extractor, the watermarks are extracted by a sliding correlator that correlates the received signal with the predefined spread spectrum template. One of the main drawbacks of existing SS-based audio watermarking methods is their low embedding capacity; hence, their main challenge is to increase embedding capacity while maintaining high imperceptibility and robustness.
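As an illustration, a minimal SS embed/detect round trip can be sketched in plain Python (a generic textbook-style sketch, not the implementation of any method cited here; the chip amplitude `chip_amp` and the seeded pseudorandom template are illustrative choices standing in for psychoacoustically shaped sequences):

```python
import random

def ss_embed(host, bit, chip_amp=0.1, seed=7):
    # Spread the bit over a pseudorandom +/-1 chip sequence (the template).
    rng = random.Random(seed)
    chips = [rng.choice((-1.0, 1.0)) for _ in host]
    sign = 1.0 if bit == 1 else -1.0
    return [h + sign * chip_amp * c for h, c in zip(host, chips)]

def ss_detect(received, seed=7):
    # Correlate against the same template; the sign of the correlation
    # recovers the embedded bit (a sliding correlator fixed at lag 0 here).
    rng = random.Random(seed)
    chips = [rng.choice((-1.0, 1.0)) for _ in received]
    corr = sum(r * c for r, c in zip(received, chips))
    return 1 if corr > 0 else 0
```

The long pseudorandom sequence is why capacity is low: each bit consumes an entire segment of host samples.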
The quantization index modulation (QIM) approach uses a set of quantizers to quantize host signal features to embed watermarking data, with each quantizer associated with different information. In most cases, to obtain high robustness, quantization is applied to coefficients in a transform domain rather than to signal samples. Although the QIM approach usually has the merits of low complexity and high embedding capacity, its robustness is low.
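The idea can be shown with a minimal scalar QIM sketch (illustrative only; practical schemes quantize transform-domain coefficients and use dithered lattices, and the step size `step` here is an arbitrary choice):

```python
def qim_embed(value, bit, step=0.1):
    # Two interleaved quantizer lattices: even multiples of `step`
    # encode bit 0, odd multiples encode bit 1.
    q = round(value / step)
    if q % 2 != bit:
        q += 1
    return q * step

def qim_extract(value, step=0.1):
    # The parity of the nearest lattice point reveals which quantizer was used.
    return round(value / step) % 2
```

Extraction survives perturbations smaller than half a step, which is exactly why robustness drops once compression noise exceeds that margin.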
Another audio watermarking approach is autocorrelation modulation [20][21][22]. Its basic idea is that inserting a delayed or advanced version of the host signal itself modifies the autocorrelation. In other words, a delayed and modulated version of the host signal serves as the watermark signal (the difference between the host and watermarked signals), which is added back to the host signal to form the final watermarked signal. A common watermarking approach is to introduce the watermark signal as noise, but lossy audio compression algorithms tend to remove most imperceptible artifacts, including typical low-level noise. Autocorrelation modulation instead introduces changes to the host signal that are characteristic of environmental conditions rather than random noise; using the host signal itself as the watermark signal can therefore be robust against lossy audio compression. In this paper, we present an autocorrelation modulation-based audio blind watermarking algorithm that maximizes robustness against low-bit-rate HE-AAC compression. The next section describes the proposed audio watermarking method in detail.
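The underlying effect is easy to demonstrate numerically: mixing an attenuated, delayed copy of a noise-like signal into itself raises its normalized autocorrelation at that lag (a toy demonstration in plain Python; the lag of 45 samples and the 0.3 attenuation are arbitrary illustrative values):

```python
import math
import random

def norm_corr(sig, tau):
    # Normalized correlation between the signal and its tau-sample delayed self.
    a, b = sig[tau:], sig[:-tau]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

rng = random.Random(0)
tau = 45
host = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
# Echo-like modification: mix in an attenuated copy of the host delayed by tau.
marked = [h + (0.3 * host[t - tau] if t >= tau else 0.0)
          for t, h in enumerate(host)]
```

For the unmarked noise, `norm_corr(host, tau)` stays near zero; for the marked signal it rises to roughly 0.3/(1 + 0.3²) ≈ 0.28, a change that encoders treat as part of the signal rather than as removable noise.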

Proposed Audio Watermarking Method
This section first provides the following definitions:

•	The host audio signal x(t) is bandpass filtered to produce the filtered audio signal s(t), which is further modulated to become the watermark signal. The bandpass filtering serves two purposes. One is to take only a portion of the host signal, which reduces the disturbance to the audio signal. The other is to avoid embedding watermarks at high frequencies, since HE-AAC entirely removes high-frequency signals during encoding. The lower and upper cutoff frequencies are denoted as F_lc and F_uc, respectively.
•	The filtered audio signal is then divided into successive frames, each of which has length T_f and contains two non-overlapping sections of samples. These two sections have equal length, and we call them the front subframe and the back subframe in this paper.
• One piece of watermark information is represented as one binary bit of value '0' or '1,' which is embedded in one frame.

•	The normalized correlation of the original signal and its delayed version (NCOD) is selected as the characteristic of each subframe. The embedding and extraction of watermark bits are decided by the difference between the NCOD values of the front subframe and the back subframe. The NCOD is calculated as

	NC = ( Σ_t s(t)·s(t − τ) ) / sqrt( Σ_t s(t)² · Σ_t s(t − τ)² ),

	where i (i = 0, 1, 2, ...) represents the frame index and the sums accumulate over the integration time T (in samples) within the subframe. T should satisfy T ≤ T_f − τ to avoid intersymbol interference. For a frame, NC_i1 and NC_i2 are the NCOD values of the front subframe and the back subframe, respectively. Since the correlation value is normalized, NC ∈ [−1, 1].

•	The difference between NC_i1 and NC_i2 is computed to obtain

	D_i = NC_i1 − NC_i2.

	If the watermark bit is '1,' D_i should be larger than 0 (D_i > 0). If the watermark bit is '0,' D_i should be smaller than 0 (D_i < 0). In general, raising |D_i| enhances robustness. Based on the value of D_i, one bit of watermark information is embedded in and extracted from each frame.

Figure 2 presents the watermark embedding process. First, bandpass filtering is applied to the host signal x(t) to obtain the filtered signal s(t). Then, the filtered signal is separated into successive frames, and each frame is divided into a front subframe and a back subframe. We compute a gain based on the scale factor ξ_i, the original D_i, and the watermark bit w_i, and multiply this gain into the front and back subframes with mutually opposite signs to generate the watermark signal. Finally, the watermark signal is added back into the host signal with a specified delay τ to modify the NCOD values.
Appl. Sci. 2019, 9, 5 of 17
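Under the definitions above, the per-frame feature can be sketched in plain Python (an illustrative sketch only; the helper names `ncod` and `frame_difference` are ours, and in the real scheme the input is the bandpass-filtered signal s(t)):

```python
import math

def ncod(subframe, tau):
    # Normalized correlation of a subframe with its tau-delayed version,
    # accumulated over T = len(subframe) - tau samples.
    T = len(subframe) - tau
    num = sum(subframe[t] * subframe[t - tau] for t in range(tau, tau + T))
    e1 = sum(subframe[t] ** 2 for t in range(tau, tau + T))
    e2 = sum(subframe[t - tau] ** 2 for t in range(tau, tau + T))
    den = math.sqrt(e1 * e2)
    return num / den if den > 0.0 else 0.0

def frame_difference(frame, tau):
    # D_i = NC_i1 - NC_i2: NCOD of the front subframe minus that of the back.
    half = len(frame) // 2
    return ncod(frame[:half], tau) - ncod(frame[half:], tau)
```

A frame whose front subframe carries a τ-delayed echo while the back subframe does not yields D_i > 0, i.e., a '1' under the convention above.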

Watermark Embedding Scheme
When the modulated subframes are added back to the host signal, sudden changes at the subframe boundaries cause audible noise. To avoid this, the amplitudes at the boundaries of the modulated subframes are attenuated. In our implementation, the window function is calculated by the following equation:

	f_w(k) = k·|∆k| for 0 ≤ k < l_a;  1 for l_a ≤ k < L − l_a;  (L − 1 − k)·|∆k| for L − l_a ≤ k < L,

where ∆k is the attenuation step, l_a is the attenuation length at the boundary, and L is the subframe length. We set |∆k| = 1/l_a. Figure 3 shows the window. The watermark embedding procedure of each frame is illustrated by the pseudocode in Algorithm 1, where b_i is the bipolar term of w_i, w_i ∈ {0, 1} ⇒ b_i ∈ {−1, +1}.
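The attenuation window described above can be written directly from its definition (a sketch assuming a linear ramp of l_a samples with step |∆k| = 1/l_a at each boundary; the function name `fade_window` is ours):

```python
def fade_window(length, l_a=10):
    # Linear fade-in/fade-out of l_a samples (step 1/l_a) at each boundary;
    # flat (gain 1.0) across the middle of the subframe.
    w = []
    for k in range(length):
        if k < l_a:
            w.append(k / l_a)                  # rising edge
        elif k >= length - l_a:
            w.append((length - 1 - k) / l_a)   # falling edge
        else:
            w.append(1.0)                      # flat middle
    return w
```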
TH_i is the threshold for determining the gap in the NCOD relation between the front subframe and the back subframe.

Algorithm 1. Watermark embedding process
1: compute NC_i1 and NC_i2 of the current frame
2: b_i ← 2·w_i − 1
3: D_i ← b_i · (NC_i1 − NC_i2)
4: if D_i ≥ TH_i then
5:   return // return without performing any action
6: else
7:   g_i = TH_i − D_i
8: end
9: front subframe = front subframe · (b_i · g_i) · f_w
10: back subframe = back subframe · (−b_i · g_i) · f_w

If D_i already meets the watermark embedding condition, no further modulation of the frame is required. Otherwise, a gain value g_i is calculated as TH_i minus D_i, and this gain is multiplied into the front subframe, while the back subframe is multiplied by the gain with the opposite sign. Both subframes are further multiplied by the f_w window function.
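The gain computation of Algorithm 1 can be sketched as follows (a sketch of our reading of the scheme: we assume the bipolar term b_i supplies the modulation sign so that bit '0' pushes D_i negative; `embed_frame` and its argument names are ours):

```python
def embed_frame(front, back, wm_bit, d_natural, th, window):
    # One frame of Algorithm 1. d_natural is the unmodified NC_i1 - NC_i2,
    # th is the threshold TH_i, and window is the fade window f_w.
    b = 2 * wm_bit - 1                  # bipolar term: {0,1} -> {-1,+1}
    d = b * d_natural
    if d >= th:
        return None                     # condition already met: do nothing
    g = th - d                          # gain g_i
    wm_front = [x * (b * g) * w for x, w in zip(front, window)]
    wm_back = [x * (-b * g) * w for x, w in zip(back, window)]
    return wm_front, wm_back            # watermark signal, added back with delay tau
```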
In other words, the natural NCOD is first calculated, and then the necessary modifications are determined. For better visualization, Figure 4 illustrates the watermark embedding procedure for two frames.
The scale factor ξ_i is used to balance the requirements of robustness and transparency. The main purpose of the proposed audio watermarking algorithm is robustness against HE-AAC compression attacks, so we designed the algorithm to adaptively adjust ξ_i using the signal-to-noise ratio (SNR) between the host signal and the HE-AAC compressed signal. For convenience, we call this SNR the CSNR in this paper. Figure 5 presents the calculation process of ξ_i. The CSNR of each frame is calculated by the following formula:

	CSNR_i = 10·log10( Σ_t x(t)² / Σ_t (x(t) − y(t))² ),

where x(t) is the host signal, y(t) is its compressed version, and the sums run over the samples of frame i. A lower CSNR value indicates that stronger compression was applied to the frame, meaning that many redundant signal components were removed. Thus, a frame with a lower CSNR value is given a larger ξ_i value to guarantee robustness, and vice versa.
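The per-frame CSNR computation can be sketched as follows (the function name is ours; in the actual pipeline y(t) comes from an HE-AAC encode/decode round trip):

```python
import math

def csnr_db(host_frame, compressed_frame):
    # CSNR of one frame: SNR between the host signal x(t) and its
    # compressed/decoded version y(t), in dB.
    sig = sum(x * x for x in host_frame)
    err = sum((x - y) ** 2 for x, y in zip(host_frame, compressed_frame))
    if err == 0:
        return float("inf")   # lossless round trip
    return 10.0 * math.log10(sig / err)
```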

Watermark Extraction Scheme

Figure 6 shows the proposed watermark extraction procedure, where the |·| symbol denotes the absolute value function. The procedure needs to know the embedding parameters in advance: the passband frequencies [F_lc, F_uc], the frame length T_f, and the delay τ. The watermark extraction procedure can be regarded as part of the embedding procedure. Assuming that the start point of watermark embedding has been found, the watermarked signal x̄(t) is first filtered to produce the filtered watermarked signal s̄(t). Then, formulas (1)–(7) are used to calculate the D_i of each frame. If D_i > 0, the detected bit is '1.' If D_i < 0, the detected bit is '0.' Note that the proposed method does not need the original signal in the watermark extraction process, which makes it a blind audio watermarking method.
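The extraction decision can be sketched end-to-end (an illustrative plain-Python sketch; the echo-based test frames stand in for genuinely watermarked audio, and the helper names are ours):

```python
import math

def ncod(subframe, tau):
    # Normalized correlation of a subframe with its tau-delayed version.
    T = len(subframe) - tau
    num = sum(subframe[t] * subframe[t - tau] for t in range(tau, tau + T))
    e1 = sum(subframe[t] ** 2 for t in range(tau, tau + T))
    e2 = sum(subframe[t - tau] ** 2 for t in range(tau, tau + T))
    den = math.sqrt(e1 * e2)
    return num / den if den > 0.0 else 0.0

def extract_bits(filtered, frame_len, tau):
    # Blind extraction: only the frame length and delay are needed,
    # never the original host signal.
    bits = []
    for start in range(0, len(filtered) - frame_len + 1, frame_len):
        frame = filtered[start:start + frame_len]
        half = frame_len // 2
        d = ncod(frame[:half], tau) - ncod(frame[half:], tau)
        bits.append(1 if d > 0 else 0)
    return bits
```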


Feedback Process
A feedback process was added to the watermark embedding procedure (Figure 2) to ensure that the watermark is embedded with enough strength. As illustrated in Figure 7, after embedding the watermark, the generated watermarked signal x̄(t) is immediately fed into the watermark extractor to calculate the D_i values. By comparing the extracted D_i with TH_i, we know whether the desired watermarking strength has been achieved. If the strength is insufficient, the gain g is increased by a small quantity α, and the signal is passed back to the watermark embedder to generate a stronger watermark signal. The feedback process works iteratively.
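The feedback loop can be sketched abstractly (a sketch; the callable `measure_d` stands in for the embed-then-extract round trip, which in the real system involves the full embedder and extractor):

```python
def feedback_embed(g_init, th, measure_d, alpha=0.005, max_iter=5):
    # Iteratively raise the gain until the extracted difference D_i
    # meets the threshold TH_i, or the iteration budget is spent.
    g = g_init
    for _ in range(max_iter):
        if measure_d(g) >= th:
            break                # desired watermark strength reached
        g += alpha               # strengthen the watermark by alpha
    return g
```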




Experimental Results
In this section, we report the results of extensive experiments that demonstrate the performance of the proposed audio watermarking algorithm. We first describe parameter optimization experiments, followed by a comparison with some state-of-the-art audio watermarking algorithms. We also show experimental results on stereo and 5.1 channel audio. Finally, synchronization is discussed.

Like other watermarking studies, we used the bit error rate (BER) as a metric to evaluate the robustness of the watermarking algorithm. It is defined as

	BER = (number of error bits / total number of watermark bits) × 100%.

The objective difference grade (ODG) was used to measure imperceptibility; it is one of the output values of the Perceptual Evaluation of Audio Quality (PEAQ) measurement technique prescribed by the ITU-R BS.1387 standard [25]. ODG values range from 0.0 to −4.0 (imperceptible to very annoying), as shown in Table 1. In the rest of this section, unless otherwise noted, experiments were performed with the following default parameters: frame length T_f = 500 samples (96 bps), delay τ = 45, integration time T = 500/2 − 45 = 205 samples, l_a = 10 samples, ∆k = 0.1, α = 0.005, and passband frequencies [F_lc, F_uc] = [2500, 5500] Hz. The CSNR was calculated using 24 kbps HE-AAC v1 compression. We terminated the feedback process of a frame if it had run more than five times, which resulted in an average of 2.13 feedback iterations per frame.
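The BER metric defined above is straightforward to compute (sketch; the function name is ours):

```python
def bit_error_rate(sent, received):
    # BER = (number of mismatched bits / total number of bits) * 100%.
    errors = sum(1 for a, b in zip(sent, received) if a != b)
    return 100.0 * errors / len(sent)
```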
The ODG was measured with the basic PEAQ model implemented by the Telecommunications & Signal Processing Laboratory of McGill University [26]. The "fdkaac.exe" software [27], developed by Fraunhofer, was used to perform HE-AAC compression, and "ffmpeg.exe" [28] was used to perform MP3 compression and to decode the compressed audio.

Experiments on Mono Audio
We used seven audio clips belonging to seven different genres as host audio signals to illustrate the performance of our watermarking algorithm on mono-channel audio. They were "Drama," "Debate," "Sports Commentary," "Classical Music," "Jazz Music," "Pop Music," and "Rock Music." All audio files were in the WAVE format, mono, sampled at 48 kHz, quantized with 16 bits, and 30 s long.
The scale factor ξ_i was adaptively adjusted by the CSNR, as mentioned in Section 3.1. Figure 8 shows the CSNR under 24 kbps HE-AAC v1 compression for two test audio signals, together with the summed CSNR histogram of all seven test audio signals. From the histogram, we can see that most CSNR values fell between 0 and 15. Based on this distribution, we found that calculating ξ_i with Equation (11) balanced the trade-off between robustness and imperceptibility well in our implementation.
The data payload refers to the number of bits that can be embedded into the audio signal within a unit of time and is measured in bps (bits per second). The data payload of our algorithm is determined by the frame length T_f, since one watermark bit is embedded per frame. For example, if T_f = 500 and the sampling rate of the host audio is 48,000 Hz, we can embed 48,000/500 = 96 bits per second.
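This relationship is simply the sampling rate divided by the frame length (sketch; the function name is ours):

```python
def data_payload_bps(sample_rate, frame_len):
    # One watermark bit is embedded per frame of frame_len samples.
    return sample_rate / frame_len
```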
As the frame length affects the imperceptibility and robustness of our algorithm, Table 2 and Figure 9 show how changing the data payload influenced the ODG and BER, where the BER was calculated after HE-AAC v1 24 kbps compression. From the table and graph, we can see that the BER was under 2% when the data payload was lower than 96 bps, but it rose rapidly from 1.77 to 8.77% when the data payload increased from 96 to 192 bps (T_f = 250). As the data payload increased, the ODG increased slightly and remained above −1, which indicates that the original and watermarked audio signals were perceptually similar and not annoying.
Besides the frame length, the other important parameter of the proposed algorithm is the passband frequency. If the selected passband is too low or too high, it is likely to yield poor BER results, as audio compression removes most of the signal in those frequency ranges based on the psychoacoustic model. Moreover, HE-AAC applies SBR to directly cut off high frequencies, which further worsens the BER. Table 3 and Figure 10 show how the selected passband influences the ODG and BER, where the BER was calculated after HE-AAC v1 24 kbps compression. As shown in the table and graph, when the lower cutoff frequency was higher than 3.5 kHz, the BER was over 5% and rose rapidly. When using a passband of 0.5~3.5 kHz, the ODG was lower than −1 (all other frequency ranges yielded ODGs above −1), and the BER was also worse than with 1.5~4.5 kHz and 2.5~5.5 kHz. Therefore, we conclude that the 1.5~5.5 kHz frequency range is a good choice for embedding watermarks with the proposed algorithm. Note that this frequency range is very sensitive for the human hearing system according to the absolute threshold of hearing. Almost all audio watermarking methods are designed to avoid this range to meet the imperceptibility requirement, yet our method achieved high imperceptibility in the 1.5~5.5 kHz band while maintaining a low BER.
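A band-limited embedding region such as 2.5~5.5 kHz can be obtained with a standard windowed-sinc FIR band-pass filter (a generic DSP sketch, not the paper's actual filter; the tap count and the Hamming window are our choices):

```python
import math

def bandpass_fir(num_taps, f_lo, f_hi, fs):
    # Windowed-sinc band-pass FIR taps: difference of two low-pass
    # prototypes (cutoffs f_hi and f_lo), shaped by a Hamming window.
    taps = []
    m = num_taps - 1
    for n in range(num_taps):
        k = n - m / 2
        if k == 0:
            h = 2 * (f_hi - f_lo) / fs
        else:
            h = (math.sin(2 * math.pi * f_hi * k / fs)
                 - math.sin(2 * math.pi * f_lo * k / fs)) / (math.pi * k)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)   # Hamming window
        taps.append(h * w)
    return taps

def convolve(signal, taps):
    # Direct-form FIR filtering (zero initial state).
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out
```

With 101 taps at 48 kHz, a 4 kHz tone inside the passband passes nearly unchanged while a 500 Hz tone is strongly attenuated.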
Therefore, we can conclude that the 1.5~5.5 kHz frequency range is a good choice for embedding watermarks with the proposed algorithm. Note that this range of frequency bands is very sensitive to the human hearing system according to the absolute threshold of hearing. Almost all audio watermarking methods were designed to avoid this range of frequency to meet the imperceptibility requirement, though our method achieved high imperceptibility on the 1.5~5.5 kHz frequency band while maintaining a low BER.  Delay is also a significant factor which influences the performance of the proposed algorithm. If the delay is too large, the accumulation samples for calculating autocorrelation will decrease, which will cause an increase in the BER. The results in Table 4 and Figure 11 show how the delay had and effect on the ODG and BER. We can see that the delay around 45 samples gave good result where the ODG was over −1 and the BER was under 3%. Though the ODG got better when delays were over 100 samples, BER rose over 3% will cause an increase in the BER. The results in Table 4 and Figure 11 show how the delay had and effect on the ODG and BER. We can see that the delay around 45 samples gave good result where the ODG was over −1 and the BER was under 3%. Though the ODG got better when delays were over 100 samples, BER rose over 3% Figure 12 and Table 5 show how the attenuation length at the boundary of the window influenced the BER and ODG. It can be observed that the ODG steadily rose as the attenuation length increased and the BER decreased. An attenuation length of 10-25 is preferable, as the BER increased quickly over 25, and the ODG was relatively low at 5.     Figure 12 and Table 5 show how the attenuation length at the boundary of the window influenced the BER and ODG. It can be observed that the ODG steadily rose as the attenuation length increased and the BER decreased. 
An attenuation length of 10-25 is preferable, as the BER increased quickly over 25, and the ODG was relatively low at 5. In Table 6, we illustrate our comparison results. Our comparison was based on reported results of recently published papers [8,9,12,13,23,24] and was given for the data payload, ODG, and BER under MP3 compression (32 kbps, 64 kbps, 96 kbps, 128 kbps). We also listed the BER under various HE-AAC v1 compression bitrates (16 kbps, 24 kbps, 32 kbps, 64 kbps) with various data payloads. As we can see from the table, the proposed algorithm is competitive compared with other methods at similar data payloads under MP3 compression, indicating that our algorithm is also able to be robust against MP3 compression. Comparing the BER at the same bit rate (32 kbps, 64 kbps) on MP3 and HE-AAC v1 compressions of our algorithm, we found that the BER under HE-AAC v1 was nearly half of the MP3. From the BER results at various HE-AAC v1 compression bitrates, we can see that our algorithm was able to be robust against HE-AAC v1 compression where the bit rate was higher  In Table 6, we illustrate our comparison results. Our comparison was based on reported results of recently published papers [8,9,12,13,23,24] and was given for the data payload, ODG, and BER under MP3 compression (32 kbps, 64 kbps, 96 kbps, 128 kbps). We also listed the BER under various HE-AAC v1 compression bitrates (16 kbps, 24 kbps, 32 kbps, 64 kbps) with various data payloads. As we can see from the table, the proposed algorithm is competitive compared with other methods at similar data payloads under MP3 compression, indicating that our algorithm is also able to be robust against MP3 compression. Comparing the BER at the same bit rate (32 kbps, 64 kbps) on MP3 and HE-AAC v1 compressions of our algorithm, we found that the BER under HE-AAC v1 was nearly half of the MP3. 
From the BER results at the various HE-AAC v1 compression bit rates, we can see that our algorithm is robust against HE-AAC v1 compression with a low BER when the bit rate is higher than 24 kbps. Unfortunately, we could not provide a comparison under HE-AAC v1, as no published paper has evaluated robustness under HE-AAC v1 compression.
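To make the band-limited embedding idea concrete, the sketch below isolates the recommended 1.5~5.5 kHz passband with a Butterworth bandpass filter. This is not the authors' implementation; the filter design (order 6, SciPy's `butter`/`sosfiltfilt`) and the synthetic test tones are our assumptions.

```python
# Sketch (assumed implementation): extract the 1.5~5.5 kHz embedding band.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def embedding_band(x, fs=48000, order=6):
    """Return the component of x inside the 1.5~5.5 kHz passband."""
    sos = butter(order, [1500, 5500], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering so the band component stays time-aligned with x.
    return sosfiltfilt(sos, x)

fs = 48000
t = np.arange(fs) / fs
# 3 kHz tone (inside the band) plus 10 kHz tone (outside the band)
x = np.sin(2 * np.pi * 3000 * t) + np.sin(2 * np.pi * 10000 * t)
y = embedding_band(x, fs)
# y retains essentially only the 3 kHz component.
```

Only the band component would be modulated for embedding; the out-of-band residual is left untouched so that HE-AAC's SBR high-frequency cutoff does not destroy the watermark.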

Experiments on 5.1 Channel and Stereo Audio
Multichannel audio systems have become more and more popular in home entertainment environments. As HE-AAC is one of the most efficient audio codecs for multichannel audio, here we present experimental results under HE-AAC v1 and v2 compression by applying our watermarking algorithm to 5.1 channel audio and stereo audio, respectively. The CSNR for 5.1 channel audio was computed against HE-AAC v1 128 kbps compression and, for stereo audio, against HE-AAC v2 48 kbps compression.
We used twelve types of 5.1 channel audio, listed in Table 7. All 5.1 channel audio files were in WAVE format, sampled at 48 kHz, quantized with 16 bits, and 10~20 s long. Figure 13 shows the time domain waveforms of the "Bach organ" 5.1 channel audio. The channels, from top to bottom, are front left (FL), front right (FR), center (C), low-frequency effects (LFE), surround left (SL), and surround right (SR). We independently embedded a watermark in each channel except LFE, as that channel contained nearly no signal in our test audio. Table 8 provides the BER of each channel of the 5.1 channel audio under HE-AAC v1 128 kbps compression, as well as the corresponding ODG of each audio. The ODG was first calculated for each channel except LFE and then averaged per audio for reporting in the table. From the table, we can observe large differences in the BER between channels of the same audio. For example, the BER of the center channel of the No. 1 audio was only 0.18%, while its front right channel BER was 6.21%, over 5% higher. Regarding the ODG, all test audio yielded ODG values larger than −1, except for three that were slightly lower than −1. The average BER of the twelve test audios was 2.53%, and the average ODG was −0.84.
For testing stereo audio, we down-mixed the 5.1 channel audio with the following formulas, FL = FL + 0.707 × FC + 0.5 × SL and FR = FR + 0.707 × FC + 0.5 × SR, to create the test stereo audio. Figure 14 shows the time domain waveforms of the "Bach organ" stereo audio, which was made from the corresponding 5.1 channel audio. As in the experiment on 5.1 channel audio, we independently embedded a watermark in each channel. Table 9 provides the BER of each channel of the stereo audio under HE-AAC v2 48 kbps compression, as well as the corresponding ODG of each audio type. From the table, we can see that the average BER of the twelve test audios was 1.28%, and the average ODG was −0.78.
The overall results are summarized in Table 10. It can be seen that our watermarking algorithm is robust against HE-AAC v1 and HE-AAC v2 compression when the bit rate is higher than 128 kbps and 32 kbps, respectively, for 5.1 channel and stereo audio, as the BER was under 3% at a data payload of 96 bps and the ODG was larger than −1. However, our algorithm showed poor robustness against extremely low bit rate compression, such as 5.1 channel audio with HE-AAC v1 64 kbps and stereo audio with HE-AAC v2 16 kbps, where the BER was over 20%.
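The down-mix formulas above translate directly into code. The sketch below is a minimal NumPy version; the channel array names are our assumptions, and, as in the formulas, the LFE channel does not contribute to the mix.

```python
# Sketch of the 5.1-to-stereo down-mix used to create the test stereo audio,
# following the formulas in the text. LFE is omitted from the mix, as in the
# paper's formulas.
import numpy as np

def downmix_5_1_to_stereo(fl, fr, fc, sl, sr):
    """Down-mix 5.1 channel arrays (minus LFE) to a stereo pair."""
    left = fl + 0.707 * fc + 0.5 * sl
    right = fr + 0.707 * fc + 0.5 * sr
    return left, right

# Toy constant-valued channels to check the coefficients.
n = 4
fl, fr = np.ones(n), np.zeros(n)
fc = np.ones(n)
sl, sr = np.ones(n), np.ones(n)
left, right = downmix_5_1_to_stereo(fl, fr, fc, sl, sr)
# left = 1 + 0.707 + 0.5 = 2.207; right = 0 + 0.707 + 0.5 = 1.207
```

In practice, the resulting stereo signal would normally be re-normalized to avoid clipping before being written back to 16-bit WAVE, a step the formulas in the text leave implicit.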

Synchronization Design
In order to apply the audio watermarking technique in real situations, the watermark information should be embedded repeatedly in the audio signal in block units. Each block is further segmented into a synchronization code part and a watermark information part. Synchronization is an effective way to accurately identify the watermark location.
In our algorithm, we can use a certain number of consecutive frames as the synchronization code part by embedding a fixed watermark sequence. In the extraction process, by checking whether the extracted consecutive watermark bits are the same as the synchronization code, or whether a certain proportion of them is correct, we can determine the start of a block.
As we used the correlation between the signal and a delayed version of itself to extract the watermark, finding the exact start sample of each watermarked frame was not necessary, because the correlation values D around the start point were similar. Figure 15 shows the D values calculated for each of the 5000 samples of the "Drama" audio after HE-AAC v1 24 kbps compression, where the frame length was 500 samples and the watermark bits [1, 0, 1, 0, 1, 0, 1, 0, 1, 0] were embedded. As mentioned in Section 3.2, the D value is calculated to determine the extracted watermark bit. From the figure, we can see that the D values changed slowly and did not change abruptly within a few samples. Hence, we can efficiently find the start of a block by skipping a number of samples at each step when checking for the synchronization code, instead of brute-force searching through every sample. The number of samples to skip depends on the frame length; a longer frame length allows a larger skip.
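The skip-search relies on the decision value varying slowly with the start offset. The sketch below illustrates this with a generic normalized autocorrelation at a fixed delay; it is not the paper's exact D from Section 3.2, and the smooth synthetic signal, the frame length of 500, delay of 45, and skip of 10 samples are assumptions chosen to match the values discussed in the text.

```python
# Hypothetical sketch of the skip-search idea: evaluate a D-like value only
# at candidate offsets spaced several samples apart.
import numpy as np

def ncorr(frame, delay):
    """Normalized correlation between a frame and its delayed copy."""
    a, b = frame[:-delay], frame[delay:]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# A smooth random signal stands in for decoded audio (assumption, not data
# from the paper's experiments).
rng = np.random.default_rng(0)
x = np.convolve(rng.standard_normal(6000), np.ones(64) / 64, mode="same")

frame_len, delay = 500, 45           # values used in the text
# Skip 10 samples between candidate offsets instead of testing every sample.
d_vals = [ncorr(x[s:s + frame_len], delay) for s in range(0, 2000, 10)]
```

Because consecutive `d_vals` differ only slightly, a synchronization code decoded at a skipped offset still matches (or nearly matches, under a proportion threshold) the embedded code, which is what makes the coarse search safe.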

To ensure that the complete watermark information can be extracted, an appropriate error correcting code should be applied according to the BER and the length of the watermark information.
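The paper leaves the choice of error correcting code open. As one minimal illustration (our assumption, not the authors' choice), a rate-1/3 repetition code with majority-vote decoding corrects isolated bit flips of the kind a low BER produces:

```python
# Sketch of a simple ECC option: rate-1/3 repetition code with majority vote.
def rep3_encode(bits):
    """Repeat each watermark bit three times."""
    return [b for b in bits for _ in range(3)]

def rep3_decode(bits):
    """Majority-vote each group of three received bits."""
    return [1 if sum(bits[i:i + 3]) >= 2 else 0 for i in range(0, len(bits), 3)]

msg = [1, 0, 1, 1]
coded = rep3_encode(msg)      # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
coded[1] ^= 1                 # simulate two isolated channel errors
coded[5] ^= 1
decoded = rep3_decode(coded)  # still recovers [1, 0, 1, 1]
```

A repetition code triples the payload cost; for the BERs reported here (a few percent), a stronger code such as BCH would give a better payload trade-off, which is why the text recommends matching the code to the measured BER and message length.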

Conclusions
In this paper, an autocorrelation modulation-based audio blind watermarking algorithm was proposed. The difference between the NCODs of the front subframe and the back subframe was used to embed and extract the watermark bit of each frame. The SNR before and after HE-AAC compression was used to adaptively adjust the scale factor, which in turn balances the trade-off between robustness and imperceptibility. In addition, a feedback process was added to the watermark embedding procedure to ensure that the watermark was embedded with sufficient strength.
With optimized parameters, the experimental results show that our algorithm is robust against low bit rate HE-AAC compression (24 kbps for mono, 32 kbps for stereo, and 128 kbps for 5.1 channel audio), keeping the BER under 3% while ensuring a high level of imperceptibility (an average ODG over −1) and data payload (96 bps). Synchronization for our algorithm was also discussed.
In the future, we will study a more efficient and effective watermarking method for multichannel audio to achieve a higher data payload. We will also explore a suitable error correction code and synchronization scheme for the method to develop a real-time application.