High-Capacity Data-Hiding Scheme on Synthesized Pitches Using Amplitude Enhancement — A New Vision of Non-Blind Audio Steganography

This work proposes a new, non-blind steganographic scheme for synthesized pitches. Synthesized music is widely used to demonstrate early versions of compositions conveniently and at low cost. It can also be used to pass secrets or to establish digital rights. The method consists of two procedures: the first is the realistic simulation of synthesized pitches using a computer, and the second is the hiding of secrets in the generated pitches. The first part of this paper reviews attempts to discover the fundamental patterns of synthesized pitches and develops a strategy for generating approximate pitches using a computer. The component frequencies are used to generate a pitch in which to hide secrets. Legal receivers receive the referenced composition and frequencies, enabling them to generate the synthesized pitches according to the main frequencies of the referenced composition. Finally, the generated and received pitches are compared to identify the secret bits. As more frequencies are used to hide secret bits, more secret bits can be embedded in the synthesized pitches. The use of more frequencies also makes synthesized pitches closer to real ones. The performance of the proposed method is compared with that of competing methods and evaluated under common attacks.


Introduction and Related Work
The Internet is a popular environment in which people exchange personal data. Accordingly, great importance is now attached to information security. Many techniques for keeping data confidential have been developed. The field of steganography concerns hiding data by embedding messages in insignificant media before transmission. Data-hiding schemes hide secrets in cover media, producing stego-media. The approach enables users to discover attempts by intruders to replace original messages with fabricated content. One of its applications is to back up personal, private data or information on the Internet. Presently, many people frequently upload personal and secret data to cloud services, while reasonably distrusting cloud vendors. The objective of data hiding is to increase hiding capacity while reducing the likelihood that intruders can identify that anything is hidden [1]. Figure 1 displays some applications of steganography: (a) when a user backs up sensitive data on a cloud storage service, the user does not want the data to be accessible even though it is encrypted; (b) when a user creates multimedia files, the user wants to establish ownership before publishing them; and (c) when a user wants to communicate with someone without being noticed, the user makes the messages imperceptible.
Symmetry 2017, 9, 92

Popular multimedia includes images, audio, video and text [1-16]. Audio-based data-hiding methods [3,17-23] can be divided into two categories. Time domain-based operations tend to replace the least significant bits [3,20] and the echo parts of a signal [3,23]. Frequency domain-based operations include directly hiding secrets in high or low frequencies and spreading the secrets throughout band frequencies. The former approach is known as psychoacoustic masking [3,21] and the latter is known as spread spectrum [3,22,23]. Most audio-based methods suffer from the same problem as image-based methods, namely distortion between the cover and the stego-medium. Phase coding and spread spectrum methods are safer because they are designed to place secrets in imperceptible frequencies [3], although their computation times are greater than those of other schemes.
Traditional data-hiding strategies are based on digitalized multimedia. Distortion is the most important limitation of steganography, and it must therefore be controlled to reduce awareness of the generated stego-media. Other methods for hiding data in audio are based on ideas that have been proposed by various authors [3,20-23]. They include least-significant-bit (LSB), phase coding, spread spectrum, echo data-hiding and psychoacoustic masking methods. Akhaee et al. [17] proposed a robust data-hiding algorithm to avoid statistical cracking. They found that most time-domain data-hiding schemes are weak and can be easily decoded by current steganalytic strategies. Their algorithm applies correlated quantization to embed data using a histogram-based detector. Huang et al. [19] presented a new steganographic scheme with variable capacity and synchronization for the secure multimedia transmission of acoustic data in real time. Atoum et al. [18] proposed a new data-hiding scheme that is based on the mp3 file format. They were concerned that human ears are very sensitive to audio features, so the audio content should not be modified. Their method, therefore, hides secret data only between the frames of an mp3 file. Yamamoto and Iwakiri [24] developed a method of identifying the ownership of digital instrumental audio. They mentioned that digital instrumental audio is now very popular on the Internet. Accordingly, the present work develops a data-hiding scheme that is based on the fundamental composition of simulated instrumental audio.
Most of the above methods are based on traditional digitalized audio and are therefore restricted by distortion and exhaustion of transmission: the more secrets are embedded in a cover medium, the greater the difference between the cover and the stego-medium. They can be used simply to modify media content, but in so doing, they limit the capacity for and security of the hidden secrets. Currently, much content is transmitted through the Internet, and a fraction of it is made by computer animators and includes text and simulated media [25-32]. Cayre and Macq [28] were the first to propose a data-hiding method that was based on 3D polygons. Their scheme is applied to 3D meshes of triangles, and extends one of the simplest data-hiding techniques, called the triangle strip peeling sequence (TSPS) technique. The basic idea of the TSPS algorithm is the insertion of bits in a path traced on the mesh. In 2009, Chao et al. [29] improved Cayre's algorithm and increased its capacity. They redefined 3D polygons as multilayered triangles such that each layer could be used to hide secrets. They also used coordinates to define each polygon and defined a multi-layer such that the capacity of their method was 3n times that of Cayre's method (where n is the number of layers).
Inspired by the above research, the authors developed a new data-hiding scheme that uses synthesized musical pitches to embed more secrets and reduce distortion. The scheme reproduces a synthesized musical pitch in which it embeds secrets by amplitude enhancement. The enhancement adds only slight noise, and legal receivers have only to compare the magnitude of each amplitude with that in the standard patterns of synthesized pitches. The restriction is that only one pitch can be used at a time. Sound synthesis is an important topic in the simulation of digital musical instruments. This paper is organized as follows. Section 2 first introduces the fundamental principle of the synthesis of musical notes, which is the basis of the data-hiding scheme that is presented herein. A reliable formula for synthesizing notes is introduced and some simulations are carried out. Then, a data-hiding scheme is presented. Section 3 presents practical experiments that demonstrate the feasibility and capability of the scheme. Section 4 compares the performance of the proposed method with that of others and presents some other evaluations and theoretical analysis. Finally, Section 5 draws conclusions. All experiments are analyzed using MATLAB (2006a, The MathWorks, Natick, MA, USA).

Materials and Methods
References [33,34] described efforts to characterize objectively the quality of piano tones, as understood by musicians, and tried to find synthetic tones that would be considered better than real piano tones. Casey showed that a two-layer feed-forward model can perform inverse mapping for a simple physical model of a string [35]. References [36,37] showed that the numerical approach and the underlying physical model can be improved to simulate the motion of a piano string with a high degree of realism. This work develops a model of instruments as follows. First, a discrete Fourier transform (DFT) is utilized to transform the sampled sound data of an instrument from the time domain to the frequency domain. When a real instrumental pitch is recorded, the analog acoustic signal is digitized by being sampled automatically by a computer. The DFT can then be adopted to classify and decompose the frequency composition of a single pitch. Second, using the frequency-domain function, the sound of an instrument is described as a pattern, which is generated using the DFT and the inverse DFT. Section 2.1 presents the fundamentals of the methodology, while Section 2.2 presents the proposed scheme.

Fundamentals
This section discusses why musical instruments produce such beautiful music, and introduces harmonics. Fourier transformations are based on the fact that a function in the time domain can be represented as a summation of cosine functions. Consider the periodic square wave plotted in Figure 2. The signal $x(t)$ can be represented by Equation (1) [38]:

$$x(t) = c_0 + \sum_{k=1}^{\infty} \frac{2\sin(2\pi k f_0 T_s)}{\pi k}\cos(2\pi k f_0 t), \quad (1)$$

where $c_0$ denotes the constant term.

Figure 2. A square wave signal [38].
The frequency $f_0$ is the fundamental frequency. As $k$ increases, the coefficient of the cosine function, $\frac{2\sin(2\pi k f_0 T_s)}{\pi k}$, decreases. Hence, only the coefficients for $k = 2, 3, 4$ and other low values are important. The cosine function for $k = 2$ is known as the second harmonic; that for $k = 3$ is the third harmonic, and so on. The sound of any musical note that is produced by a musical instrument contains the fundamental frequency and a few harmonics. Figure 3 plots the function of a real piano's Middle C in the time domain. The DFT is applied to the function in Figure 3 to obtain the frequency spectrum in Figure 4. Only a few of the magnitudes are marked because space is limited. After the frequency spectrum of Middle C on a real piano was obtained, the frequency spectra of all of the pitches that are produced in the middle region of a piano were found. Figure 5 displays these spectra.
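The decay of the harmonic coefficients can be checked numerically. The following Python sketch evaluates the coefficient $2\sin(2\pi k f_0 T_s)/(\pi k)$ for the first few values of $k$. The values chosen for `F0` and `TS` are illustrative assumptions and are not taken from the paper; the function name is likewise hypothetical.

```python
import math


def harmonic_coefficient(k, f0, ts):
    """Coefficient of the k-th cosine term: 2*sin(2*pi*k*f0*ts) / (pi*k)."""
    return 2.0 * math.sin(2.0 * math.pi * k * f0 * ts) / (math.pi * k)


F0 = 1.0    # assumed fundamental frequency (illustrative only)
TS = 0.25   # assumed value of T_s (illustrative only)

coefficients = [harmonic_coefficient(k, F0, TS) for k in range(1, 8)]
for k, c in enumerate(coefficients, start=1):
    # The magnitude of each coefficient is bounded by 2 / (pi * k).
    print(f"k={k}: coefficient={c:+.4f}  bound={2.0 / (math.pi * k):.4f}")
```

The individual coefficients oscillate in sign, but their magnitudes are bounded by $2/(\pi k)$, which is why the low-order terms dominate.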
A real musical instrument produces different pitches with different frequency patterns. Consider a randomly chosen musical instrument, such as a piano, and the playing of any note on it. Perform a DFT on the note. The $k$ frequencies with the largest magnitudes are selected. Denote these frequencies as $f_1, f_2, \ldots, f_k$, where $f_1 < f_2 < \cdots < f_k$, with magnitudes $a_1, a_2, \ldots, a_k$, $0 \le a_i \le 1$. Calculate $b_i = f_i / f_1$ for $1 \le i \le k$. Now suppose that the goal is to generate a frequency pattern for Middle C. The fundamental frequency of Middle C is known to be 262 Hz. Denote this frequency as $f_{ap}$. The Middle C that is generated by a real piano has frequencies $b_1 f_{ap}, b_2 f_{ap}, \ldots, b_k f_{ap}$ with magnitudes $a_1, a_2, \ldots, a_k$, respectively, and the inverse DFT generates the sound from these frequencies. Of course, musical sounds that are generated in this way are not expected to be the same as those produced by a piano. However, as demonstrated by the following experiment, they will be piano-like if $k$ is sufficiently large. The $k$ frequencies with the largest magnitudes are selected using the prune and search method [39].
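The selection of the main frequencies and the calculation of the multiples $b_i$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a naive DFT replaces an optimized transform, `heapq.nlargest` stands in for the prune and search method, the DC bin is skipped, magnitudes are normalized by the largest selected magnitude, and the toy signal is hypothetical.

```python
import cmath
import heapq
import math


def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns |X[m]| for bins m = 0 .. N//2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * m * t / n)
                for t, x in enumerate(samples)))
        for m in range(n // 2 + 1)
    ]


def main_frequencies(samples, sample_rate, k):
    """Return (f_i, a_i, b_i) for the k strongest non-DC bins, by rising f_i."""
    n = len(samples)
    mags = dft_magnitudes(samples)
    # heapq.nlargest stands in for the prune-and-search selection step.
    top_bins = sorted(heapq.nlargest(k, range(1, len(mags)),
                                     key=mags.__getitem__))
    peak = max(mags[m] for m in top_bins)  # assumed normalisation: a_i <= 1
    f1 = top_bins[0] * sample_rate / n     # lowest selected frequency f_1
    return [(m * sample_rate / n, mags[m] / peak, m * sample_rate / n / f1)
            for m in top_bins]


# Hypothetical toy signal: partials at 4, 8 and 12 Hz, one second at 64 Hz.
N = 64
toy = [math.cos(2 * math.pi * 4 * t / N)
       + 0.5 * math.cos(2 * math.pi * 8 * t / N)
       + 0.25 * math.cos(2 * math.pi * 12 * t / N) for t in range(N)]
pattern = main_frequencies(toy, sample_rate=64, k=3)
for f, a, b in pattern:
    print(f"f={f:.1f} Hz  a={a:.3f}  b={b:.3f}")
```

For a recorded note, `samples` would be the digitized pitch and `k` the number of main frequencies to keep.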
The following experiments involve synthesized pitches. Let $k = 10$ so the ten frequencies with the largest magnitudes are obtained. The magnitudes $a_1, a_2, \ldots, a_{10}$ and respective multiples $b_1, b_2, \ldots, b_{10}$ are found and shown in Table 1. Middle C on a piano has frequencies ($f_{ap}$, $1.0038 f_{ap}$, ..., $7.0843 f_{ap}$) with respective magnitudes (0.2635, 0.7042, ..., 0.2402). Let $f_{ap} = 262$, yielding frequencies of (262, 263, ..., 1856 Hz). Figures 6 and 7 plot the experimental results in the frequency and time domains. Comparing Figures 3 and 7, the waves are not similar.

Figure 8 presents the similarity between a simulated piano note with $k = 10{,}000$ and a real piano note. The difference between them is negligible and this fact will be exploited in the following section. A higher $k$ yields a smaller distortion and a higher capacity for embedding secrets. Various pitches from different musical instruments were simulated and analyzed.
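The synthesis step can be sketched in a few lines of Python. This is an illustration under stated assumptions: a direct sum of zero-phase cosines stands in for the inverse DFT, the 44,100 Hz sampling rate is assumed, and only the three (multiple, magnitude) pairs quoted in the text are used, so the other seven entries of Table 1 are omitted.

```python
import math

F_AP = 262.0  # fundamental frequency of Middle C in Hz, as given in the text

# (b_i, a_i) pairs quoted in the text; the remaining Table 1 entries are omitted.
PARTIALS = [(1.0, 0.2635), (1.0038, 0.7042), (7.0843, 0.2402)]


def synthesize(partials, f_ap, sample_rate, n_samples):
    """Sum of zero-phase cosines at frequencies b_i * f_ap with magnitudes a_i."""
    return [
        sum(a * math.cos(2.0 * math.pi * b * f_ap * t / sample_rate)
            for b, a in partials)
        for t in range(n_samples)
    ]


frequencies = [round(b * F_AP) for b, _ in PARTIALS]
pitch = synthesize(PARTIALS, F_AP, sample_rate=44100, n_samples=1024)
print(frequencies)
```

The rounded frequencies agree with the values 262, 263 and 1856 Hz reported for the first, second and last main frequencies.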

Data Hiding Scheme
The proposed data-hiding scheme involves the following steps. First, choose an instrumental pitch $P$ to be the reference pitch. With reference to the preceding sections, all of the required parameters can be obtained. These include the magnitudes $a_i$ and the main frequencies $b_i f_{ap}$. There are $k$ magnitudes $a_i$ and $k$ frequencies $b_i f_{ap}$ because the system is used only to generate the main $k$ frequencies of the signals. Next, suppose that the length of the secret bit stream $bt_1, bt_2, \ldots, bt_k$ is also $k$. If $bt_1 = 1$, then $a_1$ is increased to $\sigma a_1$, $1 < \sigma < 2$; the same operation is applied to $a_2, a_3, \ldots, a_k$. Figure 9 is the overview of the scheme. The sender uses the standard pattern of synthesized pitches and $k$ to encode the secrets and generate a synthesized pitch using amplitude enhancement. A legal receiver uses the standard pattern of synthesized pitches and $k$ to decode the secret.
Algorithm 1 is the simple encoding procedure. Algorithm 2 is the decoding procedure. The value $(\sigma - 1)a_1$ shall be too small to be perceived by the human ear. Consider, for example, Table 2: hiding the bit stream 1001101101 in the synthesized pitch yields the enhanced magnitudes, which are shown in the first table. Legitimate receivers obtain the reference instrumental pitch $P$, the length of the secret $k$ and the method of generation of the synthesized pitches. The decoding procedure is as follows. First, identify the main frequencies that correspond to the $k$ largest magnitudes from $P$ and the received pitch $p$. The frequencies of the former are denoted as $F_1, F_2, \ldots, F_k$ and those of the latter are denoted as $f_1, f_2, \ldots, f_k$. The magnitudes of all main frequencies are obtained as $A_1, A_2, \ldots, A_k$ and $a_1, a_2, \ldots, a_k$. Each $A_i$ is compared with the corresponding $a_i$; for Algorithm 4, Step 3, if $A_i \neq a_i$, then the secret bit $bt_i = 1$; otherwise, $bt_i = 0$. Finally, all $bt_i$ are concatenated to produce the secret bit stream.
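The simple encoding and decoding procedures described above can be sketched as follows. This is a minimal illustration, not a transcription of Algorithms 1 and 2: the function names, the comparison tolerance `tol`, the value of `SIGMA` and the reference magnitudes are all assumptions made for the example, and only the bit stream 1001101101 is taken from the text.

```python
def embed(magnitudes, bits, sigma):
    """Multiply a_i by sigma when the secret bit bt_i is '1'; leave it otherwise."""
    if not 1.0 < sigma < 2.0:
        raise ValueError("sigma must satisfy 1 < sigma < 2")
    if len(bits) != len(magnitudes):
        raise ValueError("one secret bit is needed per main frequency")
    return [a * sigma if bit == "1" else a for a, bit in zip(magnitudes, bits)]


def extract(reference, received, tol=1e-9):
    """Read '1' where the received magnitude differs from the reference one."""
    return "".join("1" if abs(rx - ref) > tol else "0"
                   for ref, rx in zip(reference, received))


# Hypothetical reference magnitudes A_1..A_10 (not the values of Table 2).
REFERENCE = [0.26, 0.70, 0.55, 0.31, 0.90, 0.12, 0.47, 0.66, 0.38, 0.24]
SECRET = "1001101101"   # bit stream used in the text
SIGMA = 1.01            # assumed enhancement factor

stego = embed(REFERENCE, SECRET, SIGMA)
recovered = extract(REFERENCE, stego)
print(recovered)
```

Because the receiver needs the reference magnitudes to perform the comparison, the sketch reflects the non-blind nature of the scheme.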
Because the synthesis of musical pitches and the data-hiding scheme are public, the above algorithms can be made more secure by including a parameter $R$, which is the actual order in which $a_i$ and $b_i f_{ap}$ are used during the data embedding procedure. $R$ is generated using a random number generator, and the seed of the generator is provided to legal receivers. The formal definition of $R$ is $R = \{r_i\}$, $1 \le r_i \le k$, where all $r_i$ have different values. An example follows. Consider $R = \{3, 1, 7, 9, 2, 10, 4, 6, 5, 8\}$; Table 3 is obtained after the complex version of the data embedding scheme is implemented. The red numbers indicate hidden secret "1" bits. Evidently, the positions of the secret bits differ from those in the second table. Algorithms 3 and 4 describe the complex version of the proposed scheme.

for all $R_i$, $1 \le i \le k$, decode secret bit $bt_i$ with reference to the following condition:

The above two embedding schemes focus on enhancing amplitudes when secret "1" bits are embedded. However, the overall enhancement becomes too large if there are too many "1" bits. A strengthened version, called the alternating current (AC) algorithm, is presented here. The main idea of the AC algorithm is to reduce the large enhancement caused by embedding secret "1" bits. The embedding scheme alternately multiplies each enhanced amplitude by $\sigma$ and $\frac{1}{\sigma}$. For the example in Table 3, the parameters are modified as listed in Table 4 by adopting the AC algorithm. The amplitudes at the even-numbered "1"-bit positions (9, 4, 8) are multiplied by $\frac{1}{\sigma}$. Algorithms 5 and 6 describe the embedding and extracting procedures of the AC algorithm, respectively.

initialize AC = 0 Step 3: for all $R_i$, $1 \le i \le k$, obtain $a_1, a_2, \ldots, a_k$ as follows.
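The ordered embedding with $R$ and the AC alternation can be illustrated with the sketch below. It is one reading of the description, not a transcription of Algorithms 3 to 6: bit $bt_i$ is assumed to be embedded in the amplitude with index $r_i$, the flag `ac` toggles after every "1" bit, and the helper `order_from_seed`, the tolerance and the reference magnitudes are hypothetical. Python's `random` module is used only for illustration and is not a cryptographically secure generator.

```python
import random


def order_from_seed(seed, k):
    """Hypothetical helper: derive a permutation R of 1..k from a shared seed."""
    return random.Random(seed).sample(range(1, k + 1), k)


def embed_ac(magnitudes, bits, order, sigma):
    """Embed bit bt_i into amplitude a_{r_i}, alternating sigma and 1/sigma."""
    out = list(magnitudes)
    ac = 0  # alternation flag, initialised to 0
    for bit, r in zip(bits, order):
        if bit == "1":
            out[r - 1] *= sigma if ac == 0 else 1.0 / sigma
            ac ^= 1  # switch between enhancement and reduction
    return out


def extract_ac(reference, received, order, tol=1e-9):
    """Read the bits back in the order R by comparing with the reference."""
    return "".join(
        "1" if abs(received[r - 1] - reference[r - 1]) > tol else "0"
        for r in order
    )


# Hypothetical reference magnitudes; R and the bit stream come from the text.
REFERENCE = [0.26, 0.70, 0.55, 0.31, 0.90, 0.12, 0.47, 0.66, 0.38, 0.24]
R = [3, 1, 7, 9, 2, 10, 4, 6, 5, 8]
SECRET = "1001101101"

stego = embed_ac(REFERENCE, SECRET, R, sigma=1.01)
reduced = [r for r in R if stego[r - 1] < REFERENCE[r - 1]]
print(reduced, extract_ac(REFERENCE, stego, R))
```

With the order and bit stream of the example, the amplitudes multiplied by $1/\sigma$ are those at positions 9, 4 and 8, which matches the positions named in the text.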
It shall be proven that even if the proposed steganographic scheme is applied twice, the embedded secrets can still be recovered. A simple proof is given as follows. The basic idea of this work is to enhance an amplitude when a secret "1" bit is embedded. Applying the embedding procedure twice multiplies the enhanced amplitudes by $\sigma$ or $\frac{1}{\sigma}$ again; that is, $a_i$ becomes $\sigma^2 a_i$ or $\frac{1}{\sigma^2} a_i$ rather than $\sigma a_i$ or $\frac{1}{\sigma} a_i$. When legal receivers apply the decoding procedure, the main strategy is to compare the enhanced amplitudes with the original amplitudes of the standard patterns. There are only three possible values of the received amplitude: $a_i$, $\sigma^2 a_i$, and $\frac{1}{\sigma^2} a_i$. Because $\sigma > 1$, the latter two differ from $a_i$, so the comparisons still recover the embedded secrets.
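The argument can be written compactly. The notation below is introduced only for this illustration and is not taken from the paper: $A_i$ is the reference amplitude, $s_i \in \{0, +1, -1\}$ records whether position $i$ is left unchanged, multiplied by $\sigma$, or multiplied by $\frac{1}{\sigma}$, and $n$ is the number of times the same secret is embedded with the same order and the same initial AC state.

```latex
\begin{align}
a_i^{(n)} &= \sigma^{\,n s_i} A_i, \qquad s_i \in \{0, +1, -1\},\quad \sigma > 1,\quad n \ge 1, \\
a_i^{(n)} = A_i &\iff n s_i = 0 \iff s_i = 0, \\
bt_i &=
\begin{cases}
1, & a_i^{(n)} \neq A_i, \\
0, & a_i^{(n)} = A_i .
\end{cases}
\end{align}
```

The case $n = 2$ gives the three values $A_i$, $\sigma^2 A_i$ and $\frac{1}{\sigma^2} A_i$ discussed above.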

Results
This section presents experiments on the proposed scheme. Figure 10 displays the cover and the stego-pitches with k = 10. The red numbers indicate secret "1" bits that are hidden. Figure 11 shows the cover and the stego-pitches when k = 10,000 and 10,000 random bits are embedded; the differences are almost imperceptible because the enhancement of the amplitudes is very small. Figure 12 plots the signal-to-noise ratio (SNR) between the embedded noise and the standard pitch as a function of σ from 1.001 to 1.01 for k from 10 to 50. A larger σ causes greater distortion of the cover pitches.
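The relationship between σ and the SNR can be illustrated with a short Python sketch. The SNR formula used here, $10\log_{10}$ of the cover power over the power of the embedding noise, is the conventional definition and is an assumption, since the text does not state the exact formula; the partials, the sampling rate and the bit stream are hypothetical.

```python
import math


def synthesize(partials, sample_rate=8000, n_samples=2000):
    """Sum of zero-phase cosines given (frequency in Hz, magnitude) pairs."""
    return [
        sum(a * math.cos(2.0 * math.pi * f * t / sample_rate)
            for f, a in partials)
        for t in range(n_samples)
    ]


def snr_db(cover, stego):
    """SNR in dB: cover power over the power of the embedding noise."""
    signal_power = sum(c * c for c in cover)
    noise_power = sum((s - c) ** 2 for c, s in zip(cover, stego))
    return 10.0 * math.log10(signal_power / noise_power)


PARTIALS = [(262.0, 0.7), (524.0, 0.4), (786.0, 0.2)]  # hypothetical pattern
BITS = "101"                                           # hypothetical secret


def snr_for_sigma(sigma):
    """Embed BITS by amplitude enhancement and measure the resulting SNR."""
    enhanced = [(f, a * sigma if bit == "1" else a)
                for (f, a), bit in zip(PARTIALS, BITS)]
    return snr_db(synthesize(PARTIALS), synthesize(enhanced))


for sigma in (1.001, 1.005, 1.01):
    print(f"sigma={sigma}: SNR={snr_for_sigma(sigma):.2f} dB")
```

In this sketch the noise amplitude scales with σ − 1, so increasing σ from 1.001 to 1.01 lowers the SNR by about 20 dB.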
The performance of a data-hiding scheme is generally measured using capacity and distortion. As mentioned above, more multimedia are being produced by Internet users, so intruders have difficulty in distinguishing stego-media from cover media. This work develops a new cover medium, and comparisons can only be made between frequencies for different values of k, as selected by the standard pattern in Section 2.1. First, the capacity of the proposed scheme is discussed. Since the proposed scheme embeds a secret bit in a selected pitch, the capacity increases with the number of selected pitches. Figure 13 displays the capacity for different values of k; a larger k allows more secret bits to be embedded. The value of k is linearly related to capacity.

Second, the distortions between the stego and real pitches are provided for different values of k and σ. The following experiment is conducted. The peak signal-to-noise ratio (PSNR) function in the MATLAB library is utilized to evaluate the distortion between two plotted images of stego and real pitches. Figure 14 plots the variation of the PSNR associated with different values of k and σ. Each point represents a comparison between a stego and a real pitch; for example, the point (10,000, 35) compares a stego pitch (with 10,000 bits hidden, σ = 1.0001) and a real pitch.

Figure 14. The curve of capacity (k) to distortion (PSNR) between real and simulated stego-pitches with different σ values.
The above curve reveals that the proposed scheme yields a smaller distortion as more secret bits are embedded, up to a limit beyond which too many secret bits are hidden. In the proposed experiment, the upper bound of the distortion is 37 dB when σ = 0.0001 and 12,000 bits are embedded. In practical applications, the advantage of the present scheme is that the capacity grows and the distortion declines.
Third, the computational performance of the proposed scheme is addressed. Figure 15 plots the time consumption of the proposed scheme.

Discussion
This section presents the comparison with other related work, performance under other attacks and some theoretical analysis.

Comparisons with Related Work
Common attacks against an audio data-hiding scheme include the low-pass filter (LPF) attack, mp3-like compression, and re-quantization. The bit error ratio (BER) is used to measure the performance of a data-hiding scheme under common attacks. The BER is defined as the ratio obtained by matching the embedded secret bit stream against the secret bit stream decoded after the stego-pitch has undergone the above common attacks. The mathematical definition is:

The definitions of the common attacks adopted in the proposed comparisons are as follows:
• LPF (3 kHz) applies a low-pass filter with a 3 kHz cutoff, which passes frequency components below 3 kHz and attenuates those above.
• mp3 (64 kbps) uses an existing multimedia tool (Adobe Audition) to compress the stego-pitch (.wav file to .mp3 file) and decompress it back to the .wav file format.
• Re-quantization (16 to 32 bits) uses an existing multimedia tool (Adobe Audition) to re-quantize each sample from 16 bits to 32 bits and then back to 16 bits.
• Re-quantization (16 to 8 bits) uses an existing multimedia tool (Adobe Audition) to re-quantize each sample from 16 bits to 8 bits and then back to 16 bits.
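The evaluation under attack can be sketched as follows. Two assumptions are made: the BER is computed in its conventional form, the fraction of decoded bits that differ from the embedded bits, and the 16-to-8-bit re-quantization is modeled by simple truncation of the low byte, which is only a stand-in for whatever processing Adobe Audition performs. The function names are hypothetical.

```python
def bit_error_ratio(embedded, decoded):
    """Fraction of positions where the decoded bit differs from the embedded bit."""
    if len(embedded) != len(decoded):
        raise ValueError("bit streams must have the same length")
    if not embedded:
        raise ValueError("bit streams must not be empty")
    errors = sum(e != d for e, d in zip(embedded, decoded))
    return errors / len(embedded)


def requantize_16_to_8_and_back(samples):
    """Model 16 -> 8 -> 16 bit re-quantization by dropping the low 8 bits."""
    # The arithmetic right shift floors signed values; the left shift restores scale.
    return [(s >> 8) << 8 for s in samples]


print(bit_error_ratio("1001101101", "1001101100"))
print(requantize_16_to_8_and_back([1000, -1000, 255, -1]))
```

A BER of 0 under this convention means that every embedded bit survived the attack.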
Table 5 compares the BERs of the proposed scheme and methods proposed elsewhere [17,40,41].The results indicate that the methods developed herein outperform the others under the indicated attacks.In [17], the authors proposed two steganographic methods, hard quantization (HQ) and soft quantization (SQ) using correlated quantization to embed data with histogram based detector.A novel mapping denoted as point to point graph (PPG) is used to evaluate the correlation among each value of samples.PPG point radii are suggested to embed data to obtain the performances.In [40], the authors presented a self-synchronization scheme for audio watermarking.The synchronization codes are hidden into audio as the informative data, thus the embedded data have the ability of self-synchronization.The synchronization codes are hidden into low frequency coefficients in discrete wavelet transform domain.In [41], the authors proposed an echo hiding scheme.Some echoes are adequately adapted when the embedding process is executing.
The proposed scheme protects all of the hidden secrets under the specified common attacks for the following reasons.

Discussion
This section presents the comparison with other related work, performance under other attacks and some theoretical analysis.

Comparisons with Related Work
Common attacks against an audio data-hiding scheme include the low pass filter (LPF) attack, mp3-like compression, and re-quantization. The bit error ratio (BER) is used to measure the performance of a data-hiding scheme under common attacks. The BER is obtained by comparing the embedded secret bit-stream with the secret bit-stream decoded after the stego cover pitch has undergone the above common attacks; it is the ratio of mismatched bits to the total number of embedded bits. The definitions of the common attacks adopted in the proposed comparisons are as follows:
• LPF (3 kHz) passes only signal components with frequencies lower than 3 kHz.
• mp3 (64 kbps) adopts an existing multimedia tool (Adobe Audition) to compress the stego-pitch (.wav file -> .mp3 file) and decompress it back to .wav file format.
• Re-quantization (16 to 32 bits) adopts an existing multimedia tool (Adobe Audition) to re-quantize the sampling point from 16 bits to 32 bits and then to re-quantize it back to 16 bits.
• Re-quantization (16 to 8 bits) adopts an existing multimedia tool (Adobe Audition) to re-quantize the sampling point from 16 bits to 8 bits and then to re-quantize it back to 16 bits.
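As an illustration of how the BER is evaluated, the following minimal sketch counts the positions at which the decoded bit-stream differs from the embedded one and divides by the stream length. The function name `bit_error_ratio` is hypothetical, and the sketch assumes the standard definition (mismatched bits over total bits), under which a BER of 0 means that every secret bit survived the attack.

```python
def bit_error_ratio(embedded, decoded):
    """Fraction of positions where the decoded bits differ from the embedded bits.

    Both arguments are equal-length sequences of bits (strings of '0'/'1'
    or lists of 0/1).
    """
    if len(embedded) != len(decoded) or len(embedded) == 0:
        raise ValueError("bit streams must be non-empty and of equal length")
    errors = sum(1 for e, d in zip(embedded, decoded) if e != d)
    return errors / len(embedded)
```

For example, if the 10-bit secret 1001101101 is decoded with a single flipped bit, the sketch returns 0.1.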
Table 5 compares the BERs of the proposed scheme and methods proposed elsewhere [17,40,41]. The results indicate that the methods developed herein outperform the others under the indicated attacks. In [17], the authors proposed two steganographic methods, hard quantization (HQ) and soft quantization (SQ), which use correlated quantization to embed data with a histogram-based detector. A novel mapping denoted as the point to point graph (PPG) is used to evaluate the correlation among sample values, and the PPG point radii are used to embed data. In [40], the authors presented a self-synchronization scheme for audio watermarking. The synchronization codes are hidden in the audio together with the informative data, so the embedded data have the ability of self-synchronization. The synchronization codes are hidden in the low-frequency coefficients of the discrete wavelet transform domain. In [41], the authors proposed an echo hiding scheme in which echoes are adapted during the embedding process.
The proposed scheme protects all of the hidden secrets under the specified common attacks for the following reasons.
• LPF (3 kHz): by the definition of the LPF, frequencies higher than 3 kHz are eliminated under this attack, while in the proposed scheme the selected k frequencies are all below 3 kHz, so no bits are eliminated.
• mp3 (64 kbps): the main purpose of mp3 compression is to eliminate frequencies higher than 22 kHz and, as under the LPF attack, the proposed scheme selects only low frequencies, so no bits disappear.
• Re-quantization (16 to 32 bits): this attack increases the size of the binary representation of the numbers, and the extension does not affect the sampled values of the stego-pitches. Evidently, no secret bits are destroyed.
• Re-quantization (16 to 8 bits): this attack shrinks the size of the binary representation of the numbers, eliminating the trailing digits, which are not restored when the signal is re-quantized to its original size. In the proposed scheme, the selected σ does not produce long floating-point numbers after the enhancement of amplitudes, so the scheme performs well under this attack.
Another aspect of the comparison is the trend between capacity and distortion. Figure 16 shows how this trend differs between the related work [17] and the proposed work. In [17], the distortion measure decreases when the capacity increases, whereas in this work it increases to a boundary value when the capacity increases. This is because the proposed scheme is designed to break the bottleneck of traditional steganography, which is illustrated in Figure 16a.

Performances under Other Attacks
The performance of the proposed work under other attacks, namely frequency cropping [42], direct current (DC) and high pass filter (HPF) [43], is discussed in this subsection. The DC attack pads a certain power onto a stego audio and then runs the decoding procedure to examine the BER. The HPF attack filters out all signals whose frequencies are lower than a certain frequency and then runs the decoding procedure to examine the BER. The frequency cropping attack randomly pads signals with certain frequencies onto a stego pitch. The presented scheme performs well under these three attacks only under certain conditions.
For the frequency cropping attack, if the frequencies of all randomly padded signals are exactly the same as the selected frequencies $b_i f_{ap}$, a high BER occurs. Figure 17 presents the experiments on the frequency cropping attack. In the setting of the experiment, 50 frequencies are randomly generated to replace the corresponding frequencies of a pitch for different selected k. Another 50 frequencies are generated from a normal distribution. In addition, a baseline of the theoretical upper bound is also drawn in the figure.
For the DC attack, all sampled values increase after a DC signal is padded, and the decoding procedure fails because its testing condition is the equality of the power of the cover and the stego pitches. However, if the decoding procedure is modified to inspect the amplitudes of all $b_i f_{ap}$, it is possible to filter out the DC power and obtain all correct secrets. The inspection could be achieved as follows. Subtract the amplitudes of all selected $b_i f_{ap}$ and find the most frequently occurring value. This value is the power of the DC, and subtracting it from the stego pitch successfully filters out the DC.
For the HPF attack, all selected $b_i f_{ap}$ below a certain frequency $f_{HP}$ are filtered out; that is, only those selected $b_i f_{ap}$ that are larger than $f_{HP}$ survive after the HPF is applied to the stego pitch. However, there is a trick to avoid the sabotage of the HPF: selecting higher $b_i f_{ap}$ improves the performance under the HPF attack. Table 6 presents the performance under the DC and HPF attacks compared with [17,40,41]. Two types of selected $b_i f_{ap}$ are used.
In recent years, many steganographic schemes have been proposed that consider the cracking probability [44–51] under brute-force guessing. The threat model evaluated here assumes that intruders know the algorithms but not the order R. In practice, R is a private key and should be transmitted via a secure channel. If an intruder uses a computer and successfully detects all slight noises between the stego and cover pitches (the standard patterns are also public knowledge), he could identify all enhanced frequencies but not their correct order. This means that he knows which frequencies are used to embed "1" or "0" but cannot reconstruct the exact bit-stream without R. The only thing he can do is to try all combinations of these 1s and 0s with n 1s and k − n 0s, where 0 ≤ n ≤ k and k is the number of selected frequencies. Denote the summation of the combinations as S(n). For n = 0, there are k possibilities for the embedded bit-stream because an intruder does not know the length of R, so that S(0) = k.
For n = 1: … For n = k − 3: S(k − 3) = …

Theoretical Analysis
This section describes some theoretical analyses of the proposed scheme. Theorem 1 gives the theoretical boundary condition of the enhancement. The lower bound follows from the limit of the floating-point representation provided by the software. The upper bound is decided by the user according to the selected capacity and expected distortion. Theorem 2 is the theoretical evaluation of the capacity, which depends on the selected number of hidden secrets. Theorem 3 describes the theoretical time consumption of the proposed scheme. The parameter n is the total number of sampled data of a pitch and k is the number of secrets. The total computation includes the DFT, the data embedding and the inverse DFT. Theorem 4 describes the theoretical evaluation of the limit at which error bits occur. Because the smallest word length in the considered re-quantization is eight bits, each datum needs at least eight bits to represent the number. Therefore, if the bit representation of the enhancement is larger than eight bits, the suffix of the number disappears when the down-re-quantization (16 to 8 bits) is deployed.
Theorem 1. The practical boundary condition of σ is $1 + 10^{-L} < \sigma < 1 + \frac{1}{k \cdot SNR}$, where L is the bit length of the floating point in the corresponding software, k is the number of selected pitches, and SNR is the set quality of the stego-pitches.
Proof. The value of σ is considered here. Ideally, the range of σ is 1 to 2 because less noise is better and embedding a signal in a cover pitch that is identical to it is meaningless. A smaller σ is preferred because it supports lower distortion, but the length of the floating point is limited such that the magnitude of $\lg(\sigma - 1)$ cannot exceed the size of the floating point, say L, which is defined by the software. Therefore, $|\lg(\sigma - 1)| < L$ and $1 < \sigma < 2$ imply $1 + 10^{-L} < \sigma$ (because $0 < \sigma - 1 < 1$). The practical upper bound is obtained by setting $\frac{P_{signal}}{P_{stego} - P_{signal}} > SNR$ (signal-to-noise ratio). The denominator $P_{stego} - P_{signal}$ is the embedded noise, which leads to $\frac{1}{k(\sigma - 1)} > SNR$. With fixed k and SNR, the upper bound of σ is obtained as $1 + \frac{1}{k \cdot SNR}$.
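The bounds of Theorem 1 can be evaluated numerically. The following is a minimal sketch under stated assumptions: the function names `sigma_bounds` and `is_admissible` are hypothetical, and SNR is taken as a plain power ratio rather than a value in dB.

```python
def sigma_bounds(L, k, snr):
    """Return (lower, upper) bounds on sigma according to Theorem 1.

    L   : length of the floating point supported by the software
    k   : number of selected pitches (frequencies)
    snr : required signal-to-noise ratio, assumed to be a linear ratio
    """
    if L <= 0 or k <= 0 or snr <= 0:
        raise ValueError("L, k and snr must be positive")
    lower = 1 + 10 ** (-L)     # 1 + 10^(-L)
    upper = 1 + 1 / (k * snr)  # 1 + 1/(k * SNR)
    return lower, upper


def is_admissible(sigma, L, k, snr):
    """True when sigma lies strictly between the two bounds."""
    lower, upper = sigma_bounds(L, k, snr)
    return lower < sigma < upper
```

For instance, with L = 4, k = 10 and SNR = 10, the admissible range is roughly 1.0001 < σ < 1.01, so σ = 1.001 is accepted while σ = 1.02 is rejected. The upper bound shrinks as k grows, which reflects the trade-off between capacity and the enhancement factor.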
Theorem 2. The capacity of the proposed data-hiding scheme is O(k), where k is the number of selected pitches.
Proof.According to Figure 15, the capacity complexity is related to k, the number of selected pitches.
Proof. The DFT and inverse DFT cause a temporal bottleneck in the scheme. The best time consumption of the DFT and inverse DFT is O(n lg n), and the time consumed by the proposed scheme for embedding secrets is O(k). Because k frequencies are selected, the inverse DFT has a time cost of O(k lg k) rather than O(n lg n). The total time complexity of the proposed scheme is addressed by the following theory.
Proof. The value of σ at which error bits under re-quantization (16 to 8 bits) begin to occur is of interest. The key point is the bit-size of $\sigma a_i$, so lg σ should be smaller than 8. A theory concerning the production of errors under re-quantization (16 to 8 bits) that corresponds to σ is provided.

Conclusions and Future Work
This work developed a data-hiding scheme that is based on a new cover medium, synthesized pitches, which are popularly used to demonstrate initial versions of compositions conveniently and at low cost. This data-hiding scheme relies on the similarity between synthesized pitches and real instrumental pitches to remove concern about hidden data being compromised by audio distortion. To demonstrate the feasibility of the scheme, secret bits are embedded during the generation of simulated instrumental pitches. Experimental results reveal that more secrets can be hidden without distortion. The proposed method differs from traditional data-hiding schemes in that the data embedding procedure causes insignificant signal distortion. Finally, comparisons of the BER values obtained herein and in related work under common attacks reveal that the scheme herein outperforms related work under some attacks.
The main restriction is that only one pitch can be used at a time; a more efficient scheme that uses several pitches simultaneously is expected. To achieve wider applicability, support for multiple pitches with multiple instruments in a single time slot shall be developed. Moreover, with the progress of signal-processing technologies and faster computers, further robust strategies such as spread transform dither modulation need to be exploited to resist attacks.

Figure 1. (a) An application of a user backing up his sensitive data on a cloud storage service; (b) an application of a user obtaining digital rights; and (c) an application of a user achieving imperceptible communication.

Figure 3. The time domain function of a real piano's Middle C.

Figure 4. Discrete Fourier Transform (DFT) of real piano Middle C.

Figure 5. Frequency spectra of all piano pitches.

Figure 7. The time domain function of Middle C of a simulated piano with k = 10.

Figure 9. An overview of the proposed scheme.

Figure 12. Correlation between different k values and σ values.

Figure 13. The curve of capacity versus k.

Figure 14. The curve of capacity (k) to distortion (PSNR) between real and simulated stego-pitches with different σ values.

Figure 15. The curve of computational performance versus k.

Figure 16. The curve of the trend of distortion and capacity: (a) proposed in [17]; and (b) the work presented by the authors.

Figure 17. The bit error ratio (BER) performances of frequency cropping attack.

The above equations reveal that the probabilities are pairwise the same for [S(0), S(k)], [S(1), S(k − 1)], [S(2), S(k − 2)], …, and that S(⌊k/2⌋) is the worst case for successful guessing under brute force. Figure 18 plots S(k/2), which behaves as an exponential function; the probability is then equal to …

Table 1. The values of $a_i$, $b_i$ and $b_i f_{ap}$ of Middle C of a piano with k = 10.
Output: secret bit stream $bt_1, bt_2, \ldots, bt_k$
Step 1: use standard pattern in Section 2.1 to obtain $A_1, A_2, \ldots, A_k$ of P
Step 2: use standard pattern in Section 2.1 to obtain $a_1, a_2, \ldots, a_k$ of p
Step 3: for each $A_i$ and $a_i$, decode secret bit $bt_i$ as follows.

Table 2. The modified parameters after embedding secret 1001101101 in Middle C of a simulated piano with σ = 1.01.

Algorithm 3. Encoding Procedure
Input: secret bit stream $bt_1, bt_2, \ldots, bt_k$, secret order R and reference instrumental pitch P
Output: a stego-synthesized pitch
Step 1: find $a_1, a_2, \ldots, a_k$ and $b_1 f_{ap}, b_2 f_{ap}, \ldots, b_k f_{ap}$ by referencing P
Step 2: for all $R_i$, $1 \le i \le k$, obtain $a_1, a_2, \ldots, a_k$ as follows.
if ($bt_i = 1$) set $a_{R_i} = \sigma a_{R_i}$
else set $a_{R_i} = a_{R_i}$
Step 3: use $a_1, a_2, \ldots, a_k$ and $b_1 f_{ap}, b_2 f_{ap}, \ldots, b_k f_{ap}$ to create a pitch p

Table 3. The modified parameters after embedding secret 1001101101 in Middle C of a simulated piano with R = {3, 1, 7, 9, 2, 10, 4, 6, 5, 8} and σ = 1.01.
Unlike in the first version, receivers no longer need the value of k but only the secret order R or random seed.

Table 4. The modified parameters after embedding secret 1001101101 in Middle C of a simulated piano using the alternating current (AC) algorithm with R = {3, 1, 7, 9, 2, 10, 4, 6, 5, 8} and σ = 1.01.
Input: secret bit stream $bt_1, bt_2, \ldots, bt_k$, secret order R and reference instrumental pitch P
Output: a stego-synthesized pitch
Step 1: find $a_1, a_2, \ldots, a_k$ and $b_1 f_{ap}, b_2 f_{ap}, \ldots, b_k f_{ap}$ by referencing P
Step 2: … $a_2, \ldots, a_k$ and $b_1 f_{ap}, b_2 f_{ap}, \ldots, b_k f_{ap}$ to create a pitch p
Input: secret order R, reference pitch P and received pitch p
Output: secret bit stream $bt_1, bt_2, \ldots, bt_k$
Step 1: use standard pattern in Section 2.1 to obtain $A_1, A_2, \ldots, A_k$ of P
Step 2: use standard pattern in Section 2.1 to obtain $a_1, a_2, \ldots, a_k$ of p
Step 3: for all $R_i$, $1 \le i \le k$, decode secret bit $bt_i$ with reference to the following condition:
if ($A_{R_i} \ne a_{R_i}$) set $bt_i = 1$
else set $bt_i = 0$
Step 4: concatenate $bt_1, bt_2, \ldots, bt_i, \ldots, bt_k$, $1 \le i \le k$, to form a bit stream B
Step 5: return B
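The encoding and decoding steps with a secret order R can be sketched as follows. This is an illustrative Python sketch rather than the authors' implementation: it works directly on a list of amplitudes and omits both the extraction of amplitudes from a pitch and the synthesis of the stego pitch. The function names and the amplitude values are hypothetical. The decoder treats an amplitude that differs from the reference as bit 1, which is an assumption that follows from the encoding rule (bit 1 multiplies the amplitude by σ); nonzero amplitudes are also assumed.

```python
import math


def encode(amplitudes, bits, order, sigma):
    """Scale amplitude number order[i] (1-based) by sigma when bits[i] is 1."""
    if len(bits) != len(order):
        raise ValueError("bits and order must have the same length")
    stego = list(amplitudes)  # leave the reference amplitudes untouched
    for bit, r in zip(bits, order):
        if bit == 1:
            stego[r - 1] = sigma * stego[r - 1]
    return stego


def decode(reference, received, order):
    """Recover the bits: an amplitude that differs from the reference means 1."""
    bits = []
    for r in order:
        same = math.isclose(received[r - 1], reference[r - 1], rel_tol=1e-9)
        bits.append(0 if same else 1)
    return bits


# Secret, order and sigma follow the example of Table 3;
# the amplitude values below are made up for illustration.
secret = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
R = [3, 1, 7, 9, 2, 10, 4, 6, 5, 8]
amps = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1]

stego_amps = encode(amps, secret, R, sigma=1.01)
recovered = decode(amps, stego_amps, R)
```

Without R, an observer who compares `stego_amps` with `amps` can see which amplitudes were enhanced but not the order in which the bits should be read.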

Table 5. The bit error ratio (BER) of the proposed methods and related work. LPF: low pass filter; HQ: hard quantization; SQ: soft quantization.

Table 6. The BER of the proposed methods and related work. HPF: high pass filter; DC: direct current.
∅: Not mentioned by the corresponding authors.