Steganalysis of Inactive Voice-Over-IP Frames Based on Poker Test

This paper concentrates on the detection of steganography in inactive frames of low bit rate audio streams in Voice over Internet Protocol (VoIP) scenarios. Both theoretical and experimental analyses demonstrate that the distribution of 0 and 1 in encoding parameter bits becomes symmetric after a steganographic process. Moreover, this symmetry affects the frequency of each subsequence of parameter bits, and accordingly changes the poker test statistical features of encoding parameter bits. Employing the poker test statistics of each type of encoding parameter bits as detection features, we present a steganalysis method based on a support vector machine. We evaluate the proposed method with a large quantity of speech samples encoded by G.723.1 and compare it with the entropy test. The experimental results show that the proposed method is effective and largely outperforms the entropy test in all cases.


Introduction
Steganography is a technique of covert communication that embeds secret messages into seemingly innocent digital media such as audio [1][2][3], image [4][5][6][7][8][9][10] and video [11,12]. As with other security techniques, such as encryption [13,14], the misuse of steganography by lawbreakers poses a threat to network security and public safety. To confront this challenge, its countermeasure, steganalysis, has received increasing attention. The aim of steganalysis is to detect, extract and destroy the secret messages embedded in digital media, where determining whether the suspicious media contain secret messages is the precondition of the other operations [15].
In recent years, Voice over IP (VoIP) has emerged as a popular communication service over the Internet for its convenience and instantaneity. With the widespread application of VoIP, researchers have paid more and more attention to VoIP-based steganography. Compared with traditional carriers, VoIP-based carriers offer many advantages, such as immediacy, high steganographic bandwidth and alterable steganographic length [15]. In general, VoIP-based steganography can be classified into two categories. The first employs the relevant protocols of VoIP as carriers; for example, Mazurczyk and Szczypiorski [16] used the redundant data area in the session initiation protocol (SIP) to embed secret messages, and Forbes [17] created a covert channel by modifying the timestamp in the real-time transport protocol (RTP). The second embeds the secret messages in the VoIP payload, which has attracted more attention from the research community over the last decade [15] because of its higher steganographic bandwidth. Low bit rate codecs are widely applied in VoIP for their high compression ratio, and most of the steganographic algorithms in the VoIP payload are conducted on them.
For most low bit rate codecs, there are three feasible embedding domains: the fixed codebook (FCB) [18][19][20], the linear prediction coefficients (LPC) [21][22][23] and the adaptive codebook (ACB) [24][25][26][27]. For example, Geiser and Vary [18] presented a steganographic method that modifies the FCB search strategy to embed secret messages during the encoding process; its embedding capacity can reach up to 35 bits per subframe with the adaptive multi-rate (AMR) 12.2 kbit/s mode. Later, Miao et al. [19] proposed another method that constrains the pulse positions in the FCB to embed secret messages and further introduced an embedding factor to control the embedding capacity. Liu et al. [22] introduced the genetic algorithm into Vector Quantization (VQ) division and replaced the quantization index set of LPC with secret messages, which performs better than random division of VQ. Huang et al. [25] proposed an embedding algorithm with high steganographic capacity, which adjusts the closed-loop pitch period range of a subframe according to the secret message bits. Because it is integrated into the encoding process, embedding and extraction introduce no delay. Janicki et al. [27] proposed a steganography algorithm based on approximating the F0 parameter of pitch in the Speex codec, which can be applied without any steganographic cost. Recently, Huang et al. [28] improved the Voice Activity Detection (VAD) algorithm to keep the VAD result invariant after steganography and modified several types of parameter bits in inactive frames of G.723.1 in 6.3 kbit/s mode to embed the secret messages, reaching a steganographic bandwidth of up to 101 bits per frame. Lin [29] proposed another improved VAD algorithm to keep the VAD result unchanged after steganography and extended Huang's method to the 5.3 kbit/s mode.
As for VoIP steganalysis, there is currently no universal detection method, but some effective steganalysis methods [30][31][32][33][34][35][36][37] have been proposed to detect steganographic algorithms that modify specific encoding parameters. For example, Li et al. [30] pointed out that the codeword of LPC became asymmetrical after embedding secret messages and proposed a quantization codeword correlation network model to detect LPC-based steganography, which has a good detection performance even for short sample lengths. Lin et al. [31] first introduced the recurrent neural network (RNN) into steganalysis and designed a two-layer network to detect LPC-based steganography. The experimental results show that Lin's method [31] has better detection performance than Li's method [30] and can achieve a good detection accuracy when the sample length is only 0.1 s at the embedding rate of 100%. After analyzing the search rule of pitch delay, Ren et al. [32] presented a Markov matrix of the second-order difference of pitch delay as steganalysis features to detect steganography in the ACB. According to the short-time stability of speech, Tian et al. [33] proposed a series of steganalysis features to completely describe the characteristics among the pulse positions in the FCB, which included the Markov matrix of the pulse positions in the same subframe and the joint probability distributions of pulse positions among different subframes. Because the steganalysis feature vector had a high dimension, adaptive boosting was applied for feature selection. The experimental results show that Tian's method [33] outperforms the state-of-the-art methods [34,35].
However, Huang's [28] and Lin's [29] methods modify several types of encoding parameter bits in inactive frames, and there is currently no effective detection method for them. To fill this gap, we analyze the effect of steganography on the encoding parameter bits and present a support vector machine based steganalysis method, evaluated with a large number of speech samples encoded by G.723.1. Specifically, our contributions in this work can be summarized as follows: (1) we present a steganalysis method for inactive voice-over-IP frames based on the poker test, which is the first work aiming to detect steganography in inactive VoIP frames; (2) we analyze the impacts on parameter bits induced by the steganographic process, and model the steganalysis features using the poker test statistics of different parameters; and (3) we comprehensively evaluate the detection performance of the presented scheme by experiments and comparisons with the traditional entropy test [38]. The experimental results demonstrate that our scheme can effectively detect the steganography of inactive VoIP frames and significantly outperforms the entropy test.
The rest of this paper is organized as follows. Section 2 introduces the improved VAD algorithm and the steganography in the inactive frames. In Section 3, theoretical analyses of the proposed features are presented. The support vector machine based steganalysis method is presented in Section 4. Experiments and performance analysis are given in Section 5. Finally, Section 6 concludes the paper.

Improved VAD Algorithm
ITU G.723.1 [39] is a hybrid codec with two encoding modes, 5.3 kbit/s and 6.3 kbit/s, in which each frame is coded into various parameters. The frame length of both modes is 30 ms, and each frame carries 160 speech bits in 5.3 kbit/s mode and 192 bits in 6.3 kbit/s mode. The bit allocation of the encoding parameters of each frame is listed in Tables 1 and 2.
VAD determines whether the current frame is active or inactive by comparing the energy of the current frame with a threshold [39]. The inactive state of frames may be affected by steganography, so it is necessary to keep the VAD results of the sender and the receiver consistent. However, the original VAD algorithm depends on the frame preceding the current frame. If the inactive state of the previous frame is changed by embedding secret messages, the inactive state of the current frame may also be affected when the original VAD algorithm is applied, which leads to an altered VAD result. To solve this problem, Huang et al. [28] improved the computation of the autocorrelation coefficients so that the coefficients of the current frame are independent of the previous frame. These stateless coefficients are then used to calculate the residual energy, which is compared with the threshold, so the VAD results remain invariant after steganography. Because the silence compression function is optional for the G.723.1 codec and some bits in the VoIP packet header indicate whether silence compression is used during the encoding process, Lin [29] used the one free bit generated by disabling the silence compression function to mark the inactive frames embedded with secret messages. When the VoIP streams are decoded at the receiver end, there is no need to run the VAD algorithm again, and the secret messages are extracted from the inactive frames that have been marked.
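The idea of a frame-by-frame (stateless) activity decision can be illustrated with a deliberately simplified sketch. It is not the G.723.1 VAD nor the improved algorithm of [28]: it uses plain mean-square energy instead of residual energy, and the function names and the threshold value are assumptions chosen only for illustration. The frame length of 240 samples corresponds to 30 ms at 8000 Hz.

```python
def frame_energy(samples):
    """Mean-square energy of one frame of PCM samples."""
    return sum(s * s for s in samples) / len(samples)

def tag_inactive_frames(pcm, frame_len=240, threshold=1000.0):
    """Split PCM into 30 ms frames and tag low-energy frames as inactive.

    Each decision uses only the current frame, so changing one frame
    cannot alter the decision for the next one (the stateless property).
    The threshold is an arbitrary illustrative value.
    """
    tags = []
    for start in range(0, len(pcm) - frame_len + 1, frame_len):
        frame = pcm[start:start + frame_len]
        tags.append(frame_energy(frame) < threshold)   # True means inactive
    return tags

# Toy signal: silence, a loud frame, then near-silence
pcm = [0] * 240 + [1000, -1000] * 120 + [3] * 240
tags = tag_inactive_frames(pcm)
```

Because no state is carried between frames, a sender and a receiver running this kind of decision on the same frame data would reach the same tags, which is the consistency requirement described above.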

Steganography in Inactive Frame
Huang et al. [28] and Lin [29] selected the parameters used to embed secret messages by evaluating the effect of modifying each parameter. There are 101 bits per frame available for steganography in 6.3 kbit/s mode and 81 bits per frame in 5.3 kbit/s mode; the suitable parameters are listed in Tables 3 and 4. The selected parameters are used to embed the secret messages with the following three-step algorithm:
Step 1: Voice activity detection. Speech samples are divided into frames, and each frame is input into the VAD detector, where the inactive frames are marked with a tag.
Step 2: Encoding and embedding secret messages in inactive frames. All frames are encoded without applying the silence compression function. If a frame has been marked in Step 1, secret messages are embedded into its suitable parameters.
Step 3: Encapsulation and sending. All the frames are encapsulated in VoIP packets, which are transmitted over the Internet.
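The three steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of [28] or [29]: the frame layout (a dict with an `inactive` flag and a list of embeddable `bits`) and the function name `embed_in_inactive_frames` are hypothetical, and the embedding rate is interpreted here as the fraction of embeddable bits that are overwritten.

```python
import random

def embed_in_inactive_frames(frames, secret_bits, rate, rng):
    """Overwrite a fraction `rate` of the embeddable bits of each inactive frame.

    frames: list of dicts {"inactive": bool, "bits": list of 0/1} (hypothetical layout)
    secret_bits: iterator yielding the secret message bits (0 or 1)
    rng: random.Random used to pick the overwritten positions
    Active frames are copied unchanged.
    """
    out = []
    for frame in frames:
        bits = list(frame["bits"])
        if frame["inactive"]:                       # tag produced by Step 1
            n_embed = round(rate * len(bits))
            for pos in rng.sample(range(len(bits)), n_embed):
                bits[pos] = next(secret_bits)       # Step 2: replace with a secret bit
        out.append({"inactive": frame["inactive"], "bits": bits})
    return out                                      # Step 3 would packetize these frames

rng = random.Random(0)
# Four toy frames with 101 embeddable bits each (the 6.3 kbit/s capacity)
frames = [{"inactive": i % 2 == 0, "bits": [0] * 101} for i in range(4)]
secret = iter(lambda: rng.getrandbits(1), None)     # endless stream of random secret bits
stego = embed_in_inactive_frames(frames, secret, 1.0, rng)
```

In a real system the receiver must be able to locate the overwritten positions, for example through a shared key; that agreement is outside the scope of this sketch.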

Steganalysis Based on Poker Test
Poker Test [40] is a technique to determine whether a given bit sequence satisfies the characteristics of a truly random sequence. Let $X$ be a bit sequence of length $n$ and $m$ be the length of a subsequence of $X$ such that

$$\left\lfloor \frac{n}{m} \right\rfloor \geq 5 \cdot 2^{m}. \tag{1}$$

A given bit sequence $X$ can be divided into $k$ non-overlapping subsequences, each of length $m$, where

$$k = \left\lfloor \frac{n}{m} \right\rfloor. \tag{2}$$

Let $F_i$ be the frequency of the $i$-th type of subsequence of length $m$, where $1 \leq i \leq 2^{m}$. The poker test statistic is defined as

$$V = \frac{2^{m}}{k} \left( \sum_{i=1}^{2^{m}} F_i^{2} \right) - k, \tag{3}$$

which approximately follows a $\chi^{2}$ distribution with $2^{m} - 1$ degrees of freedom.
In the proposed steganalysis method, each parameter suitable for steganography in all inactive frames forms a bit sequence. The bit sequence can be considered as consisting of a series of subsequences, which can be expressed as

$$X = S_1, S_2, \ldots, S_k, \tag{4}$$

where $S_j$ ($1 \leq j \leq k$) is the $j$-th subsequence of $X$ of length $m$. Denote by $P = \{P_1, P_2, \ldots, P_{2^{m}}\}$ the set that contains all the $2^{m}$ types of subsequences. Then, the frequency of the $i$-th type of subsequence can be calculated by

$$F_i = \sum_{j=1}^{k} I(S_j = P_i), \tag{5}$$

where $I(x)$ is expressed as

$$I(x) = \begin{cases} 1, & \text{if } x \text{ is true}, \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$

Let $b_i$ be the $i$-th bit in $X$, and denote the probabilities of $b_i = 1$ and $b_i = 0$ as $p(b_i = 1)$ and $p(b_i = 0)$, respectively. Denote the embedding rate as $r$. Since an embedded secret bit takes the value 0 or 1 with equal probability, the probabilities of $b_i = 1$ and $b_i = 0$ after steganography, $p'(b_i = 1)$ and $p'(b_i = 0)$, can be expressed as

$$p'(b_i = 1) = (1 - r)\, p(b_i = 1) + \frac{r}{2}, \tag{7}$$

$$p'(b_i = 0) = (1 - r)\, p(b_i = 0) + \frac{r}{2}. \tag{8}$$

By subtracting the above two equations, we can obtain

$$p'(b_i = 1) - p'(b_i = 0) = (1 - r) \left[ p(b_i = 1) - p(b_i = 0) \right]. \tag{9}$$

From Equation (9), it can be concluded that the distribution of 0 and 1 in $X$ tends to be symmetric as the embedding rate increases. When the distribution of 0 and 1 becomes symmetric and the bit sequence is long enough, the values of $F_i$ will be nearly equal. For example, for $m = 2$ there are four types of subsequences ({0,0}, {0,1}, {1,0}, {1,1}), and the probability of each subsequence is approximately equal to 0.25. Based on this, the values of $F_i$ satisfy the following equations:

$$\sum_{i=1}^{2^{m}} F_i = k, \tag{10}$$

$$F_1 \approx F_2 \approx \cdots \approx F_{2^{m}} \approx \frac{k}{2^{m}}. \tag{11}$$

According to the Cauchy-Buniakowsky-Schwarz inequality [41], we can obtain

$$\sum_{i=1}^{2^{m}} F_i^{2} \geq \frac{1}{2^{m}} \left( \sum_{i=1}^{2^{m}} F_i \right)^{2} = \frac{k^{2}}{2^{m}}, \tag{12}$$

with equality if and only if all $F_i$ are equal. Putting Equation (12) into Equation (3), we can obtain

$$V \geq \frac{2^{m}}{k} \cdot \frac{k^{2}}{2^{m}} - k = 0, \tag{13}$$

so $V$ approaches its minimum value of 0 as the values of $F_i$ become equal. However, because the length of $X$ is limited, the value of $V$ is generally non-zero. We can still conclude from the above analyses that the poker test statistic of $X$ tends to decrease as the embedding rate increases.
To verify this inference, we calculate the poker test statistics of each parameter suitable for steganography at different embedding rates (from 0.1 to 1.0). Table 5 shows the poker test statistics of each parameter suitable for steganography at different embedding rates in 6.3 kbit/s mode. Although embedding secret messages affects different parameter sequences to different degrees, the poker test statistics tend to decrease as the embedding rate increases. Therefore, the poker test statistics of each parameter sequence can be applied as steganalysis features.
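The behaviour of the poker test statistic defined above can be checked on toy data. The sketch below is illustrative only: the function names, the biased synthetic cover sequence and the bit-replacement model (each bit is replaced by a uniformly random secret bit with probability equal to the embedding rate) are assumptions, not the codec data used in the paper.

```python
import random
from collections import Counter

def poker_statistic(bits, m=2):
    """Poker test statistic V = (2^m / k) * sum(F_i^2) - k with k = floor(n / m)."""
    k = len(bits) // m
    if k == 0:
        raise ValueError("sequence is shorter than one subsequence")
    # Count how often each length-m pattern occurs among the k non-overlapping blocks
    freq = Counter(tuple(bits[j * m:(j + 1) * m]) for j in range(k))
    return (2 ** m / k) * sum(f * f for f in freq.values()) - k

def embed(bits, rate, rng):
    """Replace each bit by a uniformly random secret bit with probability `rate`."""
    return [rng.getrandbits(1) if rng.random() < rate else b for b in bits]

rng = random.Random(1)
# Synthetic cover sequence in which 1 is much more likely than 0
cover = [1 if rng.random() < 0.8 else 0 for _ in range(4000)]
v_cover = poker_statistic(cover)
v_stego = poker_statistic(embed(cover, 1.0, rng))
```

With a strongly biased cover sequence, `v_cover` is large, while after full embedding the four patterns occur almost equally often and `v_stego` drops close to zero, in line with the analysis above.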

SVM-Based Steganalysis Method
In this section, the steganalysis method based on the support vector machine (SVM) is presented, with the features proposed in Section 3 employed as the steganalysis features. The steganalysis method includes a training process and a detection process. Specifically, the training process is divided into three steps as follows:
Step 1: Sample preparation. Collect a large quantity of speech samples encoded by G.723.1 with both encoding modes and embed secret messages with the steganography in Section 2.2 at different embedding rates.
Step 2: Feature extraction. Extract the proposed features in Section 3; the extraction process is shown in Figure 1.
Step 3: Classifier training. Train the SVM classifier with the feature vectors built in Step 2.
Similarly, the detection process contains two steps as follows:
Step 1: Feature extraction. Extract the proposed features from the samples to be detected.
Step 2: Decision-making. Input the features extracted in Step 1 into the trained SVM classifier to determine whether the samples to be detected contain secret messages according to the classification results.
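The feature extraction step, shared by training and detection, can be sketched as below. The data layout (one dict per inactive frame, mapping a parameter name to its bits) and the parameter names `param_a` and `param_b` are placeholders introduced for illustration; the actual parameters are those listed in Tables 3 and 4.

```python
from collections import Counter

def poker_statistic(bits, m=2):
    """Poker test statistic over k = floor(n / m) non-overlapping subsequences."""
    k = len(bits) // m
    freq = Counter(tuple(bits[j * m:(j + 1) * m]) for j in range(k))
    return (2 ** m / k) * sum(f * f for f in freq.values()) - k

def extract_features(inactive_frames, param_names, m=2):
    """Build one feature per parameter for a single speech sample.

    inactive_frames: list of dicts mapping parameter name -> list of 0/1 bits
    param_names: fixed ordering of the embeddable parameters (placeholder names)
    The bits of each parameter are concatenated over all inactive frames and
    summarized by their poker test statistic.
    """
    features = []
    for name in param_names:
        sequence = [b for frame in inactive_frames for b in frame[name]]
        features.append(poker_statistic(sequence, m))
    return features

# Toy sample with two inactive frames and two placeholder parameters
frames = [{"param_a": [0, 0, 0, 1], "param_b": [1, 1]},
          {"param_a": [1, 0, 1, 1], "param_b": [1, 1]}]
vector = extract_features(frames, ["param_a", "param_b"])
```

Each sample yields one such vector; labeled vectors from cover and steganographic samples would then be passed to an SVM trainer, and vectors from unknown samples to the trained classifier.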

Experiment Setup and Performance Evaluation
In this paper, we gather a large number of speech samples with a length of 10 s (333 frames) from language-learning lessons to evaluate the performance of the proposed method without loss of generality. Specifically, the experimental dataset consists of 2200 speech samples, which are sampled at 8000 Hz and quantized with 16 bits. The samples are of four types, namely, Chinese male, Chinese female, English male and English female speech samples. Note that, in the experiments, we only focus on the inactive frames in these speech samples. The distribution of inactive frames is shown in Figure 2, and secret messages are produced randomly. The SVM with radial basis function (RBF) kernel is implemented based on LibSVM [42] in C-style, where c = 1 and g = 1/1064. Half of the samples are used to train the classifiers, and the other half are used to test the performance of the proposed method.
The poker test statistics of all suitable parameters for steganography are calculated with m = 2, which reaches the best detection performance in our experiments and satisfies Equation (1). Since there is no existing method for detecting the targeted steganography, we compare our method with the entropy test [38], which has already been used to detect symmetry [43] and steganography [34,36]. In the entropy test, all of the parameters suitable for steganography form a sequence; then, the entropies of the eight types of binary sequences ({0,0,0}, {0,0,1}, ..., {1,1,1}) are calculated as steganalysis features. Employing the entropies of the eight types of binary sequences as steganalysis features reaches the best detection performance in our experiments.
The performances of both steganalysis methods are evaluated by accuracy (ACC), false positive rate (FPR) and false negative rate (FNR). The accuracy is the percentage of the test samples that are correctly classified. ACC can be calculated by

$$\mathrm{ACC} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}}, \tag{14}$$

where $N_{TP}$ is the quantity of true positives, namely, the quantity of steganographic samples identified as steganographic samples; $N_{TN}$ is the quantity of true negatives, namely, the quantity of cover samples identified as cover samples; $N_{FP}$ is the quantity of false positives, namely, the quantity of cover samples identified as steganographic samples; and $N_{FN}$ is the quantity of false negatives, namely, the quantity of steganographic samples identified as cover samples. The false positive rate (FPR) is the proportion of false positives in the total number of negatives, which can be expressed as

$$\mathrm{FPR} = \frac{N_{FP}}{N_{TN} + N_{FP}}, \tag{15}$$

where the sum of $N_{TN}$ and $N_{FP}$ is the total number of negatives, that is, the total number of cover samples. The false negative rate (FNR) is the proportion of false negatives in the total number of positives, which can be expressed as

$$\mathrm{FNR} = \frac{N_{FN}}{N_{TP} + N_{FN}}, \tag{16}$$

where the sum of $N_{TP}$ and $N_{FN}$ is the total number of positives, that is, the total number of steganographic samples.
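As a quick check of these definitions, the three measures can be computed from the four counts as in the following sketch. The function name and the counts in the example are made up for illustration and are not results from the paper.

```python
def detection_metrics(n_tp, n_tn, n_fp, n_fn):
    """Return (ACC, FPR, FNR) from the confusion-matrix counts.

    Positives are steganographic samples, negatives are cover samples.
    """
    total = n_tp + n_tn + n_fp + n_fn
    acc = (n_tp + n_tn) / total          # correctly classified / all test samples
    fpr = n_fp / (n_tn + n_fp)           # cover samples flagged as stego
    fnr = n_fn / (n_tp + n_fn)           # stego samples missed
    return acc, fpr, fnr

# Illustrative counts: 100 stego samples and 100 cover samples
acc, fpr, fnr = detection_metrics(n_tp=90, n_tn=80, n_fp=20, n_fn=10)
```

For these counts, 170 of the 200 samples are classified correctly, 20 of the 100 cover samples are false alarms and 10 of the 100 steganographic samples are missed.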

Performance and Analysis
In our steganalysis experiments, 2200 ten-second samples are embedded with secret messages at different embedding rates (from 0.1 to 1.0). Figure 3 shows the detection accuracy, FPR and FNR at different embedding rates.
From these charts, we can reach the following conclusions. First, for both detection methods, the detection accuracy increases as the embedding rate increases, which means that the detection performance is positively correlated with the embedding rate of a given steganography. Furthermore, from Figure 3a,d, it can be observed that the difference in detection accuracy between the entropy test and the poker test also increases as the embedding rate increases. In particular, for the entropy test, the accuracy is nearly 85% in 5.3 kbit/s mode at the embedding rate of 100%, while the poker test reaches the same level of accuracy at the embedding rate of 60%, which means that the proposed method has much better detection performance than the entropy test.
Second, FPR and FNR decrease as the embedding rate increases, and the proposed method has lower FPR and FNR than the entropy test. The abnormality at low embedding rates may arise because the parameter bits are ordered in the initial state: when the embedding rate is low, few parameters are modified, which leaves the parameter bits slightly disordered; as the embedding rate increases, more parameters are replaced by secret messages, which makes the parameter bits become ordered again.
To further evaluate the performances of the steganalysis methods, the receiver operating characteristic (ROC) curves at the embedding rates of 30%, 60% and 100% are drawn in Figure 4. To construct the ROC curves, the True Positive Rate (TPR) and True Negative Rate (TNR) need to be calculated first. TPR is the proportion of true positives out of all positives, which is calculated by

$$\mathrm{TPR} = \frac{N_{TP}}{N_{TP} + N_{FN}}. \tag{17}$$

TNR is the proportion of true negatives out of all negatives, which is calculated by

$$\mathrm{TNR} = \frac{N_{TN}}{N_{TN} + N_{FP}}. \tag{18}$$

The results reconfirm that the proposed method has better performance than the entropy test. Moreover, we can draw another conclusion from Figure 4: the proposed method in 5.3 kbit/s mode slightly outperforms that in 6.3 kbit/s mode. Overall, the experiments indicate that the proposed method is effective and achieves much better detection performance than the entropy test.
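An ROC curve of this kind is obtained by sweeping a decision threshold over the classifier outputs and recording the true positive rate against the false positive rate (which equals 1 minus TNR) at each threshold. The sketch below shows one simple way to do this; the function name, the scores and the labels are invented for illustration and do not come from the experiments.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping a threshold over the scores.

    scores: classifier decision values, larger means more likely steganographic
    labels: 1 for steganographic samples, 0 for cover samples
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]                        # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        n_tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        n_fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((n_fp / n_neg, n_tp / n_pos))   # FPR = 1 - TNR
    return points

# Four toy samples: two steganographic (label 1) and two cover (label 0)
points = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

A curve that rises toward the top-left corner more quickly indicates a better detector, which is how the curves of the two methods are compared.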

Figure 1. The extraction process of steganalysis features.

Figure 2. The distribution of inactive frames.

Figure 4. The receiver operating characteristic (ROC) curves at various embedding rates with both encoding modes. (A) ROC with 5.3 kbit/s mode; (B) ROC with 6.3 kbit/s mode. In each row: (a) ROC at embedding rate of 30%; (b) ROC at embedding rate of 60%; (c) ROC at embedding rate of 100%.

Table 3. Parameters of the inactive frame suitable for embedding secret messages with 6.3 kbit/s mode [28].

Table 4. Parameters of the inactive frame suitable for embedding secret messages with 5.3 kbit/s mode [29].

Table 5. Poker test statistics with 6.3 kbit/s mode at various embedding rates.