Design and ARM-Based Implementation of Bitstream-Oriented Chaotic Encryption Scheme for H.264/AVC Video

In practical real-time confidential video communication, encrypted video must satisfy three performance requirements: security, real-time operation, and format compatibility. To satisfy these requirements, an improved bitstream-oriented encryption (BOE) method based on chaotic encryption for H.264/AVC video is proposed, and an ARM-embedded remote real-time video confidential communication system is built for experimental verification. Firstly, a 4-D self-synchronous chaotic stream cipher algorithm with cosine anti-controllers (4-D SCSCA-CAC) is designed to enhance security. The algorithm closes the security loopholes of existing self-synchronous chaotic stream cipher algorithms applied to practical video confidential communication and can effectively resist the combination of the chosen-ciphertext attack and the divide-and-conquer attack. Secondly, the syntax elements of the H.264 bitstream are analyzed in real time. Motion vector difference (MVD) coefficients and direct-current (DC) components in the Residual syntax element are extracted through Exponential-Golomb decoding and context-based adaptive variable length coding (CAVLC) entropy decoding, respectively. Thirdly, the DC components and MVD coefficients are encrypted by the 4-D SCSCA-CAC, and the encrypted syntax elements are re-encoded to replace the corresponding syntax elements of the original H.264 bitstream, which preserves format compatibility. In addition, hardware codecs and multi-core multi-threading technology are employed to improve the real-time performance of the hardware system. Finally, experimental results show that the proposed scheme is efficient and flexible and can satisfy the requirements of security, real-time performance, and format compatibility simultaneously.


Introduction
Real-time video communication is widely used in military, business, entertainment, and personal social activities, such as video conferencing, live video, and video surveillance [1][2][3][4]. However, the openness of the internet leads to many security risks in video communication. Cybersecurity incidents such as malicious attacks, illegal access, information leakage, theft, and evil tampering are frequently reported. Many studies have shown that end-to-end encryption is an effective way to protect cyber information and personal privacy [5][6][7][8][9]. Only communicating parties with the right secret keys can easily decrypt ciphertexts, while third parties without the matched secret keys cannot obtain plaintexts even if they obtain the ciphertexts. Therefore, the end-to-end encryption is a necessary means for real-time video confidential communication.
To realize end-to-end encryption for video communication, security, real-time performance, and format compatibility are important technical indicators that must be met simultaneously. Existing video encryption schemes for H.264/AVC can be classified into full encryption (FE) [32] and selective encryption (SE) [16,30,31,32]. The FE method indiscriminately encrypts the entire encoded video, which destroys the format information of the encoded video [17]. The SE method encrypts only part of the critical information in the encoded video, chosen so that encrypting it does not affect the encoded video format. Compared with the FE method, the SE method encrypts less data and maintains the format of the encrypted video. Thus, the SE scheme is an effective way to achieve security, real-time performance, and format compatibility in a video confidential communication system.
According to the relationship between the encryption algorithm and the video codec, SE methods can be further subdivided into compression-integrated encryption (CIE) and bitstream-oriented encryption (BOE). CIE has a coupled structure that embeds the encryption algorithm into the video codec, so the video information is encrypted during encoding. BOE has an independent architecture that separates the encryption algorithm from the video codec, so the video information is encrypted after being encoded. Chen et al. [32] and Zhang et al. [36] designed video confidential communication systems based on the CIE scheme. The encryption algorithm was embedded into software codecs, and the syntax elements in the H.264 bitstream were encrypted during the encoding process. Their experimental results show that, with the CIE scheme, the encrypted videos keep the H.264 format, fulfilling format compatibility. However, at a video resolution of 640 × 480, the video transmission frame rate in [32,36] cannot meet the basic real-time requirement (25 f/s) of practical video communication. The main reason is that software codecs with low computational efficiency seriously degrade real-time performance. Besides, the CIE scheme lacks flexibility owing to the dependency between the encryption algorithm and the software codecs. No standard codec, whether software or hardware, is applicable in the CIE scheme; only special software codecs customized with the encryption algorithm are suitable for it. The BOE scheme is more flexible than the CIE scheme because any standard codec can be used to perform the coding operation in the independent architecture. Arachchi et al. [5] proposed a standalone encryption method to achieve end-to-end security adaptation requirements. The standalone encryption method separated the video encoding and video encryption processes.
Meanwhile, it performed a series of analysis, extraction, encryption, and encoding operations on the H.264 bitstream, so it can be classified as a BOE method. References [3,18,23] proposed a similar encryption method for the H.264 bitstream, which retained the 'header' information of the H.264 bitstream and encrypted the rest of the data. When the type of 'header' is revised as unspecified, the encrypted bitstream is bypassed without decoding, which maintains format compatibility. Although this method encrypts the H.264 bitstream with an independent architecture of encryption algorithm and codec, it is still far from the BOE method. In fact, the method is closer to FE than to SE because the main information of the encoded video is encrypted indiscriminately. Its biggest flaw is the destruction of the syntactic structure of macroblocks. Boyadjis et al. [37] and Cheng et al. [4] encrypted syntax elements extracted from the H.264 bitstream and generated encrypted video with format compatibility. Both combined the BOE method with the AES block cipher to design a video encryption system. The AES algorithm has to construct fixed-length blocks for encryption and must pad a block when its data is insufficient. The padding operation increases the computational load and the amount of encrypted data. Moreover, the encryption test objects are static video files rather than real-time video streams, and the experiments are conducted by simulation on a PC rather than by real tests on an embedded hardware platform in an actual network environment. Therefore, they do not provide strong proof of the feasibility, effectiveness, and superiority of the BOE method. To summarize, a comparison of the features of the BOE, CIE, and FE methods is given in Table 1.
To address the above issues, an improved BOE method for H.264/AVC video is proposed and verified in an actual network environment on an ARM-embedded hardware platform. The main contributions and novelties of this work can be summarized as follows:
(1) A 4-D SCSCA-CAC is proposed, which can effectively resist the cryptanalysis method combining a chosen-ciphertext attack with a divide-and-conquer attack. In addition, the chaotic bit sequences generated by the 4-D SCSCA-CAC for encrypting video information have passed the NIST and TestU01 tests.
(2) An improved BOE method based on chaotic stream ciphers is proposed, which separates the hardware codec from the 4-D SCSCA-CAC to form an independent architecture. The proposed scheme achieves format compatibility of encrypted videos and balances the technical conflict between security and real-time performance.
(3) An ARM-based hardware system for real-time video confidential communication is designed and implemented, whose experimental results give a practical and objective evaluation for the critical issues about security, real-time, and format compatibility, verifying feasibility and effectiveness for the BOE method.
The rest of the paper is organized as follows: Section 2 introduces the design of the improved chaotic stream cipher and its security analysis. Section 3 describes the design and ARM-based implementation of the bitstream-oriented chaotic encryption scheme. Section 4 presents the experimental results and test analysis. Section 5 concludes the paper.
According to the structure characteristics of n-D SCSCA (n = 3, 7, 8), and combining with the cryptanalysis process of chosen-ciphertext attack and divide-and-conquer attack, the security loopholes of n-D SCSCA (n = 3, 7, 8) are analyzed in this subsection.
The algorithm structure of n-D SCSCA (n = 3, 7, 8) and corresponding cryptanalysis methods are shown in Table 2.
Table 2 (surviving fragment). Cryptanalysis method: combination of the chosen-ciphertext attack and DCA-TSNCIC [22]. Result: except for the secret keys multiplied with the ciphertext and the anti-controller secret keys, the remaining secret keys of the encryption algorithms are deciphered.
According to the above analysis, the main problems of n-D SCSCA (n = 3, 7, 8) can be summarized as follows:
(1) To achieve self-synchronization, the ciphertext needs to be fed back into the anti-controllers of the chaotic system, and the anti-controllers of n-D SCSCA (n = 3, 7, 8) are sine or mod functions. In this case, when the cryptanalyst sets the ciphertext to p(k) = 0 through the chosen-ciphertext attack, the chaotic system degenerates into a linear system and the controller secret keys are eliminated.
(2) In actual channel communications, the chaotic variables at the receiver achieve asymptotic synchronization with the sender from arbitrary initial conditions. Therefore, the initial conditions of the chaotic variables are weak secret keys, and the cryptanalyst can arbitrarily select initial conditions conducive to cryptanalysis. Besides, n-D SCSCA (n = 3, 7, 8) all use the lower eight bits of a single chaotic variable, or of the product of multiple chaotic variables, as the chaotic pseudo-random sequence. Under the chosen-ciphertext attack, when all initial conditions are set to the same non-zero constant, a constant common factor can be extracted from the decryption expression. Consequently, by setting an appropriate value of the constant, the cryptanalyst can use the divide-and-conquer attack to obtain the complete information of the secret key expression. After acquiring a sufficient number of nonlinear equations in the secret keys, the original secret keys can be deciphered by solving these equations.

Design of 4-D Chaotic Cipher Algorithm
To deal with the loopholes of SCSCA analyzed in Section 2.1, this paper proposes an improved scheme named 4-D SCSCA-CAC, which can effectively resist the combination of the chosen-ciphertext attack and the divide-and-conquer attack.

A. Design An Asymptotically Stable Nominal System
An uncontrolled 4-D discrete-time linear nominal system is expressed as:
$$x(k+1) = A\,x(k), \tag{9}$$
where $x(k) = [x_1(k), x_2(k), x_3(k), x_4(k)]^{T}$ and $A = (a_{ij})_{4 \times 4}$. According to the design principle of discrete-time chaotic systems [38], all eigenvalues of the matrix A must be located inside the unit circle on the complex plane to make the nominal system asymptotically stable.
First, the MATLAB rand() function is used to generate a 4 × 4 full-rank matrix with uncorrelated random values in the range (−1, 1). Next, an orthonormal matrix Q is obtained by orthonormalizing this matrix with the orth() function. Finally, since all eigenvalues of an orthonormal matrix lie on the unit circle in the complex plane, multiplying Q by a coefficient less than 1 yields a matrix A whose eigenvalues all lie inside the unit circle; this matrix is given in Equation (11). From Equation (11), the four eigenvalues of the matrix A in the nominal system (9) are $\lambda_1 = 0.9$, $\lambda_2 = 0.0558 + j0.8983$, $\lambda_3 = 0.0558 - j0.8983$, and $\lambda_4 = -0.9$, each with modulus approximately 0.9. Thus, the nominal system (9) is asymptotically stable.
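The construction described above can be illustrated with a short sketch. It is not the authors' MATLAB code: it replaces orth() with a hand-written modified Gram-Schmidt step, uses an arbitrary random seed, and assumes a scaling coefficient of 0.9 (suggested by the eigenvalue moduli reported above). The names `orthonormalize`, `nominal_matrix`, and `gram` are hypothetical.

```python
import random


def orthonormalize(rows):
    """Modified Gram-Schmidt: return orthonormal rows spanning the same space."""
    basis = []
    for v in rows:
        w = list(v)
        for q in basis:
            dot = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - dot * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis


def nominal_matrix(scale=0.9, seed=1):
    """Build A = scale * Q, with Q orthonormal (scale and seed are assumed values)."""
    rng = random.Random(seed)
    raw = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
    q = orthonormalize(raw)
    return [[scale * v for v in row] for row in q]


def gram(a):
    """Return the product A * A^T for a 4 x 4 matrix stored as a list of rows."""
    return [[sum(a[i][k] * a[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]
```

Because A = 0.9 Q with Q orthonormal, A Aᵀ = 0.81 I, so ‖Ax‖ = 0.9‖x‖ for every x. Every eigenvalue of A therefore has modulus 0.9, which is sufficient for asymptotic stability of the nominal system.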

C. 4-D SCSCA-CAC
Note that, since the chaotic system is deployed on an ARM-embedded platform in this paper, the following calculations are based on binary representation. The chaotic pseudo-random sequence s(k) used for encryption is expressed as:
$$s(k) = \mathrm{mod}\left(\left\lfloor \frac{x_1(k)\,x_2(k) + x_3(k)}{2^{16}} \right\rfloor,\ 2^{8}\right), \tag{14}$$
where $\lfloor\cdot\rfloor$ represents the round-down operation, and $\mathrm{mod}(\cdot, 2^{8})$ keeps the lower eight bits of $\lfloor (x_1(k)\,x_2(k) + x_3(k))/2^{16} \rfloor$ to generate the eight-bit chaotic pseudo-random sequence s(k). Then, the ciphertext p(k) can be expressed as:
$$p(k) = m(k) \oplus s(k), \tag{15}$$
where m(k) represents the plaintext information and $\oplus$ represents the bitwise XOR operation.
To realize self-synchronization, the ciphertext p(k) needs to be fed back into the chaotic system to participate in the iterative calculation [19][20][21]. The 4-D SCSCA-CAC is then designed as:
$$\begin{cases}
x_1(k+1) = a_{11}x_1(k) + a_{12}x_2(k) + a_{13}x_3(k) + a_{14}x_4(k) \\
x_2(k+1) = a_{21}\varepsilon_1\cos(\sigma_1 p(k)) + a_{22}x_2(k) + a_{23}x_3(k) + a_{24}x_4(k) \\
x_3(k+1) = a_{31}\varepsilon_2\cos(\sigma_2 p(k)) + a_{32}x_2(k) + a_{33}x_3(k) + a_{34}x_4(k) \\
x_4(k+1) = a_{41}\varepsilon_3\cos(\sigma_3 p(k)) + a_{42}x_2(k) + a_{43}x_3(k) + a_{44}x_4(k)
\end{cases} \tag{16}$$
The main features of 4-D SCSCA-CAC are summarized as follows:
(1) Unlike the anti-controllers shown in Table 2, a cosine function is used as the anti-controller. With p(k) = 0, the controller parameters remain in the equation instead of being eliminated, which makes the key parameters harder to decipher and in particular strengthens resistance to the divide-and-conquer attack. Note that $\sigma_i, \varepsilon_i \neq 0$ (i = 1, 2, 3) in Equation (16); consequently $\varepsilon_i\cos(\sigma_i p(k)) \neq 0$ within the value range $\{p(k) \mid 0 \le p(k) \le 255,\ p(k) \in \mathbb{N}\}$. Although Equation (16) degrades into a linear iterative equation with p(k) = 0, $\varepsilon_i$ is still retained in the linear iterative equation and provides a strong condition for resisting the divide-and-conquer attack.
(2) An additive term is introduced inside the modulo and round-down operations in Equation (14); as a result, under the divide-and-conquer attack, m(k) contains additional independent terms that are not multiplied by the initial conditions.
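The self-synchronous structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' ARM implementation: it uses Python floating-point arithmetic instead of the binary fixed-point arithmetic mentioned in the text, and every key value is hypothetical (`A` is 0.9 times a normalized 4 × 4 Hadamard matrix, `EPS` and `SIGMA` are arbitrary non-zero constants). The function names are also illustrative.

```python
import math

# Hypothetical keys (NOT the paper's): A = 0.9 * orthonormal Hadamard matrix.
A = [[0.45 * v for v in row] for row in (
    (1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1))]
EPS = (9.0e3, 7.0e3, 5.0e3)   # epsilon_i, non-zero (assumed values)
SIGMA = (0.37, 0.71, 1.13)    # sigma_i, non-zero (assumed values)


def keystream_byte(x):
    """s(k) = mod(floor((x1*x2 + x3) / 2^16), 2^8)."""
    return math.floor((x[0] * x[1] + x[2]) / 2 ** 16) % 256


def step(x, p):
    """One iteration: the ciphertext byte p drives rows 2 to 4 through a cosine."""
    nxt = [sum(A[0][j] * x[j] for j in range(4))]
    for i in (1, 2, 3):
        drive = A[i][0] * EPS[i - 1] * math.cos(SIGMA[i - 1] * p)
        nxt.append(drive + sum(A[i][j] * x[j] for j in (1, 2, 3)))
    return nxt


def encrypt(plain, x0):
    """p(k) = m(k) XOR s(k); the ciphertext is fed back into the state."""
    x, out = list(x0), []
    for m in plain:
        p = m ^ keystream_byte(x)
        out.append(p)
        x = step(x, p)
    return bytes(out)


def decrypt(cipher, x0):
    """m(k) = p(k) XOR s(k); the state is driven by the received ciphertext."""
    x, out = list(x0), []
    for p in cipher:
        out.append(p ^ keystream_byte(x))
        x = step(x, p)
    return bytes(out)
```

With matching keys and initial conditions, `decrypt(encrypt(msg, x0), x0)` returns `msg`. With matching keys but different initial conditions, the first bytes are wrong, and the output becomes correct once the state error has decayed. For this particular choice of `A` the error contracts by a factor of about 0.9 per iteration.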

Security Analysis
Under the chosen-ciphertext attack, the cryptanalyst analyzing 4-D SCSCA-CAC can arbitrarily select ciphertexts that are favorable for breaking the decryption algorithm and obtain the corresponding plaintexts. When the ciphertext is set as p(k) = 0, the linear iterative equation is derived as:
$$\begin{cases}
x_1(k+1) = a_{11}x_1(k) + a_{12}x_2(k) + a_{13}x_3(k) + a_{14}x_4(k) \\
x_2(k+1) = a_{21}\varepsilon_1 + a_{22}x_2(k) + a_{23}x_3(k) + a_{24}x_4(k) \\
x_3(k+1) = a_{31}\varepsilon_2 + a_{32}x_2(k) + a_{33}x_3(k) + a_{34}x_4(k) \\
x_4(k+1) = a_{41}\varepsilon_3 + a_{42}x_2(k) + a_{43}x_3(k) + a_{44}x_4(k)
\end{cases} \tag{17}$$
where k = 0, 1, 2, .... Substituting p(k) = 0 into Equation (15) yields:
$$m(k) = s(k) = \mathrm{mod}\left(\left\lfloor \frac{x_1(k)\,x_2(k) + x_3(k)}{2^{16}} \right\rfloor,\ 2^{8}\right). \tag{18}$$
From Equation (17), although the chaotic system degenerates into a linear iterative equation with p(k) = 0, the anti-controller secret keys $\varepsilon_i$ (i = 1, 2, 3) are still retained in the linear iterative equation and provide a strong condition for resisting the divide-and-conquer attack.
In the decryption process, the decryption end can achieve asymptotic synchronization with the encryption end under any given initial conditions, so the cryptanalyst can choose any initial conditions that are conducive to deciphering the encryption algorithm. In summary, when the ciphertext p(k) = 0 is set in the 4-D SCSCA-CAC, the initial condition values $x_i(0) = c_i$ (i = 1, 2, 3, 4) can be arbitrarily selected in an attempt to obtain the secret key information by the divide-and-conquer attack. Next, DCA-TMNCIC, which has higher attack strength than DCA-TSNCIC, is used to analyze the security of 4-D SCSCA-CAC.
Firstly, substituting k = 0 and $x_i(0) = c_i$ into Equation (17), the first iteration result is given by:
$$\begin{cases}
x_1(1) = a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + a_{14}c_4 \\
x_2(1) = a_{21}\varepsilon_1 + a_{22}c_2 + a_{23}c_3 + a_{24}c_4 \\
x_3(1) = a_{31}\varepsilon_2 + a_{32}c_2 + a_{33}c_3 + a_{34}c_4 \\
x_4(1) = a_{41}\varepsilon_3 + a_{42}c_2 + a_{43}c_3 + a_{44}c_4
\end{cases} \tag{19}$$
Then, substituting k = 1 into Equation (18), the second decryption result is given by:
$$m(1) = \mathrm{mod}\left(\left\lfloor \frac{x_1(1)\,x_2(1) + x_3(1)}{2^{16}} \right\rfloor,\ 2^{8}\right). \tag{20}$$
With DCA-TMNCIC for 4-D SCSCA-CAC, fifteen selection methods of initial conditions are used, as listed in Equation (21). Note that, under the chosen-ciphertext attack, the ciphertexts and the corresponding plaintexts are both known. Therefore, by substituting the fifteen initial conditions in Equation (21) into Equation (19), one can obtain $m_i(1)$ (i = 1, 2, ..., 15) from Equation (20), as given in Equation (22). Then, by setting the initial conditions to the same non-zero constant $c_i = c_0$ (i = 1, 2, 3, 4) in Equation (22), one gets Equation (23), whose numerators inside the round-down operation include:
$$(a_{12}a_{22} + a_{12}a_{24} + a_{14}a_{22} + a_{14}a_{24})c_0^2 + (a_{12}a_{21}\varepsilon_1 + a_{14}a_{21}\varepsilon_1 + a_{32} + a_{34})c_0 + a_{31}\varepsilon_2,$$
$$(a_{12}a_{22} + a_{12}a_{23} + a_{12}a_{24} + a_{13}a_{22} + a_{13}a_{23} + a_{13}a_{24} + a_{14}a_{22} + a_{14}a_{23} + a_{14}a_{24})c_0^2 + (a_{12}a_{21}\varepsilon_1 + a_{13}a_{21}\varepsilon_1 + a_{14}a_{21}\varepsilon_1 + a_{32} + a_{33} + a_{34})c_0 + a_{31}\varepsilon_2,$$
$$(a_{11}a_{22} + a_{11}a_{23} + a_{11}a_{24} + a_{12}a_{22} + a_{12}a_{23} + a_{12}a_{24} + a_{13}a_{22} + a_{13}a_{23} + a_{13}a_{24} + a_{14}a_{22} + \cdots \tag{23}$$
From Equation (23), the second decryption results $m_i(1)$ (i = 1, 2, ..., 15) all contain an independent additive term $a_{31}\varepsilon_2/2^{16}$ that is not multiplied by the initial condition under the fifteen different initial conditions. When the initial conditions are set to the same non-zero constant $c_0$, no constant common factor multiplying the secret key expression can be extracted from the decryption expression. Therefore, it is impossible to obtain the correct sub-block information of the secret key expression by selecting suitable values of the initial conditions $c_i$ (i = 1, 2, 3, 4). Similarly, as the number of iterations k increases, $m_i(k)$ (k = 2, 3, ...) also contain the above independent additive term.
Therefore, the method that combines the chosen-ciphertext attack and the divide-and-conquer attack proposed in [22,35] fails in this case.
In summary, the improved 4-D SCSCA-CAC proposed in this paper is secure against the combination of the chosen-ciphertext attack and the divide-and-conquer attack.
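The role of the constant term in the argument above can be checked numerically. The sketch below is illustrative only. It evaluates x1(1)·x2(1) + x3(1) for p(0) = 0 when a chosen subset of initial conditions equals c0 and the rest are zero, then recovers the coefficients of the resulting quadratic in c0. The integer key values are hypothetical and chosen only so that the arithmetic is exact. The flag `g` models the anti-controller value at p = 0: 1 for a cosine, 0 for a sine.

```python
from itertools import product


def decrypt_argument(a, eps, c, g):
    """x1(1)*x2(1) + x3(1) for p(0) = 0 and initial conditions c."""
    x1 = sum(a[0][j] * c[j] for j in range(4))
    x2 = a[1][0] * eps[0] * g + sum(a[1][j] * c[j] for j in (1, 2, 3))
    x3 = a[2][0] * eps[1] * g + sum(a[2][j] * c[j] for j in (1, 2, 3))
    return x1 * x2 + x3


def quadratic_coeffs(a, eps, mask, g):
    """Coefficients (alpha, beta, gamma) of alpha*c0^2 + beta*c0 + gamma."""
    def q(c0):
        return decrypt_argument(a, eps, [c0 * b for b in mask], g)
    gamma = q(0)
    alpha = (q(1) + q(-1) - 2 * gamma) // 2
    beta = (q(1) - q(-1)) // 2
    return alpha, beta, gamma


# Hypothetical integer keys, for illustration only.
KEY_A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
KEY_EPS = (2, 3, 5)

# The fifteen non-empty subsets of {c1, c2, c3, c4}.
MASKS = [m for m in product((0, 1), repeat=4) if any(m)]
```

For the subset {c2, c4}, `quadratic_coeffs(KEY_A, KEY_EPS, (0, 1, 0, 1), 1)` gives (84, 82, 27), which matches (a12 + a14)(a22 + a24), (a12 + a14)a21ε1 + a32 + a34, and a31ε2 for these keys. With `g = 0` (a sine-type anti-controller) the constant term vanishes for every subset, so c0 can be factored out, which is the weakness the cosine anti-controller is meant to remove.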

Implementation of a Bitstream-Oriented Encrypted Video Communication System
Transmission frame rate is an important indicator to represent the real-time performance of video communication systems, and 25 f/s is a basic requirement in actual application scenarios. Thus, promoting real-time performance is a primary goal in system design. In this paper, an optimized design is proposed that can be conducted from three aspects: (1) improve the BOE method for further reducing the computational load; (2) utilize the hardware codec accelerator on the chip to speed the system; and (3) adopt the multi-core and multi-threading technology to improve system efficiency in parallel work.

Overall Design Scheme
The overall design scheme of the real-time video communication system based on the ARM platform and the BOE method is shown in Figure 2. On the sender, firstly, YUV raw videos are captured from the camera continuously and then encoded as the H.264 bitstream by hardware codec. Secondly, the H.264 bitstream is encrypted through the BOE method. Thirdly, the encrypted bitstream is sent to the receiver through Ethernet. On the receiver, the encrypted bitstream is received from the sender through Ethernet and then decrypted into the original H.264 bitstream through the bitstream-oriented decryption (BOD) method. Fourthly, the original bitstream is decoded into the original YUV videos by hardware codec. Finally, the original videos are displayed on the screen. When K is switched to channel 1, the encrypted bitstream can be decrypted correctly. When K is switched to channel 2, the encrypted bitstream is bypassed without decryption. Most significantly, the encrypted bitstream can be successfully decoded and displayed, which suggests that the encrypted bitstream keeps the H.264 format compatibility and can be transcoded by servers when it is transmitted through the Internet.  The BOE and BOD are specific implementations for selective encryption and decryption, respectively. There are three main parts in the BOE and BOD modules in this paper, including H.264 entropy decoding, chaos encryption-decryption, and H.264 entropy encoding. In the chaos encryption-decryption part, the proposed 4-D SCSCA-CAC is used to encrypt and decrypt the H.264 bitstream. According to Equation (16), the 4-D SCSCA-CAC expression for the BOE and BOD module are obtained as Equations (24) and (25), respectively, and a block diagram of 4-D SCSCA-CAC encryption-decryption in BOE and BOD modules is shown in Figure 2.
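$$\begin{cases}
x^e_1(k+1) = a^e_{11}x^e_1(k) + a^e_{12}x^e_2(k) + a^e_{13}x^e_3(k) + a^e_{14}x^e_4(k) \\
x^e_2(k+1) = a^e_{21}\varepsilon^e_1\cos(\sigma^e_1 p(k)) + a^e_{22}x^e_2(k) + a^e_{23}x^e_3(k) + a^e_{24}x^e_4(k) \\
x^e_3(k+1) = a^e_{31}\varepsilon^e_2\cos(\sigma^e_2 p(k)) + a^e_{32}x^e_2(k) + a^e_{33}x^e_3(k) + a^e_{34}x^e_4(k) \\
x^e_4(k+1) = a^e_{41}\varepsilon^e_3\cos(\sigma^e_3 p(k)) + a^e_{42}x^e_2(k) + a^e_{43}x^e_3(k) + a^e_{44}x^e_4(k)
\end{cases} \tag{24}$$
$$\begin{cases}
x^d_1(k+1) = a^d_{11}x^d_1(k) + a^d_{12}x^d_2(k) + a^d_{13}x^d_3(k) + a^d_{14}x^d_4(k) \\
x^d_2(k+1) = a^d_{21}\varepsilon^d_1\cos(\sigma^d_1 p(k)) + a^d_{22}x^d_2(k) + a^d_{23}x^d_3(k) + a^d_{24}x^d_4(k) \\
x^d_3(k+1) = a^d_{31}\varepsilon^d_2\cos(\sigma^d_2 p(k)) + a^d_{32}x^d_2(k) + a^d_{33}x^d_3(k) + a^d_{34}x^d_4(k) \\
x^d_4(k+1) = a^d_{41}\varepsilon^d_3\cos(\sigma^d_3 p(k)) + a^d_{42}x^d_2(k) + a^d_{43}x^d_3(k) + a^d_{44}x^d_4(k)
\end{cases} \tag{25}$$
where $x^e_i(k)$ (i = 1, 2, 3, 4) represents the encryption chaotic variables, and $a^e_{ij}$ (1 ≤ i ≤ 4, 1 ≤ j ≤ 4), $\varepsilon^e_i$, $\sigma^e_i$ (1 ≤ i ≤ 3) denote the secret keys at the encryption end. $x^d_i(k)$ (i = 1, 2, 3, 4) represents the decryption chaotic variables, and $a^d_{ij}$, $\varepsilon^d_i$, $\sigma^d_i$ denote the secret keys at the decryption end. Since 4-D SCSCA-CAC is a symmetric cipher, the secret keys at the two ends must match, i.e., $a^e_{ij} = a^d_{ij}$, $\varepsilon^e_i = \varepsilon^d_i$, and $\sigma^e_i = \sigma^d_i$, to decrypt correctly. On the sender, as shown in Figure 3, $\lfloor\cdot\rfloor$ represents the round-down operation, $\oplus$ represents the bitwise XOR operation, and $\mathrm{mod}(\cdot, 2^{8})$ keeps the lower eight bits of $\lfloor (x^e_1(k)\,x^e_2(k) + x^e_3(k))/2^{16} \rfloor$ to generate the eight-bit encryption pseudo-random sequence $s^e(k)$. Then, the binary information of the original H.264 syntax element m(k) is encrypted into the encrypted H.264 syntax element p(k) by bitwise XOR with $s^e(k)$; hence the encryption operation is given by Equation (26). Finally, p(k) is fed back and substituted for $x^e_1(k)$ in the second to fourth equations of the chaotic system. On the receiver, the received p(k) is likewise substituted for $x^d_1(k)$ in the second to fourth equations of the chaotic system, and $\mathrm{mod}(\cdot, 2^{8})$ keeps the lower eight bits of $\lfloor (x^d_1(k)\,x^d_2(k) + x^d_3(k))/2^{16} \rfloor$ to generate the eight-bit decryption pseudo-random sequence $s^d(k)$. Similarly, the encrypted H.264 syntax element p(k) is decrypted into the original H.264 syntax element m(k) by bitwise XOR with $s^d(k)$. Therefore, the encryption and decryption operations are given by:
$$p(k) = m(k) \oplus s^e(k), \tag{26}$$
$$m(k) = p(k) \oplus s^d(k). \tag{27}$$
In Figure 2, hardware codecs can be used because the independent architecture separates the encryption algorithm from the video codec.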
The acceleration function of the hardware codec copes with the speed bottleneck caused by the software codec.
Multi-core multi-threading technology is another essential factor for improving real-time performance. The whole system is divided into multiple threads, and each thread is executed on its corresponding CPU core in a parallel, pipelined manner. Compared with single-threaded operation, multi-core multi-threading effectively reduces blocking and delay and improves execution efficiency.
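The pipelined thread structure can be sketched as a chain of stages connected by bounded queues. This is only an architectural illustration under assumptions: the paper does not give its thread layout, the stage functions used here are placeholders for capture, hardware encoding, BOE, and network sending, and Python threads do not by themselves run CPU-bound work on several cores, unlike native threads on the ARM platform. The names `stage` and `run_pipeline` are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that shuts the pipeline down


def stage(work, inbox, outbox):
    """Apply `work` to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(work(item))


def run_pipeline(frames, stages):
    """Chain the stage functions with bounded queues, one thread per stage."""
    queues = [queue.Queue(maxsize=4) for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]

    def feed():
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(STOP)

    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Each stage handles one frame while the next stage handles the previous one, so a slow stage delays only its own queue rather than blocking the whole chain. Frame order is preserved because every stage is a single first-in, first-out worker.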

Presentation of H.264/AVC Standard
Established jointly by the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG), H.264/AVC is one of the most popular video coding standards. With its low bit rate and high compression ratio, H.264/AVC is widely used in video applications. The H.264/AVC standard specifies three profiles, namely the baseline profile, main profile, and extended profile, each of which supports a specific type of application. The baseline profile is mainly used for wireless video communication scenarios with high real-time requirements, while the main profile and extended profile are used for applications with relatively low real-time requirements such as video broadcasting or video storage. Therefore, to satisfy the real-time performance of the video confidential communication system, the baseline profile is adopted in this paper. In this profile, only inter prediction and intra prediction are supported, and the H.264 bitstream is encoded by the CAVLC and Exponential-Golomb coding methods.

Encryption Analysis of Syntax Elements in H.264 Bitstream
The H.264 bitstream is a highly compressed bit sequence, and every bit is closely related to the video encoding and decoding process. If arbitrary bits are encrypted, the format compatibility of the encoded video may be destroyed. To achieve a format-compatible and secure encryption scheme, one first needs to understand the hierarchical structure of the H.264 bitstream and analyze which syntax elements in the bitstream can be encrypted. The syntax elements that keep the video format compatible after encryption are then selected as the encryption objects.
The hierarchical structure principle of the H.264 bitstream is shown in Figure 4. The H.264 bitstream is composed of multiple groups of pictures (GOP), and each GOP is composed of a series of frame pictures. In the baseline profile standard, the frame types in GOP only contain the I-frame and P-frame. An I-frame and P-frame are composed of one or more I-slices and P-slices, respectively. Each slice can be further divided into several macroblocks and one macroblock is composed of 16 × 16 pixel matrixes. Among them, an I-slice only contains I-macroblocks, while a P-slice contains both I-macroblocks and P-macroblocks. The I-macroblocks, intra prediction macroblocks, are encoded according to the decoded pixels in the current slice as the reference. The P-macroblocks, inter prediction macroblocks, are encoded according to the previous encoded picture as the reference. In the H.264 bitstream, the syntax element is the basic unit of data, and each syntax element consisting of several bits represents a specific physical meaning. Different syntax elements are carried on the different types of the macroblock. For I-macroblock, there are Macroblock Type (Mb_type), Macroblock Prediction mode (Mb_pred), Coded Block Pattern (CBP), Quantization Parameter (QP), and Residual syntax elements. For P-macroblock, there are Mb_type, MVD, CBP, QP, and Residual syntax elements. The syntax element 'Residual', and other syntax elements, can be extracted by CAVLC entropy decoding and the Exponential-Golomb decoding method from the macroblock, respectively. Next, the above syntax elements will be analyzed one by one to determine whether they can be used for encryption.
• Mb_type specifies the type of the current macroblock. If the Mb_type is encrypted, the encrypted macroblock cannot be recognized by the codec, so the Mb_type syntax element cannot be encrypted.
• Mb_pred represents the prediction mode used to reconstruct the current block. Because the prediction modes applicable to macroblocks in different locations are different, if the encrypted value of Mb_pred is not within the applicable prediction mode for the current macroblock, it will also lead to decoding failure, so the Mb_pred syntax element cannot be encrypted.
• CBP refers to the coding scheme of the residual data of the current macroblock, each bit of which represents the number of the luma component or chroma component in the current macroblock. It is related to whether there is luma or chroma component data in the bitstream. Therefore, the CBP syntax element cannot be encrypted.
• QP is used to scale the prediction residual transform coefficients. Since the value of QP is limited in size, encrypting QP will cause greater data expansion. Therefore, the QP syntax element is not regarded as the encryption object in this paper.
• MVD exists only in P-macroblocks and represents the motion direction of each sub-block in the current macroblock. Encrypting MVD distorts the reconstruction of the inter-predicted picture, so MVD can be encrypted.

Design of BOE Module
The BOE module is made up of entropy decoding, chaotic encryption, and entropy encoding. The design flow of the module is as follows:
Step1: Parse the syntax elements from the original H.264 bitstream. The macroblock $MB_i$ (1 ≤ i ≤ n), the basic processing unit in a frame of the H.264 video, is parsed into its syntax elements. For an I-macroblock, the 'Residual' syntax element is parsed through CAVLC entropy decoding to extract the DC components $D^I_j$ (1 ≤ j ≤ q) as the encryption objects, where $D^I_j$ represents the j-th DC component in an I-macroblock. The number of DC components q in an I-macroblock is determined by Coded Block Pattern Chroma (CBPC) and Coded Block Pattern Luma (CBPL). The values of CBPC and CBPL and the corresponding value of q fall into the following four cases:
(1) When CBPC = 0 and CBPL = 0, q = 0.
(2) When CBPC = 0 and CBPL = 15, q = 8.
(3) When CBPC = 1 or 2 and CBPL = 0, q = 16.
(4) When CBPC = 1 or 2 and CBPL = 15, q = 24.
For a P-macroblock, the DC components $D^P_j$ (1 ≤ j ≤ q) and the MVD coefficients $MVD_j$ (1 ≤ j ≤ z) are extracted as the encryption objects via CAVLC entropy decoding and Exponential-Golomb decoding, respectively. $D^P_j$ represents the j-th DC component in a P-macroblock. As for the I-macroblock, the number of DC components in a P-macroblock is determined by CBPC and CBPL. $MVD_j$ represents the j-th MVD coefficient in a P-macroblock, which can be further divided into a horizontal component $Mx_j$ and a vertical component $My_j$. The number of MVD coefficients z is determined by the division mode of the P-macroblock:
(1) When the division mode is P_16 × 16, z = 1.
(2) When P_16 × 8 or P_8 × 16, z = 2.
(3) When P_8 × 8, z = 4.
(4) When P_4 × 8 or P_8 × 4, z = 8.
(5) When P_4 × 4, z = 16.
Notably, the division mode is decided by the Mb_type syntax element.
Step2: Encrypt the syntax elements. The syntax elements are encrypted by 4-D SCSCA-CAC. When the current frame is an I-frame, the DC components in the I-macroblocks are the only encryption objects. According to Equation (26), the encryption expression is as follows:
$$\tilde{D}^I_{ij} = D^I_{ij} \oplus s^e(k), \tag{28}$$
where $s^e(k)$ represents the encryption sequence, expressed as $s^e(k) = \mathrm{mod}\left(\lfloor (x^e_1(k)\,x^e_2(k) + x^e_3(k))/2^{16} \rfloor,\ 2^{8}\right)$, $D^I_{ij}$ (1 ≤ i ≤ n, 1 ≤ j ≤ q) represents the j-th DC component in the i-th I-macroblock of an I-frame, and $\tilde{D}^I_{ij}$ represents the encrypted DC component corresponding to $D^I_{ij}$. Equation (28) requires $k = \sum_{i=1}^{n} q_i$ iterations to complete the encryption operation for one I-frame, where $q_i$ represents the number of DC components in the current I-macroblock. When the current frame is a P-frame, there are two types of macroblocks to be considered. In an I-macroblock, only the DC components are encrypted. In a P-macroblock, both the DC components and the MVD coefficients are encrypted. Assuming that the number of macroblocks in a P-frame is n and the number of P-macroblocks in the current frame is m (m ≤ n), the number of I-macroblocks is n − m. The encryption expression is as follows:
$$\begin{cases}
\tilde{D}^I_{pj} = D^I_{pj} \oplus s^e(k) \\
\tilde{D}^P_{pj} = D^P_{pj} \oplus s^e(k) \\
\widetilde{Mx}_{pj} = Mx_{pj} \oplus s^e(k) \\
\widetilde{My}_{pj} = My_{pj} \oplus s^e(k)
\end{cases} \tag{29}$$
where $D^I_{pj}$ (1 ≤ p ≤ n, 1 ≤ j ≤ q) represents the j-th DC component in the p-th I-macroblock, and $D^P_{pj}$ (1 ≤ p ≤ n, 1 ≤ j ≤ q) represents the j-th DC component in the p-th P-macroblock in a P-frame. $Mx_{pj}$ and $My_{pj}$ denote the horizontal and vertical components of the j-th MVD coefficient in the p-th P-macroblock, and the corresponding ciphertexts are represented as $\widetilde{Mx}_{pj}$ and $\widetilde{My}_{pj}$, respectively. Equation (29) requires $k = \sum_{i=1}^{n} q_i + \sum_{i=1}^{m} 2z_i$ iterations to complete the encryption operation for one P-frame, where $z_i$ represents the number of MVD coefficients in the current P-macroblock, and $q_i$ represents the number of DC components in the current macroblock.
Step3: Re-encode the encrypted syntax elements as the encrypted H.264 bitstream. After encryption, $\tilde{D}^I_{ij}$, $\tilde{D}^I_{pj}$, and $\tilde{D}^P_{pj}$ are re-encoded via the CAVLC entropy encoding operation. Meanwhile, $\widetilde{Mx}_{pj}$ and $\widetilde{My}_{pj}$ are re-encoded through the Exponential-Golomb encoding operation. Finally, the encrypted bitstream is substituted for the original H.264 bitstream. Notably, except for the DC components and MVD coefficients, the syntax elements remain unchanged during the above encryption process.
From the above description, one knows that encryption iteration k in the BOE method is much less than the FE method. Hence, the BOE method with advantages of lower calculation load and higher processing speed obviously enhances the real-time performance of the system. The execution flow of the BOE module is shown in Algorithm 1.

Design of BOD Module
The BOD module consists of entropy decoding, chaotic decryption, and entropy encoding. The design flow of the module is as follows:
Step1: Parse the syntax elements from the encrypted H.264 bitstream. Parse the encrypted syntax elements from macroblock $MB_i$ $(1 \le i \le n)$ in a frame of the encrypted H.264 video. For I-macroblocks, $\tilde{D}^I_j$ $(1 \le j \le q)$ are extracted from the 'Residual' syntax element. For P-macroblocks, in addition to $\tilde{D}^P_j$ $(1 \le j \le q)$, the encrypted MVD coefficients $MVD_j$ $(1 \le j \le z)$ must also be extracted.
Step2: Decrypt the encrypted syntax elements. The encrypted syntax elements are also decrypted by 4-D SCSCA-CAC. When the current frame is an I-frame, the decryption expression follows from Equation (27), where $s_d(k)$ represents the decryption sequence, which is generated from the state variables of the decryption system in the same form as the encryption sequence $s_e(k)$. As in encryption, $k = \sum_{i=1}^{n} q_i$ iterations are required to complete the decryption operation for one I-frame. When the current frame is a P-frame, there are two types of macroblocks to consider. In an I-macroblock, only the encrypted DC components need to be decrypted. In a P-macroblock, both the encrypted DC components and the encrypted MVD coefficients need to be decrypted. The decryption expression is the counterpart of Equation (29).
Step3: Re-encode the syntax elements as the original H.264 bitstream. After decryption, $D^I_{ij}$, $D^I_{pj}$, and $D^P_{pj}$ are re-encoded via the CAVLC entropy encoding operation. Meanwhile, $Mx_{pj}$ and $My_{pj}$ are re-encoded through the Exponential-Golomb encoding operation. Finally, the decrypted bitstream is substituted for the encrypted bitstream.
The execution flow of the BOD module is shown in Algorithm 2. Note that both the CAVLC entropy coding and the Exponential-Golomb coding operations in Algorithm 1 and Algorithm 2 are ported from the JM86 and X264 software codec models.

Multi-Core Multi-Threading Process
Multi-core multi-threading technology is an effective way to optimize the system architecture and improve the efficiency of multitasking. The processing time for one frame on the sender includes video capturing $t_{cp}$, video encoding $t_{enc}$, encryption $t_{enp}$, video format conversion $t_{cnv}$, and sending $t_s$. The hardware platform used in our implementation has four ARM Cortex A9 cores. With a single thread, the total operation time $T = t_{cp} + t_{enc} + t_{enp} + t_{cnv} + t_s$ is much longer than 40 ms, so the system cannot reach the real-time target of 25 f/s. In contrast, multi-core multi-threading technology splits the operation tasks into six threads. With a reasonable assignment of the tasks, the operation time of each thread $t_i$ $(i = 2, 3, \cdots)$ can be kept below 40 ms. Because the threads run concurrently as a pipeline, the effective processing period per frame is bounded by the slowest thread rather than by the sum of all stages, so it satisfies T < 40 ms and reaches the basic frame rate target. The multi-core multi-threading design principle is shown in Figure 5. Figure 5a,b shows the multi-core multi-threading design schemes for the sender and the receiver in Figure 2, respectively.
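The timing argument above can be illustrated with a short sketch. The stage times below are hypothetical placeholders chosen for illustration, not measurements from the paper, and the sketch assumes one thread per stage, which simplifies the six-thread design described in the text.

```python
from typing import Dict

FRAME_BUDGET_MS = 40.0  # 25 f/s corresponds to 40 ms per frame


def single_thread_time(stage_ms: Dict[str, float]) -> float:
    """One thread runs every stage in turn, so the per-frame time is the sum."""
    return sum(stage_ms.values())


def pipeline_period(stage_ms: Dict[str, float]) -> float:
    """With one thread per stage, a new frame completes every max(t_i) ms."""
    return max(stage_ms.values())


# Hypothetical stage times in milliseconds (illustration only).
stages = {"capture": 15.0, "encode": 12.0, "encrypt": 8.0, "convert": 10.0, "send": 6.0}

if __name__ == "__main__":
    for name, period in (("single thread", single_thread_time(stages)),
                         ("pipelined", pipeline_period(stages))):
        verdict = "meets" if period < FRAME_BUDGET_MS else "misses"
        print(f"{name}: {period:.1f} ms per frame, {verdict} the 40 ms budget")
```

Note that pipelining improves throughput, not latency: an individual frame still takes roughly the sum of the stage times to travel from capture to transmission.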
Read-write conflict prevention is the most crucial working mechanism for shared buffers. It prohibits more than one thread from reading or writing the same buffer simultaneously; otherwise, unpredictable errors will occur. The mutex lock mechanism provides this read-write conflict protection, and its working principle is illustrated by the switches $K_i$ $(i = 1, 2, \cdots, 11)$ in Figure 4. Taking Buffer_1 in Figure 4a as an example, once $K_1$ is closed, $K_2$ and $K_3$ are kept open. This means that while Thread_1 is storing data in Buffer_1, Thread_2 and Thread_3 are prohibited from accessing Buffer_1, and they carry out data format conversion for the previous frame at the same time.
In summary, the BOE and BOD modules have the advantages of high efficiency and good real-time performance. Besides, the hardware codec solves the time-consuming problem of the encoding operation, and the multi-core multi-threading technology further optimizes the system. As a result, the operation period of every thread is constrained to less than 40 ms. Experimental results show that the frame rate of the embedded hardware system is more than 25 f/s.
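The mutex mechanism described above can be sketched with Python's standard `threading.Lock`. This is a minimal illustration, not the paper's ARM implementation: the `SharedBuffer` class and `run_demo` function are hypothetical names, and a single producer and a single consumer stand in for the threads that share Buffer_1.

```python
import threading


class SharedBuffer:
    """A frame buffer guarded by one mutex, so only one thread touches it at a time."""

    def __init__(self, size: int) -> None:
        self._lock = threading.Lock()
        self._data = bytearray(size)

    def write(self, value: int) -> None:
        # Holding the lock plays the role of closing one switch:
        # every other thread must wait until the whole frame is stored.
        with self._lock:
            for i in range(len(self._data)):
                self._data[i] = value

    def read(self) -> bytes:
        # Readers take the same lock, so they never observe a half-written frame.
        with self._lock:
            return bytes(self._data)


def run_demo(frames: int = 200, size: int = 64) -> int:
    """Run one writer and one reader concurrently; return the number of torn frames seen."""
    buf = SharedBuffer(size)
    torn = 0

    def producer() -> None:
        for n in range(frames):
            buf.write(n % 256)

    def consumer() -> None:
        nonlocal torn
        for _ in range(frames):
            snapshot = buf.read()
            if len(set(snapshot)) != 1:  # a consistent frame holds one repeated byte value
                torn += 1

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return torn


if __name__ == "__main__":
    print("torn frames observed:", run_demo())
```

Because both the writer and the reader acquire the same lock, the consumer only ever sees complete frames, which is the property the switch diagram expresses.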

Experimental Results and Analysis
Based on the chaotic encryption and decryption algorithms given in Equations (24) and (25), as well as the design principles of Figures 2-5, we implemented the remote real-time video confidential communication system shown in Figure 6. First, two demo boards with four ARM Cortex A9 cores are chosen as the sender and the receiver. Second, the camera and screens are attached for video capturing and displaying. Third, the two boards are connected to the LAN, and their IP addresses are set in the range of 192.168.1.1 to 192.168.1.255.

Experiment Results
The real-time video confidential communication system is tested in a real network environment at a video resolution of 640 × 480.
The original video displayed at the sender is shown in Figure 7a. When the keys are matched, that is, $a^e_{ij} = a^d_{ij}$, the result of successful decryption is shown in Figure 7b. When the keys are mismatched, the result of decryption failure is shown in Figure 7c. When the encrypted video is decoded and displayed without decryption, the result is shown in Figure 7d. The experimental results show that the BOE method with the 4-D SCSCA-CAC achieves effective perceptual encryption. Besides, the encrypted bitstream can be decoded without decryption. According to the test results, the average transmission frame rate of the system is up to 27 f/s.
A comparison of experimental results for video confidential communication systems based on ARM-embedded platforms is shown in Table 3. The hardware platform for the experiments in Table 3 is the ARM Cortex A9 core development board, and the video resolution is 640 × 480. The experimental results of [32,36] show that using the hardware codec together with multi-core multi-threading technology can effectively improve the video transmission frame rate of the system. With the same multi-core multi-threading technology, the FE scheme [32] can utilize the hardware codec to improve the video transmission frame rate, but it cannot maintain video format compatibility. Conversely, the CIE scheme [36] maintains video format compatibility, but its transmission frame rate is lower than 17 f/s because its software codec incurs a large computing load and time consumption. Although the proposed BOE scheme performs two more operations (entropy decoding and entropy encoding) than the above schemes, its video transmission frame rate reaches 27 f/s, higher than that of the full encryption scheme, while also maintaining video format compatibility.
Therefore, the proposed BOE scheme is superior in terms of security, real-time performance, and format compatibility.
Table 4 shows the passing ratio and the mean P-value of each NIST test for 4-D SCSCA-CAC. In this experiment, 100 groups of bit sequences, each of length $10^6$, are generated by intercepting the low 8 bits of the chaotic variable expressed as $s(k) = \mathrm{mod}\big((x_1(k) \times x_2(k) + x_3(k))/2^{16},\, 2^8\big)$. Table 4 shows that the 4-D SCSCA-CAC passes all the tests, which implies that the sequences generated by 4-D SCSCA-CAC have good statistical properties and can be regarded as random in the statistical sense.
Compared with the NIST test, the TESTU01 test is a more rigorous test of statistical characteristics. TESTU01 has seven test suites: SmallCrush, Crush, BigCrush, Alphabit, Rabbit, PseudoDIEHARD, and FIPS-140-2. Among them, the BigCrush suite consumes as much as 10 TB of test sequences, which makes the dynamic behavior of a chaotic system easier to expose. Some chaotic systems that pass the NIST test cannot pass the TESTU01 test. Therefore, if the sequences generated by a chaotic system pass the TESTU01 test, they have better statistical performance and randomness. Table 5 shows the results of the TESTU01 test for the 4-D SCSCA-CAC. The 10 Tb test sequences are generated by intercepting the low 8 bits of the chaotic variable expressed as $s(k) = \mathrm{mod}\big((x_1(k) \times x_2(k) + x_3(k))/2^{16},\, 2^8\big)$. The results show that the 4-D SCSCA-CAC passes all the tests in TESTU01.
The phase space reconstruction attack analyzes the time series of a state variable that is continuously produced by the chaotic system. It reconstructs the time series of the state variable by determining an appropriate delay time τ and embedding dimension m, and then recovers regular trajectories, such as attractors, in the embedded space. The structure of some chaotic maps with complicated trajectories becomes simple and evident in the reconstructed phase space, so the attacker can easily predict the behavior of the chaotic variable from the trajectory in the reconstructed phase space. Next, the phase space reconstruction attack is used to test the state variable $\{x_1(k)\}$ in 4-D SCSCA-CAC.
Using the auto-correlation method and the False Nearest Neighbor (FNN) method, the delay time τ is estimated as 1 and the embedding dimension m is estimated as 4, and these two parameters are used to reconstruct the phase space. The estimation results for the delay time τ and the embedding dimension m are shown in Figure 8a,b, respectively. The reconstructed phase space is shown in Figure 9. Figure 9 shows that the reconstructed phase space is disordered and has no obvious structure, so the attacker cannot predict the behavior of the chaotic variable by reconstructing the phase space. Therefore, 4-D SCSCA-CAC can effectively resist the phase space reconstruction attack.
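The reconstruction step of this attack is a delay embedding, which can be sketched as follows. The `delay_embed` function name is hypothetical, and the logistic map used in the demonstration is a stand-in chosen because its structure is easy to see; it is not the 4-D SCSCA-CAC, whose iteration equations are not reproduced here.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], m: int, tau: int) -> List[Tuple[float, ...]]:
    """Build delay vectors (x(k), x(k+tau), ..., x(k+(m-1)tau)) from a scalar time series."""
    n_vectors = len(series) - (m - 1) * tau
    if m < 1 or tau < 1 or n_vectors <= 0:
        raise ValueError("series too short for the requested m and tau")
    return [tuple(series[i + j * tau] for j in range(m)) for i in range(n_vectors)]


def logistic_series(x0: float, length: int) -> List[float]:
    """Stand-in chaotic series x(k+1) = 4 x(k) (1 - x(k)); not the paper's cipher."""
    xs = [x0]
    for _ in range(length - 1):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs


if __name__ == "__main__":
    points = delay_embed(logistic_series(0.3, 1000), m=2, tau=1)
    # For this simple map every reconstructed point lies on the parabola y = 4x(1 - x),
    # which is exactly the kind of structure an attacker looks for.
    worst = max(abs(y - 4.0 * x * (1.0 - x)) for x, y in points)
    print("largest deviation from the parabola:", worst)
```

A cipher resists this attack when, as reported for Figure 9, the embedded points of its state variable show no such low-dimensional structure for the estimated τ and m.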

Sensitivity Test of Key Parameters Mismatch
If a very small mismatch in a single key parameter is enough to prevent the original video signal from being decrypted, the key parameters are highly sensitive to mismatch error. The smaller the mismatch error that still causes decryption to fail, the better the security of the system.
The sensitivity test results for key parameter mismatch in 4-D SCSCA-CAC are shown in Table 6. The absolute value of the mismatch between the key parameters in the encryption system and the decryption system is expressed as $|\Delta a_{ij}| = |a^e_{ij} - a^d_{ij}|$ $(1 \le i \le 4,\ 1 \le j \le 4)$. The test results show that every key parameter in 4-D SCSCA-CAC is highly sensitive to a tiny mismatch error. Therefore, 4-D SCSCA-CAC can effectively resist brute-force attacks.

PSNR and SSIM Index Tests
PSNR can be used as a performance index to evaluate the perceptual security of encrypted video. A PSNR value lower than 20 dB means that the original video information cannot be discerned in the encrypted video by the human eye. PSNR is calculated from the Mean Square Error (MSE) between the original image and the encrypted image. The mathematical expression of MSE is as follows:
$$\mathrm{MSE} = \frac{1}{H \times W} \sum_{x=1}^{H} \sum_{y=1}^{W} \big(P_{org}(x, y) - P_{enc}(x, y)\big)^2$$
The mathematical expression of PSNR is obtained as:
$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^k - 1)^2}{\mathrm{MSE}}$$
Here, H and W denote the height and width of the video, respectively, $P_{org}(x, y)$ represents the pixel value of the original video image, $P_{enc}(x, y)$ represents the pixel value of the encrypted video image, and k represents the pixel depth in bits.
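As a minimal sketch of this calculation, the following assumes the commonly used definitions, with the MSE averaged over all H × W pixels and a peak value of $2^k - 1$ for pixel depth k. Frames are represented as nested lists of integer pixel values, and the function names are hypothetical.

```python
import math
from typing import Sequence

Frame = Sequence[Sequence[int]]


def mse(orig: Frame, enc: Frame) -> float:
    """Mean square error between two frames of identical size H x W."""
    h, w = len(orig), len(orig[0])
    total = sum((orig[x][y] - enc[x][y]) ** 2 for x in range(h) for y in range(w))
    return total / (h * w)


def psnr(orig: Frame, enc: Frame, depth: int = 8) -> float:
    """PSNR in dB for pixel depth `depth`; infinite when the frames are identical."""
    err = mse(orig, enc)
    if err == 0:
        return math.inf
    peak = (1 << depth) - 1
    return 10.0 * math.log10(peak * peak / err)


if __name__ == "__main__":
    # Tiny hypothetical 2 x 2 frames, for illustration only.
    original = [[10, 20], [30, 40]]
    encrypted = [[200, 3], [250, 120]]
    print(f"MSE = {mse(original, encrypted):.2f}, PSNR = {psnr(original, encrypted):.2f} dB")
```

A larger pixel error gives a larger MSE and therefore a lower PSNR, which is why low PSNR values are read as evidence of strong perceptual scrambling.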
Unlike PSNR, which measures image quality based on pixel error, SSIM measures image similarity from three aspects: luminance, contrast, and structure. The SSIM value typically lies in the range [0, 1]. The smaller the SSIM value, the lower the structural similarity, and hence the higher the perceptual security of the encrypted video.
To test the encryption effect of the video confidential communication system, we captured three frames of original images and their corresponding encrypted images during actual video communication, as shown in Figure 10. The PSNR and SSIM test results are shown in Table 7. According to Table 7, the PSNR values are much smaller than 20 dB, indicating that it is difficult to obtain the information of the original frames from the encrypted frames. The SSIM values are all less than 0.05, indicating low structural similarity between the original and encrypted images. These results lead to the conclusion that the BOE scheme has good perceptual security performance.
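The SSIM index can be sketched in a simplified, single-window form as follows. This is an assumption-laden illustration: the reference SSIM averages the index over local windows, whereas this sketch evaluates the formula once over the whole frame, using the usual stabilizing constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$ with $L = 2^k - 1$. The function name is hypothetical, and the values it returns are not comparable to those in the paper's tables.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]


def global_ssim(a: Frame, b: Frame, depth: int = 8) -> float:
    """Single-window SSIM computed over whole frames (simplified variant)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    n = len(xs)
    peak = (1 << depth) - 1
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2

    mean_x, mean_y = sum(xs) / n, sum(ys) / n                      # luminance terms
    var_x = sum((x - mean_x) ** 2 for x in xs) / n                 # contrast terms
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n  # structure term

    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    # Tiny hypothetical frames, for illustration only.
    original = [[10, 200], [30, 220]]
    scrambled = [[180, 40], [90, 5]]
    print(f"self-similarity: {global_ssim(original, original):.3f}")
    print(f"original vs scrambled: {global_ssim(original, scrambled):.3f}")
```

An image compared with itself scores 1, while a frame whose structure has been destroyed scores far lower, which is the behavior the perceptual security test relies on.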

Conclusions
To ensure the security, format compliance, and real-time transmission of encrypted videos, the SE method has become a research hotspot in the field of video encryption. Among SE methods, BOE is more desirable than the CIE scheme because of its high computational efficiency. Some reports have studied and improved the BOE scheme, but they lack a corresponding hardware implementation to demonstrate the feasibility, effectiveness, and superiority of the BOE method. Moreover, some studies adopted the AES block cipher as the encryption algorithm in their BOE scheme, which carries a high computational load and increases the encryption time. To deal with these problems, this paper proposed:
(1) An improved algorithm, 4-D SCSCA-CAC.
(2) An improved BOE scheme utilizing the hardware codec to improve the real-time performance of video transmission.
(3) An ARM-based hardware implementation of the BOE scheme.
4-D SCSCA-CAC can resist cryptanalysis that combines the chosen-ciphertext attack and the divide-and-conquer attack, and the chaotic bit sequences generated by 4-D SCSCA-CAC for encrypting video information have passed the NIST and TESTU01 tests, ensuring the security of the video confidential communication system. In addition, because the chaotic stream cipher has less computational overhead and does not spend time constructing fixed-length packages of syntax elements for encryption, 4-D SCSCA-CAC is more appropriate than the AES algorithm for actual video communication. Besides, we found that the hardware codec can be used with the BOE scheme in actual applications. The proposed scheme utilizes hardware acceleration to raise the video transmission frame rate to 27 f/s. Compared with the hardware implementation based on the CIE method, the experimental results show that the BOE scheme is more suitable for real-time secure video communication scenarios and demonstrate its advantages of high efficiency and flexibility.