#### 2.1. Block Cipher-Based Video Encryption

Block cipher-based video encryption algorithms that run over lightweight protocols, such as the user datagram protocol (UDP) and the constrained application protocol (CoAP) [16], have been suggested for real-time video stream encryption on low-performance devices. The datagram transport layer security (DTLS) algorithm is a UDP-based lightweight video stream encryption algorithm (see Figure 1) [17]. On the server, the encoder compresses the video stream with the H.264-based real-time messaging protocol (RTMP) [18] and sends it to the DTLS feeder. The DTLS feeder then encrypts the video stream with the AES256 encryption algorithm and transfers it to the client. On the client, a DTLS broadcaster decrypts the received video stream. However, DTLS incurs a high processing overhead on low-performance processors because it uses the AES256 block cipher. Thus, DTLS is not appropriate for real-time video stream encryption on IoT security cameras.

To address the processing overhead problem, a hardware-based AES encryption mechanism has been proposed in [19]. Here, an AES-specialized hardware chipset conducts the encryption in several steps. Hardware-based AES encryption can relieve the processing overhead on a lightweight processor. However, it has low versatility and adds cost. To solve these problems of block cipher-based video stream encryption algorithms, permutation-based video encryption algorithms, which encrypt the video stream using only the information inside the frame, have been proposed.

#### 2.2. Permutation-Based Video Encryption

A permutation-based video encryption algorithm encrypts video by moving specific parts of a frame to the positions of other parts of the same frame. Video frame data includes a significant amount of pixel information, so recovering the original location of every pixel is almost impossible. For full high definition (FHD) resolution, which is widely adopted for multimedia content, a frame includes $1920\times 1080=2,073,600$ pixels. In this instance, a malicious attacker mounting a brute-force attack must search up to $2,073,600!$ candidate arrangements of the pixels to find the original frame. Thus, a brute-force attack on a permuted frame without the permutation list is practically impossible.
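The idea can be illustrated with a minimal sketch, assuming a frame flattened into a list of pixel values and a permutation list held by both sender and receiver. The function names, the toy 12-pixel frame, and the seeded `random.Random` generator are illustrative assumptions, not part of any cited algorithm. The last lines estimate the size of the search space: $n$ distinct positions can be arranged in $n!$ ways, and for an FHD frame this number has more than ten million decimal digits.

```python
import math
import random

def permute(pixels, perm):
    """Build the scrambled frame: position i receives the pixel at index perm[i]."""
    return [pixels[j] for j in perm]

def unpermute(scrambled, perm):
    """Invert permute(): the value at position i goes back to index perm[i]."""
    original = [None] * len(scrambled)
    for i, j in enumerate(perm):
        original[j] = scrambled[i]
    return original

# Toy 4x3 "frame" flattened to 12 pixel values (hypothetical data).
frame = list(range(100, 112))

# Permutation list; a seeded non-cryptographic generator is used for the demo only.
perm = list(range(len(frame)))
random.Random(7).shuffle(perm)

scrambled = permute(frame, perm)
restored = unpermute(scrambled, perm)

# Number of decimal digits of (1920*1080)!, computed through the log-gamma
# function so that the huge factorial itself is never materialized.
fhd_pixels = 1920 * 1080
fhd_digits = math.lgamma(fhd_pixels + 1) / math.log(10)
print(restored == frame, round(fhd_digits))
```

Real frames contain repeated pixel values, so the number of distinguishable arrangements is smaller than $n!$, but it remains far beyond exhaustive search.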

Permutation-based video encryption is much faster than block cipher-based encryption. Video data is generally far larger than text data, and block cipher-based encryption incurs a high overhead because it repeatedly applies operations to the same part of the data. Thus, adopting block cipher-based video encryption for real-time video streaming on low-performance devices is impractical. Permutation-based video encryption algorithms do not operate on the same part of the data repeatedly. Thus, they can encrypt video with lower overhead than block cipher-based video encryption.

Permutation-based video encryption algorithms are classified by where the encryption takes place relative to video compression (see Figure 2). A video codec compresses the original video stream, and encryption can be applied before, during, or after this compression. Accordingly, permutation-based video encryption algorithms can be classified as pre-compression, while-compression, and post-compression.

Liu and Koenig [14], Sultana and Shubhangi [15], and Akhter et al. [20] have previously proposed permutation-based video encryption algorithms (see Table 1). The while-compression encryption algorithm by Akhter et al. performs encryption simultaneously with the moving picture experts group (MPEG) video compression [20]. This algorithm encrypts the video after the discrete cosine transform (DCT) conversion and before entropy coding. By doing so, it can reduce encryption time and minimize the size of the final encrypted video. When the DCT conversion is complete, the high-frequency components within the video frame are removed. Thus, this algorithm is advantageous in terms of encryption speed, and the encrypted video is smaller than the original video. However, it requires a special codec that includes the encryption algorithm, and it is not compatible with video formats other than MPEG. Moreover, it is vulnerable to known-plaintext attacks because it uses the same permutation list for all frames in the video.
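Why a fixed permutation list is vulnerable to known-plaintext attacks can be shown with a minimal sketch. It assumes a toy setting in which the attacker holds one plaintext/ciphertext pair whose values are all distinct; real frames contain repeated values, so an attacker would typically need several pairs. All names and data below are hypothetical.

```python
import random

def permute(data, perm):
    """Position i of the output receives data[perm[i]]."""
    return [data[j] for j in perm]

def unpermute(scrambled, perm):
    """Inverse of permute()."""
    original = [None] * len(scrambled)
    for i, j in enumerate(perm):
        original[j] = scrambled[i]
    return original

def recover_permutation(known_plain, known_cipher):
    """Rebuild the permutation list from one pair with distinct values."""
    position_of = {value: index for index, value in enumerate(known_plain)}
    return [position_of[value] for value in known_cipher]

# Secret permutation list reused for every frame (the weakness being shown).
secret_perm = list(range(16))
random.Random(3).shuffle(secret_perm)

# Frame 1 is known to the attacker in both plain and encrypted form.
frame1 = list(range(16))
cipher1 = permute(frame1, secret_perm)

# Frame 2 is only seen in encrypted form.
frame2 = [v * 3 + 1 for v in range(16)]
cipher2 = permute(frame2, secret_perm)

recovered_perm = recover_permutation(frame1, cipher1)
decrypted2 = unpermute(cipher2, recovered_perm)
print(decrypted2 == frame2)
```

Once the list is recovered from a single pair, every other frame encrypted with the same list can be decrypted without the key.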

The algorithm proposed by Liu and Koenig is a post-compression encryption algorithm that separates a compressed video frame into blocks of a specific size and then permutes them to encrypt the video [14]. Specifically, it divides the compressed frame data into 256 same-sized blocks and permutes them according to the permutation list. Since the compressed video is smaller than the original video, the size of the final encrypted video is also reduced. However, the attacker can restore the original frame data using file structures such as header markers, an EOF marker, and Huffman tables. Moreover, if part of the permutation list is missing during decryption, the critical file structure may be lost. In this case, the client cannot recover even the portion of the original frame data that has not been lost. The algorithm is also vulnerable to known-plaintext attacks because it uses the same permutation list for all frames in the video. To counter known-plaintext attacks, a method for creating and using a different permutation list for each frame was proposed. However, it is unsuitable for real-time streaming because of the high processing overhead of key exchange and permutation list generation.
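The block permutation step can be sketched as follows. This is a minimal illustration, not the implementation in [14]: it assumes the compressed frame length is an exact multiple of 256 (how the original handles other lengths is not described here), and the function names and the seeded demo generator are hypothetical.

```python
import random

NUM_BLOCKS = 256

def split_blocks(data, num_blocks=NUM_BLOCKS):
    """Split data into num_blocks equally sized byte blocks."""
    if len(data) % num_blocks:
        raise ValueError("length must be a multiple of num_blocks (demo assumption)")
    size = len(data) // num_blocks
    return [data[i * size:(i + 1) * size] for i in range(num_blocks)]

def encrypt_frame(data, perm):
    """Output block i is input block perm[i]."""
    blocks = split_blocks(data, len(perm))
    return b"".join(blocks[j] for j in perm)

def decrypt_frame(data, perm):
    """Put each received block back at its original index."""
    blocks = split_blocks(data, len(perm))
    restored = [b""] * len(perm)
    for i, j in enumerate(perm):
        restored[j] = blocks[i]
    return b"".join(restored)

# Hypothetical stand-in for one compressed frame: 1024 bytes, 4 bytes per block.
compressed = bytes(i % 251 for i in range(NUM_BLOCKS * 4))

# Permutation list of 256 entries; seeded generator for the demo only.
perm = list(range(NUM_BLOCKS))
random.Random(1).shuffle(perm)

encrypted = encrypt_frame(compressed, perm)
decrypted = decrypt_frame(encrypted, perm)
print(decrypted == compressed, len(encrypted) == len(compressed))
```

Because whole blocks are moved unchanged, recognizable byte patterns such as header markers survive inside the blocks, which is consistent with the structural leakage described above.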

The algorithm proposed by Sultana and Shubhangi [15] is a pre-compression encryption algorithm. Although pre-compression encryption is less efficient and produces larger encrypted frames because the frame to be compressed includes high-frequency components, it is impossible to deduce the original video frame data from file structures such as the header marker and EOF marker. Moreover, even if some data goes missing during transmission, the content of the video frame can still be recovered except for the missing parts. However, since this algorithm repeats the faro shuffle and frame rotation several times instead of generating a random permutation list, it is relatively easy to restore the original frame data through brute-force attacks.
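The limited search space of a faro shuffle can be seen in a minimal sketch. It assumes the perfect out-shuffle variant on an even-length sequence; how [15] combines the shuffle with frame rotation is not reproduced here, and the function names are hypothetical. Repeating a perfect shuffle returns the sequence to its starting order after a small number of rounds, so the number of distinct arrangements reachable by shuffling alone is tiny compared with $n!$.

```python
def faro_out_shuffle(items):
    """Perfect out-shuffle: interleave the two halves, first half first."""
    if len(items) % 2:
        raise ValueError("even length required (demo assumption)")
    half = len(items) // 2
    first, second = items[:half], items[half:]
    shuffled = []
    for a, b in zip(first, second):
        shuffled.extend([a, b])
    return shuffled

def shuffle_order(n):
    """Number of out-shuffles needed to return n items to their start order."""
    start = list(range(n))
    current = faro_out_shuffle(start)
    count = 1
    while current != start:
        current = faro_out_shuffle(current)
        count += 1
    return count

print(faro_out_shuffle([0, 1, 2, 3, 4, 5]))  # [0, 3, 1, 4, 2, 5]
print(shuffle_order(52))                     # 8
```

An attacker therefore only has to try each shuffle count up to this order, combined with each possible rotation, rather than search the full permutation space.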

In this paper, we propose a permutation-based video encryption algorithm that maintains the efficiency of existing permutation-based algorithms while being robust against known-plaintext attacks, which are a common vulnerability of such algorithms. A cryptographically secure pseudorandom number generator (CSPRNG) is used to update the permutation list for each video frame. Instead of generating a new permutation list for each frame, the random values generated by the CSPRNG are added to the permutation list used in the previous frame. Accordingly, we can solve the problem of key exchange and permutation list generation overhead, a weakness in Liu and Koenig’s algorithm [14]. In addition, by separating the color channels and performing the permutation on each channel, both shape elements and color elements are made unrecognizable at the same time. Making color elements unrecognizable can provide higher security than the previously proposed permutation-based video encryption algorithms, which only make the shape elements unrecognizable.
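The two ideas above, per-frame list updates and per-channel permutation, can be sketched under explicit assumptions. The exact update rule is not specified in this section, so the sketch assumes the simplest rule that keeps the list a valid permutation: one pseudorandom offset per frame and channel is added to every entry modulo the list length. HMAC-SHA256 over a frame counter stands in for the CSPRNG so that sender and receiver derive the same values from a shared key. The key, the function names, and the toy frame are all hypothetical.

```python
import hashlib
import hmac
import random

def keyed_offset(key, frame_no, channel, n):
    """Stand-in CSPRNG: HMAC-SHA256 of (frame, channel), reduced modulo n."""
    message = frame_no.to_bytes(8, "big") + bytes([channel])
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % n

def update_perm(prev_perm, offset):
    """Assumed update rule: add one offset to every entry modulo n."""
    n = len(prev_perm)
    return [(p + offset) % n for p in prev_perm]

def permute(values, perm):
    return [values[j] for j in perm]

def unpermute(scrambled, perm):
    original = [None] * len(scrambled)
    for i, j in enumerate(perm):
        original[j] = scrambled[i]
    return original

def encrypt_frame(frame, perms):
    """Separate R, G, B, permute each channel with its own list, recombine."""
    channels = list(zip(*frame))
    scrambled = [permute(ch, p) for ch, p in zip(channels, perms)]
    return list(zip(*scrambled))

def decrypt_frame(frame, perms):
    channels = list(zip(*frame))
    restored = [unpermute(ch, p) for ch, p in zip(channels, perms)]
    return list(zip(*restored))

key = b"hypothetical-shared-key"
frame = [(10, 20, 30), (40, 50, 60), (70, 80, 90),
         (15, 25, 35), (45, 55, 65), (75, 85, 95)]
n = len(frame)

# Initial per-channel permutation lists (seeded generator for the demo only).
perms = []
for channel in range(3):
    p = list(range(n))
    random.Random(channel).shuffle(p)
    perms.append(p)

round_trips = []
for frame_no in range(3):
    # Cheap per-frame update instead of generating fresh lists.
    perms = [update_perm(p, keyed_offset(key, frame_no, c, n))
             for c, p in enumerate(perms)]
    encrypted = encrypt_frame(frame, perms)
    round_trips.append(decrypt_frame(encrypted, perms) == frame)
print(round_trips)
```

The single-offset rule is chosen here only because it trivially preserves the permutation property. Because each channel uses a different list, the color triple of any pixel is broken up as well as its position.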