A Huffman-Based Joint Compression and Encryption Scheme for Secure Data Storage Using Physical Unclonable Functions

Abstract: With the development of Internet of Things (IoT) and cloud-computing technologies, cloud servers need to store huge volumes of IoT data with high throughput and robust security. Joint Compression and Encryption (JCAE) schemes based on the Huffman algorithm have been regarded as a promising technology for enhancing data storage. Existing JCAE schemes still have the following limitations: (1) the keys in the JCAE can be cracked by physical and cloning attacks; (2) rebuilding the Huffman tree reduces operational efficiency; (3) the compression ratio should be further improved. In this paper, a Huffman-based JCAE scheme using Physical Unclonable Functions (PUFs) is proposed. It provides physically secure keys with PUFs, efficient Huffman tree mutation without rebuilding, and a practical compression ratio by combining the Lempel-Ziv-Welch (LZW) algorithm. The performance of the instantiated PUFs and the derived keys was evaluated. Moreover, our scheme was demonstrated in a file protection system with an average throughput of 473 Mbps and an average compression ratio of 0.5586. Finally, the security analysis shows that our scheme resists physical and cloning attacks as well as several classic attacks, thus improving the security level of existing data protection methods.


Introduction
The combination of the Internet of Things (IoT) and cloud computing [1,2] has emerged as a prospective platform, where IoT devices collect data from every corner of the real world and forward them to cloud servers for further analysis, and the cloud servers have powerful resources to process them with high operational efficiency [3]. This revolutionary architecture greatly changes our lives by bringing various applications to areas such as public security, intelligent transportation, and smart homes. However, the efficiency of data storage in cloud servers and its security against attacks have become two problems impeding the development of this architecture. The amount of data increases rapidly with the explosive growth of IoT devices, making it an urgent requirement to maintain satisfactory operational efficiency for data storage [4]. Moreover, the security of data storage must be ensured, since IoT data may involve sensitive information such as personal images, secret files, user passwords, and diagnostic data [5]. Therefore, it is vitally important to develop an efficient and secure data storage scheme for cloud servers.
Joint Compression and Encryption (JCAE) has been regarded as an effective technique to supplement existing data protection methods for cloud servers, such as Attribute-Based Encryption (ABE) [6], Identity-Based Encryption [7], and symmetric encryption [8], as it not only provides compression to achieve good operational efficiency for data storage, but also supports encryption to further enhance the security level [9][10][11]. Moreover, incorporating this JCAE method does not degrade the operational efficiency of the data protection process. As a result, introducing a JCAE scheme into data protection is worth considering.
JCAE schemes have mainly been constructed by integrating confusion and diffusion mechanisms into entropy-coding compression algorithms, including Huffman codes [9][10][11] and arithmetic codes [12][13][14], which compress data using a statistical model. To meet the requirements of encryption, a secret key is chosen to modify the statistical model. Given the high computational cost of both the compression and encryption processes, the Huffman coding algorithm and a stream cipher are chosen for their low computational cost to construct a JCAE in which encryption affects neither the properties of the compressed data nor the compression ratio. For example, the process based on multiple Huffman tables controls the exchange of node branches of the Huffman tree through the key and accomplishes encryption and compression at the same time [9]. As a result, the output of the compression algorithm becomes ciphertext, since the initial statistical model has been changed. Only a receiver who obtains the secret key can synchronize the modified statistical model and correctly decrypt the ciphertext. In general, it is promising to incorporate this kind of key-controlled mechanism into entropy-coding compression algorithms to develop JCAE schemes, thus improving existing data protection systems for cloud servers.
Various studies [9][10][11][15][16] have put effort into developing JCAE based on the Huffman coding algorithm. The Multiple Huffman Tables (MHT) algorithm [9] uses multiple Huffman coding tables during encoding for operational efficiency, but was proved vulnerable to known-plaintext and chosen-plaintext attacks in [17]. The Chaotic Huffman Tree (CHT) algorithm uses piecewise linear chaotic maps to obtain a key-stream to mutate and update the Huffman coding tree for security, but is time-consuming. The Swapped Huffman code Table (SHT) algorithm updates the Huffman coding tree and encodes the plaintext segment by segment for security, but is still not operationally efficient. Despite this research, the following issues remain to be addressed:

1.
The key-stream used in the JCAE is only a pseudo-random number (PRN), which is not as secure as a true random number (TRN).

2.
The dynamic Huffman tree update operation is time-consuming and frequent, which makes the JCAE inefficient.

3.
The compression ratio of the JCAE scheme, defined in this paper as the ratio of the size of the compressed file to that of the original file, could be further improved.
PUFs can generate TRNs and resist physical and cloning attacks. The PUF circuit, which already exists in many devices, can provide lightweight encryption [18][19][20]. Therefore, a Huffman-based JCAE scheme for secure data storage using PUFs is proposed for the first time to address the above issues. The contributions are as follows:

1.
A physically secure key generator based on PUFs is incorporated into the JCAE scheme. In this method, the Bistable Ring PUF (BR-PUF) circuit is instantiated to generate TRNs and further produce the PUF key, which makes our JCAE scheme resistant to physical and cloning attacks.

2.
A parallel mutation method based on the basic Huffman tree is proposed for operational efficiency, reducing the computational cost of the merging process and employing a parallel architecture. This method first generates the basic Huffman tree from the statistical model calculated from the input messages. Then, the Huffman coding table is updated with the modified mutated-Huffman-tree technique instead of rebuilding a new tree. Moreover, a parallel architecture is designed to execute the PUF key generation and the basic Huffman tree construction at the same time. Overall, the operational efficiency is improved.

3.
A cascading encoding structure is designed by inserting the LZW algorithm into our scheme. The introduction of the LZW algorithm not only reduces the complexity of the input messages, but also helps our scheme maintain a satisfactory compression ratio, making it generic and practical enough to fit into existing data storage methods.

4.
The proposed JCAE scheme is implemented on the Xilinx LX110T Field Programmable Gate Array (FPGA) and then integrated into a proof-of-concept file protection system. The performance evaluation shows that the proposed scheme is efficient in execution and has a practical compression ratio.

5.
A security analysis is performed to show that our scheme improves the original security level by resisting physical and cloning attacks, as well as several classic attacks.
The remainder of this paper is organized as follows. In Section 2, the related JCAE schemes and the BR-PUF are introduced. Section 3 describes our proposed scheme in detail. In Section 4, the performance of our implementation is evaluated. Then, the security analysis is carried out in Section 5. Finally, we conclude our work in Section 6.

Related Works
Recently, many cloud-related and IoT-based applications concerning data security have been researched. We list the main contributions of the related works in Table 1.

Table 1. Summary of contributions of existing data security methods and security analyses.

Methods | Year | Type | Specific Contributions
McIntosh et al. [21] | 2021 | Method | They defended against malware attacks by proposing a situation-aware access control scheme, which supports deferring the access control decision and rolling back the changes when necessary.
Kayes et al. [22] | 2019 | Method | They proposed a context-aware access control method to characterize the imprecise context for data from multiple cloud sources.
Sharma et al. [23] | 2019 | Analysis | They proposed a novel classification for cloud- and IoT-based data models, discussed important case studies, and detailed the emerging services and data analytics in the area of autonomous vehicles.
Xue et al. [24] | 2017 | Method | They developed a collaborative access control scheme for outsourced data by using attribute-based encryption. Multiple authorized users can gain access permission by collaboration.
Hermassi et al. [10] | 2012 | Method | They proposed a joint compression and encryption scheme using chaotically mutated Huffman trees.
Balduzzi et al. [25] | 2012 | Analysis | They implemented an automated system to test and analyze the security of public Amazon AMIs, and found security problems including unauthorized access, malware attacks, and leakage of sensitive information.
Somorovsky et al. [26] | 2011 | Analysis | They performed a security analysis of the control interfaces of cloud services. The analysis shows that the control interfaces are at risk of signature wrapping and advanced XSS attacks.
Proposed scheme | 2021 | Method | Our scheme is the first to integrate the PUF key into Huffman-based compression to improve security. Besides, a parallel mutation method as well as a cascading structure are developed to further improve the efficiency and support a good compression ratio.
The rest of this section will focus on the related works of the JCAE schemes based on Huffman and the PUFs used in the scheme.

Huffman Tree Mutation (HTM) Technique
In [9], a technique was proposed to create Huffman coding tables without changing the compression ratio. A basic Huffman tree was first trained, after which hundreds of different tables could be derived by Huffman tree mutation. The basic Huffman tree had leaves and inner nodes. Each inner node had a left branch, normally labeled "0", and a right branch, normally labeled "1", together called the label-pair. If the label-pairs were permuted, a new Huffman tree could be derived; this process is called Huffman tree mutation, as shown in Figure 1. The HTM technique provided the opportunity to decide how to permute every label-pair. To derive a new Huffman tree, a random bit-stream was generated; the label-pairs of the basic Huffman tree were then permuted if the corresponding bit was 0, and otherwise kept unchanged. It is important to note that Huffman tree mutation has no effect on the coding efficiency: the mutated tree has the same codeword lengths.
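As an illustration, the following is a minimal Python sketch of HTM. The nested-tuple tree representation, the pre-order bit assignment, and the swap-on-bit-1 polarity are assumptions made for illustration (the text above permutes on bit 0; either convention works equally well):

```python
from heapq import heappush, heappop
from itertools import count

def build_tree(freqs):
    """Build a Huffman tree; leaves are symbols, inner nodes are (left, right) tuples."""
    heap, tie = [], count()
    for sym, w in freqs.items():
        heappush(heap, (w, next(tie), sym))
    while len(heap) > 1:
        w1, _, a = heappop(heap)
        w2, _, b = heappop(heap)
        heappush(heap, (w1 + w2, next(tie), (a, b)))
    return heap[0][2]

def code_table(tree, prefix=""):
    """Walk the tree: left branch labeled '0', right branch labeled '1'."""
    if not isinstance(tree, tuple):
        return {tree: prefix or "0"}
    left, right = tree
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table

def mutate(tree, key_bits):
    """HTM: swap the label-pair of an inner node when its key bit is set.
    Inner nodes are visited in pre-order; the polarity is a convention."""
    it = iter(key_bits)
    def walk(node):
        if not isinstance(node, tuple):
            return node
        left, right = node
        if next(it):                    # key bit decides whether to permute
            left, right = right, left   # swapping relabels the whole subtree
        return (walk(left), walk(right))
    return walk(tree)
```

Deriving the code tables of a basic tree and a mutated tree shows that the codewords change while every codeword length stays the same, which is why the compression ratio is unaffected.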

Multiple Huffman Tables (MHT) Algorithm
MHT [9] was a scheme that combined compression and encryption by using multiple Huffman coding tables during the encoding process. The secret key consisted of m distinct Huffman coding tables and a key-stream (k_0, k_1, ..., k_{n-1}) in which each k_i was selected from the set {0, 1, ..., m - 1}. The Huffman table used to encode each symbol was randomly selected from a public pool of Huffman tables by the key-stream, so the adversary could not know exactly which tables were selected.
Since the given symbols were encoded into codewords of different lengths by different basic trees, the MHT algorithm affected the compression ratio. A more serious problem is that "MHT are vulnerable to low complexity known-and/or chosen-plaintext attacks" by the scheme proposed in [17].

Chaotic Huffman Tree (CHT) Algorithm
The CHT [10] used a chaotic map to generate the key-stream for updating the Huffman coding tree. First, a Huffman coding tree was generated "based on the probability distribution of the symbols in the message" [10]. The basic Huffman tree is shown in Figure 2a. This Huffman tree had 6 symbols, which means 6 "leaves" and 5 "inner nodes" in the tree. The leaves are represented by "points" and the inner nodes by "squares", numbered (1) to (5). The label-pairs of the selected inner nodes were permuted according to the key-stream with the HTM technique, and the Huffman tree was updated as shown in Figure 2b. The CHT must update the Huffman tree for every symbol, which increases the computational cost. As described in [10], the additional average CPU instructions over standard Huffman coding can be calculated as cost = (2 + n + 5 + N)/8 per bit, where n is the iteration number and N is the number of symbols. For example, the Calgary corpus is a collection of files used to compare data compression algorithms. The test file "paper5" in the Calgary corpus gives n = 10 and N = 256, so the cost would be about 34 CPU instructions per bit. Thus, CHT maintains coding efficiency but is not an execution-efficient scheme.

Swapped Huffman Code Table (SHT) Algorithm
The SHT [11] used the key-stream to update the Huffman coding tree and encode the plaintext segment by segment, as shown in Figure 3. First, the Huffman coding tree is generated based on the statistical model of the plaintext symbols. Then, each time N symbols of the plaintext are obtained, the Huffman tree is mutated and updated using the key-stream with the HTM technique. Since the mutation only occurs after every N symbols, the computational cost is reduced. However, every update operation still costs extra memory-load, comparison, exchange, and memory-store operations. In other words, SHT is still not an execution-efficient scheme.

Bistable Ring PUF (BR-PUF)
A PUF is a noisy function of a physical entity, which implements a mapping from inputs (challenges) to outputs (responses). This mapping has randomness and uniqueness, which inevitably result from the uncontrollable physical differences introduced during the manufacturing process. PUFs can be divided into strong PUFs and weak PUFs according to the size of their challenge-response pair (CRP) space. The BR-PUF [27] structure is one of the promising candidates for strong PUFs.
The BR-PUF combines ideas from SRAM PUFs [28] and the Arbiter PUF (APUF) [29]. It not only has a huge challenge space like the APUF, but also has a bistable ring structure like SRAM PUFs. The BR-PUF has an n-stage series loop structure. Each stage contains two NOR gates, one 2-input multiplexer, and one 2-input demultiplexer, and requires a 1-bit challenge, as shown in Figure 4. Figure 5 shows a bistable ring with 8 inverter stages [27]. After power-on, each inverter in the ring forces its output from "0" to "1". However, due to the differences in inverter threshold voltages, the inverter ring has two stable states, and it does not enter a stable state immediately after being energized. It oscillates for a period before reaching a stable state, which may be "10101010" or "01010101". When the ring enters a stable state, the output of one of the inverters is taken as the response of the BR-PUF.
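A toy first-order model is often used to reason about this behaviour: each stage contributes a challenge-dependent mismatch weight, and the ring settles to whichever stable state the accumulated mismatch favours. The sketch below is a simplification for intuition only, not the circuit described in this paper; the Gaussian weights stand in for uncontrollable manufacturing variation:

```python
import random

def make_brpuf(n_stages=128, seed=None):
    """Toy first-order BR-PUF model: each stage contributes one of two
    mismatch weights, selected by its challenge bit, and the response is
    the sign of the accumulated mismatch (which stable state 'wins')."""
    rng = random.Random(seed)
    weights = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_stages)]
    def respond(challenge):
        s = sum(weights[i][c] for i, c in enumerate(challenge))
        return 1 if s > 0 else 0
    return respond

# Two "chips" built from independent mismatch behave differently on the same
# challenges, while one chip answers a repeated challenge consistently
# (no noise is modelled in this sketch).
puf_a = make_brpuf(seed=1)
puf_b = make_brpuf(seed=2)
rng7 = random.Random(7)
challenge = [rng7.randrange(2) for _ in range(128)]
```

This kind of model captures why the response is reproducible on one device yet device-specific across chips, which is exactly the pair of properties (reliability and uniqueness) evaluated later in the paper.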

Proposed Scheme
In this section, the overview of our scheme is first introduced to show the working flow of our scheme. Then, several modules formulating our proposed scheme are explained, respectively.

Overview
The block diagram of the compression-encryption process is shown in Figure 6 and detailed as follows: 1.
Key generation: A PUF challenge C ∈ {0,1}^128 is generated and sent to the PUFs module to generate the correlated response R ∈ {0,1}^128. The response, acting as a true random number, is then input into the Huffman tree mutation module as the key-stream.

2.
Pre-encoding: The input message is first pre-encoded by the LZW module to output a bit stream. Every 8 bits form one symbol, and 32 symbols form one block. In theory, one challenge can process one block. To improve the throughput, we reuse one challenge to encode M blocks. This remains secure when M is less than 32, as suggested in [17].

3.
Basic Huffman Tree (BHT) Construction: In this process, 32N symbols are sent to the Statistics module. The Statistics module outputs the weight of every received symbol to construct a basic Huffman tree (BHT) with K inner nodes. 4.
Mutation: K bits of the key-stream generated in step 1 are used to mutate the BHT constructed in step 3, obtaining the corresponding mutated Huffman tree (M-HT). Each M-HT will compress/encrypt 1 block, M times. 5.
Cipher output: As described in step 1, every challenge generates a 128-bit key-stream at once. Since each symbol is encoded with a variable code length, the length of the output bit stream is not fixed. The challenge C is attached to the beginning of the output bit stream. Finally, the challenge C and the correlated BHT are recorded into a map C_BHT for the decompression-decryption. Note that a parallel architecture is designed to execute our scheme on our hardware platform: as shown in Figure 6, Thread 0 and Thread 1 start at the same time and meet at the Huffman tree mutation module.
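The symbol/block framing in step 2 can be sketched as follows. The function names, the zero-padding of the last symbol, and the choice M = 16 are illustrative assumptions; a real framing scheme would also record the true bit length:

```python
def to_blocks(bitstream, symbol_bits=8, block_symbols=32):
    """Pack a bit string into 8-bit symbols, then group 32 symbols per block.
    The last symbol/block is zero-padded for simplicity (a real framing
    scheme would record the true length)."""
    pad = -len(bitstream) % symbol_bits
    bitstream += "0" * pad
    symbols = [int(bitstream[i:i + symbol_bits], 2)
               for i in range(0, len(bitstream), symbol_bits)]
    return [symbols[i:i + block_symbols]
            for i in range(0, len(symbols), block_symbols)]

def assign_challenges(blocks, challenges, m=16):
    """Reuse each challenge for at most m blocks (m < 32 per the bound in [17])."""
    return [(challenges[i // m], blk) for i, blk in enumerate(blocks)]
```

For instance, a 320-bit pre-encoded stream yields 40 symbols, i.e., one full block of 32 symbols plus a partial block of 8, and with m = 16 both blocks share the first challenge.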
The decompression-decryption process is shown in Figure 7; we only introduce the main differences between the two processes: 1.
Cipher separation: The cipher is composed of the challenge C and the bit streams BS_i, i ∈ {1, ..., N−1}. We separate the cipher and send the parts to different modules.

2.
Key generation: This procedure remains unchanged, using the received challenge. 3.
Basic Huffman Tree Construction: This process directly finds the BHT in the C_BHT map according to the received challenge, and then reconstructs the basic Huffman tree.
Plaintext output: With the help of the M-HT generated in step 4, the bit stream is decompressed and decrypted into N blocks. Finally, these blocks are decoded by the LZW module to obtain the original input messages. After this overall description of our proposed JCAE scheme, we introduce the individual parts involved, including the PUFs key generation method, the Statistics module, the efficient mutation method, etc., respectively.

PUFs Module
We chose the BR-PUF described in Section 2.5 to generate the key-stream. A 128-stage BR-PUF is implemented as shown in Figure 8. Each stage has the structure shown in Figure 4 and is controlled by one challenge bit C[i], which selects the signal path through the MUX and the DEMUX. The 128 NOR-gate stages thus form one ring selected exclusively by a 128-bit challenge; as the structure in Figure 8 shows, 2^128 different rings can be created. When powered on or reset, each stage of the BR-PUF forces its output from low to high. Due to the differences in the electrical characteristics of each NOR gate, the NOR ring may settle into one of two stable states. After the NOR ring is energized, it takes a certain time to reach a steady state. When the NOR ring enters a steady state, the output is taken as the response of the BR-PUF. The response generation process of the BR-PUF consists of the following steps.

1. Set power = 1 to shut down the NOR ring.

2. Wait for the NOR ring to reach the all-"0" state.

3. Set power = 0 to energize the NOR ring.

4. Wait n clock cycles for the ring to enter a stable state.

5. Read the output as a response.

6. Repeat steps 1 to 5 for the next response.

LZW Modules
LZW, a dictionary compression algorithm proposed by Welch in 1984 on the basis of LZ78 (Lempel-Ziv), is used to compress and decompress data.
The LZW module works with a dictionary, which encodes unique partial sequences into codewords. Ideally, l-bit codewords can express 2^l distinct sequences [30]. LZW is used in conjunction with the Huffman entropy coding algorithm in a variety of commercial compression software to further improve the compression ratio. When compressing (Figure 6), the module converts sequences into codewords; when decompressing (Figure 7), it reverses the conversion.
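A minimal Python sketch of textbook LZW is shown below (byte-oriented, with an unbounded dictionary; the hardware LZWR1 variant mentioned later differs in its details):

```python
def lzw_compress(data):
    """Textbook LZW: grow a dictionary of byte sequences, emit integer codewords."""
    dictionary = {bytes([i]): i for i in range(256)}
    buffer, out = b"", []
    for byte in data:
        candidate = buffer + bytes([byte])
        if candidate in dictionary:
            buffer = candidate            # keep extending the match
        else:
            out.append(dictionary[buffer])
            dictionary[candidate] = len(dictionary)  # new dictionary entry
            buffer = bytes([byte])
    if buffer:
        out.append(dictionary[buffer])
    return out

def lzw_decompress(codes):
    """Rebuild the dictionary on the fly; handles the 'code not yet defined' case."""
    dictionary = {i: bytes([i]) for i in range(256)}
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        entry = dictionary.get(code, prev + prev[:1])  # KwKwK special case
        out.append(entry)
        dictionary[len(dictionary)] = prev + entry[:1]
        prev = entry
    return b"".join(out)
```

Repeated substrings collapse into single codewords, which is precisely why pre-encoding with LZW reduces the complexity of the input seen by the Huffman stage.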

Statistic Modules
The Statistics module is designed to analyze the character distribution (frequency) of the sequence to be compressed, in order to obtain a better (lower) compression ratio in the Huffman coding algorithm [15]. This module obtains the frequency distribution of symbols from the input stream. The message M is separated and input into this module symbol by symbol. If a symbol appears for the first time, it is placed into a new register and its frequency counter is set to 1; each further occurrence of the same symbol increments its counter by 1. The frequency of a symbol is normally called its weight. According to the frequency statistics, characters with higher frequency are assigned to shallower nodes of the Huffman tree. Symbols and their weights are delivered to the BHT construction module as shown in Figure 6, and are further used to calculate the probability distribution of the symbols.

Basic Huffman Tree Construction Module
This module generates a basic Huffman tree based on the probabilities of the symbols calculated in the Statistics module. BHT construction involves several functions, including sorting, selection of the minimum and second minimum, summation of the minimum and second minimum, weight updating, and binary tree generation. In this module, the sorting and selection operations are performed simultaneously in the hardware implementation.
The sorting operation sorts the frequencies of the symbols from small to large. In the hardware implementation, a comparator array compares every two frequencies in pairs, and an adder then sums the comparison results for each symbol's frequency. For one symbol frequency, if it is less than the frequency of another symbol, the corresponding result is 0; otherwise it is 1. The sum of these comparison results determines the position of that frequency among all the frequencies.
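This comparator-array ranking can be mirrored in software as a sketch (tie-breaking by symbol index is an assumption here; the hardware would resolve ties in its own way):

```python
def comparator_rank(freqs):
    """Mirror of the comparator-array sort: every pair of frequencies is
    compared, and the sum of a symbol's comparison bits gives its position
    (rank) in the sorted order. Ties are broken by index so ranks are unique."""
    n = len(freqs)
    ranks = []
    for i in range(n):
        # one adder per symbol sums the outcomes of its n-1 comparators
        rank = sum(1 for j in range(n)
                   if j != i and (freqs[j], j) < (freqs[i], i))
        ranks.append(rank)
    return ranks
```

Because every comparison is independent, all comparators can fire in the same cycle, which is what makes this ranking attractive in hardware despite its O(n^2) comparator count.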
The selection operation selects the symbol frequencies whose comparison results mark them as the minimum and the second minimum, together with the corresponding symbols. The hardware implementation builds multiple comparators and then uses multiplexers to pick the desired values out of the candidate frequencies and symbols.
The summation operation adds the minimum and second-minimum frequencies selected by the selection operation to form a new node, which becomes the parent of the two corresponding leaf nodes; the two leaves become the left and right branches of the new node, respectively.
Binary tree and coding module: the two leaf nodes are replaced with the new node, and the reorder-select-sum cycle repeats until a root node is generated, finally yielding a binary tree. The Huffman coding table is then generated by reverse traversal from each leaf node to the root node.
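A software sketch of this construction follows, using a binary heap in place of the comparator array but keeping the described structure: repeated minimum/second-minimum merging with parent pointers, then leaf-to-root reverse traversal to read the codes (`huffman_codes` is a hypothetical name):

```python
from heapq import heapify, heappop, heappush

def huffman_codes(weights):
    """Build the tree bottom-up by merging the minimum and second minimum,
    keeping parent pointers, then read each code by reverse traversal from
    leaf to root. '0' marks a left branch, '1' a right branch."""
    n = len(weights)
    parent = {}                          # node id -> (parent id, branch label)
    heap = [(w, i) for i, w in enumerate(weights)]
    heapify(heap)
    next_id = n
    while len(heap) > 1:
        w1, a = heappop(heap)            # minimum
        w2, b = heappop(heap)            # second minimum
        parent[a] = (next_id, "0")
        parent[b] = (next_id, "1")
        heappush(heap, (w1 + w2, next_id))  # the new node replaces the pair
        next_id += 1
    codes = []
    for leaf in range(n):
        bits, node = [], leaf
        while node in parent:            # climb to the root...
            node, label = parent[node]
            bits.append(label)
        codes.append("".join(reversed(bits)))  # ...then reverse the path
    return codes
```

For weights [5, 2, 1, 1] this yields codewords of lengths 1, 2, 3, 3, matching the expected optimal prefix code for that distribution.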

Huffman Tree Mutation Module
After a basic Huffman tree is created, this module permutes the label-pairs of the inner nodes according to the key-stream to implement encryption. The Huffman coding algorithm is an entropy coding algorithm that assigns shorter codes to frequently occurring symbols and longer codes to rarer symbols. For a Huffman tree with N leaves, the average coding length of a symbol, l_avg, is expressed as Equation (1):

l_avg = Σ_{i=1}^{N} p_i · l_i, (1)

where p_i is the probability of the i-th symbol appearing in the source message and l_i is the coding length of the i-th symbol obtained from the Huffman tree. As shown in Figure 9, the coding of symbol E changes from 001 to 010. Clearly, the mutation only exchanges the label values of the branches at the corresponding inner node, so every encoding length remains unchanged, and l_avg remains unchanged. As mentioned above, coding efficiency is expressed in terms of the compression ratio. After compression, the size of the data equals the total number of symbols multiplied by the average encoding length. Since the total number of symbols, the average coding length, and the plaintext length all remain unchanged, the compression ratio is the same as that of the original Huffman algorithm, which achieves the best compression effect and ensures the same coding efficiency as the original Huffman algorithm. The essence of inner-node mutation is to keep the tree structure unchanged, exchange only the labels, and thereby change the symbol encodings in the coding table. This is a time-consuming process, requiring N−1 node mutations and N code-update operations. The N−1 node mutations are easy to realize in hardware, but obtaining a new code table through the subsequent reverse traversal has a complexity of O(N log₂ N), which takes a long time and adds computational cost to every mutation.
For example, the experimental results of CHT show that its operational efficiency is far lower than that of plain Huffman coding. Therefore, the key to improving the efficiency of the combined scheme is to reduce these additional computational costs. In fact, as can be seen from Figure 10, the process from T0 to T1 to T2 to T3 has the same result as going from T0 directly to T3: the outcome depends only on the value of the key-stream, not on the number of mutations. So T1, T2, and T3 can each be obtained directly from T0, as shown in Figure 10. That is to say, as long as the new code is generated from the original code via the key, HTM achieves the same compression and encryption effect. Therefore, a modified HTM technique is proposed to reduce the cost of the Huffman tree update: the inner nodes are always mutated from the basic Huffman tree instead of from the previously mutated Huffman tree. The modified technique removes the inner-node coding update operation from each mutation process, which saves N−1 update operations per mutation.
A parallel mutation method based on the basic Huffman tree is proposed for operational efficiency by reducing the computational cost of the merging process and employing a parallel architecture.
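The observation above, that successive mutations collapse into a single mutation by the combined key, can be checked with a small sketch. The nested-tuple representation is hypothetical; inner nodes carry fixed indices, mirroring the fixed numbering (1)-(5) used in Figure 2, so that key bits always address the same node no matter where earlier swaps moved it:

```python
def mutate(node, key):
    """Swap the label-pair of inner node i when key[i] == 1. Inner nodes carry
    a fixed index so repeated mutations act on the same node regardless of
    where earlier swaps moved it in the tree."""
    if not isinstance(node, tuple):        # leaf: a plain symbol
        return node
    i, left, right = node
    if key[i]:
        left, right = right, left
    return (i, mutate(left, key), mutate(right, key))

def xor(k1, k2):
    """Combine two mutation keys; swapping twice at a node is a no-op."""
    return [a ^ b for a, b in zip(k1, k2)]
```

Mutating with k1 and then k2 gives exactly the tree obtained by mutating the basic tree once with k1 XOR k2, which is why every M-HT can be produced directly from the BHT without intermediate updates.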

Splice Module
The variable-length encoding of the Huffman code achieves data compression, but it also creates problems for the hardware implementation. After encoding, the codewords have variable lengths. Since the width of the data bus is fixed, the code stream composed of variable-length codewords needs to be divided and packed into fixed-length words for output to the data bus.
First, the codeword and its code length are obtained from the preceding unit and latched into registers. The barrel shift register concatenates and locks the contents of the register. When the register is filled, its data are output to the data bus, and the barrel shift register shifts in the next data. The remaining codeword bits are thus joined to the new ones through the barrel shift register.
In this way, the variable-length codeword stream is divided into fixed-length word streams and transferred to the fixed-width data bus.
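The splice operation can be sketched in software as follows (a configurable bus width and a `(value, bit length)` codeword representation are assumptions of this sketch):

```python
def splice(codewords, bus_width=32):
    """Pack variable-length codewords into fixed-width words for the data bus.
    Acts like the barrel-shift register: accumulate bits, emit a word whenever
    bus_width bits are available, and keep the remainder for the next codeword."""
    acc, nbits, words = 0, 0, []
    for code, length in codewords:          # (value, bit length) pairs
        acc = (acc << length) | code
        nbits += length
        while nbits >= bus_width:
            nbits -= bus_width
            words.append((acc >> nbits) & ((1 << bus_width) - 1))
            acc &= (1 << nbits) - 1         # drop the bits already emitted
    return words, (acc, nbits)              # leftover bits, not yet a full word
```

For example, splicing the codewords 1, 01, 10110 onto a 4-bit bus concatenates the stream 10110110 and emits the two words 1011 and 0110, with no leftover bits.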

Parallel Architecture
As shown in Figures 6 and 7, the PUF key generation can be performed in parallel with the Statistics module and the basic Huffman tree construction. The probability statistics of the symbols and the construction of the Huffman tree are the most time-consuming operations in the whole Huffman algorithm; even with the most efficient heap-sorting algorithm, the time complexity reaches O(n log₂ n).
The process of generating the random number in the PUFs is also time-consuming and independent of the Huffman coding. The blocks are designed to run in parallel, reducing the execution time from t₁ + t₂ to max{t₁, t₂}, which greatly improves the execution efficiency.
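A sketch of the two-thread arrangement is shown below. The sleeps are stand-ins for t1 and t2 (real work would replace them), and the function names are illustrative; `time.sleep` releases the GIL, so the two stand-ins genuinely overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def generate_puf_key():
    """Stand-in for Thread 0: PUF response / key-stream generation."""
    time.sleep(0.2)             # t1: e.g., waiting for the ring to settle
    return [1, 0, 1]

def build_basic_tree():
    """Stand-in for Thread 1: statistics + basic Huffman tree construction."""
    time.sleep(0.2)             # t2: the O(n log2 n) sorting/merging work
    return "BHT"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    key_future = pool.submit(generate_puf_key)
    tree_future = pool.submit(build_basic_tree)
    key, tree = key_future.result(), tree_future.result()
elapsed = time.perf_counter() - start
# elapsed is close to max(t1, t2), not t1 + t2
```

Both results are available when the two futures meet, mirroring how Thread 0 and Thread 1 join at the Huffman tree mutation module in Figure 6.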

Experiments and Performance Evaluation
In this section, to show that our scheme is practical and effective, we first present the features of our implemented BR-PUF, and then show that the PUF key is reliable enough for practical use. Finally, we construct a file protection system to evaluate the compression performance of our proposed JCAE scheme.

Performance of Implemented PUFs
In this part, the performance of the PUFs instantiated on our hardware platform is evaluated using the method proposed in [31]. Since the PUF circuit is used to generate the PUF keys that play an important role in mutating the Huffman tree, it is necessary to show that the performance of the PUFs in our JCAE scheme is satisfactory. We instantiated the BR-PUF on our hardware platform as shown in Figure 11.
The test system was constructed as shown in Figure 11, with a Xilinx LX110T FPGA and a personal computer. The communication channel between the two devices is built on Ethernet. The FPGA implements sixteen 128-bit BR-PUFs to generate responses. The personal computer sends challenges, receives the corresponding responses, and uses evaluation scripts to evaluate the performance of the PUFs.
The performance of PUFs is evaluated from the following three aspects.

Reliability
The reliability of the PUF circuit reflects its ability to reproduce the same response to a repeated challenge. Under noisy working conditions, noise appears in the response and affects the reliability of the PUF circuit. The noise and reliability are calculated as Equations (2)-(4):

Noise_{x,y} = HD(R_x, R_{x,y}) / m, (2)

L_noise = (1/n) Σ_{y=1}^{n} HD(R_x, R_{x,y}) / m, (3)

Reliability = 1 − L_noise, (4)

where R_x is an m-bit reference response, R_{x,y} is the y-th sampling of R_x, HD(X, Y) is the Hamming distance between X and Y, and each PUF challenge is repeated n times. We tested the reliability at seven different temperatures ranging from 0 °C to 60 °C. For each PUF circuit, we collected 81,920 responses at each temperature, and the responses collected at 30 °C were used as the reference responses. The noise L_noise in the responses is close to normally distributed, with an expected value of 0.0494, a variance of 0.0131, and a maximum noise value of less than 0.09. According to Equation (4), the reliability is 0.9506.

Randomness
The randomness of a PUF describes the unpredictability of the responses generated by the PUF circuit. The uniformity represents the probability of "0" or "1" occurring in every bit, and the ideal proportion is 50%: when the proportion of ones is close to 50%, it becomes difficult for an adversary to guess the actual response of the PUF. The randomness of the PUF is calculated as Equation (5):

Randomness = −log₂ max(p, 1 − p), (5)

where p is the proportion of "1" bits in the response; the ideal value is 1. The experimental results show that the randomness of the 16 implemented BR-PUFs is between 0.9261 and 0.9999, with an average value of 0.9865.

Uniqueness
Uniqueness evaluates the difference between the responses of two PUF circuits working under the same conditions and receiving the same challenge. The ideal value of uniqueness is 50%, meaning that half of the response bits from the two PUF circuits differ. The uniqueness of the PUF circuits is computed as Equation (6):

Uniqueness = 2 / (k(k − 1)) Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} HD(R_i, R_j) / m, (6)

where k is the number of PUF circuits and R_i is the m-bit response of the i-th circuit. The experiments show that the uniqueness of the 16 BR-PUFs follows a normal distribution, where the expected value is 0.4892 and the variance is 0.322. Therefore, the uniqueness of our implementation is 0.4892.
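The three metrics can be computed from raw responses as in the following sketch. Responses are modeled as bit lists; the min-entropy form used for randomness matches Equation (5) and common PUF evaluation methodology, but its exact form in [31] is an assumption:

```python
import math

def hd(x, y):
    """Hamming distance between two equal-length bit lists."""
    return sum(a != b for a, b in zip(x, y))

def reliability(reference, samples):
    """1 - average intra-chip noise: repeated responses to one challenge
    compared against the reference response (Equations (2)-(4))."""
    m = len(reference)
    noise = sum(hd(reference, s) / m for s in samples) / len(samples)
    return 1 - noise

def randomness(response):
    """Min-entropy per bit, -log2 max(p, 1-p), where p is the fraction of 1s
    (Equation (5)); the ideal value is 1."""
    p = sum(response) / len(response)
    return -math.log2(max(p, 1 - p))

def uniqueness(responses):
    """Average pairwise inter-chip Hamming distance (Equation (6));
    responses[i] is chip i's answer to the same challenge."""
    m = len(responses[0])
    k = len(responses)
    total = sum(hd(responses[i], responses[j]) / m
                for i in range(k) for j in range(i + 1, k))
    return 2 * total / (k * (k - 1))
```

In an evaluation like the one above, `reliability` would be fed the per-temperature samplings against the 30 °C reference, and `uniqueness` the 16 chips' responses to a shared challenge.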

Performance of Generated Key-Stream from PUFs
The responses of the PUF circuit are used to generate the secret key that mutates the Huffman tree. Thus, it is necessary to evaluate the reliability of the PUF key and check whether the responses produced by the PUF circuit meet the requirements of a random number.
The fuzzy extractor was implemented with a Reed-Muller code (16,5,8) and a repetition code (16,1,16), reducing the probability of generating an incorrect PUF key to 1.7257 × 10⁻¹², which is sufficiently small. Therefore, the PUF key is practical for use in our scheme. Moreover, the NIST randomness tests [32] were carried out on the generated responses. The results, listed in Table 2, pass all 15 checks. Consequently, the responses from the PUF circuit are suitable for key generation, and the PUF key is reliable enough to be integrated into our JCAE scheme.
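The repetition-code half of such a code-offset fuzzy extractor can be sketched as follows. The Reed-Muller outer code is omitted, and the `gen`/`rep` names follow the usual generation/reproduction terminology, which is an assumption of this sketch:

```python
def rep_encode(bit, n=16):
    """(n, 1, n) repetition code: one bit becomes n copies."""
    return [bit] * n

def rep_decode(bits):
    """Majority-vote decoder of the repetition code."""
    return 1 if sum(bits) * 2 > len(bits) else 0

def gen(response, key_bits, n=16):
    """Generation phase (code-offset): helper data = response XOR codeword.
    The helper data can be stored publicly without revealing the key."""
    code = [b for k in key_bits for b in rep_encode(k, n)]
    return [r ^ c for r, c in zip(response, code)]

def rep(noisy_response, helper, n=16):
    """Reproduction phase: XOR the noisy response with the helper data and
    decode; up to floor((n-1)/2) bit flips per n-bit group are corrected."""
    code = [r ^ h for r, h in zip(noisy_response, helper)]
    return [rep_decode(code[i:i + n]) for i in range(0, len(code), n)]
```

With n = 16, up to 7 flipped bits per group are tolerated, comfortably above the measured worst-case noise of under 9% reported in the reliability evaluation.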

Performance of Proposed JCAE Scheme
To further evaluate the actual performance of our JCAE scheme, we implemented the JCAE hardware circuit and constructed a file protection system for the cloud server. The proof-of-concept prototype of the file protection system is shown in Figure 12. The system mainly consists of three parts: the cloud server, the hardware implementation of our proposed JCAE scheme, and the communication channel.
Cloud server: The cloud server, which is responsible for storing the huge volume of data, was implemented on the Fiber Home Fit Server R2200 V5. To protect the sensitive data, the file protection service sends the data stream of the input file to the hardware circuit for further processing. The ciphertext that is compressed and encrypted by the hardware circuit is sent back and stored on the cloud server.
Hardware platform: The hardware circuit of our proposed JCAE scheme was instanced on the Xilinx LX110T FPGA, shown on the right side of Figure 12. The Data Channel exchanges the data stream over PCIE, and the Reg Channel receives control signals from the cloud server to trigger the system control module (sys_ctrl). The JCAE core is composed of two connected compression modules, which implement LZWR1 (a variation of LZW) compression and Huffman lossless compression, respectively. Note that introducing LZWR1 is a common method to improve the compression ratio and make our JCAE scheme more practical for the file protection system: it maps the entropy source of the input file to a proper range, where the Huffman encoding executes efficiently. The PUFs key generation (PUFs_key_gen) module generates the PUFs key and mutates the Huffman tree to jointly compress and encrypt the data. The encrypted data is transmitted to the PCIE through the output control module (out_ctrl).
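The exact mutation rule of PUFs_key_gen is not spelled out here, but one common way to key a Huffman tree is to swap the left/right children of internal nodes according to the key bits, which changes every codeword while preserving all codeword lengths (and hence the compression ratio). The sketch below is an illustrative stand-in for that idea, not the circuit's logic:

```python
import heapq
from itertools import count

class Node:
    def __init__(self, left=None, right=None, symbol=None):
        self.left, self.right, self.symbol = left, right, symbol

def build_huffman(freqs: dict) -> Node:
    """Standard Huffman construction over a symbol-frequency table."""
    tick = count()  # tie-breaker so heapq never compares Node objects
    heap = [(f, next(tick), Node(symbol=s)) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), Node(n1, n2)))
    return heap[0][2]

def mutate(root: Node, key_bits: str) -> None:
    """Swap children of internal nodes wherever the key bit is '1'
    (illustrative stand-in for the paper's tree mutation)."""
    i, stack = 0, [root]
    while stack:
        node = stack.pop()
        if node.symbol is not None:   # leaf: nothing to swap
            continue
        if key_bits[i % len(key_bits)] == "1":
            node.left, node.right = node.right, node.left
        i += 1
        stack.extend([node.left, node.right])

def codebook(node: Node, prefix="") -> dict:
    """Read the symbol -> codeword mapping off the (mutated) tree."""
    if node.symbol is not None:
        return {node.symbol: prefix or "0"}
    book = {}
    book.update(codebook(node.left, prefix + "0"))
    book.update(codebook(node.right, prefix + "1"))
    return book
```

Because only sibling order changes, the compressed size is identical before and after mutation, while an adversary without the 128-bit key faces an exponential number of candidate trees.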
Communication channel: The communication channel is constructed by PCIE to share the data between the cloud server and the hardware platform and achieve a high transmission rate. We used the open-source project to implement the PCIE module on both cloud server and hardware platform.
Twenty-nine files chosen from the Calgary and Canterbury corpora were input into our file protection system to evaluate the performance of our JCAE scheme. The test results are recorded in Table 3 from three aspects: the operation time, the throughput, and the compression ratio. The throughput of our proposed scheme is efficient. As can be seen from Table 3, the throughput for most of the files exceeds 400 Mbps, which is an efficient value. However, the throughput for small files fluctuates. Each file is split into 16 KB data blocks for transmission between the cloud server and the hardware platform; if the file is small (no more than 100 KB), the few resulting data blocks cannot keep the compression pipeline fully utilized. Therefore, the throughput for large files better reflects the operational efficiency of our scheme.
The compression ratio of our scheme is reasonable and satisfactory. The compression ratio is defined by Equation (7):

Compression Ratio = S_compressed / S_original, (7)

where S_compressed and S_original are the sizes of the output and input files, respectively. The compression ratio for most of the files in our test varies around 60%. The differences are mainly caused by the varying entropy of the test files; in other words, the amount of repeated or otherwise compressible data differs from file to file. In general, the compression ratio of our scheme achieves a good value.
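Equation (7) can be checked in a few lines. Here zlib (DEFLATE, i.e., LZ77 plus Huffman coding) stands in for the LZWR1 + Huffman pipeline purely as an illustration of the metric:

```python
import zlib

def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Equation (7): compressed size over original size; lower is better."""
    return len(compressed) / len(original)

# Highly repetitive (low-entropy) input compresses far below ratio 1.0.
sample = b"ABABABAB" * 150            # 1200-byte low-entropy input
ratio = compression_ratio(sample, zlib.compress(sample))
```

High-entropy files sit closer to 1.0, which is exactly the per-file variation observed in Table 3.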
Although the operation time is closely related to the size of the input file, the two biggest test files, book1 and kennedy.xls, take only 14.18 ms and 13.13 ms, respectively, to finish the joint compression and encryption. Therefore, the operation time confirms that our scheme is practical for the file protection system.

Security Analysis
In this section, we analyze the security of our scheme under four attack models, including one physical attack model and three traditional attack models, to further show that our scheme effectively enhances the security level of existing data protection methods.

Physical and Cloning Attacks
With the ability to perform physical and cloning attacks, an adversary would attempt to derive the secret key from the cloud server, for example by power analysis [33]. When the PUFs circuit is integrated into our JCAE scheme, there is no need to store any secret key, so power analysis recovers nothing. Moreover, the PUFs circuit is unclonable, as evaluated in Section 4.1, and the attacker cannot clone the circuit to acquire the real PUFs response. The attacker can only guess the correct 128-bit secret key, whose success probability is negligibly small. The PUFs key derived from the hardware circuit is used to mutate the Huffman tree and realize the encryption function in our proposed scheme. Therefore, it is hard for the adversary to obtain the right PUFs key and build the corresponding Huffman tree to decrypt the ciphertext. In general, our proposed JCAE scheme is resistant to physical and cloning attacks.

Ciphertext-Only Attack
The ciphertext-only attack is the least favorable for the adversary among the three traditional attack models. Under this model, the adversary can only obtain the ciphertext of the protected sensitive data. The ciphertext encoded by the mutated Huffman tree has good randomness without statistical irregularities, so the adversary must perform an exhaustive key search over a large key space. As described in Section 3, the key is the 128-bit response of the PUFs circuit. As a result, the key space of our scheme is 2^128, which makes it infeasible for the adversary to perform an exhaustive key search and construct the right Huffman tree. In general, our scheme is resistant to the ciphertext-only attack.

Known-Plaintext Attack
The known-plaintext attack model is stronger than the ciphertext-only attack. In this model, the adversary can obtain a number of plaintext/ciphertext pairs through leakage of insider information or guessing. The goal of the adversary is to rebuild the mutated Huffman tree used to encrypt the plaintext, which requires finding the correspondence between the output bits and the symbols in the input data. However, this synchronization problem is extremely difficult for the adversary to solve, since the length of each encoded/encrypted symbol has at least two different possibilities.
Then, the difficulty is evaluated by calculating the number of different synchronization patterns. Assume the input data has M = 32N symbols and each symbol has two possible bit lengths, l_i and l_j. The bit length of the output is then given by Equation (8):

L_out = L · l_i + (M - L) · l_j, (8)

where L is the number of symbols with bit length l_i. The number of possible synchronization patterns is SP = C(M, L). If M is set to 100, then as L increases from 1 to 99, the logarithm log2 SP rises from 6.67 to a peak of 96.35 at L = 50 and then falls symmetrically, as shown in Figure 13. In conclusion, with properly chosen parameters the number of synchronization patterns approaches 2^100, making it hard for the adversary to solve the synchronization problem. Therefore, our scheme is resistant to the known-plaintext attack.
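The synchronization-pattern count SP = C(M, L) is easy to reproduce; for M = 100 the logarithm peaks at L = 50, consistent with Figure 13 (small rounding differences from the reported figures aside):

```python
from math import comb, log2

def log2_sync_patterns(M: int, L: int) -> float:
    """log2 of SP = C(M, L): the number of ways to assign L of the
    M symbols the bit length l_i (the remaining M - L get l_j)."""
    return log2(comb(M, L))

# Sweep L over 1..99 for M = 100 and locate the maximum.
peak_L = max(range(1, 100), key=lambda L: log2_sync_patterns(100, L))
```

The peak value log2 C(100, 50) ≈ 96.35 is what drives the near-2^100 search space claimed above.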

Chosen-Plaintext Attack
The adversary has the most powerful capabilities in the chosen-plaintext attack model. In this model, the adversary can send arbitrary symbols to the encryption system and obtain the corresponding output bit stream, with the purpose of deriving the secret key and breaking the security of the encryption system. If the adversary could send the plaintext byte by byte and analyze the output bit stream continuously, our scheme would be vulnerable to this attack. However, to effectively resist this kind of attack, our scheme accumulates the input symbols into a large chunk and outputs the corresponding bit stream at one time, as described in Section 3. The size of the data chunk is closely related to the length of the vector P. Although this accumulation introduces a time delay, the performance evaluation above has proven that the influence is limited.
Moreover, it is also a common attack method for the adversary to launch a differential attack by resetting the encryption system in the chosen-plaintext model. For example, the adversary would first insert one symbol X at the beginning of a whole symbol chunk (0, 0, ..., 0, 0) to construct a new symbol chunk (X, 0, ..., 0, 0). The encryption system then encodes/encrypts these two chunks and outputs two bit streams with different lengths x and y, respectively. By analyzing |x - y| with the help of the basic Huffman table, the adversary can obtain the real length of the encryption result of X. If this process is repeated many times, the synchronization problem mentioned in the above attack model could be solved. Therefore, to withstand the chosen-plaintext attack, our JCAE scheme should encode/encrypt a whole chunk at one time and restrict reset requests.
In conclusion, although the security mechanism provided by our scheme is not as strong as the common encryption algorithms, the physically secure feature is valuable, and actually improves the security level of data storage.

Conclusions and Future Research
To enhance the data storage of cloud servers with high operational efficiency and robust security, we proposed a Huffman-based JCAE scheme for secure data storage using PUFs. Firstly, the BR-PUF circuit was instanced to make the proposed scheme resist physical and cloning attacks. Secondly, the modified mutated Huffman tree technique was proposed to reduce the computational cost, and a parallel architecture was designed on the hardware platform to reduce the execution time. Lastly, the LZW algorithm was combined with the Huffman algorithm to achieve a satisfactory compression ratio, making the scheme generic and practical enough to adapt to existing data storage methods.
Finally, the proposed JCAE scheme was implemented on the Xilinx LX110T FPGA and then integrated into a proof-of-concept file protection system. The results showed that the PUFs in the scheme are qualified, with a reliability of 0.9506, randomness of 0.9865, and uniqueness of 0.4892, and the key-stream generated by the PUFs passed the NIST random tests. Further tests with the Calgary and Canterbury corpora were run in the file protection system for the cloud server: the average throughput is 473 Mbps and the average compression ratio is 55.86%. Additionally, the security was analyzed under physical and cloning attacks, the ciphertext-only attack, the known-plaintext attack, and the chosen-plaintext attack, which further proved that the proposed scheme enhances the security level of existing data protection methods.
However, we must point out a limitation of our scheme: the BR-PUF adopted in this paper is a strong PUFs structure, so the scheme cannot be implemented on application architectures that only provide weak PUFs. How to design a JCAE scheme using weak PUFs remains future work; one suggestion is to build a compound cryptosystem with weak PUFs, in which a CRP extracted from the weak PUFs is taken as the seed to generate a public-private key pair.
Furthermore, higher throughput and stronger security remain constant research goals for JCAE. How to integrate the compression and encryption algorithms as tightly as possible while maintaining the compression ratio is the main direction of future work.