WGAN-E: A Generative Adversarial Network for Facial Feature Security

Abstract: Artificial intelligence technology plays an increasingly important role in human life. For example, distinguishing different people is an essential capability of many intelligent systems. To achieve this, one possible technical means is to perceive and recognize people by optical imaging of faces, so-called face recognition technology. After decades of research and development, and especially with the emergence of deep learning in recent years, face recognition has made great progress, with more and more applications in the fields of security, finance, education, social security, etc., and has become one of the most successful branches of computer vision. With the wide application of biometric technology, bio-encryption technology came into being. Aiming at the problems of the classical hash algorithm and of the face hashing algorithm based on Multiscale Block Local Binary Pattern (MB-LBP) features, this paper proposes a method based on Generative Adversarial Networks (GANs) to encrypt facial features. This work uses Wasserstein Generative Adversarial Network Encryption (WGAN-E) to encrypt facial features. Because the encryption process is an irreversible one-way process, it protects facial features well. Compared with the traditional face hashing algorithm, the experimental results show that the proposed face feature encryption algorithm offers better confidentiality.


Introduction
The general methods of facial feature extraction include Gabor features, Haar-like features (HAAR), Histogram of Oriented Gradients (HOG), and Local Binary Patterns (LBP) [1]. The advantage of traditional face recognition methods is that they run quickly on a CPU. The disadvantage is that the recognition rate is relatively low, because features must be specified manually rather than learned "autonomously" as in deep learning. Recent face recognition pipelines based on deep learning mainly use convolutional neural networks [2,3]; their disadvantage is that they run very slowly on a CPU [4,5].
In recent years, biometric technology has made great progress and has gradually penetrated all aspects of human life. However, as applications of biometric technology deepen, its inherent security and privacy issues have gradually been exposed. The biological characteristics of the human body are fixed: once they are leaked and used by criminals, they cannot be revoked or replaced like a password. When deep learning is applied to sensitive datasets, there are many privacy-related implementation issues. These issues are particularly evident in the medical, financial, legal, and government industries. Homomorphic encryption allows a server to run inference on a client's encrypted input [19]. However, there is no complete implementation of generic deep learning operations using homomorphic encryption for arbitrary model depth. Related scholars have proposed a new method that effectively implements multiple deep learning functions by using bootstrapped homomorphic encryption. The findings provide a promising direction for privacy-protected representation learning and user control of data [20].
With the rapid development of Internet networks and online applications, network traffic classification is becoming more and more important. Research on this topic has produced many different approaches, most of which classify network traffic using predefined features extracted by experts. In contrast, Mohammad Lotfollahi et al. proposed a deep learning-based approach that integrates feature extraction and classification into a single system [21]. The proposed solution, called Deep Packet, can handle both traffic characterization tasks: classifying network traffic into major classes (such as FTP (File Transfer Protocol) and P2P (peer-to-peer)) and identifying end-user applications (such as BitTorrent).
As neural networks are applied to increasingly complex tasks, they are often trained to meet goals that transcend simple functional specifications and address multi-agent problems. Through these efforts, Google Brain discovered that neural networks can learn to protect the security of their communications [22]. In general, however, neural networks are not designed for cryptography; notably, a single neuron cannot learn XOR, although a multilayer network can. Nevertheless, neural networks can learn how to protect the confidentiality of their data from other neural networks: they discover forms of encryption and decryption without being taught specific algorithms for these goals.
Martin Arjovsky et al. introduced the Wasserstein GAN (WGAN) algorithm, an alternative to traditional GAN training [23,24]. They demonstrated that WGAN can improve learning stability, alleviate problems such as mode collapse, and provide meaningful learning curves for debugging and hyperparameter search. In addition, their study demonstrated that the corresponding optimization problem is sound, and provided extensive theoretical work highlighting deep connections to other distances between distributions.
WGAN made progress in stabilizing GAN training, but it can still generate poor samples or fail to converge. Ishaan Gulrajani et al. found that these problems are usually due to the weight clipping used in WGAN to impose a Lipschitz constraint on the critic, which can lead to undesired behavior. They therefore proposed an alternative to weight clipping: penalizing the norm of the critic's gradient with respect to its input [25]. The proposed method performs better than the standard WGAN and can stably train many GAN architectures with almost no hyperparameter tuning, including a 101-layer ResNet (Residual Neural Network) and a language model with a continuous generator.
This study first introduces face feature extraction; the re-encoded facial features are then encrypted using WGAN-E, an adversarially trained neural network model.

Face Feature Description
The LBP operator introduced by Ojala et al. is a powerful texture description method. The operator labels each pixel of an image by thresholding its 3 × 3 neighborhood against the center value and reading the result as a binary number. LBP can also be used as a biometric descriptor. The principle of the basic LBP operator is shown in Figure 1. Later, the method was extended to allow circular neighborhoods of different sizes, with bilinear interpolation of pixel values allowing any radius and any number of sampling points. For such neighborhoods, the study uses the notation LBP(P, R), meaning P sampling points on a circle of radius R; (8, 2) is an example (see Figure 2) [1].
Figure 1. LBP principle. The center is 2; the numbers that are larger than 2 become 1, while the numbers that are smaller than 2 become 0.

This study uses the following notation for the LBP operator: LBP^{u2}_{P,R}. The subscript indicates the use of the operator in a (P, R) neighborhood; here (P, R) = (8, 2). The superscript u2 means that only uniform patterns are used, and all remaining patterns are assigned a single label.

The histogram of the labeled image f_I(x, y) can be defined as

H_i = Σ_{x,y} I{ f_I(x, y) = i }, i = 0, 1, ..., n − 1,

where n is the number of different labels generated by the LBP operator and I{·} is the indicator function. This histogram is a distribution of edges, spots, and flat regions over the entire image. For a valid facial representation, it should also keep spatial information. For this purpose, the image is divided into regions R_0, R_1, ..., R_{m−1}, and the spatially enhanced histogram is defined as

H_{i,j} = Σ_{x,y} I{ f_I(x, y) = i } I{ (x, y) ∈ R_j }.

In this histogram, the face is effectively described at three levels: the labels contain pixel-level information, the labels summed over each region produce region-level information, and the concatenated regional histograms build a global description of the face.
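The basic 3 × 3 operator and the label histogram can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the function names, the sample matrix, and the clockwise bit ordering are our own choices (the text does not fix a bit ordering).

```python
import numpy as np

def lbp_basic(image):
    """Basic 3x3 LBP: threshold each interior pixel's 8 neighbors against
    the center value and read the result as an 8-bit binary label."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Neighbor offsets read clockwise from the top-left corner (a convention
    # choice; the paper does not specify the bit ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            center = image[i, j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if image[i + di, j + dj] >= center:
                    code |= 1 << bit
            out[i - 1, j - 1] = code
    return out

def lbp_histogram(labels, n_bins=256):
    """Histogram H_i of the LBP labels over a region."""
    hist, _ = np.histogram(labels, bins=n_bins, range=(0, n_bins))
    return hist

# An example in the spirit of Figure 1: center value 2, neighbors
# thresholded to 0/1 (the matrix values are illustrative).
img = np.array([[1, 2, 3],
                [4, 2, 0],
                [5, 6, 2]])
labels = lbp_basic(img)
print(labels.shape)  # (1, 1): one label for the single interior pixel
```

Concatenating `lbp_histogram` outputs over the regions R_0, ..., R_{m−1} yields the spatially enhanced histogram H_{i,j} described above.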

Principles of Encryption and Decryption
A complete cryptosystem includes a plaintext space, a key space, and algorithms. When using an associative memory neural network for encryption, this study only uses the iteration of the network, regardless of whether it converges to a stable point of the network. This study only needs to feed a given face feature into the network and train it for a specified number of iteration steps. This not only achieves encryption but is also simple and efficient. The disadvantage is a certain key redundancy that accompanies the improvement in security.
Since there are many noise-added samples and redundant attractors in the associative memory, if these data were separated from a specific neural network and number of iterations, the data would be meaningless. Therefore, this study proposes a method for neural network encryption using noisy samples and redundant attractors.
Encryption principle: There are two types of cryptosystems, single-key and dual-key. This article uses the traditional single-key system; that is, the confidentiality of the system depends on the security of the key. Of course, if the encryption algorithm is also not publicly disclosed, the security of the system is higher. In a traditional encryption system, the encryption algorithm and the decryption algorithm operate under the control of a set of keys, called the encryption key and the decryption key, respectively.
Encryption process: Assume a plaintext of length 256. The plaintext is divided into blocks and XORed according to the secret key generated by the above process, or another, stronger encryption algorithm is used.
Decryption process: According to the protocol, the receiver intercepts the vector appended after the plaintext and compares it with the network parameters transmitted through a relatively secure channel. The 24th bit of the vector is intercepted, and then 4 bits of information are intercepted according to the protocol (each vector contains three sub-vectors). The algorithm calculates the weights of the network and iterates the positive samples, in groups of four, for five iterations to obtain the results. The numbers after the 24th bit (or 4 × 33 + 24) are intercepted. A set of inputs is decomposed into five five-dimensional samples starting from the 55th bit, and the result of iterating three times is the key. The key and the ciphertext are XORed to obtain the plaintext.
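The XOR step shared by the encryption and decryption processes can be illustrated with a short sketch. This is a generic one-time-pad-style XOR, not the paper's full protocol: the plaintext block and the keystream formula below are toy stand-ins for the key material the network would generate.

```python
def xor_bytes(data, key):
    """XOR a plaintext block with a keystream of equal length; applying
    the same operation again with the same keystream recovers the input."""
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"facial-feature-block-of-32-bytes"            # toy 32-byte block
keystream = bytes((i * 37 + 11) % 256 for i in range(32))  # toy key material
ciphertext = xor_bytes(plaintext, keystream)
print(xor_bytes(ciphertext, keystream) == plaintext)  # True
```

Because XOR is its own inverse, the same routine serves both ends of the channel; the security rests entirely on keeping the keystream secret.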

WGAN-E Encrypts Facial Feature Data
This section discusses how to use shared keys to protect the confidentiality of facial features. It describes the organization of the system in this study and the role of various parts of the system. It also illustrates the architecture of the neural network model used in this study.
The classic security scenario involves three parties: the user A, the Server, and an attacker B. Generally, A's facial features need to be stored securely on the server, while B wants to steal A's facial features. Therefore, the required security attribute is confidentiality rather than integrity. The adversary is defined as a "passive attacker" who can intercept communications but whose permissions are otherwise very limited: it cannot start a session, inject messages, or modify messages in transit [25].
This study begins with a particularly simple instance of this scenario (Figure 3), in which A wants to send a single secret face feature FF ("Facial Feature") to the Server. The face feature FF is the input to A. When A processes this input, it produces an output EFF ("Encrypted Facial Feature"). Both Server and B receive EFF, process it, and attempt to recover the face feature FF; this study uses FF_Server and FF_B to represent their respective results. A and Server have an advantage over B: they share a key, Key, which serves as an additional input to A and Server. Assume that each face feature FF has a fresh Key; at this level of abstraction, however, this study does not require Key and FF to have the same length.
For the face feature encryption process, A, Server, and B are neural networks. Their structure is described in Section 3.4. Their parameters are recorded as ω_A, ω_Server, and ω_B, respectively. Because ω_A and ω_Server need not be equal, encryption and decryption are not necessarily the same function even if A and Server have the same structure. For these neural networks, Key, FF, FF_Server, FF_B, and EFF are all tuples of floating-point values rather than sequences of bytes. Note that, with this formulation, EFF, FF_B, and FF_Server can consist of arbitrary values even if FF and Key consist of 0s and 1s. In practice, the implementation of this study limits these values to a bounded range but allows intermediate values.
Although this setting is basic, it is sufficient for the basic plan. Especially considering that A and Server use Key as a one-time key between facial features FF and encrypted facial features EFF. However, this study does not require A and Server to operate in this way. In fact, in the three-part experiment, other options were found. For the sake of simplicity, this study not only ignores the process of generating keys, but also ignores the randomness of probability encryption (Goldwasser and Micali, 1984) [25]. These enhancements may be the subject of further work.
The goal of B is simple: accurately reconstruct FF (in other words, minimize the error between FF and FF_B). A and Server want to communicate clearly (minimize the error between FF and FF_Server) while hiding their communication from B. Note that, in line with the modern definition of cryptography (Goldwasser and Micali, 1984), this study does not require the encrypted face feature EFF to "seem random" to B. Encrypted face features may even contain explicit metadata identifying them. Therefore, distinguishing EFF from random values drawn from some distribution is not a goal for B. In this regard, B's goal is in stark contrast to that of the adversaries in GANs [26]. On the other hand, this study could try to redefine the goal of B as distinguishing encrypted face features constructed from two different files.
With these goals in mind, this study does not separately train A and Server to implement some known cryptosystem (Dourlens, 1996), but jointly trains A and Server so that they can communicate successfully without pre-specified notions of how. As with the definition of GANs, this study hopes that A and Server defeat the best possible B, not a fixed B. Of course, A and Server may not win for every plaintext and every key, because knowledge of certain face features and keys may be baked into B. (For example, B can always output the same facial features and will be correct at least occasionally.) Therefore, this study assumes a distribution over the face features and keys and describes the goals of A and Server in terms of expected values [27].
This study uses A(ω_A, FF, Key) to represent the output of A on inputs FF and Key, S(ω_Server, EFF, Key) to represent the output of Server on inputs EFF and Key, and B(ω_B, EFF) to represent the output of B on input EFF. A distance function d on face features is introduced; although the exact choice of this function may not be important, this study uses the L1 distance

d(FF, FF') = Σ_{i=1}^{N} |FF_i − FF'_i|,

where N is the length of the face feature. This study defines a per-example loss function for B:

L_B(ω_A, ω_B, FF, Key) = d(FF, B(ω_B, A(ω_A, FF, Key))).

Intuitively, L_B(ω_A, ω_B, FF, Key) represents how wrong B is when the face feature is FF and the key is Key. This study also defines a loss function for B over the distribution of face features and keys through the expected value:

L_B(ω_A, ω_B) = E_{FF,Key}[ L_B(ω_A, ω_B, FF, Key) ].

This study obtains the "optimal B" by minimizing this loss:

O_B(ω_A) = argmin_{ω_B} L_B(ω_A, ω_B).

Similarly, this study defines a per-example reconstruction error for Server and extends it to the distribution of face features and keys:

L_Server(ω_A, ω_Server, FF, Key) = d(FF, S(ω_Server, A(ω_A, FF, Key), Key)),
L_Server(ω_A, ω_Server) = E_{FF,Key}[ L_Server(ω_A, ω_Server, FF, Key) ].

This study defines the loss function of A and Server by combining L_Server with the optimal value of L_B:

L_AServer(ω_A, ω_Server) = L_Server(ω_A, ω_Server) − L_B(ω_A, O_B(ω_A)).

This combination reflects A and Server's desire to minimize Server's reconstruction error and to maximize the reconstruction error of the "optimal B". The use of simple subtraction is somewhat arbitrary; useful variants are described later.
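The loss definitions above can be exercised with toy stand-ins for the three networks. The functions `A`, `S`, and `B` below are illustrative placeholders of our own, not the trained models: A "encrypts" by elementwise multiplication with a ±1 key, S inverts it, and B, lacking the key, outputs a fixed guess.

```python
import numpy as np

# Illustrative placeholders for the three parties (not the trained networks).
def A(ff, key):            # "encrypt": toy mixing of feature and key
    return ff * key

def S(eff, key):           # "decrypt": inverts the toy mixing (key is +/-1)
    return eff * key

def B(eff):                # eavesdropper without the key: a fixed guess
    return np.zeros_like(eff)

def d(x, y):
    """L1 distance used as the per-example reconstruction error."""
    return np.abs(x - y).sum()

def L_B(ff, key):
    return d(ff, B(A(ff, key)))

def L_Server(ff, key):
    return d(ff, S(A(ff, key), key))

def L_AServer(ff, key):
    # A and Server minimize their own error while maximizing B's.
    return L_Server(ff, key) - L_B(ff, key)

ff = np.array([1.0, -1.0, 1.0, 1.0])   # bits encoded as +/-1
key = np.array([1.0, 1.0, -1.0, 1.0])
print(L_Server(ff, key))  # 0.0: Server recovers FF exactly
print(L_B(ff, key))       # 4.0: B is wrong by 1 on every bit
```

In training, the expectation over FF and Key is approximated by averaging these per-example losses over mini-batches, and the inner optimum O_B is approximated by alternating gradient steps rather than computed exactly.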
By minimizing L_AServer(ω_A, ω_Server), this study obtains the "optimal A and Server":

(O_A, O_Server) = argmin_{(ω_A, ω_Server)} L_AServer(ω_A, ω_Server).

We write "optimal" in quotes because there is no need for a global minimum; in general, there are many optimal solutions for A and Server. As a simple example, assume that the key has the same size as the face feature and the encrypted face feature. A and Server may XOR the face feature with any fixed permutation of the key bits: every permutation is equally good, as long as A and Server use the same one. By the construction of the networks in this study (see Section 3.4), all permutations may occur.
Training begins with random initialization of the A and Server networks. The purpose of training is to go from this state to (O_A, O_Server), or close to it. Next, we explain the training process [25,26,28]. Algorithm 1 describes the implementation principle of WGAN-E.

Require: c, the clipping parameter; m, the batch size; β_1, β_2, Adam hyperparameters; n_critic, the number of iterations of the critic per generator iteration; λ, the gradient penalty coefficient; α, the learning rate; w_0, initial critic parameters; θ_0, initial generator parameters; x, face image data vectors; z, face LBP data vectors.

1: while θ has not converged do
2:   for t = 0, ..., n_critic do
3:     Sample {x^(i)}, i = 1, ..., m ∼ P_r, a batch from the real data.
4:     Sample {z^(i)}, i = 1, ..., m ∼ p(z), a batch of prior samples.
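The critic update at the core of this loop can be sketched with a toy linear critic. This is a sketch under simplifying assumptions, not the paper's implementation: it uses plain gradient ascent instead of Adam, shows only the weight-clipping variant of the Lipschitz constraint (Algorithm 1 also lists a gradient penalty coefficient λ), and the batches are synthetic stand-ins for the real and prior samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear critic f(x) = x . w, trained with the WGAN critic update:
# ascend E[f(real)] - E[f(fake)], then clip the weights to [-c, c].
w = rng.normal(size=4)                     # initial critic parameters w_0
c, alpha, n_critic = 0.01, 0.05, 5         # clipping parameter, learning rate

real = rng.normal(loc=1.0, size=(64, 4))   # stand-in batch from the real data
fake = rng.normal(loc=-1.0, size=(64, 4))  # stand-in batch of prior samples

for _ in range(n_critic):
    # Gradient of mean(real @ w) - mean(fake @ w) with respect to w.
    grad = real.mean(axis=0) - fake.mean(axis=0)
    w = w + alpha * grad                   # plain gradient ascent step
    w = np.clip(w, -c, c)                  # Lipschitz constraint via clipping

estimate = real.mean(axis=0) @ w - fake.mean(axis=0) @ w
print(estimate > 0)  # the clipped critic still separates the two batches
```

The clipping step is exactly what Gulrajani et al. replace with a gradient penalty; with a real multi-layer critic, the gradient would come from backpropagation rather than the closed form used here.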

WGAN-E Neural Network Structure
A, Server, and B Architecture: This study aims to create a neural network architecture that is sufficient to learn XOR and other mixing functions, but that does not strongly encode the form of any particular algorithm [29,30].
To this end, the study selected the following "mix and transform" architecture. It has a first fully connected (FC) layer where the number of outputs equals the number of inputs. Face features and keys are fed into this FC layer. Because each output bit can be a linear combination of all input bits, this layer enables, but does not enforce, mixing between the key and the face features; in particular, it can swap bits. After the FC layer comes a series of convolutional layers, the last of which produces an output of a size suitable for a face feature or an encrypted face feature. These convolutional layers learn to apply some function to groups of bits mixed by the previous layer, without specifying what that function should be. It is worth noting that the reverse order (convolution followed by FC) is more common in image processing applications, where neural networks typically use convolution to take advantage of spatial locality. For neural cryptography, this study specifically wants the locality (i.e., which bits to combine) to be learned rather than fixed. Although manually pairing each input face-feature bit with a corresponding key bit would certainly work, this study suggests that doing so would be uninteresting [22,31]. The WGAN-E neural network structure is shown in Figure 4. This study avoids imposing further restrictions that would simplify the problem. For example, we do not tie A's parameters to the Server's, even though we expected both A and Server to learn the same function, such as XOR.

Experiment and Analysis of Results
The training and test dataset used is the LFW (Labeled Faces in the Wild) dataset. This is a commonly used test set for face recognition. The face pictures provided in it are all derived from natural scenes in life, so the recognition difficulty is increased, especially due to factors such as multiple poses, lighting, expressions, age, and occlusion. Photos of the same person also vary widely. In some photos, more than one face may appear. For these multi-face images, only the face at the center coordinates is selected as the target, and other areas are regarded as background interference. There are 13,233 face images in the LFW dataset. Each image gives the corresponding person's name. There are 5749 people, and most of them have only one picture. The size of each picture is 250 × 250; most are color images, but there are also some black-and-white face pictures.


Data Preprocessing
The original LBP face features cannot be used directly for training; we need to improve these facial features (as shown in Figure 5). The improvements in this study are as follows. Improvements to the LBP face features: we note that LBP face features are composed of 8-bit binary tuples, so it is only necessary to insert random 8-bit tuples between these tuples to generate the key material. In particular, the face features mentioned below refer to the modified face features.
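One plausible reading of this interleaving step can be sketched as follows. The exact scheme is not fully specified in the text, so the function name `interleave_key_tuples` and the alternating layout are our own illustrative choices.

```python
import random

def interleave_key_tuples(lbp_features, rng):
    """Insert a random 8-bit tuple after each 8-bit LBP tuple; the random
    tuples play the role of key material mixed into the feature stream."""
    out = []
    for tup in lbp_features:
        out.append(tup)
        out.append(tuple(rng.randint(0, 1) for _ in range(8)))
    return out

rng = random.Random(42)
features = [(1, 0, 1, 1, 0, 0, 1, 0), (0, 1, 1, 1, 1, 0, 0, 0)]
mixed = interleave_key_tuples(features, rng)
print(len(mixed))  # 4: each original tuple is followed by a key tuple
```

Whatever the exact layout, the receiver must know the interleaving positions (or the seed) to separate feature tuples from key tuples again.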

Our training method is based on Adam (Adaptive Moment Estimation). In practice, our training methods are very different from those of GANs [32,33]. They have been improved in several ways, including some adjustments to the goals of Section 3.3. Next, we introduce these improvements and give further details.
First, training relies on hundreds of "mini-batch" computations. We do not calculate the "optimal B" for a given dataset; we simply approximate it, alternating B's training with A's and Server's. Intuitively, training may proceed roughly as follows. A may initially create some encrypted face features that neither Server nor B understands at all. After a few training steps, A and Server may find a way to communicate that allows Server to at least partially decrypt A's facial features while B cannot; in particular, A and Server may find some trivial transformation. However, after some training, B may begin to learn to crack part of the facial feature data. With more training, A and Server may find ways to improve the encryption, especially ways that make better use of the key. Eventually, B finds that it cannot easily crack these encrypted facial features. This kind of alternation is a typical adversarial game; the theory of continuous adversarial games involves equilibrium results that may apply to our problem [34].
In addition, in the training of A and Server, we do not try to maximize the reconstruction error of B: if B were completely wrong, B could simply flip all of its output bits at the next iteration. A more realistic and useful goal for A and Server is to minimize the mutual information between B's guess and the real face features. In the case of symmetric encryption, this goal is equivalent to making the answers generated by B indistinguishable from random guessing. This approach is somewhat similar to approaches designed to prevent over-training against the current opponent (Salimans et al., 2016) [33]. In addition, we can also adjust the loss function so that it does not place too much weight on B, nor on minor errors of the Server.
Finally, once we stop training A and Server, they have chosen their cryptosystem. To verify that this system is good, this study tries to break instances of it by training many Bs, some of which may come from the early stages of training.

Experimental Process
As a proof of concept, this study implements A, Server, and B networks that operate on N-bit face features and keys and generate N-entry floating-point encrypted face features, for N = 8, 16, 32, and 64 (in the preceding steps, the face feature formats were normalized to an 8-bit "0-1" format). Face features and keys are uniformly distributed. Keys are not intentionally reused, but a key may reappear due to random selection. Allowing different sizes of face features and keys is an interesting detail the experiment considers. This study used TensorFlow (Abadi et al., 2016a; b) to perform the experiments, training on an Nvidia V100 GPU server with 32 GB of memory. The specific computing platform does not affect the results of the experiment.
The network of this study follows the "mix and transform" model described in Section 3. The network concatenates the two N-bit inputs (face features and keys) into a 2N-entry vector, using −1 and 1 to represent the bit values 0 and 1. Each convolutional layer is described by its window size, input depth, and output depth, and each has a "stride", the amount by which the window moves. Intuitively, the first convolutional layer slides a size-2 window over the 2N output elements of the FC layer. After every layer except the last, we use a sigmoid nonlinear unit. In the last layer, after reducing the output to N elements, we use a tanh nonlinear unit, which brings the values back to a range that can be mapped to binary values. The Server network is the same as the A network, and the B network takes only the encrypted face feature as input.
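A simplified, single-channel sketch of this "mix and transform" forward pass is shown below. It is not the paper's network: the real model uses several convolutional layers with larger depths, and the weights here are random placeholders. The sketch only illustrates the FC-then-convolution ordering, the stride-2 reduction from 2N to N, and the sigmoid/tanh placement.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16  # face-feature length in bits

def forward(ff_bits, key_bits, W_fc, conv_w):
    """Mix and transform: one FC layer over [FF, Key], then 1-D
    convolutions; sigmoid in hidden layers, tanh at the output."""
    x = np.concatenate([ff_bits, key_bits])        # 2N entries in {-1, 1}
    h = W_fc @ x                                   # FC mixing layer (2N -> 2N)
    for i, w in enumerate(conv_w):
        stride = 2 if i == 0 else 1                # first conv halves the length
        h = np.array([(h[j:j + len(w)] * w).sum()
                      for j in range(0, len(h) - len(w) + 1, stride)])
        if i < len(conv_w) - 1:
            h = 1.0 / (1.0 + np.exp(-h))           # sigmoid between layers
    return np.tanh(h)                              # map back toward (-1, 1)

ff = rng.choice([-1.0, 1.0], size=N)
key = rng.choice([-1.0, 1.0], size=N)
W_fc = rng.normal(scale=0.1, size=(2 * N, 2 * N))
conv_w = [rng.normal(scale=0.5, size=2),           # window 2, stride 2: 2N -> N
          rng.normal(scale=0.5, size=1)]           # window 1, stride 1: N -> N
eff = forward(ff, key, W_fc, conv_w)
print(eff.shape)  # (16,)
```

Because tanh is bounded, every entry of the encrypted face feature EFF lies strictly inside (−1, 1), which is what lets the output be thresholded back to bits when needed.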
To train the network, this study used mini-batches of 2000 facial features each. We used TensorFlow's Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.001 [31]. We did not decay the learning rate during training, because we want A, Server, and B to keep reacting strongly to changes in the other components until A and Server, hopefully, arrive at a solution robust enough to withstand further changes in B.
Training alternates between A/Server and B: A and Server train on one mini-batch, then B trains on the following mini-batches. We chose this ratio to give the adversary B a slight computational advantage without over-training it to the point where it becomes too specialized to the current parameters of A and Server. Empirically, this value works well [35].
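The alternation schedule reduces to a simple loop. The sketch below uses an `eve_ratio` of 2 purely as an example; the text states only that B gets a slight computational edge, not the exact ratio.

```python
def train(steps, ab_train_step, b_train_step, eve_ratio=2):
    # Alternate optimisation: one A/Server mini-batch, then `eve_ratio`
    # mini-batches for the adversary B (exact ratio is a tunable choice).
    for _ in range(steps):
        ab_train_step()
        for _ in range(eve_ratio):
            b_train_step()

# Demo with counters standing in for the real optimiser steps.
calls = {"ab": 0, "b": 0}

def _count(key):
    calls[key] += 1

train(1000, lambda: _count("ab"), lambda: _count("b"), eve_ratio=2)
```

In the real system, `ab_train_step` and `b_train_step` would each draw a fresh mini-batch and apply one Adam update.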
The loss function of B is as described above: the L2 distance between B's guess and the input face feature. The loss functions of A and Server are more complex, as described in Sections 3.2 and 3.3. The function has two components, one related to the Server's reconstruction error and one to the eavesdropper's success. The first component is the L2 distance between the Server output and the input face feature. The second is (N/2 − B's L2 error)²/(N/2)². This definition expresses the goal described in Section 3, namely that B should do no better than random guessing; accordingly, the component is minimized when half of the bits are wrong and half are correct. We chose a quadratic formula to emphasize large B errors and to impose only a small penalty when B guesses a few bits correctly, which occasionally happens even when B's guesses are random. This formula gives a meaningful loss for each example (rather than relying on larger batch statistics) and improves the robustness of training. The cost is that a fully trained A and Server may still let B recover slightly more information than a pure random guess. We have not yet achieved satisfactory results with a loss that is linear (rather than quadratic) in B's reconstruction error; the best choice remains an open question [36].
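The two-component loss can be written out directly. One assumption in this sketch: since the bits are encoded as ±1, we scale the distance so that one fully flipped bit counts as one error, which makes the N/2 target in the formula line up with "half the bits wrong"; the paper does not spell out this scaling.

```python
import numpy as np

def reconstruction_error(guess, truth):
    # Distance between a guess and the true ±1 bits, scaled so one
    # fully flipped bit contributes exactly 1 (scaling is our assumption).
    return np.sum(np.abs(guess - truth), axis=-1) / 2.0

def a_server_loss(server_out, b_guess, plaintext, n):
    # Component 1: Server's reconstruction error on the face feature.
    recon = reconstruction_error(server_out, plaintext)
    # Component 2: (N/2 - B error)^2 / (N/2)^2, zero exactly when B
    # gets half the bits wrong, i.e. no better than random guessing.
    b_err = reconstruction_error(b_guess, plaintext)
    eve = (n / 2.0 - b_err) ** 2 / (n / 2.0) ** 2
    return recon + eve

# Demo: B wrong on exactly half of 8 bits, Server perfect -> loss 0.
pt = np.ones(8)
half_wrong = pt.copy()
half_wrong[:4] = -1.0
ideal_loss = a_server_loss(pt, half_wrong, pt, 8)
perfect_eve_loss = a_server_loss(pt, pt, pt, 8)   # B fully correct -> penalty 1
```

Note how a B that decrypts perfectly costs A/Server a full penalty of 1, while a B at chance level costs nothing, exactly the incentive the text describes.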
The WGAN-E face feature encryption and decryption system working process is shown in Figure 6.
Figure 6. WGAN-E face feature encryption and decryption system. Note: DB is database. A facial feature is generated by LBP feature extraction as a series of codes representing feature values. These face feature codes are stored in the server after WGAN-E encryption. When face information needs to be identified, the encrypted features are retrieved from the server and the original facial features are recovered by WGAN-E decryption.

Analysis of Experimental Results
After successful training, once the output passes a precision threshold (e.g., A and Server reach a reconstruction error of at most 0.05 bits, rather than random-guess prediction), we reset the B network and train it from scratch five times, 250,000 steps each, recording B's best result. If an A/Server combination fails to reach the threshold within 150,000 steps, or if a retrained B gains a substantial advantage, we consider the solution non-robust; otherwise, we count the run as successful. Figures 7 and 8 show the number of successful runs, the Server reconstruction error, and the B reconstruction error over training steps with N = 8 face features and keys, using mini-batches of 2000; each point in the graphs is the average error over 2000 examples. The ideal result is that the Server's reconstruction error drops to zero while B's reconstruction error rises to 8 (half the bits). In this example, both reconstruction errors start high. After some time, A and Server begin to communicate effectively, although B gradually improves its understanding as well. Then, around step 10,000, A and Server counter B's progress, and by step 15,000 the training objectives are achieved; the remaining steps only slightly increase B's reconstruction error.

The training diagram in Figure 9 does not look like a typical successful neural network training run: oscillation from beginning to end is not usually a good sign. The dynamics of this adversarial training are more reminiscent of an evolutionary process. They appear to depend on some random variation producing a slight mixing of the bits; once some mixing exists, gradient descent can push it further.

An argument for this interpretation is that training is not always successful. At N = 8, 3 of 10 initial runs failed; none of these failures brought the Server reconstruction error under the 0.05 threshold, and in none of them did the B reconstruction error exceed 7.3 bits (out of 8). To test the robustness of the other 14 A/B combinations, we retrained B five times on each, obtaining B reconstruction errors from 4.67 to 6.97, with an average of 6.1. If we arbitrarily define success as keeping the Server at 0.05-bit reconstruction error while requiring B to make at least six bit errors, on average half of the training runs are successful (5 of 10 cases).
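The robustness check above (reset B, retrain it from scratch several times, keep its best result) can be expressed as a short evaluation loop. The stand-in trainer and the error values fed to it below are illustrative only; in the real setup each call would run 250,000 optimisation steps on a freshly initialised B.

```python
def evaluate_robustness(train_b_from_scratch, n_resets=5, steps=250_000):
    # Re-initialise the adversary B `n_resets` times, train each copy from
    # scratch, and record the best (lowest) reconstruction error achieved:
    # the A/Server pair is only as strong as its luckiest attacker.
    best = float("inf")
    for _ in range(n_resets):
        best = min(best, train_b_from_scratch(steps))
    return best

# Demo with a stand-in trainer replaying five recorded error values.
errors = iter([6.9, 5.1, 6.2, 4.67, 6.0])
best_b_error = evaluate_robustness(lambda steps: next(errors))
```

Judging an A/Server pair by the minimum over resets (rather than the mean) is what makes the protocol conservative: a single successful retrained B counts against the cryptosystem.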
Although adversarial training is often unstable (Salimans et al., 2016), additional engineering of the neural networks and their training might increase the overall success rate. For example, with mini-batches of only 512, our success rate drops to about 1/3 (compared with the mini-batch size of 2000 used above). In the future, it may be worthwhile to study the effects of mini-batch size, as well as other parameters such as the learning rate.
For a successful training run, we studied how the encrypted facial features change with different feature/key pairs. Although we did not perform a detailed cryptanalysis of the feature encryption method, we made some observations. First, the output is key-dependent: changing the key while holding the face feature constant yields a different encrypted output. This key dependence is also necessary for successful communication. However, the cipher is more than a simple XOR. In particular, the output values are usually floating-point values rather than 0s and 1s. Moreover, a change to a single key or feature bit propagates across multiple elements of the encrypted face feature, rather than being confined to a single bit as in XOR: flipping one key bit typically changes 3-6 of the 8 elements of the encrypted face feature significantly, with smaller changes to the remaining elements.
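This diffusion property can be measured with a small harness: encrypt once, flip one key bit, re-encrypt, and count output elements that moved by more than a tolerance. The toy cipher and the 0.1 tolerance below are our own stand-ins for demonstration; applied to the trained WGAN-E encryptor, this measurement is what yields the 3-6 changed elements reported above.

```python
import numpy as np

def changed_elements(encrypt, feature, key, bit, tol=0.1):
    # Count ciphertext elements that change by more than `tol`
    # when a single key bit is flipped (keys are ±1 encoded).
    base = encrypt(feature, key)
    flipped = key.copy()
    flipped[bit] = -flipped[bit]
    return int(np.sum(np.abs(encrypt(feature, flipped) - base) > tol))

# Toy stand-in cipher: each output mixes the feature with one shifted
# key bit, so flipping one key bit perturbs exactly one output element.
toy_cipher = lambda f, k: (f + np.roll(k, 1)) / 2.0
n_changed = changed_elements(toy_cipher, np.ones(8), np.ones(8), bit=0)
```

An XOR-like cipher would always score exactly 1 on this metric; the learned cipher's score of 3-6 is the evidence that changes propagate across elements.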
To evaluate the merits of a cryptographic algorithm, we should first consider whether it meets certain security constraints, and then the efficiency of the algorithm and the distribution and confidentiality of keys.

Deciphering complexity analysis (Table 5): Consider first the complexity of deciphering the ciphertext. The attacker must know the length of the ciphertext in order to obtain network samples, but cannot know the dimension of the neural network or the number of samples. Considering only the simplest case, in which only the adversarial network is used, assume the total number of network samples is n. Since the network has at least two samples, the sample dimension lies between 2 and [n/2]. Because this study does not use the stability point of the WGAN network, the attacker cannot know how many iterations our samples require, so we consider the computation prohibitively expensive and unsolvable in limited time without the key. The network parameters (the starting position of the network, the network dimension, and the number of iterations required for a sample to generate the key) are transmitted as several pairs of contiguous sequences; if the transmission were dispersed, the parameter overhead would grow greatly.

Equal error rate (EER) is used in biometric security systems to set the threshold at which the false acceptance rate equals the false rejection rate; the common value at that point is the EER. It indicates that the proportion of false acceptances equals the proportion of false rejections, and the lower the EER, the more accurate the biometric system. Table 6 shows the EER when the key is not known by the impostor; Table 7 shows the EER when the key has been hacked by the impostor.

Table 6. Equal error rate when the key is not known by the impostor.
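The EER definition above translates into a threshold sweep over matcher scores: at each candidate threshold, compute the false rejection rate on genuine scores and the false acceptance rate on impostor scores, and report the point where the two rates meet. The score lists below are illustrative, not the paper's data.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    # FRR: genuine scores rejected below threshold t.
    # FAR: impostor scores accepted at or above t.
    # EER: the common rate at the threshold where FAR and FRR are closest.
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, eer = float("inf"), 1.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < t)
        far = np.mean(impostor >= t)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer

# Perfectly separated scores give EER 0; overlapping scores do not.
perfect = equal_error_rate([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])
overlap = equal_error_rate([0.9, 0.8, 0.4], [0.6, 0.2, 0.1])
```

With finite score sets the two rates rarely cross exactly, so taking the midpoint at the closest crossing is the usual practical convention.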

Conclusions and Future Work
This paper uses neural networks to learn to protect the communication of face features with servers. Learning is not limited to a specific set of cryptographic algorithms: it is driven solely by the privacy goals expressed in the training objectives, with the attacker simulated by a neural network. In addition, this approach might be extended to reinforcement learning over encrypted data. Finally, we conclude that neural networks are useful not only for protecting face features, but also for attacking them. Although neural networks seem unlikely to become excellent at protecting facial features (3 failures in 10 initial runs), they may be very effective at understanding metadata and traffic analysis.
Multi-threaded code would greatly improve model efficiency, because separate weighted-sum, neuron, multiply, or add operations can run on separate threads. Even without multithreading, on a processor with n cores the study can run n copies of the model on n input instances and expect the amortized per-instance cost to approach that of a single run. An efficient matrix multiplier would also improve batch processing time and may allow faster convolution operations. Beyond multithreading, this study is also working on designing virtual full adders and using GPUs to accelerate the existing network models. Implementing homomorphic logic gates on the GPU is valuable because each operation in the system can be parallelized in some form, and plaintext weights can be encrypted without noise to improve performance. In practice, state-of-the-art deep learning models use many additional features, such as residual network modules or self-normalizing activation functions, and our encryption system should extend to most of them. This study will also optimize the deep learning encryption model for embedded hardware, revisit the problem scenario of this study, and build a thorough analysis of execution. Deep learning solutions could ensure user privacy; over time, this may lead to a privacy-centric software development industry that coexists with the expanding role of big data.
In this work, we demonstrate the weight-clipping problem in WGAN-E and introduce an alternative in the form of a penalty term in the critic's loss, which does not exhibit the same problem. Using this approach, we obtain strong modeling performance and stability across various architectures. Another interesting direction is to adapt the penalty term to the standard GAN objective, encouraging the discriminator to learn smoother decision boundaries and stabilizing training.
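The penalty term referred to here follows the WGAN gradient-penalty idea: sample points on lines between real and generated data and penalise the critic's gradient norm for deviating from 1. The numpy sketch below takes the critic's gradient as a callable (in an autodiff framework it would come from the tape); the coefficient lambda = 10 is the usual WGAN-GP default and an assumption on our part, and the linear critics in the demo exist only to make the penalty values exactly checkable.

```python
import numpy as np

def gradient_penalty(critic_grad, real, fake, rng, lam=10.0):
    # Sample random interpolates x_hat between real and fake points,
    # then penalise ( ||grad D(x_hat)|| - 1 )^2, averaged over the batch.
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    norms = np.linalg.norm(critic_grad(x_hat), axis=1)   # (batch,)
    return lam * np.mean((norms - 1.0) ** 2)

rng = np.random.default_rng(0)
real = rng.normal(size=(4, 2))
fake = rng.normal(size=(4, 2))

# Linear critic D(x) = w . x has constant gradient w everywhere.
w_unit = np.array([0.6, 0.8])   # ||w|| = 1 -> no penalty
w_big = np.array([2.0, 0.0])    # ||w|| = 2 -> penalty 10 * (2 - 1)^2
p_unit = gradient_penalty(lambda x: np.tile(w_unit, (len(x), 1)), real, fake, rng)
p_big = gradient_penalty(lambda x: np.tile(w_big, (len(x), 1)), real, fake, rng)
```

Unlike weight clipping, this soft constraint leaves the critic's capacity intact while still enforcing the 1-Lipschitz condition that the Wasserstein objective requires.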
In fact, we considered using AlexNet, the VGG series, and the ResNet series in ablation experiments for the choice of the A, B, and Server networks, and compared their performance. However, whether in the delicacy of the extracted facial features or in the accuracy of encryption and decryption, the former two are not comparable to ResNet.
The Server that decrypts the neural network's output does not guarantee 100% correct decryption; this is a small defect of the system. However, when the data volume is large, as with the LFW dataset used in this study, Server and A learn from each other gradually and finally achieve a decryption accuracy of about 98.7%. In real life, commercial face recognition systems have difficulty accurately identifying non-Caucasian people or twins, and recognition accuracy is also affected by lighting (e.g., under strong light, it is difficult for the Alipay face-scan payment system to recognize a face). Compared with the recognition errors caused by these factors, we consider a slight error in Server's decryption accuracy acceptable. In actual use, if a face is scanned many times or a commercial face database is used for training, this error would be smaller.
We believe it is necessary to discuss whether encryption and decryption affect the recognition speed of the FR system. In fact, the time spent on encryption and decryption is essentially the time of a forward pass of the facial features through the trained A and Server networks, which in practice is determined by the depth of the networks. In our system, this time is approximately 0.02 s, which is already quite small relative to the entire face recognition process.