A Ciphertext Reduction Scheme for Garbling an S-Box in an AES Circuit with Minimal Online Time

Abstract: The secure computation of symmetric encryption schemes, such as AES, using Yao's garbled circuits allows two parties, where one holds a plaintext block m and the other holds a key k, to compute Enc(k, m) without leaking m and k to one another. Due to its wide application prospects, secure AES computation has received much attention. However, the evaluation of AES circuits using Yao's garbled circuits incurs substantial communication overhead. To further improve its efficiency, this paper, upon observing the special structures of AES circuits and the symmetries of an S-box, proposes a novel ciphertext reduction scheme for garbling an S-box in the last SubBytes step. Unlike traditional Yao's garbled circuits, where the circuit generator uses the input wire labels to encrypt the corresponding output wire labels, our garbling scheme uses the input wire labels of an S-box to encrypt the corresponding "flip bit strings". This approach leads to a significant performance improvement: our garbling scheme requires only $2^8$ ciphertexts to garble an S-box and a single invocation of a cryptographic primitive for decryption, whereas the best result in previous work requires $8 \times 2^8$ ciphertexts to garble an S-box and multiple invocations of a cryptographic primitive for decryption. Crucially, the proposed scheme provides a new idea for improving the performance of Yao's garbled circuits. We analyze the security of the proposed scheme in the semi-honest model and experimentally verify its efficiency.


Introduction

Yao's Garbled Circuits and Secure AES Computation
Yao's garbled circuits, introduced in [1], remain a cornerstone method for secure two-party computation. This approach enables two parties to jointly compute a function without leaking their inputs to each other. In the traditional garbled circuits protocol, the two parties play the roles of circuit generator and circuit evaluator, respectively. Initially, the two parties decompose the function that they want to evaluate into a Boolean circuit, and then the circuit generator produces the wire labels and garblings (ciphertexts) for the Boolean circuit. The wire labels are used to conceal the truth values, and the garblings are used to decrypt the output wire labels. Finally, with input wire labels and garblings, the circuit evaluator can decrypt the output wire labels and learn the output of the function. During the process, the evaluator does not learn which truth values the wire labels correspond to because they are all produced by the generator, while the generator does not learn which wire labels the evaluator uses to decrypt the garblings because the generator is not involved in the decryption process. Thus, neither side learns the other's input.
Among the myriad applications of garbled circuits [2][3][4], the secure computation of the Advanced Encryption Standard (AES) [5] has garnered significant interest. In this setup, the evaluator, possessing a plaintext message m, can encrypt m using the generator's key k without disclosing m to the generator while remaining oblivious to k. Secure AES computation has found numerous applications [5], notably in side-channel protection, blind message authentication codes (MACs), blind encryption, and third-party operations on encrypted data. A particularly notable application is the construction of the oblivious pseudo-random function (OPRF), which is fundamental to a host of privacy-preserving technologies, including private set intersection (PSI) [6][7][8], private information retrieval (PIR) [9,10], private keyword search (PKS) [11,12], and location sharing [13].
When these privacy-preserving protocols are constructed using secure AES computation, their runtime performance can be significantly improved. For example, the study in [14] demonstrates that a PSI protocol based on the OPRF, which is constructed using secure AES computation, achieves the highest efficiency compared to the RSA blind signature-based PSI (RSA-PSI), Diffie-Hellman-based PSI (DH-PSI), and Naor-Reingold PRF-based PSI (NR-PSI). However, this method incurs the most communication overhead, primarily due to the use of Yao's garbled circuits to securely compute AES.

Gaps and Motivation
The huge communication overhead and deployment difficulty are two major obstacles to the practical application of Yao's garbled circuits. Although various schemes have been proposed in recent years to reduce the communication overhead of garbled circuits, the overhead is still not ideal because (1) the essence of garbled circuits is to use a bit string (wire label) to mask a single bit, and (2) the Boolean circuit that realizes the function is complex. In addition, there are few mature frameworks that can efficiently convert a function to a Boolean circuit, and there is no systematic literature on how to implement garbled circuit schemes, which makes it harder to deploy Yao's garbled circuits in real applications.
Motivation. In view of these challenges, it is important to shift attention from universal Yao's garbled circuits to a commonly used cryptographic module realized by Yao's garbled circuits, such as secure AES computation. The optimization of a specific module is much easier than the optimization of universal garbled circuits. Therefore, in this paper, we focus on how to optimize and implement secure AES computation, which is realized using Yao's garbled circuits.

Our Idea
An AES circuit needs to perform four algorithms: SubBytes, MixColumns, ShiftRows, and AddRoundKey. After extensive optimization of the circuit structure, only the SubBytes step requires the transmission of ciphertexts (i.e., incurs communication overhead). The remaining steps (MixColumns, ShiftRows, and AddRoundKey) can be efficiently performed using only free XOR gates [15].
Central to the SubBytes step is the S-box, the critical nonlinear element that provides confusion. There are various studies on the optimization [16,17] of the S-box and its applications [18][19][20][21]. For the original AES S-box, Huang et al. [22] proposed two garbling schemes aimed at minimizing the total and online times, respectively. The first scheme decomposes the S-box into a circuit that performs the $GF(2^8)$ inversion and the bit transformation, ultimately resulting in 58 [22] non-free gates. The second scheme treats the S-box as a "gate" with eight input and eight output wires; thus, the generator needs to garble this "gate" using eight input wire labels to encrypt eight output wire labels. This results in $8 \times 2^8$ ciphertexts, of which the evaluator only needs to decrypt eight (eight output wire labels). The former approach, while less communication-intensive, incurs longer online times, as the computation cannot be pre-processed and requires multiple invocations of cryptographic primitives. Many cryptographic primitives can be used to produce ciphertexts, such as hash functions [23] and fixed-key AES [24]. The latter, while necessitating the generation of more ciphertexts, allows for pre-processing by the generator and reduces online time by requiring only four invocations of a cryptographic primitive.
It should be noted that if we use the second idea in [22] to garble an S-box that has eight input and eight output wires, the generator does not have to use the eight input wire labels to encrypt the eight corresponding output wire labels as in [22]. Instead, the generator can encrypt the corresponding "flip bit string" according to the mapping law of the S-box. For example, for an S-box that maps 00111100 to 11010101, the flip bit string is 00111100 ⊕ 11010101 = 11101001. These flip bit strings are used to flip the least significant bit (lsb) of the input wire labels of the S-box. Since the final truth values of the output are determined by the lsb of the output wire labels, and there are only XOR operations between wire labels in the other parts of the AES circuit, flipping the lsb of the input wire labels of the S-box eventually produces the correct output. This reduces the number of ciphertexts by converting the encryption of eight output wire labels into the encryption of only one flip bit string. Furthermore, by introducing optimized S-box structures [16][17][18][19][20][21], where each flip bit string is almost exclusively mapped to an input byte, the security of our garbling scheme can be effectively improved without additional overhead.
Contributions. The current state of optimizing performance for garbling the AES circuit seems to be approaching its limits unless there is a significant breakthrough in the field of garbled circuits. In this paper, we leverage the unique structure of the S-box and AES circuit to propose a ciphertext reduction scheme. The contributions of this paper are summarized as follows:

• We propose a novel garbling scheme, applicable to the 16 S-boxes in the final SubBytes step, which requires only $2^8$ ciphertexts to garble each S-box and only one call to a cryptographic primitive. A comparison of the communication and computational cost between our garbling scheme and existing schemes is shown in Table 1. It is important to note that regardless of the S-box structure used, the overhead of our scheme does not change. However, the overhead of the minimal total time scheme in [22] increases because the optimized S-box has more nonlinear gates.

• In our experiments, we avoid using any hardware description language to instantiate the AES circuit. Instead, we show how to construct the structure of the AES circuit using only C++ classes. In order to reuse circuit units, we introduce the concept of the circuit layer, where we design a circuit layer for each algorithm in AES, and each circuit layer stores only one copy in memory. Our implementation can help researchers better understand how to deploy Yao's garbled circuits in practice.

Organization. Our paper is organized as follows. In Section 2, we discuss related works on garbled circuits and secure AES computation. In Section 3, we give the necessary concepts and notations for understanding our design. In Section 4, we demonstrate our ciphertext reduction scheme for garbling an S-box, illustrating its potential extra overhead and possible extensibility, and provide the whole algorithm for garbling and evaluating the AES circuit. In Section 5, we prove the security of our scheme, and in Section 6, we show the circuit constructions and analyze the efficiency of our garbling scheme.

Related Works
Yao's garbled circuits have been at the forefront of cryptographic research, particularly in the domain of secure multi-party computation (MPC), since their introduction by Andrew Yao in the 1980s. This period has seen a burgeoning interest from the research community in refining and optimizing the performance of this pivotal technique.
The "point-and-permute" optimization, first proposed by Beaver in 1990 [26], marked a significant advancement in reducing the communication and computational overhead associated with garbled circuits. This technique leverages the last bit of the wire label, enabling the evaluator to discern which ciphertext to decrypt. This innovation not only minimizes the computational burden of decryption but also obviates the need for communicating MACs to verify decryption outcomes. Furthermore, a suite of techniques has been developed to reduce the number of ciphertexts required: the 4-to-3 garbled row reduction (GRR3) [25], 4-to-2 garbled row reduction (GRR2) [5], free XOR [15], flexible XOR [27], half-gate [23], and slicing and dicing [28] methods. Each of these techniques contributes to the goal of optimizing communication efficiency within garbled circuits, as detailed in Table 2.
AES is a globally adopted symmetric encryption standard known for its efficiency and security. It operates on blocks of 128 bits with key sizes of 128, 192, or 256 bits, executing several rounds of transformation to securely encrypt plaintext into ciphertext. When combined with garbled circuits, secure AES computation offers expansive application potential in privacy-preserving fields.
Initially introduced by Pinkas et al. [5], secure AES computation has been incorporated into various secure multi-party computation (MPC) frameworks, including Fairplay [29], Obliv-C [30], and TASTY [31]. It continues to stand as a prominent benchmark for evaluating MPC systems.
Its appeal arises from its potential for various applications, including the construction of the OPRF, a critical component in cryptographic operations. Although the OPRF derived using secure AES computation is less efficient than that constructed via oblivious transfer (OT) [32,33], it offers a unique advantage. This advantage stems from the inherent efficiency of AES encryption. If a privacy-preserving protocol employing a secure AES computation-based OPRF can utilize AES encryption somewhere, its efficiency can be significantly enhanced. This point has been confirmed in the work of Kiss et al. [14]. The authors compared the performance of various kinds of PSI protocols constructed using different schemes, as detailed in Table 3, where the GC-PSI protocol is constructed using a secure AES computation-based OPRF. The findings suggest that although the PSI protocol built using a secure AES computation-based OPRF is less efficient in the base phase for the secure evaluation of AES circuits, it is the most efficient from a global perspective.

Wire labels in the garbled circuits are denoted as $W_i^b$, where i represents the index of the wire and b signifies the binary value of 0 or 1. For wire i, $W_i^0$ and $W_i^1$ correspond to the false and true labels, respectively. Each gate shares the same index with its output wire. Additionally, a wire label can also be represented by a capital letter along with its least significant bit (lsb), facilitating the exposition of some concepts. For example, (A, 1) indicates that the wire label is A and its lsb is 1. The concatenation of two wire labels is denoted by ||, and $\|_{i \in \{0,1,2,\dots,n\}} W_i^{m}$ denotes the concatenation of multiple labels. "←$" denotes random sampling, and "←" denotes value assignment.

Garbled Circuit
The basic two-party garbled circuit evaluation scheme involves a generator and an evaluator. The generator is responsible for producing wire labels and ciphertexts for the circuit, while the evaluator uses the wire labels in hand to decrypt ciphertexts according to the circuit's topology. The security of the scheme is based on the idea that the generator does not learn which wire labels the evaluator holds, and the evaluator does not learn the truth value that the wire labels represent.
The process for the generator to garble a Boolean circuit is as follows: The generator randomly samples two labels, $W^0$ and $W^1$, for each wire, representing bits 0 and 1, respectively. For a binary gate g with input wires i and j and output wire k, the generator arranges the ciphertexts as follows:

$$\mathrm{Enc}_{W_i^{a},\,W_j^{b}}\big(W_k^{g(a,b)}\big), \quad (a, b) \in \{0,1\}^2.$$

The equation means that the generator encrypts the output wire label using the corresponding combination of input wire labels. For example, for an AND gate with input value (0, 1), the output value should be 0; thus, the generator encrypts $W_k^0$ using $W_i^0$ and $W_j^1$, where the input wires are i and j and the output wire is k. In this way, the generator generates four ciphertexts successively for each gate according to the topological order of the circuit.
A universal representation of the garbling scheme can be derived from [34], where a garbling scheme is denoted as a five-tuple of algorithms G = (Gb, En, De, Ev, ev), as shown in Figure 1.
The function Gb maps f and k to (F, e, d), where (F, e, d) are the strings that represent the garbled function, the encoding function, and the decoding function. Possession of e and x allows one to compute the garbled input X = En(e, x); F and X enable the calculation of the garbled output Y = Ev(F, X); and d and Y allow for the recovery of the final output value y = De(d, Y), which must be equal to ev(f, x).
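The classic garbling of a single gate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the encryption is instantiated as an XOR with a truncated SHA-256 hash of the two input labels, the label length of 16 bytes is arbitrary, and the helper names (`H`, `garble_gate`, `new_wire`) are hypothetical. For readability the rows are indexed by the plaintext input bits; a real scheme would permute them, for example with point-and-permute.

```python
import hashlib
import secrets

LABEL_BYTES = 16  # label length in bytes (an assumption for this sketch)

def H(*labels: bytes) -> bytes:
    """Derive a one-time mask from the input labels (SHA-256, truncated)."""
    return hashlib.sha256(b"".join(labels)).digest()[:LABEL_BYTES]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def new_wire():
    """Sample independent false (index 0) and true (index 1) labels."""
    return [secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES)]

def garble_gate(g, Wi, Wj, Wk):
    """Encrypt W_k^{g(a,b)} under (W_i^a, W_j^b) for all four input pairs."""
    return {(a, b): xor(H(Wi[a], Wj[b]), Wk[g(a, b)])
            for a in (0, 1) for b in (0, 1)}

Wi, Wj, Wk = new_wire(), new_wire(), new_wire()
table = garble_gate(lambda a, b: a & b, Wi, Wj, Wk)

# An evaluator holding the labels for input (0, 1) recovers the false label.
recovered = xor(H(Wi[0], Wj[1]), table[(0, 1)])
```

The four entries of `table` correspond to the four ciphertexts per gate mentioned in the text.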

Free XOR Gate
The free XOR technique, initially proposed in [15], eliminates the need for a ciphertext to evaluate an XOR gate, significantly reducing the communication overhead of garbled circuits.
The idea of free XOR is based on the observation that it is unnecessary to generate the false (representing bit 0) and true (representing bit 1) labels on a wire independently at random. Instead, the false and true labels on a wire can be related such that the false label = the true label ⊕ the offset value. The offset value is global and is kept secret by the generator.
As depicted in Figure 2, A and B denote the false labels on their respective wires, and ∆ represents the global offset value. Thus, the corresponding true labels on their respective wires are A ⊕ ∆ and B ⊕ ∆. The false label on the output wire is computed as $C = A \oplus B$. Similarly, the true label on the output wire is C ⊕ ∆. When the generator produces wire labels for the entire circuit, it only needs to randomly sample a false label for every input wire and a global offset value ∆, and the false labels of the other wires can be computed gate by gate according to the topology of the circuit. This method of generating wire labels allows the evaluator, holding two input labels of an XOR gate, to simply XOR the two labels to compute the output wire label without decrypting any ciphertext. The correctness is demonstrated in Table 4.
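The correctness argument can also be checked mechanically. The following minimal Python sketch, with an assumed 16-byte label length and hypothetical helper names, enumerates the four input combinations of an XOR gate and confirms that XORing the held labels always yields the output label that encodes a ⊕ b.

```python
import secrets

LABEL_BYTES = 16  # label length in bytes (an assumption for this sketch)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

delta = secrets.token_bytes(LABEL_BYTES)  # global offset, known to the generator only
A = secrets.token_bytes(LABEL_BYTES)      # false label of the first input wire
B = secrets.token_bytes(LABEL_BYTES)      # false label of the second input wire
C = xor(A, B)                             # false label of the output wire

def label(false_label: bytes, bit: int) -> bytes:
    """Label encoding `bit` on a wire whose false label is `false_label`."""
    return xor(false_label, delta) if bit else false_label

def free_xor_is_correct() -> bool:
    """The evaluator's local XOR must equal the output label for a XOR b."""
    return all(xor(label(A, a), label(B, b)) == label(C, a ^ b)
               for a in (0, 1) for b in (0, 1))
```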

Reusable Circuit Layers in the AES Circuit
There are primarily four algorithms involved in encrypting one plaintext block using AES: SubBytes, ShiftRow, MixColumn, and AddRoundKey. The KeyExpansion algorithm can be processed locally by the generator.
The circuit design of each of these algorithms has been extensively studied [22]. However, to the best of our knowledge, there is no existing literature on how to reuse the circuit units in the AES circuit to generate a garbled circuit. Although some automated compilation tools have been proposed [29,31], the structure of the auto-compiled circuit is not the simplest. This underscores the significance of manually designing circuit structures and understanding how to reuse circuit units.
To enable the reuse of circuit units, we propose designing a circuit layer for each algorithm and setting a 128-bit input register and a 128-bit output register at the two ends of the circuit layer call area. In the AddRoundKey layer, an additional 128-bit register is needed for the generator to input the round key.
As depicted in Figure 3, each circuit layer is stored in memory only once and is directly connected between the input register and the output register when it needs to be called. After evaluating a circuit layer, the wire labels are transmitted from the output register to the input register. By reusing circuit layers, the memory usage of both parties can be greatly reduced.

Intuitive Description
Initially, we recall how the evaluator computes the values on the output wires of a garbled circuit. The original input x is transformed into input wire labels through X ← En(e, x). Subsequently, the output wire labels are computed as Y ← Ev(F, X). This process involves the evaluator decrypting the ciphertexts F of the garbled circuit using the input wire labels X. Finally, holding the output wire labels Y and the decoding vector d, the evaluator can compute the output values as y ← De(d, Y). The decoding vector d usually comprises the lsbs of the false wire labels corresponding to each output wire and is transmitted from the generator to the evaluator at the beginning of the protocol. After acquiring all the output wire labels by evaluating the garbled circuit, the evaluator can compute the output value y by comparing the lsbs of the output wire labels with the corresponding bits in d. If the lsb of an output wire label matches the corresponding bit in d, this wire label represents the value 0; otherwise, it represents the value 1.
For example, let us assume there is only one output wire, with the false wire label (A, 1) (where 1 is the lsb) and the true wire label (A ⊕ ∆, 0). The generator sends the lsb (1) of the false wire label (A) as the decoding vector d to the evaluator. When the evaluator computes an output wire label (A, 1) and finds that the lsb of A matches the bit in d, it learns that A represents the value 0. Conversely, if the evaluator computes an output wire label (A ⊕ ∆, 0) and finds that the lsb of the output wire label does not match the bit in d, it learns that the computed output wire label represents the value 1.
An important observation here is that if the lsb of the output wire label is flipped but the bit in d remains unchanged, the evaluator will output the opposite value. For instance, if the output wire label computed by the evaluator is (A, 0) (assuming the lsb of A is flipped during the evaluation) instead of (A, 1), but the bit in d remains 1, the evaluator will think that it has acquired a true wire label and output the value 1, even though A represents a false wire label from the generator's perspective.
This example illustrates a key point: for the evaluator to ultimately obtain an output value of 1 (0), it is not necessary for it to acquire the true (false) label. By flipping the lsb of the output wire label, the correct output value can still be achieved. Therefore, for an S-box with eight input wires and eight output wires, the generator does not need to encrypt the output wire labels using input wire labels as in the traditional garbled circuits protocol. Instead, it can simply encrypt eight flip bits. When the evaluator decrypts these flip bits using the input wire labels it possesses, it uses them to flip the lsbs of these input wire labels. This way, the evaluator can still output the correct values, even without the correct true or false label.
If this method is used to garble an S-box, the generator can produce ciphertexts as follows: Consider the S-box input value 0x10, for which the output is S(0x10) = 0xCA. The corresponding binary representations are 0x10 = 0b00010000 and 0xCA = 0b11001010. Thus, the bits that need to be flipped are 00010000 ⊕ 11001010 = 11011010, where 1 denotes the need for flipping and 0 denotes no flipping. As depicted in Figure 4, for the input value 0x10, the generator knows the evaluator will hold the input wire label combination {A, B, C, D ⊕ ∆, E, F, G, H}. Thus, the generator uses A||B||C||D ⊕ ∆||E||F||G||H to encrypt the flip bit string (fbs) 11011010 (all possible fbss are shown in Appendix A, Figure A1). Similarly, the generator can produce ciphertexts for all possible input wire label combinations, resulting in a total of $2^8$ ciphertexts for an S-box. The evaluator only needs to decrypt one of them.

It is important to note that in the AES circuit, except for the S-box, the rest of the gates are free XOR gates, where the evaluator only needs to perform the XOR operation between wire labels locally and does not need to decrypt any ciphertext. Thus, flipping the lsbs does not affect the circuit evaluation.

The encryption scheme is drawn from [23]. It is important to note that the length of the output of H is σ, and the length of the fbs is 8. Thus, we need a reversible injective mapping pad: $\{0,1\}^8 \leftrightarrow \{0,1\}^{\sigma}$, and we use H to encrypt pad(fbs). However, for the convenience of presentation, we omit the process of mapping the fbs to $\{0,1\}^{\sigma}$ in the description of the scheme.
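The decoding rule and the effect of a flipped lsb can be reproduced with a toy example. The Python sketch below uses 4-bit integers as stand-ins for wire labels; the concrete values of `A` and `DELTA` are arbitrary assumptions chosen so that the false label has lsb 1 and the true label has lsb 0, as in the example above.

```python
def lsb(label: int) -> int:
    return label & 1

def decode(d_bit: int, label: int) -> int:
    """Output 0 when the label's lsb matches d, and 1 otherwise."""
    return d_bit ^ lsb(label)

DELTA = 0b1011   # toy global offset with lsb 1 (assumed for illustration)
A = 0b0111       # toy false label with lsb 1, i.e. (A, 1)
d = lsb(A)       # decoding bit sent by the generator

value_of_false_label = decode(d, A)            # lsb matches d
value_of_true_label = decode(d, A ^ DELTA)     # lsb differs from d
value_after_flip = decode(d, A ^ 1)            # false label with its lsb flipped
```

With d unchanged, the flipped false label decodes to the same value as the true label, which is the observation the scheme relies on.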
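The flip bit string of the worked example can be recomputed directly. The sketch below is a Python illustration, not the authors' C++ code: it derives the original AES S-box from its standard definition (inversion in GF(2^8) followed by the affine transformation) and then forms fbs = x ⊕ S(x). The function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p

def gf_inv(a: int) -> int:
    """a^254, which is the inverse for a != 0 and maps 0 to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def rotl8(x: int, s: int) -> int:
    return ((x << s) | (x >> (8 - s))) & 0xFF

def sbox(x: int) -> int:
    """AES S-box: inversion followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

def flip_bit_string(x: int) -> int:
    """Bits in which the S-box output differs from its input."""
    return x ^ sbox(x)

example = format(flip_bit_string(0x10), "08b")
```

Evaluating `flip_bit_string` over all 256 inputs gives the table of flip bit strings that the generator encrypts, one per input wire label combination.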

Figure 4. Garbling an S-box with its input wire labels (flip bits and garbling bits $r = r_0 r_1 r_2 r_3 r_4 r_5 r_6 r_7$).
However, the fbs leads to a significant risk of information leakage because the mapping of the S-box is public. For example, if the evaluator decrypts the fbs = 11011010, it can promptly deduce that the input value is 0x10 and the output value is 0xCA. This inference is feasible because each fbs almost exclusively corresponds to one pair of input and output values.
To address this concern, the generator needs to conceal the real fbs and let the evaluator decrypt a garbled fbs. The detailed process is as follows: The generator randomly samples an 8-bit string r, which we refer to as the garbling bits, and records each bit of r on the corresponding wire. Then, for all $2^8$ possible fbss, the generator XORs r and fbs to obtain the garbled fbs* and uses the corresponding input wire label combination to encrypt fbs*. As shown in Figure 4, for the fbs = 11011010, the generator encrypts fbs* = 11011010 ⊕ $r_0 r_1 r_2 r_3 r_4 r_5 r_6 r_7$ using the corresponding wire label combination: A||B||C||D ⊕ ∆||E||F||G||H. As a result, when the evaluator holds this input wire label combination, it will decrypt 11011010 ⊕ $r_0 r_1 r_2 r_3 r_4 r_5 r_6 r_7$ instead of the real fbs = 11011010. Although every possible fbs is garbled by the same r, based on the security of the free XOR gate scheme, the evaluator can only decrypt one of them, effectively preventing the leakage of the original fbs.
To ensure the correctness of the final output, the generator propagates the garbling bits to the final output wires according to the circuit's topology. Finally, instead of sending the original d to the evaluator, the generator sends d* = d ⊕ r*, where r* represents the garbling bits that are finally recorded on the output wires. Holding d*, the evaluator can still output the correct value as y ← De(d*, Y).
Unfortunately, our garbling scheme for S-boxes can only be applied in the final SubBytes layer among the 10 SubBytes layers involved in the AES-128 circuit. This limitation arises from the fact that flipping the lsbs in the S-boxes of the preceding SubBytes layers would adversely impact the decryption process in subsequent SubBytes layers.
Despite this restriction, the S-box garbling scheme above reduces the number of ciphertexts in the final SubBytes layer from $8 \times 16 \times 2^8$ to $16 \times 2^8$ while maintaining minimal online time. It should be noted that a SubBytes layer consists of 16 S-boxes.
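The whole flow for one S-box (garbling with garbling bits r, evaluation by flipping lsbs, and decoding with d*) can be sketched end to end. The Python code below is a simplified illustration under several assumptions that are not taken from the paper: H is instantiated with truncated SHA-256, pad is zero-extension of the 8-bit string to σ bits, wire 0 carries the most significant bit, ∆ has lsb 1 so that the lsbs serve as permute bits, and the S-box outputs are treated directly as circuit outputs (the free XOR layers that follow in AES are omitted).

```python
import hashlib
import secrets

SIGMA = 128  # label length in bits (assumed)

def gf_mul(a, b):
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p

def sbox(x):
    inv = 1
    for _ in range(254):                     # x^254 is the inverse (0 maps to 0)
        inv = gf_mul(inv, x)
    rot = lambda v, s: ((v << s) | (v >> (8 - s))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63

SBOX = [sbox(x) for x in range(256)]

def H(labels):
    data = b"".join(l.to_bytes(SIGMA // 8, "big") for l in labels)
    return int.from_bytes(hashlib.sha256(data).digest()[:SIGMA // 8], "big")

def bits(v):
    """Bits of a byte, most significant first."""
    return [(v >> (7 - i)) & 1 for i in range(8)]

def from_bits(bs):
    v = 0
    for b in bs:
        v = (v << 1) | b
    return v

def encode(false_labels, delta, x):
    return [t ^ (delta if b else 0) for t, b in zip(false_labels, bits(x))]

def garble_sbox(false_labels, delta, r):
    rows = [None] * 256
    for x in range(256):
        labels = encode(false_labels, delta, x)
        fbs_star = x ^ SBOX[x] ^ r                 # garbled flip bit string
        pos = from_bits([l & 1 for l in labels])   # row position from the lsbs
        rows[pos] = H(labels) ^ fbs_star           # pad = zero-extension
    return rows

def eval_sbox(rows, labels):
    pos = from_bits([l & 1 for l in labels])
    fbs_star = rows[pos] ^ H(labels)               # single hash invocation
    return [l ^ f for l, f in zip(labels, bits(fbs_star))]  # flip the lsbs

def garble_and_evaluate(x):
    delta = secrets.randbits(SIGMA) | 1
    false_labels = [secrets.randbits(SIGMA) for _ in range(8)]
    r = secrets.randbits(8)
    rows = garble_sbox(false_labels, delta, r)
    d_star = from_bits([t & 1 for t in false_labels]) ^ r   # d* = d XOR r
    Y = eval_sbox(rows, encode(false_labels, delta, x))
    return from_bits([y & 1 for y in Y]) ^ d_star
```

In this sketch the evaluator only ever sees fbs ⊕ r, while the decoded result still equals S(x) because r is cancelled by d*.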
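To make the reduction concrete, the ciphertext counts can be converted into bytes. The short Python calculation below assumes σ = 128-bit ciphertexts, which is an illustrative choice rather than a parameter stated above.

```python
SIGMA_BITS = 128   # ciphertext length in bits (an assumption)
SBOXES = 16        # S-boxes in one SubBytes layer

previous_count = 8 * SBOXES * 2**8   # eight output labels per row, as in [22]
our_count = SBOXES * 2**8            # one flip bit string per row

previous_bytes = previous_count * SIGMA_BITS // 8
our_bytes = our_count * SIGMA_BITS // 8
```

Under this assumption the final SubBytes layer shrinks from 512 KiB to 64 KiB of ciphertexts, a factor of eight.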

Garbling Scheme for the Final SubBytes Layer
Our garbling scheme for AES circuits is shown in Figure 5. For each gate i (i is also the index of its output wire), the GateInputs(f, i) function returns its input wire indices. For each S-box i, the GateInputs(f, i) function returns eight input wire indices, and the GateOutputs(f, i) function returns eight output wire indices. $r_i$ denotes the garbling bit recorded on wire i.
Here, we mainly describe the GbSbox algorithm in detail. In the initial step, the generator randomly samples a garbling bit $r_i$ for each input wire i. For every possible input (i.e., $m_0$ to $m_{255}$), the generator first computes the fbs and garbles it by XORing the fbs with the garbling bits recorded on the wires. Then, the generator encrypts these garbled fbss using the corresponding combinations of input wire labels. Finally, the generator uses the point-and-permute technique to order the ciphertexts. The propagation of garbling bits follows the same rule as the evaluation of a gate. For example, for an XOR gate with input wires a and b and output wire c, $r_c = r_a \oplus r_b$ (there is no AND gate in the AES circuit after the final SubBytes layer). After the garbling bits have been propagated from the S-boxes to the output wires, the generator produces the decoding vector $d^*$, where, for each output wire i, $d^*_i = \mathrm{lsb}(W_i^0) \oplus r_i$.

Figure 5. Garbling scheme for AES circuits.
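The propagation of garbling bits through the XOR gates that follow the final SubBytes layer can be sketched as below. This is a minimal Python illustration with hypothetical wire names; wires that do not descend from an S-box (for example, round-key wires) are assumed to carry the garbling bit 0.

```python
from functools import reduce

def propagate_garbling_bits(r, xor_gates):
    """Propagate garbling bits gate by gate.

    r         : dict mapping S-box output wires to their garbling bits
    xor_gates : list of (output_wire, [input_wires]) in topological order
    """
    r = dict(r)
    for out, ins in xor_gates:
        # r_c = r_a XOR r_b; wires without a recorded bit contribute 0
        r[out] = reduce(lambda acc, w: acc ^ r.get(w, 0), ins, 0)
    return r

def decoding_bit(false_label_lsb, garbling_bit):
    """d*_i = lsb(W_i^0) XOR r_i for an output wire i."""
    return false_label_lsb ^ garbling_bit

# Toy circuit: c = s0 XOR s1, then o = c XOR k0 (k0 is a round-key wire).
r = propagate_garbling_bits({"s0": 1, "s1": 0},
                            [("c", ["s0", "s1"]), ("o", ["c", "k0"])])
d_star_o = decoding_bit(0, r["o"])
```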

Discussions on the Additional Cost and Universality
The only additional cost in our S-box garbling scheme is that the generator needs to sample some garbling bits, record them, and propagate them to the output wires. Compared to the overhead incurred by the generator to produce the ciphertexts for the AES circuit, this additional cost is almost negligible.
Apart from the S-box, our garbling scheme can also be applied to other combinational circuits. However, whether it is worthwhile depends on the depth of the combinational circuit and the number of input and output wires. There are two scenarios where our scheme becomes impractical: (1) when the combinational circuit inherently contains few AND gates and (2) when the combinational circuit has too many input wires. In the first scenario, the cost of evaluating gate by gate is minimal, rendering the integration of the combinational circuit unnecessary. In the second scenario, an excessive number of input wires results in a substantial number of ciphertexts, making it impractical. An S-box has eight input wires, resulting in $2^8$ ciphertexts, which is acceptable. However, if we decompose the S-box into various gates for evaluation, it may contain more than 58 AND gates, leading to a high cost of evaluating gate by gate. Therefore, our scheme is highly suitable for garbling S-boxes.

A random oracle (RO) is an ideal hash function that maps $\{0,1\}^* \rightarrow \{0,1\}^{\sigma}$. For any distinct query, the RO outputs a random σ-bit string, which cannot be predicted. For the hash function we use, $H(t_0 \oplus a\Delta \,\|\, t_1 \oplus b\Delta \,\|\, \dots \,\|\, t_7 \oplus h\Delta)$ must be indistinguishable from a random σ-bit string for any randomly chosen values of $\{t_0, t_1, \dots, t_7\}$ and $\{a, b, c, \dots, h\}$.

Semi-Honest Model
The adversary in the semi-honest model strictly follows the protocol but tries to learn more information from the messages it receives. For a protocol π: (x, y) → ($f_{P_0}(x, y)$, $f_{P_1}(x, y)$), $P_0$ inputs x and outputs $f_{P_0}(x, y)$, while $P_1$ inputs y and outputs $f_{P_1}(x, y)$. Security against a semi-honest party $P_i$ can be proven if there is a simulator that, given only $P_i$'s input and output, can compute a view that is indistinguishable from the complete view of $P_i$. That is to say, an adversary A cannot distinguish the distribution of the simulator's view $\mathrm{View}^{\pi}_{S}(x, y)$ from that of the real party $P_i$'s view $\mathrm{View}^{\pi}_{P_i}(x, y)$ in polynomial time, except with negligible probability. $\mathrm{View}^{\pi}_{P_i}(x, y)$ consists of its input, output, randomness, and all messages it receives.

Security Analysis of Our Garbling Scheme for S-Boxes
Intuitively speaking, the security of our garbling scheme is based entirely on the security of free XOR [15]. The only difference is that we use input wire labels to encrypt garbled fbss instead of output wire labels. We now analyze whether there is any information leakage when the evaluator decrypts a garbled fbs. Based on the security analysis in [15], the evaluator can only decrypt one of the ciphertexts, while the rest of the ciphertexts look random in its view. Therefore, the real fbs is effectively encrypted under a one-time pad r, and the evaluator cannot learn what the real fbs is.
In the following, we demonstrate the simulation process for our garbling scheme. Instead of showing the simulation of the whole garbling scheme, we mainly focus on the process of garbling the S-box, where the evaluator inputs the eight input wire labels of the S-box and the generator inputs $2^8$ real fbss. Finally, the evaluator outputs a garbled fbs, and the generator outputs nothing. For the security requirement, the evaluator cannot learn the real fbs. Thus, we only need to simulate the view of the evaluator.
The simulator S randomly samples $2^8 - 1$ σ-bit strings as the rest of the garblings for the S-box. Upon receiving the input wire labels and the garbled fbs from the evaluator, S encrypts the garbled fbs using the input wire labels. S arranges the ciphertexts in the corresponding positions according to the lsbs of the input wire labels and outputs garblings $F_S$ such that $\mathrm{View}^{\pi}_{S}(x, y) = F_S$.
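Theorem 1. For any probabilistic polynomial-time (PPT) adversary A:

$$\left| \Pr\big[A(\mathrm{View}^{\pi}_{S}(x, y)) = 1\big] - \Pr\big[A(\mathrm{View}^{\pi}_{P_i}(x, y)) = 1\big] \right| \le \varepsilon(\sigma),$$

where $P_i$ is the evaluator, ε is a negligible function, and σ is the security parameter.
Proof. First, the encrypted garbled fbss are the same in the views of both S and the evaluator. Furthermore, H(t 0 ⊕ a∆||t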

Experiment
In this section, we experimentally implement our garbling scheme. Our platform is an R7-7840HS at 3.80 GHz running Windows 11. We write our code in C++. We do not use any hardware description language to instantiate the AES circuit. Instead, we construct the topology of the AES circuit using C++ objects. To reuse the circuit units, we use the circuit layer model introduced in Section 3.4.

Gate Class
In our implementation, we create a Gate class to maintain all gate objects. Each gate object contains an output wire and a number of input wires, and the gate shares the same index with its output wire. For a gate with only one input wire, the value on the output wire equals that on the input wire. For a gate with more than one input wire, the value on the output wire equals the XOR of the values of all input wires.
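The gate semantics described above can be illustrated with a minimal C++ sketch. The names `Gate`, `inputs`, and `evalGate` are hypothetical and chosen for illustration only; the sketch evaluates plain bits rather than wire labels and is not the class used in our implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical gate object: it stores only the indices of its input wires,
// because the gate shares its index with its output wire.
struct Gate {
    std::vector<std::size_t> inputs;
};

// Evaluates a gate on plain bits stored in `wires`.
// A one-input gate copies its input; a multi-input gate XORs all inputs.
std::uint8_t evalGate(const Gate& g, const std::vector<std::uint8_t>& wires) {
    assert(!g.inputs.empty());
    std::uint8_t out = 0;
    for (std::size_t in : g.inputs) {
        assert(in < wires.size());
        out ^= wires[in] & 1u;  // XOR of a single value is the value itself
    }
    return out;
}
```

Because XOR over a single operand returns that operand, the same loop covers both the copy gate and the multi-input XOR gate.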

AES Circuit
We demonstrate the construction of the MixColumn, ShiftRow, AddRoundKey, and SubBytes layers in detail. As shown in Figure 3, each circuit layer is connected between an input register and an output register. These registers are composed of 128 gate objects. Each group of eight gates, sequentially arranged from 0 to 127, constitutes a byte within the AES state matrix, as illustrated in Figure 6.

ShiftRow Layer
The ShiftRow operation only permutes the positions of individual bytes, so we only need to concatenate each gate in the input register with the corresponding gate in the output register according to the shift rule.
Figure 7 gives the concatenation rule between the input register and the output register, where igate[i] denotes the i-th gate of the input register, ogate[i] denotes the i-th gate of the output register, and ← denotes the concatenation.
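As an illustration of such a concatenation rule, the following sketch computes, for every output gate, the index of the input gate it is wired to. It assumes the standard column-major AES state layout, in which byte b = 4·col + row occupies gates 8b to 8b + 7; this layout and the function name `shiftRowWiring` are assumptions of the sketch, and the actual rule is the one given in Figure 7.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Returns src such that ogate[i] <- igate[src[i]] for i = 0..127.
// Assumed layout: byte b = 4*col + row occupies gates 8*b .. 8*b + 7.
std::array<std::size_t, 128> shiftRowWiring() {
    std::array<std::size_t, 128> src{};
    for (std::size_t col = 0; col < 4; ++col) {
        for (std::size_t row = 0; row < 4; ++row) {
            // Row r is rotated left by r positions, so the output byte at
            // (row, col) comes from the input byte at (row, col + row mod 4).
            std::size_t outByte = 4 * col + row;
            std::size_t inByte = 4 * ((col + row) % 4) + row;
            for (std::size_t bit = 0; bit < 8; ++bit) {
                src[8 * outByte + bit] = 8 * inByte + bit;
            }
        }
    }
    return src;
}
```

Since the layer is a pure rewiring, it needs no XOR operations and no ciphertexts.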
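
MixColumn Layer
The MixColumn operation multiplies each column of the state matrix by a fixed matrix. For a column with input bytes $s_0, s_1, s_2, s_3$ and output bytes $s'_0, s'_1, s'_2, s'_3$, the computation of a single column is

$$\begin{pmatrix} s'_0 \\ s'_1 \\ s'_2 \\ s'_3 \end{pmatrix} = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \begin{pmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{pmatrix}.$$

Here, the multiplication is over $GF(2^8)$. Since $\{03\} \cdot x = \{02\} \cdot x \oplus x$, the computation above is equivalent to

$$\begin{aligned} s'_0 &= \{02\} \cdot s_0 \oplus \{02\} \cdot s_1 \oplus s_1 \oplus s_2 \oplus s_3, \\ s'_1 &= s_0 \oplus \{02\} \cdot s_1 \oplus \{02\} \cdot s_2 \oplus s_2 \oplus s_3, \\ s'_2 &= s_0 \oplus s_1 \oplus \{02\} \cdot s_2 \oplus \{02\} \cdot s_3 \oplus s_3, \\ s'_3 &= \{02\} \cdot s_0 \oplus s_0 \oplus s_1 \oplus s_2 \oplus \{02\} \cdot s_3. \end{aligned}$$

Furthermore, the operation {02} that multiplies a byte x can be divided into

$$y_7 = x_6, \quad y_6 = x_5, \quad y_5 = x_4, \quad y_4 = x_3 \oplus x_7, \quad y_3 = x_2 \oplus x_7, \quad y_2 = x_1, \quad y_1 = x_0 \oplus x_7, \quad y_0 = x_7,$$

where $\{02\} \cdot x = y$, $x = x_7 x_6 x_5 x_4 x_3 x_2 x_1 x_0$, and $y = y_7 y_6 y_5 y_4 y_3 y_2 y_1 y_0$.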
After the decomposition of the computation, the MixColumn layer can be implemented using only XOR gates. We show the MixColumn layer in Figure 8, where Xtimes executes the {02} • x computation, and 5wayXOR is a gate with five input wires that executes the XOR operation. The construction of Xtimes and 5wayXOR is shown in Figure 9.
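To make the XOR-only decomposition concrete, the sketch below models Xtimes and a five-input XOR on plain bits. It relies on the standard AES facts that {02} • x is a left shift followed by a conditional XOR with {1b}, and that {03} • x = {02} • x ⊕ x. The names `Bits`, `toBits`, `fromBits`, `xtimes`, and `mixOne` are hypothetical, and the sketch operates on truth values rather than wire labels.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bits = std::array<std::uint8_t, 8>;  // Bits[i] holds x_i; x_7 is the MSB

Bits toBits(std::uint8_t v) {
    Bits b{};
    for (std::size_t i = 0; i < 8; ++i) b[i] = (v >> i) & 1u;
    return b;
}

std::uint8_t fromBits(const Bits& b) {
    std::uint8_t v = 0;
    for (std::size_t i = 0; i < 8; ++i) v |= static_cast<std::uint8_t>(b[i] << i);
    return v;
}

// {02} * x over GF(2^8), using only wire copies and two-input XORs.
// Bits 1, 3, and 4 absorb x_7 because the reduction constant is {1b}.
Bits xtimes(const Bits& x) {
    Bits y{};
    y[0] = x[7];
    y[1] = x[0] ^ x[7];
    y[2] = x[1];
    y[3] = x[2] ^ x[7];
    y[4] = x[3] ^ x[7];
    y[5] = x[4];
    y[6] = x[5];
    y[7] = x[6];
    return y;
}

// One MixColumn output byte: {02}a + {03}b + c + d, written as the
// five-input XOR xtimes(a) ^ xtimes(b) ^ b ^ c ^ d.
std::uint8_t mixOne(std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d) {
    Bits xa = xtimes(toBits(a));
    Bits xb = xtimes(toBits(b));
    Bits bb = toBits(b), cb = toBits(c), db = toBits(d);
    Bits out{};
    for (std::size_t i = 0; i < 8; ++i) {
        out[i] = xa[i] ^ xb[i] ^ bb[i] ^ cb[i] ^ db[i];
    }
    return fromBits(out);
}
```

The four output bytes of a column are obtained by rotating the arguments, for example `mixOne(s1, s2, s3, s0)` for the second byte.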

AddRoundKey Layer
The core computation of the AddRoundKey layer is the XOR operation between two bytes. In addition to the input register, which stores the state matrix, an extra 128-bit register is needed for the generator to input the round key. It is important to note that the key expansion algorithm can be executed locally by the generator. The AddRoundKey layer is shown in Figure 10.
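Since this layer consists only of XOR gates, it can be garbled with free XOR [15] without any ciphertexts. The following sketch shows the idea for a single state bit and a single round-key bit. It uses toy 64-bit labels (our experiments use σ = 256), and the names `Label`, `labelFor`, `garbleXor`, and `evalXor` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using Label = std::uint64_t;  // toy label size, an assumption of this sketch

// Label that represents `bit` on a wire whose 0-label is `zero`.
// With free XOR, the 1-label is always the 0-label XOR the global offset.
Label labelFor(Label zero, Label delta, bool bit) {
    return bit ? (zero ^ delta) : zero;
}

// Generator side: the 0-label of the output wire of an XOR gate is the XOR
// of the 0-labels of its input wires. No ciphertext is produced.
Label garbleXor(Label stateZero, Label keyZero) {
    return stateZero ^ keyZero;
}

// Evaluator side: XOR the two labels it holds, without learning the bits.
Label evalXor(Label stateLabel, Label keyLabel) {
    return stateLabel ^ keyLabel;
}
```

For any state bit x and key bit k, `evalXor` applied to the two active labels returns exactly the label that `labelFor` assigns to x ⊕ k on the output wire, because the two copies of the offset cancel when both bits are 1.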

Performance Evaluation
In our experimental setup, we use wire labels of σ = 256 bits and instantiate H as SHA-256 (the output length is 256 bits). The performance of our garbling scheme is shown in Table 5, which lists the respective runtimes of the two parties in each circuit layer. We disregard the cost of the OTs for the wire label transmission from the generator to the evaluator. On the generator's side, the main cost includes two parts: (1) producing the output wire labels for each XOR gate, which includes the XOR operation between the input wire labels and the offset value, and (2) producing the ciphertexts for the S-boxes. Both parts can be executed completely offline. On the evaluator's side, the main cost in the AddRoundKey, ShiftRow, and MixColumn layers is the XOR operation between the input wire labels to compute the output wire labels. In the SubBytes layer, the main cost is the hash function invocations, which must be executed online. Therefore, fewer hash calls lead to better online performance of the protocol.
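The single online hash call per S-box can be illustrated with the following simplified sketch of a 256-row table indexed by the lsbs of the eight input wire labels. Several parts are assumptions made for illustration: labels are 64 bits, `toyHash` is a non-cryptographic FNV-1a stand-in for SHA-256, and the 256 payload values stand for the garbled fbss, whose derivation is outside the sketch. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Label = std::uint64_t;           // toy label size (assumption)
using Labels = std::array<Label, 8>;   // one label per S-box input wire

// NOT secure: FNV-1a over the concatenated labels, standing in for H.
Label toyHash(const Labels& in) {
    std::uint64_t h = 14695981039346656037ULL;
    for (Label l : in) {
        for (int i = 0; i < 8; ++i) {
            h ^= (l >> (8 * i)) & 0xFFu;
            h *= 1099511628211ULL;
        }
    }
    return h;
}

// Labels the evaluator holds when the S-box input byte is v.
Labels activeLabels(const Labels& zero, Label delta, std::uint8_t v) {
    Labels a{};
    for (std::size_t i = 0; i < 8; ++i) {
        a[i] = ((v >> i) & 1u) ? (zero[i] ^ delta) : zero[i];
    }
    return a;
}

// Row position determined by the lsbs of the eight labels.
std::size_t rowOf(const Labels& a) {
    std::size_t r = 0;
    for (std::size_t i = 0; i < 8; ++i) r |= static_cast<std::size_t>(a[i] & 1u) << i;
    return r;
}

// Generator (offline): one ciphertext per possible input byte.
std::vector<Label> garbleSbox(const Labels& zero, Label delta,
                              const std::array<Label, 256>& payload) {
    assert(delta & 1u);  // lsb(delta) = 1, so the two labels of a wire differ in lsb
    std::vector<Label> table(256);
    for (unsigned v = 0; v < 256; ++v) {
        Labels a = activeLabels(zero, delta, static_cast<std::uint8_t>(v));
        table[rowOf(a)] = toyHash(a) ^ payload[v];
    }
    return table;
}

// Evaluator (online): one hash call and one XOR.
Label evalSbox(const std::vector<Label>& table, const Labels& a) {
    return table[rowOf(a)] ^ toyHash(a);
}
```

Because the lsb of the offset is 1, the 256 input bytes map to 256 distinct rows, so the evaluator can locate its row from the labels alone and decrypts it with a single hash invocation.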
We also compare our garbling scheme with schemes in previous works (the data are derived from Table 1 in [22]). The results (Table 6) suggest that our garbling scheme has minimal online time due to fewer hash function calls. However, the overall time increases, which we believe is mainly because we do not use any hardware description language to instantiate the AES circuit. Furthermore, since it is impossible to reproduce the scheme in [22], and the implementation platform is also different, this comparison can only be used as a general reference to show that the runtime performance of the proposed scheme is comparable with the state of the art.

Conclusions
In conclusion, taking into account the special structure of the S-box and the AES circuit, this paper proposes a garbling scheme for S-boxes in the final SubBytes layer, which further reduces the ciphertext size of secure AES computation. Compared to the best result in previous works, which requires 2048 ciphertexts and 4 hash calls (minimal online time) or 116 (174) ciphertexts and 116 (58) hash calls (minimal total time), our garbling scheme only requires 256 ciphertexts and 1 hash call. In addition, if the optimized S-box structure is used instead of the original AES S-box to enhance security, it would increase the number of ciphertexts required by the minimum total time scheme, unlike our scheme.
In our implementation, we introduce the circuit layer model to reuse circuit units in the AES circuit, where each algorithm is implemented as a circuit layer, and only one copy is stored in memory. Finally, we demonstrate the construction of each circuit layer and experimentally evaluate the performance. The experimental data show that our garbling scheme achieves better online performance compared to schemes in previous works. However, the non-optimal overall time may be due to the fact that we did not use any hardware description language to implement the AES circuit.
Figure A1 shows the corresponding fbs for each possible input of the AES S-box. The red text indicates that the input and fbs are injective, and the number characterizes the security of our garbling scheme.

Figure 2. How to produce wire labels in the free XOR scheme.

Figure 3. Structure for reusing circuit layers in the AES circuit.

Table 1. Performance comparison between our scheme and existing schemes (single S-box).

Table 3. Time performance of different kinds of PSI protocols. The experimental data were obtained from [14], with all other parameters kept constant.
1 ⊕ b∆||...||t 7 ⊕ h∆) is indistinguishable from a random σ-bit string for any randomly chosen values of {t 0 , t 1 ...t 7 }, {a, b, c...h}.Thus, the rest of the garblings F in View π E (x, y) are indistinguishable from the corresponding garblings F S in View π S (x, y).Finally, we can conclude that View π S (x, y) is indistinguishable from View π E (x, y).

Table 5. The runtime performance of our garbling scheme for AES circuits.

Table 6. Comparison between our garbling scheme and schemes in previous works.