# Round-Efficient Secure Inference Based on Masked Secret Sharing for Quantized Neural Network


## Abstract


## 1. Introduction

#### 1.1. Related Work on Secure Inference

#### 1.2. Our Contributions

- We provide a series of secure protocols with constant-round communication complexity for QNN inference, including secure truncation, conversion, and clamping protocols. We achieve this by constructing the protocols based on MSS.
- We give detailed proofs of security in the semi-honest model. Concretely, our protocols are secure against a single corruption.
- Our experiments show that our protocols are practical and well suited to high-latency networks. Compared to previous work on quantized inference, our protocols are 1.5 times faster in the WAN setting.

## 2. Preliminaries

#### 2.1. Basic Notations

#### 2.2. Threat Model and Security

**Definition 1**.

#### 2.3. Secret Sharing Semantics

- $\langle \cdot \rangle $-sharing: ASS among ${P}_{1}$ and ${P}_{2}$. The dealer samples random elements ${x}_{1},{x}_{2}{\in}_{R}{\mathbb{Z}}_{{2}^{\ell}}$ as the shares of $x$, such that $x={x}_{1}+{x}_{2}\bmod {2}^{\ell}$ holds. The dealer distributes the shares to each party such that ${P}_{i}$ for $i\in \{1,2\}$ holds ${x}_{i}$. For simplicity, we denote ${\langle x\rangle}_{i}$ as the additive share of ${P}_{i}$, and $\langle x\rangle :=({x}_{1},{x}_{2})$.
- $\llbracket \cdot \rrbracket $-sharing: MSS among all parties. The dealer samples a random element ${\lambda}_{x}{\in}_{R}{\mathbb{Z}}_{{2}^{\ell}}$, computes ${m}_{x}=x+{\lambda}_{x}\bmod {2}^{\ell}$, and then shares ${\lambda}_{x}={\langle {\lambda}_{x}\rangle}_{1}+{\langle {\lambda}_{x}\rangle}_{2}$ among ${P}_{1}$ and ${P}_{2}$ by $\langle \cdot \rangle $-sharing. The dealer distributes the shares to each party, such that ${P}_{0}$ holds $({\langle {\lambda}_{x}\rangle}_{1},{\langle {\lambda}_{x}\rangle}_{2})$, ${P}_{1}$ holds $({m}_{x},{\langle {\lambda}_{x}\rangle}_{1})$, and ${P}_{2}$ holds $({m}_{x},{\langle {\lambda}_{x}\rangle}_{2})$. For simplicity, we denote $\llbracket x{\rrbracket}_{i}$ as the masked shares of ${P}_{i}$, and $\llbracket x\rrbracket :=({m}_{x},{\langle {\lambda}_{x}\rangle}_{1},{\langle {\lambda}_{x}\rangle}_{2})$.
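The two sharing semantics above can be summarized in a short sketch. This is an illustrative single-process simulation (a real deployment separates the dealer and parties), with `ELL` an assumed ring size:

```python
import secrets

ELL = 64          # logarithm of the ring size, so we work in Z_{2^ELL}
MOD = 1 << ELL

def ass_share(x):
    """<x>-sharing: additive shares x1, x2 with x = x1 + x2 mod 2^ELL."""
    x1 = secrets.randbelow(MOD)
    x2 = (x - x1) % MOD
    return x1, x2

def mss_share(x):
    """[[x]]-sharing: mask m_x = x + lambda_x, with lambda_x ASS-shared."""
    lam = secrets.randbelow(MOD)
    m_x = (x + lam) % MOD
    lam1, lam2 = ass_share(lam)
    # P0 holds (lam1, lam2); P1 holds (m_x, lam1); P2 holds (m_x, lam2).
    return m_x, lam1, lam2

def mss_open(m_x, lam1, lam2):
    """Reconstruct x = m_x - lambda_x mod 2^ELL."""
    return (m_x - lam1 - lam2) % MOD

x = 123456789
assert mss_open(*mss_share(x)) == x
```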

- For a linear combination $z=cx\pm dy\pm e$ with public constants $c,d,e$, the parties locally compute its shares as $\llbracket z\rrbracket =({m}_{z},{\langle {\lambda}_{z}\rangle}_{1},{\langle {\lambda}_{z}\rangle}_{2})=(c\cdot {m}_{x}\pm d\cdot {m}_{y}\pm e,\ c\cdot {\langle {\lambda}_{x}\rangle}_{1}\pm d\cdot {\langle {\lambda}_{y}\rangle}_{1},\ c\cdot {\langle {\lambda}_{x}\rangle}_{2}\pm d\cdot {\langle {\lambda}_{y}\rangle}_{2})$.
- For multiplication $z=xy$, which we denote as functionality ${\mathcal{F}}_{\mathsf{Mul}}$, the protocol ${\Pi}_{\mathsf{Mul}}$ proceeds as follows [19]:
  - ${P}_{0}$ and ${P}_{1}$ locally sample random ${\langle {\lambda}_{z}\rangle}_{1}$ and ${\langle {\gamma}_{xy}\rangle}_{1}$ using ${\mathcal{F}}_{\mathsf{Rand}}$;
  - ${P}_{0}$ and ${P}_{2}$ locally sample random ${\langle {\lambda}_{z}\rangle}_{2}$ using ${\mathcal{F}}_{\mathsf{Rand}}$;
  - ${P}_{0}$ locally computes ${\gamma}_{xy}={\lambda}_{x}{\lambda}_{y}$ and sends ${\langle {\gamma}_{xy}\rangle}_{2}={\gamma}_{xy}-{\langle {\gamma}_{xy}\rangle}_{1}$ to ${P}_{2}$;
  - ${P}_{i}$ for $i\in \{1,2\}$ locally computes ${\langle {m}_{z}\rangle}_{i}=(i-1){m}_{x}{m}_{y}-{m}_{x}{\langle {\lambda}_{y}\rangle}_{i}-{m}_{y}{\langle {\lambda}_{x}\rangle}_{i}+{\langle {\lambda}_{z}\rangle}_{i}+{\langle {\gamma}_{xy}\rangle}_{i}$;
  - ${P}_{i}$ for $i\in \{1,2\}$ sends ${\langle {m}_{z}\rangle}_{i}$ to ${P}_{3-i}$, so that both can locally compute ${m}_{z}={\langle {m}_{z}\rangle}_{1}+{\langle {m}_{z}\rangle}_{2}$.
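The steps of ${\Pi}_{\mathsf{Mul}}$ can be checked end to end in a single-process simulation (all three parties' computations collapsed into one function; `mul` is our illustrative name, not notation from the paper):

```python
import secrets

ELL = 64
MOD = 1 << ELL
rand = lambda: secrets.randbelow(MOD)

def mss_share(x):
    """[[x]] = (m_x, <lam_x>_1, <lam_x>_2) with m_x = x + lam_x mod 2^ELL."""
    lam = rand()
    lam1 = rand()
    return (x + lam) % MOD, lam1, (lam - lam1) % MOD

def mss_open(m, l1, l2):
    return (m - l1 - l2) % MOD

def mul(sx, sy):
    """Simulate Pi_Mul on [[x]] and [[y]], returning [[z]] with z = xy."""
    m_x, lx1, lx2 = sx
    m_y, ly1, ly2 = sy
    # Offline: P0/P1 sample <lam_z>_1 and <gamma_xy>_1; P0/P2 sample <lam_z>_2;
    # P0 computes gamma_xy = lam_x * lam_y and sends <gamma_xy>_2 to P2.
    lz1, lz2, g1 = rand(), rand(), rand()
    gamma = ((lx1 + lx2) * (ly1 + ly2)) % MOD
    g2 = (gamma - g1) % MOD
    # Online: P_i computes <m_z>_i = (i-1) m_x m_y - m_x <lam_y>_i
    #         - m_y <lam_x>_i + <lam_z>_i + <gamma_xy>_i, then they exchange.
    mz1 = (-m_x * ly1 - m_y * lx1 + lz1 + g1) % MOD             # i = 1
    mz2 = (m_x * m_y - m_x * ly2 - m_y * lx2 + lz2 + g2) % MOD  # i = 2
    return (mz1 + mz2) % MOD, lz1, lz2

x, y = 31337, 271828
assert mss_open(*mul(mss_share(x), mss_share(y))) == (x * y) % MOD
```

The $(i-1){m}_{x}{m}_{y}$ term ensures the public product is counted exactly once across the two online shares.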

#### 2.4. Neural Network

- The fully connected layer can be formulated as $\mathit{y}=\mathit{W}\mathit{x}+\mathit{b}$, where $\mathit{y}$ is the output of the fully connected layer, $\mathit{x}$ is the input vector, $\mathit{W}$ is the weight matrix and $\mathit{b}$ is the bias vector.
- The convolution layer can be converted into a matrix–vector product followed by one addition, as shown in [20]; thus, it can also be formulated as $\mathit{Y}=\mathit{W}\mathit{X}+\mathit{B}$.
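As a plain (unshared) reference for both layer types, the computation reduces to a matrix–vector product plus a bias; under MSS, every multiplication inside the dot products would instead invoke ${\Pi}_{\mathsf{Mul}}$. A minimal sketch:

```python
def fc_layer(W, x, b):
    """Fully connected layer y = W x + b on plain integer vectors."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

# Small worked example: 2x2 weight matrix, length-2 input.
W = [[1, 2], [3, 4]]
x = [5, 6]
b = [1, 1]
assert fc_layer(W, x, b) == [18, 40]
```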

#### 2.5. Quantization
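The affine quantization scheme (visualized in Figure 1) maps a real value to an 8-bit integer via a scale $S$ and zero point $Z$. The sketch below is consistent with the range end points ${\alpha}_{\mathsf{min}}=-S\cdot Z$ and ${\alpha}_{\mathsf{max}}=S\cdot ({2}^{8}-1-Z)$ in the figure caption; the specific rounding rule is a common choice and our assumption, not a claim about the paper's exact equation:

```python
def quantize(x, S, Z):
    """Map a real x to q in [0, 255] such that x ~= S * (q - Z)."""
    q = round(x / S) + Z
    return min(max(q, 0), 255)   # clamp into the INT8 range

def dequantize(q, S, Z):
    return S * (q - Z)

S, Z = 0.05, 10
# The representable range matches the caption's end points:
assert dequantize(0, S, Z) == -S * Z            # alpha_min
assert dequantize(255, S, Z) == S * (255 - Z)   # alpha_max
```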

## 3. The Architecture for Secure Inference

**Server**: There are three non-colluding servers in our system, denoted as ${P}_{0},{P}_{1},{P}_{2}$. The three servers can belong to different companies in the real world, such as Amazon, Alibaba, and Google; any collusion would damage their reputations. Similar to prior works, we assume that all servers know the layer types, the size of each layer, and the number of layers. All servers run the secure protocols proposed in Section 4 to execute inference tasks over the users' shared queries.

**User**: The user holds some queries as input and wants a secure inference service that reveals neither the queries nor the inference results to anyone else. To this end, the user first uses Equation (3) to convert each query to 8-bit integers, then uses $\llbracket \cdot \rrbracket $-sharing to split the quantized queries into masked shares before uploading them to the three servers, and finally receives the shares of the inference results from the servers. Note that only the user can reconstruct the final results; the privacy of both the queries and the inference results is protected throughout the secure inference.

**Model Owner**: The model owner holds a trained QNN model, which includes the quantized weights of all layers along with the quantization parameters. As important intellectual property belonging to the model owner, the privacy of the QNN model must be protected. To this end, the model owner uses $\llbracket \cdot \rrbracket $-sharing to split the quantized weights into masked shares before deploying them to the three servers. Once deployment is done, the model owner can go offline until the model needs to be updated.

## 4. Protocols Construction

#### 4.1. Secure Input Sharing Protocol

- $({P}_{i},{P}_{j})=({P}_{0},{P}_{k})$ for $k\in \{1,2\}$: The parties locally set ${m}_{x}={\langle {\lambda}_{x}\rangle}_{3-k}=0,{\langle {\lambda}_{x}\rangle}_{k}=-x$.
- $({P}_{i},{P}_{j})=({P}_{1},{P}_{2})$: The parties locally set ${m}_{x}=x,{\langle {\lambda}_{x}\rangle}_{1}={\langle {\lambda}_{x}\rangle}_{2}=0$.
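Both local-sharing cases above can be checked against the reconstruction rule $x={m}_{x}-{\lambda}_{x}$. A quick sanity sketch, with the party logic collapsed into one process:

```python
MOD = 1 << 64   # assumed ring Z_{2^64}

def open_mss(m_x, lam1, lam2):
    """Reconstruct x = m_x - (lam1 + lam2) mod 2^64."""
    return (m_x - lam1 - lam2) % MOD

x = 42
# Case (P0, Pk), k = 1: m_x = <lam_x>_2 = 0, <lam_x>_1 = -x.
assert open_mss(0, (-x) % MOD, 0) == x
# Case (P0, Pk), k = 2: m_x = <lam_x>_1 = 0, <lam_x>_2 = -x.
assert open_mss(0, 0, (-x) % MOD) == x
# Case (P1, P2): m_x = x, <lam_x>_1 = <lam_x>_2 = 0.
assert open_mss(x, 0, 0) == x
```

In each case the resulting $\llbracket x\rrbracket $ is a valid sharing, obtained without any communication.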

#### 4.2. Secure Truncation Protocol

#### 4.3. Secure Conversion Protocol

#### 4.4. Secure Comparison Protocol

#### 4.5. Secure Clamping Protocol

#### 4.6. Theoretical Complexity

#### 4.7. Security Analyses

**Theorem 1**.

**Proof.**

**Theorem 2**.

**Proof.**

**Theorem 3**.

**Proof.**

**Theorem 4**.

**Proof.**

**Theorem 5**.

**Proof.**

**Theorem 6**.

**Proof.**

## 5. Quantized Neural Network Structure

## 6. Experimental Evaluation

#### 6.1. Experimental Setup

We used the Linux traffic control tool `tc` to simulate the LAN and WAN settings. Specifically, we considered a LAN setting with 625 Mbps bandwidth and 0.2 ms ping time, and a WAN setting with 80 Mbps bandwidth and 20 ms ping time. These parameters are close to everyday network conditions, which suggests that our solution is practical.

- We suppose that the user's input is taken from the MNIST dataset [29], which contains 60,000 training images and 10,000 testing images of handwritten digits. Each image consists of $28\times 28$ greyscale pixels with values between 0 and 255. Note that all greyscale values are already stored as 8-bit integers, which eliminates the need for data-type conversion.
- We assume that the model owner has shared the quantized parameters of each layer among all servers; that is, the quantized parameters of every layer are pre-deployed as shares.

#### 6.2. Experimental Results for Secure Inference

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
MLaaS | Machine Learning as a Service |
QNN | Quantized Neural Network |
FHE | Fully Homomorphic Encryption |
MPC | Secure Multiparty Computation |
SS | Secret Sharing |
ASS | Additive Secret Sharing |
RSS | Replicated Secret Sharing |
MSS | Masked Secret Sharing |
GC | Garbled Circuit |
OT | Oblivious Transfer |
PRF | Pseudo-Random Function |
LAN | Local Area Network |
WAN | Wide Area Network |
FP32 | 32-bit Floating-Point |
INT8 | 8-bit Integer |
ReLU | Rectified Linear Unit |
MSB | Most Significant Bit |
PPA | Parallel Prefix Adder |

## Appendix A. Correlated Randomness

## References

- Ribeiro, M.; Grolinger, K.; Capretz, M.A. MLaaS: Machine Learning as a Service. In Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 9–11 December 2015; pp. 896–902.
- Riazi, M.S.; Weinert, C.; Tkachenko, O.; Songhori, E.M.; Schneider, T.; Koushanfar, F. Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security—ASIACCS ’18, Incheon, Republic of Korea, 4–8 June 2018; pp. 707–721.
- Huang, Z.; Lu, W.J.; Hong, C.; Ding, J. Cheetah: Lean and Fast Secure Two-Party Deep Neural Network Inference. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022.
- Wang, Y.; Luo, Y.; Liu, L.; Fu, S. pCOVID: A Privacy-Preserving COVID-19 Inference Framework. In Proceedings of the Algorithms and Architectures for Parallel Processing, Copenhagen, Denmark, 10–12 October 2022; pp. 21–42.
- European Union. General Data Protection Regulation (GDPR). 2016. Available online: https://gdpr-info.eu/ (accessed on 4 December 2022).
- Gilad-Bachrach, R.; Dowlin, N.; Laine, K.; Lauter, K.; Naehrig, M.; Wernsing, J. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 201–210.
- Mohassel, P.; Rindal, P. ABY3: A Mixed Protocol Framework for Machine Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 35–52.
- Wagh, S.; Tople, S.; Benhamouda, F.; Kushilevitz, E.; Mittal, P.; Rabin, T. Falcon: Honest-Majority Maliciously Secure Framework for Private Deep Learning. Proc. Priv. Enhancing Technol. **2021**, 2021, 188–208.
- Rouhani, B.D.; Riazi, M.S.; Koushanfar, F. Deepsecure: Scalable Provably-Secure Deep Learning. In Proceedings of the 55th Annual Design Automation Conference, San Francisco, CA, USA, 24–29 June 2018; pp. 1–6.
- Mohassel, P.; Zhang, Y. SecureML: A System for Scalable Privacy-Preserving Machine Learning. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 19–38.
- Riazi, M.S.; Samragh, M.; Chen, H.; Laine, K.; Lauter, K.E.; Koushanfar, F. XONN: XNOR-Based Oblivious Deep Neural Network Inference. In Proceedings of the 28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, 14–16 August 2019; pp. 1501–1518.
- Ibarrondo, A.; Chabanne, H.; Önen, M. Banners: Binarized Neural Networks with Replicated Secret Sharing. In Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security, Virtual, 22–25 June 2021; pp. 63–74.
- Zhu, W.; Wei, M.; Li, X.; Li, Q. SecureBiNN: 3-Party Secure Computation for Binarized Neural Network Inference. In Proceedings of the Computer Security—ESORICS 2022, Copenhagen, Denmark, 26–30 September 2022; pp. 275–294.
- Agrawal, N.; Shahin Shamsabadi, A.; Kusner, M.J.; Gascón, A. QUOTIENT: Two-Party Secure Neural Network Training and Prediction. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 1231–1247.
- Dalskov, A.; Escudero, D.; Keller, M. Secure Evaluation of Quantized Neural Networks. Proc. Priv. Enhancing Technol. **2020**, 2020, 355–375.
- Shen, L.; Dong, Y.; Fang, B.; Shi, J.; Wang, X.; Pan, S.; Shi, R. ABNN2: Secure Two-Party Arbitrary-Bitwidth Quantized Neural Network Predictions. In Proceedings of the 59th ACM/IEEE Design Automation Conference, San Francisco, CA, USA, 10–14 July 2022; pp. 361–366.
- Keller, M.; Sun, K. Secure Quantized Training for Deep Learning. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 10912–10938.
- Goldreich, O. The Foundations of Cryptography—Volume 2: Basic Applications; Cambridge University Press: Cambridge, UK, 2004.
- Chaudhari, H.; Choudhury, A.; Patra, A.; Suresh, A. ASTRA: High Throughput 3PC over Rings with Application to Secure Prediction. In Proceedings of the 2019 ACM SIGSAC Conference on Cloud Computing Security Workshop, London, UK, 11 November 2019; pp. 81–92.
- Wagh, S.; Gupta, D.; Chandran, N. SecureNN: 3-Party Secure Computation for Neural Network Training. Proc. Priv. Enhancing Technol. **2019**, 2019, 26–49.
- Guo, Y. A Survey on Methods and Theories of Quantized Neural Networks. arXiv **2018**, arXiv:1808.04752.
- Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2704–2713.
- Ádám Mann, Z.; Weinert, C.; Chabal, D.; Bos, J.W. Towards Practical Secure Neural Network Inference: The Journey So Far and the Road Ahead. Cryptology ePrint Archive, Paper 2022/1483. 2022. Available online: https://eprint.iacr.org/2022/1483 (accessed on 4 December 2022).
- Ohata, S.; Nuida, K. Communication-Efficient (Client-Aided) Secure Two-Party Protocols and Its Application. In Proceedings of the Financial Cryptography and Data Security, Kota Kinabalu, Malaysia, 10–14 February 2020; pp. 369–385.
- Patra, A.; Suresh, A. BLAZE: Blazing Fast Privacy-Preserving Machine Learning. In Proceedings of the 2020 Network and Distributed System Security Symposium, San Diego, CA, USA, 23–26 February 2020.
- Kolesnikov, V.; Schneider, T. Improved Garbled Circuit: Free XOR Gates and Applications. In Proceedings of the Automata, Languages and Programming, Reykjavik, Iceland, 7–11 July 2008; pp. 486–498.
- Zahur, S.; Rosulek, M.; Evans, D. Two Halves Make a Whole. In Proceedings of the Advances in Cryptology—EUROCRYPT 2015, Sofia, Bulgaria, 26–30 April 2015; pp. 220–250.
- Canetti, R. Universally Composable Security: A New Paradigm for Cryptographic Protocols. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, Washington, DC, USA, 14–17 October 2001; pp. 136–145.
- LeCun, Y.; Cortes, C.; Burges, C.J.C. The MNIST Dataset of Handwritten Digits. 2017. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 4 December 2022).
- Keller, M. MP-SPDZ: A Versatile Framework for Multi-Party Computation. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual, 9–13 November 2020; pp. 1575–1590.
- Katz, J.; Lindell, Y. Introduction to Modern Cryptography, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2020.

**Figure 1.** Visualization of the quantization function [15], where ${\alpha}_{\mathsf{min}}=-S\cdot Z$ and ${\alpha}_{\mathsf{max}}=S\cdot ({2}^{8}-1-Z)$.

**Figure 3.** Our QNN structure, where $\mathbb{Z}$ denotes the discrete interval ${[0,255]}_{\mathbb{Z}}$.

**Figure 4.** Performance comparison of our solution with SecureQ8 [15] for batch inference over WAN, where a query denotes one user input.

Notation | Description |
---|---|
$\stackrel{\mathrm{c}}{\equiv}$ | Computationally indistinguishable |
$\kappa $ | The computational security parameter |
${P}_{j}$ | The computing party, where $j\in \{0,1,2\}$ |
$\mathit{A}$ | A tensor or matrix |
$\mathit{a}$ | A vector |
$\ell $ | The logarithm of the ring size |
${\mathbb{Z}}_{{2}^{\ell}},{\mathbb{Z}}_{2}$ | The integer ring and the Boolean ring |
$[a,b]$ | The real interval |
${[a,b]}_{\mathbb{Z}}$ | The discrete interval $[a,b]\cap \mathbb{Z}$ |
$x{\in}_{R}D$ | Uniformly random sample $x$ from distribution $D$ |
$\left(a{\le}_{?}b\right)$ | Returns 1 if $a\le b$ holds, and 0 otherwise |
$\mathsf{Clamp}(x;a,b)$ | Set $x\leftarrow a$ if $x<a$, $x\leftarrow b$ if $x>b$, and $x\leftarrow x$ otherwise |
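The last two notations can be read as ordinary functions; a direct sketch of their plaintext semantics (the secure versions are the protocols of Section 4):

```python
def le(a, b):
    """(a <=? b): returns 1 if a <= b holds, and 0 otherwise."""
    return 1 if a <= b else 0

def clamp(x, a, b):
    """Clamp(x; a, b): project x onto the interval [a, b]."""
    if x < a:
        return a
    if x > b:
        return b
    return x

assert le(3, 5) == 1 and le(5, 3) == 0
assert clamp(-7, 0, 255) == 0
assert clamp(300, 0, 255) == 255
assert clamp(42, 0, 255) == 42
```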

Scheme | Notation | ${\mathit{P}}_{0}$ | ${\mathit{P}}_{1}$ | ${\mathit{P}}_{2}$ |
---|---|---|---|---|
ASS | $\langle x\rangle :=({x}_{1},{x}_{2})$ | — | ${x}_{1}$ | ${x}_{2}$ |
MSS | $\llbracket x\rrbracket :=({m}_{x},{\langle {\lambda}_{x}\rangle}_{1},{\langle {\lambda}_{x}\rangle}_{2})$ | $({\langle {\lambda}_{x}\rangle}_{1},{\langle {\lambda}_{x}\rangle}_{2})$ | $({m}_{x},{\langle {\lambda}_{x}\rangle}_{1})$ | $({m}_{x},{\langle {\lambda}_{x}\rangle}_{2})$ |

**Table 3.** The communication and round complexity of our protocols, where $\ell $ denotes the logarithm of the ring size and $\kappa $ denotes the security parameter. All communication is reported in bits.

Protocol | Offline Comm. | Offline Rounds | Online Comm. | Online Rounds |
---|---|---|---|---|
${\Pi}_{\mathsf{Mul}}$ | $\ell $ | 1 | $2\ell $ | 1 |
${\Pi}_{\mathsf{Share}}$ | 0 | 0 | $2\ell $ | 1 |
${\Pi}_{\mathsf{Trunc}}$ | $\ell $ | 1 | $2\ell $ | 1 |
${\Pi}_{\mathsf{Bit2A}}$ | $2\ell $ | 1 | $2\ell $ | 1 |
${\Pi}_{\mathsf{MSB}}$ | $5\kappa \ell $ | 1 | $\kappa \ell +2$ | 2 |
${\Pi}_{\mathsf{BitInj}}$ | — | — | $4\ell $ | 2 |
${\Pi}_{\mathsf{Clamp}}$ | — | — | $2\kappa \ell +12\ell +4$ | 8 |

**Table 4.** Performance comparison of our solution with other frameworks for classifying a single image from the MNIST dataset, where Top-5 accuracy means the truth label is among the first 5 outputs of the model. (*): Banners and SecureQ8 only reported online-phase costs. (**): SecureBiNN requires no offline phase.

Framework | Quantized | Secret Sharing | Top-5 Accuracy | LAN Offline (s) | LAN Online (s) | WAN Offline (s) | WAN Online (s) | Comm. Offline (MB) | Comm. Online (MB) |
---|---|---|---|---|---|---|---|---|---|
Chameleon [2] | FP32 | ASS | 99.0% | 1.254 | 0.991 | 4.028 | 2.851 | 7.798 | 5.102 |
Banners * [12] | Binary | RSS | 97.3% | — | 0.120 | — | — | — | 2.540 |
SecureBiNN ** [13] | Binary | RSS | 97.2% | — | 0.007 | — | 0.440 | — | 0.032 |
SecureQ8 * [15] | INT8 | RSS | 98.4% | — | 0.629 | — | 2.198 | — | 3.523 |
This work | INT8 | MSS | 98.4% | 1.018 | 0.701 | 3.279 | 1.465 | 5.982 | 3.853 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wei, W.; Tang, C.; Chen, Y.
Round-Efficient Secure Inference Based on Masked Secret Sharing for Quantized Neural Network. *Entropy* **2023**, *25*, 389.
https://doi.org/10.3390/e25020389
