# Effects of JPEG Compression on Vision Transformer Image Classification for Encryption-then-Compression Images


## Abstract


## 1. Introduction

## 2. Preparation

#### 2.1. Vision Transformer

#### 2.2. Previous Classification Method for EtC Images through Encrypted ViT Model

## 3. Evaluation of JPEG-Compression Effects on the Classification Results

#### 3.1. Overview

#### 3.2. Image Encryption

- Step i-1: Divide an input image into main blocks, and further divide each main block into sub blocks.
- Step i-2: Translocate sub blocks within each main block using ${K}_{1}$.
- Step i-3: Rotate and flip each sub block using ${K}_{2}$.
- Step i-4: Apply a negative–positive transformation to each sub block using ${K}_{3}$.
- Step i-5: Normalize all pixels.
- Step i-6: Shuffle the R, G, and B components in each sub block using ${K}_{4}$.
- Step i-7: Translocate main blocks using ${K}_{5}$.
- Step i-8: Integrate all of the sub and main blocks.
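The eight steps above can be sketched in NumPy. The function name, the representation of the secret keys ${K}_{1},\ldots,{K}_{5}$ as independent pseudo-random generators, and the choice to leave normalization (Step i-5) to the ViT input pipeline are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def encrypt_blocks(x, S_mb, S_sb, keys):
    """Hypothetical sketch of block-wise EtC encryption (Steps i-1 to i-8).

    x    : (H, W, 3) uint8 image; H and W are multiples of S_mb,
           and S_mb is a multiple of S_sb.
    keys : dict mapping "K1".."K5" to numpy Generator objects (assumed
           stand-ins for the secret keys).
    """
    H, W, _ = x.shape
    n_mb_h, n_mb_w = H // S_mb, W // S_mb
    n_sb = (S_mb // S_sb) ** 2

    # Step i-1: divide into main blocks, then into sub blocks.
    mb = (x.reshape(n_mb_h, S_mb, n_mb_w, S_mb, 3)
           .transpose(0, 2, 1, 3, 4)
           .reshape(-1, S_mb, S_mb, 3))                 # (N_mb, S_mb, S_mb, 3)
    sb = (mb.reshape(-1, S_mb // S_sb, S_sb, S_mb // S_sb, S_sb, 3)
            .transpose(0, 1, 3, 2, 4, 5)
            .reshape(-1, n_sb, S_sb, S_sb, 3))          # (N_mb, N_sb, S_sb, S_sb, 3)

    # Step i-2: translocate sub blocks within each main block (K1).
    sb = sb[:, keys["K1"].permutation(n_sb)]

    # Step i-3: rotate and flip each sub block (K2).
    for s in range(n_sb):
        k = keys["K2"].integers(4)                       # rotate by k * 90 degrees
        sb[:, s] = np.rot90(sb[:, s], k, axes=(1, 2)).copy()
        if keys["K2"].integers(2):
            sb[:, s] = sb[:, s, :, ::-1].copy()          # horizontal flip

    # Step i-4: negative-positive transformation (K3).
    for s in range(n_sb):
        if keys["K3"].integers(2):
            sb[:, s] = 255 - sb[:, s]

    # Step i-5: pixel normalization is assumed to happen in the ViT
    # input pipeline and is omitted here.

    # Step i-6: shuffle the R, G, and B components in each sub block (K4).
    for s in range(n_sb):
        sb[:, s] = sb[:, s][..., keys["K4"].permutation(3)]

    # Step i-7: translocate main blocks (K5).
    sb = sb[keys["K5"].permutation(sb.shape[0])]

    # Step i-8: integrate sub and main blocks back into an (H, W, 3) image.
    mb = (sb.reshape(-1, S_mb // S_sb, S_mb // S_sb, S_sb, S_sb, 3)
            .transpose(0, 1, 3, 2, 4, 5)
            .reshape(-1, S_mb, S_mb, 3))
    return (mb.reshape(n_mb_h, n_mb_w, S_mb, S_mb, 3)
              .transpose(0, 2, 1, 3, 4)
              .reshape(H, W, 3))
```

Because every step is a permutation, rotation/flip, channel shuffle, or pixel-value complement, the transform is invertible given the same keys, which is what allows the EtC image to remain JPEG-compressible block by block.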

- H and W: the height and width of an image.
- $\mathit{x}\in {\{0,1,\cdots ,255\}}^{H\times W\times 3}$: an input image.
- ${S}_{\mathrm{mb}}$ and ${S}_{\mathrm{sb}}$: the main-block and sub-block sizes.
- ${N}_{\mathrm{mb}}$: the number of main blocks.
- ${N}_{\mathrm{sb}}$: the number of sub blocks within each main block.
- ${\mathit{x}}_{\mathbf{mb}}\in {\{0,1,\cdots ,255\}}^{{N}_{\mathrm{mb}}\times {S}_{\mathrm{mb}}\times {S}_{\mathrm{mb}}\times 3}$: an image after main-block division, called a main-block image.
- ${\mathit{x}}_{\mathbf{sb}}\in {\{0,1,\cdots ,255\}}^{{N}_{\mathrm{mb}}\times {N}_{\mathrm{sb}}\times {S}_{\mathrm{sb}}\times {S}_{\mathrm{sb}}\times 3}$: an image after sub-block division, called a sub-block image.
- ${{\mathit{x}}^{\prime}}_{\mathbf{sb}(\mathbf{\gamma})}\in {\{0,1,\cdots ,255\}}^{{N}_{\mathrm{mb}}\times {N}_{\mathrm{sb}}\times {S}_{\mathrm{sb}}\times {S}_{\mathrm{sb}}\times 3}$: an image after the $\gamma $-th operation in sub-block encryption, where $\gamma \in \{1,2,3,4,5\}$.
- ${{\mathit{x}}^{\prime}}_{\mathbf{sb}}\in {\{0,1,\cdots ,255\}}^{{N}_{\mathrm{mb}}\times {N}_{\mathrm{sb}}\times {S}_{\mathrm{sb}}\times {S}_{\mathrm{sb}}\times 3}$: an image after main-block encryption.
- ${{\mathit{x}}^{\prime}}_{\mathbf{mb}}\in {\{0,1,\cdots ,255\}}^{{N}_{\mathrm{mb}}\times {S}_{\mathrm{mb}}\times {S}_{\mathrm{mb}}\times 3}$: an image after sub-block integration.
- ${\mathit{x}}^{\prime}\in {\{0,1,\cdots ,255\}}^{H\times W\times 3}$: an image after main-block integration, i.e., an EtC image.
- ${x}_{\mathrm{sb}}(m,s,h,w,c)$, ${x}_{\mathrm{sb}(\gamma )}^{\prime}(m,s,h,w,c)$, and ${x}_{\mathrm{sb}}^{\prime}(m,s,h,w,c)$: pixel values in ${\mathit{x}}_{\mathbf{sb}}$, ${{\mathit{x}}^{\prime}}_{\mathbf{sb}(\mathbf{\gamma})}$, and ${{\mathit{x}}^{\prime}}_{\mathbf{sb}}$, respectively, where:
  - $m\in \{1,2,\cdots ,{N}_{\mathrm{mb}}\}$: a main-block number.
  - $s\in \{1,2,\cdots ,{N}_{\mathrm{sb}}\}$: a sub-block number in the $m$-th main block.
  - $h\in \{1,2,\cdots ,{S}_{\mathrm{sb}}\}$: a position in the height direction in the $s$-th sub block.
  - $w\in \{1,2,\cdots ,{S}_{\mathrm{sb}}\}$: a position in the width direction in the $s$-th sub block.
  - $c\in \{1,2,3\}$: a color-channel number.
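The notation above can be checked with a small NumPy sketch of the two division steps. The concrete values $H=W=224$, ${S}_{\mathrm{mb}}=16$, and ${S}_{\mathrm{sb}}=8$ are assumed here purely for illustration, as is the row-major block ordering.

```python
import numpy as np

# Assumed illustrative sizes (not necessarily those used in the paper).
H = W = 224
S_mb, S_sb = 16, 8
N_mb = (H // S_mb) * (W // S_mb)      # number of main blocks
N_sb = (S_mb // S_sb) ** 2            # sub blocks per main block

x = np.zeros((H, W, 3), dtype=np.uint8)   # input image x

# Main-block division: x -> x_mb with shape (N_mb, S_mb, S_mb, 3).
x_mb = (x.reshape(H // S_mb, S_mb, W // S_mb, S_mb, 3)
         .transpose(0, 2, 1, 3, 4)
         .reshape(N_mb, S_mb, S_mb, 3))

# Sub-block division: x_mb -> x_sb with shape (N_mb, N_sb, S_sb, S_sb, 3).
x_sb = (x_mb.reshape(N_mb, S_mb // S_sb, S_sb, S_mb // S_sb, S_sb, 3)
            .transpose(0, 1, 3, 2, 4, 5)
            .reshape(N_mb, N_sb, S_sb, S_sb, 3))

print(x_mb.shape)  # (196, 16, 16, 3)
print(x_sb.shape)  # (196, 4, 8, 8, 3)
```

With these sizes there are ${N}_{\mathrm{mb}}=196$ main blocks and ${N}_{\mathrm{sb}}=4$ sub blocks per main block, matching the dimensions listed for ${\mathit{x}}_{\mathbf{mb}}$ and ${\mathit{x}}_{\mathbf{sb}}$.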

#### 3.2.1. Sub-Block Translocation

#### 3.2.2. Block Rotation and Block Flipping

#### 3.2.3. Negative–Positive Transformation

#### 3.2.4. Normalization

#### 3.2.5. Color Component Shuffling

#### 3.2.6. Main-Block Translocation

#### 3.3. Model Encryption

- Step m-1: Transform $\mathbf{E}$ to obtain ${\mathbf{E}}_{\mathbf{sb}}\in {\mathbb{R}}^{{N}_{\mathrm{sb}}\times {S}_{\mathrm{sb}}\times {S}_{\mathrm{sb}}\times 3\times D}$.
- Step m-2: Translocate indices in the first dimension of ${\mathbf{E}}_{\mathbf{sb}}$ using ${K}_{1}$.
- Step m-3: Translocate indices in the second and third dimensions of ${\mathbf{E}}_{\mathbf{sb}}$ using ${K}_{2}$.
- Step m-4: Flip or retain the signs of the elements in ${\mathbf{E}}_{\mathbf{sb}}$ using ${K}_{3}$.
- Step m-5: Translocate indices in the fourth dimension of ${\mathbf{E}}_{\mathbf{sb}}$ using ${K}_{4}$.
- Step m-6: Transform ${\mathbf{E}}_{\mathbf{sb}}$ into the original dimensions of $\mathbf{E}$ to derive ${\mathbf{E}}^{\prime}\in {\mathbb{R}}^{(3\cdot {S}_{\mathrm{mb}}\cdot {S}_{\mathrm{mb}})\times D}$.
- Step m-7: Translocate rows in ${\mathbf{E}}_{\mathbf{pos}}$ using ${K}_{5}$ to obtain ${\mathbf{E}}_{\mathrm{pos}}^{\prime}\in {\mathbb{R}}^{({N}_{\mathrm{mb}}+1)\times D}$.
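The seven model-encryption steps above can likewise be sketched in NumPy. The function name, the exact reshape layout in Step m-1, the use of one generator per key, and the choice to keep the class-token row of ${\mathbf{E}}_{\mathbf{pos}}$ fixed in Step m-7 are all assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def encrypt_embeddings(E, E_pos, N_sb, S_sb, D, keys):
    """Hypothetical sketch of Steps m-1 to m-7.

    E     : ((3 * S_mb * S_mb) x D) patch-embedding matrix,
            with S_mb ** 2 == N_sb * S_sb ** 2.
    E_pos : ((N_mb + 1) x D) position-embedding matrix
            (row 0 is assumed to be the class-token row).
    keys  : dict mapping "K1".."K5" to numpy Generator objects.
    """
    # Step m-1: reshape E to (N_sb, S_sb, S_sb, 3, D).
    # The axis ordering here is an assumed flattening convention.
    E_sb = E.reshape(N_sb, S_sb, S_sb, 3, D)

    # Step m-2: translocate indices in the first dimension (K1).
    E_sb = E_sb[keys["K1"].permutation(N_sb)]

    # Step m-3: translocate indices in the second and third dimensions (K2).
    E_sb = E_sb[:, keys["K2"].permutation(S_sb)]
    E_sb = E_sb[:, :, keys["K2"].permutation(S_sb)]

    # Step m-4: flip or retain the sign of each element (K3).
    E_sb = E_sb * keys["K3"].choice([-1.0, 1.0], size=E_sb.shape)

    # Step m-5: translocate indices in the fourth (color) dimension (K4).
    E_sb = E_sb[:, :, :, keys["K4"].permutation(3)]

    # Step m-6: reshape back to the original (3 * S_mb * S_mb, D) layout.
    E_enc = E_sb.reshape(-1, D)

    # Step m-7: translocate rows of E_pos (K5), keeping row 0 in place
    # (an assumption of this sketch).
    perm = 1 + keys["K5"].permutation(E_pos.shape[0] - 1)
    E_pos_enc = E_pos[np.concatenate(([0], perm))]
    return E_enc, E_pos_enc
```

Since every step is a permutation or a sign flip, the multiset of absolute element values in $\mathbf{E}$ is preserved, and the transform is invertible with the same keys so that the encrypted model matches images encrypted with ${K}_{1},\ldots,{K}_{5}$.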

#### 3.3.1. Index Translocation in the First Dimension

#### 3.3.2. Index Translocation in the Second and Third Dimensions

#### 3.3.3. Sign Flipping

#### 3.3.4. Index Translocation in the Fourth Dimension

#### 3.3.5. Row Translocation

#### 3.4. Evaluation Metrics

## 4. Experiments

#### 4.1. Experimental Setup

#### 4.2. Experimental Results

#### 4.3. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv **2016**, arXiv:1610.05492.
2. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017.
3. Nagamori, H.; Kiya, H. Combined Use of Federated Learning and Image Encryption for Privacy-Preserving Image Classification with Vision Transformer. arXiv **2023**, arXiv:2301.09255.
4. Lou, Q.; Feng, B.; Fox, G.C.; Jiang, L. Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020.
5. Boemer, F.; Lao, Y.; Cammarota, R.; Wierzynski, C. nGraph-HE: A graph compiler for deep learning on homomorphically encrypted data. In Proceedings of the 16th ACM International Conference on Computing Frontiers, New York, NY, USA, 30 April–2 May 2019; pp. 3–13.
6. Gilad-Bachrach, R.; Dowlin, N.; Laine, K.; Lauter, K.; Naehrig, M.; Wernsing, J. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on Machine Learning (PMLR), New York, NY, USA, 19–24 June 2016; pp. 201–210.
7. Kiya, H.; Aprilpyone, M.; Kinoshita, Y.; Imaizumi, S.; Shiota, S. An Overview of Compressible and Learnable Image Transformation with Secret Key and Its Applications. APSIPA Trans. Signal Inf. Process. **2022**, 11, e11.
8. Aprilpyone, M.; Kiya, H. Privacy-Preserving Image Classification Using an Isotropic Network. IEEE Multimed. **2022**, 29, 23–33.
9. Aprilpyone, M.; Kiya, H. Block-Wise Image Transformation With Secret Key for Adversarially Robust Defense. IEEE Trans. Inf. Forensics Secur. **2021**, 16, 2709–2723.
10. Madono, K.; Tanaka, M.; Onishi, M.; Ogawa, T. Block-wise Scrambled Image Recognition Using Adaptation Network. In Proceedings of the Workshop on Artificial Intelligence of Things (AAAI WS), New York, NY, USA, 7–8 February 2020.
11. Tanaka, M. Learnable image encryption. In Proceedings of the IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), Taichung, Taiwan, 19–21 May 2018.
12. Sirichotedumrong, W.; Maekawa, T.; Kinoshita, Y.; Kiya, H. Privacy-preserving deep neural networks with pixel-based image encryption considering data augmentation in the encrypted domain. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 674–678.
13. Yi, F.; Jeong, O.; Moon, I. Privacy-Preserving Image Classification With Deep Learning and Double Random Phase Encoding. IEEE Access **2021**, 9, 136126–136134.
14. Wang, W.; Vong, C.-M.; Yang, Y.; Wong, P.-K. Encrypted Image Classification Based on Multilayer Extreme Learning Machine. Multidimens. Syst. Signal Process. **2017**, 28, 851–865.
15. Huang, Y.; Song, Z.; Li, K.; Arora, S. InstaHide: Instance-Hiding Schemes for Private Distributed Learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), Vienna, Austria, 13–18 July 2020; pp. 4507–4518.
16. Kiya, H.; Iijima, R.; Aprilpyone, M.; Kinoshita, Y. Image and Model Transformation with Secret Key for Vision Transformer. IEICE Trans. Inf. Syst. **2023**, E106-D, 2–11.
17. Hamano, G.; Imaizumi, S.; Kiya, H. Image Classification Using Vision Transformer for EtC Images. In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Chiang Mai, Thailand, 7–10 November 2022; pp. 1503–1510.
18. Kurihara, K.; Kikuchi, M.; Imaizumi, S.; Shiota, S.; Kiya, H. An Encryption-then-Compression System for JPEG/Motion JPEG Standard. IEICE Trans. Fundam. **2015**, E98-A, 2238–2245.
19. Ito, H.; Kinoshita, Y.; Aprilpyone, M.; Kiya, H. Image to Perturbation: An Image Transformation Network for Generating Visually Protected Images for Privacy-Preserving Deep Neural Networks. IEEE Access **2021**, 9, 64629–64638.
20. Sirichotedumrong, W.; Kiya, H. A GAN-Based Image Transformation Scheme for Privacy-Preserving Deep Neural Networks. In Proceedings of the 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 18–21 January 2021; pp. 745–749.
21. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, Austria, 3–7 May 2021; pp. 1–21.
22. Trockman, A.; Kolter, J.Z. Patches are all you need? In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 25–29 April 2022; pp. 1–15.
23. Weinberger, M.J.; Seroussi, G.; Sapiro, G. The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS. IEEE Trans. Image Process. **2000**, 9, 1309–1324.
24. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 1–11.
25. Kiya, H.; Nagamori, T.; Imaizumi, S.; Shiota, S. Privacy-Preserving Semantic Segmentation Using Vision Transformer. J. Imaging **2022**, 8, 233.

**Figure 1.** Overview of ViT [21].

**Figure 2.** Block diagram of the previous method [17].

**Figure 3.** Classification flow of evaluation schemes. (**a**) Classification flow for JPEG-compressed EtC images using the encrypted model trained with plain training images (evaluation scheme A, hereafter). (**b**) Classification flow for JPEG-compressed EtC images using the encrypted model trained with JPEG training images (evaluation scheme B, hereafter).

**Figure 8.** Relationship between the divided image and the ViT parameter $\mathbf{E}$. Blue dots represent single pixels in the divided image, and green dots represent the corresponding single rows in $\mathbf{E}$.

**Figure 12.** Classification accuracy at each quality factor with and without downsampling (${S}_{\mathrm{sb}}=16$).

Average amount of image data [bpp]:

| ${\mathit{S}}_{\mathbf{sb}}$ | Transformation Type | JPEG ($\mathit{Q}=100$) | JPEG ($\mathit{Q}=95$) | JPEG ($\mathit{Q}=90$) | JPEG ($\mathit{Q}=85$) | JPEG ($\mathit{Q}=80$) | Linear Quantization | No Compression |
|---|---|---|---|---|---|---|---|---|
| 8 | Common | 4.19 | 2.08 | 1.47 | 1.20 | 1.04 | 3.00 | 24.00 |
| 8 | Independent | 5.50 | 2.80 | 2.01 | 1.64 | 1.42 | | |
| 16 | Common | 2.98 | 1.57 | 1.13 | 0.93 | 0.82 | | |
| 16 | Independent | 3.49 | 1.64 | 1.18 | 0.98 | 0.87 | | |
| No encryption | | 2.92 | 1.54 | 1.10 | 0.91 | 0.80 | | |

Classification accuracy [%] (change rate [%]):

| ${\mathit{S}}_{\mathbf{sb}}$ | Transformation Type | JPEG ($\mathit{Q}=100$) | JPEG ($\mathit{Q}=95$) | JPEG ($\mathit{Q}=90$) | JPEG ($\mathit{Q}=85$) | JPEG ($\mathit{Q}=80$) | Linear Quantization | No Compression |
|---|---|---|---|---|---|---|---|---|
| 8 | Common | 98.83 (0.20) | 98.83 (0.30) | 98.80 (0.46) | 98.75 (0.61) | 98.71 (0.60) | 33.29 (66.70) | 98.89 (0.00) |
| 8 | Independent | 98.45 (0.99) | 98.33 (1.17) | 98.24 (1.27) | 98.00 (1.45) | 97.67 (1.94) | | |
| 16 | Common | 98.87 (0.12) | 98.89 (0.18) | 98.89 (0.17) | 98.85 (0.19) | 98.86 (0.25) | | |
| 16 | Independent | 98.87 (0.10) | 98.86 (0.17) | 98.89 (0.46) | 98.74 (0.57) | 98.66 (0.66) | | |
| No encryption for images and model | | 98.89 (0.08) | 98.89 (0.10) | 98.81 (0.18) | 98.89 (0.25) | 98.90 (0.23) | | 98.89 (-) |

Classification accuracy [%]:

| ${\mathit{S}}_{\mathbf{sb}}$ | Transformation Type | JPEG Compression | Linear Quantization |
|---|---|---|---|
| 8 | Common | 98.84 | 88.20 |
| 8 | Independent | 97.94 | |
| 16 | Common | 98.96 | |
| 16 | Independent | 98.80 | |
| No encryption for images and model | | 98.97 | |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Hamano, G.; Imaizumi, S.; Kiya, H. Effects of JPEG Compression on Vision Transformer Image Classification for Encryption-then-Compression Images. *Sensors* **2023**, *23*, 3400. https://doi.org/10.3390/s23073400