# Reduced-Complexity End-to-End Variational Autoencoder for on Board Satellite Image Compression


## Abstract


## 1. Introduction

## 2. Background: Autoencoder Based Image Compression

#### 2.1. Analysis and Synthesis Transforms

#### 2.2. Bottleneck

#### 2.3. Parameter Learning and Entropy Model Estimation

#### 2.3.1. Loss Function: Rate Distortion Trade-Off

- The rate R achieved by an entropy coder is lower-bounded by the entropy derived from the actual discrete probability distribution $m\left(\widehat{\mathbf{y}}\right)$ of the quantized vector $\widehat{\mathbf{y}}$. The rate increase comes from the mismatch between the probability model ${p}_{\widehat{\mathbf{y}}}\left(\widehat{\mathbf{y}}\right)$ required for the coder design and $m\left(\widehat{\mathbf{y}}\right)$. The bit-rate is given by the Shannon cross entropy between the two distributions:$$H\left(\widehat{\mathbf{y}}\right)={\mathbb{E}}_{\widehat{\mathbf{y}}\sim m}\left[-\log_2 {p}_{\widehat{\mathbf{y}}}\left(\widehat{\mathbf{y}}\right)\right].$$
- The distortion measure D is chosen to account for image quality as perceived by a human observer. Due to its many desirable computational properties, the mean square error (MSE) is generally selected. However, a measure of perceptual distortion may also be employed, such as the multi-scale structural similarity index (MS-SSIM) [24].
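As an illustration of the rate bound above (our own toy sketch, not code from the paper), the cross entropy between an empirical symbol distribution $m$ and a mismatched model $p$ can be computed directly, and can only exceed the entropy of $m$:

```python
import numpy as np

# Toy illustration: the rate achievable by an entropy coder designed for a
# model p is the cross entropy between the empirical distribution m of the
# quantized symbols and p, in bits per symbol. Model mismatch adds rate.
rng = np.random.default_rng(0)

# Quantized latent samples drawn from a "true" integer distribution.
symbols = rng.integers(-2, 3, size=10_000)          # values in {-2, ..., 2}
values, counts = np.unique(symbols, return_counts=True)
m = counts / counts.sum()                           # empirical distribution m

# A (mismatched) probability model p assigned to the same support.
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

cross_entropy = -np.sum(m * np.log2(p))             # achievable rate with p
entropy = -np.sum(m * np.log2(m))                   # lower bound (model = m)
assert cross_entropy >= entropy                     # mismatch can only add rate
print(f"entropy {entropy:.3f} bits, cross entropy {cross_entropy:.3f} bits")
```

The gap `cross_entropy - entropy` is exactly the Kullback–Leibler divergence between $m$ and $p$, which is why a better-fitted entropy model directly lowers the bit-rate.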

#### 2.3.2. Entropy Model

- Fully factorized model: For simplicity, in [13,14], the approximated quantized representation is assumed independent and identically distributed within each channel, and the channels are assumed independent of each other, resulting in a fully factorized distribution:$${p}_{\tilde{\mathbf{y}}|\psi}\left(\tilde{\mathbf{y}}|\psi\right)=\prod_{i} {p}_{{\tilde{y}}_{i}|{\psi}^{\left(i\right)}}\left({\tilde{y}}_{i}\right),$$where the ${\psi}^{\left(i\right)}$ are the distribution model parameter vectors. During training, quantization is approximated by the addition of an i.i.d. uniform noise, $\tilde{\mathbf{y}}=\mathbf{y}+\Delta\mathbf{y}$, so that each element follows the continuous relaxation$${p}_{{\tilde{y}}_{i}|{\psi}^{\left(i\right)}}\left({\tilde{y}}_{i}\right)={p}_{{y}_{i}|{\psi}^{\left(i\right)}}\left({y}_{i}\right)*\mathcal{U}\left(-1/2,1/2\right).$$For generality, in [13], the distribution ${p}_{{y}_{i}|{\psi}^{\left(i\right)}}\left({y}_{i}\right)$ is assumed non-parametric, namely without a predefined shape. In [13,14], the parameter vectors are learned from data during the training phase. This learning, performed once and for all, prohibits adaptivity to the input images during the operational phase. Moreover, the simplifying hypothesis of a fully factorized distribution is very strong and not satisfied in practice, since the elements of $\widehat{\mathbf{y}}$ exhibit strong spatial dependency, as observed in [16]. To overcome these limitations, and thus to obtain a more realistic and more adaptive entropy model, [16] proposed a hyperprior model, derived through a variational autoencoder, which takes into account possible spatial dependency within each input image.
- Hyperprior model: Auxiliary random variables $\tilde{\mathbf{z}}$, conditionally on which the elements of the quantized representation $\tilde{\mathbf{y}}$ are independent, are derived from $\mathbf{y}$ by an auxiliary autoencoder connected in parallel with the bottleneck (right column of Figure 1 (right)). The hyper-parameters of this hierarchical model are learned for each input image in the operational phase. Firstly, the hyperprior analysis transform ${H}_{a}$ produces the set of auxiliary random variables $\mathbf{z}$. Secondly, $\mathbf{z}$ is transformed by the hyperprior synthesis transform ${H}_{s}$ into a second set of random variables $\sigma$. In [16], the distribution of $\mathbf{z}$ is assumed fully factorized, and each representation element ${\tilde{y}}_{i}$, given $\mathbf{z}$, is modeled by a zero-mean Gaussian distribution with its own standard deviation ${\sigma}_{i}$. Finally, taking into account the quantization process, the conditional distribution of each quantized representation element is given by:$${\tilde{y}}_{i}|\tilde{\mathbf{z}}\sim \mathcal{N}\left(0,{\sigma}_{i}^{2}\right)*\mathcal{U}\left(-\frac{1}{2},\frac{1}{2}\right).$$The rate computation must take into account the prior distribution of $\tilde{\mathbf{z}}$, which has to be transmitted to the decoder with the compressed data as side information.
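A minimal sketch of this conditional likelihood (assuming the notation above; not the authors' code): since each element follows a zero-mean Gaussian convolved with a unit-width uniform, the probability of a quantized value is the Gaussian mass of the unit-width bin centred on it, from which the ideal code length follows:

```python
import numpy as np
from math import erf, sqrt

# Sketch of the hyperprior likelihood: y_i given the hyperprior follows
# N(0, sigma_i^2) * U(-1/2, 1/2), so the probability of a quantized value
# y_hat is Phi((y_hat + 1/2)/sigma) - Phi((y_hat - 1/2)/sigma).
def std_normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bin_probability(y_hat, sigma):
    """Gaussian mass of the unit-width bin centred on y_hat."""
    return (std_normal_cdf((y_hat + 0.5) / sigma)
            - std_normal_cdf((y_hat - 0.5) / sigma))

# Three latent elements with the standard deviations predicted for them.
y_hat = np.array([0.0, 1.0, -3.0])
sigma = np.array([1.0, 0.5, 2.0])
probs = np.array([bin_probability(y, s) for y, s in zip(y_hat, sigma)])
rate_bits = float(-np.log2(probs).sum())   # ideal code length for these elements
print(probs, rate_bits)
```

Elements whose predicted $\sigma_i$ matches their actual spread receive short codes, which is how the side information pays for itself.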

## 3. Reduced-Complexity Variational Autoencoder

#### 3.1. Analysis and Synthesis Transforms

#### 3.1.1. Complexity Assessment

#### 3.1.2. Proposal: Simplified Analysis and Synthesis Transforms

#### 3.2. Reduced Complexity Entropy Model

#### 3.2.1. Statistical Analysis of the Learned Transform

#### 3.2.2. Proposal: Simplified Entropy Model

## 4. Performance Analysis

#### 4.1. Implementation Setup

`Ballé(2017)-non-parametric-N` refers to the autoencoder of [13] and is implemented with $N=192$ (respectively $N=256$) for rates below (respectively above) 2 $\mathrm{bits}/\mathrm{pixel}$. `Ballé(2018)-hyperprior-N-M` refers to the variational autoencoder of [16] and is implemented with $N=128$ and $M=192$ (respectively $N=192$ and $M=320$) for rates below (respectively above) 2 $\mathrm{bits}/\mathrm{pixel}$.

#### 4.2. Subjective Image Quality Assessment

`Ballé(2017)-non-parametric-N` (d), for a low compression rate ($1.15$ bits/pixel). The image obtained through learned compression appears closer to the original than the image obtained through the CCSDS standard.

#### 4.3. Impact of the Reduction in the Number of Filters

#### 4.3.1. At Low Rates

Starting from the reference `Ballé(2018)-hyperprior-N128-M192`, the number of filters N (for all layers, apart from the one just before the bottleneck) is reduced from $N=128$ to $N=64$, keeping $M=192$ for the layer just before the bottleneck. This reduction is applied jointly to the main autoencoder and to the hyperprior one. The proposed simplified architecture is termed `Ballé(2018)-s-hyperprior-N64-M192`. The complexity of this model is evaluated in terms of number of parameters and of floating point operations per pixel (FLOPp) in Table 1. Table 2 compares the global complexity of `Ballé(2018)-s-hyperprior-N64-M192` to that of the reference `Ballé(2018)-hyperprior-N128-M192`.

In the rate-distortion curves, the solid line stands for the reference `Ballé(2018)-hyperprior-N128-M192` and the dashed one for the proposal `Ballé(2018)-s-hyperprior-N64-M192`. The proposal `Ballé(2018)-s-hyperprior-N64-M192` achieves a rate-distortion performance close to that of `Ballé(2018)-hyperprior-N128-M192` [16], both in terms of MSE and MS-SSIM, for rates up to 2 bits/pixel. The decrease in performance is thus very small, keeping in mind the huge complexity reduction. Note that our proposal outperforms by far the CCSDS 122.0-B [6] and JPEG2000 [5] standards, as well as `Ballé(2017)-non-parametric-N192` [13].

#### 4.3.2. At High Rates

At high rates, `Ballé(2018)-s-hyperprior-N64-M320` is compared to the reference `Ballé(2018)-hyperprior-N192-M320`, but also to `Ballé(2017)-non-parametric-N256` and to the JPEG2000 [5] and CCSDS 122.0-B [6] standards. Table 3 compares the complexity of `Ballé(2018)-s-hyperprior-N64-M320` to that of the reference `Ballé(2018)-hyperprior-N192-M320`.

For both the reference (`Ballé(2018)-hyperprior-N192-M320`) and the simplified (`Ballé(2018)-s-hyperprior-N64-M320`) variational models, a training of 1M iterations seems insufficient for the highest rates. Indeed, due to the auxiliary autoencoder implementing the hyperprior, the training conceivably has to be longer, which can be a disadvantage in practice. This may be an additional argument for proposing a simplified entropy model.

#### 4.3.3. Summary

#### 4.4. Impact of the Bottleneck Size

The proposed architecture `Ballé(2018)-s-laplacian-N64-M` allows quantifying the impact of M on the performance, in terms of both MSE and MS-SSIM, for increasing values of the target rate. Figure 12 shows the rate-distortion averaged over the validation dataset. Consistently with the literature, high bit rates require a large global number of filters [16].

#### 4.5. Impact of the GDN/IGDN Replacement in the Main Autoencoder

The reference architecture `Ballé(2018)-hyperprior-N128-M192` of [16], involving GDN/IGDN non-linearities, is compared with the architecture obtained after a full ReLU replacement, except for the last layer of the decoder part. Indeed, this layer involves a sigmoid activation function that constrains the pixel values to the interval [0, 1] before quantization. Figure 13 shows the rate-distortion averaged over the validation dataset in terms of both MSE and MS-SSIM.
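To make the replacement concrete, here is a minimal NumPy sketch (parameter shapes are our assumption) contrasting a GDN non-linearity, $z_i = v_i / \sqrt{\beta_i + \sum_j \gamma_{ij} v_j^2}$, with the elementwise ReLU that replaces it:

```python
import numpy as np

# GDN divides each channel by a learned combination of the squared
# activations of all channels; ReLU is a simple elementwise max(0, x).
def gdn(v, beta, gamma):
    """v: (C, H, W); beta: (C,); gamma: (C, C).
    z_i = v_i / sqrt(beta_i + sum_j gamma_ij * v_j^2)."""
    denom = np.sqrt(beta[:, None, None] + np.tensordot(gamma, v ** 2, axes=1))
    return v / denom

def relu(v):
    return np.maximum(v, 0.0)

rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8, 8))       # 4 channels of an 8x8 feature map
beta = np.ones(4)                    # learned offsets (here: dummy values)
gamma = np.full((4, 4), 0.1)         # learned cross-channel weights

z = gdn(v, beta, gamma)
# GDN preserves the sign and normalizes the magnitude across channels,
# whereas ReLU discards the negative part entirely.
assert z.shape == relu(v).shape == v.shape
assert np.all(np.sign(z) == np.sign(v))
```

The per-layer cost of GDN is also visible in Table 1: for $C=64$ channels, the $\gamma$ matrix and $\beta$ vector account for $64\times 64 + 64 = 4160$ parameters, which a ReLU replacement removes entirely.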

#### 4.6. Impact of the Filter Kernel Support in the Main Autoencoder

The reference architecture `Ballé(2018)-hyperprior-N128-M192` of [16] is also compared with variants in which the $5\times 5$ filters composing the convolutional layers of the main autoencoder are replaced by $3\times 3$ and $7\times 7$ filters. It is worth mentioning that all the variant architectures considered in this part share the same entropy model, obtained through the same auxiliary autoencoder in terms of number of filters and kernel supports, since the objective here is not to assess the impact of the entropy model. According to Figure 13, a kernel support reduction from $5\times 5$ to $3\times 3$ leads to a performance decrease. This result is expected, since filters with a smaller kernel support have a reduced approximation capability. On the other hand, a kernel support increase from $5\times 5$ to $7\times 7$ does not lead to a significant performance improvement, which indicates that the approximation capability obtained with a $5\times 5$ kernel support is sufficient.
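The effect of the kernel support on complexity can be checked with the usual parameter count of a convolutional layer, $(n^2 N_{in} + 1)N_{out}$ (bias included); the values below reproduce entries of Table 1:

```python
# Back-of-the-envelope check of how the kernel support n x n drives the
# parameter count of one convolutional layer: (n^2 * N_in + 1) * N_out,
# the "+1" accounting for the bias of each output channel.
def conv_params(n, n_in, n_out):
    return (n * n * n_in + 1) * n_out

# 5x5, 64 -> 64 channels: the 102,464 parameters of conv2/conv3 in Table 1.
assert conv_params(5, 64, 64) == 102_464
# 3x3, 192 -> 64 channels: the 110,656 parameters of Hconv1 in Table 1.
assert conv_params(3, 192, 64) == 110_656
# Shrinking 5x5 to 3x3 cuts the count by roughly (3/5)^2 = 0.36 ...
assert conv_params(3, 64, 64) == 36_928
# ... while growing it to 7x7 roughly doubles it, with little quality gain.
assert conv_params(7, 64, 64) == 200_768
print(conv_params(3, 64, 64), conv_params(5, 64, 64), conv_params(7, 64, 64))
```

This quadratic dependence on $n$ is why the kernel-support trade-off matters on board: the $7\times 7$ variant nearly doubles both parameters and FLOPp for no measurable rate-distortion benefit.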

#### 4.7. Impact of the Entropy Model Simplification

#### 4.7.1. At Low Rates

`Ballé(2018)-laplacian-N128-M192` (with the simplified entropy model) and `Ballé(2018)-s-laplacian-N64-M192` (combining the reduction of the number of filters to $N=64$ and the simplified Laplacian entropy model) are compared with the non-variational reference method `Ballé(2017)-non-parametric-N192` [13], with the variational reference method `Ballé(2018)-hyperprior-N128-M192` [16], with its version after reduction of the number of filters `Ballé(2018)-s-hyperprior-N64-M192`, with the architecture denoted as `Ballé(2018)-non-parametric-N128-M192` (combining the main autoencoder in [16] and the non-parametric entropy model in [13]), and with its version after reduction of the number of filters `Ballé(2018)-s-non-parametric-N64-M192`. Table 4 shows that the coding part complexity of `Ballé(2018)-s-laplacian-N64-M192` is 13% lower than that of `Ballé(2018)-s-hyperprior-N64-M192`.

Figure 10 displays the rate-distortion averaged over the validation dataset. The proposed simplified entropy method (`Ballé(2018)-s-laplacian-N64-M192`) achieves an intermediate performance between the variational model (`Ballé(2018)-s-hyperprior-N64-M192`) and the non-variational model (`Ballé(2018)-s-non-parametric-N64-M192`). Obviously, due to the entropy model simplification, `Ballé(2018)-s-laplacian-N64-M192` underperforms the more general, and thus more complex, `Ballé(2018)-s-hyperprior-N64-M192` model. However, the proposed entropy model, even if simpler, preserves the adaptability to the input image, unlike the models `Ballé(2018)-non-parametric-N128-M192` and `Ballé(2017)-non-parametric-N192` [13]. Note that the simplified Laplacian entropy model performs close to the hyperprior model at relatively high rates. One possible explanation for this behaviour is the increased amount of side information required by the hyperprior model [16] at these rates [28].
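As a sketch of what such a simplified Laplacian entropy model could look like (our assumption: mean and scale fitted per feature map and unit-width quantization bins; this is not the authors' implementation), the rate of a quantized feature can be estimated from the Laplacian mass of each bin:

```python
import numpy as np

# Each feature map is modelled by a Laplacian with mean mu and scale b
# estimated from the feature itself; the probability of a quantized value
# is the Laplacian mass of the unit-width bin centred on it.
def laplace_cdf(x, mu, b):
    z = (x - mu) / b
    return np.where(z < 0, 0.5 * np.exp(z), 1.0 - 0.5 * np.exp(-z))

def feature_rate_bits(y_hat):
    """Ideal total code length (bits) of one quantized feature map
    under a Laplacian fitted by the method of moments."""
    mu = np.mean(y_hat)
    b = np.sqrt(np.var(y_hat) / 2.0)      # Var(zeta) = 2 b^2 for a Laplacian
    p = laplace_cdf(y_hat + 0.5, mu, b) - laplace_cdf(y_hat - 0.5, mu, b)
    return float(-np.log2(p).sum())

# Synthetic quantized feature map with Laplacian-like statistics.
rng = np.random.default_rng(0)
y_hat = np.round(rng.laplace(loc=0.0, scale=2.0, size=4096))
print(feature_rate_bits(y_hat) / y_hat.size, "bits per element")
```

Only two scalars per feature ($\mu$, $b$) need to be transmitted, instead of the full hyperprior latent $\tilde{\mathbf{z}}$, which is consistent with the reduced side information and the roughly 13% lower coding complexity reported in Table 4.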

#### 4.7.2. At High Rates

`Ballé(2018)-laplacian-N192-M320` (with the simplified entropy model) and `Ballé(2018)-s-laplacian-N64-M320` (combining the reduction of the number of filters to $N=64$ and the simplified Laplacian entropy model) are compared with the non-variational reference method `Ballé(2017)-non-parametric-N256` [13], with the variational reference method `Ballé(2018)-hyperprior-N192-M320` [16], with its version after reduction of the number of filters `Ballé(2018)-s-hyperprior-N64-M320`, with the architecture denoted as `Ballé(2018)-non-parametric-N192-M320` (combining the main autoencoder in [16] and the non-parametric entropy model in [13]), and with its version after reduction of the number of filters `Ballé(2018)-s-non-parametric-N64-M320`.

Figure 11 displays the rate-distortion averaged over the validation dataset for the trained models in terms of MSE. The proposed simplified entropy method `Ballé(2018)-s-laplacian-N64-M320` achieves an intermediate performance between the variational model (`Ballé(2018)-s-hyperprior-N64-M320`) and the non-variational model `Ballé(2018)-s-non-parametric-N64-M320`, similarly to the models targeting lower rates in Figure 10. Table 5 shows that the coding part complexity of `Ballé(2018)-s-laplacian-N64-M320` is around 16% lower than that of `Ballé(2018)-s-hyperprior-N64-M320`.

#### 4.7.3. Summary

#### 4.8. Discussion About Complexity

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
CCSDS | Consultative Committee for Space Data Systems |
CNN | Convolutional neural network |
DCT | Discrete cosine transform |
GDN | Generalized divisive normalization |
IGDN | Inverse generalized divisive normalization |
JPEG | Joint Photographic Experts Group |
MSE | Mean square error |
MS-SSIM | Multi-scale structural similarity index |
PCA | Principal component analysis |
ReLU | Rectified linear unit |

## Appendix A. Table of Symbols Used

Symbol | Meaning | Reference |
---|---|---|
$\mathbf{x}$ | original image | Section 2.1 |
${G}_{a}\left(\mathbf{x}\right)$ | analysis transform | Section 2.1 |
$\mathbf{y}$ | learned representation | Section 2.1 |
$\widehat{\mathbf{y}}$ | quantized learned representation | Section 2.1 |
${G}_{s}\left(\widehat{\mathbf{y}}\right)$ | synthesis transform | Section 2.1 |
$\widehat{\mathbf{x}}$ | reconstructed image | Section 2.1 |
GDN | generalized divisive normalization | Section 2.1 |
IGDN | inverse generalized divisive normalization | Section 2.1 |
N | number of filters composing the convolutional layers | Section 2.1 |
$n\times n$ | kernel support | Section 2.1 |
M | number of filters composing the last layer of ${G}_{a}$ | Section 2.1 |
$k,l$ | coordinate indexes of the output of the ith filter | Section 2.1 |
${v}_{i}(k,l)$ | value indexed by $(k,l)$ of the output of the ith filter | Section 2.1 |
$Q\left(\mathbf{y}\right)$ | quantizer | Section 2.2 |
J | rate-distortion loss function | Section 2.3.1 |
$R\left(\widehat{\mathbf{y}}\right)$ | rate | Section 2.3.1 |
$D(\mathbf{x},\widehat{\mathbf{x}})$ | distortion between the original image $\mathbf{x}$ and the reconstructed image $\widehat{\mathbf{x}}$ | Section 2.3.1 |
$\lambda$ | parameter that tunes the rate-distortion trade-off | Section 2.3.1 |
$m\left(\widehat{\mathbf{y}}\right)$ | actual discrete probability distribution | Section 2.3.1 |
${p}_{\widehat{\mathbf{y}}}\left(\widehat{\mathbf{y}}\right)$ | probability model assigned to the quantized representation | Section 2.3.1 |
$H\left(\widehat{\mathbf{y}}\right)$ | bit-rate given by the Shannon cross entropy | Section 2.3.1 |
${\psi}^{\left(i\right)}$ | distribution model parameter vector associated with each element | Section 2.3.2 |
$\Delta\mathbf{y}$ | i.i.d. uniform noise | Section 2.3.2 |
$\tilde{\mathbf{y}}$ | continuous approximation of the quantized learned representation | Section 2.3.2 |
$\mathbf{z}$ | set of auxiliary random variables | Section 2.3.2 |
${H}_{a}\left(\mathbf{y}\right)$ | hyperprior analysis transform | Section 2.3.2 |
${H}_{s}\left(\widehat{\mathbf{z}}\right)$ | hyperprior synthesis transform | Section 2.3.2 |
${\sigma}_{i}$ | standard deviation of a zero-mean Gaussian distribution | Section 2.3.2 |
${N}_{in}$ | number of features at the considered layer input | Section 3.1.1 |
${N}_{out}$ | number of features at the considered layer output | Section 3.1.1 |
${\mathrm{Param}}^{f}$ | number of parameters associated with the filtering part of the considered layer | Section 3.1.1 |
$\delta$ | term accounting for the bias | Section 3.1.1 |
D | downsampling factor | Section 3.1.1 |
${s}_{in}$ | channel input size | Section 3.1.1 |
${s}_{out}$ | downsampled input channel size | Section 3.1.1 |
${\mathrm{Operation}}^{f}$ | number of floating point operations of the filtering part | Section 3.1.1 |
${\mathrm{Param}}^{g}$ | number of parameters associated with each GDN/IGDN | Section 3.1.1 |
${\mathrm{Operation}}^{g}$ | number of floating point operations of each GDN/IGDN | Section 3.1.1 |
$\zeta$ | random variable that follows a Laplacian distribution | Section 3.2.1 |
$\mu$ | mean value of a Laplacian distribution | Section 3.2.1 |
b | scale parameter of a Laplacian distribution | Section 3.2.1 |
$Var\left(\zeta\right)$ | variance of a Laplacian distributed random variable | Section 3.2.1 |
${y}_{i}$ | feature map elements | Section 3.2.2 |
${I}_{j}$ | set of indexes covering the jth feature | Section 3.2.2 |

## References

1. Yu, G.; Vladimirova, T.; Sweeting, M.N. Image compression systems on board satellites. Acta Astronaut. 2009, 64, 988–1005.
2. Huang, B. Satellite Data Compression; Springer Science & Business Media: New York, NY, USA, 2011.
3. Qian, S.E. Optical Satellite Data Compression and Implementation; SPIE Press: Bellingham, WA, USA, 2013.
4. Goyal, V.K. Theoretical foundations of transform coding. IEEE Signal Process. Mag. 2001, 18, 9–21.
5. Taubman, D.; Marcellin, M. JPEG2000 Image Compression Fundamentals, Standards and Practice; Springer Publishing Company: New York, NY, USA, 2013.
6. Consultative Committee for Space Data Systems (CCSDS). Image Data Compression CCSDS 122.0-B-1, Blue Book; CCSDS: Washington, DC, USA, 2005.
7. Bengio, Y. Learning deep architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1–127.
8. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
9. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
10. Kaiser, P.; Wegner, J.D.; Lucchi, A.; Jaggi, M.; Hofmann, T.; Schindler, K. Learning aerial image segmentation from online maps. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6054–6068.
11. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155.
12. Wiatowski, T.; Bölcskei, H. A mathematical theory of deep convolutional neural networks for feature extraction. IEEE Trans. Inf. Theory 2017, 64, 1845–1866.
13. Ballé, J.; Laparra, V.; Simoncelli, E. End-to-end optimized image compression. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
14. Theis, L.; Shi, W.; Cunningham, A.; Huszár, F. Lossy image compression with compressive autoencoders. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
15. Rippel, O.; Bourdev, L. Real-time adaptive image compression. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 2922–2930.
16. Ballé, J.; Minnen, D.; Singh, S.; Hwang, S.J.; Johnston, N. Variational image compression with a scale hyperprior. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
17. Bellard, F. BPG Image Format. 2015. Available online: https://bellard.org/bpg (accessed on 15 December 2020).
18. Cheng, Z.; Sun, H.; Takeuchi, M.; Katto, J. Deep residual learning for image compression. In Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–20 June 2019.
19. Ballé, J. Efficient nonlinear transforms for lossy image compression. In Proceedings of the 2018 IEEE Picture Coding Symposium (PCS), San Francisco, CA, USA, 24–27 June 2018; pp. 248–252.
20. Lyu, S. Divisive normalization: Justification and effectiveness as efficient coding transform. Adv. Neural Inf. Process. Syst. 2010, 23, 1522–1530.
21. Rissanen, J.; Langdon, G. Universal modeling and coding. IEEE Trans. Inf. Theory 1981, 27, 12–23.
22. Martin, G. Range encoding: An algorithm for removing redundancy from a digitised message. In Proceedings of the Video and Data Recording Conference, Southampton, UK, 24–27 July 1979; pp. 24–27.
23. Van Leeuwen, J. On the construction of Huffman trees. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP), Edinburgh, UK, 20–23 July 1976; pp. 382–410.
24. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; pp. 1398–1402.
25. Dumas, T.; Roumy, A.; Guillemot, C. Autoencoder based image compression: Can the learning be quantization independent? In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 1188–1192.
26. Lam, E.Y.; Goodman, J.W. A mathematical analysis of the DCT coefficient distributions for images. IEEE Trans. Image Process. 2000, 9, 1661–1666.
27. Pratt, J.W.; Gibbons, J.D. Kolmogorov-Smirnov two-sample tests. In Concepts of Nonparametric Theory; Springer: New York, NY, USA, 1981; pp. 318–344.
28. Hu, Y.; Yang, W.; Ma, Z.; Liu, J. Learning end-to-end lossy image compression: A benchmark. arXiv 2020, arXiv:2002.03711.
29. Consultative Committee for Space Data Systems (CCSDS). Image Data Compression CCSDS 120.1-G-2, Green Book; CCSDS: Washington, DC, USA, 2015.

**Figure 2.** Simulated 12-bit Pléiades image of Cannes with size $512\times 512$ and resolution 70 $\mathrm{cm}$.

**Figure 3.** First feature of the Cannes image representation and its normalized histogram with Laplacian fitting.

**Figure 6.** Proposed architecture after entropy model simplification: main autoencoder (**left column**) and simplified auxiliary autoencoder (**right column**).

**Figure 12.** Impact of the bottleneck size in terms of MSE and MS-SSIM (dB) (derived as $-10\log_{10}(1-\text{MS-SSIM})$).

**Figure 13.** Impact of the GDN/IGDN replacement and of the filter kernel support on performance in terms of MSE and MS-SSIM (dB) (derived as $-10\log_{10}(1-\text{MS-SSIM})$).

**Table 1.** Complexity of the proposed `Ballé(2018)-s-hyperprior-N64-M192` architecture, per layer, in terms of number of parameters and floating point operations per pixel (FLOPp).

Layer | Filter Size | ${N}_{\mathbf{in}}$ | ${N}_{\mathbf{out}}$ | Output Size | Parameters | FLOPp |
---|---|---|---|---|---|---|
conv1 | $5\times 5$ | 1 | 64 | $256\times 256$ | 1664 | $4.16\times {10}^{2}$ |
GDN1 | | | | | 4160 | $1.04\times {10}^{3}$ |
conv2 | $5\times 5$ | 64 | 64 | $128\times 128$ | 102,464 | $6.40\times {10}^{3}$ |
GDN2 | | | | | 4160 | $2.60\times {10}^{2}$ |
conv3 | $5\times 5$ | 64 | 64 | $64\times 64$ | 102,464 | $1.60\times {10}^{3}$ |
GDN3 | | | | | 4160 | $0.65\times {10}^{2}$ |
conv4 | $5\times 5$ | 64 | 192 | $32\times 32$ | 307,392 | $1.2\times {10}^{3}$ |
Hconv1 | $3\times 3$ | 192 | 64 | $32\times 32$ | 110,656 | $4.32\times {10}^{2}$ |
Hconv2 | $5\times 5$ | 64 | 64 | $16\times 16$ | 102,464 | $1.00\times {10}^{2}$ |
Hconv3 | $5\times 5$ | 64 | 64 | $8\times 8$ | 102,464 | $0.25\times {10}^{2}$ |
HTconv1 | $5\times 5$ | 64 | 64 | $16\times 16$ | 102,464 | $1.00\times {10}^{2}$ |
HTconv2 | $5\times 5$ | 64 | 64 | $32\times 32$ | 102,464 | $4.00\times {10}^{2}$ |
HTconv3 | $3\times 3$ | 64 | 192 | $32\times 32$ | 110,784 | $4.32\times {10}^{2}$ |
Tconv1 | $5\times 5$ | 192 | 64 | $64\times 64$ | 307,264 | $4.80\times {10}^{3}$ |
IGDN1 | | | | | 4160 | $0.65\times {10}^{2}$ |
Tconv2 | $5\times 5$ | 64 | 64 | $128\times 128$ | 102,464 | $6.40\times {10}^{3}$ |
IGDN2 | | | | | 4160 | $2.60\times {10}^{2}$ |
Tconv3 | $5\times 5$ | 64 | 64 | $256\times 256$ | 102,464 | $2.56\times {10}^{4}$ |
IGDN3 | | | | | 4160 | $1.04\times {10}^{3}$ |
Tconv4 | $5\times 5$ | 64 | 1 | $512\times 512$ | 1601 | $1.60\times {10}^{3}$ |
Total | | | | | 1,683,969 | $5.2264\times {10}^{4}$ |

**Table 2.** Comparative complexity of the global architectures. Case of target rates up to 2 bits/pixel.

Method | Parameters | FLOPp | Relative |
---|---|---|---|
Ballé(2018)-hyperprior-N128-M192 | 5,055,105 | $1.9115\times {10}^{5}$ | 1.00 |
Ballé(2018)-s-hyperprior-N64-M192 | 1,683,969 | $5.2264\times {10}^{4}$ | 0.27 |
Ballé(2018)-s-laplacian-N64-M192 | 1,052,737 | $5.0774\times {10}^{4}$ | 0.265 |

**Table 3.** Comparative complexity of the global architectures. Case of target rates above 2 bits/pixel.

Method | Parameters | FLOPp | Relative |
---|---|---|---|
Ballé(2018)-hyperprior-N192-M320 | 11,785,217 | $4.3039\times {10}^{5}$ | 1.00 |
Ballé(2018)-s-hyperprior-N64-M320 | 1,683,969 | $5.6966\times {10}^{4}$ | 0.13 |
Ballé(2018)-s-laplacian-N64-M320 | 1,052,737 | $5.4774\times {10}^{4}$ | 0.1273 |

**Table 4.** Reduction of the encoder complexity induced by the simplified entropy model on the coding part. Case of rates up to 2 bits/pixel.

Method | Parameters | FLOPp | Relative |
---|---|---|---|
Ballé(2018)-s-hyperprior-N64-M192 | 1,157,696 | $1.25\times {10}^{4}$ | 1 |
Ballé(2018)-s-laplacian-N64-M192 | 526,464 | $1.09\times {10}^{4}$ | 0.87 |

**Table 5.** Reduction of the encoder complexity induced by the simplified entropy model on the coding part. Case of rates above 2 bits/pixel.

Method | Parameters | FLOPp | Relative |
---|---|---|---|
Ballé(2018)-s-hyperprior-N64-M320 | 1,715,008 | $1.3979\times {10}^{4}$ | 1 |
Ballé(2018)-s-laplacian-N64-M320 | 731,392 | $1.1787\times {10}^{4}$ | 0.8432 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Alves de Oliveira, V.; Chabert, M.; Oberlin, T.; Poulliat, C.; Bruno, M.; Latry, C.; Carlavan, M.; Henrot, S.; Falzon, F.; Camarero, R. Reduced-Complexity End-to-End Variational Autoencoder for on Board Satellite Image Compression. *Remote Sens.* **2021**, *13*, 447.
https://doi.org/10.3390/rs13030447
