Applied Sciences
  • Article
  • Open Access

18 August 2022

Serial Decoders-Based Auto-Encoders for Image Reconstruction

1 School of Information Engineering, Yangzhou University, Yangzhou 225000, China
2 Institut Supérieur d’Électronique de Paris, 92130 Issy-les-Moulineaux, France
3 Department of Electrical Engineering, Polystim Neurotechnology Laboratory, Polytechnique Montreal, Montreal, QC H3T 1J4, Canada
4 CenBRAIN Lab, School of Engineering, Westlake University, Hangzhou 310024, China
This article belongs to the Special Issue Deep Neural Network: Algorithms and Applications

Abstract

Auto-encoders are composed of coding and decoding units; hence, they hold an inherent potential for high-performance data compression and signal-compressed sensing. Current auto-encoders suffer from the following main disadvantages: the research objective is efficient feature representation rather than lossless data reconstruction; the evaluation of data recovery performance is neglected; and lossless data reconstruction is difficult to achieve with pure auto-encoders, or even with pure deep learning. This paper aims at image reconstruction using auto-encoders: it employs cascade decoders-based auto-encoders, improves the performance of image reconstruction, gradually approaches lossless image recovery, and provides a solid theoretical and applicational basis for auto-encoders-based image compression and compressed sensing. The proposed serial decoders-based auto-encoders include the architectures of multi-level decoders and the related progressive optimization sub-problems. The cascade decoders consist of general decoders, residual decoders, adversarial decoders, and their combinations. The effectiveness of residual cascade decoders for image reconstruction is proven mathematically. Progressive training can efficiently enhance the quality, stability, and variation of image reconstruction. The experimental results show that the proposed auto-encoders outperform classical auto-encoders in the performance of image reconstruction.

1. Introduction

Since deep learning extracts rules and features from input data using multi-layer stacked neural networks in a highly efficient manner, it has achieved unprecedented success in research and applications in the domains of data classification, recognition, compression, and processing [1,2]. Although the theoretical research and engineering applications of deep learning have matured, much room for improvement remains, and deep learning has not yet met the requirements of general artificial intelligence [1,2]. Hence, it falls to researchers to utilize deep learning to upgrade the performance of data compression and signal-compressed sensing.
Data reconstruction is the foundation of data compression and signal-compressed sensing, and in a broad sense it carries many meanings. In this paper, data reconstruction denotes high-dimensional original data being initially mapped into a low-dimensional space and then being recovered. Although the classical methods of data compression and signal-compressed sensing are mature, it is still necessary to investigate new algorithms based on deep learning. Currently, only some components of traditional data compression methods, such as prediction coding, transformation coding, and quantization coding, have been replaced by deep learning methods. The principal difficulty is that lossless data reconstruction by pure deep learning-based methods has not yet been attained. In consideration of the powerful capabilities of deep learning, this article explores new approaches to data reconstruction via pure deep learning-based methods.
Auto-encoders (AE) are a classical architecture of deep neural networks, which initially project high-dimensional data into a low-dimensional latent space according to a given rule, and then reconstruct the original data from latent space while minimizing reconstruction error [3,4,5,6]. Auto-encoders possess many theoretical models, including the following: sparse auto-encoders, convolutional auto-encoders, variational auto-encoders (VAE), adversarial auto-encoders (AAE), Wasserstein auto-encoders (WAE), graphical auto-encoders, extreme learning auto-encoders, integral learning auto-encoders, inverse function auto-encoders, recursive or recurrent auto-encoders, double or couple auto-encoders, de-noising auto-encoders, generative auto-encoders, fuzzy auto-encoders, non-negative auto-encoders, binary auto-encoders, quantum auto-encoders, linear auto-encoders, blind auto-encoders, group auto-encoders, kernel auto-encoders, etc. [3,4,5,6]. Some of the theoretical frameworks of traditional auto-encoders are collected in Table 1. Auto-encoders have garnered extensive research and applications in the domains of classification, recognition, encoding, sensing, and processing [3,4,5,6]. Since auto-encoders comprise encoding and decoding units, they hold the potential of being applied to high-performance data compression and signal-compressed sensing [7,8]. Classical auto-encoders shall be referred to as narrow auto-encoders. Other deep learning-based methods of data compression and signal-compressed sensing shall be referred to as generalized auto-encoders, because they contain encoding and decoding components, and each component can introduce an auto-encoder unit [9,10]. Narrow and generalized auto-encoders-based approaches of data compression and signal-compressed sensing can provide better performance in data reconstruction than the classical approaches [7,8,9,10].
Table 1. Some of the theoretical frameworks of traditional auto-encoders.
However, current research on auto-encoders exhibits the following problems: the research objective is efficient feature representation rather than lossless data reconstruction; independent evaluation of data reconstruction performance is neglected; data reconstruction performance needs improvement; and lossless data reconstruction is difficult to attain [17,18]. For instance, the data reconstruction performance of AAE, one of the most advanced auto-encoders, is shown in Figure 1 [11]. The horizontal axis is the dimension of the latent space and the vertical axis is the average structural similarity (SSIM) of reconstructed images in comparison with original images. Figure 1 indicates that the reconstruction performance of AAE increases as the dimension of the latent space increases, yet lossless data reconstruction remains difficult to achieve. Currently, pure deep learning-based methods of data compression and signal-compressed sensing cannot attain lossless data reconstruction.
Figure 1. Data reconstruction performance using AAE.
This manuscript takes lossless data reconstruction as a research goal of auto-encoders: it independently assesses the data reconstruction performance of auto-encoders, enhances the quality of their data reconstruction, gradually approaches lossless data reconstruction, and builds a solid theoretical and applicational foundation of data compression and signal-compressed sensing for auto-encoders.
This article proposes serial decoders-based auto-encoders for image reconstruction. The main contribution of this paper is to introduce cascade decoders into auto-encoders, including theoretical architectures and optimization problems. The optimization problems are divided into sequential sub-problems in order to progressively train the deep neural networks. Progressive training can efficiently improve the quality, stability, and variation of image reconstruction. The components of serial decoders consist of general decoders, residual decoders, adversarial decoders, and their combinations. The effectiveness of residual serial decoders for image reconstruction is proven mathematically. Since AAE, VAE, and WAE are state-of-the-art auto-encoders, this article focuses on their cascade decoders-based versions.
The rest of this article is organized as follows: the related research is summarized in Section 2, theoretical foundations are established in Section 3, simulation experiments are designed in Section 4, and final conclusions are drawn in Section 5.

3. Theory

3.1. Notations and Abbreviations

For the convenience of content description, parts of the mathematical notations and abbreviations adopted in this manuscript are listed in Table 2.
Table 2. Mathematical notations and abbreviations.

3.2. Recall of Classical Auto-Encoders

The architecture of classical auto-encoders is illustrated in Figure 2. Classical auto-encoders are composed of two units: encoder and decoder. The encoder reduces the high-dimensional input data to a low-dimensional representation, and the decoder reconstructs the high-dimensional data from the low-dimensional representation. The classical auto-encoder can be described by the following formulas:
Figure 2. The architecture of classical auto-encoders.
$$z = E(x), \quad y = D(z), \qquad x, y \in \mathbb{R}^{H};\ z \in \mathbb{R}^{L},\ L \ll H \tag{1}$$
where the terms are defined as follows:
  • x is the high-dimensional input data. Taking image data as an example, x is the normalized version of the original image for the convenience of numerical computation; each element of the original image is an integer in the range [0, 255], while each element of x is a real number in the range [0, 1] or [−1, +1]; elements of x in the range [0, 1] can be interpreted as probability variables; x can also be regarded as a vector obtained by reshaping an image matrix.
  • z is the low-dimensional representation in a latent space.
  • y is the high-dimensional data, such as a reconstruction image.
  • E is the encoder.
  • D is the decoder.
  • H is the dimension of x or y; for image data, H is equal to the product of image width and height.
  • L is the dimension of z and L is far less than H.
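For intuition, Equation (1) can be sketched in a few lines of Python/NumPy. The linear layers, tanh activations, random untrained weights, and the dimensions H = 784 and L = 32 below are all illustrative assumptions, not the paper's trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)
H, L = 784, 32  # H: data dimension (e.g., a 28 x 28 image), L: latent dimension, L << H

# Untrained, purely illustrative linear weights.
W_e = 0.01 * rng.standard_normal((L, H))
W_d = 0.01 * rng.standard_normal((H, L))

def E(x):
    """Encoder: map x in R^H to a latent code z in R^L."""
    return np.tanh(W_e @ x)

def D(z):
    """Decoder: map the latent code z back to y in R^H."""
    return np.tanh(W_d @ z)

x = rng.standard_normal(H)  # stands in for a normalized image vector
z = E(x)
y = D(z)
```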
The classical auto-encoders can be resolved by the following optimization problem:
$$\begin{aligned} (\theta, z, y) = \arg\min_{\theta, z, y} \|y - x\|_2^2 \quad \text{s.t.}\quad & z = E(x),\ y = D(z),\\ & C_z:\ \|z - z_g\|_2^2 < \delta_z,\\ & C_y:\ \|y - D_y s_y\|_2^2 + \lambda_y \|s_y\|_1 < \delta_y \end{aligned} \tag{2}$$
where the terms are defined as follows:
  • θ are the parameters of auto-encoders, including the parameters of the encoder and decoder.
  • Cz is the constraint on low-dimensional representation z; for example, z satisfies a given probability distribution; it has been considered to match a known distribution by classical adversarial auto-encoders, variational auto-encoders, and Wasserstein auto-encoders.
  • zg is a related variable which meets a given distribution.
  • δz is a small constant.
  • Cy is the constraint on the high-dimensional reconstruction data y; for instance, y satisfies a prior of local smoothness or non-local similarity. Auto-encoders largely rely on such priors for y to reconstruct x; other constraints, such as sparsity and low-rank properties of the high-dimensional reconstruction data, can also be utilized; here, the sparse prior is taken as an example.
  • Dy is a matrix of sparse dictionary.
  • sy is a vector of sparse coefficients.
  • λy is a small constant.
  • δy is a small constant.
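To make Equation (2) concrete, the following sketch evaluates its objective and the sparse-prior constraint term for randomly generated stand-ins; the dictionary D_y, the coefficient vector s_y, and all sizes below are hypothetical illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
H, K = 64, 16        # H: data dimension; K: number of dictionary atoms (illustrative)
lambda_y = 0.1       # sparsity weight (illustrative)

x = rng.standard_normal(H)
y = x + 0.01 * rng.standard_normal(H)   # a near-perfect reconstruction, for illustration
D_y = rng.standard_normal((H, K))       # sparse dictionary matrix
s_y = np.zeros(K)
s_y[:3] = 1.0                           # a sparse coefficient vector (3 active atoms)

objective = float(np.sum((y - x) ** 2))                   # ||y - x||_2^2
C_y = float(np.sum((y - D_y @ s_y) ** 2)                  # ||y - D_y s_y||_2^2
            + lambda_y * np.sum(np.abs(s_y)))             # + lambda_y ||s_y||_1
```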

3.3. Proposed Cascade of Decoders-Based Auto-Encoders

The framework of the proposed cascade decoders-based auto-encoders (CDAE) is exhibited in Figure 3. The framework consists of two components: encoder and cascade decoders. The encoder is similar to that in the classical auto-encoder. Cascade decoders comprise N serial decoders, from decoder 1 to N. The framework can be depicted by the following expressions:
Figure 3. The architecture of cascade decoders-based auto-encoders.
$$z = E(x), \quad y_n = \begin{cases} D_1(z), & n = 1 \\ D_n(y_{n-1}), & n = 2, \ldots, N \end{cases}, \quad y = y_N \tag{3}$$
where the terms are defined as follows:
  • Dn is the nth decoder;
  • yn is the reconstruction data of Dn.
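As a concrete sketch of the recursion in Equation (3), the following toy Python/NumPy code chains N = 3 untrained decoders; all layer shapes, activations, and weight scales are illustrative only, not the networks actually used in this paper:

```python
import numpy as np

rng = np.random.default_rng(2)
H, L, N = 64, 8, 3   # illustrative sizes; N serial decoders

W_e = 0.1 * rng.standard_normal((L, H))
W_d1 = 0.1 * rng.standard_normal((H, L))                           # decoder 1: latent -> data
W_dn = [0.05 * rng.standard_normal((H, H)) for _ in range(N - 1)]  # decoders 2..N: data -> data

def cascade_forward(x):
    """Equation (3): y_1 = D_1(z); y_n = D_n(y_{n-1}) for n = 2..N."""
    z = np.tanh(W_e @ x)
    ys = [np.tanh(W_d1 @ z)]
    for W in W_dn:
        ys.append(np.tanh(W @ ys[-1]))
    return ys

ys = cascade_forward(rng.standard_normal(H))
y = ys[-1]   # the final reconstruction y = y_N
```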
The reconstruction data can be solved by the following optimization problem:
$$\begin{aligned} (\theta; z; y_1, \ldots, y_N; y) = \arg\min_{\theta;\, z;\, y_1, \ldots, y_N} \sum_{n=1}^{N} \|y_n - x\|_2^2 \quad \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\\ & C_z;\ C_{y_n},\ n = 1, \ldots, N;\ y = y_N \end{aligned} \tag{4}$$
where the terms are defined as follows:
  • θ are the parameters of cascade decoders-based auto-encoders;
  • Cyn is the constraint on yn.
For the purpose of gradually and serially training cascade decoders-based auto-encoders, the optimization problem in Equation (4) can be divided into the following suboptimization problems:
$$\begin{aligned} (\theta_1; z; y_1) &= \arg\min_{\theta_1;\, z;\, y_1} \|y_1 - x\|_2^2 \\ (\theta_2; y_2) &= \arg\min_{\theta_2;\, y_2} \|y_2 - x\|_2^2 \\ &\ \,\vdots \\ (\theta_N; y_N; y) &= \arg\min_{\theta_N;\, y_N;\, y} \|y_N - x\|_2^2 \\ \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\\ & C_z;\ C_{y_n},\ n = 1, \ldots, N;\ y = y_N \end{aligned} \tag{5}$$
where the terms are defined as follows:
  • θ1 are the parameters of encoder and decoder 1;
  • θ2, …, and θN are the parameters of decoder 2 to N.
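A toy numerical sketch of this progressive scheme follows, using single-sample linear decoders trained by plain gradient descent; for brevity the encoder stays fixed here, whereas Equation (5) also trains it in stage 1, and all sizes, step counts, and learning-rate choices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
H, L, N = 16, 4, 3
x = rng.standard_normal(H)
x /= np.linalg.norm(x)             # unit-norm sample, for numerical stability

# theta_1 covers the encoder and decoder 1; theta_n (n >= 2) covers decoder n.
W_e = 0.1 * rng.standard_normal((L, H))
Ws = [0.1 * rng.standard_normal((H, L))] + \
     [np.eye(H) + 0.01 * rng.standard_normal((H, H)) for _ in range(N - 1)]

steps = 5                          # deliberately few steps, so each stage only refines
prev = W_e @ x                     # z = E(x), the input of stage 1
errs = []
for n in range(N):                 # stage n solves only the nth sub-problem
    for _ in range(steps):
        y_n = Ws[n] @ prev
        lr = 0.1 / max(float(prev @ prev), 1e-12)       # step size scaled for stability
        Ws[n] -= lr * 2.0 * np.outer(y_n - x, prev)     # gradient of ||y_n - x||_2^2
    prev = Ws[n] @ prev            # the frozen output of stage n feeds stage n + 1
    errs.append(float(np.linalg.norm(prev - x)))
```

Each stage is under-trained on purpose, so the recorded errors shrink stage by stage, which is the intended behavior of the progressive cascade.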
The proposed cascade decoders include general cascade decoders, residual cascade decoders, adversarial cascade decoders and their combinations. The general cascade decoders-based auto-encoders (GCDAE) have already been introduced in Figure 3. The other cascade decoders are elaborated in the following sections.

3.3.1. Residual Cascade Decoders-Based Auto-Encoders

The infrastructure of residual cascade decoders-based auto-encoders (RCDAE) is demonstrated in Figure 4. The blue signal flow is for the training phase, and the green signal flow is for both phases of training and testing. Each decoder is a residual module. This architecture is different from the traditional residual network (ResNet) because the former has an extra training channel for residual computation.
Figure 4. The architecture of residual cascade decoders-based auto-encoders.
The reconstruction data can be resolved by the following optimization problem:
$$\begin{aligned} (\theta; z; r_1, \ldots, r_N; y_1, \ldots, y_N; y) = \arg\min_{\theta;\, z;\, y_1, \ldots, y_N} \sum_{n=1}^{N} \|x - y_{n-1} - r_n\|_2^2 \quad \text{s.t.}\quad & z = E(x);\ r_1 = D_1(z);\ r_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\\ & C_z;\ C_{y_n},\ n = 1, \ldots, N;\\ & y_0 = 0;\ y_n = r_n + y_{n-1},\ n = 1, \ldots, N;\ y = y_N \end{aligned} \tag{6}$$
where the variables are defined as follows:
  • rn is the residual sample between x and yn;
  • y0 is the zero sample;
  • y is the final reconstruction sample.
For the purpose of gradually and serially training residual cascade decoders-based auto-encoders, the optimization problem in Equation (6) can be partitioned into the following suboptimization problems:
$$\begin{aligned} (\theta_1; z; r_1; y_1) &= \arg\min_{\theta_1;\, z;\, r_1;\, y_1} \|x - y_0 - r_1\|_2^2 \\ (\theta_2; r_2; y_2) &= \arg\min_{\theta_2;\, r_2;\, y_2} \|x - y_1 - r_2\|_2^2 \\ &\ \,\vdots \\ (\theta_N; r_N; y_N; y) &= \arg\min_{\theta_N;\, r_N;\, y_N;\, y} \|x - y_{N-1} - r_N\|_2^2 \\ \text{s.t.}\quad & z = E(x);\ r_1 = D_1(z);\ r_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\ C_z;\ C_{y_n},\ n = 1, \ldots, N;\\ & y_0 = 0;\ y_n = r_n + y_{n-1},\ n = 1, \ldots, N;\ y = y_N \end{aligned} \tag{7}$$
The effectiveness of residual cascade decoders for image reconstruction can be proven as follows:
$$\begin{aligned} & y_n = r_n + y_{n-1},\ n = 1, \ldots, N \;\Rightarrow\; x - y_n = (x - y_{n-1}) - r_n \\ & r_n \to (x - y_{n-1}) \;\Rightarrow\; r_n = \mu (x - y_{n-1}) + \varepsilon,\quad \mu \to 1,\ \varepsilon \to 0 \\ & x - y_n = (x - y_{n-1}) - \mu (x - y_{n-1}) - \varepsilon = (1 - \mu)(x - y_{n-1}) - \varepsilon \\ & \|x - y_n\|_2 = \|(1 - \mu)(x - y_{n-1}) - \varepsilon\|_2 \le \|(1 - \mu)(x - y_{n-1})\|_2 + \|\varepsilon\|_2 \\ & \|x - y_n\|_2 \xrightarrow{\varepsilon \to 0} |1 - \mu| \, \|x - y_{n-1}\|_2 \xrightarrow{\mu \to 1} \|x - y_n\|_2 < \|x - y_{n-1}\|_2 \\ & \Rightarrow\; \|x - y_N\|_2 < \|x - y_{N-1}\|_2 < \cdots < \|x - y_2\|_2 < \|x - y_1\|_2 \end{aligned} \tag{8}$$
where rn approaches (x − yn−1) in the training phase of Equation (6) and Figure 4; hence, rn is the sum of a scaled (x − yn−1) and a small error, with μ a scale coefficient approximately equal to 1 and ε an error vector approximately equal to 0. Consequently, the reconstruction error decreases as the total number of decoders increases.
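The contraction in Equation (8) can be checked numerically. Assuming, as in the proof, that each trained residual decoder outputs rn = μ(x − yn−1) + ε with μ close to 1 and a small ε (the value μ = 0.9 and the Gaussian ε below are arbitrary illustrative choices), the reconstruction error shrinks from stage to stage:

```python
import numpy as np

rng = np.random.default_rng(4)
H, N = 64, 4
mu = 0.9                            # scale coefficient, close to 1 (illustrative)
x = rng.standard_normal(H)

y = np.zeros(H)                     # y_0 = 0
errors = []
for n in range(N):
    eps = 0.01 * rng.standard_normal(H)   # small error vector
    r = mu * (x - y) + eps                # residual decoder output r_n
    y = y + r                             # y_n = y_{n-1} + r_n
    errors.append(float(np.linalg.norm(x - y)))
```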

3.3.2. Adversarial Cascade Decoders-Based Auto-Encoders

The architecture of adversarial cascade decoders-based auto-encoders (ACDAE) is displayed in Figure 5. The blue flow line represents the training phase, and the green flow represents both phases of training and testing. Each decoder is an adversarial module.
Figure 5. The architecture of adversarial cascade decoders-based auto-encoders.
The reconstruction data can be solved by the following optimization problem:
$$\begin{aligned} (\theta; z; y_1, \ldots, y_N; y) = \arg\min_{E;\, D_1, \ldots, D_N}\ \max_{DC_1, \ldots, DC_N} \sum_{n=1}^{N} \big( \alpha_n M(\ln(DC_n(x))) + \beta_n M(\ln(1 - DC_n(y_n))) \big) \quad \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\\ & C_z;\ \textstyle\sum_{n=1}^{N} \|y_n - x\|_2^2 < \varepsilon;\ y = y_N \end{aligned} \tag{9}$$
where the variables are described as follows:
  • DCn is the nth discriminator;
  • αn is a constant;
  • βn is a constant;
  • ε is a small positive constant;
  • M is the mean operator.
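The following toy sketch evaluates the nth term of the objective in Equation (9) for a single sample, with an untrained logistic regressor standing in for the discriminator DCn; all shapes, weights, and constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
H = 32
alpha_n, beta_n = 1.0, 1.0          # illustrative weighting constants

w = 0.1 * rng.standard_normal(H)    # untrained discriminator weights

def DC(v):
    """Toy discriminator: a sigmoid score in (0, 1), so both logarithms are defined."""
    return 1.0 / (1.0 + np.exp(-float(w @ v)))

x = rng.standard_normal(H)               # "real" sample
y_n = x + 0.1 * rng.standard_normal(H)   # reconstruction from decoder n

M = np.mean                         # the mean operator M of Equation (9); batch of one here
term = alpha_n * M(np.log(DC(x))) + beta_n * M(np.log(1.0 - DC(y_n)))
```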
For the sake of gradually and serially training adversarial cascade decoders-based auto-encoders, the optimization problem in Equation (9) can be divided into the following suboptimization problems:
$$\begin{aligned} (\theta_1; z; y_1) &= \arg\min_{E;\, D_1}\ \max_{DC_1} \big( \alpha_1 M(\ln(DC_1(x))) + \beta_1 M(\ln(1 - DC_1(y_1))) \big) \\ (\theta_2; y_2) &= \arg\min_{D_2}\ \max_{DC_2} \big( \alpha_2 M(\ln(DC_2(x))) + \beta_2 M(\ln(1 - DC_2(y_2))) \big) \\ &\ \,\vdots \\ (\theta_N; y_N; y) &= \arg\min_{D_N}\ \max_{DC_N} \big( \alpha_N M(\ln(DC_N(x))) + \beta_N M(\ln(1 - DC_N(y_N))) \big) \\ \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\ C_z;\\ & \textstyle\sum_{n=1}^{N} \|y_n - x\|_2^2 < \varepsilon;\ y = y_N \end{aligned} \tag{10}$$

3.3.3. Residual-Adversarial Cascade Decoders-Based Auto-Encoders

The framework of residual-adversarial cascade decoders-based auto-encoders (RACDAE) is shown in Figure 6. The blue signal line denotes the training phase, and the green signal line denotes both phases of training and testing. Each decoder is a residual-adversarial module.
Figure 6. The architecture of residual-adversarial cascade decoders-based auto-encoders.
The recovery data can be resolved by the following minimization-maximization problem:
$$\begin{aligned} (\theta; z; r_1, \ldots, r_N; y_1, \ldots, y_N; y) = \arg\min_{E;\, D_1, \ldots, D_N}\ \max_{DC_1, \ldots, DC_N} \sum_{n=1}^{N} \big( \alpha_n M(\ln(DC_n(x - y_{n-1}))) + \beta_n M(\ln(1 - DC_n(r_n))) \big) \quad \text{s.t.}\quad & z = E(x);\ r_1 = D_1(z);\ r_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\ C_z;\\ & \textstyle\sum_{n=1}^{N} \|x - y_{n-1} - r_n\|_2^2 < \varepsilon;\\ & y_0 = 0;\ y_n = y_{n-1} + r_n,\ n = 1, \ldots, N;\ y = y_N \end{aligned} \tag{11}$$
For the purpose of gradually and serially training residual adversarial cascade decoders-based auto-encoders, the optimization problem in Equation (11) can be divided into the following suboptimization problems:
$$\begin{aligned} (\theta_1; z; r_1; y_1) &= \arg\min_{E;\, D_1}\ \max_{DC_1} \big( \alpha_1 M(\ln(DC_1(x - y_0))) + \beta_1 M(\ln(1 - DC_1(r_1))) \big) \\ (\theta_2; r_2; y_2) &= \arg\min_{D_2}\ \max_{DC_2} \big( \alpha_2 M(\ln(DC_2(x - y_1))) + \beta_2 M(\ln(1 - DC_2(r_2))) \big) \\ &\ \,\vdots \\ (\theta_N; r_N; y_N; y) &= \arg\min_{D_N}\ \max_{DC_N} \big( \alpha_N M(\ln(DC_N(x - y_{N-1}))) + \beta_N M(\ln(1 - DC_N(r_N))) \big) \\ \text{s.t.}\quad & z = E(x);\ r_1 = D_1(z);\ r_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\ C_z;\\ & \textstyle\sum_{n=1}^{N} \|x - y_{n-1} - r_n\|_2^2 < \varepsilon;\\ & y_0 = 0;\ y_n = y_{n-1} + r_n,\ n = 1, \ldots, N;\ y = y_N \end{aligned} \tag{12}$$

3.4. Adversarial Auto-Encoders

3.4.1. Reminiscence of Classical Adversarial Auto-Encoders

The infrastructure of the classical adversarial auto-encoders is exhibited in Figure 7. The blue signal flow is for the training phase, and the green signal flow is for both the training and testing phases. AAE combine auto-encoders with adversarial learning. Makhzani et al. proposed AAE: they utilized the encoder unit of auto-encoders as the generator, added an independent discriminator, and employed adversarial learning in the latent space so that the hidden variable satisfies a given distribution, finally achieving better performance in data reconstruction [7]. Compared with classical auto-encoders, the AAE infrastructure holds an extra discriminator, which makes the output of the encoder maximally approach a given distribution. The infrastructure can be expressed by the following equations:
Figure 7. The architecture of classical adversarial auto-encoders.
$$z = E(x), \quad y = D(z), \quad DC(z_h) = 1,\ DC(z) = 0 \tag{13}$$
where the terms are defined as follows:
  • zh is the variable related to z which satisfies a given distribution;
  • DC is the discriminator.
The reestablishment data can be resolved by the following minimization-maximization problem:
$$(\theta, z, y) = \arg\min_{E, D}\ \max_{DC} \big( \alpha M(\ln(DC(z_h))) + \beta M(\ln(1 - DC(z))) \big) \quad \text{s.t.}\ z = E(x),\ y = D(z),\ \|y - x\|_2^2 < \varepsilon,\ C_y \tag{14}$$

3.4.2. Proposed Cascade Decoders-Based Adversarial Auto-Encoders

The architecture of the proposed cascade decoders-based adversarial auto-encoders (CDAAE) is illustrated in Figure 8. The blue flow line represents the training phase, and the green flow line represents both phases of training and testing. Compared with the cascade decoders-based auto-encoders, the proposed architecture has an extra discriminator, which makes the output of the encoder maximally approximate to a known distribution. The architecture can be described by the following formulas:
Figure 8. The architecture of cascade decoders-based adversarial auto-encoders.
$$z = E(x), \quad y_n = \begin{cases} D_1(z), & n = 1 \\ D_n(y_{n-1}), & n = 2, \ldots, N \end{cases}, \quad DC(z_h) = 1,\ DC(z) = 0 \tag{15}$$
The restoration data can be resolved by the following optimization problem:
$$\begin{aligned} (\theta; z; y_1, \ldots, y_N) = \arg\min_{E;\, D_1, \ldots, D_N}\ \max_{DC} \big( \alpha M(\ln(DC(z_h))) + \beta M(\ln(1 - DC(z))) \big) \quad \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\\ & \textstyle\sum_{n=1}^{N} \|y_n - x\|_2^2 < \varepsilon;\ C_{y_n},\ n = 1, \ldots, N \end{aligned} \tag{16}$$
For the purpose of gradually and serially training cascade decoders-based adversarial auto-encoders, the optimization problem in Equation (16) can be partitioned into the following suboptimization problems:
$$\begin{aligned} (\theta_1; z; y_1) &= \arg\min_{E;\, D_1}\ \max_{DC} \big( \alpha M(\ln(DC(z_h))) + \beta M(\ln(1 - DC(z))) \big) \\ (\theta_2; y_2) &= \arg\min_{D_2} \|y_2 - x\|_2^2 \\ &\ \,\vdots \\ (\theta_N; y_N) &= \arg\min_{D_N} \|y_N - x\|_2^2 \\ \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\\ & \|y_1 - x\|_2^2 < \varepsilon;\ C_{y_n},\ n = 1, \ldots, N \end{aligned} \tag{17}$$
The architecture in Figure 8 represents general cascade decoders-based adversarial auto-encoders (GCDAAE), and it can easily be expanded to residual cascade decoders-based adversarial auto-encoders (RCDAAE), adversarial cascade decoders-based adversarial auto-encoders (ACDAAE), and residual-adversarial cascade decoders-based adversarial auto-encoders (RACDAAE).

3.5. Variational Auto-Encoders

3.5.1. Remembrance of Classical Variational Auto-Encoders

The framework of classical variational auto-encoders is shown in Figure 9 [4]. The blue signal line denotes the training phase, and the green signal line denotes both phases of training and testing. It can be resolved by the following optimization problem:
Figure 9. The architecture of classical variational auto-encoders.
$$\begin{aligned} (\theta, z, y) &= \arg\min_{\theta, z, y} \big( \alpha\, \mathrm{KL}(q(z_h) \,\|\, p(z)) + \beta \|y - x\|_2^2 \big) \\ &= \arg\min_{\theta, z, y} \Big( \alpha \sum_{z_h} q(z_h) \log \frac{q(z_h)}{p(z)} + \beta \|y - x\|_2^2 \Big) \\ \text{s.t.}\quad & z = E(x),\ y = D(z),\ C_y \end{aligned} \tag{18}$$
where the terms are defined as follows:
  • KL(·) is the Kullback–Leibler divergence;
  • q(zh) is the given distribution of zh;
  • p(z) is the distribution of z.
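The discrete Kullback–Leibler term expanded in Equation (18) can be computed directly; the three-state distributions below are purely illustrative:

```python
import numpy as np

# KL(q || p) = sum_z q(z) log(q(z) / p(z)), as expanded in Equation (18).
q = np.array([0.7, 0.2, 0.1])   # given latent distribution q(z_h)
p = np.array([0.5, 0.3, 0.2])   # current encoder-output distribution p(z)

kl = float(np.sum(q * np.log(q / p)))
```

The divergence is non-negative and vanishes only when the two distributions coincide, which is why minimizing it drives p(z) toward q(zh).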

3.5.2. Proposed Cascade Decoders-Based Variational Auto-Encoders

The proposed infrastructure of cascade decoders-based variational auto-encoders is shown in Figure 10. The blue signal flow is for the training phase, and the green signal flow is for both phases of training and testing. It can be resolved by the following optimization problem:
$$\begin{aligned} (\theta; z; y_1, \ldots, y_N) = \arg\min_{\theta;\, z;\, y_1, \ldots, y_N} \Big( \alpha\, \mathrm{KL}(q(z_h) \,\|\, p(z)) + \beta \sum_{n=1}^{N} \|y_n - x\|_2^2 \Big) \quad \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\\ & C_{y_n},\ n = 1, \ldots, N \end{aligned} \tag{19}$$
Figure 10. The architecture of cascade decoders-based variational auto-encoders.
For the sake of gradually and serially training cascade decoders-based auto-encoders, the optimization problem in Equation (19) can be divided into the following suboptimization problems:
$$\begin{aligned} (\theta_1; z; y_1) &= \arg\min_{\theta_1;\, z;\, y_1} \big( \alpha\, \mathrm{KL}(q(z_h) \,\|\, p(z)) + \beta \|y_1 - x\|_2^2 \big) \\ (\theta_2; y_2) &= \arg\min_{\theta_2;\, y_2} \|y_2 - x\|_2^2 \\ &\ \,\vdots \\ (\theta_N; y_N) &= \arg\min_{\theta_N;\, y_N} \|y_N - x\|_2^2 \\ \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\ C_{y_n},\ n = 1, \ldots, N \end{aligned} \tag{20}$$
The infrastructure in Figure 10 represents general cascade decoders-based variational auto-encoders (GCDVAE), and it can easily be extended to residual cascade decoders-based variational auto-encoders (RCDVAE), adversarial cascade decoders-based variational auto-encoders (ACDVAE), and residual-adversarial cascade decoders-based variational auto-encoders (RACDVAE).

3.6. Wasserstein Auto-Encoders

3.6.1. Recollection of Classical Wasserstein Auto-Encoders

The classical Wasserstein auto-encoders can be resolved by the following optimization problem [20]:
$$(\theta, z, y) = \arg\min_{\theta, z, y} \big( \alpha\, W_z(p(z), q(z_h)) + \beta\, W_y(y, x) \big) \quad \text{s.t.}\ z = E(x),\ y = D(z),\ C_y \tag{21}$$
where the variables are defined as follows:
  • Wz is the regularizer between distribution p(z) and q(zh);
  • Wy is the reconstruction cost.

3.6.2. Proposed Cascade Decoders-Based Wasserstein Auto-Encoders

The proposed cascade decoders-based Wasserstein auto-encoders can be resolved by the following optimization problem:
$$\begin{aligned} (\theta; z; y_1, \ldots, y_N) = \arg\min_{\theta;\, z;\, y_1, \ldots, y_N} \Big( \alpha\, W_z(p(z), q(z_h)) + \beta \sum_{n=1}^{N} W_y(y_n, x) \Big) \quad \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\\ & C_{y_n},\ n = 1, \ldots, N \end{aligned} \tag{22}$$
For the purpose of gradually and serially training cascade decoders-based auto-encoders, the optimization problem in Equation (22) can be divided into the following suboptimization problems:
$$\begin{aligned} (\theta_1; z; y_1) &= \arg\min_{\theta_1;\, z;\, y_1} \big( \alpha\, W_z(p(z), q(z_h)) + \beta\, W_y(y_1, x) \big) \\ (\theta_2; y_2) &= \arg\min_{\theta_2;\, y_2} W_y(y_2, x) \\ &\ \,\vdots \\ (\theta_N; y_N) &= \arg\min_{\theta_N;\, y_N} W_y(y_N, x) \\ \text{s.t.}\quad & z = E(x);\ y_1 = D_1(z);\ y_n = D_n(y_{n-1}),\ n = 2, \ldots, N;\ C_{y_n},\ n = 1, \ldots, N \end{aligned} \tag{23}$$
The aforementioned architecture represents general cascade decoders-based Wasserstein auto-encoders (GCDWAE), and it can easily be expanded to residual cascade decoders-based Wasserstein auto-encoders (RCDWAE), adversarial cascade decoders-based Wasserstein auto-encoders (ACDWAE), and residual-adversarial cascade decoders-based Wasserstein auto-encoders (RACDWAE).

3.7. Pseudocodes of Cascade Decoders-Based Auto-Encoders

The pseudocode of the proposed cascade decoders-based auto-encoders is shown in Algorithm 1.
Algorithm 1: The pseudocode of cascade decoders-based auto-encoders.
Input: x: the training data
      I: the total number of iterations
      N: the total number of sub-minimization problems
Initialization: i = 1
Training:
   While i <= I
      n = 1
      While n <= N
         resolve the nth sub-problem in Equation (5), (7), (10), (12), (17), (20) or (23)
         n = n + 1
      i = i + 1
Output:
   θ: the parameters of deep neural networks
   z: the representations of hidden space
   y1, …, yN: the output of cascade decoders
   y: the output of the last decoder
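The control flow of Algorithm 1 can be rendered in Python as follows; `solve_subproblem` is a hypothetical callback standing in for one optimization pass over the nth sub-problem:

```python
def train_cascade(x, I, N, solve_subproblem):
    """Algorithm 1: I outer iterations, each sweeping sub-problems 1..N in order.
    solve_subproblem(n, x) is a hypothetical callback that performs one pass of the
    nth sub-problem in Equation (5), (7), (10), (12), (17), (20) or (23)."""
    visited = []
    for i in range(1, I + 1):
        for n in range(1, N + 1):
            solve_subproblem(n, x)
            visited.append(n)
    return visited

# Usage: with I = 2 iterations and N = 3 decoders, the sweep order is 1,2,3,1,2,3.
order = train_cascade(x=None, I=2, N=3, solve_subproblem=lambda n, x: None)
```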

4. Experiments

4.1. Experimental Data Sets

The purpose of the simulation experiments is to compare the data reconstruction performance of the proposed cascade decoders-based auto-encoders and the classical auto-encoders.
Four data sets are utilized to evaluate algorithm performance [29,30,31,32]. The mixed national institute of standards and technology (MNIST) data set has 10 classes of handwritten digit images [29]; the extended MNIST (EMNIST) data set holds 6 subcategories of handwritten digit and letter images [30]; the fashion-MNIST (FMNIST) data set contains 10 classes of fashion product images [31]; and the medical MNIST (MMNIST) data set contains 10 subcategories of medical images [32]. The image size is 28 × 28, and all color images are converted into gray images. Small-resolution gray images are chosen in order to reduce the computational load. Certainly, if the computational capability is ensured, the proposed methods can be easily and directly applied to large-resolution images, the components of color images, or their sub-patches. A large image can be divided into small patches; in traditional image compression methods, the size of an image patch for compression is 8 × 8, so the proposed methods can be used for each image block. In brief, from the viewpoint of small image patches, a large image size will not degrade the performance of the proposed methods. For the convenience of training and testing deep neural networks, each pixel value is normalized from the range [0, 255] to the range [−1, +1] in the pre-processing phase, and is re-scaled back to the range [0, 255] in the post-processing phase. The numbers of classes and samples in the four data sets are enumerated in Table 3. The sample images of the four data sets are illustrated in Figure 11. From top to bottom, there are images of MNIST digits, EMNIST digits, EMNIST letters, FMNIST goods, MMNIST breast, chest, derma, optical coherence tomography (OCT), axial organ, coronal organ, sagittal organ, pathology, pneumonia, and retina.
Table 3. Class and sample numbers of experimental data sets.
Figure 11. Sample images of experimental data sets.
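The pre- and post-processing described above amount to a simple affine re-scaling; a possible implementation is sketched below (the function names are ours, not from the paper):

```python
import numpy as np

def normalize(img):
    """Pre-processing: map integer pixels in [0, 255] to reals in [-1, +1]."""
    return img.astype(np.float64) / 127.5 - 1.0

def denormalize(v):
    """Post-processing: re-scale from [-1, +1] back to integers in [0, 255]."""
    return np.clip(np.round((v + 1.0) * 127.5), 0, 255).astype(np.uint8)

img = np.array([[0, 128, 255]], dtype=np.uint8)
v = normalize(img)        # values lie in [-1, +1]
back = denormalize(v)     # round-trips to the original pixels
```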

4.2. Experimental Conditions

The experimental software platform is MATLAB 2020b on Windows 10 or Linux. For the small data sets (MNIST, FMNIST, and MMNIST), the experimental hardware platform is a laptop with a 2.6 GHz dual-core processor and 8 GB of memory; for the large data set (EMNIST), it is a supercomputer with high-speed GPUs and substantial memory.
The components of auto-encoders are made up of fully-connected (FC) layers, leaky rectified linear unit (LRELU) layers, hyperbolic tangent (Tanh) layers, Sigmoid layers, etc. In order to reduce the calculation complexity, the convolutional (CONV) layer is not utilized.
The composition of the encoder is shown in Figure 12, which consists of input, FC, LRELU, and hidden layers.
Figure 12. Composition of the encoder.
The constitution of the decoder is illustrated in Figure 13, which consists of input, FC, LRELU, Tanh and output layers. The input layer can be the hidden layer for the first decoder, and can be the output layer of the preceding decoder for the latter decoders. The dashed line shows the two situations.
Figure 13. Composition of the decoder.
The organization of the discriminator is demonstrated in Figure 14, which comprises input, FC, LRELU, Sigmoid, and output layers. The input layer can be the hidden layer and can be the output of each decoder. The dashed line indicates the two cases.
Figure 14. Composition of the discriminator.
The deep learning parameters, such as image size, latent dimension, decoder number, batch size, learning rate, and iteration epoch, are summarized in Table 4.
Table 4. The deep learning parameters.

4.3. Experimental Results

The experimental results of the proposed and classical algorithms on the MNIST, EMNIST, FMNIST, and MMNIST data sets are respectively shown in Table 5, Table 6, Table 7 and Table 8. SSIM is the average structural similarity between reconstructed images and original images. ΔSSIM is the average SSIM difference between the proposed approaches and the conventional AE approach. The experimental results are also displayed in Figure 15, where the horizontal coordinate denotes the data sets and the vertical coordinate denotes ΔSSIM.
Table 5. Experimental results on the MNIST data set.
Table 6. Experimental results on the EMNIST data set.
Table 7. Experimental results on the FMNIST data set.
Table 8. Experimental results on the MMNIST data set.
Figure 15. Experimental results of different algorithms on the four data sets.
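For reference, SSIM compares two images through their means, variances, and covariance. The single-window variant below uses the standard constants C1 = (0.01L)² and C2 = (0.03L)² for dynamic range L; published results normally come from a windowed implementation (e.g., MATLAB's ssim), so this global version is only an illustration:

```python
import numpy as np

def ssim_global(a, b, data_range=255.0):
    """Single-window SSIM over the whole image; library implementations
    instead average the same statistic over local windows."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

img = np.tile(np.arange(0, 256, 32, dtype=np.float64), (8, 1))  # toy 8 x 8 gradient
```

An image compared with itself scores exactly 1, and any distortion pushes the score below 1, which is the behavior the tables above rely on.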
It can be found in Table 5, Table 6, Table 7 and Table 8 and Figure 15 that the proposed methods, except for ACDAE and ACDAAE, are superior to the classical AE and AAE methods in image reconstruction performance. This demonstrates the correctness and effectiveness of the proposed cascade decoders-based auto-encoders for image reconstruction.
It can also be observed in Tables 5–8 and Figure 15 that the proposed RCDAE and RACDAAE algorithms achieve the best recovery performance on nearly all four data sets. Hence, residual learning is well suited to image recovery. This is because the residual has a smaller mean and variance than the original image, which makes it easier for the deep neural network to learn the relationship between input and output.
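The claim about the residual's statistics is easy to check numerically. The toy sketch below uses synthetic data, not the paper's networks: it builds a coarse reconstruction of random images and compares the mean and variance of the residual against the original, and then shows the assumed RCDAE-style refinement, where each later stage adds a predicted correction to the running reconstruction.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "images" in [0, 1] and a coarse first-stage reconstruction.
x = rng.random((100, 784))
x_hat = np.clip(x + 0.05 * rng.standard_normal(x.shape), 0.0, 1.0)
residual = x - x_hat

# The residual is far closer to zero-mean and low-variance than the
# image itself, which is the property that eases learning.
stats = {
    "image_mean": abs(x.mean()), "residual_mean": abs(residual.mean()),
    "image_var": x.var(), "residual_var": residual.var(),
}

def residual_cascade(x0, correctors):
    """Assumed RCDAE-style refinement: each stage predicts a correction
    that is added to the running reconstruction."""
    out = x0
    for corr in correctors:
        out = out + corr(out)
    return out
```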
It can further be observed in Tables 5–8 and Figure 15 that the proposed ACDAE and ACDAAE algorithms yield negative ΔSSIM values on the four data sets. Thus, in line with the predictions of this paper, pure adversarial learning is unsuitable for image reconstruction. However, a combination of residual learning and adversarial learning, such as the aforementioned RACDAAE, achieves high reconstruction performance.
It can additionally be found in Tables 5–8 and Figure 15 that the AAE algorithm also yields negative ΔSSIM values on the four data sets. Therefore, consistent with the analysis of this article, pure AAE cannot outperform AE in image reconstruction, whereas, as noted above, combining it with residual learning recovers high reconstruction performance.
Finally, it can be found in Tables 5–8 and Figure 15 that the SSIM differences on MMNIST-axial and MMNIST-sagittal are substantially higher than on the other data sets. A likely reason is that their training and testing samples are more similar to each other than those of the other data sets.
To compare the reconstruction performance of the proposed and classical algorithms clearly, the reconstructed images are illustrated in Figures 16–25. Since the proposed RCDAE algorithm achieves the best performance, it is taken as the example.
Figure 16. Reconstruction images on the MNIST data set. For each subfigure, the top row shows the original images, the middle row shows the recovery images of AE, and the bottom row shows the recovery images of RCDAE.
Figure 17. Marked reconstruction images on the MNIST data set.
Figure 18. Reconstruction images on the EMNIST data set (uppercase letters). For each subfigure, the top row shows the original images, the middle row shows the recovery images of AE, and the bottom row shows the recovery images of RCDAE.
Figure 19. Marked reconstruction images on the EMNIST data set (uppercase letters).
Figure 20. Reconstruction images on the EMNIST data set (lowercase letters). For each subfigure, the top row shows the original images, the middle row shows the recovery images of AE, and the bottom row shows the recovery images of RCDAE.
Figure 21. Marked reconstruction images on the EMNIST data set (lowercase letters).
Figure 22. Reconstruction images on the FMNIST data set. For each subfigure, the top row shows the original images, the middle row shows the recovery images of AE, and the bottom row shows the recovery images of RCDAE.
Figure 23. Marked reconstruction images on the FMNIST data set.
Figure 24. Reconstruction images on the MMNIST data set. For each subfigure, the top row shows the original images, the middle row shows the recovery images of AE, and the bottom row shows the recovery images of RCDAE.
Figure 25. Marked reconstruction images on the MMNIST data set.
The recovered images on the MNIST data set are shown in Figure 16. The SSIM differences between AE and RCDAE are not easy to see there; therefore, the marked reconstructed images on the MNIST data set are illustrated in Figure 17, where the left column shows the original images, the middle column shows the reconstructions of AE, and the right column shows the reconstructions of RCDAE. The SSIM differences between AE and RCDAE are easy to notice inside the red marked squares in Figure 17.
Similarly, the reconstructed and marked reconstructed images on the EMNIST data set (uppercase letters) are shown in Figure 18 and Figure 19; those on the EMNIST data set (lowercase letters) in Figure 20 and Figure 21; those on the FMNIST data set in Figure 22 and Figure 23; and those on the MMNIST data set in Figure 24 and Figure 25.
Figures 16–25 reveal that the proposed algorithms achieve significant improvements in reconstruction performance on the MNIST and EMNIST data sets, but only marginal improvements on the FMNIST and MMNIST data sets. For instance, in the first row of Figure 25, the difference between the proposed and classical methods can only be found after enlarging the images; in the eighth row of Figure 25, no conspicuous differences can be found even after enlarging the images. Nevertheless, both are genuine experimental results, which should be accepted and explained. The lack of differences is attributed to four reasons. First, the quality of the original images in the FMNIST and MMNIST data sets is low. Second, only the illumination component of the original color images is retained for part of the MMNIST data sets; the reconstruction performance would improve if the original color images were utilized. Third, the dimension of the latent space is 30, which is very low compared with the 784 (28 × 28) dimensions of the original images. Fourth, convolutional layers are not utilized in the architecture of the auto-encoders; they were omitted to reduce the computational load. Convolutional layers can effectively extract image features and reconstruct the original image, and are expected to further improve the reconstruction performance of the proposed approaches; this will be investigated in our future research.

5. Conclusions

This paper proposes cascade decoders-based auto-encoders for image reconstruction, comprising the architectures of multi-level decoders and the related optimization problems and training algorithms. The article concentrates on the classical AE and AAE, as well as their serial decoders-based versions. Residual learning and adversarial learning are incorporated into the proposed approaches, and the effectiveness of cascade decoders for image reconstruction is proven mathematically. Experimental results on four open data sets show that the proposed cascade decoders-based auto-encoders are superior to classical auto-encoders in image reconstruction performance. In particular, residual learning is well suited to image reconstruction.
In our future research, experiments on data sets with high-resolution and color images will be conducted. Other advanced auto-encoders, such as VAE and WAE, will also be explored. Convolutional or transformer layers will be introduced into the proposed algorithms. Constraints on the high-dimensional reconstructed data, such as sparse and low-rank priors, will be utilized to improve the reconstruction performance of auto-encoders. Generalized auto-encoders-based data compression and signal-compressed sensing, as well as auto-encoders-based lossless reconstruction, will be studied further.

6. Patents

The patent with application number CN202110934815.7 and publication number CN113642709A results from the research reported in this manuscript.

Author Contributions

Conceptualization, H.L. and M.T.; methodology, H.L. and D.G.; writing, H.L.; supervision, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would very much like to thank Yui Chun Leung for the collection of MATLAB implementations of Generative Adversarial Networks (GANs) on the GitHub website (https://github.com/zcemycl/Matlab-GAN, accessed on 26 November 2020). We took advantage of the AAE codes and halved the dimension of AAE latent space.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
