Serial Decoders-Based Auto-Encoders for Image Reconstruction

Auto-encoders are composed of encoding and decoding units; hence, they hold an inherent potential for high-performance data compression and signal-compressed sensing. The main disadvantages of current auto-encoders comprise the following aspects: the research objective is not lossless data reconstruction but efficient feature representation; the evaluation of data recovery performance is neglected; and it is difficult to achieve lossless data reconstruction using pure auto-encoders, or even pure deep learning. This paper aims at image reconstruction using auto-encoders: it employs cascade decoders-based auto-encoders, improves the performance of image reconstruction, gradually approaches lossless image recovery, and provides a solid theoretical and applicational basis for auto-encoders-based image compression and compressed sensing. The proposed serial decoders-based auto-encoders include the architectures of multi-level decoders and the related progressive optimization sub-problems. The cascade decoders consist of general decoders, residual decoders, adversarial decoders, and their combinations. The effectiveness of residual cascade decoders for image reconstruction is proven mathematically. Progressive training can efficiently enhance the quality, stability, and variation of image reconstruction. The experimental results show that the proposed auto-encoders outperform classical auto-encoders in the performance of image reconstruction.


Introduction
Since deep learning achieves the rules and features of input data using multilayer stacked neural networks in a highly efficient manner, it has garnered unprecedented success in research and applications in the domains of data classification, recognition, compression, and processing [1,2]. Although the theoretical research and engineering applications of deep learning have matured, there is still much room for improvement, and deep learning has not yet attained the requirements of general artificial intelligence [1,2]. Hence, it is incumbent on researchers to utilize deep learning to upgrade the performance of data compression and signal-compressed sensing.
Data reconstruction is the foundation of data compression and signal-compressed sensing. Understood in a broad sense, it carries multifarious meanings. In this paper, data reconstruction denotes high-dimensional original data being initially mapped into a low-dimensional space and then being recovered. Although the classical methods of data compression and signal-compressed sensing are full-blown, it is still necessary to study deep learning-based alternatives such as auto-encoders. Some of the theoretical frameworks of traditional auto-encoders are listed in Table 1.

Table 1. Some of the theoretical frameworks of traditional auto-encoders.

Auto-Encoders                  References
Variational auto-encoders      [4]
Adversarial auto-encoders      [11]
Convolutional auto-encoders    [12]
Quantum auto-encoders          [13]
Sparse auto-encoders           [14]
Wasserstein auto-encoders      [15]
Graphical auto-encoders        [16]

However, current research in auto-encoders exhibits the following problems: the research objective is not to achieve lossless data reconstruction but efficient feature representation; independent evaluation of the performance of data reconstruction is neglected; the performance of data reconstruction needs to be improved; and it is difficult to attain lossless data reconstruction [17,18]. For instance, the performance of data reconstruction using AAE, one of the most advanced auto-encoders, is shown in Figure 1 [11]. The horizontal axis is the dimension of the latent space, and the vertical axis is the average structural similarity (SSIM) of reconstructed images in comparison with original images. Figure 1 indicates that the data reconstruction performance of AAE improves as the dimension of the latent space increases, yet lossless data reconstruction remains out of reach. Currently, pure deep learning-based methods of data compression and signal-compressed sensing cannot attain lossless data reconstruction. This manuscript regards lossless data reconstruction as a research goal of auto-encoders, independently assesses the data reconstruction performance of auto-encoders, enhances the quality of data reconstruction of auto-encoders, gradually approaches lossless data reconstruction of auto-encoders, and builds a solid theoretical and applicational foundation of data compression and signal-compressed sensing for auto-encoders.
This article proposes serial decoders-based auto-encoders for image reconstruction. The main contribution of this paper is to introduce cascade decoders into auto-encoders, including theoretical architectures and optimization problems. The optimization problems are divided into sequential sub-problems in order to progressively train the deep neural networks. Progressive training can efficiently improve the quality, stability, and variation of image reconstruction. The components of serial decoders consist of general decoders, residual decoders, adversarial decoders, and their combinations. The effectiveness of residual serial decoders for image reconstruction is proven in mathematics. Since AAE, VAE, and WAE are state-of-the-art auto-encoders, this article focuses on their cascade decoders-based versions.
The rest of this article is organized as follows: the related research is summarized in Section 2, theoretical foundations are established in Section 3, simulation experiments are designed in Section 4, and final conclusions are drawn in Section 5.

Related Research
Narrow auto-encoders-based data compression and signal-compressed sensing have progressed rapidly [7,8,12-14,19-21]. Firstly, auto-encoders have been studied and applied in the compression of medical signals, navigation data, and quantum states [7,12,13,19]. For example, Wu Tong et al. proposed an auto-encoders-based compression method for brain neural signals [7]. Yildirim Ozal et al. utilized convolutional auto-encoders to compress electrocardio signals [12]. Lokukaluge P. Perera et al. employed linear auto-encoders to compress navigation data [19]. Romero Jonathan et al. used quantum auto-encoders to compress quantum states [13]. Secondly, auto-encoders have already been studied and applied in the compressed sensing of biomedical signals, images, and sensor data [8,14,20,21]. For instance, Gogna Anupriya et al. utilized stacked and label-consistent auto-encoders to reconstruct electrocardio signals and electroencephalograms [8].
Important developments in narrow auto-encoders also include the following: the Wasserstein auto-encoder, the inverse function auto-encoder, and graphical auto-encoders [15,16,22]. For example, Ilya Tolstikhin et al. proposed Wasserstein auto-encoders, which are generalized adversarial auto-encoders, and utilized the Wasserstein distance to measure the difference between the data model distribution and the target distribution, in order to gain better performance in data reconstruction than classical variational auto-encoders and adversarial auto-encoders [15]. Yimin Yang et al. employed inverse activation functions and pseudo-inverse matrices to achieve an analytic representation of the network parameters of auto-encoders for dimensionality reduction and reconstruction of image data, and hence improved the data reconstruction performance of auto-encoders [22]. Majumdar Angshul presented graphical auto-encoders, used graphical regularization for data de-noising, clustering, and classification, and consequently attained better data reconstruction performance than classical auto-encoders [16].
Generalized auto-encoders-based data compression and signal-compressed sensing have also achieved significant evolution [9,10,23]. These methods usually utilize multi-level auto-encoders to overcome the disadvantage that single-level auto-encoders have in being unable to achieve lossless data reconstruction; these methods use auto-encoders to replace one unit of the classical data compression and signal-compressed sensing model, such as the prediction, transformation, or quantization unit of data compression, as well as the measurement or recovery unit of signal-compressed sensing. For instance, George Toderici et al. applied two-level auto-encoders for image compression. The first-level autoencoders compress image blocks, and the second-level auto-encoders compress the recovery residuals of the first-level auto-encoders. This approach makes up for the disadvantage that single-level auto-encoders have in being unable to implement lossless data reconstruction to a great degree [9]. Oren Rippel et al. adopted multi-level auto-encoders to implement the transformation coding unit of video compression. The first-level auto-encoders compress the prediction residuals, and the next-level auto-encoders compress the reconstruction residuals of the previous-level auto-encoders to a great extent [10]. Majid Sepahvand et al. employed auto-encoders to implement the prediction coding unit of compressed sensing of sensor signals [23].
The main research advances in generalized auto-encoders also comprise the use of other architectures of deep neural networks to implement data compression and signal-compressed sensing [24-27]. In data compression, these methods usually wield deep neural networks to substitute the prediction, transformation, or quantization units of classical methods. In signal-compressed sensing, these methods usually implement deep neural networks to substitute the measurement or recovery units of classical methods. For example, Jiahao Li et al. utilized fully-connected deep neural networks to realize the intra-prediction coding unit of video compression [25]. Guo Lu et al. adopted deep convolutional neural networks to replace the transformation coding unit of video compression [26]. Wenxue Cui et al. employed a deep convolutional neural network to accomplish the sampling and reconstruction units of image-compressed sensing [27].
This paper focuses on narrow auto-encoders, incorporates multi-level decoders into auto-encoders, and boosts the performance of data reconstruction. To the best of our knowledge, cascade decoders in auto-encoders have never been studied. Although serial auto-encoders have already been investigated, serial decoders in auto-encoders play a more important role in data reconstruction. In addition, Tero Karras et al. progressively trained generative adversarial networks by gradually increasing the layer numbers of the generator and discriminator in order to improve the quality, stability, and variability of data reconstruction [28]. This method is borrowed for progressively training the proposed cascade decoders-based auto-encoders: the proposed training method gradually increases the number of decoders. It is difficult to train stable auto-encoders with multiple decoders and many hyper-parameters at once, but it is easier to train a stable unit of auto-encoders with a single decoder and few hyper-parameters. A single decoder can merely learn low image variation, whereas serial decoders can learn high image variation. Hence, progressive training can efficiently strengthen the quality, stability, and variability of image reconstruction.

Notations and Abbreviations
For the convenience of content description, parts of the mathematical notations and abbreviations adopted in this manuscript are listed in Table 2.

Recall of Classical Auto-Encoders
The architecture of classical auto-encoders is illustrated in Figure 2. Classical auto-encoders are composed of two units: an encoder and a decoder. The encoder reduces the high-dimensional input data to a low-dimensional representation, and the decoder reconstructs the high-dimensional data from the low-dimensional representation. The classical auto-encoder can be described by the following formulas:

z = E(x); y = D(z) (1)


where the terms are defined as follows: x is the high-dimensional input data. Taking image data as an example, x is the normalized version of the original image for the convenience of numerical computation; each element of the original image is an integer in the range [0, 255]; each element of x is a real number in the range [0, 1] or [−1, +1]; x with elements in the range [0, 1] can be understood as probability variables; x can also be regarded as a vector which is a reshaped version of an image matrix.
z is the low-dimensional representation in a latent space. y is the high-dimensional data, such as a reconstruction image. E is the encoder. D is the decoder. H is the dimension of x or y; for image data, H is equal to the product of image width and height. L is the dimension of z and L is far less than H.
The classical auto-encoders can be resolved by the following optimization problem:

(θ, z, y) = argmin_{θ, z, y} ‖y − x‖² s.t. z = E(x); y = D(z); C_z: ‖z − z_g‖ ≤ δ_z; C_y: ‖y − D_y s_y‖ + λ_y‖s_y‖_1 ≤ δ_y (2)

where the terms are defined as follows: θ are the parameters of auto-encoders, including the parameters of the encoder and decoder. C_z is the constraint on the low-dimensional representation z; for example, z satisfies a given probability distribution; matching a known distribution has been considered by classical adversarial auto-encoders, variational auto-encoders, and Wasserstein auto-encoders. z_g is a related variable which meets a given distribution. δ_z is a small constant. C_y is the constraint on the high-dimensional reconstruction data y; for instance, y meets a prior of local smoothness or non-local similarity. Auto-encoders require y to reconstruct x based on the prior to a great extent; other constraints, such as sparsity and low-rank properties of the high-dimensional reconstruction data, can also be utilized; hereby, a sparse prior is taken as an example. D_y is a sparse dictionary matrix. s_y is a vector of sparse coefficients. λ_y is a small constant. δ_y is a small constant.
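As a concrete illustration of the mappings z = E(x) and y = D(z), a minimal fully-connected auto-encoder can be sketched as below, using the FC, LReLU, and Tanh layers described later in the experimental section. The class name, layer sizes, and initialization scale are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def lrelu(a, slope=0.2):
    # Leaky ReLU activation, as used in the paper's encoder/decoder stacks
    return np.where(a > 0, a, slope * a)

class TinyAutoEncoder:
    """Minimal fully-connected auto-encoder: z = E(x), y = D(z)."""
    def __init__(self, H=784, L=16, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small random initialization scale (assumed)
        self.We, self.be = s * rng.standard_normal((L, H)), np.zeros(L)
        self.Wd, self.bd = s * rng.standard_normal((H, L)), np.zeros(H)

    def encode(self, x):   # E: R^H -> R^L
        return lrelu(self.We @ x + self.be)

    def decode(self, z):   # D: R^L -> R^H; Tanh keeps y in (-1, 1)
        return np.tanh(self.Wd @ z + self.bd)

ae = TinyAutoEncoder()
x = np.random.default_rng(1).uniform(-1, 1, 784)  # a normalized 28x28 image, flattened
z = ae.encode(x)
y = ae.decode(z)
print(z.shape, y.shape)  # (16,) (784,)
```

Here H = 784 corresponds to a flattened 28 × 28 image and L = 16 is one of the latent dimensions considered in the experiments; training the weights is omitted.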

Proposed Cascade of Decoders-Based Auto-Encoders
The framework of the proposed cascade decoders-based auto-encoders (CDAE) is exhibited in Figure 3. The framework consists of two components: an encoder and cascade decoders. The encoder is similar to that in the classical auto-encoder. The cascade decoders comprise N serial decoders, from decoder 1 to decoder N. The framework can be depicted by the following expressions:

z = E(x); y_1 = D_1(z); y_n = D_n(y_{n−1}), n = 2, . . . , N; y = y_N (3)

where the terms are defined as follows: D_n is the nth decoder; y_n is the reconstruction data of D_n.
The reconstruction data can be solved by the following optimization problem:

(θ; z; y_1, · · · , y_N; y) = argmin_{θ; z; y_1, · · · , y_N; y} ∑_{n=1}^{N} ‖y_n − x‖² s.t. z = E(x); y_1 = D_1(z); y_n = D_n(y_{n−1}), n = 2, . . . , N; C_z; C_{y_n}, n = 1, . . . , N; y = y_N (4)

where the terms are defined as follows: θ are the parameters of the cascade decoders-based auto-encoders; C_{y_n} is the constraint on y_n.
For the purpose of gradually and serially training cascade decoders-based auto-encoders, the optimization problem in Equation (4) can be divided into the following suboptimization problems:

(θ_1; z; y_1) = argmin_{θ_1; z; y_1} ‖y_1 − x‖² s.t. z = E(x); y_1 = D_1(z); C_z; C_{y_1}
· · ·
(θ_N; y_N; y) = argmin_{θ_N; y_N; y} ‖y_N − x‖² s.t. y_N = D_N(y_{N−1}); C_{y_N}; y = y_N (5)

where the terms are defined as follows: θ_1 are the parameters of the encoder and decoder 1; θ_2, . . . , and θ_N are the parameters of decoders 2 to N.
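The cascaded forward pass behind these sub-problems can be sketched as follows. The toy encoder and the `refine` stand-in below are hypothetical placeholders for trained networks; they only illustrate how each stage consumes the previous stage's output and how the per-stage reconstruction error can shrink.

```python
import numpy as np

def cascade_forward(encoder, decoders, x):
    """Cascade decoders forward pass: z = E(x); y1 = D1(z);
    yn = Dn(y_{n-1}) for n >= 2; the final output is yN."""
    z = encoder(x)
    ys, h = [], z
    for D in decoders:
        h = D(h)
        ys.append(h)
    return z, ys, ys[-1]

# Toy stand-ins (assumed for illustration): a random linear "encoder",
# a back-projection first decoder, and later decoders that halve the error.
H, L = 8, 3
rng = np.random.default_rng(0)
P = rng.standard_normal((L, H)) / np.sqrt(H)
x = rng.uniform(-1, 1, H)
encoder = lambda v: P @ v
D1 = lambda z: P.T @ z                # decoder 1: crude back-projection
refine = lambda y: y + 0.5 * (x - y)  # stand-in for a trained later decoder
z, ys, y = cascade_forward(encoder, [D1, refine, refine], x)
errs = [np.linalg.norm(yn - x) for yn in ys]
print(errs)  # per-stage reconstruction errors, shrinking after stage 1
```

With these stand-ins each later stage halves the remaining error, mirroring the intent of training decoder n on the output of decoder n − 1.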
The proposed cascade decoders include general cascade decoders, residual cascade decoders, adversarial cascade decoders and their combinations. The general cascade decoders-based auto-encoders (GCDAE) have already been introduced in Figure 3. The other cascade decoders are elaborated in the following sections.

Residual Cascade Decoders-Based Auto-Encoders
The infrastructure of residual cascade decoders-based auto-encoders (RCDAE) is demonstrated in Figure 4. The blue signal flow is for the training phase, and the green signal flow is for both phases of training and testing. Each decoder is a residual module. This architecture is different from the traditional residual network (ResNet) because the former has an extra training channel for residual computation.
The reconstruction data can be resolved by the following optimization problem:

(θ; z; r_1, · · · , r_N; y_1, · · · , y_N; y) = argmin_{θ; z; r_1, · · · , r_N; y_1, · · · , y_N; y} ∑_{n=1}^{N} ‖y_n − x‖² s.t. z = E(x); r_1 = D_1(z); r_n = D_n(y_{n−1}), n = 2, . . . , N; C_z; C_{y_n}, n = 1, . . . , N; y_0 = 0; y_n = r_n + y_{n−1}, n = 1, . . . , N; y = y_N (6)

where the variables are defined as follows: r_n is the residual sample between x and y_{n−1}; y_0 is the zero sample; y is the final reconstruction sample.
For the purpose of gradually and serially training residual cascade decoders-based auto-encoders, the optimization problem in Equation (6) can be partitioned into the following suboptimization problems:

(θ_1; z; r_1; y_1) = argmin_{θ_1; z; r_1; y_1} ‖y_1 − x‖² s.t. z = E(x); r_1 = D_1(z); C_z; C_{y_1}; y_0 = 0; y_1 = r_1 + y_0
· · ·
(θ_N; r_N; y_N; y) = argmin_{θ_N; r_N; y_N; y} ‖y_N − x‖² s.t. r_N = D_N(y_{N−1}); C_{y_N}; y_N = r_N + y_{N−1}; y = y_N (7)

The effectiveness of the residual cascade decoders for image reconstruction can be proven as follows. In the training phase in Equation (6) and Figure 4, r_n is close to (x − y_{n−1}); that is, r_n is the summation of a scaled (x − y_{n−1}) and a small error:

r_n = u(x − y_{n−1}) + ε (8)

where u is a scale coefficient which is approximate to 1, and ε is an error vector which is approximate to 0. Substituting Equation (8) into y_n = r_n + y_{n−1} gives x − y_n = (1 − u)(x − y_{n−1}) − ε; since |1 − u| is close to 0, the reconstruction error decreases when the total number of decoders increases.
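The contraction argument can be checked numerically. The scale u = 0.9 and the noise level 1e-3 below are assumed values standing in for the behavior of a trained residual decoder; under these assumptions the reconstruction error shrinks at each stage until it reaches the noise floor.

```python
import numpy as np

# Numeric check of the residual-cascade argument: if each residual decoder
# outputs r_n = u*(x - y_{n-1}) + eps with u close to 1 and small eps, then
# y_n = y_{n-1} + r_n and the error x - y_n shrinks geometrically.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)       # a toy "image" vector
u, eps_scale = 0.9, 1e-3          # assumed scale and error level
y = np.zeros_like(x)              # y_0 = 0, the zero sample
errors = []
for n in range(5):
    r = u * (x - y) + eps_scale * rng.standard_normal(x.size)
    y = y + r
    errors.append(np.linalg.norm(x - y))
print(errors)  # roughly a factor-of-10 drop per stage until the eps floor
```

The decay factor per stage is |1 − u| = 0.1 here, so most of the gain comes from the first few decoders, consistent with using a small N in practice.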

Adversarial Cascade Decoders-Based Auto-Encoders
The architecture of adversarial cascade decoders-based auto-encoders (ACDAE) is displayed in Figure 5. The blue flow line represents the training phase, and the green flow represents both phases of training and testing. Each decoder is an adversarial module.

The reconstruction data can be solved by the following minimization-maximization problem:

(θ; z; y_1, · · · , y_N; y) = argmin_{E; D_1, · · · , D_N} max_{DC_1, · · · , DC_N} ∑_{n=1}^{N} (α_n M(ln(DC_n(x) + ε)) + β_n M(ln(1 − DC_n(y_n) + ε))) s.t. z = E(x); y_1 = D_1(z); y_n = D_n(y_{n−1}), n = 2, . . . , N; C_z; C_{y_n}, n = 1, . . . , N; y = y_N (9)

where the variables are described as follows: DC_n is the nth discriminator; α_n is a constant; β_n is a constant; ε is a small positive constant; M is the mean operator.
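One stage of the discriminator objective above can be evaluated as follows. This is a minimal sketch assuming discriminator outputs are probabilities in (0, 1); the function name and the chosen values are illustrative, not the paper's implementation.

```python
import numpy as np

def adversarial_term(dc_real, dc_fake, alpha=1.0, beta=1.0, eps=1e-8):
    """One stage of the ACDAE discriminator objective:
    alpha * mean(ln(DC(x))) + beta * mean(ln(1 - DC(y_n))).
    eps guards the logarithms, like the paper's small positive constant."""
    return (alpha * np.mean(np.log(dc_real + eps))
            + beta * np.mean(np.log(1.0 - dc_fake + eps)))

# A discriminator that scores real images near 1 and reconstructions near 0
# attains a larger (maximized) value than an undecided one scoring 0.5.
good = adversarial_term(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
bad = adversarial_term(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
print(good > bad)  # True
```

The decoders minimize the same quantity over D_1, …, D_N, pushing each y_n toward outputs the discriminators cannot separate from x.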

Residual-Adversarial Cascade Decoders-Based Auto-Encoders
The framework of residual-adversarial cascade decoders-based auto-encoders (RACDAE) is shown in Figure 6. The blue signal line denotes the training phase, and the green signal line denotes both phases of training and testing. Each decoder is a residual-adversarial module. The recovery data can be resolved by the following minimization-maximization problem:

(θ; z; r_1, · · · , r_N; y_1, · · · , y_N; y) = argmin_{E; D_1, · · · , D_N} max_{DC_1, · · · , DC_N} ∑_{n=1}^{N} (α_n M(ln(DC_n(x) + ε)) + β_n M(ln(1 − DC_n(y_n) + ε))) s.t. z = E(x); r_1 = D_1(z); r_n = D_n(y_{n−1}), n = 2, . . . , N; C_z; C_{y_n}, n = 1, . . . , N; y_0 = 0; y_n = r_n + y_{n−1}, n = 1, . . . , N; y = y_N (11)

For the purpose of gradually and serially training residual-adversarial cascade decoders-based auto-encoders, the optimization problem in Equation (11) can be divided into the following suboptimization problems:

(θ_1; z; r_1; y_1) = argmin_{E; D_1} max_{DC_1} α_1 M(ln(DC_1(x) + ε)) + β_1 M(ln(1 − DC_1(y_1) + ε)) s.t. z = E(x); r_1 = D_1(z); C_z; C_{y_1}; y_0 = 0; y_1 = r_1 + y_0
· · ·
(θ_N; r_N; y_N; y) = argmin_{D_N} max_{DC_N} α_N M(ln(DC_N(x) + ε)) + β_N M(ln(1 − DC_N(y_N) + ε)) s.t. r_N = D_N(y_{N−1}); C_{y_N}; y_N = r_N + y_{N−1}; y = y_N (12)

Reminiscence of Classical Adversarial Auto-Encoders
The infrastructure of the classical adversarial auto-encoders is exhibited in Figure 7. The blue signal flow is for the training phase, and the green signal flow is for both phases of training and testing. AAE are the combination of auto-encoders and adversarial learning. Alireza Makhzani et al. proposed the AAE, utilized the encoder unit of auto-encoders as the generator and added an independent discriminator, employed adversarial learning in the latent space to let the hidden variable satisfy a given distribution, and finally achieved better performance in data reconstruction [11]. Compared with the classical auto-encoders, the AAE infrastructure holds an extra discriminator, which makes the output of the encoder maximally approach a given distribution. The infrastructure can be expressed by the following equations:

z = E(x); y = D(z)

where the terms are defined as follows: z_h is the variable related to z which satisfies a given distribution; DC is the discriminator.
The reconstruction data can be resolved by the following minimization-maximization problem:

(θ, z, y) = argmin_{E, D} max_{DC} (α M(ln(DC(z_h) + ε)) + β M(ln(1 − DC(z) + ε)) + γ‖y − x‖²) s.t. z = E(x); y = D(z); C_y

where α, β, and γ are constants, ε is a small positive constant, and M is the mean operator.

Proposed Cascade Decoders-Based Adversarial Auto-Encoders
The architecture of the proposed cascade decoders-based adversarial auto-encoders (CDAAE) is illustrated in Figure 8. The blue flow line represents the training phase, and the green flow line represents both phases of training and testing. Compared with the cascade decoders-based auto-encoders, the proposed architecture has an extra discriminator, which makes the output of the encoder maximally approximate a known distribution. The architecture can be described by the following formulas:

z = E(x); y_1 = D_1(z); y_n = D_n(y_{n−1}), n = 2, . . . , N; y = y_N

where the terms are defined as follows: z_h is the variable related to z which satisfies a given distribution; DC is the discriminator.

Remembrance of Classical Variational Auto-Encoders
The framework of classical variational auto-encoders is shown in Figure 9 [4]. The blue signal line denotes the training phase, and the green signal line denotes both phases of training and testing. It can be resolved by the following optimization problem:

(θ, z, y) = argmin_{θ, z, y} (α KL(p(z), q(z_h)) + β‖y − x‖²) s.t. z = E(x); y = D(z); C_y

where the terms are defined as follows: KL(·) is the Kullback-Leibler divergence; q(z_h) is the given distribution of z_h; p(z) is the distribution of z.

Proposed Cascade Decoders-Based Variational Auto-Encoders
The proposed infrastructure of cascade decoders-based variational auto-encoders is shown in Figure 10. The blue signal flow is for the training phase, and the green signal flow is for both phases of training and testing. It can be resolved by the following optimization problem:

(θ; z; y_1, · · · , y_N; y) = argmin_{θ; z; y_1, · · · , y_N; y} (α KL(p(z), q(z_h)) + ∑_{n=1}^{N} β_n‖y_n − x‖²) s.t. z = E(x); y_1 = D_1(z); y_n = D_n(y_{n−1}), n = 2, . . . , N; C_{y_n}, n = 1, . . . , N; y = y_N (19)

where the terms are defined as follows: KL(·) is the Kullback-Leibler divergence; q(z_h) is the given distribution of z_h; p(z) is the distribution of z. For the sake of gradually and serially training cascade decoders-based variational auto-encoders, the optimization problem in Equation (19) can be divided into suboptimization problems in the same progressive manner as Equation (5), with the encoder and decoder 1 trained first and the remaining decoders trained one at a time.
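The paper leaves p(z) and q(z_h) generic. In the common instantiation, assumed here only for illustration, where q(z_h) is a standard normal and the encoder outputs a diagonal Gaussian, the KL term has a closed form:

```python
import numpy as np

def kl_diag_gaussian_to_standard(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)) per sample:
    0.5 * sum(exp(log_var) + mu^2 - 1 - log_var).
    This is the usual VAE instantiation of the KL term when q(z_h)
    is a standard normal prior (an assumption, not the paper's choice)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

print(kl_diag_gaussian_to_standard(np.zeros(4), np.zeros(4)))  # 0.0 at the prior
```

The term vanishes exactly when the encoder distribution matches the prior and grows as the latent statistics drift away from it, which is what drives p(z) toward q(z_h) during training.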

Recollection of Classical Wasserstein Auto-Encoders
The classical Wasserstein auto-encoders can be resolved by the following optimization problem [15]:

(θ, z, y) = argmin_{θ, z, y} αW_z(p(z), q(z_h)) + βW_y(y, x) s.t. z = E(x); y = D(z); C_y (21)

where the variables are defined as follows: W_z is the regularizer between the distributions p(z) and q(z_h); W_y is the reconstruction cost.

Proposed Cascade Decoders-Based Wasserstein Auto-Encoders
The proposed cascade decoders-based Wasserstein auto-encoders can be resolved by the following optimization problem:

(θ; z; y_1, · · · , y_N; y) = argmin_{θ; z; y_1, · · · , y_N; y} (αW_z(p(z), q(z_h)) + ∑_{n=1}^{N} β_n W_y(y_n, x)) s.t. z = E(x); y_1 = D_1(z); y_n = D_n(y_{n−1}), n = 2, . . . , N; C_{y_n}, n = 1, . . . , N; y = y_N (22)


Pseudocodes of Cascade Decoders-Based Auto-Encoders
The pseudocode of the proposed cascade decoders-based auto-encoders is shown in Algorithm 1.
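Since Algorithm 1 itself is not reproduced here, the progressive schedule it implements can be sketched structurally as follows. The function `train_cdae_progressively` and its `train_stage` callback are hypothetical placeholders for the actual optimization step; this is a sketch of the training schedule, not the authors' pseudocode.

```python
def train_cdae_progressively(encoder, decoders, train_stage, data, epochs=5):
    """Sketch of the progressive schedule: stage 1 trains (encoder, decoder 1)
    on the stage-1 sub-problem; stage n > 1 trains decoder n alone while the
    previously trained units are kept fixed. `train_stage` is an assumed
    user-supplied routine that performs one optimization step."""
    trained = []
    for n, dec in enumerate(decoders, start=1):
        trainable = [encoder, dec] if n == 1 else [dec]
        for _ in range(epochs):
            for x in data:
                train_stage(n, trainable, frozen=trained, x=x)
        trained.append(dec)
    return encoder, decoders

# Tiny demonstration with a recording stub in place of a real training step.
calls = []
def record(n, trainable, frozen, x):
    calls.append((n, len(trainable), len(frozen)))

enc = object()
decs = [object(), object()]
train_cdae_progressively(enc, decs, record, data=[0, 1], epochs=1)
print(calls)  # [(1, 2, 0), (1, 2, 0), (2, 1, 1), (2, 1, 1)]
```

The recorded calls show the key property of progressive training: each stage optimizes a small unit (one decoder, plus the encoder only in stage 1) against a fixed prefix of the cascade.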

Experimental Data Sets
The purpose of the simulation experiments is to compare the data reconstruction performance of the proposed cascade decoders-based auto-encoders and the classical auto-encoders.
Four data sets are utilized to evaluate algorithm performance [29-32]. The mixed national institute of standards and technology (MNIST) data set has 10 classes of handwritten digit images [29]; the extended MNIST (EMNIST) data set holds 6 subcategories of handwritten digit and letter images [30]; the fashion-MNIST (FMNIST) data set contains 10 classes of fashion product images [31]; and the medical MNIST (MMNIST) data set contains 10 subcategories of medical images [32]. The image size is 28 × 28. All color images are converted into gray images. In order to reduce the computational load, small-resolution images and gray images are chosen. Certainly, if the computational capability is ensured, the proposed methods can be easily and directly utilized on large-resolution images, the components of color images, or their sub-patches. A large image can be divided into small patches; in traditional image compression methods, the size of an image patch for compression is 8 × 8, so the proposed methods can be used for each image block. In brief, a large image size will not degrade the performance of the proposed methods from the viewpoint of small image patches. For the convenience of training and testing deep neural networks, each pixel value is normalized from the range [0, 255] to the range [−1, +1] in the pre-processing phase, and is re-scaled back to the range [0, 255] in the post-processing phase. The numbers of classes and samples in the four data sets are enumerated in Table 3. The sample images of the four data sets are illustrated in Figure 11. From top to bottom, there are images of MNIST digits, EMNIST digits, EMNIST letters, FMNIST goods, and MMNIST breast, chest, derma, optical coherence tomography (OCT), axial organ, coronal organ, sagittal organ, pathology, pneumonia, and retina images.

Figure 11. Sample images of experimental data sets.
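The pre-processing and post-processing steps described above can be sketched as follows; the function names are illustrative, and the mapping assumes the Tanh output range [−1, +1] used by the decoders.

```python
import numpy as np

def preprocess(img_u8):
    """Map 8-bit pixels [0, 255] to [-1, 1], matching the Tanh output range."""
    return img_u8.astype(np.float64) / 127.5 - 1.0

def postprocess(y):
    """Map network output [-1, 1] back to 8-bit pixels [0, 255]."""
    return np.clip(np.round((y + 1.0) * 127.5), 0, 255).astype(np.uint8)

img = np.array([[0, 128, 255]], dtype=np.uint8)
print(postprocess(preprocess(img)))  # [[  0 128 255]]
```

The round trip is exact for 8-bit inputs, so the normalization itself introduces no reconstruction loss; any loss comes from the auto-encoder.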

Experimental Conditions
The experimental software platform is MATLAB 2020b on Windows 10 or Linux. For the small data sets (MNIST, FMNIST, and MMNIST), the experimental hardware platform is a laptop with a 2.6 GHz dual-core processor and 8 GB memory; for the large data set (EMNIST), it is a supercomputer with high-speed GPUs and substantial memory.
The components of auto-encoders are made up of fully-connected (FC) layers, leaky rectified linear unit (LRELU) layers, hyperbolic tangent (Tanh) layers, Sigmoid layers, etc.

In order to reduce the calculation complexity, the convolutional (CONV) layer is not utilized.
The composition of the encoder is shown in Figure 12, which consists of input, FC, LRELU, and hidden layers. The constitution of the decoder is illustrated in Figure 13, which consists of input, FC, LRELU, Tanh, and output layers. The input layer can be the hidden layer for the first decoder, and can be the output layer of the preceding decoder for the latter decoders. The dashed line shows the two situations.
The organization of the discriminator is demonstrated in Figure 14, which comprises input, FC, LRELU, Sigmoid, and output layers. The input layer can be the hidden layer and can be the output of each decoder. The dashed line indicates the two cases.
The deep learning parameters, such as image size, latent dimension, decoder number, batch size, learning rate, and iteration epoch, are summarized in Table 4.

Experimental Results
The experimental results of the proposed and classical algorithms on the MNIST, EMNIST, FMNIST, and MMNIST data sets are respectively shown in Tables 5-8. SSIM is the average structural similarity between the reconstructed images and the original images. ∆SSIM is the average SSIM difference between the proposed approaches and the conventional AE approach. The experimental results are also displayed in Figure 15, where the horizontal axis lists the data sets and the vertical axis shows ∆SSIM.
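As a rough illustration of how these quantities are aggregated, the sketch below computes a single-window SSIM statistic over whole images (the standard metric averages this statistic over local sliding windows, which the experiments presumably use) and a ∆SSIM between two sets of reconstructions. All function names and data here are hypothetical.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    # Single-window SSIM over the whole image; the usual metric averages
    # this statistic over local sliding windows instead.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def delta_ssim(recons_proposed, recons_ae, originals):
    # Average SSIM of each method against the originals, then difference.
    s_prop = np.mean([global_ssim(r, o) for r, o in zip(recons_proposed, originals)])
    s_ae = np.mean([global_ssim(r, o) for r, o in zip(recons_ae, originals)])
    return s_prop - s_ae
```

A positive ∆SSIM indicates that the proposed reconstructions are, on average, structurally closer to the originals than the AE baseline.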
It can be found in Tables 5-8 and Figure 15 that the proposed methods, except for ACDAE and ACDAAE, are superior to the classical AE and AAE methods in the performance of image reconstruction. This confirms the correctness and effectiveness of the proposed cascade decoders-based auto-encoders for image reconstruction.
It can also be discovered in Tables 5-8 and Figure 15 that the proposed RCDAE and RACDAAE algorithms achieve the best recovery performance across nearly all four data sets. Hence, residual learning is very suitable for image recovery. This is owing to the fact that the residual has a smaller mean and variance than the original image, which makes it easier for the deep neural network to learn the relationship between input and output.
It can further be observed in Tables 5-8 and Figure 15 that the proposed ACDAE and ACDAAE algorithms yield some negative ∆SSIM values across the four data sets. Thus, in line with the predictions of this paper, pure adversarial learning is unsuitable for image re-establishment. However, a combination of residual learning and adversarial learning, such as the aforementioned RACDAAE, can obtain high re-establishment performance.
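The claim about the residual's statistics can be illustrated numerically. The numpy sketch below uses synthetic stand-ins for the original images and a first-stage reconstruction; the noise level and the `0.9` residual-prediction factor are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: "originals" in [0, 1] and an imperfect first-stage output.
x = rng.random((1000, 784))
x1 = x + rng.normal(0, 0.05, x.shape)   # first decoder's reconstruction

residual = x - x1                        # target of the residual decoder

# The residual is centred near zero with far smaller variance than the images,
# which is an easier target for the next decoder to learn.
assert abs(residual.mean()) < abs(x.mean())
assert residual.var() < x.var()

# Cascade output: first-stage estimate plus the predicted residual.
r_hat = 0.9 * residual                   # hypothetical imperfect residual prediction
x2 = x1 + r_hat
assert np.abs(x2 - x).mean() < np.abs(x1 - x).mean()
```

Even with an imperfect residual prediction, the second stage shrinks the remaining error of the first stage, which is the mechanism the residual cascade decoders exploit.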
It can additionally be found in Tables 5-8 and Figure 15 that the AAE algorithm yields some negative ∆SSIM values across the four data sets. Therefore, consistent with the analysis of this article, pure AAE cannot outperform AE in image reconstitution. Nevertheless, a combination of residual learning and adversarial learning, such as the aforementioned RACDAAE, can produce high reconstitution performance.
Finally, it can be found in Tables 5-8 and Figure 15 that the SSIM differences for MMNIST-axial and MMNIST-sagittal are substantially higher than those for the other data sets. The reason may be that their training and testing samples are more similar than those of the other data sets.
In order to clearly compare the reconstruction performance between the proposed algorithms and the classical algorithms, the reconstruction images are illustrated in Figures 16-25.
Since the proposed RCDAE algorithm achieves the best performance, it is taken as an example. The recovery images on the MNIST data set are shown in Figure 16. For each subfigure in Figure 16, the top row shows the original images, the middle row shows the recovery images of AE, and the bottom row shows the recovery images of RCDAE. It is not easy to find the SSIM differences between AE and RCDAE in Figure 16. Therefore, the marked re-establishment images on the MNIST data set are illustrated in Figure 17. The left column shows the original images, the middle column shows the re-establishment images of AE, and the right column shows the re-establishment images of RCDAE. It is easy to notice the SSIM differences between AE and RCDAE in the red marked squares in Figure 17.
Similarly, the reconstitution images on the EMNIST data set (big letters) are demonstrated in Figure 18; the marked reconstitution images on the EMNIST data set (big letters) are demonstrated in Figure 19. The rebuilding images on the EMNIST data set (small letters) are displayed in Figure 20; the marked rebuilding images on the EMNIST data set (small letters) are displayed in Figure 21. The reconstruction images on the FMNIST data set are shown in Figure 22; the marked reconstruction images on the FMNIST data set are shown in Figure 23. The recovery images on the MMNIST data set are displayed in Figure 24; the marked recovery images on the MMNIST data set are displayed in Figure 25.
It is revealed from Figures 16-25 that the proposed algorithms achieve significant improvements in re-establishment performance on the MNIST and EMNIST data sets. It is also manifest in Figures 16-25 that the proposed methods only obtain modest improvements in re-establishment performance on the FMNIST and MMNIST data sets.
For instance, in the first row of Figure 25, the difference between the proposed and classical methods can only be found after enlarging the images; in the eighth row of Figure 25, conspicuous differences cannot be found even after enlarging the images. Nevertheless, both are true experimental results, which should be accepted and explained. The lack of differences in these results is attributed to four reasons. The first is that the quality of the original images is low on the FMNIST and MMNIST data sets. The second is that only the illumination component of the original color images on part of the MMNIST data sets is retained; the reconstruction performance would improve if the original color images were utilized. The third is that the dimension of the latent space is 30, a very low choice compared with 784 (28 × 28), the dimension of the original images. The fourth is that the convolutional layer is not utilized in the architecture of the auto-encoders, in order to decrease the computational load. The convolutional layer can effectively extract image features and reconstruct the original image, and is expected to further improve the reconstruction performance of the proposed approaches; this will be investigated in our future research.
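To put the third reason in numbers, a latent dimension of 30 against 784 input pixels amounts to a severe bottleneck:

```python
# Latent dimension 30 versus 784 input pixels: the bottleneck keeps
# under 4% of the original dimensionality.
IMG_DIM, LATENT_DIM = 28 * 28, 30
print(f"{IMG_DIM / LATENT_DIM:.1f}x dimensionality reduction")  # about 26.1x
print(f"{100 * LATENT_DIM / IMG_DIM:.1f}% of input kept")       # about 3.8%
```

With so little capacity in the hidden layer, some loss of fine image detail at reconstruction time is unavoidable regardless of the decoder cascade.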

Conclusions
This paper proposes cascade decoders-based auto-encoders for image reconstruction. They comprise the architectures of multi-level decoders and the related optimization problems and training algorithms. This article concentrates on the classical AE and AAE, as well as their serial decoders-based versions. Residual learning and adversarial learning are incorporated into the proposed approaches. The effectiveness of cascade decoders for image reconstruction is demonstrated mathematically. Experimental results on four open data sets show that the proposed cascade decoders-based auto-encoders are superior to classical auto-encoders in the performance of image reconstruction. In particular, residual learning is well suited for image reconstruction.
In our future research, experiments on data sets with higher-resolution images and color images will be conducted. Experiments on other advanced auto-encoders, such as VAE and WAE, will also be explored. The convolutional layer or transformer layer will be introduced into the proposed algorithms. Constraints on high-dimensional reconstruction data, such as sparse and low-rank priors, will be utilized to advance the reconstruction performance of auto-encoders. Generalized auto-encoders-based data compression and signal-compressed sensing will also be investigated, and auto-encoders-based lossless reconstruction will be studied further.

Patents
The patent with application number CN202110934815.7 and publication number CN113642709A results from the research reported in this manuscript.