An Information-Theoretic Perspective on Proper Quaternion Variational Autoencoders

Variational autoencoders are deep generative models that have recently received a great deal of attention due to their ability to model the latent distribution of any kind of input, such as images and audio signals, among others. A novel variational autoencoder in the quaternion domain H, namely the QVAE, has been recently proposed, leveraging the augmented second-order statistics of H-proper signals. In this paper, we analyze the QVAE from an information-theoretic perspective, studying the ability of the H-proper model to approximate improper distributions as well as the built-in H-proper ones, and the loss of entropy due to the improperness of the input signal. We conduct experiments on a substantial set of quaternion signals, for each of which the QVAE shows the ability to model the input distribution, while learning the improperness and increasing the entropy of the latent space. The proposed analysis proves that proper QVAEs can be employed with a good approximation even when the quaternion input data are improper.

Among such generative methods, variational autoencoders (VAEs) have been proven to perform stochastic variational inference and learning even for large datasets and intractable posterior distributions [13]. The main advantage of VAEs lies in their capability of learning smooth latent representations of the input data. This has led VAEs to be used in several fields of application, including high-quality image generation [14], speech enhancement [15], music style transfer [16], data augmentation [17], 3D scene generation [18], gesture similarity analysis [19], text generation [20] and sequential recommendation [21], among others.
Recent advances on VAEs focus both on theoretical aspects, such as the improvement of the stochastic inference approach [22], and on architectural aspects, such as the use of different types of latent variables to learn local and global structures [23], the definition of hierarchical schemes [24], or the use of a multi-agent generator [25]. Among the most recent VAE models, we focus on the quaternion-valued variational autoencoder (QVAE), which exploits the properties of quaternion algebra to improve performance, on the one hand, and to significantly reduce the overall number of network parameters [26], on the other.
As with any other complex-valued and hypercomplex-valued algebra, quaternion-valued learning can rely on the properties of the second-order statistics, which are fundamental to characterize the input data based on their correlation. A characterization of correlation is achieved in terms of covariance and pseudo-covariance. In particular, random variables and processes with a vanishing pseudo-covariance are called proper [49][50][51]. Properness is preserved under affine transformations. Moreover, it should be considered that the multivariate Gaussian density assumes a natural form only for proper random variables. Properness in the quaternion domain, also denoted as H-properness, is based on the vanishing of three different complementary covariance matrices [51]. In the case of possibly improper random variables, we need to take into account the quaternion conjugate and the augmented second-order statistics [51][52][53]. The degree of improperness of a quaternion random vector can be measured in several ways. An interesting approach is to use an improperness measure based on the Kullback-Leibler (KL) divergence between the augmented covariance matrix and its closest proper version in the KL sense [51]. Quaternion second-order statistics have been used in several applications, from independent component analysis to canonical transforms [54,55], and from linear to nonlinear adaptive filtering [56][57][58].
Recently, augmented quaternion second-order statistics have also been exploited for the first time for the development of a deep learning model, the QVAE [26]. Deep VAEs should model long-range correlations in data [59], and quaternion layers allow leveraging internal latent relations between input features. The QVAE also leads to a significant reduction in the number of parameters, which may benefit the decoder. A fully improper QVAE should be developed to completely exploit correlated multidimensional input data, such as color images. However, on the one hand, an improper QVAE may be too complex due to the sophisticated structure of the complete covariance matrix. On the other hand, proper QVAEs may represent a good tradeoff between complexity and performance, even when input data are improper. To this end, an information-theoretic analysis, thanks to its relation with improperness measures, may help in understanding how good such an approximation may be in terms of properness [60,61].
In order to understand and demystify the theoretical capabilities of generating data with a proper model when the input data are either proper or improper, in this paper we investigate the proper QVAE from an information-theoretic perspective. To this end, we show how the KL divergence establishes a relation between the loss function of the QVAE and the improperness measure. We exploit the KL-based measure to show the entropy loss due to the improperness of the quaternion random vector input to the QVAE, or equivalently, the mutual information between the quaternion random vector and its involutions over the three pure unit quaternions [51,52]. We show how the entropy of the input data plays an important role in the generation of improper signals. Furthermore, we illustrate how the improperness can be calibrated in the latent space in order to enhance its differential entropy. Simulation results on artificial signals prove the ability of the proper QVAE to be adopted with a good approximation for the generation of improper signals. Moreover, results show how the properness is preserved by the model, even though it is composed of several nonlinearities. The paper is organized as follows. In Section 2, we briefly introduce the main properties of quaternion algebra that are exploited in quaternion-valued processing. In Section 3, we analyze the quaternion properness and the augmented second-order statistics of quaternion-valued signals, which are then exploited to define the quaternion variational autoencoders. In particular, the proper QVAE is described in Section 4, while in Section 5 we define the relation between the loss function of the QVAE and the properness measures of the involved signals by using the quaternion-valued entropy. Experimental simulations are shown in Section 6, while final conclusions are drawn in Section 7.

Fundamental Properties of Quaternion Algebra
Quaternions are hypercomplex numbers consisting of four components, a scalar part and three imaginary ones, and can be regarded as elements of R^4. The quaternion domain H is an extension of the complex domain C. A quaternion is identified as

q = q_0 + q_1 î + q_2 ĵ + q_3 κ̂, (1)

in which q_0, q_1, q_2, q_3 are real coefficients and î, ĵ, κ̂ are imaginary units. A quaternion without the scalar part q_0 is named a pure quaternion and represents a vector in R^3, since the imaginary units play the role of an orthonormal basis, being î = (1, 0, 0), ĵ = (0, 1, 0), κ̂ = (0, 0, 1). The quaternions' peculiarity is the relation among the imaginary components, which comply with the following properties:

î^2 = ĵ^2 = κ̂^2 = î ĵ κ̂ = −1, î ĵ = κ̂, ĵ κ̂ = î, κ̂ î = ĵ, ĵ î = −κ̂, κ̂ ĵ = −î, î κ̂ = −ĵ. (2)

The fundamental properties of quaternion operations are then described below.

Addition and subtraction of two quaternions. Addition and subtraction of quaternions q and p are performed component-wise:

q ± p = (q_0 ± p_0) + (q_1 ± p_1) î + (q_2 ± p_2) ĵ + (q_3 ± p_3) κ̂. (3)

Scalar product of two quaternions. As for addition and subtraction, the scalar product is also an element-wise operation:

q · p = q_0 p_0 + q_1 p_1 + q_2 p_2 + q_3 p_3. (4)

Vector product of two quaternions. Due to the relations among the imaginary units in (2) and the non-commutativity of the vector product in the hypercomplex domain, a new kind of vector multiplication has to be introduced. The Hamilton product, denoted by ⊗, is at the core of quaternion neural networks and is described by the following equation:

q ⊗ p = (q_0 p_0 − q_1 p_1 − q_2 p_2 − q_3 p_3)
      + (q_0 p_1 + q_1 p_0 + q_2 p_3 − q_3 p_2) î
      + (q_0 p_2 − q_1 p_3 + q_2 p_0 + q_3 p_1) ĵ
      + (q_0 p_3 + q_1 p_2 − q_2 p_1 + q_3 p_0) κ̂. (5)

The Hamilton product can also be written in the concise form q ⊗ p = q_0 p_0 − q_v · p_v + q_0 p_v + p_0 q_v + q_v × p_v, where q_v = (q_1, q_2, q_3) and p_v = (p_1, p_2, p_3) denote the vector parts, from which it is straightforward to obtain the Hamilton product for pure quaternions by fixing q_0 = p_0 = 0:

q ⊗ p = −q_v · p_v + q_v × p_v. (6)

Conjugate of a quaternion. The conjugate of a quaternion is obtained by subtracting the imaginary units from the scalar part instead of adding them: q* = q_0 − q_1 î − q_2 ĵ − q_3 κ̂.

Norm of a quaternion. The norm of a quaternion is simply the Euclidean norm in R^4, thus |q| = √(q q*) = √(q_0^2 + q_1^2 + q_2^2 + q_3^2).

Quaternion polar form.
Quaternions admit an Euler representation; considering a quaternion q, an angle θ ∈ R and the pure unit quaternion ν = q_v/|q_v|, where q_v = (q_1, q_2, q_3) is the vector part of q, the polar form is:

q = |q| (cos(θ) + ν sin(θ)) = |q| e^{νθ},

in which cos(θ) = q_0/|q| and sin(θ) = |q_v|/|q|.

Quaternion involutions. A quaternion involution is generally defined as a mapping which is its own inverse, i.e., a self-inverse mapping. Quaternions have infinitely many involutions, which can be generalized by [62]:

q^ν = −ν q ν, (7)

in which q is the quaternion to be involved and ν is the pure unit axis of the involution. Interestingly, the scalar part of the quaternion is invariant to any involution, while the imaginary parts are reflected across the axis of the involution. Consequently, the conjugate of a quaternion is an involution, since the scalar part remains unchanged while the imaginary units are reflected through the axis. Together with the conjugate, there are three crucial involutions we have to consider, the perpendicular involutions:

q^î = −î q î = q_0 + q_1 î − q_2 ĵ − q_3 κ̂,
q^ĵ = −ĵ q ĵ = q_0 − q_1 î + q_2 ĵ − q_3 κ̂,
q^κ̂ = −κ̂ q κ̂ = q_0 − q_1 î − q_2 ĵ + q_3 κ̂. (8)

In the perpendicular involutions, the quaternion is involved across each of its imaginary units.
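As a quick numerical companion to the operations above, the following sketch implements the Hamilton product (5), the conjugate and the generic involution (7) in plain NumPy; the array-based quaternion representation is our own convention, not part of the original formulation.

```python
import numpy as np

# A quaternion is represented as a length-4 array [q0, q1, q2, q3].
def hamilton(q, p):
    """Hamilton product q ⊗ p, Equation (5)."""
    q0, q1, q2, q3 = q
    p0, p1, p2, p3 = p
    return np.array([
        q0*p0 - q1*p1 - q2*p2 - q3*p3,   # scalar part
        q0*p1 + q1*p0 + q2*p3 - q3*p2,   # i component
        q0*p2 - q1*p3 + q2*p0 + q3*p1,   # j component
        q0*p3 + q1*p2 - q2*p1 + q3*p0,   # k component
    ])

def conj(q):
    """Quaternion conjugate q*."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def involution(q, axis):
    """Involution q^ν = −ν q ν for a pure unit quaternion axis ν, Eq. (7)."""
    return -hamilton(hamilton(axis, q), axis)
```

With axis î, ĵ or κ̂, `involution` reproduces the perpendicular involutions (8): the scalar part is untouched and the two imaginary components orthogonal to the axis change sign.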

Statistics for Quaternion Random Signals
In order to understand the behavior of a signal, it is often crucial to study its statistics. A generic Gaussian signal can be completely described by its mean and covariance matrix. For quaternion random variables, the mean can be easily derived from (3), resulting in another quaternion:

µ_q = µ_{q_0} + µ_{q_1} î + µ_{q_2} ĵ + µ_{q_3} κ̂, (9)

in which µ_{q_δ}, δ = {0, 1, 2, 3}, is the average of each quaternion component. Conversely, deriving second-order statistics for quaternion signals requires more detailed computations due to the interactions among components. As previously demonstrated for complex numbers [49], the covariance matrix alone is not able to describe the complete second-order information, and the complementary covariance matrix needs to be considered too. Consequently, taking only the standard covariance matrix of a quaternion signal into account is not sufficient to cover the complete second-order statistics; thus, the information is augmented through three complementary covariance matrices [52]. In order to build this complete statistical description, we need to introduce the information brought by the perpendicular involutions q^î, q^ĵ, q^κ̂ introduced in (8). The matrices to be considered for a zero-mean quaternion random vector q are then:

C_qq = E[q q^H],  C_{qq^î} = E[q (q^î)^H],  C_{qq^ĵ} = E[q (q^ĵ)^H],  C_{qq^κ̂} = E[q (q^κ̂)^H], (10)

which are, respectively, the standard covariance matrix and the three complementary covariance matrices. Note that (·)^H is the conjugate transpose operator. These matrices are still quaternion-valued, so they comprise a real part and three imaginary parts. As an example, we analyze the first matrix C_qq. It is composed of the covariance matrices C_{q_δ q_γ} = E[q_δ q_γ^T] of the quaternion components q_0, q_1, q_2, q_3 as follows:

C_qq = (C_{q_0 q_0} + C_{q_1 q_1} + C_{q_2 q_2} + C_{q_3 q_3})
     + (C_{q_1 q_0} − C_{q_0 q_1} + C_{q_3 q_2} − C_{q_2 q_3}) î
     + (C_{q_2 q_0} − C_{q_0 q_2} + C_{q_1 q_3} − C_{q_3 q_1}) ĵ
     + (C_{q_3 q_0} − C_{q_0 q_3} + C_{q_2 q_1} − C_{q_1 q_2}) κ̂.

Similarly, the matrices C_{qq^î}, C_{qq^ĵ}, C_{qq^κ̂} can be determined and written in the above quaternion form. The composition and combination of the standard covariance matrix and of the complementary covariance matrices give rise to the augmented covariance matrix C̄_qq, which recovers the complete second-order information of the augmented quaternion vector q̄ = [q^T, (q^î)^T, (q^ĵ)^T, (q^κ̂)^T]^T [52].
The matrices in (10) have interesting properties: each real component of the complementary covariance matrices is symmetric, except the component corresponding to the involved imaginary unit, which is instead skew-symmetric. These characteristics are summarized by the Hermitian property, according to which C_{qq^î} is î-Hermitian, C_{qq^ĵ} is ĵ-Hermitian and C_{qq^κ̂} is κ̂-Hermitian. The derived augmented covariance matrix is C̄_qq = E[q̄ q̄^H], whose first block row collects C_qq and the three complementary covariance matrices, the remaining block rows being obtained by applying the corresponding involutions.
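To make the augmented statistics concrete, the sketch below estimates, for a scalar quaternion random variable, the covariance C_qq = E[q q*] and the three complementary covariances C_{qq^α} = E[q (q^α)*] of (10) from samples; the batched array representation and the sample sizes are illustrative assumptions, not part of the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def hamilton(q, p):
    # batched Hamilton product over samples, arrays of shape (M, 4)
    q0, q1, q2, q3 = q.T
    p0, p1, p2, p3 = p.T
    return np.stack([q0*p0 - q1*p1 - q2*p2 - q3*p3,
                     q0*p1 + q1*p0 + q2*p3 - q3*p2,
                     q0*p2 - q1*p3 + q2*p0 + q3*p1,
                     q0*p3 + q1*p2 - q2*p1 + q3*p0], axis=1)

def conj(q):
    return q * np.array([1., -1., -1., -1.])

def involution(q, flip):
    # perpendicular involution: flip the two components NOT on the axis
    s = np.ones(4); s[flip] = -1.0
    return q * s

def second_order_stats(q):
    # scalar quaternion random variable: C_qq = E[q q*] plus the three
    # complementary covariances C_{qq^alpha} = E[q (q^alpha)*] of Eq. (10)
    stats = {'qq': hamilton(q, conj(q)).mean(axis=0)}
    for name, flip in [('i', [2, 3]), ('j', [1, 3]), ('k', [1, 2])]:
        stats[name] = hamilton(q, conj(involution(q, flip))).mean(axis=0)
    return stats
```

For a variable with i.i.d. N(0, σ²) components, C_qq ≈ 4σ² (real) and the complementary covariances vanish; scaling the components unevenly makes them clearly nonzero.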
By means of the mean and the augmented covariance matrix, and following [52], it is now possible to define a generic quaternion Gaussian distribution p(q) = p(q, q^î, q^ĵ, q^κ̂), which is Gaussian if its components are jointly normal:

p(q) ∝ |C̄_qq|^{−1/2} exp( −(1/2) (q̄ − µ̄_q)^H C̄_qq^{−1} (q̄ − µ̄_q) ),

where the mean µ̄_q is the mean of the augmented quaternion q̄; the normalization constant, which depends on the dimension N of q, is given in [52].

H-Properness
In the previous subsection, we described the complete second-order information of a quaternion signal by means of the augmented covariance matrix, which comprises the covariance matrices of the quaternion with its perpendicular involutions. Due to the four degrees of freedom of quaternions, the H domain admits several properness definitions, which are employed to characterize the relation of q with q^î, q^ĵ, q^κ̂ in particular cases.
The lightest condition is described by R-properness, which holds for a signal that is uncorrelated with one of its perpendicular involutions, leading to the cancellation of one of the complementary covariance submatrices C_{qq^î}, C_{qq^ĵ}, C_{qq^κ̂}. A slightly harder constraint is brought by C-properness, which requires the nullification of two of the three complementary covariance submatrices. A composition of the two already defined properties gives rise to the strongest properness condition, H-properness. The latter holds when each complementary covariance matrix vanishes, that is, C_{qq^î} = C_{qq^ĵ} = C_{qq^κ̂} = 0. Moreover, according to [52], an H-proper variable has several other features, which are reported in Table 1. An H-proper signal is then uncorrelated with its perpendicular involutions and, according to the second property in Table 1, it also has uncorrelated components q_0, q_1, q_2, q_3 with equal variance σ^2. From these assumptions, we can derive the corresponding augmented covariance matrix for H-proper signals, which is symmetric and positive definite:

C̄_qq = 4σ^2 I. (14)

The structure of the matrix C̄_qq is invariant to any linear transformation v = T^H q, where T is a quaternion weight matrix [54], thus ensuring that quaternion properness is preserved by linear layers in neural networks.
As for improper signals in Section 3.1, a probability density function can be derived for H-proper signals too. In this framework, the distribution is described by the covariance matrix in (14), which is equal to 4σ^2 I, and by the mean of the quaternion components in (9):

p(q) = (2πσ^2)^{−2N} exp( −‖q − µ_q‖^2 / (2σ^2) ). (15)

In order to evaluate the properness of an augmented zero-mean quaternion random vector, a simple and fast coefficient was proposed in [32]. It indicates the correlation of the signal with its perpendicular involutions through the complementary covariance matrices. Note that the coefficient is bounded in [0, 1]: it is 0 in the case of H-proper signals, while it is equal to 1 for maximally improper ones. For each involution axis α ∈ {î, ĵ, κ̂}, it is defined as the ratio between the energy of the complementary covariance matrix and that of the covariance matrix:

I_α = ‖C_{qq^α}‖_F / ‖C_qq‖_F. (16)

In our experiments, since we are dealing with H-proper signals, we simply report the average of the three coefficients, denoted as I. Conversely, when evaluating R-proper or C-proper signals, each coefficient should be considered independently.
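Following the same per-axis ratio idea, the coefficient of Equation (16) can be estimated from samples as sketched below; the Frobenius-norm normalization is our reading of [32] and may differ in detail from the original definition.

```python
import numpy as np

rng = np.random.default_rng(0)

def hamilton(q, p):
    # batched Hamilton product over samples, arrays of shape (M, 4)
    q0, q1, q2, q3 = q.T
    p0, p1, p2, p3 = p.T
    return np.stack([q0*p0 - q1*p1 - q2*p2 - q3*p3,
                     q0*p1 + q1*p0 + q2*p3 - q3*p2,
                     q0*p2 - q1*p3 + q2*p0 + q3*p1,
                     q0*p3 + q1*p2 - q2*p1 + q3*p0], axis=1)

def improperness_coeffs(q):
    # I_alpha = ||C_{qq^alpha}|| / ||C_qq|| for alpha in {i, j, k}, Eq. (16)
    conj_mask = np.array([1., -1., -1., -1.])
    c_qq = hamilton(q, q * conj_mask).mean(axis=0)       # C_qq = E[q q*]
    coeffs = {}
    for name, flip in [('i', [2, 3]), ('j', [1, 3]), ('k', [1, 2])]:
        s = np.ones(4); s[flip] = -1.0                   # perpendicular involution
        c_comp = hamilton(q, (q * s) * conj_mask).mean(axis=0)
        coeffs[name] = np.linalg.norm(c_comp) / np.linalg.norm(c_qq)
    return coeffs
```

The average of the three values is the single coefficient I reported in the experiments: close to 0 for H-proper samples, well away from 0 when the component variances differ.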

The H-Proper Variational Autoencoder
Originally introduced in [26] (the implementation of the QVAE is available online at: https://github.com/eleGAN23/QVAE, accessed on 1 July 2021), the quaternion variational autoencoder (QVAE) is a generative model which learns and controls the latent distribution of an n-dimensional H-proper quaternion input. Once the distribution is learnt, the model is able to generate new samples from it, thanks to the probabilistic relation between the two quaternion spaces of the input and of the latent distribution.
Similar to the variational autoencoder (VAE) for real-valued distributions in [13], the QVAE grasps the mapping of the original quaternion input x ∈ H into the latent space described by a prior distribution p_θ(ε), with ε ∈ H. Then, it learns the conditional distribution p_θ(x|z) of the input with respect to the latent vector z ∈ H, that is, the prior distribution adjusted with the statistics learnt by the encoder. However, due to the intractability of the posterior distribution p_θ(z|x), the QVAE introduces an approximation q_φ(z|x) to encode the input into the latent vector, as in its real-valued counterpart [13]. The parameters of this recognition model (or inference model) are then optimized to match the true intractable posterior distribution, q_φ(z|x) ≈ p_θ(z|x). By employing this approximation, it is possible to lower bound the marginal log-likelihood with the so-called evidence lower bound (ELBO) [13] with respect to the parameters φ and θ [63]:

log p_θ(x) ≥ E_{q_φ(z|x)}[ log p_θ(x|z) ] − D_KL( q_φ(z|x) ‖ p_θ(ε) ),

where D_KL(·) is the KL divergence between the H-proper centered isotropic Gaussian variable p_θ(ε) ∼ N(0, C̄_εε) and the H-proper Gaussian q_φ(z|x) ∼ N(µ_z, C̄_zz), with C̄_zz taking the form of (14) and C̄_εε = 4σ^2 I. The KL divergence in the quaternion domain takes the following form:

D_KL( q_φ(z|x) ‖ p_θ(ε) ) = (1/2) [ tr(C̄_εε^{−1} C̄_zz) − 4N + log( |C̄_εε| / |C̄_zz| ) ]
                            + (1/2) µ̄_z^H C̄_εε^{−1} µ̄_z, (17)

where N is the dimension of the latent quaternion vector z. The optimization of the QVAE cost function goes through the minimization of the KL divergence (17), through which the QVAE forces the recognition model to be as close as possible to the prior distribution by operating on the statistics µ_z and C̄_zz.
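For intuition, the KL term of (17) can be evaluated numerically; the sketch below writes it in the real augmented domain against the isotropic prior N(0, 4σ²I), a simplified stand-in for the full quaternion expression rather than the paper's exact implementation.

```python
import numpy as np

def kl_to_isotropic_prior(mu, C, sigma2):
    # KL( N(mu, C) || N(0, 4*sigma2*I) ) for a d-dimensional Gaussian,
    # written in the real augmented domain (d = 4N)
    d = len(mu)
    prior_var = 4.0 * sigma2
    _, logdet_C = np.linalg.slogdet(C)
    trace_term = np.trace(C) / prior_var
    quad_term = mu @ mu / prior_var
    logdet_term = d * np.log(prior_var) - logdet_C
    return 0.5 * (trace_term + quad_term - d + logdet_term)
```

The divergence is zero exactly when the posterior matches the prior (µ = 0, C = 4σ²I) and grows as either the mean or the covariance departs from it, which is what drives the recognition model towards the H-proper prior during training.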

Kullback-Leibler Divergence as Entropy Loss Due to Improperness
The KL divergence in (17) can indeed be seen as a measure of the improperness of the recognition model. As defined in [51], the measure is described by:

P = min_{C̄ ∈ C} D_KL( N(0, C̄_zz) ‖ N(0, C̄) ), (18)

where C is the set of all the augmented covariance matrices respecting one of the quaternion propernesses introduced in Section 3.2. This measure estimates the distance of the recognition model q_φ(z|x), with augmented covariance matrix C̄_zz, from the H-proper isotropic Gaussian variable p_θ(ε) described by the augmented covariance matrix C̄_εε, under the assumption µ_z = 0. Since we are considering distributions with zero mean, the mean term in the second line of the KL Equation (17) vanishes. Furthermore, Equation (18) minimizes the divergence over the set C; thus, in the H-proper case, the matrix which minimizes (18) is the diagonal covariance matrix defined in (14). Employing these considerations, and after some computations, we can reduce the improperness measure to:

P = (1/2) log( (4σ^2)^{4N} / |C̄_zz| ). (19)

If the signal for which we want to measure the improperness, i.e., the recognition model q_φ(z|x), is H-proper, then the measure will be close to 0, since C̄_zz will be close to C̄_εε in the KL sense. Conversely, if the signal is improper, P will be far from 0.
By defining the improperness measure through the KL divergence, it is possible to leverage the relation between the divergence and the entropy and to explore the mutual information of the signal. Therefore, we can use the definition of the differential entropy for a generic quaternion vector [51,52] to compute the entropy of the recognition model q_φ(z|x):

H( q_φ(z|x) ) = (1/2) log( (πe/2)^{4N} |C̄_zz| ). (20)

The entropy expression can be simplified for isotropic H-proper variables such as the prior distribution p_θ(ε), by replacing the value of the determinant from (14), which is equal to (4σ^2)^{4N}:

H( p_θ(ε) ) = (1/2) log( (πe/2)^{4N} (4σ^2)^{4N} ) = 2N log( 2πeσ^2 ). (21)
In [52], the authors proved that the differential entropy is maximum for H-proper signals. Thus, (21) is an upper bound for the general entropy definition (20), hence: H( q_φ(z|x) ) ≤ H( p_θ(ε) ).
Then, according to [51], the improperness measure P can be rewritten as the difference between the entropy of the isotropic H-proper variable and the entropy of the generic augmented quaternion:

P = H( p_θ(ε) ) − H( q_φ(z|x) ). (22)

This formulation elegantly evaluates the loss of entropy that is due to the improperness of the recognition model. Therefore, we can measure the properness of the recognition model and compute the QVAE entropy loss due to the measure of properness (or improperness) of q_φ(z|x). Thus, when optimizing the QVAE cost function during training, together with the maximization of the ELBO, through the KL divergence we jointly minimize both the divergence between the Gaussian distributions p_θ(ε) and q_φ(z|x) and the entropy loss of the latter.
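Equations (20)-(22) can be checked numerically; the sketch below computes the differential entropy from a (real, SPD stand-in) augmented covariance matrix, and the entropy loss against a proper isotropic reference of matched power, which is an assumption on how the reference in (22) is chosen.

```python
import numpy as np

def diff_entropy(C_aug):
    # Eq. (20): H = 1/2 log( (pi*e/2)^d * det(C_aug) ), with d = 4N
    d = C_aug.shape[0]
    _, logdet = np.linalg.slogdet(C_aug)
    return 0.5 * (d * np.log(np.pi * np.e / 2.0) + logdet)

def entropy_loss(C_aug):
    # Eq. (22) with the proper reference taken at matched power:
    # isotropic covariance 4*sigma^2*I with 4*sigma^2 = tr(C_aug)/d
    d = C_aug.shape[0]
    power = np.trace(C_aug) / d
    h_proper = 0.5 * d * (np.log(np.pi * np.e / 2.0) + np.log(power))
    return h_proper - diff_entropy(C_aug)
```

For C̄ = 4σ²I the loss is exactly zero and the entropy reduces to (21); any correlation between components lowers the determinant at fixed power and makes the loss positive.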
Since the KL divergence, under this information-theoretic perspective, allows the QVAE to deal also with improper signals, the QVAE, although originally designed for H-proper distributions, can be employed by approximation also for improper random variables. Let us consider an incoming improper signal x ∈ H, so that both the recognition model q_φ(z|x) and the generative model p_θ(x|z) are represented by improper distributions too. We instead leave unchanged the prior distribution p_θ(ε), which is the centered isotropic H-proper distribution. During training, the KL divergence of the QVAE moves the improper distribution towards the H-proper prior distribution. While the distribution learning is well performed by the model, the KL takes care of increasing the entropy by slightly reducing the improperness. However, the overall original (improper) structure of the signal is preserved by the QVAE, which just aims at enhancing the entropy of the recognition model, while leaving the generative model free.

Experimental Results
In this section, we report the sets of experiments we performed to test the reliability of the improperness measures and the performance of the H-proper variational autoencoder in learning proper as well as improper distributions.

Proper and Improper Test Signals
We consider two proper signals and two improper signals, and we evaluate the improperness coefficient (16), the differential entropy (20), the improperness measure (18) and the entropy loss (22). We expect the proper signals to have improperness measures and coefficients close to 0, as well as a near-zero entropy loss. On the contrary, their entropy should be greater than that of the corresponding improper signals.
As input signals, we consider:

1. H-proper signal x with independent Gaussian components;
2. H-proper signal x obtained by filtering a colored Gaussian noise η[n], as in [31];
3. improper signal, denoted I1, built from a quaternion q with prescribed components as x = z e^q, with z ∼ N(0.03, 1);
4. improper signal, denoted I2, obtained by filtering a Gaussian noise η[n].

Mild improper signals (i.e., with I_α different from both 0 and 1) can be generated by varying the coefficient b in the improper signal I2. As an example, by setting b = [0.01, 0.2, 0.75, 0.99], it is possible to obtain mild-improper signals with 0.4 < I_α < 0.7. However, we want to stress the H-proper QVAE with completely improper signals, which are the extreme and most difficult cases. For this reason, we only consider improper signals with I_α = 1.
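As an illustration of the I1 construction x = z e^q, the snippet below draws z ∼ N(0.03, 1) and builds e^q through the quaternion exponential; since the exact components of q were not recoverable from the text, the Gaussian scales used for q here are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def qexp(q):
    # quaternion exponential: e^q = e^{q0} (cos|v| + (v/|v|) sin|v|),
    # with v = (q1, q2, q3) the vector part
    q0, v = q[0], q[1:]
    nv = np.linalg.norm(v)
    out = np.empty(4)
    out[0] = np.cos(nv)
    out[1:] = (np.sin(nv) / nv) * v if nv > 0 else 0.0
    return np.exp(q0) * out

# hypothetical component scales for q (the original definitions were lost)
M = 20000
q = np.stack([0.1 * rng.normal(size=M),      # scalar part
              rng.normal(size=M),            # i component (dominant)
              0.3 * rng.normal(size=M),      # j component
              0.05 * rng.normal(size=M)],    # k component
             axis=1)
z = rng.normal(0.03, 1.0, size=M)            # z ~ N(0.03, 1)
x = z[:, None] * np.stack([qexp(qi) for qi in q])
```

Because the vector part of q is strongly anisotropic, the resulting components of x have clearly unequal variances, i.e., the signal violates the equal-variance property of Table 1 and is improper.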
We compute the measures defined in Section 5 for each of these signals by using the Quaternion Toolbox for MATLAB (S. J. Sangwine and N. Le Bihan. Quaternion Toolbox for MATLAB, Version 2 with support for Octonions, 2013. First public release 2005. Software library available online at: http://qtfm.sourceforge.net/, accessed on 1 July 2021) and report the results in Table 2. As expected, the proper signals have low improperness coefficients, meaning that the complementary covariances are negligible, while the coefficients for the improper signals are close to 1. On the other hand, the differential entropy is low and negative for the improper signals, while it is highest for the proper ones, confirming that (21) is an upper bound for the generic differential entropy. Moreover, the entropy loss is almost 0 for the proper signals, confirming that there is no loss of entropy due to improperness. On the contrary, the improper signals come at the cost of a large entropy drop. Table 2. In order to validate the measures and the assumptions presented in Section 5, we compute the average improperness coefficient I (16), the differential entropy H (20), the improperness measure P (19) and the entropy loss P_H (22) for different kinds of H-proper and improper signals. The entropy H is highest for the H-proper signals, whose improperness measure and coefficients are close to 0. As is clear from P and P_H, there is a low entropy loss due to improperness in the first two signals, while it is high for the last two, improper, random vectors.

Proper and Improper Signals for an H-Proper QVAE
We consider a plain QVAE with a quaternion encoder network and a quaternion decoder network. The encoder is composed of a stack of five quaternion fully connected layers (QFC) with an increasing number of weights [32, 64, 128, 256, 512], as in [26,64], and split ReLU activation functions, as in the PixelHVAE in [65]. The statistics of the latent distribution are learnt through two additional quaternion linear layers with a number of weights equal to 4 times the latent dimension, which we fix to 25. The decoder has a mirrored structure, and thus quaternion fully connected layers are piled up as [512, 256, 128, 64, 32], with an additional refiner layer at the end of the stack. We do not include quaternion batch normalization [38,45], since it can introduce randomness which may affect the correct learning of the distribution statistics [24,66]. For every experiment, the prior distribution is a centered isotropic H-proper quaternion Gaussian distribution, as described in Section 4. We employ a training signal with 1000 samples, a validation set of 500 samples and a test set of 1000 samples, without mini-batches, and we perform 5k iterations of Adam with a learning rate of 0.0005. As in the original QVAE [26], the cost function is a weighted sum of the quaternion mean squared error defined in [37] and the KL divergence of Section 5, for which we fix the scale parameter λ to 0.1. Figure 1 shows the architecture of the QVAE we consider in our tests. Figure 1. Plain QVAE architecture. The input signal can be either proper or improper. The encoder network is composed of quaternion fully connected layers (QFC); it learns the statistics of the original distribution and employs them to build the latent representation, on which the KL divergence acts. The representation is then passed to the decoder network, which reconstructs the input signal, evaluating the result by means of a quaternion MSE loss (MSE_q).
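The parameter saving of QFC layers mentioned above can be sketched by counting weights: a quaternion fully connected layer stores one quaternion weight (4 real numbers) per pair of input/output quaternions, reusing it across the four components through the Hamilton product, i.e., a quarter of the weights of a real dense layer of the same width. Treating the listed sizes as layer widths in real units is our assumption.

```python
def fc_weights(n_in, n_out, quaternion=False):
    # weight-only count (biases omitted); widths are in real units
    if not quaternion:
        return n_in * n_out
    assert n_in % 4 == 0 and n_out % 4 == 0
    # one quaternion weight (4 real parameters) per input/output quaternion pair
    return (n_in // 4) * (n_out // 4) * 4

widths = [32, 64, 128, 256, 512]             # encoder stack of the experiments
real_total = sum(fc_weights(a, b) for a, b in zip(widths, widths[1:]))
quat_total = sum(fc_weights(a, b, quaternion=True) for a, b in zip(widths, widths[1:]))
```

This 4x reduction per layer is what yields the roughly 25% overall parameter count reported for the QVAE against its real-valued counterpart.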
For each experiment, we evaluate the performance of the model on the reconstruction and generation tasks, as well as on its ability to preserve the properness or the improperness of the incoming signal. In order to assess the generation ability of the model, in Figure 2 we plot the generated components and the complete signal sampled from the model trained on a proper signal and on an improper signal. While the components x_0, x_1, x_2, x_3 of the improper signal show different variances, the ones from the proper signal have the same variance, according to the properties in Table 1. Figure 3 reports instead the original and reconstructed distributions for the proper and improper case, in the first and in the second line, respectively, with an MSE equal to 0.1161 for the proper signal and to 0.01293 for the improper one. In the figure, the first four columns represent the four components x_0, x_1, x_2, x_3, while the last one displays the distribution of the complete signal x. The H-proper QVAE is able to accurately reconstruct the input distribution even in the improper case, grasping the different variances of each component and learning the complete signal. Thus, the QVAE is effective even when approximated for improper distributions, for which it is capable of learning the correct statistics of each component. Table 3 presents the results of the improperness and entropy analysis on the reconstructed and generated signals on the test set. As for the original samples in Table 2, we compute the improperness coefficient (16), the differential entropy (20), the improperness measure (18) and the entropy loss (22). It is worth noting that the QVAE trained on H-proper signals still generates and reconstructs H-proper signals, as measured by the improperness measures and the entropy loss. Moreover, the entropy of this kind of signal is higher than the entropy of the improper samples.
While we already knew that H-properness is preserved by linear transformations, this result asserts that it is maintained also under the nonlinear transformations brought by activation functions in neural networks, adding an important milestone in the study of H-proper signals.
Another interesting result emerges when the QVAE is trained on improper signals. While the improperness coefficients of both reconstructed and generated signals are close to the coefficients of their respective original signals in Table 2, indicating the improperness of these random variables, the entropy and entropy-loss values are intriguing. Indeed, the differential entropy is increased with respect to the input signal (originally, ≈−51 and ≈−45), approaching values from ≈−10 up to 0, meaning that the H-proper structure of the QVAE works towards an increase in entropy and a reduction of the improperness of the input signal. Confirming this, the entropy loss P_H is drastically reduced with respect to the original signal, ranging between approximately 8 and 13, while the entropy loss of the input is around 51.
Hence, the H-proper QVAE can be adopted as an approximation to learn improper distributions, on which it preserves the original structure of the signal while increasing the entropy of the latent space, at the cost of a limited improperness reduction. Despite this slight modification of the distribution, the QVAE is still able to catch the correct statistics and to perform good generation as well as precise reconstruction, as Figures 2 and 3 show.
To further assess the performance of the QVAE and provide a comparison with its real-valued counterpart, we test a real-valued VAE on the same signal benchmarks. We expect the real-valued VAE to perform similarly to the QVAE in the proper case, but the quaternion model to outperform the real-valued one when dealing with improper signals. Indeed, thanks to the quaternion algebra properties, the QVAE is able to learn intra-component correlations, while the VAE does not catch them. Thus, in cases with no correlation among components, i.e., proper signals, the two models should have similar performance. On the contrary, in the improper case, where the correlation among components is higher, the QVAE gains advantages from the quaternion algebra and obtains better performance. Results and details are reported in Appendix A. Moreover, thanks to the quaternion algebra properties, the QVAE has just 25% of the free parameters of its real-valued counterpart.

Conclusions
In this paper, we have conducted a thorough analysis of an H-proper quaternion-valued variational autoencoder (QVAE) from an information-theoretic perspective, investigating the entropy loss of the model due to the measure of improperness of the input signal. We have proved that the QVAE can be adopted as an approximation for improper signals, for which it increases the entropy of the latent space by slightly reducing the improperness. Moreover, it is worth noting that, in each of our experiments, the properness is preserved also under the nonlinear transformations brought by activation functions, and not only under simple quaternion linear operations. In conclusion, the QVAE is able to learn the correct distribution of both H-proper and improper random variables, while the measure of improperness, as well as the entropy of the latent space, can be calibrated through the KL divergence in the cost function.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Evaluation of the Real-Valued VAE on Proper and Improper Signals
In order to further assess the performance of the H-proper QVAE, we perform a set of experiments with a real-valued VAE on the same benchmarks. To make a fair comparison, the hyperparameters are the same as for the QVAE, while the architecture is built with real-valued blocks. Moreover, the KL divergence is the standard divergence in [13]. However, due to the algebra of the real domain, the VAE has 506,316 parameters, while the QVAE has just 127,980; thus, the proposed model in the quaternion domain has exactly 25% of the parameters of its real-valued counterpart. The QVAE is therefore able to learn the improperness of the input signal and to generate or reconstruct precise signals while requiring 1/4 of the parameters and memory. Moreover, the QVAE is able to accurately reconstruct the improper signal, as shown in Figure 3, while the real-valued VAE makes some errors. These results can be seen in Figures A1 and A2. As we expect, the QVAE outperforms the real-valued VAE when dealing with correlated signals, i.e., improper cases, since the proposed method leverages the quaternion algebra properties, including the Hamilton product, to learn intra-component relations, while the real-valued VAE is not able to catch them. Thus, while for proper signals the performances of VAE and QVAE are similar, in the improper cases the QVAE performs better thanks to the Hamilton product in its core blocks. Finally, the improperness measures and corresponding entropy losses for each of the signals considered are reported in Table A1. Table A1. Average improperness coefficient I (16), differential entropy H (20), improperness measure P (19) and entropy loss P_H (22) for different outputs of the real-valued VAE (Recons stands for the reconstructed signal, Sample for the generated one) with different kinds of H-proper and improper signals.
Like the QVAE, the VAE, which however requires four times as many free parameters, is able to learn the improperness of the input.
