Proceeding Paper

Classification and Uncertainty Quantification of Corrupted Data Using Supervised Autoencoders †

by Philipp Joppich 1,2,*, Sebastian Dorn 2,*, Oliver De Candido 1, Jakob Knollmüller 3,4 and Wolfgang Utschick 1
1 School of Computation, Information and Technology, Technical University of Munich, Arcisstr. 21, 80333 Munich, Germany
2 AUDI AG, Auto-Union-Str. 1, 85057 Ingolstadt, Germany
3 School of Natural Sciences, Technical University of Munich, James-Franck-Str. 1, 85748 Garching, Germany
4 Excellence Cluster ORIGINS, Boltzmannstr. 2, 85748 Garching, Germany
* Authors to whom correspondence should be addressed.
† Presented at the 41st International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Paris, France, 18–22 July 2022.
Phys. Sci. Forum 2022, 5(1), 12; https://doi.org/10.3390/psf2022005012
Published: 7 November 2022

Abstract

Parametric and non-parametric classifiers often have to deal with real-world data, where corruptions such as noise, occlusions, and blur are unavoidable. We present a probabilistic approach to classify strongly corrupted data and quantify uncertainty, even though the corrupted data do not have to be included in the training data. A supervised autoencoder is the underlying architecture. We used the decoding part as a generative model for realistic data and extended it by convolutions, masking, and additive Gaussian noise to describe imperfections. This constitutes a statistical inference task in terms of the optimal latent space activations of the underlying uncorrupted datum. We solved this problem approximately with Metric Gaussian Variational Inference (MGVI). The supervision of the autoencoder’s latent space allowed us to classify corrupted data directly under uncertainty with the statistically inferred latent space activations. We show that the derived model uncertainty can be used as a statistical “lie detector” of the classification. Independent of that, the generative model can optimally restore the corrupted datum by decoding the inferred latent space activations.

1. Introduction and Motivation

Many real-world applications of data-driven classifiers, e.g., neural networks, involve corruptions that pose significant challenges to the pretrained classifiers. Conventional approaches often require the corruption to be included in the training data and, thus, to be known already during training. For instance, noise (e.g., due to sensor imperfections) and convolutions (e.g., due to lens flares or unfocused images) are inevitable in image processing systems and may occur spontaneously and irregularly. The same holds for masking, which may occur when a foreign object occludes the actual object of interest (e.g., water droplets, dirt, or scratches on the camera lens). Hence, we aimed to answer the following question in this paper: How can we classify corrupted data with a parametric classifier without imposing any constraints on the training data? As classifying corrupted data naturally demands a measure of uncertainty for validation, we included both the model uncertainty $\delta_m$ and the reconstruction uncertainty $\delta_r$ in the classification. We refer to $\delta_m$ as the model’s confidence in the classification itself. In contrast, we refer to $\delta_r$ as the confidence in the process of reconstructing the latent space activations given some corrupted datum. An overview of the proposed method is illustrated in Figure 1.

2. Classification and Uncertainty Quantification of Corrupted Data

2.1. Methodology Overview and Related Work

To address the challenge of classification and uncertainty quantification of corrupted data, we propose the following core approach, illustrated in Figure 2.
① In the first step, we trained a supervised autoencoder [1] that is: (a) capable of classifying the input data with its latent space activations $h$ and (b) capable of decoding the (supervised) latent space activations to generate higher-dimensional data that should be identical to the input. Apart from these two constraints ((a) and (b)), we did not impose any further restrictions on the autoencoder.
② In the second step, we decoupled the decoder $g$ from the autoencoder and treated it as a fixed generative function $g$. In the following steps, $g$ is neither retrained nor otherwise modified.
③ In the third step, we included $g$ in an Additive White Gaussian Noise (AWGN) channel model $d = m\,C\,g(h) + n$. In addition to the noise $n$, this AWGN channel model involves heavy corruption such as convolution $C$ and masking $m$.
④ In the final step, we approximated the posterior probability distribution $P(h|d)$ in the latent space and derived its mean and standard deviation, which correspond optimally to some uncorrupted datum $g(h)$, given the corrupted datum $d$. Due to the supervision in the latent space, this reconstruction enables a direct classification of $d$, including model and reconstruction uncertainty quantification, even though the decoding function was trained on uncorrupted data. We used a set of samples $H$ from the approximate posterior probability distribution to determine the sample mean $\mathrm{mean}(H) = \bar{H}$, as well as the reconstruction uncertainty $\delta_r$, given by the sample standard deviation $\mathrm{std}(H)$. The samples are statistically inferred by Metric Gaussian Variational Inference (MGVI) [2]. In addition to the reconstruction uncertainty $\delta_r$, we determined the model uncertainty $\delta_m$ by calculating the Mahalanobis distance (M-distance) in the latent space representation, slightly differently from [3]. Here, we distinguish between $\delta_r$ and $\delta_m$ to evaluate the confidence in the process of inferring $h$ and the confidence in the classification given by the supervised latent space, respectively.
Similar to our approach, references [4,5] showed that the reconstruction of the latent space by posterior inference and by using generative models [6,7,8] for a corrupted datum can lead to an optimal image restoration with uncertainty quantification. These methods do not, however, focus on classifying the corrupted datum in the latent space, nor do they use supervised autoencoder structures. Several methods exist for quantifying the uncertainty of classifications. Predominantly, Bayesian Neural Networks (BNNs) [9] and Monte Carlo dropout (MC-dropout) [10] have recently shown success. More recently, Evidential Deep Learning (EDL) [11] was introduced as another probabilistic method to quantify classification uncertainty. The latter two methods are compared to our method in Section 3. Finally, various methods to perform image restoration exist in the literature, such as the well-known denoising autoencoder [12]. These conventional methods require prior knowledge of the corruption, which has to be included in the training data.

2.2. Generative Model and Bayesian Inference with Neural Networks

The first step of our method is to train a supervised autoencoder. The autoencoder involves the encoding function $f$ (mapping data $x \in \mathbb{R}^p$ to the latent space representation with activations $h \in \mathbb{R}^z$, $z \in \mathbb{N}$), as well as the decoding function $g$ (mapping $h$ to the data space representation $\hat{x} \in \mathbb{R}^p$, $p \in \mathbb{N}$, $p > z$). The parameters of $f: \mathbb{R}^p \to \mathbb{R}^z$ and $g: \mathbb{R}^z \to \mathbb{R}^p$ are optimized via a combination of two loss terms, $\mathcal{L}_{g \circ f}$ (representing the reconstruction loss in the data space) and $\mathcal{L}_f$ (representing the classification loss in the latent space):
$$\mathcal{L}_{\mathrm{SAE}} = \mathcal{L}_{g \circ f}\big(g(f(x)), x\big) + \mathcal{L}_f\big(f(x)_j, y\big) = \mathcal{L}_{g \circ f}(\hat{x}, x) + \mathcal{L}_f(h_j, y), \tag{1}$$
where $j$ denotes the number of activations in the latent space $h$ that are supervised, i.e., $h_j = [h_1, \ldots, h_j]$. After normalizing all data samples to the range $[0, 1]$, we used the corresponding cross-entropy for each respective loss term to penalize false classifications in the latent space and inaccurate reconstructions in the data space. Note that the loss term $\mathcal{L}_f(h_j, y)$ processes the latent space activations $h_j$ with the softmax function (i.e., $\mathcal{L}_{\mathrm{SAE}} = \mathcal{L}_{g \circ f}(\hat{x}, x) + \mathcal{L}_f(\mathrm{softmax}(h_j), y)$). The output of the softmax function lies in the range $[0, 1]$ and can therefore be penalized against one-hot-encoded labels $y$. The softmax function is not included as an activation function in our neural network, where the latent space $h$ is activated linearly; see Section 3 for details. We minimized the general loss function of Equation (1) using the Adam optimizer [13] (the encoding function $f$ reached a test accuracy of 98.6% on MNIST and 89.4% on Fashion-MNIST). Once the training procedure converged, we decoupled the decoding function $g$ from the autoencoder. Without loss of generality, we then used an AWGN model including the nonlinearity $g(h)$, which additionally involves masking $m$ and convolution $C$ on $g$:
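As a minimal sketch of Equation (1), the combined loss could be evaluated as follows. It assumes a pixel-wise binary cross-entropy for the reconstruction term (the text only states "the corresponding cross-entropy") and uses plain NumPy arrays in place of a trained network. All function names are illustrative and are not taken from the authors' repository.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reconstruction_loss(x_hat, x, eps=1e-12):
    """Assumed form of L_gf: binary cross-entropy, summed over pixels, averaged over the batch."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    per_sample = -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat), axis=-1)
    return float(np.mean(per_sample))

def classification_loss(h_j, y_onehot, eps=1e-12):
    """L_f: cross-entropy between softmax(h_j) and one-hot labels y."""
    p = np.clip(softmax(h_j), eps, 1.0)
    return float(np.mean(-np.sum(y_onehot * np.log(p), axis=-1)))

def sae_loss(x, x_hat, h, y_onehot, j):
    """L_SAE = L_gf(x_hat, x) + L_f(softmax(h_j), y), with h_j the first j latent activations."""
    return reconstruction_loss(x_hat, x) + classification_loss(h[:, :j], y_onehot)
```

Applying the softmax only inside the loss, as sketched here, keeps the latent activations themselves linear, which matches the statement that the softmax is not part of the network.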
$$d = m\,C\,g(h) + n. \tag{2}$$
Additive white Gaussian noise, $n \in \mathbb{R}^p$ with $n \sim \mathcal{N}(0, \Sigma_n)$, is applied to the decoded latent space signal $g(h)$, which yields the corrupted data $d \in \mathbb{R}^p$. Note that, in the implementation, the reparametrization trick [14] is applied, i.e., $h = A\xi + \mu_h$ (with $\Sigma_h = \mathrm{cov}(f(X_{\mathrm{Val}}))$, $\Sigma_h = AA^T$, $\Sigma_h \in \mathbb{R}^{z \times z}$, $\xi \sim \mathcal{N}(0, I)$, $\xi \in \mathbb{R}^z$, $\mu_h = \mathrm{mean}(f(X_{\mathrm{Val}}))$, $\mu_h \in \mathbb{R}^z$). In addition to AWGN, we included the corruptions of masking $m$ and convolution $C$, which are both linear operations.
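The channel model and the reparametrization can be sketched in a few lines. This is only an illustration under stated assumptions: the convolution $C$ is represented as a $p \times p$ matrix, the mask $m$ as a binary vector applied element-wise, $A$ is taken as the Cholesky factor of $\Sigma_h$ (any $A$ with $AA^T = \Sigma_h$ would do), and the sigmoid decoder `g` is a hypothetical stand-in for the trained decoder.

```python
import numpy as np

def sample_latent(mu_h, Sigma_h, rng):
    """Draw h = A xi + mu_h with Sigma_h = A A^T and xi ~ N(0, I)."""
    A = np.linalg.cholesky(Sigma_h)           # one valid choice of A
    xi = rng.standard_normal(mu_h.shape[0])
    return A @ xi + mu_h

def channel(h, g, C, m, Sigma_n, rng):
    """Corrupted datum d = m C g(h) + n with n ~ N(0, Sigma_n)."""
    clean = g(h)                               # decoded, uncorrupted datum
    n = rng.multivariate_normal(np.zeros(clean.shape[0]), Sigma_n)
    return m * (C @ clean) + n                 # m acts as a diagonal mask

# Toy usage with a hypothetical sigmoid decoder (z = 2, p = 4).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))
g = lambda h: 1.0 / (1.0 + np.exp(-(W @ h)))
mu_h = np.zeros(2)
Sigma_h = np.array([[2.0, 0.5], [0.5, 1.0]])
C = np.eye(4)                                  # identity: no blur
m = np.array([1.0, 1.0, 0.0, 0.0])             # last two entries occluded
h = sample_latent(mu_h, Sigma_h, rng)
d = channel(h, g, C, m, 0.01 * np.eye(4), rng)
```

Because both $m$ and $C$ are linear, they can be combined into a single response operator acting on $g(h)$; only the decoder itself is nonlinear.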
Since we are interested in reconstructing the latent space activation $h$ from $d$ alongside uncertainty quantification, the goal is to determine the posterior distribution $P(h|d) \propto P(d|h)\,P(h)$. With the Gaussian likelihood of Equation (2) and the Gaussian prior $h \sim \mathcal{N}(\mu_h, \Sigma_h)$, the log-probability distribution reads
$$\ln P(h|d) = -\frac{1}{2}\Big[\big(d - m\,C\,g(h)\big)^T \Sigma_n^{-1} \big(d - m\,C\,g(h)\big) + (h - \mu_h)^T \Sigma_h^{-1} (h - \mu_h)\Big] + \mathrm{const.}, \tag{3}$$
where $(\cdot)^T$ denotes the matrix transpose. Since we are ultimately interested in the analytically intractable mean of $h$, $\langle h \rangle_{P(h|d)} = \int h\, P(h|d)\, \mathrm{d}h$, we approximately determined the mean and the variance of $P(h|d)$ by applying MGVI. Similar to other variational inference methods [14,15], MGVI approximates the distribution by a simpler, but tractable distribution from within a variational family, $Q(h)$. The parameters of $Q(h)$, i.e., the mean $\eta$ and the covariance $\Delta$, are obtained by minimizing the Kullback-Leibler divergence between $Q(h)$ and the posterior, which is equivalent to maximizing the variational lower bound. The size of a full variational covariance scales quadratically with the number of latent variables. Taking this limitation into account, we employed MGVI, which locally approximates the target distribution using the inverse Fisher metric as an uncertainty estimate around the variational mean $\eta$, for which we optimize. The approximation is represented by an ensemble of samples $H = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\}$ with $\tilde{h} \in \mathbb{R}^z$, which we used for our analysis. Here, $\tilde{h}$ denotes an inferred sample. We call $\bar{H}$ the posterior mean and $\delta_r$ the posterior standard deviation, or the reconstruction uncertainty.
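To make the structure of the log-posterior concrete, the sketch below evaluates it up to the additive constant, assuming a Gaussian prior $h \sim \mathcal{N}(\mu_h, \Sigma_h)$ as implied by the reparametrization. It does not implement MGVI. Instead, as a sanity check, it uses the special case of a hypothetical linear decoder $g(h) = Wh$, for which the posterior is exactly Gaussian and available in closed form. All names are illustrative.

```python
import numpy as np

def log_posterior(h, d, g, C, m, Sigma_n_inv, mu_h, Sigma_h_inv):
    """Unnormalized ln P(h|d) for the channel model d = m C g(h) + n."""
    r = d - m * (C @ g(h))        # residual in the data space
    dh = h - mu_h                 # deviation from the prior mean
    return -0.5 * (r @ Sigma_n_inv @ r + dh @ Sigma_h_inv @ dh)

def linear_gaussian_posterior(d, W, C, m, Sigma_n_inv, mu_h, Sigma_h_inv):
    """Exact posterior mean and covariance for a linear decoder g(h) = W h."""
    R = m[:, None] * (C @ W)                          # overall linear response m C W
    precision = R.T @ Sigma_n_inv @ R + Sigma_h_inv   # posterior precision
    cov = np.linalg.inv(precision)
    mean = cov @ (R.T @ Sigma_n_inv @ d + Sigma_h_inv @ mu_h)
    return mean, cov
```

For the nonlinear decoder used in the paper, no such closed form exists, which is why an approximate scheme such as MGVI is needed; the quadratic form above is the quantity that such a scheme works with.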

2.3. Classification and Uncertainty Quantification

The supervision of the latent space allows us to classify the input $d$ in a straightforward manner by evaluating the sample mean and sample standard deviation of the set $H$. While the sample mean $\mathrm{mean}(H) = \bar{H}$ gives the most likely class, the sample standard deviation reflects the reconstruction uncertainty $\delta_r$ of the latent space posterior distribution. $\delta_r$ depends on the type and magnitude of the corruption, as well as on the prior probability distribution we included in the channel model (Equation (3)). We visualized this dependency with various experiments; see Figure 3.
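A minimal sketch of this classification step is given below. It assumes that the posterior samples are stacked row-wise in an array `H`, that the first `j` latent activations are the supervised ones, and that the class is read off as the largest supervised activation of the posterior mean. The function name is hypothetical.

```python
import numpy as np

def classify_from_samples(H, j):
    """H has shape (n_samples, z). Returns (class label, posterior mean, delta_r)."""
    H_bar = H.mean(axis=0)               # posterior mean in the latent space
    delta_r = H.std(axis=0, ddof=1)      # sample standard deviation per activation
    label = int(np.argmax(H_bar[:j]))    # largest supervised activation
    return label, H_bar, delta_r
```

Overlapping error bars $\bar{H} \pm \delta_r$ between the winning activation and its competitors indicate that the inferred class is not well constrained by the corrupted datum.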
Since we are additionally interested in the uncertainty of the model, $\delta_m$, we evaluated the M-distance of all samples in $H$ to every class conditional distribution in the latent space (see arrows in Figure 4). We initially determined the parameters of these class conditional distributions by passing the uncorrupted data samples from an independent (i.e., independent of training and testing) dataset $X_{\mathrm{Val}}$ (see Section 3) through the encoder $f$. We then determined the class conditional distribution closest to a single sample $\tilde{h}$, which corresponds to the most likely class. The absolute value of the M-distance to the closest class conditional distribution serves as a measure of the model uncertainty $\delta_m$. In this work, all class conditional distributions in the latent space were assumed to follow multivariate Gaussian distributions with covariance $\Sigma_i$ and mean $\mu_i$. This method is an implementation slightly different from [3], where it was shown that the Mahalanobis distance is not only an accurate classifier in this context, but also a reliable out-of-distribution detector reflecting the model uncertainty. Reference [3] used a tied covariance matrix, whereas our method uses an individual covariance matrix for each class conditional distribution.
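The M-distance classifier with individual class covariances can be sketched as follows, assuming that the encoded validation samples $f(X_{\mathrm{Val}})$ and their integer labels are available as NumPy arrays and that every class covariance is invertible. The helper names are illustrative.

```python
import numpy as np

def fit_class_gaussians(H_val, y_val, n_classes):
    """Estimate (mu_i, Sigma_i^{-1}) for each class from encoded validation data."""
    params = []
    for i in range(n_classes):
        H_i = H_val[y_val == i]
        mu_i = H_i.mean(axis=0)
        Sigma_i = np.cov(H_i, rowvar=False)     # individual covariance per class
        params.append((mu_i, np.linalg.inv(Sigma_i)))
    return params

def mahalanobis_classify(h, params):
    """Return (closest class, M-distance to that class) for one latent sample h."""
    dists = np.array([np.sqrt((h - mu) @ P @ (h - mu)) for mu, P in params])
    label = int(np.argmin(dists))
    return label, float(dists[label])
```

The returned distance plays the role of $\delta_m$: a sample that is far from every class cluster is flagged as uncertain even though it is still assigned to the nearest class.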

3. Experiments

To validate our method, we conducted several experiments on the MNIST [16] and the Fashion-MNIST [17] datasets (for implementation details and code, see https://github.com/pjoppich/corrupted_data_classification). We evaluated the performance on various corruption types and magnitudes and performed a comparison to MC-dropout [10] and EDL [11]. The following architecture was used for the supervised autoencoder (we used the same architecture for both datasets): A feedforward neural network was built with dimensions 784 {0} → 512 {1} → 256 {2} → 128 {3} → 10 {4} → 128 {5} → 256 {6} → 512 {7} → 784 {8}, where layers {0}–{2} and {4}–{7} use the SeLU activation function [18], layer {3} uses linear activations, and layer {8} uses sigmoid activations. Note that, in our case, for simplicity, the number of latent space dimensions $z$ is equal to the number of supervised classes $j$, although $j \le z$ holds generally. We split each dataset into three subsets, $X_{\mathrm{Train}}$ ($48 \cdot 10^3$ samples, used for training), $X_{\mathrm{Test}}$ ($10 \cdot 10^3$ samples, used for testing and experiments), and $X_{\mathrm{Val}}$ ($12 \cdot 10^3$ samples, used for determining $\Sigma_h$ and $\Sigma_{C_1}, \ldots, \Sigma_{C_K}$). We used the MGVI implementation of NIFTy [19] to perform the inference [3].

3.1. Classification

We visualize experiments (1)–(3) in Figure 3. In the first experiment (1), we used the proposed method to classify data from an independent test set of the MNIST dataset corrupted by different noise levels. We denote by $\alpha$ the noise level of $n$, with $n \sim \mathcal{N}(0, \alpha)$. We compared the accuracy of our method to the baseline of processing the corrupted data through the encoder of the pretrained autoencoder. We show that we significantly improved the accuracy of classifying corrupted data in comparison to the straightforward classification by $f(d)$. For the second experiment (2), we used the same data samples as for (1), with the exception that we additionally corrupted the data with window masking at a constant noise level of $\alpha = 0.1$. Again, we compared the accuracy of our method to the baseline of processing the same data samples through the encoder. In the third experiment (3), we corrupted the data by convolving them with a Gaussian blur kernel with a filter size of $7 \times 7$ and different magnitudes $\gamma$ at a constant noise level of $\alpha = 0.1$.
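The three corruption types of experiments (1)–(3) could be generated along the following lines. This is a sketch under assumptions that the text does not fix: $\alpha$ is treated as the standard deviation of the noise, $\gamma$ as the standard deviation of the Gaussian kernel, the blur uses zero padding at the image border, and the window position is chosen by the caller. The function names are hypothetical and may differ from the authors' code.

```python
import numpy as np

def gaussian_kernel(size=7, gamma=1.0):
    """Normalized size x size Gaussian blur kernel with width gamma."""
    ax = np.arange(size) - (size - 1) / 2.0
    k1d = np.exp(-0.5 * (ax / gamma) ** 2)
    k2d = np.outer(k1d, k1d)
    return k2d / k2d.sum()

def blur(img, kernel):
    """Same-size 2D filtering with zero padding (equals convolution for symmetric kernels)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def window_mask(shape, top, left, height, width):
    """Binary mask that is 0 inside the occluded window and 1 elsewhere."""
    m = np.ones(shape)
    m[top:top + height, left:left + width] = 0.0
    return m

def corrupt_image(img, alpha, rng, kernel=None, mask=None):
    """Apply blur, then masking, then additive Gaussian noise: d = m C x + n."""
    out = img.astype(float)
    if kernel is not None:
        out = blur(out, kernel)
    if mask is not None:
        out = mask * out
    return out + alpha * rng.standard_normal(img.shape)
```

The order of operations follows the channel model: the image is first convolved, then masked, and the noise is added last, so that masked pixels contain noise only.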
Experiments (1), (2), and (3) led to the following conclusions:
  • The reconstruction uncertainty $\delta_r^{\mathrm{True}}$ of correct classifications is approximately equal to the reconstruction uncertainty $\delta_r^{\mathrm{False}}$ of wrong classifications. This behavior indicates that the correctness of the classification does not influence the reconstruction uncertainty $\delta_r$, which is evidence that $\delta_r$ is independent of the model uncertainty $\delta_m$.
  • As opposed to $\delta_r$, the model uncertainty $\delta_m$ strongly depends on whether the corrupted datum is classified correctly: $\delta_m$ is significantly and consistently higher for false classifications than for true classifications. This characteristic sets the basis for a statistical “lie detector” of classification (see Section 3.2). Fields of application could be the validation of neural networks in, e.g., medical imaging and other safety-critical applications.
  • Classifying corrupted data through the decoder (see $\mathrm{ACC}_g$ in Figure 3) rather than the encoder (see $\mathrm{ACC}_f$), with a suitable channel model that accounts for the corruption, significantly improved the model’s accuracy without the necessity of retraining the autoencoder. The accuracy improved notably for high levels of all corruption types. Corruption by convolution had catastrophic consequences for classifying data in a straightforward manner through the encoder $f$, while this type of corruption seemed to have only a minor impact on our method.
  • Both uncertainties $\delta_r$ and $\delta_m$ rose with increasing levels of corruption.

3.2. Detection of False Classifications

Finally, in experiment (4) (see Figure 5), we validated the model uncertainty of our method by introducing the Uncertainty-based Receiver Operating Characteristics (U-ROC) curve of detecting false classifications with the M-distance. We evaluated the binary classification task of the two classes “The neural network correctly classifies a corrupted datum” (Positive Class) and “The neural network falsely classifies a corrupted datum” (Negative Class). Based on the model uncertainty of our method, we aimed to predict the two classes without further knowledge, providing the initially proposed “lie detector”. The U-ROC curve was built from the True Positive Rate and the False Positive Rate. We compared our U-ROC curve with the U-ROC curves of the MC-dropout method [10] and of EDL [11], feeding all methods with identical inputs corrupted by noise at $\alpha \in \{0.1, 0.5, 1.0\}$. We drew the following conclusions from experiment (4) (Figure 5):
  • Our method seemed to outperform MC-dropout and EDL in detecting false classifications given the same input data samples for $\alpha = 0.1$ and $\alpha = 0.5$. One reason for this might be that the M-distance serves as a reliable out-of-distribution detector, exploiting the inherent latent space structure of uncorrupted data as a reference, as opposed to MC-dropout and EDL. For $\alpha = 1.0$, both EDL and our method outperformed MC-dropout, while the Area Under the Curve (AUC) of EDL was largest. Here, it should be noted that EDL cannot classify the corrupted data at this noise level (accuracy: 8.9%), resulting in only a few samples to test the cases of True Positives and False Positives.
  • All three methods provided reliable results for detecting false classifications at low noise levels. The model uncertainty $\delta_m$ reflects the confidence of the classification, i.e., a high value of $\delta_m$ correlates empirically with a higher probability of false classification.
  • The U-ROC curve combined with the accuracy indicates that EDL seemed to overestimate uncertainties, leading to a very robust U-ROC curve for high noise levels, but simultaneously to a severe drop in accuracy in the presence of data corruption. We observed comparable results on Fashion-MNIST data.
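The construction of a U-ROC curve from uncertainty values can be sketched as follows. The sketch assumes one uncertainty value per test sample (e.g., $\delta_m$) and a Boolean flag stating whether that sample was classified correctly. A sample is predicted to belong to the Positive Class ("correctly classified") whenever its uncertainty does not exceed a threshold, and the threshold is swept over all observed values. The function name is illustrative, and both classes must be non-empty.

```python
import numpy as np

def u_roc(uncertainty, is_correct):
    """Return (FPR, TPR, AUC) for predicting 'correct classification' from low uncertainty."""
    u = np.asarray(uncertainty, dtype=float)
    pos = np.asarray(is_correct, dtype=bool)      # Positive Class: correct classification
    neg = ~pos                                    # Negative Class: false classification
    # Start below every value so that the curve begins at (0, 0).
    thresholds = np.concatenate(([-np.inf], np.unique(u)))
    tpr = np.array([(u[pos] <= t).sum() / pos.sum() for t in thresholds])
    fpr = np.array([(u[neg] <= t).sum() / neg.sum() for t in thresholds])
    # Trapezoidal rule for the area under the curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc
```

An AUC close to 1 means that low uncertainty almost always coincides with a correct classification, whereas an AUC of 0.5 means that the uncertainty carries no information about correctness.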

4. Requirements and Summary

Our proposed methodology to classify a corrupted datum d including uncertainty quantification requires the following inputs in addition to d :
  • $m$, $C$: Without loss of generality, we assumed corruption by masking and convolution, represented by $m$ and $C$ in the AWGN channel model of Equation (2). In real-world applications, $C$ can often be derived from the image processing system in use. Algorithms to detect possibly occluding objects (represented by the masking $m$) are given in, e.g., [20].
  • $\Sigma_n$: Noise covariance matrix. AWGN with $n \sim \mathcal{N}(0, \Sigma_n)$ is applied additively to the data. Among others, the methodology published in [21] enables the estimation of $\Sigma_n$ given noisy data $d$.
  • $\Sigma_h$: Sample covariance matrix of all (uncorrupted) latent space activations processed by the encoding function $f$. We assumed that an autoencoder can represent an inherent, lower-dimensional structure of the data in its latent space and that this lower-dimensional structure sufficiently follows a multivariate Gaussian probability distribution.
In summary, our approach was able to classify heavily corrupted data with parametric classifiers. The method does not require corrupted data for training. As we built our procedure on a probabilistic architecture, we quantified the classification and the model uncertainties, allowing for a reliable detection of false classifications. We see our method as a highly flexible and robust framework that can be applied to any generative neural network to improve performance on corrupted data significantly. If the generative neural network comes with a supervised encoded space, it can classify the data directly. We showed that the M-distance can independently be used to classify data. The limitations of our method are that the corruption type needs to be modeled and that the computational cost is higher than that of MC-dropout and EDL (mainly due to the approximation of the posterior probability distribution; Step 4 in Section 2.1).

Author Contributions

Conceptualization, P.J., S.D. and O.D.C.; methodology, P.J., S.D. and O.D.C.; software, P.J. and J.K.; validation, P.J. and S.D.; formal analysis, P.J., S.D. and O.D.C., J.K. and W.U.; investigation, P.J.; resources, n.a.; data curation, P.J.; writing—original draft preparation, P.J., S.D. and O.D.C.; writing—review and editing, P.J., S.D., O.D.C., J.K. and W.U.; visualization, P.J.; supervision, W.U.; project administration, S.D., O.D.C. and W.U.; funding acquisition, n.a. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Acknowledgments

The authors thank T.A. Enßlin for valuable discussions and for supporting the implementation of the Bayesian inference methods. Furthermore, we thank AUDI AG and the Technical University of Munich for providing an optimal research environment. Jakob Knollmüller acknowledges the financial support by the Excellence Cluster ORIGINS, which is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy—EXC-2094-390783311.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Le, L.; Patterson, A.; White, M. Supervised autoencoders: Improving generalization performance with unsupervised regularizers. In NIPS’18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, Canada, 3–8 December 2018; Curran Associates, Inc.: Red Hook, NY, USA, 2018; pp. 107–117.
  2. Knollmüller, J.; Enßlin, T. Metric Gaussian Variational Inference. arXiv 2020, arXiv:1901.11033.
  3. Lee, K.; Lee, K.; Lee, H.; Shin, J. A Simple Unified Framework for Detecting out-of-Distribution Samples and Adversarial Attacks. In NIPS’18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, Canada, 3–8 December 2018; Curran Associates, Inc.: Red Hook, NY, USA, 2018; pp. 7167–7177.
  4. Böhm, V.; Lanusse, F.; Seljak, U. Uncertainty Quantification with Generative Models. arXiv 2019, arXiv:1910.10046.
  5. Böhm, V.; Seljak, U. Probabilistic Auto-Encoder. arXiv 2020, arXiv:2006.05479.
  6. Adler, J.; Öktem, O. Deep Bayesian Inversion. arXiv 2018, arXiv:1811.05910.
  7. Seljak, U.; Yu, B. Posterior Inference Unchained with EL_2O. arXiv 2019, arXiv:1901.04454.
  8. Wu, G.; Domke, J.; Sanner, S. Conditional Inference in Pre-trained Variational Autoencoders via Cross-coding. arXiv 2018, arXiv:1805.07785.
  9. Neal, R.M. Bayesian Learning for Neural Networks; Springer: Berlin/Heidelberg, Germany, 1995.
  10. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Volume 48, pp. 1050–1059.
  11. Sensoy, M.; Kaplan, L.; Kandemir, M. Evidential Deep Learning to Quantify Classification Uncertainty. In NIPS’18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, Canada, 3–8 December 2018; Curran Associates, Inc.: Red Hook, NY, USA, 2018; pp. 3183–3193.
  12. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 1096–1103.
  13. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
  14. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014.
  15. Kucukelbir, A.; Tran, D.; Ranganath, R.; Gelman, A.; Blei, D.M. Automatic Differentiation Variational Inference. J. Mach. Learn. Res. 2017, 18, 430–474.
  16. LeCun, Y. The MNIST database of Handwritten Digits. 1998. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 6 June 2020).
  17. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747.
  18. Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-Normalizing Neural Networks. In NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 972–981.
  19. Selig, M.; Bell, M.R.; Junklewitz, H.; Oppermann, N.; Reinecke, M.; Greiner, M.; Pachajoa, C.; Enßlin, T.A. NIFTY–Numerical Information Field Theory-A versatile PYTHON library for signal inference. Astron. Astrophys. 2013, 554, A26.
  20. Li, B.; Hu, W.; Wu, T.; Zhu, S.C. Modeling occlusion by discriminative and-or structures. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 2560–2567.
  21. Liu, X.; Tanaka, M.; Okutomi, M. Noise level estimation using weak textured patches of a single noisy image. In Proceedings of the 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 665–668.
Figure 1. From left to right: Ground truth image $x$ in the data space, corrupted image $d$ in the data space (random masking $m$, Gaussian blur $C$, additive white Gaussian noise $n$), posterior mean $\bar{H}$ in the latent space with reconstruction uncertainty $\delta_r$, model uncertainty $\delta_m$, and the restored image $g(\bar{H})$ (decoded posterior mean) in the data space. We included the encoding of the uncorrupted data $f(x)$ (illustrated by the shaded white bars in the third column). Top row: data sample from the MNIST dataset (ground truth label: 4). Bottom row: data sample from the Fashion-MNIST dataset (ground truth label: 2 (pullover)). We can classify $d$ using the posterior mean $\bar{H}$ as the autoencoder’s latent space is supervised (note the highlighted max. activation responsible for classification). We are able to classify and quantify the model uncertainty $\delta_m$ with the Mahalanobis distance in the latent space (note the highlighted min. activation responsible for classification). For the Fashion-MNIST example, the strong overlap of the $1\sigma$ error bars of $\delta_r$ across different classes indicates that no reliable and confident classification is possible due to heavy corruption.
Figure 2. Concept visualization: steps involved in our methodology as described in Section 2.1.
Figure 3. Accuracy and uncertainty of classifications of data samples of the MNIST dataset at different noise levels (left column, (a)), different masking levels (middle column, (b)), and different convolution levels (right column, (c)) exploiting the supervised latent space structure (top row) with reconstruction uncertainty $\delta_r$ and the M-distance $\delta_m$ (bottom row) as classifying features. $\mathrm{ACC}_f$ serves as the baseline and is the accuracy of the plain encoding function $f$ classifying corrupted data. $\mathrm{ACC}_g$ corresponds to the accuracy of the method proposed in Section 2.1. Additionally, we distinguish between the uncertainties of correct classifications, $\delta_r^{\mathrm{True}}$ and $\delta_m^{\mathrm{True}}$, and of false classifications, $\delta_r^{\mathrm{False}}$ and $\delta_m^{\mathrm{False}}$. The plot was generated with 1000 test samples for each data point.
Figure 4. Illustration of the latent space structure of a supervised autoencoder and the M-distance as a classifier based on MNIST. For an arbitrary corrupted datum $d$, the inferred posterior mean $\bar{H}$ in the latent space is marked accordingly. To classify $\bar{H}$, the M-distance is computed for every cluster in the latent space to obtain $\delta_m^i$ for all ten classes. The shortest distance, $\mathrm{argmin}(\delta_m)$, serves as the classification. Samples shown in the figure are transformed to a two-dimensional subspace by principal component analysis. The first two principal components (PC-1, PC-2) are plotted.
Figure 5. U-ROC of the proposed identifier of false classifications for different noise levels $\alpha$ of our method in comparison with MC-dropout and with EDL. In this experiment, the formulation of, e.g., the “True Negative” case would be: based on the uncertainty value, the sample is correctly detected as a false classification, i.e., the “lie detector” works. Samples are taken from the MNIST dataset. Top left: corrupted datum at $\alpha = 0.1$. Bottom left: corrupted datum at $\alpha = 1.0$. The irregularity in the U-ROC curve of the dropout model is due to the stochastic nature of MC-dropout. Bottom right: Evaluation of the accuracy (ACC) for all given noise values $\alpha$ and the AUC for all U-ROCs.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
