Entropy

Research

14 pages, 1467 KB

Open Access Article

A Manifold Learning Perspective on Representation Learning: Learning Decoder and Representations without an Encoder

by Viktoria Schuster and Anders Krogh

Entropy 2021, 23(11), 1403; https://doi.org/10.3390/e23111403 - 25 Oct 2021

Cited by 6 | Viewed by 4304

Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward method to map n-dimensional data in input space to a lower m-dimensional representation space and back. The decoder itself defines an m-dimensional manifold in input space. Inspired by manifold learning, we showed that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and similarly for other loss functions. We derived expressions for the number of samples needed to specify the encoder and decoder and showed that the decoder generally requires far fewer training samples than the encoder to be well-specified. We discuss the training of autoencoders from this perspective and relate it to previous work in the field that uses noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrated that the decoder is much better suited to learning a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further showed that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented will also contribute to an improved conceptual understanding of representation learning.

(This article belongs to the Special Issue Representation Learning: Theory, Applications and Ethical Issues)
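
The idea in the abstract above, learning one representation per training sample together with the decoder weights by gradient descent on a sum-of-squares loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear decoder, the function name `train_decoder_only`, and the values of `steps`, `lr`, and the initialization scale are all assumptions chosen to keep the example small.

```python
import numpy as np


def train_decoder_only(X, m, steps=2000, lr=0.002, seed=0):
    """Jointly fit per-sample representations Z and a linear decoder W.

    Minimizes the sum-of-squares loss ||Z @ W - X||^2 by plain gradient
    descent. No encoder is involved: Z is a free parameter, one row per
    training sample. The linear decoder is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    n_samples, n = X.shape
    Z = rng.normal(scale=0.1, size=(n_samples, m))  # learnable representations
    W = rng.normal(scale=0.1, size=(m, n))          # decoder weights
    losses = []
    for _ in range(steps):
        R = Z @ W - X                      # reconstruction residuals
        losses.append(float((R ** 2).sum()))
        grad_Z = 2 * R @ W.T               # d(loss)/dZ
        grad_W = 2 * Z.T @ R               # d(loss)/dW
        Z -= lr * grad_Z
        W -= lr * grad_W
    return Z, W, losses


if __name__ == "__main__":
    # Toy data lying on a 2-dimensional linear manifold in 4-dimensional space.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 4))
    Z, W, losses = train_decoder_only(X, m=2)
    print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}")
```

With a nonlinear decoder the same loop applies, except that the gradients would typically come from an automatic differentiation library rather than the closed forms used here.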

21 pages, 1400 KB

Open Access Article

The Problem of Fairness in Synthetic Healthcare Data

by Karan Bhanot, Miao Qi, John S. Erickson, Isabelle Guyon and Kristin P. Bennett

Entropy 2021, 23(9), 1165; https://doi.org/10.3390/e23091165 - 4 Sep 2021

Cited by 82 | Viewed by 9038

Abstract

Access to healthcare data such as electronic health records (EHR) is often restricted by laws established to protect patient privacy. These restrictions hinder the reproducibility of existing results based on private healthcare data and also limit new research. Synthetically generated healthcare data solve this problem by preserving privacy and enabling researchers and policymakers to drive decisions and methods based on realistic data. Healthcare data can include information about multiple inpatient and outpatient visits, making it a time-series dataset that is often influenced by protected attributes such as age, gender, and race. The COVID-19 pandemic has exacerbated health inequities, with certain subgroups experiencing poorer outcomes and less access to healthcare. To combat these inequities, synthetic data must “fairly” represent diverse minority subgroups such that the conclusions drawn on synthetic data are correct and the results can be generalized to real data. In this article, we develop two fairness metrics for synthetic data and examine all subgroups defined by protected attributes to analyze the bias in three published synthetic research datasets. These covariate-level disparity metrics revealed that synthetic data may not be representative at the univariate and multivariate subgroup levels and thus fairness should be addressed when developing data generation methods. We discuss the need for measuring fairness in synthetic healthcare data to enable the development of robust machine learning models and to create more equitable synthetic healthcare datasets.

(This article belongs to the Special Issue Representation Learning: Theory, Applications and Ethical Issues)

21 pages, 1809 KB

Open Access Article

Occlusion-Based Explanations in Deep Recurrent Models for Biomedical Signals

by Michele Resta, Anna Monreale and Davide Bacciu

Entropy 2021, 23(8), 1064; https://doi.org/10.3390/e23081064 - 17 Aug 2021

Cited by 15 | Viewed by 5874

Abstract

The biomedical field is characterized by an ever-increasing production of sequential data, which often come in the form of biosignals capturing the time-evolution of physiological processes, such as blood pressure and brain activity. This has motivated a large body of research dealing with the development of machine learning techniques for the predictive analysis of such biosignals. Unfortunately, in high-stakes decision making, such as clinical diagnosis, the opacity of machine learning models becomes a crucial aspect to be addressed in order to increase the trust and adoption of AI technology. In this paper, we propose a model-agnostic explanation method, based on occlusion, that enables the learning of the input’s influence on the model predictions. We specifically target problems involving the predictive analysis of time-series data and the models that are typically used to deal with data of such nature, i.e., recurrent neural networks. Our approach is able to provide two different kinds of explanations: one suitable for technical experts, who need to verify the quality and correctness of machine learning models, and one suited to physicians, who need to understand the rationale underlying the prediction to make informed decisions. Extensive experiments on different physiological data demonstrate the effectiveness of our approach in both classification and regression tasks.

(This article belongs to the Special Issue Representation Learning: Theory, Applications and Ethical Issues)
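
As a rough illustration of what occlusion means for time-series inputs, the sketch below scores each time step by how much the prediction changes when a sliding window around it is replaced with a baseline value. It is a generic sliding-window occlusion, not the method proposed in the paper above; the function name `occlusion_importance`, the `window` and `baseline` parameters, and the toy model are hypothetical.

```python
import numpy as np


def occlusion_importance(predict, x, window=3, baseline=0.0):
    """Score each time step of a 1-D signal by occlusion.

    `predict` is any callable mapping a 1-D array to a scalar, so the
    procedure is model-agnostic. Each window of `window` consecutive steps
    is replaced by `baseline`; the absolute change in the prediction is
    credited to every step in that window and averaged over all windows
    that cover the step.
    """
    x = np.asarray(x, dtype=float)
    reference = predict(x)
    scores = np.zeros_like(x)
    counts = np.zeros_like(x)
    for start in range(0, len(x) - window + 1):
        occluded = x.copy()
        occluded[start:start + window] = baseline
        delta = abs(predict(occluded) - reference)
        scores[start:start + window] += delta
        counts[start:start + window] += 1
    return scores / np.maximum(counts, 1)  # avoid division by zero


if __name__ == "__main__":
    # Toy "model" whose output depends only on the value at index 4.
    def toy_model(signal):
        return float(signal[4])

    importance = occlusion_importance(toy_model, np.ones(10), window=3)
    print(int(np.argmax(importance)))  # index of the most influential step
```

In practice `predict` would wrap a trained recurrent model, and the baseline (zero, the signal mean, or noise) is a design choice that can change the resulting scores.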

26 pages, 5039 KB

Open Access Article

Toward Learning Trustworthily from Data Combining Privacy, Fairness, and Explainability: An Application to Face Recognition

by Danilo Franco, Luca Oneto, Nicolò Navarin and Davide Anguita

Entropy 2021, 23(8), 1047; https://doi.org/10.3390/e23081047 - 14 Aug 2021

Cited by 22 | Viewed by 5607

Abstract

In many decision-making scenarios, ranging from recreational activities to healthcare and policing, the use of artificial intelligence coupled with the ability to learn from historical data is becoming ubiquitous. This widespread adoption of automated systems is accompanied by increasing concerns regarding their ethical implications. Fundamental rights, such as those that require the preservation of privacy, prohibit discrimination based on sensitive attributes (e.g., gender, ethnicity, political/sexual orientation), or require an explanation for a decision, are undermined daily by the use of increasingly complex and less understandable yet more accurate learning algorithms. In this work, we therefore move toward the development of systems able to ensure trustworthiness by delivering privacy, fairness, and explainability by design. In particular, we show that it is possible to simultaneously learn from data while preserving the privacy of the individuals thanks to the use of Homomorphic Encryption, ensuring fairness by learning a fair representation from the data, and ensuring explainable decisions with local and global explanations without compromising the accuracy of the final models. We test our approach on a widespread but still controversial application, namely face recognition, using the recent FairFace dataset to prove the validity of our approach.

(This article belongs to the Special Issue Representation Learning: Theory, Applications and Ethical Issues)

18 pages, 492 KB

Open Access Article

Propositional Kernels

by Mirko Polato and Fabio Aiolli

Entropy 2021, 23(8), 1020; https://doi.org/10.3390/e23081020 - 7 Aug 2021

Cited by 1 | Viewed by 2435

Abstract

The pervasive presence of artificial intelligence (AI) in our everyday life has nourished the pursuit of explainable AI. Since the dawn of AI, logic has been widely used to express, in a human-friendly fashion, the internal process that led an (intelligent) system to deliver a specific output. In this paper, we take a step forward in this direction by introducing a novel family of kernels, called Propositional kernels, that construct feature spaces that are easy to interpret. Specifically, Propositional kernel functions compute the similarity between two binary vectors in a feature space composed of logical propositions of a fixed form. The Propositional kernel framework improves upon the recent Boolean kernel framework by providing more expressive kernels. In addition to the theoretical definitions, we also provide an algorithm (and the source code) to efficiently construct any Propositional kernel. An extensive empirical evaluation shows the effectiveness of Propositional kernels on several artificial and benchmark categorical data sets.

(This article belongs to the Special Issue Representation Learning: Theory, Applications and Ethical Issues)

22 pages, 1936 KB

Open Access | Feature Paper | Editor’s Choice | Article

Feature Selection for Recommender Systems with Quantum Computing

by Riccardo Nembrini, Maurizio Ferrari Dacrema and Paolo Cremonesi

Entropy 2021, 23(8), 970; https://doi.org/10.3390/e23080970 - 28 Jul 2021

Cited by 63 | Viewed by 7938

Abstract

The promise of quantum computing to open new unexplored possibilities in several scientific fields has long been discussed, but until recently the lack of a functional quantum computer has confined this discussion mostly to theoretical algorithmic papers. It was only in the last few years that small but functional quantum computers have become available to the broader research community. One paradigm in particular, quantum annealing, can be used to sample optimal solutions for a number of NP-hard optimization problems represented with classical operations research tools, providing easy access to the potential of this emerging technology. One of the tasks that most naturally fits this mathematical formulation is feature selection. In this paper, we investigate how to design a hybrid feature selection algorithm for recommender systems that leverages the domain knowledge and behavior hidden in the user interaction data. We represent feature selection as an optimization problem and solve it on a real quantum computer, provided by D-Wave. The results indicate that the proposed approach is effective in selecting a limited set of important features and that quantum computers are becoming powerful enough to enter the wider realm of applied science.

(This article belongs to the Special Issue Representation Learning: Theory, Applications and Ethical Issues)

18 pages, 5333 KB

Open Access Article

Learning Ordinal Embedding from Sets

by Aïssatou Diallo and Johannes Fürnkranz

Entropy 2021, 23(8), 964; https://doi.org/10.3390/e23080964 - 27 Jul 2021

Viewed by 3146

Abstract

Ordinal embedding is the task of computing a meaningful multidimensional representation of objects, for which only qualitative constraints on their distance functions are known. In particular, we consider comparisons of the form “Which object from the pair (j, k) is more similar to object i?”. In this paper, we generalize this framework to the case where the ordinal constraints are not given at the level of individual points, but at the level of sets, and propose a distributional triplet embedding approach in a scalable learning framework. We show that the query complexity of our approach is on par with the single-item approach. Without having access to features of the items to be embedded, we show the applicability of our model on toy datasets for the task of reconstruction and demonstrate the validity of the obtained embeddings in experiments on synthetic and real-world datasets.

(This article belongs to the Special Issue Representation Learning: Theory, Applications and Ethical Issues)

14 pages, 1627 KB

Open Access Article

Learning Numerosity Representations with Transformers: Number Generation Tasks and Out-of-Distribution Generalization

by Tommaso Boccato, Alberto Testolin and Marco Zorzi

Entropy 2021, 23(7), 857; https://doi.org/10.3390/e23070857 - 3 Jul 2021

Cited by 5 | Viewed by 3605

Abstract

One of the most rapidly advancing areas of deep learning research aims at creating models that learn to disentangle the latent factors of variation from a data distribution. However, modeling joint probability mass functions is usually prohibitive, which motivates the use of conditional models assuming that some information is given as input. In the domain of numerical cognition, deep learning architectures have successfully demonstrated that approximate numerosity representations can emerge in multi-layer networks that build latent representations of a set of images with a varying number of items. However, existing models have focused on tasks that require conditionally estimating numerosity information from a given image. Here, we focus on a set of much more challenging tasks, which require conditionally generating synthetic images containing a given number of items. We show that attention-based architectures operating at the pixel level can learn to produce well-formed images approximately containing a specific number of items, even when the target numerosity was not present in the training distribution.

(This article belongs to the Special Issue Representation Learning: Theory, Applications and Ethical Issues)
