Special Issue "Information Theoretic Learning and Kernel Methods"

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (31 August 2019).

Special Issue Editors

Prof. Badong Chen
Guest Editor
Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, P. R. China
Tel.: +86-18392892686; Fax: +86-29-8266-8672
Interests: entropy; probability theory; estimation theory; machine learning; information theoretic learning; signal processing; system identification; adaptive filtering; kernel methods
Prof. Jose C. Principe
Guest Editor
Computational NeuroEngineering Lab, University of Florida, Gainesville, FL 32611, USA
Interests: information theoretic learning; kernel methods; adaptive signal processing; brain machine interfaces

Special Issue Information

Dear Colleagues,

The extraction of information from data and its feedback to the machine is a crucial step in successfully building a general framework for machine learning. We believe that information descriptors such as entropy and divergence suit this role well, since these scalar quantifiers of the information in data are easy to work with when deriving learning rules. Information theoretic learning (ITL) was originally developed for supervised learning applications. The idea is that the error distribution in supervised learning is often non-Gaussian, so the traditional mean square error (MSE) is not the optimal criterion; in such cases, costs based on information theoretic descriptors can provide better nonlinear models in a range of problems from system identification to classification. Popular ITL criteria include the minimum error entropy (MEE) criterion, the maximum (or minimum) mutual information criterion, the minimum divergence criterion, and the maximum correntropy criterion (MCC). Kernel methods, for their part, are powerful tools for nonlinear system modeling in the machine learning community; many kernel learning algorithms have been developed, including the SVM, kernel PCA, and kernel adaptive filtering (KAF). Both ITL and kernel methods are therefore efficient approaches for learning nonlinear mappings in non-Gaussian environments. It is also worth noting that the two are closely related: for example, the sample estimator of Rényi's quadratic entropy can be viewed as a similarity measure in kernel space.
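To make the stated connection concrete, here is a minimal NumPy sketch, illustrative only (function names and parameter values are ours, not from the issue), of two estimators mentioned above: the sample correntropy and the Parzen-window estimator of Rényi's quadratic entropy, whose "information potential" is exactly a mean of pairwise Gaussian kernel evaluations.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel used throughout ITL."""
    return np.exp(-e ** 2 / (2 * sigma ** 2))

def correntropy(x, y, sigma=1.0):
    """Sample estimator of correntropy V(X, Y) = E[k_sigma(X - Y)]."""
    return np.mean(gaussian_kernel(np.asarray(x) - np.asarray(y), sigma))

def renyi_quadratic_entropy(x, sigma=1.0):
    """Parzen-window estimator of Renyi's quadratic entropy H2(X).

    The inner double sum (the 'information potential') is a mean of
    pairwise Gaussian kernel evaluations, i.e., a similarity measure
    in kernel space -- the ITL/kernel-methods link noted above.
    """
    x = np.asarray(x, dtype=float)
    diffs = x[:, None] - x[None, :]
    # Pairwise kernel width is sigma * sqrt(2): two Parzen windows convolve.
    ip = np.mean(gaussian_kernel(diffs, sigma * np.sqrt(2)))
    return -np.log(ip)

rng = np.random.default_rng(0)
e_small = rng.normal(0.0, 0.1, 500)   # concentrated errors
e_large = rng.normal(0.0, 2.0, 500)   # dispersed errors
v_small = correntropy(e_small, np.zeros(500))
v_large = correntropy(e_large, np.zeros(500))
```

Concentrated errors yield high correntropy and low quadratic entropy; dispersed errors the reverse, which is why these descriptors can replace MSE as training costs.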

In this Special Issue, we seek contributions that apply either information theoretic descriptors or kernel methods to various machine learning problems. The scope of the contributions is broad, including theoretical studies and practical applications to regression, classification, system identification, deep learning, unsupervised learning, and reinforcement learning.

Prof. Badong Chen
Prof. Jose C. Principe
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • information theoretic learning
  • kernel methods
  • entropy
  • nonlinear systems
  • non-Gaussian signals

Published Papers (13 papers)

Research

Open Access Article
Information Theoretic Causal Effect Quantification
Entropy 2019, 21(10), 975; https://doi.org/10.3390/e21100975 - 05 Oct 2019
Abstract
Modelling causal relationships has become popular across various disciplines. Most common frameworks for causality are the Pearlian causal directed acyclic graphs (DAGs) and the Neyman-Rubin potential outcome framework. In this paper, we propose an information theoretic framework for causal effect quantification. To this end, we formulate a two step causal deduction procedure in the Pearl and Rubin frameworks and introduce its equivalent which uses information theoretic terms only. The first step of the procedure consists of ensuring no confounding or finding an adjustment set with directed information. In the second step, the causal effect is quantified. We subsequently unify previous definitions of directed information present in the literature and clarify the confusion surrounding them. We also motivate using chain graphs for directed information in time series and extend our approach to chain graphs. The proposed approach serves as a translation between causality modelling and information theory.
(This article belongs to the Special Issue Information Theoretic Learning and Kernel Methods)
Open Access Article
Kernel Mixture Correntropy Conjugate Gradient Algorithm for Time Series Prediction
Entropy 2019, 21(8), 785; https://doi.org/10.3390/e21080785 - 11 Aug 2019
Cited by 1
Abstract
Kernel adaptive filtering (KAF) is an effective nonlinear learning algorithm, which has been widely used in time series prediction. The traditional KAF is based on the stochastic gradient descent (SGD) method, which has slow convergence speed and low filtering accuracy. Hence, a kernel conjugate gradient (KCG) algorithm has been proposed with low computational complexity, while achieving comparable performance to some KAF algorithms, e.g., the kernel recursive least squares (KRLS). However, the robust learning performance is unsatisfactory, when using KCG. Meanwhile, correntropy as a local similarity measure defined in kernel space, can address large outliers in robust signal processing. On the basis of correntropy, the mixture correntropy is developed, which uses the mixture of two Gaussian functions as a kernel function to further improve the learning performance. Accordingly, this article proposes a novel KCG algorithm, named the kernel mixture correntropy conjugate gradient (KMCCG), with the help of the mixture correntropy criterion (MCC). The proposed algorithm has less computational complexity and can achieve better performance in non-Gaussian noise environments. To further control the growing radial basis function (RBF) network in this algorithm, we also use a simple sparsification criterion based on the angle between elements in the reproducing kernel Hilbert space (RKHS). The prediction simulation results on a synthetic chaotic time series and a real benchmark dataset show that the proposed algorithm can achieve better computational performance. In addition, the proposed algorithm is also successfully applied to the practical tasks of malware prediction in the field of malware analysis. The results demonstrate that our proposed algorithm not only has a short training time, but also can achieve high prediction accuracy.
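The mixture correntropy used in this abstract replaces the single Gaussian kernel of ordinary correntropy with a convex mix of a narrow and a wide kernel. A minimal sketch of the sample measure (function name and parameter values are ours, for illustration):

```python
import numpy as np

def mixture_correntropy(e, sigma1=0.5, sigma2=2.0, alpha=0.5):
    """Sample mixture correntropy: a convex mix of two Gaussian kernels.

    Per the abstract, the kernel is a mixture of two Gaussian functions;
    alpha weighs the narrow (sigma1) against the wide (sigma2) kernel.
    """
    e = np.asarray(e, dtype=float)
    k1 = np.exp(-e ** 2 / (2 * sigma1 ** 2))
    k2 = np.exp(-e ** 2 / (2 * sigma2 ** 2))
    return np.mean(alpha * k1 + (1 - alpha) * k2)

clean = np.full(100, 0.1)                              # well-behaved errors
outliers = np.r_[np.full(95, 0.1), np.full(5, 50.0)]   # 5 gross outliers
```

Because each kernel is bounded by 1, an outlier can lower the measure by at most 1/n, which is the source of the robustness claimed for MCC-type criteria.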

Open Access Article
Electricity Consumption Forecasting using Support Vector Regression with the Mixture Maximum Correntropy Criterion
Entropy 2019, 21(7), 707; https://doi.org/10.3390/e21070707 - 19 Jul 2019
Abstract
The electricity consumption forecasting (ECF) technology plays a crucial role in the electricity market. The support vector regression (SVR) is a nonlinear prediction model that can be used for ECF. The electricity consumption (EC) data are usually nonlinear and non-Gaussian and present outliers. The traditional SVR with the mean-square error (MSE), however, is insensitive to outliers and cannot correctly represent the statistical information of errors in non-Gaussian situations. To address this problem, a novel robust forecasting method is developed in this work by using the mixture maximum correntropy criterion (MMCC). The MMCC, as a novel cost function of information theoretic, can be used to solve non-Gaussian signal processing; therefore, in the original SVR, the MSE is replaced by the MMCC to develop a novel robust SVR method (called MMCCSVR) for ECF. Besides, the factors influencing users’ EC are investigated by a data statistical analysis method. We find that the historical temperature and historical EC are the main factors affecting future EC, and thus these two factors are used as the input in the proposed model. Finally, real EC data from a shopping mall in Guangzhou, China, are utilized to test the proposed ECF method. The forecasting results show that the proposed ECF method can effectively improve the accuracy of ECF compared with the traditional SVR and other forecasting algorithms.

Open Access Article
A Novel Active Learning Regression Framework for Balancing the Exploration-Exploitation Trade-Off
Entropy 2019, 21(7), 651; https://doi.org/10.3390/e21070651 - 01 Jul 2019
Abstract
Recently, active learning is considered a promising approach for data acquisition due to the significant cost of the data labeling process in many real world applications, such as natural language processing and image processing. Most active learning methods are merely designed to enhance the learning model accuracy. However, the model accuracy may not be the primary goal and there could be other domain-specific objectives to be optimized. In this work, we develop a novel active learning framework that aims to solve a general class of optimization problems. The proposed framework mainly targets the optimization problems exposed to the exploration-exploitation trade-off. The active learning framework is comprehensive, it includes exploration-based, exploitation-based and balancing strategies that seek to achieve the balance between exploration and exploitation. The paper mainly considers regression tasks, as they are under-researched in the active learning field compared to classification tasks. Furthermore, in this work, we investigate the different active querying approaches—pool-based and the query synthesis—and compare them. We apply the proposed framework to the problem of learning the price-demand function, an application that is important in optimal product pricing and dynamic (or time-varying) pricing. In our experiments, we provide a comparative study including the proposed framework strategies and some other baselines. The accomplished results demonstrate a significant performance for the proposed methods.
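The exploration-exploitation trade-off in pool-based querying can be illustrated with a generic UCB-style scoring rule (our own sketch, not one of the paper's specific strategies): score each unlabeled candidate by predicted objective value plus a multiple of model uncertainty.

```python
import numpy as np

def select_query(pred_mean, pred_std, beta):
    """Pick the next pool point to label.

    beta = 0   -> pure exploitation (highest predicted objective)
    beta large -> pure exploration (highest model uncertainty)
    Intermediate beta trades the two off, UCB-style.
    """
    scores = np.asarray(pred_mean) + beta * np.asarray(pred_std)
    return int(np.argmax(scores))

# Toy pool of three candidates (values are illustrative):
mean = [1.0, 3.0, 2.0]   # model's predicted objective at each candidate
std = [2.0, 0.1, 1.0]    # model's uncertainty at each candidate
```

With `beta = 0` the rule queries the candidate with the best predicted value; with a large `beta` it queries the most uncertain one, improving the model where it knows least.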

Open Access Article
Smooth Function Approximation by Deep Neural Networks with General Activation Functions
Entropy 2019, 21(7), 627; https://doi.org/10.3390/e21070627 - 26 Jun 2019
Abstract
There has been a growing interest in expressivity of deep neural networks. However, most of the existing work about this topic focuses only on the specific activation function such as ReLU or sigmoid. In this paper, we investigate the approximation ability of deep neural networks with a broad class of activation functions. This class of activation functions includes most of the frequently used activation functions. We derive the required depth, width and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error for the large class of activation functions. Based on our approximation error analysis, we derive the minimax optimality of the deep neural network estimators with the general activation functions in both regression and classification problems.
Open Access Article
Kernel Risk-Sensitive Mean p-Power Error Algorithms for Robust Learning
Entropy 2019, 21(6), 588; https://doi.org/10.3390/e21060588 - 13 Jun 2019
Abstract
As a nonlinear similarity measure defined in the reproducing kernel Hilbert space (RKHS), the correntropic loss (C-Loss) has been widely applied in robust learning and signal processing. However, the highly non-convex nature of C-Loss results in performance degradation. To address this issue, a convex kernel risk-sensitive loss (KRL) is proposed to measure the similarity in RKHS, which is the risk-sensitive loss defined as the expectation of an exponential function of the squared estimation error. In this paper, a novel nonlinear similarity measure, namely kernel risk-sensitive mean p-power error (KRP), is proposed by combining the mean p-power error into the KRL, which is a generalization of the KRL measure. The KRP with p = 2 reduces to the KRL, and can outperform the KRL when an appropriate p is configured in robust learning. Some properties of KRP are presented for discussion. To improve the robustness of the kernel recursive least squares algorithm (KRLS) and reduce its network size, two robust recursive kernel adaptive filters, namely recursive minimum kernel risk-sensitive mean p-power error algorithm (RMKRP) and its quantized RMKRP (QRMKRP), are proposed in the RKHS under the minimum kernel risk-sensitive mean p-power error (MKRP) criterion, respectively. Monte Carlo simulations are conducted to confirm the superiorities of the proposed RMKRP and its quantized version.
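For readers unfamiliar with the KRL that the KRP generalizes, here is a minimal sketch of its sample estimator (function name and parameter values are ours; this is the p = 2 member of the family, and the exact p-power form is deferred to the paper):

```python
import numpy as np

def kernel_risk_sensitive_loss(e, sigma=1.0, lam=0.5):
    """Sample KRL: (1/lam) * mean(exp(lam * (1 - k_sigma(e)))).

    The risk-sensitive exponential sits on top of a bounded Gaussian
    kernel of the error, so the loss itself stays bounded.
    """
    e = np.asarray(e, dtype=float)
    k = np.exp(-e ** 2 / (2 * sigma ** 2))
    return float(np.mean(np.exp(lam * (1.0 - k))) / lam)

zeros = np.zeros(8)      # perfect predictions
huge = np.full(8, 1e3)   # gross outliers
```

Because the kernel is bounded by 1, the loss saturates at exp(lam)/lam no matter how large the errors grow, which is what makes such criteria robust to outliers.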

Open Access Article
A Correntropy-Based Proportionate Affine Projection Algorithm for Estimating Sparse Channels with Impulsive Noise
Entropy 2019, 21(6), 555; https://doi.org/10.3390/e21060555 - 02 Jun 2019
Cited by 1
Abstract
A novel robust proportionate affine projection (AP) algorithm is devised for estimating sparse channels, which often occur in network echo and wireless communication channels. The newly proposed algorithm is realized by using the maximum correntropy criterion (MCC) and the data reusing scheme used in AP to overcome the identification performance degradation of the traditional PAP algorithm in impulsive noise environments. The proposed algorithm is referred to as the proportionate affine projection maximum correntropy criterion (PAPMCC) algorithm, which is derived in the context of channel estimation framework. Many simulation results were obtained to verify that the PAPMCC algorithm is superior to early reported AP algorithms with different input signals under impulsive noise environments.
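The core of any MCC-based adaptive filter is a stochastic-gradient step in which the usual LMS update is scaled by a Gaussian function of the error. A plain (non-proportionate, non-AP) sketch of this core idea, with our own function name and toy simulation, illustrates why such filters tolerate impulsive noise; the paper adds proportionate gains and affine-projection data reuse on top of it:

```python
import numpy as np

def mcc_lms_step(w, x, d, mu=0.05, sigma=1.0):
    """One stochastic-gradient step maximizing correntropy E[k_sigma(e)].

    The exponential factor shrinks the update for large errors, which is
    what makes MCC-type filters robust to impulsive noise.
    """
    e = d - w @ x
    return w + mu * np.exp(-e ** 2 / (2 * sigma ** 2)) * e * x

rng = np.random.default_rng(1)
w_true = np.array([0.5, -0.3, 0.8])   # unknown channel to identify
w = np.zeros(3)
for _ in range(3000):
    x = rng.normal(size=3)
    d = w_true @ x
    if rng.random() < 0.05:           # occasional impulsive outlier
        d += rng.normal(0.0, 50.0)
    w = mcc_lms_step(w, x, d)
```

A plain LMS filter would be thrown far off by the impulses; here the exponential factor gates them to near zero, so `w` still converges close to `w_true`.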

Open Access Feature Paper Article
Reduction of Markov Chains Using a Value-of-Information-Based Approach
Entropy 2019, 21(4), 349; https://doi.org/10.3390/e21040349 - 30 Mar 2019
Abstract
In this paper, we propose an approach to obtain reduced-order models of Markov chains. Our approach is composed of two information-theoretic processes. The first is a means of comparing pairs of stationary chains on different state spaces, which is done via the negative, modified Kullback–Leibler divergence defined on a model joint space. Model reduction is achieved by solving a value-of-information criterion with respect to this divergence. Optimizing the criterion leads to a probabilistic partitioning of the states in the high-order Markov chain. A single free parameter that emerges through the optimization process dictates both the partition uncertainty and the number of state groups. We provide a data-driven means of choosing the ‘optimal’ value of this free parameter, which sidesteps needing to a priori know the number of state groups in an arbitrary chain.
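As background, the standard way to compare two stationary chains on a *common* state space is the Kullback–Leibler divergence rate between their transition matrices, weighted by the stationary distribution; the paper's modified divergence extends this comparison to chains on different state spaces via a model joint space. A minimal sketch of the standard quantity (our own function names):

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a transition matrix (left eigenvector)."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

def kl_rate(P, Q):
    """KL divergence rate: sum_i pi_i * sum_j P_ij * log(P_ij / Q_ij)."""
    pi = stationary(P)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P / Q), 0.0)
    return float(pi @ terms.sum(axis=1))

P = np.array([[0.9, 0.1], [0.2, 0.8]])   # a "sticky" two-state chain
Q = np.array([[0.5, 0.5], [0.5, 0.5]])   # a memoryless reference chain
```

The rate is zero exactly when the two chains share transition probabilities, and positive otherwise, so it can score how much a candidate reduced-order model distorts the original chain.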

Open Access Article
Adaptive Extended Kalman Filter with Correntropy Loss for Robust Power System State Estimation
Entropy 2019, 21(3), 293; https://doi.org/10.3390/e21030293 - 18 Mar 2019
Cited by 1
Abstract
Monitoring the current operation status of the power system plays an essential role in the enhancement of the power grid for future requirements. Therefore, the real-time state estimation (SE) of the power system has been of widely-held concern. The Kalman filter is an outstanding method for the SE, and the noise in the system is generally assumed to be Gaussian noise. In the actual power system however, these measurements are usually disturbed by non-Gaussian noises in practice. Furthermore, it is hard to get the statistics of the state noise and measurement noise. As a result, a novel adaptive extended Kalman filter with correntropy loss is proposed and applied for power system SE in this paper. Firstly, correntropy is used to improve the robustness of the EKF algorithm in the presence of non-Gaussian noises and outliers. In addition, an adaptive update mechanism of the covariance matrixes of the measurement and process noises is introduced into the EKF with correntropy loss to enhance the accuracy of the algorithm. Extensive simulations are carried out on IEEE 14-bus and IEEE 30-bus test systems to verify the feasibility and robustness of the proposed algorithm.
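The robustness that correntropy lends to a filter can be seen in the effective weight each measurement residual receives under a correntropy loss (a generic sketch of the reweighting idea only, with our own function name; the paper embeds it in an EKF with an adaptive noise-covariance update):

```python
import numpy as np

def correntropy_weights(residuals, sigma=1.0):
    """Gaussian-kernel weights exp(-e^2 / (2 sigma^2)) per residual.

    Measurements with large residuals receive near-zero weight, which
    suppresses outliers without discarding well-behaved measurements.
    """
    e = np.asarray(residuals, dtype=float)
    return np.exp(-e ** 2 / (2 * sigma ** 2))

# Two well-behaved residuals and one gross outlier (values illustrative):
w = correntropy_weights([0.05, -0.1, 8.0])
```

Under a quadratic (MSE) loss all three residuals would be weighted equally, letting the outlier dominate the state update.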

Open Access Feature Paper Article
Simple Stopping Criteria for Information Theoretic Feature Selection
Entropy 2019, 21(1), 99; https://doi.org/10.3390/e21010099 - 21 Jan 2019
Cited by 2
Abstract
Feature selection aims to select the smallest feature subset that yields the minimum generalization error. In the rich literature in feature selection, information theory-based approaches seek a subset of features such that the mutual information between the selected features and the class labels is maximized. Despite the simplicity of this objective, there still remain several open problems in optimization. These include, for example, the automatic determination of the optimal subset size (i.e., the number of features) or a stopping criterion if the greedy searching strategy is adopted. In this paper, we suggest two stopping criteria by just monitoring the conditional mutual information (CMI) among groups of variables. Using the recently developed multivariate matrix-based Rényi’s α-entropy functional, which can be directly estimated from data samples, we showed that the CMI among groups of variables can be easily computed without any decomposition or approximation, hence making our criteria easy to implement and seamlessly integrated into any existing information theoretic feature selection methods with a greedy search strategy. Full article
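The matrix-based Rényi α-entropy functional mentioned here is computed from the eigenvalues of a normalized Gram matrix, with joint entropies obtained from Hadamard products of Gram matrices. A minimal sketch for the pairwise case (function names, kernel width, and α value are ours; CMI among groups of variables follows from sums and differences of the same joint-entropy terms):

```python
import numpy as np

def gram(x, sigma=1.0):
    """Normalized Gaussian Gram matrix A (trace 1) for a sample vector x."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2)) / len(x)

def renyi_entropy(A, alpha=1.01):
    """Matrix-based Renyi entropy S_alpha(A) = log2(tr A^alpha) / (1 - alpha)."""
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return float(np.log2((lam ** alpha).sum()) / (1.0 - alpha))

def mutual_information(A, B, alpha=1.01):
    """I(X;Y) = S(A) + S(B) - S(A∘B / tr(A∘B)), '∘' the Hadamard product."""
    AB = A * B
    AB = AB / np.trace(AB)
    return renyi_entropy(A, alpha) + renyi_entropy(B, alpha) - renyi_entropy(AB, alpha)

rng = np.random.default_rng(2)
x = rng.normal(size=60)
y_dep = x + 0.1 * rng.normal(size=60)   # strongly dependent on x
y_ind = rng.normal(size=60)             # independent of x
A, Bd, Bi = gram(x), gram(y_dep), gram(y_ind)
```

Everything is estimated directly from pairwise kernel evaluations on the samples, with no density estimation or factorization of the joint distribution, which is what makes the stopping criteria easy to bolt onto any greedy selector.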

Open Access Article
Drug-Drug Interaction Extraction via Recurrent Hybrid Convolutional Neural Networks with an Improved Focal Loss
Entropy 2019, 21(1), 37; https://doi.org/10.3390/e21010037 - 08 Jan 2019
Cited by 1
Abstract
Drug-drug interactions (DDIs) may bring huge health risks and dangerous effects to a patient’s body when taking two or more drugs at the same time or within a certain period of time. Therefore, the automatic extraction of unknown DDIs has great potential for the development of pharmaceutical agents and the safety of drug use. In this article, we propose a novel recurrent hybrid convolutional neural network (RHCNN) for DDI extraction from biomedical literature. In the embedding layer, the texts mentioning two entities are represented as a sequence of semantic embeddings and position embeddings. In particular, the complete semantic embedding is obtained by the information fusion between a word embedding and its contextual information which is learnt by recurrent structure. After that, the hybrid convolutional neural network is employed to learn the sentence-level features which consist of the local context features from consecutive words and the dependency features between separated words for DDI extraction. Lastly but most significantly, in order to make up for the defects of the traditional cross-entropy loss function when dealing with class imbalanced data, we apply an improved focal loss function to mitigate against this problem when using the DDIExtraction 2013 dataset. In our experiments, we achieve DDI automatic extraction with a micro F-score of 75.48% on the DDIExtraction 2013 dataset, outperforming the state-of-the-art approach by 2.49%.
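For reference, the standard binary focal loss (Lin et al.) that the paper's improved variant builds on can be sketched as follows (the paper works with a multi-class variant; this minimal binary form, with our own parameter values, shows the mechanism):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss.

    p: predicted probability of the positive class; y: label in {0, 1}.
    The (1 - p_t)^gamma factor down-weights easy, well-classified
    examples so training focuses on hard, minority-class examples.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

easy = focal_loss(np.array([0.95]), np.array([1]))   # confident, correct
hard = focal_loss(np.array([0.2]), np.array([1]))    # misclassified positive
```

With cross-entropy the two examples would differ by a modest factor; the focusing term makes the easy example contribute orders of magnitude less, which is what counters class imbalance.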

Open Access Article
KStable: A Computational Method for Predicting Protein Thermal Stability Changes by K-Star with Regular-mRMR Feature Selection
Entropy 2018, 20(12), 988; https://doi.org/10.3390/e20120988 - 19 Dec 2018
Abstract
Thermostability is a protein property that impacts many types of studies, including protein activity enhancement, protein structure determination, and drug development. However, most computational tools designed to predict protein thermostability require tertiary structure data as input. The few tools that are dependent only on the primary structure of a protein to predict its thermostability have one or more of the following problems: a slow execution speed, an inability to make large-scale mutation predictions, and the absence of temperature and pH as input parameters. Therefore, we developed a computational tool, named KStable, that is sequence-based, computationally rapid, and includes temperature and pH values to predict changes in the thermostability of a protein upon the introduction of a mutation at a single site. KStable was trained using basis features and minimal redundancy–maximal relevance (mRMR) features, and 58 classifiers were subsequently tested. To find the representative features, a regular-mRMR method was developed. When KStable was evaluated with an independent test set, it achieved an accuracy of 0.708.
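For background, the classic greedy mRMR criterion (Peng et al.) that regular-mRMR refines selects, at each step, the feature maximizing its relevance to the label minus its mean redundancy with the already-selected features. A minimal sketch with a plug-in discrete mutual-information estimator (function names and toy data are ours):

```python
import numpy as np

def mi(a, b):
    """Plug-in mutual information (in nats) between two discrete arrays."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    for i, j in zip(a_idx, b_idx):
        joint[i, j] += 1
    joint /= joint.sum()
    pa, pb = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.where(joint > 0, joint * np.log(joint / (pa * pb)), 0.0)
    return float(t.sum())

def mrmr(X, y, k):
    """Classic greedy mRMR: max relevance MI(f; y) minus mean redundancy."""
    selected = []
    for _ in range(k):
        best_f, best_score = None, -np.inf
        for f in range(X.shape[1]):
            if f in selected:
                continue
            rel = mi(X[:, f], y)
            red = np.mean([mi(X[:, f], X[:, s]) for s in selected]) if selected else 0.0
            if rel - red > best_score:
                best_f, best_score = f, rel - red
        selected.append(best_f)
    return selected

y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0 = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # strongly relevant
f1 = f0.copy()                            # redundant duplicate of f0
f2 = np.array([0, 0, 1, 1, 0, 0, 1, 1])   # irrelevant to y
X = np.column_stack([f0, f1, f2])
```

On the toy data, greedy selection picks the relevant feature first and then skips its duplicate in favor of the less redundant one, which is the behavior the redundancy penalty exists to produce.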

Open Access Article
Recursive Minimum Complex Kernel Risk-Sensitive Loss Algorithm
Entropy 2018, 20(12), 902; https://doi.org/10.3390/e20120902 - 25 Nov 2018
Cited by 1
Abstract
The maximum complex correntropy criterion (MCCC) has been extended to complex domain for dealing with complex-valued data in the presence of impulsive noise. Compared with the correntropy based loss, a kernel risk-sensitive loss (KRSL) defined in kernel space has demonstrated a superior performance surface in the complex domain. However, there is no report regarding the recursive KRSL algorithm in the complex domain. Therefore, in this paper we propose a recursive complex KRSL algorithm called the recursive minimum complex kernel risk-sensitive loss (RMCKRSL). In addition, we analyze its stability and obtain the theoretical value of the excess mean square error (EMSE), which are both supported by simulations. Simulation results verify that the proposed RMCKRSL out-performs the MCCC, generalized MCCC (GMCCC), and traditional recursive least squares (RLS).
