Information Theoretic Learning

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (15 June 2016) | Viewed by 76911

Special Issue Editors


Prof. Dr. Badong Chen
Guest Editor
Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
Interests: information theoretic learning; artificial intelligence; cognitive science; adaptive filtering; brain machine learning; robotics

Prof. Dr. Jose C. Principe
Guest Editor
Computational NeuroEngineering Lab, University of Florida, Gainesville, FL 32611, USA
Interests: information theoretic learning; kernel methods; adaptive signal processing; brain machine interfaces

Special Issue Information

Dear Colleague,

In the past decades, and especially in recent years, entropy and related information theoretic measures (e.g., mutual information) have been successfully applied in machine learning (supervised or unsupervised) and signal processing. Information theoretic quantities can capture higher-order statistics and offer potentially significant performance improvements in machine learning applications. In information theoretic learning (ITL), measures from information theory (entropy, mutual information, divergences, etc.) are used as optimization costs in place of conventional second-order statistical measures such as variance and covariance. For example, in supervised learning such as regression, the problem can be formulated as minimizing the entropy of the error between the model output and the desired response; in ITL this optimization criterion is known as the minimum error entropy (MEE) criterion. ITL also links information theory, nonparametric estimators, and reproducing kernel Hilbert spaces (RKHS) in a simple and unconventional way. In particular, correntropy, a nonlinear similarity measure in kernel space, has its roots in Rényi's entropy. Since correntropy (especially with a small kernel bandwidth) is insensitive to outliers, it is a naturally robust cost for machine learning. The correntropy induced metric (CIM), as an approximation of the l0 norm, can also serve as a sparsity penalty in sparse learning.
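The correntropy and CIM quantities described above are straightforward to estimate from samples. Below is a minimal illustrative sketch (the function names and the outlier demonstration are our own, not taken from any contributed paper), using the common Gaussian kernel k(e) = exp(-e^2 / (2 sigma^2)):

```python
import numpy as np

def correntropy(x, y, sigma=1.0):
    """Sample estimator of correntropy V(X, Y) = E[k(X - Y)] with the
    Gaussian kernel k(e) = exp(-e^2 / (2 * sigma^2))."""
    e = np.asarray(x, float) - np.asarray(y, float)
    return float(np.mean(np.exp(-e**2 / (2 * sigma**2))))

def cim(x, y, sigma=1.0):
    """Correntropy induced metric: CIM = sqrt(k(0) - V), with k(0) = 1."""
    return float(np.sqrt(1.0 - correntropy(x, y, sigma)))

# Robustness demo: one gross outlier dominates the MSE but barely
# moves the correntropy estimate.
rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
y_clean = y_true + 0.05 * rng.normal(size=100)
y_outlier = y_clean.copy()
y_outlier[0] += 100.0  # single gross outlier

mse_clean = float(np.mean((y_true - y_clean) ** 2))
mse_outlier = float(np.mean((y_true - y_outlier) ** 2))
v_clean = correntropy(y_true, y_clean)
v_outlier = correntropy(y_true, y_outlier)
```

With a small bandwidth, the single gross outlier barely changes the correntropy estimate while it dominates the mean squared error; this is the robustness property exploited by correntropy-based costs.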

In this Special Issue, we seek contributions that apply information theoretic quantities (entropy, mutual information, divergences, etc.) and related measures, such as correntropy, to deal with machine learning problems. The scope of the contributions will be very broad, including theoretical research and practical applications to regression, classification, clustering, graph and kernel learning, deep learning, and so on.

Prof. Dr. Badong Chen
Prof. Dr. Jose C. Principe
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.


Keywords

  • entropy
  • correntropy
  • information theoretic learning
  • kernel methods
  • machine learning

Published Papers (12 papers)


Research

Article
Sparse Trajectory Prediction Based on Multiple Entropy Measures
by Lei Zhang, Leijun Liu, Zhanguo Xia, Wen Li and Qingfu Fan
Entropy 2016, 18(9), 327; https://doi.org/10.3390/e18090327 - 14 Sep 2016
Cited by 10 | Viewed by 5099
Abstract
Trajectory prediction is an important problem with a large number of applications. A common approach to trajectory prediction is based on historical trajectories; however, existing techniques suffer from the “data sparsity problem”: the available historical trajectories are far from enough to cover all possible query trajectories. We propose the sparse trajectory prediction algorithm based on multiple entropy measures (STP-ME) to address the data sparsity problem. Firstly, the moving region is iteratively divided into a two-dimensional plane grid graph, and each trajectory is represented as a grid sequence with temporal information. Secondly, trajectory entropy is used to evaluate a trajectory's regularity, the L-Z entropy estimator is implemented to calculate trajectory entropy, and a new trajectory space is generated through trajectory synthesis. We define location entropy and time entropy to measure the popularity of locations and timeslots, respectively. Finally, a second-order Markov model that contains a temporal dimension is adopted to perform sparse trajectory prediction. The experiments show that as the trip completed percentage increases towards 90%, the coverage of the baseline algorithm decreases to almost 25%, while the STP-ME algorithm copes with this as expected, with only an unnoticeable drop in coverage, and can consistently answer almost 100% of query trajectories. The STP-ME algorithm improves the prediction accuracy by as much as 8%, 3%, and 4% compared to the baseline algorithm, the second-order Markov model (2-MM), and the sub-trajectory synthesis (SubSyn) algorithm, respectively. At the same time, the prediction time of the STP-ME algorithm is negligible (10 μs), greatly outperforming the baseline algorithm (100 ms).
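For readers unfamiliar with the L-Z entropy estimator mentioned in the abstract, the following is a rough sketch of a Lempel-Ziv match-length estimator of the entropy rate of a symbol sequence (an O(n^2) illustration of the general idea; the paper's exact estimator and grid encoding may differ):

```python
import math
import random

def lz_entropy_rate(seq):
    """Kontoyiannis-style Lempel-Ziv entropy-rate estimate in bits/symbol:
    H_hat = n * log2(n) / sum_i Lambda_i, where Lambda_i is 1 plus the
    length of the longest prefix of seq[i:] also occurring in seq[:i]."""
    n = len(seq)
    total = 0
    for i in range(n):
        match = 0
        for l in range(1, n - i + 1):
            if seq[i:i + l] in seq[:i]:  # O(n^2) substring scan; a sketch
                match = l
            else:
                break  # match lengths are prefix-closed, so we can stop
        total += match + 1
    return n * math.log2(n) / total

random.seed(0)
regular = "ab" * 50                                   # highly regular sequence
noisy = "".join(random.choice("ab") for _ in range(100))  # irregular sequence
```

Regular (low-entropy) symbol sequences yield long matches and hence a small estimate, which is how trajectory regularity is scored.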
(This article belongs to the Special Issue Information Theoretic Learning)

Article
Mechanical Fault Diagnosis of High Voltage Circuit Breakers with Unknown Fault Type Using Hybrid Classifier Based on LMD and Time Segmentation Energy Entropy
by Nantian Huang, Lihua Fang, Guowei Cai, Dianguo Xu, Huaijin Chen and Yonghui Nie
Entropy 2016, 18(9), 322; https://doi.org/10.3390/e18090322 - 3 Sep 2016
Cited by 44 | Viewed by 6597
Abstract
In order to improve the identification accuracy of high voltage circuit breakers' (HVCBs) mechanical fault types without training samples, a novel mechanical fault diagnosis method for HVCBs is proposed, using a hybrid classifier constructed with Support Vector Data Description (SVDD) and the fuzzy c-means (FCM) clustering method, based on Local Mean Decomposition (LMD) and time segmentation energy entropy (TSEE). Firstly, LMD is used to decompose the nonlinear and non-stationary vibration signals of HVCBs into a series of product functions (PFs). Secondly, TSEE is chosen for the feature vectors, combining the superiority of energy entropy with the time-delay fault characteristics of HVCBs. Then, an SVDD trained with normal samples is applied to judge mechanical faults of HVCBs. If a mechanical fault is confirmed, the new fault sample and all known fault samples are clustered by FCM, with the cluster number set to the number of known fault types. Finally, another SVDD trained on the specific fault samples is used to judge whether the fault sample belongs to an unknown type. Experiments carried out on a real SF6 HVCB validate that the proposed fault-detection method is effective both for known faults with training samples and for unknown faults without training samples.
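The time segmentation energy entropy (TSEE) feature can be sketched as follows (an illustrative reconstruction from the name, with uniform segmentation; the paper's exact segmentation of the LMD product functions may differ):

```python
import numpy as np

def tsee(signal, n_segments=8):
    """Time segmentation energy entropy: split the signal into equal time
    segments, normalize the per-segment energies into a distribution, and
    return its Shannon entropy (in nats)."""
    segs = np.array_split(np.asarray(signal, float), n_segments)
    energies = np.array([np.sum(s**2) for s in segs])
    p = energies / energies.sum()
    p = p[p > 0]                      # drop empty segments to avoid log(0)
    return float(-np.sum(p * np.log(p)))

impulse = np.zeros(800)
impulse[100] = 1.0                    # energy concentrated in one segment
flat = np.ones(800)                   # energy spread evenly over segments
```

A time-localized (impulsive) fault signature concentrates energy in few segments and gives low TSEE, while evenly spread energy gives entropy near log(n_segments).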
(This article belongs to the Special Issue Information Theoretic Learning)

Article
Information Theoretical Measures for Achieving Robust Learning Machines
by Pablo Zegers, B. Roy Frieden, Carlos Alarcón and Alexis Fuentes
Entropy 2016, 18(8), 295; https://doi.org/10.3390/e18080295 - 12 Aug 2016
Cited by 1 | Viewed by 4160
Abstract
Information theoretical measures are used to design, from first principles, an objective function that can drive a learning machine process to a solution that is robust to perturbations in parameters. Full analytic derivations are given and tested with computational examples showing that the procedure is indeed successful. The final solution, implemented by a robust learning machine, expresses a balance between Shannon differential entropy and Fisher information. That this is an analytical relation is also surprising, given the purely numerical operations of the learning machine.
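The balance between differential entropy and Fisher information mentioned above can be illustrated with the Gaussian case, where both quantities have closed forms and the entropy power times Fisher information attains its minimum value of 1 (a textbook identity, not the paper's derivation):

```python
import math

def gaussian_entropy(sigma):
    """Shannon differential entropy of N(0, sigma^2), in nats."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

def gaussian_fisher(sigma):
    """Fisher information of N(mu, sigma^2) with respect to the location mu."""
    return 1.0 / sigma**2

# Entropy power N(X) = exp(2h) / (2 pi e); for a Gaussian, J * N = 1,
# the equality case of the isoperimetric inequality J(X) N(X) >= 1.
for sigma in (0.5, 1.0, 3.0):
    h = gaussian_entropy(sigma)
    J = gaussian_fisher(sigma)
    N = math.exp(2 * h) / (2 * math.pi * math.e)
    print(f"sigma={sigma}: J*N = {J * N:.6f}")
```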
(This article belongs to the Special Issue Information Theoretic Learning)

Article
An Entropy-Based Kernel Learning Scheme toward Efficient Data Prediction in Cloud-Assisted Network Environments
by Xiong Luo, Ji Liu, Dandan Zhang, Weiping Wang and Yueqin Zhu
Entropy 2016, 18(7), 274; https://doi.org/10.3390/e18070274 - 22 Jul 2016
Cited by 4 | Viewed by 4508
Abstract
With the recent emergence of wireless sensor networks (WSNs) in the cloud computing environment, it is now possible to monitor and gather physical information via many sensor nodes to meet the requirements of cloud services. Generally, those sensor nodes collect data and send it to a sink node, where end-users can query all the information and run cloud applications. Currently, one of the main disadvantages of sensor nodes is their limited physical capacity: little memory for storage and a limited power source. To work around this limitation, it is necessary to develop an efficient data prediction method in WSNs. To serve this purpose, by reducing the redundant data transmission between sensor nodes and the sink node while maintaining acceptable errors, this article proposes an entropy-based learning scheme for data prediction through the use of the kernel least mean square (KLMS) algorithm. The proposed scheme, called E-KLMS, develops a mechanism to keep the predicted data synchronized at both sides. Specifically, the kernel-based method adjusts the coefficients adaptively in accordance with every input, achieving better performance with smaller prediction errors, while information entropy is employed to remove data which may cause relatively large errors. E-KLMS can effectively solve the trade-off between prediction accuracy and computational effort while greatly simplifying the training structure compared with some other data prediction approaches. Moreover, the kernel-based method and the entropy technique together ensure the prediction effect by both improving accuracy and reducing errors. Experiments with real data sets have been carried out to validate the efficiency and effectiveness of the E-KLMS learning scheme, and the results show the advantages of our method in prediction accuracy and computational time.
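A minimal sketch of the underlying kernel least mean square (KLMS) algorithm (plain KLMS with a growing dictionary; E-KLMS adds the entropy-based data-removal mechanism described above, which is not reproduced here):

```python
import numpy as np

def rbf(x, c, width=1.0):
    """Gaussian (RBF) kernel between input x and center c."""
    return np.exp(-np.sum((x - c) ** 2) / (2 * width**2))

class KLMS:
    """Kernel least mean square: f(x) = sum_i a_i k(c_i, x); each training
    sample appends a new center x with coefficient eta * error."""
    def __init__(self, eta=0.5, width=1.0):
        self.eta, self.width = eta, width
        self.centers, self.coeffs = [], []

    def predict(self, x):
        return sum(a * rbf(x, c, self.width)
                   for a, c in zip(self.coeffs, self.centers))

    def update(self, x, d):
        e = d - self.predict(x)           # instantaneous prediction error
        self.centers.append(np.asarray(x, float))
        self.coeffs.append(self.eta * e)
        return e

# Learn the nonlinear map y = sin(2x) online
rng = np.random.default_rng(1)
f = KLMS(eta=0.5, width=0.5)
errors = []
for _ in range(500):
    x = rng.uniform(-2, 2, size=1)
    errors.append(abs(f.update(x, np.sin(2 * x[0]))))
```

The error sequence decays as the dictionary grows, which is the online-learning behavior E-KLMS builds on.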
(This article belongs to the Special Issue Information Theoretic Learning)

Article
Maximum Entropy Learning with Deep Belief Networks
by Payton Lin, Szu-Wei Fu, Syu-Siang Wang, Ying-Hui Lai and Yu Tsao
Entropy 2016, 18(7), 251; https://doi.org/10.3390/e18070251 - 8 Jul 2016
Cited by 11 | Viewed by 14213
Abstract
Conventionally, the maximum likelihood (ML) criterion is applied to train a deep belief network (DBN). We present a maximum entropy (ME) learning algorithm for DBNs, designed specifically to handle limited training data. Maximizing only the entropy of parameters in the DBN allows more effective generalization, less bias towards data distributions, and more robustness to over-fitting than ML learning. Results on text classification and object recognition tasks demonstrate that an ME-trained DBN outperforms an ML-trained DBN when training data is limited.
(This article belongs to the Special Issue Information Theoretic Learning)

Article
Product Design Time Forecasting by Kernel-Based Regression with Gaussian Distribution Weights
by Zhi-Gen Shang and Hong-Sen Yan
Entropy 2016, 18(6), 231; https://doi.org/10.3390/e18060231 - 21 Jun 2016
Cited by 6 | Viewed by 4370
Abstract
There exist problems of small samples and heteroscedastic noise in design time forecasting. To solve them, a kernel-based regression with Gaussian distribution weights (GDW-KR) is proposed here. GDW-KR maintains a Gaussian distribution over weight vectors for the regression and seeks the least informative distribution among those that keep the target value within the confidence interval of the forecast value. GDW-KR inherits the benefits of Gaussian margin machines: by assuming a Gaussian distribution over weight vectors, it can simultaneously offer a point forecast and its confidence interval, thus providing more information about product design time. Our experiments with real examples verify the effectiveness and flexibility of GDW-KR.
(This article belongs to the Special Issue Information Theoretic Learning)

Article
Sparse Estimation Based on a New Random Regularized Matching Pursuit Generalized Approximate Message Passing Algorithm
by Yongjie Luo, Guan Gui, Xunchao Cong and Qun Wan
Entropy 2016, 18(6), 207; https://doi.org/10.3390/e18060207 - 28 May 2016
Cited by 8 | Viewed by 4198
Abstract
Approximate Message Passing (AMP) and Generalized AMP (GAMP) algorithms usually suffer from serious convergence issues when the elements of the sensing matrix do not exactly match the zero-mean Gaussian assumption. To stabilize AMP/GAMP in these contexts, we propose a new sparse reconstruction algorithm, termed Random regularized Matching pursuit GAMP (RrMpGAMP). It utilizes a random splitting support operation and dropout/replacement support operations to regularize the matching pursuit steps, and uses a new GAMP-like algorithm to estimate the non-zero elements of a sparse vector. Moreover, the proposed algorithm saves much memory, has computational complexity comparable to GAMP, and supports parallel computing in some steps. We have analyzed the convergence of this GAMP-like algorithm by the replica method and provided its convergence conditions. The analysis also explains the broader admissible variance range of the elements of the sensing matrix for this GAMP-like algorithm. Experiments using simulation data and real-world synthetic aperture radar tomography (TomoSAR) data show that our method provides the expected performance for scenarios where AMP/GAMP diverges.
(This article belongs to the Special Issue Information Theoretic Learning)

Article
Entropy-Based Incomplete Cholesky Decomposition for a Scalable Spectral Clustering Algorithm: Computational Studies and Sensitivity Analysis
by Rocco Langone, Marc Van Barel and Johan A. K. Suykens
Entropy 2016, 18(5), 182; https://doi.org/10.3390/e18050182 - 13 May 2016
Cited by 7 | Viewed by 5211
Abstract
Spectral clustering methods allow datasets to be partitioned into clusters by mapping the input datapoints into the space spanned by the eigenvectors of the Laplacian matrix. In this article, we make use of the incomplete Cholesky decomposition (ICD) to construct an approximation of the graph Laplacian and reduce the size of the related eigenvalue problem from N to m, with m ≪ N. In particular, we introduce a new stopping criterion based on normalized mutual information between consecutive partitions, which terminates the ICD when the change in the cluster assignments is below a given threshold. Compared with existing ICD-based spectral clustering approaches, the proposed method allows the reduction of the number m of selected pivots (i.e., a sparser model) while maintaining high clustering quality. The method scales linearly with respect to the number of input datapoints N and has low memory requirements, because only matrices of size N × m and m × m are calculated (in contrast to standard spectral clustering, where the construction of the full N × N similarity matrix is needed). Furthermore, we show that the number of clusters can be reliably selected based on the gap heuristics computed using just a small matrix R of size m × m instead of the entire graph Laplacian. The effectiveness of the proposed algorithm is tested on several datasets.
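The pivoted incomplete Cholesky decomposition at the core of the method can be sketched as follows (a generic ICD of an RBF kernel matrix; the paper's NMI-based stopping rule is replaced here by a simple trace tolerance):

```python
import numpy as np

def icd(K, m, tol=1e-6):
    """Pivoted incomplete Cholesky: returns G of shape (n, k), k <= m, with
    K ~= G @ G.T; pivots are chosen greedily on the residual diagonal."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()   # residual diagonal
    G = np.zeros((n, m))
    piv = []
    for k in range(m):
        j = int(np.argmax(d))
        if d[j] < tol:                    # residual exhausted: stop early
            return G[:, :k], piv
        piv.append(j)
        G[:, k] = (K[:, j] - G[:, :k] @ G[j, :k]) / np.sqrt(d[j])
        d -= G[:, k] ** 2
    return G, piv

# A low-rank RBF kernel on clustered data is captured by a few pivots
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 0.1, (30, 2)),
                    rng.normal(3, 0.1, (30, 2))])
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
G, piv = icd(K, m=10)
err = np.linalg.norm(K - G @ G.T) / np.linalg.norm(K)
```

Only N × m quantities are ever formed per pivot step, which is the source of the linear scaling in N highlighted by the abstract.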
(This article belongs to the Special Issue Information Theoretic Learning)

Article
A Hybrid EEMD-Based SampEn and SVD for Acoustic Signal Processing and Fault Diagnosis
by Zhi-Xin Yang and Jian-Hua Zhong
Entropy 2016, 18(4), 112; https://doi.org/10.3390/e18040112 - 1 Apr 2016
Cited by 49 | Viewed by 6319
Abstract
Acoustic signals are an ideal source of diagnosis data thanks to their intrinsic non-directional coverage, sensitivity to incipient defects, and insensitivity to structural resonance characteristics. However, these signals make prevailing signal de-noising and feature extraction methods suffer from high computational cost, low signal-to-noise ratio (S/N), and difficulty extracting the compound acoustic emissions of various failure types. To address these challenges, we propose a hybrid signal processing technique to depict the embedded signal using generally effective features. The ensemble empirical mode decomposition (EEMD) is adopted as the fundamental pre-processor, integrated with the sample entropy (SampEn), singular value decomposition (SVD), and statistic feature processing (SFP) methods. SampEn and SVD are identified as the condition indicators for periodical and irregular signals, respectively. Moreover, this hybrid module is self-adaptive and robust to different signals, which ensures the generality of its performance. The hybrid signal processor is further integrated with a probabilistic classifier, the pairwise-coupled relevance vector machine (PCRVM), to construct a new fault diagnosis system. Experimental verifications on industrial equipment show that the proposed diagnostic system is superior to prior methods in computational efficiency and in the capability of simultaneously processing non-stationary and nonlinear condition monitoring signals.
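A compact sketch of the sample entropy (SampEn) condition indicator used above (the standard SampEn definition with Chebyshev distance; parameters m = 2 and r = 0.2·std are conventional defaults, not taken from the paper):

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) = -ln(A / B), where B counts template pairs of length m
    within tolerance r (Chebyshev distance) and A counts the same for m+1."""
    x = np.asarray(x, float)
    r = r * np.std(x)
    def count(mm):
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        c = 0
        for i in range(len(templ)):       # pairs i < j, no self-matches
            d = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            c += int(np.sum(d <= r))
        return c
    B, A = count(m), count(m + 1)
    return float(-np.log(A / B)) if A > 0 and B > 0 else np.inf

rng = np.random.default_rng(0)
t = np.linspace(0, 20 * np.pi, 600)
periodic = np.sin(t)                      # regular, low-complexity signal
noise = rng.standard_normal(600)          # irregular signal
```

Periodic signals score low (highly predictable templates), noise scores high, which is why SampEn indicates periodical signal condition.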
(This article belongs to the Special Issue Information Theoretic Learning)

Article
An Optimization Approach of Deriving Bounds between Entropy and Error from Joint Distribution: Case Study for Binary Classifications
by Bao-Gang Hu and Hong-Jie Xing
Entropy 2016, 18(2), 59; https://doi.org/10.3390/e18020059 - 19 Feb 2016
Cited by 3 | Viewed by 5078
Abstract
In this work, we propose a new approach for deriving the bounds between entropy and error from a joint distribution through an optimization means. The specific case study is given on binary classifications. Two basic types of classification errors are investigated, namely the Bayesian and non-Bayesian errors. The consideration of non-Bayesian errors is due to the fact that most classifiers produce non-Bayesian solutions. For both types of errors, we derive the closed-form relations between each bound and the error components. When Fano's lower bound in a diagram of “Error Probability vs. Conditional Entropy” is realized based on the approach, its interpretations are enlarged by including non-Bayesian errors and the two situations concerning the independence properties of the variables. A new upper bound for the Bayesian error is derived with respect to the minimum prior probability, which is generally tighter than Kovalevskij's upper bound.
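The binary form of Fano's inequality discussed above, h(Pe) ≥ H(Y|X) for the Bayes error Pe (the |Y|−1 term vanishes since log(2−1) = 0), can be checked numerically from any joint distribution (the joint table below is a made-up example):

```python
import math

def h2(p):
    """Binary Shannon entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Joint distribution over (X, Y) with binary Y: rows are [P(x, y=0), P(x, y=1)]
joint = [[0.30, 0.10],
         [0.05, 0.55]]

# Conditional entropy H(Y|X) and the Bayes error of the MAP classifier
H_cond = sum(sum(row) * h2(row[0] / sum(row)) for row in joint)
bayes_err = sum(min(row) for row in joint)

print(H_cond, bayes_err, h2(bayes_err))
```

Here the MAP classifier picks the larger entry in each row, so the Bayes error is the sum of the row minima, and h2(bayes_err) upper-dominates H(Y|X) as Fano requires.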
(This article belongs to the Special Issue Information Theoretic Learning)

Article
Classification Active Learning Based on Mutual Information
by Jamshid Sourati, Murat Akcakaya, Jennifer G. Dy, Todd K. Leen and Deniz Erdogmus
Entropy 2016, 18(2), 51; https://doi.org/10.3390/e18020051 - 5 Feb 2016
Cited by 19 | Viewed by 10209
Abstract
Selecting a subset of samples to label from a large pool of unlabeled data points, such that a sufficiently accurate classifier is obtained using a reasonably small training set, is a challenging yet critical problem: challenging, since solving it involves cumbersome combinatorial computations, and critical, since labeling is an expensive and time-consuming task, hence we always aim to minimize the number of required labels. While information theoretical objectives, such as mutual information (MI) between the labels, have been successfully used in sequential querying, it is not straightforward to generalize these objectives to batch mode. This is because evaluation and optimization of functions which are trivial in individual querying settings become intractable for many objectives when multiple queries must be selected. In this paper, we develop a framework with efficient ways of evaluating and maximizing the MI between labels as an objective for batch mode active learning. Our proposed framework efficiently reduces the computational complexity from an order proportional to the batch size, when no approximation is applied, to linear cost. The performance of this framework is evaluated using data sets from several fields, showing that the proposed framework leads to efficient active learning on most of the data sets.
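As a point of contrast with the batch-mode MI objective, the simplest sequential baseline scores each unlabeled point independently by predictive entropy (this naive selection ignores the interactions between queries that the paper's MI framework is designed to capture; the function names are our own):

```python
import numpy as np

def predictive_entropy(p):
    """Entropy (in nats) of Bernoulli predictive probabilities p, vectorized."""
    p = np.clip(np.asarray(p, float), 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_batch(probs, k):
    """Naive batch selection: take the k unlabeled points with the highest
    predictive entropy; unlike the MI objective, the chosen points may be
    redundant because their interactions are ignored."""
    scores = predictive_entropy(probs)
    return np.argsort(-scores)[:k]

# Points with predictive probability near 0.5 are the most uncertain queries
probs = [0.95, 0.51, 0.10, 0.47, 0.99, 0.30]
batch = select_batch(probs, 2)
```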
(This article belongs to the Special Issue Information Theoretic Learning)

Article
A New Process Monitoring Method Based on Waveform Signal by Using Recurrence Plot
by Cheng Zhou and Weidong Zhang
Entropy 2015, 17(9), 6379-6396; https://doi.org/10.3390/e17096379 - 16 Sep 2015
Cited by 4 | Viewed by 6077
Abstract
Process monitoring is an important research problem in numerous areas. This paper proposes a novel process monitoring scheme by integrating the recurrence plot (RP) method and the control chart technique. Recently, the RP method has emerged as an effective tool to analyze waveform signals. However, unlike existing RP methods that employ recurrence quantification analysis (RQA) to quantify the recurrence plot with a few summary statistics, we propose the new concepts of template recurrence plots and continuous-scale recurrence plots to characterize waveform signals. A new feature extraction method is developed based on the continuous-scale recurrence plot. Then, a monitoring statistic based on the top- approach is constructed from the continuous-scale recurrence plot. Finally, a bootstrap control chart is built to detect signal changes based on the constructed monitoring statistics. Comprehensive simulation studies show that the proposed monitoring scheme outperforms other RQA-based control charts. In addition, a real case study of progressive stamping processes is implemented to further evaluate the performance of the proposed scheme for process monitoring.
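A basic recurrence plot, the starting point of the scheme above, is simple to compute (thresholded and continuous-scale variants sketched below for a scalar signal; the paper's template-based construction is more elaborate):

```python
import numpy as np

def recurrence_plot(x, eps):
    """Binary recurrence plot: R[i, j] = 1 iff |x_i - x_j| <= eps
    (scalar signal; for embedded state vectors use a vector norm)."""
    x = np.asarray(x, float)
    D = np.abs(x[:, None] - x[None, :])
    return (D <= eps).astype(int)

def continuous_recurrence(x):
    """Continuous-scale variant: keep the pairwise distances themselves
    instead of thresholding them."""
    x = np.asarray(x, float)
    return np.abs(x[:, None] - x[None, :])

t = np.linspace(0, 4 * np.pi, 200)
R = recurrence_plot(np.sin(t), eps=0.1)
# A periodic signal produces diagonal line structures in R
```

By construction R is symmetric with an all-ones main diagonal; the distribution of its off-diagonal structure is what RQA statistics, and the continuous-scale features above, summarize.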
(This article belongs to the Special Issue Information Theoretic Learning)
