Special Issue "Information-Theoretical Methods in Data Mining"

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: 30 April 2019

Special Issue Editor

Guest Editor
Prof. Kenji Yamanishi

Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Japan
Interests: information-theoretic learning theory; data mining; knowledge discovery; data science; big data analysis; machine learning; minimum description length principle; model selection; anomaly detection; change detection; health care data analysis; glaucoma progression prediction

Special Issue Information

Dear Colleagues,

Data mining is a rapidly growing field aimed at analyzing big data in academia and industry. Information-theoretical methods play a key role in discovering useful knowledge from large amounts of data. For example, probabilistic modeling of data sources based on information-theoretical methods such as maximum entropy, the minimum description length (MDL) principle, rate-distortion theory, and Kolmogorov complexity has proven very effective for machine learning problems in data mining, such as model selection, regression, clustering, classification, structural/relational learning, association/causality analysis, transfer learning, change/anomaly detection, stream data mining, and sparse modeling. As real data become more complex, further advanced information-theoretical methods are emerging to adapt to realistic data sources such as non-i.i.d. sources, heterogeneous sources, network-type data sources, and sparse sources. Information-theoretical data mining methods have been successfully applied to a wide range of application areas, including finance, education, marketing, intelligent transportation systems, multimedia processing, health care, and network science.
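
To make the MDL idea above concrete, the following is a minimal sketch of two-part MDL model selection for choosing a polynomial degree. The synthetic data, the crude (k/2) log n parameter cost, and all names are illustrative assumptions, not taken from any particular paper in this issue.

```python
# Minimal two-part MDL sketch: total code length = data cost + model cost.
# Data cost ~ Gaussian negative log-likelihood up to constants; model cost is
# the crude (k/2) log n bound on the cost of encoding k parameters.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.linspace(-1, 1, n)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.2, size=n)  # true degree: 2

def description_length(degree):
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = degree + 1                            # number of free parameters
    data_cost = 0.5 * n * np.log(rss / n)     # -log-likelihood up to constants
    model_cost = 0.5 * k * np.log(n)          # cost of encoding the parameters
    return data_cost + model_cost

best = min(range(8), key=description_length)
print("selected polynomial degree:", best)    # typically 2
```

The degree minimizing the total code length balances goodness of fit against model complexity, which is the essence of MDL-based model selection.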

This Special Issue specifically emphasizes research that addresses data mining problems using information-theoretical methods. It welcomes both novel developments of information-theoretical methods for specific data mining applications and new data mining problems formulated using information theory. Submissions at the boundaries of information theory, data mining, and related areas such as machine learning and network science are also welcome.

Prof. Kenji Yamanishi
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Data Mining
  • Knowledge Discovery
  • Data Science
  • Machine Learning
  • Big Data
  • Information Theory
  • Minimum Description Length Principle
  • Source Coding
  • Probabilistic Modeling
  • Latent Variable Modeling
  • Network Science

Published Papers (3 papers)


Research

Open Access Article: An Information Criterion for Auxiliary Variable Selection in Incomplete Data Analysis
Entropy 2019, 21(3), 281; https://doi.org/10.3390/e21030281
Received: 21 February 2019 / Revised: 9 March 2019 / Accepted: 12 March 2019 / Published: 14 March 2019
Abstract
Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not observed. Utilizing a parametric model of the joint distribution of primary and auxiliary variables, it is possible to improve the estimation of the parametric model for the primary variables when the auxiliary variables are closely related to the primary variables. However, the estimation accuracy decreases when the auxiliary variables are irrelevant to the primary variables. For selecting useful auxiliary variables, we formulate the problem as model selection and propose an information criterion for predicting primary variables by leveraging auxiliary variables. The proposed information criterion is an asymptotically unbiased estimator of the Kullback–Leibler divergence for complete data of primary variables under some reasonable conditions. We also clarify an asymptotic equivalence between the proposed information criterion and a variant of leave-one-out cross-validation. The performance of our method is demonstrated via a simulation study and a real-data example.
(This article belongs to the Special Issue Information-Theoretical Methods in Data Mining)
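
The paper's own criterion is not reproduced here. As a rough illustration of the underlying idea, that a penalized log-likelihood can serve as an approximately unbiased estimate of Kullback–Leibler prediction risk, the following sketch applies the classic AIC to decide whether extra (irrelevant) covariates help a linear model; the data and all names are hypothetical.

```python
# Classic AIC as a stand-in illustration: AIC = -2 log-likelihood + 2k.
# The paper's criterion additionally handles incomplete data and auxiliary
# variables; this sketch only shows the penalized-likelihood selection idea.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X_rel = rng.normal(size=(n, 2))               # relevant covariates
X_irr = rng.normal(size=(n, 3))               # irrelevant covariates
y = X_rel @ np.array([1.5, -2.0]) + rng.normal(scale=1.0, size=n)

def aic(X):
    X1 = np.column_stack([np.ones(n), X])     # add intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = np.sum((y - X1 @ beta) ** 2)
    k = X1.shape[1] + 1                       # coefficients + noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)  # Gaussian MLE
    return -2 * loglik + 2 * k

print("AIC, relevant only:   ", aic(X_rel))
print("AIC, relevant + noise:", aic(np.column_stack([X_rel, X_irr])))
# The relevant-only model usually attains the smaller AIC, mirroring the
# auxiliary-variable selection problem the paper addresses.
```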

Open Access Article: Mixture of Experts with Entropic Regularization for Data Classification
Entropy 2019, 21(2), 190; https://doi.org/10.3390/e21020190
Received: 4 January 2019 / Revised: 4 February 2019 / Accepted: 15 February 2019 / Published: 18 February 2019
Abstract
Today, there is growing interest in the automatic classification of a variety of tasks, such as weather forecasting, product recommendations, intrusion detection, and people recognition. The mixture-of-experts model is a well-known classification technique: a probabilistic model consisting of local expert classifiers weighted by a gate network, typically based on softmax functions, that can learn complex patterns in data. In this scheme, one data point is influenced by only one expert; as a result, the training process can be misguided on real datasets in which complex data need to be explained by multiple experts. In this work, we propose a variant of the regular mixture-of-experts model in which the classification cost is penalized by the Shannon entropy of the gating network, in order to avoid a "winner-takes-all" output of the gating network. Experiments on several real datasets show the advantage of our approach, with improvements in mean accuracy of 3–6% on some datasets. In future work, we plan to embed feature selection into this model.
(This article belongs to the Special Issue Information-Theoretical Methods in Data Mining)
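
As a rough sketch of the entropic-regularization idea described above (not the authors' implementation), the following numpy code evaluates a mixture-of-experts negative log-likelihood with a Shannon-entropy term on the gate distribution; subtracting the entropy with weight lam rewards gates that spread mass over several experts. The array shapes, the weight lam, and all names are illustrative assumptions.

```python
# Entropy-regularized mixture-of-experts loss: NLL of the gated mixture minus
# lam times the mean Shannon entropy of the gate weights (an entropy bonus
# discourages "winner-takes-all" gating).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_loss(gate_logits, expert_probs, labels, lam=0.1):
    """gate_logits: (n, E); expert_probs: (n, E, C); labels: (n,) int class ids."""
    g = softmax(gate_logits)                          # gate weights per sample
    mix = np.einsum('ne,nec->nc', g, expert_probs)    # mixture class probabilities
    nll = -np.log(mix[np.arange(len(labels)), labels] + 1e-12).mean()
    gate_entropy = -(g * np.log(g + 1e-12)).sum(axis=1).mean()
    return nll - lam * gate_entropy                   # entropy bonus spreads the gates

# Toy usage: 4 samples, 3 experts, 2 classes (random stand-ins for real model outputs).
rng = np.random.default_rng(3)
logits = rng.normal(size=(4, 3))
probs = softmax(rng.normal(size=(4, 3, 2)))           # per-expert class distributions
print(moe_loss(logits, probs, labels=np.array([0, 1, 1, 0])))
```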

Open Access Article: The Optimized Multi-Scale Permutation Entropy and Its Application in Compound Fault Diagnosis of Rotating Machinery
Entropy 2019, 21(2), 170; https://doi.org/10.3390/e21020170
Received: 20 January 2019 / Revised: 2 February 2019 / Accepted: 3 February 2019 / Published: 12 February 2019
Cited by 1
Abstract
Multi-scale permutation entropy (MPE) is a statistical indicator for detecting nonlinear dynamic changes in time series, with the merits of high computational efficiency, good robustness, and independence from prior knowledge. However, the performance of MPE depends on the selection of the embedding dimension and time delay. To automate the parameter selection of MPE, a novel parameter-optimization strategy for MPE is proposed, namely optimized multi-scale permutation entropy (OMPE). In the OMPE method, an improved Cao method is proposed to adaptively select the embedding dimension, while the time delay is determined based on mutual information. To verify the effectiveness of the OMPE method, a simulated signal and two experimental signals are used for validation. The results demonstrate that the proposed OMPE method has better feature-extraction ability than existing MPE methods.
(This article belongs to the Special Issue Information-Theoretical Methods in Data Mining)
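
For readers unfamiliar with MPE itself, here is a minimal numpy sketch of plain multi-scale permutation entropy. The embedding dimension m and time delay tau are exactly the parameters OMPE selects automatically (per the abstract, via an improved Cao method and mutual information); that selection procedure is not reproduced here, and the fixed defaults below are illustrative assumptions.

```python
# Multi-scale permutation entropy: coarse-grain the series at each scale, then
# compute the Shannon entropy of ordinal-pattern frequencies, normalized by
# log(m!) so values lie in [0, 1].
import numpy as np
from math import factorial, log

def permutation_entropy(x, m=3, tau=1):
    n = len(x) - (m - 1) * tau
    counts = {}
    for i in range(n):
        pattern = tuple(np.argsort(x[i:i + m * tau:tau]))  # ordinal pattern
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    return -(p * np.log(p)).sum() / log(factorial(m))      # normalized entropy

def mpe(x, scales=(1, 2, 3, 4, 5), m=3, tau=1):
    x = np.asarray(x, dtype=float)
    out = []
    for s in scales:
        # coarse-grain: mean over non-overlapping windows of length s
        cg = x[: len(x) // s * s].reshape(-1, s).mean(axis=1)
        out.append(permutation_entropy(cg, m, tau))
    return out

print(mpe(np.random.default_rng(2).normal(size=2000)))     # near 1 for white noise
```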
