Special Issue "Theory and Applications of Information Theoretic Machine Learning"

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (15 December 2020).

Special Issue Editors

Assist. Prof. Dr. Sotiris Kotsiantis
Guest Editor
Department of Mathematics, University of Patras, GR 265-00 Patras, Greece
Interests: machine learning; data mining; knowledge discovery; data science

Assoc. Prof. Dimitris Kalles
Guest Editor
School of Science and Technology, Hellenic Open University, Greece
Interests: machine learning; artificial intelligence; educational intelligence; educational technology

Assoc. Prof. Christos Makris
Guest Editor
Department of Computer Engineering and Informatics, University of Patras, Greece
Interests: data structures; information retrieval; data mining; bioinformatics; string algorithms; computational geometry; multimedia databases; internet technologies

Special Issue Information

Dear Colleagues,

The software industry, and the world at large, is looking for ways to apply the principles of data science and data analytics to difficult problems. Machine learning and data analytics methods can help address new problems and discover improved solutions. This Special Issue aims to bring together applications of machine learning in interdisciplinary domains such as data mining, data analytics, and data science, covering a wide landscape of methods and techniques that can be applied to obtain productive results. The aims of this Special Issue are: (1) to present state-of-the-art research on data mining and machine learning; and (2) to provide a forum for researchers to discuss the latest progress, new research methodologies, and potential research topics. Further, all submissions should explain the role that entropy or information theory plays in the work presented.

Topics of interest include, but are not limited to, classification, regression and prediction, clustering, kernel methods, data mining, web mining, information retrieval, natural language processing, deep learning, probabilistic models and methods, vision and speech perception, bioinformatics, streaming data, and industrial, financial, and educational applications.

Papers will be evaluated on their originality, presentation, relevance, and contribution, as well as on their suitability and quality in terms of both technical contribution and writing. Submitted papers must be written in English and describe original research that has not been published and is not currently under review by other journals or conferences.

Assist. Prof. Sotiris Kotsiantis
Assoc. Prof. Dimitris Kalles
Assoc. Prof. Christos Makris
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • data mining
  • computational intelligence
  • learning analytics
  • artificial intelligence
  • educational intelligence
  • educational technology
  • information retrieval
  • bioinformatics

Published Papers (12 papers)

Research

Open Access Article
Who Will Score? A Machine Learning Approach to Supporting Football Team Building and Transfers
Entropy 2021, 23(1), 90; https://doi.org/10.3390/e23010090 - 10 Jan 2021
Abstract
Background: the machine learning (ML) techniques have been implemented in numerous applications, including health-care, security, entertainment, and sports. In this article, we present how the ML can be used for building a professional football team and planning player transfers. Methods: in this research, we defined numerous parameters for player assessment, and three definitions of a successful transfer. We used the Random Forest, Naive Bayes, and AdaBoost algorithms in order to predict the player transfer success. We used realistic, publicly available data in order to train and test the classifiers. Results: in the article, we present numerous experiments; they differ in the weights of parameters, the successful transfer definitions, and other factors. We report promising results (accuracy = 0.82, precision = 0.84, recall = 0.82, and F1-score = 0.83). Conclusion: the presented research proves that machine learning can be helpful in professional football team building. The proposed algorithm will be developed in the future and it may be implemented as a professional tool for football talent scouts. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)

Open Access Article
Machine Learning Algorithms for Prediction of the Quality of Transmission in Optical Networks
Entropy 2021, 23(1), 7; https://doi.org/10.3390/e23010007 - 22 Dec 2020
Abstract
Increasing demand in backbone Dense Wavelength Division Multiplexing (DWDM) network traffic prompts the introduction of new solutions that allow the transmission speed to be increased without a significant increase in service cost. To achieve this objective, simpler and faster DWDM network reconfiguration procedures are needed. A key problem that is intrinsically related to network reconfiguration is that of the quality of transmission assessment. Thus, in this contribution a Machine Learning (ML) based method for an assessment of the quality of transmission is proposed. The proposed ML methods use a database, which was created only on the basis of information that is available to a DWDM network operator via the DWDM network control plane. Several types of ML classifiers are proposed and their performance is tested and compared for two real DWDM network topologies. The results obtained are promising and motivate further research. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)

Open Access Article
Monitoring Volatility Change for Time Series Based on Support Vector Regression
Entropy 2020, 22(11), 1312; https://doi.org/10.3390/e22111312 - 17 Nov 2020
Abstract
This paper considers monitoring an anomaly from sequentially observed time series with heteroscedastic conditional volatilities based on the cumulative sum (CUSUM) method combined with support vector regression (SVR). The proposed online monitoring process is designed to detect a significant change in volatility of financial time series. The tuning parameters are optimally chosen using particle swarm optimization (PSO). We conduct Monte Carlo simulation experiments to illustrate the validity of the proposed method. A real data analysis with the S&P 500 index, Korea Composite Stock Price Index (KOSPI), and the stock price of Microsoft Corporation is presented to demonstrate the versatility of our model. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)
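
As a rough illustration of the CUSUM idea named in the abstract above, the following Python sketch raises an alarm once a monitored statistic drifts upward for long enough. It is a plain one-sided CUSUM over a precomputed series, not the SVR-based procedure of the paper. The function name `cusum_alarm`, the `slack` and `threshold` parameters, and the sample values are all assumptions made for illustration.

```python
def cusum_alarm(values, target, slack, threshold):
    """Return the index of the first one-sided CUSUM alarm, or None.

    values:    monitored statistic, e.g. squared residuals (hypothetical input)
    target:    in-control mean of the statistic
    slack:     allowance subtracted at every step to ignore small drifts
    threshold: alarm level for the cumulative sum
    """
    cumulative = 0.0
    for index, value in enumerate(values):
        # Accumulate deviations above target + slack; never drop below zero.
        cumulative = max(0.0, cumulative + (value - target - slack))
        if cumulative > threshold:
            return index
    return None


# Hypothetical series: a calm regime followed by a more volatile one.
calm = [1.0, 0.9, 1.1, 1.0, 0.95]
volatile = [2.5, 3.0, 2.8, 3.2]
alarm_index = cusum_alarm(calm + volatile, target=1.0, slack=0.25, threshold=4.0)
```

In this toy series the cumulative sum stays at zero during the calm regime and crosses the threshold on the third volatile observation. In practice the threshold would be calibrated to a desired false-alarm rate.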

Open Access Article
Bayesian³ Active Learning for the Gaussian Process Emulator Using Information Theory
Entropy 2020, 22(8), 890; https://doi.org/10.3390/e22080890 - 13 Aug 2020
Cited by 1
Abstract
Gaussian process emulators (GPE) are a machine learning approach that replicates computationally demanding models using training runs of that model. Constructing such a surrogate is very challenging and, in the context of Bayesian inference, the training runs should be well invested. The current paper offers a fully Bayesian view on GPEs for Bayesian inference accompanied by Bayesian active learning (BAL). We introduce three BAL strategies that adaptively identify training sets for the GPE using information-theoretic arguments. The first strategy relies on Bayesian model evidence that indicates the GPE’s quality of matching the measurement data, the second strategy is based on relative entropy that indicates the relative information gain for the GPE, and the third is founded on information entropy that indicates the missing information in the GPE. We illustrate the performance of our three strategies using analytical and carbon-dioxide benchmarks. The paper shows evidence of convergence against a reference solution and demonstrates quantification of post-calibration uncertainty by comparing the three introduced strategies. We conclude that Bayesian model evidence-based and relative entropy-based strategies outperform the entropy-based strategy because the latter can be misleading during the BAL. The relative entropy-based strategy demonstrates superior performance to the Bayesian model evidence-based strategy. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)

Open Access Article
Social Network Analysis and Churn Prediction in Telecommunications Using Graph Theory
Entropy 2020, 22(7), 753; https://doi.org/10.3390/e22070753 - 09 Jul 2020
Cited by 1
Abstract
Due to telecommunications market saturation, it is very important for telco operators to always have fresh insights into their customer’s dynamics. In that regard, social network analytics and its application with graph theory can be very useful. In this paper we analyze a social network that is represented by a large telco network graph and perform clustering of its nodes by studying a broad set of metrics, e.g., node in/out degree, first and second order influence, eigenvector, authority and hub values. This paper demonstrates that it is possible to identify some important nodes in our social network (graph) that are vital regarding churn prediction. We show that if such a node leaves a monitored telco operator, customers that frequently interact with that specific node will be more prone to leave the monitored telco operator network as well; thus, by analyzing existing churn and previous call patterns, we proactively predict new customers that will probably churn. The churn prediction results are quantified by using top decile lift metrics. The proposed method is general enough to be readily adopted in any field where homophilic or friendship connections can be assumed as a potential churn driver. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)
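
The abstract above reports results with the top decile lift metric. Under its usual definition, this is the churn rate among the 10% of customers with the highest predicted scores divided by the overall churn rate. The Python sketch below computes it under that assumption; the function name `top_decile_lift` and the sample data are hypothetical and do not come from the paper.

```python
def top_decile_lift(scores, churned):
    """Churn rate in the top-scored 10% divided by the overall churn rate.

    scores:  predicted churn scores, higher means more likely to churn
    churned: 1 if the customer actually churned, else 0 (same order as scores)
    """
    count = len(scores)
    top_count = max(1, count // 10)  # size of the top decile, at least one customer
    ranked = sorted(range(count), key=lambda i: scores[i], reverse=True)
    top_rate = sum(churned[i] for i in ranked[:top_count]) / top_count
    base_rate = sum(churned) / count
    if base_rate == 0:
        raise ValueError("lift is undefined when nobody churned")
    return top_rate / base_rate


# Hypothetical data: 20 customers, 4 churners, 2 of them ranked at the very top.
scores = [i / 20 for i in range(20, 0, -1)]
churned = [1, 1] + [0] * 8 + [1, 1] + [0] * 8
lift = top_decile_lift(scores, churned)
```

Here the top decile holds two customers, both churners, against an overall churn rate of 20%, so the lift is 5. A lift of 1 would mean the ranking is no better than random selection.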

Open Access Article
A Method Based on GA-CNN-LSTM for Daily Tourist Flow Prediction at Scenic Spots
Entropy 2020, 22(3), 261; https://doi.org/10.3390/e22030261 - 25 Feb 2020
Abstract
Accurate tourist flow prediction is key to ensuring the normal operation of popular scenic spots. However, one single model cannot effectively grasp the characteristics of the data and make accurate predictions because of the strong nonlinear characteristics of daily tourist flow data. Accordingly, this study predicts daily tourist flow in Huangshan Scenic Spot in China. A prediction method (GA-CNN-LSTM) which combines convolutional neural network (CNN) and long-short-term memory network (LSTM) and optimized by genetic algorithm (GA) is established. First, network search data, meteorological data, and other data are constructed into continuous feature maps. Then, feature vectors are extracted by convolutional neural network (CNN). Finally, the feature vectors are input into long-short-term memory network (LSTM) in time series for prediction. Moreover, GA is used to scientifically select the number of neurons in the CNN-LSTM model. Data is preprocessed and normalized before prediction. The accuracy of GA-CNN-LSTM is evaluated using mean absolute percentage error (MAPE), mean absolute error (MAE), Pearson correlation coefficient and index of agreement (IA). For a fair comparison, GA-CNN-LSTM model is compared with CNN-LSTM, LSTM, CNN and the back propagation neural network (BP). The experimental results show that GA-CNN-LSTM model is approximately 8.22% higher than CNN-LSTM on the performance of MAPE. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)
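
Three of the evaluation metrics named in the abstract above (MAE, MAPE, and the index of agreement) can be sketched in a few lines of Python. The index of agreement is written here in Willmott's common form, which is an assumption, since the abstract does not state which variant the authors used. The function names and the sample values are hypothetical.

```python
def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (undefined if an observation is 0)."""
    ratios = (abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * sum(ratios) / len(observed)


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement: 1 means a perfect match."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - squared_error / potential_error


# Hypothetical daily visitor counts and forecasts.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
errors = (mae(observed, predicted), mape(observed, predicted),
          index_of_agreement(observed, predicted))
```

For these values the forecasts are off by 10%, 5%, and 0%, giving a MAPE of about 5%. Lower MAE and MAPE are better, while an index of agreement closer to 1 is better.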

Open Access Article
Nonlinear Canonical Correlation Analysis: A Compressed Representation Approach
Entropy 2020, 22(2), 208; https://doi.org/10.3390/e22020208 - 12 Feb 2020
Abstract
Canonical Correlation Analysis (CCA) is a linear representation learning method that seeks maximally correlated variables in multi-view data. Nonlinear CCA extends this notion to a broader family of transformations, which are more powerful in many real-world applications. Given the joint probability, the Alternating Conditional Expectation (ACE) algorithm provides an optimal solution to the nonlinear CCA problem. However, it suffers from limited performance and an increasing computational burden when only a finite number of samples is available. In this work, we introduce an information-theoretic compressed representation framework for the nonlinear CCA problem (CRCCA), which extends the classical ACE approach. Our suggested framework seeks compact representations of the data that allow a maximal level of correlation. This way, we control the trade-off between the flexibility and the complexity of the model. CRCCA provides theoretical bounds and optimality conditions, as we establish fundamental connections to rate-distortion theory, the information bottleneck and remote source coding. In addition, it allows a soft dimensionality reduction, as the compression level is determined by the mutual information between the original noisy data and the extracted signals. Finally, we introduce a simple implementation of the CRCCA framework, based on lattice quantization. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)

Open Access Article
Mining Educational Data to Predict Students’ Performance through Procrastination Behavior
Entropy 2020, 22(1), 12; https://doi.org/10.3390/e22010012 - 20 Dec 2019
Cited by 10
Abstract
A significant amount of research has indicated that students’ procrastination tendencies are an important factor influencing the performance of students in online learning. It is, therefore, vital for educators to be aware of the presence of such behavior trends as students with lower procrastination tendencies usually achieve better than those with higher procrastination. In the present study, we propose a novel algorithm—using student’s assignment submission behavior—to predict the performance of students with learning difficulties through procrastination behavior (called PPP). Unlike many existing works, PPP not only considers late or non-submissions, but also investigates students’ behavioral patterns before the due date of assignments. PPP firstly builds feature vectors representing the submission behavior of students for each assignment, then applies a clustering method to the feature vectors for labelling students as a procrastinator, procrastination candidate, or non-procrastinator, and finally employs and compares several classification methods to best classify students. To evaluate the effectiveness of PPP, we use a course including 242 students from the University of Tartu in Estonia. The results reveal that PPP could successfully predict students’ performance through their procrastination behaviors with an accuracy of 96%. Linear support vector machine appears to be the best classifier among others in terms of continuous features, and neural network in categorical features, where categorical features tend to perform slightly better than continuous. Finally, we found that the predictive power of all classification methods is lowered by an increment in class numbers formed by clustering. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)

Open Access Article
Electricity Load and Price Forecasting Using Jaya-Long Short Term Memory (JLSTM) in Smart Grids
Entropy 2020, 22(1), 10; https://doi.org/10.3390/e22010010 - 19 Dec 2019
Cited by 12
Abstract
In the smart grid (SG) environment, consumers are enabled to alter electricity consumption patterns in response to electricity prices and incentives. This results in prices that may differ from the initial price pattern. Electricity price and demand forecasting play a vital role in the reliability and sustainability of SG. Forecasting using big data has become a new hot research topic as a massive amount of data is being generated and stored in the SG environment. Electricity users, having advanced knowledge of prices and demand of electricity, can manage their load efficiently. In this paper, a recurrent neural network (RNN), long short term memory (LSTM), is used for electricity price and demand forecasting using big data. Researchers are working actively to propose new models of forecasting. These models contain a single input variable as well as multiple variables. From the literature, we observed that the use of multiple variables enhances the forecasting accuracy. Hence, our proposed model uses multiple variables as input and forecasts the future values of electricity demand and price. The hyperparameters of this algorithm are tuned using the Jaya optimization algorithm to improve the forecasting ability and increase the training mechanism of the model. Parameter tuning is necessary because the performance of a forecasting model depends on the values of these parameters. Selection of inappropriate values can result in inaccurate forecasting. So, integration of an optimization method improves the forecasting accuracy with minimum user efforts. For efficient forecasting, data is preprocessed and cleaned from missing values and outliers, using the z-score method. Furthermore, data is normalized before forecasting. The forecasting accuracy of the proposed model is evaluated using the root mean square error (RMSE) and mean absolute error (MAE). For a fair comparison, the proposed forecasting model is compared with univariate LSTM and support vector machine (SVM). 
The values of the performance metrics depict that the proposed model has higher accuracy than SVM and univariate LSTM. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)

Open Access Article
Predicting Student Performance and Deficiency in Mastering Knowledge Points in MOOCs Using Multi-Task Learning
Entropy 2019, 21(12), 1216; https://doi.org/10.3390/e21121216 - 12 Dec 2019
Cited by 2
Abstract
Massive open online courses (MOOCs), which have been deemed a revolutionary teaching mode, are increasingly being used in higher education. However, there remain deficiencies in understanding the relationship between online behavior of students and their performance, and in verifying how well a student comprehends learning material. Therefore, we propose a method for predicting student performance and mastery of knowledge points in MOOCs based on assignment-related online behavior; this allows for those providing academic support to intervene and improve learning outcomes of students facing difficulties. The proposed method was developed while using data from 1528 participants in a C Programming course, from which we extracted assignment-related features. We first applied a multi-task multi-layer long short-term memory-based student performance predicting method with cross-entropy as the loss function to predict students’ overall performance and mastery of each knowledge point. Our method incorporates the attention mechanism, which might better reflect students’ learning behavior and performance. Our method achieves an accuracy of 92.52% for predicting students’ performance and a recall rate of 94.68%. Students’ actions, such as submission times and plagiarism, were related to their performance in the MOOC, and the results demonstrate that our method predicts the overall performance and knowledge points that students cannot master well. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)

Open Access Article
Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme
Entropy 2019, 21(10), 988; https://doi.org/10.3390/e21100988 - 10 Oct 2019
Cited by 4
Abstract
One of the major aspects affecting the performance of the classification algorithms is the amount of labeled data which is available during the training phase. It is widely accepted that the labeling procedure of vast amounts of data is both expensive and time-consuming since it requires the employment of human expertise. For a wide variety of scientific fields, unlabeled examples are easy to collect but hard to handle in a useful manner, thus improving the contained information for a subject dataset. In this context, a variety of learning methods have been studied in the literature aiming to efficiently utilize the vast amounts of unlabeled data during the learning process. The most common approaches tackle problems of this kind by individually applying active learning or semi-supervised learning methods. In this work, a combination of active learning and semi-supervised learning methods is proposed, under a common self-training scheme, in order to efficiently utilize the available unlabeled data. The effective and robust metrics of the entropy and the distribution of probabilities of the unlabeled set, to select the most sufficient unlabeled examples for the augmentation of the initial labeled set, are used. The superiority of the proposed scheme is validated by comparing it against the base approaches of supervised, semi-supervised, and active learning in the wide range of fifty-five benchmark datasets. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)
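
The abstract above describes using the entropy of predicted class probabilities to choose unlabeled examples. The Python sketch below shows one common way such a selection can work, and it should be read as an assumption rather than the authors' scheme: low-entropy (confident) predictions are candidates for pseudo-labelling in self-training, while high-entropy (uncertain) ones are candidates for a human annotator in active learning. The function names and the sample probabilities are hypothetical.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def split_by_entropy(predictions, n_confident, n_uncertain):
    """Return (confident, uncertain) index lists ranked by prediction entropy.

    predictions: one class-probability list per unlabeled example
    n_confident: how many lowest-entropy examples to pseudo-label
    n_uncertain: how many highest-entropy examples to send for manual labelling
    The two lists can overlap if the counts together exceed the pool size.
    """
    ranked = sorted(range(len(predictions)),
                    key=lambda i: shannon_entropy(predictions[i]))
    confident = ranked[:n_confident]
    uncertain = ranked[len(ranked) - n_uncertain:] if n_uncertain else []
    return confident, uncertain


# Hypothetical classifier outputs for four unlabeled examples (two classes).
predicted = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]
confident, uncertain = split_by_entropy(predicted, n_confident=1, n_uncertain=1)
```

With these probabilities the first example has the lowest entropy and the last the highest, so index 0 would be pseudo-labelled and index 3 would be queried.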

Review

Open Access Review
Explainable AI: A Review of Machine Learning Interpretability Methods
Entropy 2021, 23(1), 18; https://doi.org/10.3390/e23010018 - 25 Dec 2020
Abstract
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption, with machine learning systems demonstrating superhuman performance in a significant number of tasks. However, this surge in performance has often been achieved through increased model complexity, turning such systems into “black box” approaches and causing uncertainty regarding the way they operate and, ultimately, the way that they come to decisions. This ambiguity has made it problematic for machine learning systems to be adopted in sensitive yet critical domains, where their value could be immense, such as healthcare. As a result, scientific interest in the field of Explainable Artificial Intelligence (XAI), a field that is concerned with the development of new methods that explain and interpret machine learning models, has been tremendously reignited over recent years. This study focuses on machine learning interpretability methods; more specifically, a literature review and taxonomy of these methods are presented, as well as links to their programming implementations, in the hope that this survey will serve as a reference point for both theorists and practitioners. Full article
(This article belongs to the Special Issue Theory and Applications of Information Theoretic Machine Learning)