Table of Contents

Mach. Learn. Knowl. Extr., Volume 1, Issue 1 (December 2019)

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
Displaying articles 1-32
Open Access | Feature Paper | Review | Evaluation of Regression Models: Model Assessment, Model Selection and Generalization Error
Mach. Learn. Knowl. Extr. 2019, 1(1), 521-551; https://doi.org/10.3390/make1010032
Received: 9 February 2019 / Revised: 13 March 2019 / Accepted: 18 March 2019 / Published: 22 March 2019
PDF Full-text (3081 KB)
Abstract
When performing a regression or classification analysis, one needs to specify a statistical model. This model should avoid overfitting and underfitting the data and achieve a low generalization error, which characterizes its prediction performance. To identify such a model, one needs to decide which model to select from candidate model families based on performance evaluations. In this paper, we review the theoretical framework of model selection and model assessment, including error-complexity curves, the bias-variance tradeoff, and learning curves for evaluating statistical models. We discuss criterion-based and step-wise selection procedures as well as resampling methods for model selection, noting that cross-validation provides the simplest and most generic means for computationally estimating all required entities. To make the theoretical concepts transparent, we present worked examples for linear regression models. However, our conceptual presentation is extensible to more general models, as well as to classification problems. Full article
(This article belongs to the Section Learning)
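As a concrete illustration of the model selection workflow this review describes, the following minimal sketch (our own, with synthetic data and polynomial degree standing in for model-family complexity) uses 10-fold cross-validation to estimate each candidate's generalization error:

```python
# Illustrative model selection via cross-validation; data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

for degree in range(1, 8):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # sklearn reports negated MSE; flip the sign for readability.
    mse = -cross_val_score(model, X, y, cv=10,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree}  estimated generalization error={mse:.4f}")
```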
Open Access | Article | A Near Real-Time Automatic Speaker Recognition Architecture for Voice-Based User Interface
Mach. Learn. Knowl. Extr. 2019, 1(1), 504-520; https://doi.org/10.3390/make1010031
Received: 26 January 2019 / Revised: 13 March 2019 / Accepted: 15 March 2019 / Published: 19 March 2019
Viewed by 202 | PDF Full-text (2200 KB) | HTML Full-text | XML Full-text
Abstract
In this paper, we present a novel pipelined near real-time speaker recognition architecture that enhances the performance of speaker recognition by exploiting the advantages of hybrid feature extraction techniques, combining the features of Gabor Filters (GF), Convolutional Neural Networks (CNN), and statistical parameters into a single matrix set. This architecture was developed to enable secure access to a voice-based user interface (UI) through speaker-based authentication and integration with an existing Natural Language Processing (NLP) system; gaining secure access to existing NLP systems served as further motivation. Initially, we identify challenges related to real-time speaker recognition and highlight recent research in the field. We then analyze the functional requirements of a speaker recognition system and introduce the mechanisms that can address these requirements through our novel architecture. Subsequently, the paper discusses the effect of different techniques such as CNN, GF, and statistical parameters on feature extraction. For classification, standard classifiers such as Support Vector Machine (SVM), Random Forest (RF), and Deep Neural Network (DNN) are investigated. To verify the validity and effectiveness of the proposed architecture, we compared accuracy, sensitivity, and specificity against the standard AlexNet architecture. Full article
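A schematic of the hybrid-feature idea from the abstract; the feature extractors are mocked with random arrays, so this is illustrative rather than the paper's pipeline:

```python
# Three per-utterance feature blocks (GF responses, CNN embeddings,
# statistical parameters) concatenated into one matrix, then classified.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

n_utterances = 300
gabor_feats = np.random.rand(n_utterances, 64)   # stand-in for GF responses
cnn_feats = np.random.rand(n_utterances, 128)    # stand-in for CNN embeddings
stat_feats = np.random.rand(n_utterances, 16)    # e.g., pitch mean/variance
X = np.hstack([gabor_feats, cnn_feats, stat_feats])  # single matrix set
y = np.random.randint(0, 10, n_utterances)       # 10 hypothetical speakers

for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier())]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```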
Open Access | Article | Gender Recognition by Voice Using an Improved Self-Labeled Algorithm
Mach. Learn. Knowl. Extr. 2019, 1(1), 492-503; https://doi.org/10.3390/make1010030
Received: 22 January 2019 / Revised: 19 February 2019 / Accepted: 2 March 2019 / Published: 5 March 2019
Viewed by 213 | PDF Full-text (1153 KB) | HTML Full-text | XML Full-text
Abstract
Speech recognition has various applications, including human-to-machine interaction, sorting telephone calls by gender categorization, video categorization with tagging, and so on. Machine learning is currently widely utilized in various fields and applications, exploiting recent developments in digital technologies and the storage capabilities of electronic media. Recent research has focused on combining ensemble learning techniques with the semi-supervised learning framework to build more accurate classifiers. In this paper, we focus on gender recognition by voice utilizing a new ensemble semi-supervised self-labeled algorithm. Our preliminary numerical experiments demonstrate the classification efficiency of the proposed algorithm in terms of accuracy, leading to the development of stable and robust predictive models. Full article
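For readers unfamiliar with self-labeled methods, here is a minimal self-training loop in that spirit; the confidence threshold and random-forest base learner are our assumptions, not the proposed algorithm:

```python
# Self-training sketch: fit on labeled voice samples, then repeatedly
# promote the classifier's most confident unlabeled predictions to
# pseudo-labels and refit.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    X_lab, y_lab = np.asarray(X_lab), np.asarray(y_lab)
    clf = RandomForestClassifier(n_estimators=200)
    for _ in range(max_rounds):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Promote confident predictions to the labeled pool.
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~confident]
    return clf
```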
Open Access | Article | Differentially Private Image Classification Using Support Vector Machine and Differential Privacy
Mach. Learn. Knowl. Extr. 2019, 1(1), 483-491; https://doi.org/10.3390/make1010029
Received: 30 December 2018 / Revised: 13 February 2019 / Accepted: 19 February 2019 / Published: 20 February 2019
Viewed by 287 | PDF Full-text (641 KB)
Abstract
The ubiquity of data, including multimedia data such as images, enables easy mining and analysis of such data. However, such an analysis might involve sensitive data such as medical records (including radiological images) and financial records. Privacy-preserving machine learning aims to analyze such data in a way that does not compromise privacy. There are various privacy-preserving data analysis approaches, such as k-anonymity, l-diversity, t-closeness, and Differential Privacy (DP). Currently, DP is the gold standard of privacy-preserving data analysis due to its robustness against background knowledge attacks. In this paper, we report a scheme for privacy-preserving image classification using a Support Vector Machine (SVM) and DP. SVM was chosen as the classification algorithm because, unlike variants of artificial neural networks, it converges to a global optimum. Linear and Radial Basis Function (RBF) SVM kernels were used, and ε-differential privacy was the DP framework employed. The proposed scheme achieved an accuracy of up to 98%. The results obtained underline the utility of using SVM and DP for privacy-preserving image classification. Full article
(This article belongs to the Section Privacy)
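One standard way to combine an SVM with ε-differential privacy is output perturbation, i.e., adding Laplace noise calibrated to sensitivity/ε to the learned weights. The sketch below illustrates that idea and is not necessarily the paper's exact mechanism; `sensitivity` is a placeholder the analyst must derive for their data:

```python
# Hedged sketch of ε-DP via output perturbation on a linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

def dp_linear_svm(X, y, epsilon, sensitivity):
    clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
    scale = sensitivity / epsilon          # Laplace scale b = Δf / ε
    clf.coef_ += np.random.laplace(0.0, scale, size=clf.coef_.shape)
    clf.intercept_ += np.random.laplace(0.0, scale, size=clf.intercept_.shape)
    return clf                             # released model is ε-DP (given Δf)
```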
Open Access | Article | Using Resistin, Glucose, Age and BMI and Pruning Fuzzy Neural Network for the Construction of Expert Systems in the Prediction of Breast Cancer
Mach. Learn. Knowl. Extr. 2019, 1(1), 466-482; https://doi.org/10.3390/make1010028
Received: 23 January 2019 / Revised: 4 February 2019 / Accepted: 12 February 2019 / Published: 14 February 2019
Viewed by 423 | PDF Full-text (1219 KB) | HTML Full-text | XML Full-text
Abstract
Research on breast cancer prediction is growing in the scientific community, providing data from studies based on patient surveys. Predictive models link the fields of medicine and artificial intelligence to collect data and improve assessments of diseases that affect a large part of the population, such as breast cancer. In this work, we used a hybrid artificial intelligence model based on concepts from neural networks and fuzzy systems to help identify people with breast cancer through fuzzy rules. The hybrid model can manipulate the data collected in medical examinations and identify patterns that distinguish healthy people from people with breast cancer with an acceptable level of accuracy. These intelligent techniques allow the creation of expert systems based on logical IF/THEN rules. To demonstrate the feasibility of applying fuzzy neural networks, binary pattern classification tests were performed in which the dimensions of the problem serve as model inputs and the answers identify whether or not the patient has cancer. In the tests, experiments were replicated with several characteristics collected in examinations performed by medical specialists. Compared to other models commonly used for this purpose in the literature, the test results confirm that the hybrid model has strong predictive capacity for identifying people with breast cancer, maintaining acceptable levels of accuracy, handling false positives and false negatives well, and offering the significant benefit of interpretability. In addition to coherent predictions, the fuzzy rules found by the fuzzy neural network enable the construction, in high-level programming languages, of support systems for physicians' actions during the initial stages of treatment; such systems replicate the knowledge of medical specialists and disseminate it to other professionals. Full article
(This article belongs to the Special Issue Machine Learning for Biomedical Data Processing)
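The IF/THEN fuzzy rules mentioned in the abstract take roughly this form; the membership breakpoints and the single rule below are invented for illustration only:

```python
# Toy fuzzy rule over two of the paper's inputs (glucose, BMI) using
# triangular membership functions; breakpoints are illustrative guesses.
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def rule_high_risk(glucose, bmi):
    # IF glucose IS high AND bmi IS high THEN risk IS high
    mu_glucose_high = tri(glucose, 100, 140, 200)
    mu_bmi_high = tri(bmi, 25, 32, 45)
    return min(mu_glucose_high, mu_bmi_high)   # AND = minimum t-norm

print(rule_high_risk(glucose=150, bmi=32))     # rule firing strength in [0, 1]
```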
Open Access | Article | Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps
Mach. Learn. Knowl. Extr. 2019, 1(1), 450-465; https://doi.org/10.3390/make1010027
Received: 9 January 2019 / Revised: 4 February 2019 / Accepted: 6 February 2019 / Published: 13 February 2019
Viewed by 315 | PDF Full-text (3464 KB) | HTML Full-text | XML Full-text
Abstract
Deep learning solutions are increasingly used in mobile applications. Although there are many open-source software tools for developing deep learning solutions, there are no unified, single-source guidelines for using these tools toward real-time deployment of such solutions on smartphones. From the variety of available deep learning tools, the most suitable ones are used in this paper to enable real-time deployment of deep learning inference networks on smartphones. A uniform implementation flow is devised for both Android and iOS smartphones. The advantage of using multi-threading to achieve or improve real-time throughput is also showcased. A benchmarking framework consisting of accuracy, CPU/GPU consumption, and real-time throughput is used for validation purposes. The developed deployment approach allows deep learning models to be turned into real-time smartphone apps with ease, based on publicly available deep learning and smartphone software tools. This approach is applied to six popular or representative convolutional neural network models, and the validation results based on the benchmarking metrics are reported. Full article
(This article belongs to the Section Learning)
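The real-time throughput metric in such a benchmarking framework can be measured with a loop like the following sketch, where `infer` is a placeholder for the deployed model's prediction call:

```python
# Measure sustained throughput (frames per second) of an inference call.
import time
import numpy as np

def benchmark(infer, frames, warmup=10):
    for f in frames[:warmup]:          # warm up caches / delegates
        infer(f)
    start = time.perf_counter()
    for f in frames:
        infer(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed       # throughput in frames per second

frames = [np.random.rand(224, 224, 3).astype(np.float32) for _ in range(200)]
fps = benchmark(lambda f: f.mean(), frames)   # trivial stand-in model
print(f"{fps:.1f} fps")
```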
Open Access | Article | Model Selection Criteria on Beta Regression for Machine Learning
Mach. Learn. Knowl. Extr. 2019, 1(1), 427-449; https://doi.org/10.3390/make1010026
Received: 19 January 2019 / Revised: 4 February 2019 / Accepted: 6 February 2019 / Published: 8 February 2019
Viewed by 283 | PDF Full-text (1146 KB) | HTML Full-text | XML Full-text
Abstract
Beta regression models are a class of supervised learning tools for regression problems with a univariate, limited response. Current fitting procedures for beta regression require variable selection based on (potentially problematic) information criteria. We propose model selection criteria that take into account the leverage, residuals, and influence of the observations, for both systematic linear and nonlinear components. To that end, we propose a Predictive Residual Sum of Squares (PRESS)-like machine learning tool and a prediction coefficient, namely the P² statistic, as a computational procedure. Monte Carlo simulation results on the finite-sample behavior of the prediction-based model selection criterion P² are provided. We also evaluated two versions of the R² criterion. Finally, applications to real data are presented. The new criterion proved to be crucial for choosing models that take into account the robustness of the maximum likelihood estimation procedure in the presence of influential cases. Full article
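For orientation, the PRESS statistic and a P²-style prediction coefficient look as follows for ordinary least squares, where leave-one-out residuals have the closed form e_i / (1 - h_ii); the paper develops the analogous quantities for beta regression, so this is only the linear-model analogue:

```python
# PRESS and a P²-style coefficient for OLS via the hat matrix.
import numpy as np

def press_p2(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])         # add intercept
    H = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T           # hat matrix
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))             # leave-one-out residuals
    press = np.sum(loo_resid ** 2)
    p2 = 1.0 - press / np.sum((y - y.mean()) ** 2)     # P² = 1 - PRESS / SST
    return press, p2
```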
Open Access | Article | The Number of Topics Optimization: Clustering Approach
Mach. Learn. Knowl. Extr. 2019, 1(1), 416-426; https://doi.org/10.3390/make1010025
Received: 8 January 2019 / Revised: 26 January 2019 / Accepted: 29 January 2019 / Published: 30 January 2019
Viewed by 299 | PDF Full-text (3025 KB) | HTML Full-text | XML Full-text
Abstract
Although topic models have been used to build clusters of documents for more than ten years, the problem of choosing the optimal number of topics remains. The authors analyzed many fundamental studies undertaken on the subject in recent years. The main problem is the lack of a stable metric for the quality of the topics obtained when constructing a topic model. The authors analyzed the internal metrics of topic models (coherence, contrast, and purity) for determining the optimal number of topics and concluded that they are not applicable to this problem. The authors then analyzed an approach to choosing the optimal number of topics based on the quality of the resulting clusters. For this purpose, they considered the behavior of three cluster validation metrics: the Davies–Bouldin index, the silhouette coefficient, and the Calinski–Harabasz index. The new method for determining the optimal number of topics proposed in this paper is based on the following principles: (1) setting up a topic model with additive regularization (ARTM) to separate noise topics; (2) using dense vector representations (GloVe, FastText, Word2Vec); and (3) using cosine distance in the cluster validation metrics, which works better than Euclidean distance on high-dimensional vectors. The methodology developed by the authors for obtaining the optimal number of topics was tested on a collection of scientific articles from the OnePetro library, selected by specific themes. The experiment showed that the proposed method makes it possible to assess the optimal number of topics for a topic model built on a small collection of English documents. Full article
(This article belongs to the Special Issue Language Processing and Knowledge Extraction)
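Principle (3) can be sketched as follows: cluster dense document vectors for each candidate number of topics and score the clustering with a cosine-distance metric, here the silhouette coefficient; the embedding input is assumed given:

```python
# Score candidate topic counts via cosine-distance silhouette; the input
# `doc_vectors` is an (n_docs, dim) array of dense document embeddings,
# e.g., averaged Word2Vec/GloVe/FastText vectors.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_topic_count(doc_vectors, k_range=range(2, 30)):
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(doc_vectors)
        scores[k] = silhouette_score(doc_vectors, labels, metric="cosine")
    return max(scores, key=scores.get), scores
```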
Open Access | Editorial | Acknowledgement to Reviewers of MAKE in 2018
Mach. Learn. Knowl. Extr. 2019, 1(1), 414-415; https://doi.org/10.3390/make1010024
Published: 19 January 2019
Viewed by 258 | PDF Full-text (208 KB) | HTML Full-text | XML Full-text
Abstract
Rigorous peer-review is the cornerstone of high-quality academic publishing [...] Full article
Open Access | Feature Paper | Article | Discovery of Relevant Response in Infected Potato Plants from Time Series of Gene Expression Data
Mach. Learn. Knowl. Extr. 2019, 1(1), 400-413; https://doi.org/10.3390/make1010023
Received: 2 November 2018 / Revised: 12 December 2018 / Accepted: 8 January 2019 / Published: 16 January 2019
Viewed by 254 | PDF Full-text (2763 KB) | HTML Full-text | XML Full-text
Abstract
This paper presents a methodology for analyzing time series of gene expression data collected from the leaves of potato virus Y (PVY)-infected and non-infected potato plants, with the aim of identifying significant differences between the two sets of potato plants at various time points. We aim to identify differentially-expressed genes whose expression values are statistically significantly different in the set of PVY-infected potato plants compared to non-infected plants, and which also demonstrate statistically significant changes in expression values over time in the PVY-infected plants. The novelty of the approach includes stratified data randomization used in estimating the statistical properties of gene expression of the samples in the control set of non-infected potato plants. A novel estimate that computes the relative minimal distance between the samples has been defined, enabling reliable identification of the differences between the target and control datasets when these sets are small. The relevance of the outcomes is demonstrated by visualizing the relative minimal distance of gene expression changes in time for three different types of potato leaves for the genes identified as relevant by the proposed methodology. Full article
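One plausible reading of the relative minimal distance, sketched for a single gene; the paper's exact definition may differ, so treat this as illustrative only:

```python
# Distance from an infected sample's expression value to its nearest
# control sample, scaled by the spread of the control set (our reading).
import numpy as np

def relative_minimal_distance(target_value, control_values):
    control = np.asarray(control_values, dtype=float)
    d_min = np.min(np.abs(control - target_value))   # nearest control sample
    spread = np.ptp(control) or 1.0                  # guard degenerate controls
    return d_min / spread
```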
Open Access | Article | Encrypted DNP3 Traffic Classification Using Supervised Machine Learning Algorithms
Mach. Learn. Knowl. Extr. 2019, 1(1), 384-399; https://doi.org/10.3390/make1010022
Received: 24 November 2018 / Revised: 18 December 2018 / Accepted: 11 January 2019 / Published: 15 January 2019
Viewed by 334 | PDF Full-text (1341 KB) | HTML Full-text | XML Full-text
Abstract
The Distributed Network Protocol (DNP3) is predominantly used by the electric utility industry and, consequently, in smart grids. The Peekaboo attack was created to compromise DNP3 traffic: a man-in-the-middle on a communication link can capture and drop selected encrypted DNP3 messages by using support vector machine learning algorithms. The communication networks of smart grids are an important part of their infrastructure, so it is of critical importance to keep this communication secure and reliable. The main contribution of this paper is to compare the use of machine learning techniques for classifying messages of the same protocol exchanged in encrypted tunnels. The study considers four simulated cases of encrypted DNP3 traffic scenarios and four different supervised machine learning algorithms: decision tree, nearest-neighbor, support vector machine, and naive Bayes. The results obtained show that it is possible to extend a Peekaboo attack over multiple substations using a decision tree learning algorithm and to gather significant information from a system that communicates using encrypted DNP3 traffic. Full article
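The four supervised learners compared in the study can be benchmarked side by side as below; the flow-feature matrix is a stand-in for features (e.g., packet sizes and inter-arrival times) extracted from the encrypted tunnel:

```python
# Side-by-side comparison of the study's four classifiers on placeholder data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(1000, 8)            # placeholder flow features
y = np.random.randint(0, 4, 1000)      # placeholder DNP3 message classes

for name, clf in [("decision tree", DecisionTreeClassifier()),
                  ("nearest-neighbor", KNeighborsClassifier()),
                  ("SVM", SVC()),
                  ("naive Bayes", GaussianNB())]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```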
Open Access | Review | High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection
Mach. Learn. Knowl. Extr. 2019, 1(1), 359-383; https://doi.org/10.3390/make1010021
Received: 9 December 2018 / Revised: 31 December 2018 / Accepted: 11 January 2019 / Published: 14 January 2019
Viewed by 288 | PDF Full-text (1072 KB) | HTML Full-text | XML Full-text
Abstract
Regression models are a form of supervised learning that is important for machine learning, statistics, and general data science. Although classical ordinary least squares (OLS) regression models have been known for a long time, recent years have brought many new developments that extend this model significantly. Above all, the least absolute shrinkage and selection operator (LASSO) model has gained considerable interest. In this paper, we review general regression models with a focus on the LASSO and extensions thereof, including the adaptive LASSO, elastic net, and group LASSO. We discuss the regularization terms responsible for inducing coefficient shrinkage and variable selection, which lead to improved performance metrics of these regression models. This makes these modern, computational regression models valuable tools for analyzing high-dimensional problems. Full article
(This article belongs to the Section Learning)
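The shrinkage-and-selection behavior discussed in the review is easy to demonstrate: as the regularization strength grows, the LASSO drives more coefficients exactly to zero (synthetic sparse data assumed):

```python
# LASSO shrinkage in action on a sparse synthetic signal.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
beta = np.zeros(20)
beta[:3] = [3.0, -2.0, 1.5]                          # only 3 true predictors
y = X @ beta + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 0.1, 0.5, 1.0]:
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:<4}  nonzero coefficients={np.sum(coefs != 0)}")
```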
Open Access | Article | Recent Advances in Supervised Dimension Reduction: A Survey
Mach. Learn. Knowl. Extr. 2019, 1(1), 341-358; https://doi.org/10.3390/make1010020
Received: 28 October 2018 / Revised: 28 December 2018 / Accepted: 28 December 2018 / Published: 7 January 2019
Viewed by 406 | PDF Full-text (655 KB) | HTML Full-text | XML Full-text
Abstract
Recently, we have witnessed explosive growth in both the quantity and dimension of generated data, which aggravates the high-dimensionality challenge in tasks such as predictive modeling and decision support. Up to now, a large number of unsupervised dimension reduction methods have been proposed and studied, but there is no specific review focusing on the supervised dimension reduction problem. Most studies perform classification or regression after applying unsupervised dimension reduction methods; however, learning the low-dimensional representation and the classification/regression model simultaneously offers the advantages of high accuracy and effective representation. Considering classification or regression as the main goal of dimension reduction, the purpose of this paper is to summarize and organize the current developments in the field into three main classes: PCA-based, Non-negative Matrix Factorization (NMF)-based, and manifold-based supervised dimension reduction methods, and to provide elaborated discussions on their advantages and disadvantages. Moreover, we outline a dozen open problems that can be further explored to advance the development of this topic. Full article
(This article belongs to the Section Learning)
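The survey's central contrast, supervised versus unsupervised dimension reduction, in miniature: PCA ignores labels, while linear discriminant analysis (LDA) uses them to shape the low-dimensional representation:

```python
# PCA (unsupervised) vs. LDA (supervised) as dimension reducers before a
# downstream classifier, on a standard toy dataset.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
for name, reducer in [("PCA", PCA(n_components=9)),
                      ("LDA", LinearDiscriminantAnalysis(n_components=9))]:
    pipe = make_pipeline(reducer, LogisticRegression(max_iter=2000))
    print(name, cross_val_score(pipe, X, y, cv=5).mean())
```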
Open Access | Article | Causal Discovery with Attention-Based Convolutional Neural Networks
Mach. Learn. Knowl. Extr. 2019, 1(1), 312-340; https://doi.org/10.3390/make1010019
Received: 5 November 2018 / Revised: 26 December 2018 / Accepted: 27 December 2018 / Published: 7 January 2019
Viewed by 695 | PDF Full-text (1802 KB) | HTML Full-text | XML Full-text
Abstract
Having insight into the causal associations in a complex system facilitates decision making, e.g., for medical treatments, urban infrastructure improvements, or financial investments. The amount of observational data is growing, which enables the discovery of causal relationships between variables from observation of their behaviour over time. Existing methods for causal discovery from time series data do not yet exploit the representational power of deep learning. We therefore present the Temporal Causal Discovery Framework (TCDF), a deep learning framework that learns a causal graph structure by discovering causal relationships in observational time series data. TCDF uses attention-based convolutional neural networks combined with a causal validation step. By interpreting the internal parameters of the convolutional networks, TCDF can also discover the time delay between a cause and the occurrence of its effect. Our framework learns temporal causal graphs, which can include confounders and instantaneous effects. Experiments on financial and neuroscientific benchmarks show state-of-the-art performance of TCDF in discovering causal relationships in continuous time series data. Furthermore, we show that TCDF can circumstantially discover the presence of hidden confounders. Our broadly applicable framework can be used to gain novel insights into the causal dependencies in a complex system, which is important for reliable predictions, knowledge discovery, and data-driven decision making. Full article
(This article belongs to the Special Issue Women in Machine Learning 2018)
Open Access | Article | Evaluation of ARIMA Models for Human–Machine Interface State Sequence Prediction
Mach. Learn. Knowl. Extr. 2019, 1(1), 287-311; https://doi.org/10.3390/make1010018
Received: 16 November 2018 / Revised: 22 December 2018 / Accepted: 24 December 2018 / Published: 3 January 2019
Viewed by 384 | PDF Full-text (5592 KB) | HTML Full-text | XML Full-text
Abstract
In this paper, auto-regressive integrated moving average (ARIMA) time-series forecast models are evaluated to ascertain their feasibility in predicting human–machine interface (HMI) state transitions, which are modeled as multivariate time-series patterns. Human–machine interface states generally include changes in the visually displayed information brought about by both process parameter changes and user actions. This approach has wide applications in industrial controls (such as nuclear power plant control rooms) and in the transportation industry (such as aircraft cockpits) for developing non-intrusive real-time monitoring solutions for human operator situational awareness and potentially for predicting precursors of human-in-the-loop error trends. Full article
(This article belongs to the Section Learning)
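Fitting an ARIMA model to a single HMI state variable and forecasting its next values might look like the sketch below; the order (p, d, q) is a placeholder to be chosen, e.g., by AIC, and the series is synthetic:

```python
# ARIMA forecast sketch with statsmodels on a synthetic state signal.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

series = np.sin(np.linspace(0, 20, 200)) + np.random.normal(0, 0.1, 200)
fit = ARIMA(series, order=(2, 0, 1)).fit()   # (p, d, q) chosen for illustration
print(fit.forecast(steps=5))                 # predicted next 5 state values
```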
Open Access | Article | Multi-Layer Hidden Markov Model Based Intrusion Detection System
Mach. Learn. Knowl. Extr. 2019, 1(1), 265-286; https://doi.org/10.3390/make1010017
Received: 14 October 2018 / Revised: 12 December 2018 / Accepted: 12 December 2018 / Published: 25 December 2018
Viewed by 533 | PDF Full-text (3220 KB) | HTML Full-text | XML Full-text
Abstract
The all-IP nature of next generation (5G) networks is going to open the door to many new vulnerabilities, making it challenging to prevent the risks associated with them. The majority of these vulnerabilities might be impossible to detect with simple network traffic monitoring tools. Intrusion Detection Systems (IDS) that rely on machine learning and artificial intelligence can significantly improve network defense against intruders. This technology can be trained to learn and identify uncommon patterns in massive volumes of traffic and to notify system administrators, e.g., via alert flags, for additional investigation. This paper proposes an IDS design which makes use of machine learning algorithms such as the Hidden Markov Model (HMM) in a multi-layer approach. This approach has been developed and verified to resolve the common flaws in applying HMM to IDS, commonly referred to as the curse of dimensionality. It factors a problem of immense dimensionality into a discrete set of manageable and reliable elements. The multi-layer approach can be expanded beyond two layers to capture multi-phase attacks over longer spans of time. A pyramid of HMMs can resolve disparate digital events and signatures across protocols and platforms into actionable information, where lower layers identify discrete events (such as a network scan) and higher layers identify new states that result from multi-phase events in the lower layers. The concepts of this novel approach have been developed, but the full potential has not yet been demonstrated. Full article
(This article belongs to the Section Learning)
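The layered idea can be sketched with hmmlearn: a lower-layer HMM decodes raw traffic features into discrete event states, whose sequence is then modeled by an upper-layer HMM. Component counts are illustrative, and this is our sketch rather than the paper's system:

```python
# Two-layer HMM sketch; CategoricalHMM assumes a recent hmmlearn release
# (older versions expose the same behavior as MultinomialHMM).
import numpy as np
from hmmlearn import hmm

raw = np.random.rand(500, 4)                       # placeholder traffic features
lower = hmm.GaussianHMM(n_components=5, n_iter=50).fit(raw)
events = lower.predict(raw).reshape(-1, 1)         # discrete event states

upper = hmm.CategoricalHMM(n_components=3, n_iter=50).fit(events)
print(upper.predict(events)[-10:])                 # higher-level attack states
```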
Open Access | Article | The Winning Solution to the IEEE CIG 2017 Game Data Mining Competition
Mach. Learn. Knowl. Extr. 2019, 1(1), 252-264; https://doi.org/10.3390/make1010016
Received: 1 November 2018 / Revised: 14 December 2018 / Accepted: 16 December 2018 / Published: 20 December 2018
Viewed by 486 | PDF Full-text (684 KB) | HTML Full-text | XML Full-text
Abstract
Machine learning competitions such as those organized by Kaggle or KDD represent a useful benchmark for data science research. In this work, we present our winning solution to the Game Data Mining competition hosted at the 2017 IEEE Conference on Computational Intelligence and Games (CIG 2017). The contest consisted of two tracks, and participants (more than 250, belonging to both industry and academia) were to predict which players would stop playing the game, as well as their remaining lifetime. The data were provided by a major worldwide video game company, NCSoft, and came from their successful massively multiplayer online game Blade and Soul. Here, we describe the long short-term memory approach and conditional inference survival ensemble model that made us win both tracks of the contest, as well as the validation procedure that we followed in order to prevent overfitting. In particular, choosing a survival method able to deal with censored data was crucial to accurately predict the moment in which each player would leave the game, as censoring is inherent in churn. The selected models proved to be robust against evolving conditions—since there was a change in the business model of the game (from subscription-based to free-to-play) between the two sample datasets provided—and efficient in terms of time cost. Thanks to these features and also to their ability to scale to large datasets, our models could be readily implemented in real business settings. Full article
(This article belongs to the Special Issue Women in Machine Learning 2018)
Open Access | Article | Defining Data Science by a Data-Driven Quantification of the Community
Mach. Learn. Knowl. Extr. 2019, 1(1), 235-251; https://doi.org/10.3390/make1010015
Received: 4 December 2018 / Revised: 14 December 2018 / Accepted: 17 December 2018 / Published: 19 December 2018
Cited by 2 | Viewed by 389 | PDF Full-text (1745 KB) | HTML Full-text | XML Full-text
Abstract
Data science is a new academic field that has received much attention in recent years. One reason for this is that our increasingly digitalized society generates more and more data in all areas of our lives and science, and we are desperately seeking solutions to deal with this. In this paper, we investigate the academic roots of data science. We use data from Google Scholar on scientists who have an interest in data science, together with their citations, to perform a quantitative analysis of the data science community. Furthermore, to decompose the data science community into its major defining factors, corresponding to the most important research fields, we introduce a statistical regression model that is fully automatic and robust with respect to subsampling of the data. This statistical model allows us to define the 'importance' of a field in terms of its predictive abilities. Overall, our method provides an objective answer to the question 'What is data science?'. Full article
(This article belongs to the Section Data)
Open Access | Article | Analysis of Machine Learning Algorithms for Opinion Mining in Different Domains
Mach. Learn. Knowl. Extr. 2019, 1(1), 224-234; https://doi.org/10.3390/make1010014
Received: 1 November 2018 / Revised: 29 November 2018 / Accepted: 6 December 2018 / Published: 8 December 2018
Viewed by 479 | PDF Full-text (2311 KB) | HTML Full-text | XML Full-text
Abstract
Sentiment classification (SC) is a task within sentiment analysis (SA), a subfield of natural language processing (NLP), and is used to decide whether textual content implies a positive or negative review. This research focuses on the various machine learning (ML) algorithms utilized in the analysis of sentiments and in the mining of reviews across different datasets. Overall, an SC task consists of two phases. The first phase deals with feature extraction (FE); three different FE algorithms are applied in this research. The second phase covers the classification of the reviews using various ML algorithms: Naïve Bayes (NB), Stochastic Gradient Descent (SGD), Support Vector Machines (SVM), Passive Aggressive (PA), Maximum Entropy (ME), Adaptive Boosting (AdaBoost), Multinomial NB (MNB), Bernoulli NB (BNB), Ridge Regression (RR), and Logistic Regression (LR). The performance of PA with unigrams is the best among the algorithms for all the datasets used (IMDB, Cornell Movies, Amazon, and Twitter), providing values that range from 87% to 99.96% across all evaluation metrics. Full article
(This article belongs to the Section Learning)
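The best-performing configuration reported, Passive Aggressive with unigram features, corresponds to a pipeline like this; the two toy reviews stand in for the IMDB/Cornell/Amazon/Twitter data:

```python
# Unigram bag-of-words features feeding a Passive Aggressive classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(CountVectorizer(ngram_range=(1, 1)),   # unigrams only
                     PassiveAggressiveClassifier())
pipe.fit(["great movie, loved it", "terrible plot, awful acting"], [1, 0])
print(pipe.predict(["loved the acting"]))
```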
Open Access | Article | Using the Outlier Detection Task to Evaluate Distributional Semantic Models
Mach. Learn. Knowl. Extr. 2019, 1(1), 211-223; https://doi.org/10.3390/make1010013
Received: 2 August 2018 / Revised: 16 November 2018 / Accepted: 19 November 2018 / Published: 22 November 2018
Viewed by 377 | PDF Full-text (648 KB) | HTML Full-text | XML Full-text
Abstract
In this article, we define the outlier detection task and use it to compare neural-based word embeddings with transparent count-based distributional representations. Using the English Wikipedia as a text source to train the models, we observed that embeddings outperform count-based representations when their contexts are made up of bags of words. However, there are no sharp differences between the two models if the word contexts are defined as syntactic dependencies. In general, syntax-based models tend to perform better than those based on bags of words for this specific task. Similar experiments were carried out for Portuguese, with similar results. The test datasets we have created for the outlier detection task in English and Portuguese are freely available. Full article
(This article belongs to the Special Issue Language Processing and Knowledge Extraction)
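The outlier detection task itself is simple to state in code: given vectors for a set of words plus one intruder, rank each word by its average cosine similarity to the others; the outlier should score lowest. Here `vectors` (a word-to-embedding mapping) is assumed given:

```python
# Rank words by semantic compactness; the least compatible one is the outlier.
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def find_outlier(words, vectors):
    scores = {w: np.mean([cos(vectors[w], vectors[v])
                          for v in words if v != w]) for w in words}
    return min(scores, key=scores.get)   # lowest average similarity
```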
Open Access | Opinion | Exploiting Genomic Relations in Big Data Repositories by Graph-Based Search Methods
Mach. Learn. Knowl. Extr. 2019, 1(1), 205-210; https://doi.org/10.3390/make1010012
Received: 26 September 2018 / Revised: 18 November 2018 / Accepted: 21 November 2018 / Published: 22 November 2018
Viewed by 445 | PDF Full-text (4260 KB) | HTML Full-text | XML Full-text
Abstract
We are living at a time that allows the generation of mass data in almost any field of science. For instance, in pharmacogenomics, there exist a number of big data repositories, e.g., the Library of Integrated Network-based Cellular Signatures (LINCS), that provide millions of measurements at the genomics level. However, to translate these data into meaningful information, the data need to be analyzable. The first step of such an analysis is the deliberate selection of subsets of raw data for studying dedicated research questions. Unfortunately, this is a non-trivial problem when millions of individual data files are available with an intricate connection structure induced by experimental dependencies. In this paper, we argue for the need to introduce such search capabilities for big genomics data repositories, with a specific discussion of LINCS. Specifically, we suggest the introduction of smart interfaces that allow the connections among individual raw data files, which give rise to a network structure, to be exploited by graph-based searches. Full article
(This article belongs to the Section Network)
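The graph-based search being argued for can be prototyped directly: raw data files become nodes, experimental dependencies become edges, and a query walks the neighborhood of a file of interest (node names below are invented):

```python
# Neighborhood query over a file-dependency graph with networkx.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("exprA_rep1", "exprA_rep2"),      # same experiment
                  ("exprA_rep1", "cell_line_MCF7"),  # shared cell line
                  ("exprB_rep1", "cell_line_MCF7"),
                  ("exprB_rep1", "compound_X")])     # shared perturbation

# All files and annotations within two hops of a given raw data file.
related = nx.single_source_shortest_path_length(G, "exprA_rep1", cutoff=2)
print(sorted(related))
```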
Open Access | Article | An Algorithm for Generating Invisible Data Poisoning Using Adversarial Noise That Breaks Image Classification Deep Learning
Mach. Learn. Knowl. Extr. 2019, 1(1), 192-204; https://doi.org/10.3390/make1010011
Received: 4 September 2018 / Revised: 28 October 2018 / Accepted: 7 November 2018 / Published: 9 November 2018
Viewed by 476 | PDF Full-text (2184 KB) | HTML Full-text | XML Full-text
Abstract
Today, the two main security issues for deep learning are data poisoning and adversarial examples. Data poisoning consists of perverting a learning system by manipulating a small subset of the training data, while adversarial examples entail bypassing the system at testing time with low-amplitude manipulation of the testing sample. Unfortunately, data poisoning that is invisible to human eyes can be generated by adding adversarial noise to the training data. The main contribution of this paper is a successful implementation of such invisible data poisoning using image classification datasets for a deep learning pipeline. This implementation leads to significant classification accuracy gaps. Full article
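Adversarial-noise poisoning in the FGSM style, sketched below, adds a low-amplitude perturbation along the sign of the loss gradient to each training image; `grad_fn` is assumed supplied by the training framework, and the paper's exact generation algorithm is not reproduced here:

```python
# Quasi-invisible poisoning of training images via sign-of-gradient noise.
import numpy as np

def poison(images, grad_fn, eps=2.0 / 255.0):
    poisoned = []
    for img in images:                       # img assumed in [0, 1]
        noise = eps * np.sign(grad_fn(img))  # low-amplitude adversarial noise
        poisoned.append(np.clip(img + noise, 0.0, 1.0))
    return np.stack(poisoned)
```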
Open Access | Review | Particle Swarm Optimization: A Survey of Historical and Recent Developments with Hybridization Perspectives
Mach. Learn. Knowl. Extr. 2019, 1(1), 157-191; https://doi.org/10.3390/make1010010
Received: 1 September 2018 / Revised: 3 October 2018 / Accepted: 4 October 2018 / Published: 10 October 2018
Cited by 5 | Viewed by 1007 | PDF Full-text (17545 KB) | HTML Full-text | XML Full-text
Abstract
Particle Swarm Optimization (PSO) is a metaheuristic global optimization paradigm that has gained prominence in the last two decades due to its ease of application to unsupervised, complex multidimensional problems that cannot be solved using traditional deterministic algorithms. The canonical particle swarm optimizer is based on the flocking behavior and social co-operation of birds and fish schools and draws heavily on the evolutionary behavior of these organisms. This paper provides a thorough survey of the PSO algorithm, with special emphasis on the development, deployment, and improvement of its most basic as well as some of the very recent state-of-the-art implementations. Concepts and directions for choosing the inertia weight, constriction factor, and cognitive and social weights, along with perspectives on convergence, parallelization, elitism, niching, discrete optimization, and neighborhood topologies, are outlined. Hybridization attempts with other evolutionary and swarm paradigms in selected applications are covered, and an up-to-date review is put forward for the interested reader. Full article
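The canonical update rules the survey builds on, with inertia weight w and cognitive/social weights c1, c2, fit in a few lines (parameter values below are common illustrative defaults):

```python
# Canonical PSO: velocity mixes inertia, a cognitive pull toward each
# particle's personal best, and a social pull toward the global best.
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-5, 5, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()]
    return gbest, pbest_val.min()

print(pso(lambda z: (z ** 2).sum(), dim=5))   # sphere function sanity check
```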
Open Access | Viewpoint | A Machine Learning Perspective on Personalized Medicine: An Automized, Comprehensive Knowledge Base with Ontology for Pattern Recognition
Mach. Learn. Knowl. Extr. 2019, 1(1), 149-156; https://doi.org/10.3390/make1010009
Received: 28 July 2018 / Revised: 24 August 2018 / Accepted: 4 September 2018 / Published: 8 September 2018
Cited by 1 | Viewed by 1392 | PDF Full-text (908 KB) | HTML Full-text | XML Full-text
Abstract
Personalized or precision medicine is a new paradigm that holds great promise for individualized patient diagnosis, treatment, and care. However, personalized medicine has so far only been described on an informal level, rather than through rigorous practical guidelines and statistical protocols that would allow its robust practical realization for implementation in day-to-day clinical practice. In this paper, we discuss three key factors, which we consider dimensions that affect the experimental design for personalized medicine: (I) phenotype categories; (II) population size; and (III) statistical analysis. This formalization allows us to define personalized medicine from a machine learning perspective as an automized, comprehensive knowledge base with an ontology that performs pattern recognition on patient profiles. Full article
Open Access | Perspective | Inference of Genome-Scale Gene Regulatory Networks: Are There Differences in Biological and Clinical Validations?
Mach. Learn. Knowl. Extr. 2019, 1(1), 138-148; https://doi.org/10.3390/make1010008
Received: 27 July 2018 / Revised: 17 August 2018 / Accepted: 20 August 2018 / Published: 22 August 2018
Viewed by 615 | PDF Full-text (659 KB) | HTML Full-text | XML Full-text
Abstract
Causal networks, e.g., gene regulatory networks (GRNs) inferred from gene expression data, contain a wealth of information but defy simple, straightforward, and low-budget experimental validations. In this paper, we elaborate on this problem and discuss distinctions between biological and clinical validations. As a result, validation differences for GRNs reflect known differences between basic biological and clinical research questions, making the validations context specific. Hence, the meanings of biologically and clinically meaningful GRNs can be very different. For a concerted approach to a problem of this size, we suggest the establishment of the HUMAN GENE REGULATORY NETWORK PROJECT, which would provide the information required for biological and clinical validations alike. Full article
(This article belongs to the Section Network)
Open Access | Article | Phi-Delta-Diagrams: Software Implementation of a Visual Tool for Assessing Classifier and Feature Performance
Mach. Learn. Knowl. Extr. 2019, 1(1), 121-137; https://doi.org/10.3390/make1010007
Received: 15 May 2018 / Revised: 22 June 2018 / Accepted: 27 June 2018 / Published: 28 June 2018
Viewed by 914 | PDF Full-text (909 KB) | HTML Full-text | XML Full-text
Abstract
In this article, a two-tiered 2D tool called φ,δ diagrams is described, devised to support the assessment of classifiers in terms of accuracy and bias. In their standard versions, these diagrams provide information as if the underlying data were balanced; their generalization, i.e., the ability to account for imbalance, is also briefly described. In either case, the isometrics of accuracy and bias are immediately evident therein, as, according to a specific design choice, they are in fact straight lines parallel to the x-axis and y-axis, respectively. φ,δ diagrams can also be used to assess the importance of features, as highly discriminant ones are immediately evident therein. In this paper, a comprehensive introduction on how to adopt φ,δ diagrams as a standard tool for classifier and feature assessment is given. In particular, with the goal of illustrating all relevant details from a pragmatic perspective, their implementation and usage as Python and R packages are described. Full article
(This article belongs to the Section Visualization)
Open Access | Review | Why Topology for Machine Learning and Knowledge Extraction?
Mach. Learn. Knowl. Extr. 2019, 1(1), 115-120; https://doi.org/10.3390/make1010006
Received: 10 March 2018 / Revised: 26 April 2018 / Accepted: 30 April 2018 / Published: 2 May 2018
Cited by 1 | Viewed by 1173 | PDF Full-text (582 KB) | HTML Full-text | XML Full-text
Abstract
Data has shape, and shape is the domain of geometry and in particular of its “free” part, called topology. The aim of this paper is twofold. First, it provides a brief overview of applications of topology to machine learning and knowledge extraction, as well as the motivations thereof. Furthermore, this paper is aimed at promoting cross-talk between the theoretical and applied domains of topology and machine learning research. Such interactions can be beneficial for both the generation of novel theoretical tools and finding cutting-edge practical applications. Full article
(This article belongs to the Section Topology)
Open Access | Article | A Survey of ReRAM-Based Architectures for Processing-In-Memory and Neural Networks
Mach. Learn. Knowl. Extr. 2019, 1(1), 75-114; https://doi.org/10.3390/make1010005
Received: 15 March 2018 / Revised: 16 April 2018 / Accepted: 26 April 2018 / Published: 30 April 2018
Cited by 2 | Viewed by 1595 | PDF Full-text (2919 KB) | HTML Full-text | XML Full-text
Abstract
As data movement operations and power budgets become key bottlenecks in the design of computing systems, interest in unconventional approaches such as processing-in-memory (PIM), machine learning (ML), and especially neural network (NN)-based accelerators has grown significantly. Resistive random access memory (ReRAM) is a promising technology for efficiently architecting PIM- and NN-based accelerators due to its capability to work both as high-density/low-energy storage and as an in-memory computation/search engine. In this paper, we present a survey of techniques for designing ReRAM-based PIM and NN architectures. By classifying the techniques based on key parameters, we underscore their similarities and differences. This paper will be valuable for computer architects, chip designers, and researchers in the area of machine learning. Full article
Open Access | Article | A Machine Learning Approach to Determine Oyster Vessel Behavior
Mach. Learn. Knowl. Extr. 2019, 1(1), 64-74; https://doi.org/10.3390/make1010004
Received: 14 December 2017 / Revised: 20 March 2018 / Accepted: 29 March 2018 / Published: 31 March 2018
Viewed by 1041 | PDF Full-text (7954 KB) | HTML Full-text | XML Full-text
Abstract
In this work, we address the multi-class classification task of determining oyster vessel behavior by classifying it into four different classes: fishing, traveling, poling (exploring), and docked (anchored). The main purpose of this work is to automate the determination of oyster vessel behavior using machine learning and to explore different techniques for improving the accuracy of the behavior prediction problem. To employ machine learning, two important descriptors, speed and net speed, are calculated from trajectory data recorded by a satellite communication system (Vessel Management System, VMS) attached to vessels fishing on the public oyster grounds of Louisiana. We constructed a support vector machine (SVM)-based method which employs a Radial Basis Function (RBF) kernel to accurately predict the behavior of oyster vessels. Several validation and parameter optimization techniques were used to improve the accuracy of the SVM classifier. A total of 93% of the trajectory data from a July 2013 to August 2014 dataset, consisting of 612,700 samples for which the ground truth can be obtained using a rule-based classifier, is used for validation and independent testing of our method. The results show that the proposed SVM-based method is able to correctly classify 99.99% of the 612,700 samples using 10-fold cross validation. Furthermore, we achieved a precision of 1.00, a recall of 1.00, an F1-score of 1.00, and a test accuracy of 99.99% when performing an independent test using a subset of 93% of the dataset, consisting of 31,418 points. Full article
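The validation setup described, an RBF-kernel SVM over the two descriptors with grid-searched hyperparameters and 10-fold cross-validation, corresponds to the following sketch; the feature matrix is a placeholder for the VMS-derived data:

```python
# RBF SVM with grid search and 10-fold CV on speed / net-speed features.
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X = np.random.rand(2000, 2)             # columns: speed, net speed
y = np.random.randint(0, 4, 2000)       # fishing/traveling/poling/docked

grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": ["scale", 0.1, 1.0]},
                    cv=10)
grid.fit(X, y)
print(grid.best_params_,
      cross_val_score(grid.best_estimator_, X, y, cv=10).mean())
```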
Open Access | Article | Category Maps Describe Driving Episodes Recorded with Event Data Recorders
Mach. Learn. Knowl. Extr. 2019, 1(1), 43-63; https://doi.org/10.3390/make1010003
Received: 29 January 2018 / Revised: 7 March 2018 / Accepted: 8 March 2018 / Published: 12 March 2018
Viewed by 949 | PDF Full-text (2930 KB) | HTML Full-text | XML Full-text
Abstract
This study was conducted to create driving episodes using machine-learning-based algorithms that address long-term memory (LTM) and topological mapping. This paper presents a novel episodic memory model for driving safety according to traffic scenes. The model incorporates three important features: adaptive resonance theory (ART), which learns time-series features incrementally while maintaining stability and plasticity; self-organizing maps (SOMs), which represent input data as a map with topological relations using self-mapping characteristics; and counter propagation networks (CPNs), which label category maps using input features and counter signals. Category maps represent driving episode information that includes driving contexts and facial expressions. The bursting states of the respective maps produce LTM, created on ART, as episodic memory. In a preliminary experiment using a driving simulator (DS), we measured drivers' gazes and face orientations as internal information to create driving episodes. Moreover, we measured cognitive distraction according to its effects on facial features shown in reaction to simulated near-misses. Evaluation of the experimentally obtained results shows the possibility of using driving episodes recorded with image datasets obtained using an event data recorder (EDR) with two cameras. Using category maps, we visualize driving features according to driving scenes on a public road and an expressway. Full article
(This article belongs to the Section Learning)
Mach. Learn. Knowl. Extr. EISSN 2504-4990 Published by MDPI AG, Basel, Switzerland