Special Issue "Advances in Artificial Intelligence: Machine Learning, Data Mining and Data Sciences"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 August 2021.

Special Issue Editors

Prof. Sławomir Nowaczyk
Guest Editor
Center for Applied Intelligent Systems Research, Halmstad University, Sweden
Interests: machine learning; autonomous knowledge creation; representation learning; aware intelligent systems; predictive maintenance
Dr. Mohamed-Rafik Bouguelia
Guest Editor
Center for Applied Intelligent Systems Research, Halmstad University, Sweden
Interests: machine learning; anomaly and novelty detection; interactive learning; data stream mining; big data
Dr. Hadi Fanaee
Guest Editor
Center for Applied Intelligent Systems Research, Halmstad University, Sweden
Interests: data mining; machine learning; tensor analysis; anomaly detection; time series analysis; spatiotemporal data mining

Special Issue Information

Dear Colleagues,

Machine learning (ML), data mining (DM), and data sciences in general are among the most exciting and rapidly growing research fields today. In recent years, ML and DM have been successfully used to solve practical problems in various domains, including engineering, healthcare, medicine, manufacturing, transportation, and finance.

In this era of big data, considerable research is being focused on designing efficient ML and DM methods. Nonetheless, practical applications of ML face several challenges, such as dealing with data that are either too small or too big, missing and uncertain data, highly multidimensional data, and the need for interpretable ML models that can provide trustworthy evidence and explanations of the predictions they make. Moreover, at a time when the complexity of systems is continuously growing, it is not always feasible to collect clean and exhaustive datasets and produce high-quality labels. In addition, most systems generate data that change over time due to external conditions, resulting in non-stationary data distributions. Therefore, there is a need to do more “knowledge creation”: to develop ML and DM methods that sift through large amounts of streaming data and extract useful high-level knowledge from them, with little or no human supervision. Learning and obtaining good generalization from fewer training examples, efficient data/knowledge representation schemes, knowledge transfer between tasks and domains, and learning to adapt to varying contexts are further examples of important research problems.

To address such problems, this Special Issue invites researchers to contribute new methods and to demonstrate the applicability of existing methods in various fields.

Topics of interest for this Special Issue include but are not limited to the following:

  • Novel methods and algorithms in machine learning, data mining, and data science, including data cleaning, clustering, classification, feature selection and extraction, neural networks and deep learning, representation learning, knowledge discovery, anomaly detection, fault detection, transfer learning, and active learning;
  • Solutions improving the state of the art regarding important challenges such as big data, streaming data, time series, interactive learning, concept drift and nonstationary data, change detection, and dimensionality reduction;
  • Applications in various domains, for example, activity and event recognition, computational biology and bioinformatics, computational social science, game playing, healthcare, information retrieval, natural language processing, predictive maintenance, recommender systems, signal processing, web applications, and internet data;
  • Societal challenges associated with AI, such as fairness, accountability, and transparency or privacy, anonymity, and security.

Prof. Sławomir Nowaczyk
Dr. Mohamed-Rafik Bouguelia
Dr. Hadi Fanaee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Active Learning
  • Anomaly Detection
  • Big Data
  • Classification
  • Clustering
  • Causal Inference
  • Concept Drift
  • Data Mining
  • Data Science
  • Deep Learning
  • Fairness, Accountability, and Transparency of AI
  • Knowledge Discovery
  • Machine Learning
  • Medical Decision Support Systems
  • Multitask Learning
  • Neural Networks
  • Predictive Models
  • Representation Learning
  • Semi-Supervised Learning
  • Supervised Learning
  • Transfer Learning
  • Unsupervised Learning
  • Predictive Maintenance
  • Privacy, Anonymity, and Security of AI…

Published Papers (11 papers)

Research

Article
Efficient High-Dimensional Kernel k-Means++ with Random Projection
Appl. Sci. 2021, 11(15), 6963; https://doi.org/10.3390/app11156963 - 28 Jul 2021
Viewed by 148
Abstract
Using random projection, a method to speed up both kernel k-means and centroid initialization with k-means++ is proposed. We approximate the kernel matrix and distances in a lower-dimensional space R^d before the kernel k-means clustering, motivated by upper error bounds. With random projections, previous work on bounds for dot products and an improved bound for kernel methods are considered for kernel k-means. The complexities for both kernel k-means with Lloyd’s algorithm and centroid initialization with k-means++ are known to be O(nkD) and Θ(nkD), respectively, with n being the number of data points, D the dimensionality of input feature vectors, and k the number of clusters. The proposed method reduces the computational complexity for the kernel computation of kernel k-means from O(n^2 D) to O(n^2 d) and the subsequent computation for k-means with Lloyd’s algorithm and centroid initialization from O(nkD) to O(nkd). Our experiments demonstrate that the speed-up of the clustering method with reduced dimensionality d = 200 is 2 to 26 times with very little performance degradation (less than one percent) in general.
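
As a rough illustration of the idea behind this abstract, the sketch below projects two D-dimensional vectors to d dimensions with a Gaussian random matrix and compares squared distances before and after. It is not the authors' implementation: the Gaussian projection, the function names, and the toy data are assumptions chosen for illustration. Kernels that depend only on dot products or distances can then be evaluated on the shorter projected vectors.

```python
import math
import random


def random_projection_matrix(d, D, rng):
    """Return a d x D matrix with i.i.d. Gaussian entries of variance 1/d."""
    scale = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(D)] for _ in range(d)]


def project(R, x):
    """Map a D-dimensional vector x to d dimensions by computing R @ x."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)          # fixed seed so the run is repeatable
D, d = 1000, 200                # original and reduced dimensionality (toy values)
x = [rng.uniform(-1, 1) for _ in range(D)]
y = [rng.uniform(-1, 1) for _ in range(D)]

R = random_projection_matrix(d, D, rng)
px, py = project(R, x), project(R, y)

# Squared Euclidean distance via dot products, in R^D and in R^d.
orig_sq = dot(x, x) - 2 * dot(x, y) + dot(y, y)
proj_sq = dot(px, px) - 2 * dot(px, py) + dot(py, py)
print("original:", orig_sq, "projected:", proj_sq)
```

With the 1/sqrt(d) scaling, the projected squared distance equals the original one in expectation, and the relative error shrinks as d grows.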

Article
Identifying the Author Group of Malwares through Graph Embedding and Human-in-the-Loop Classification
Appl. Sci. 2021, 11(14), 6640; https://doi.org/10.3390/app11146640 - 20 Jul 2021
Viewed by 242
Abstract
Malware is developed for various types of malicious attacks, e.g., to gain access to a user’s private information or control of the computer system. The identification and classification of malware has been extensively studied in academia and by many companies. Beyond the traditional research areas in this field, including malware detection, malware propagation analysis, and malware family clustering, this paper focuses on identifying the “author group” of a given malware as a means of effective detection and prevention of further malware threats, along with providing evidence for proper legal action. Our framework consists of a malware-feature bipartite graph construction, malware embedding based on DeepWalk, and classification of the target malware based on the k-nearest neighbors (KNN) classification. However, our KNN classifier often faced ambiguous cases, where it should say “I don’t know” rather than attempting to predict something with a high risk of misclassification. Therefore, our framework allows human experts to intervene in the process of classification for the final decision. We also developed a graphical user interface that provides the points of ambiguity for helping human experts to effectively determine the author group of the target malware. We demonstrated the effectiveness of our human-in-the-loop classification framework via extensive experiments using real-world malware data.
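
The “I don’t know” behaviour described in this abstract can be sketched as a KNN classifier with a reject option. The version below is a minimal illustration under assumed details: the agreement threshold, the function name `knn_with_reject`, and the toy embeddings are all hypothetical, and the paper's actual ambiguity criterion may differ.

```python
import math
from collections import Counter


def knn_with_reject(train, query, k=3, min_agreement=0.75):
    """Classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (vector, label) pairs. Returns the winning label,
    or None ("I don't know") when the share of neighbours that agree with
    the winner is below `min_agreement` (a hypothetical threshold).
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    if count / len(neighbours) < min_agreement:
        return None  # ambiguous case: defer to a human expert
    return label


# Toy 2-D "embeddings" for two hypothetical author groups.
train = [
    ((0.0, 0.0), "group_A"), ((1.0, 0.0), "group_A"), ((2.0, 0.0), "group_A"),
    ((4.0, 0.0), "group_B"), ((5.0, 0.0), "group_B"), ((6.0, 0.0), "group_B"),
]

confident = knn_with_reject(train, (0.5, 0.0))   # all 3 neighbours are group_A
ambiguous = knn_with_reject(train, (3.0, 0.0))   # neighbours split 2 to 1
print(confident, ambiguous)
```

Returning None rather than a low-confidence label lets the calling code route the sample to a human reviewer, which is the role the abstract assigns to the expert.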

Article
Linked Data Triples Enhance Document Relevance Classification
Appl. Sci. 2021, 11(14), 6636; https://doi.org/10.3390/app11146636 - 20 Jul 2021
Viewed by 230
Abstract
Standardized approaches to relevance classification in information retrieval use generative statistical models to identify the presence or absence of certain topics that might make a document relevant to the searcher. These approaches have been used to better predict relevance on the basis of what the document is “about”, rather than a simple-minded analysis of the bag of words contained within the document. In more recent times, this idea has been extended by using pre-trained deep learning models and text representations, such as GloVe or BERT. These use an external corpus as a knowledge base that conditions the model to help predict what a document is about. This paper adopts a hybrid approach that leverages the structure of knowledge embedded in a corpus. In particular, the paper reports on experiments where linked data triples (subject-predicate-object), constructed from natural language elements, are derived from deep learning. These are evaluated as additional latent semantic features for a relevant document classifier in a customized news-feed website. The research is a synthesis of current thinking in deep learning models in NLP and information retrieval and the predicate structure used in semantic web research. Our experiments indicate that linked data triples increased the F-score of the baseline GloVe representations by 6% and show significant improvement over state-of-the-art models, like BERT. The findings are tested and empirically validated on an experimental dataset and on two standardized pre-classified news sources, namely the Reuters and 20 Newsgroups datasets.

Article
BHHO-TVS: A Binary Harris Hawks Optimizer with Time-Varying Scheme for Solving Data Classification Problems
Appl. Sci. 2021, 11(14), 6516; https://doi.org/10.3390/app11146516 - 15 Jul 2021
Viewed by 282
Abstract
Data classification is a challenging problem. Data classification is very sensitive to the noise and high dimensionality of the data. Being able to reduce the model complexity can help to improve the accuracy of the classification model. Therefore, in this research, we propose a novel feature selection technique based on Binary Harris Hawks Optimizer with Time-Varying Scheme (BHHO-TVS). The proposed BHHO-TVS adopts a time-varying transfer function that is applied to leverage the influence of the location vector to balance the exploration and exploitation power of the HHO. Eighteen well-known datasets provided by the UCI repository were utilized to show the significance of the proposed approach. The reported results show that BHHO-TVS outperforms BHHO with traditional binarization schemes as well as other binary feature selection methods such as binary gravitational search algorithm (BGSA), binary particle swarm optimization (BPSO), binary bat algorithm (BBA), binary whale optimization algorithm (BWOA), and binary salp swarm algorithm (BSSA). Compared with other similar feature selection approaches introduced in previous studies, the proposed method achieves the best accuracy rates on 67% of datasets.

Article
Opportunities for Machine Learning in District Heating
Appl. Sci. 2021, 11(13), 6112; https://doi.org/10.3390/app11136112 - 30 Jun 2021
Viewed by 385
Abstract
The district heating (DH) industry is facing an important transformation towards more efficient networks that utilise significantly lower water temperatures to distribute the heat. This change requires taking advantage of new technologies, and Machine Learning (ML) is a popular direction. In the last decade, we have witnessed extreme growth in the number of published research papers that focus on applying ML techniques to the DH domain. However, based on our experience in the field, and an extensive review of the state of the art, we perceive a mismatch between the most popular research directions, such as forecasting, and the challenges faced by the DH industry. In this work, we present our findings, explain and demonstrate the key gaps between the two communities, and suggest a roadmap towards increasing the impact of ML research in the DH industry.

Article
Seismic Reflection Analysis of AETA Electromagnetic Signals
Appl. Sci. 2021, 11(13), 5869; https://doi.org/10.3390/app11135869 - 24 Jun 2021
Viewed by 239
Abstract
Acoustic and electromagnetics to artificial intelligence (AETA) is a system used to predict seismic events through monitoring of electromagnetic and geoacoustic signals. It is widely deployed in the Sichuan–Yunnan region (22° N–34° N, 98° E–107° E) of China. Generally, the electromagnetic signals of AETA stations near the epicenter have abnormal disturbances before an earthquake. When a significant decrease or increase in the signal is observed, it is difficult to quantify this change using only visual observation and confirm that it is related to an upcoming large earthquake. Considering that the AETA data comprise a typical time series, the current work analyzes anomalies in AETA electromagnetic signals using the long short-term memory (LSTM) autoencoder method to prove that the electromagnetic anomaly of the AETA station can be regarded as an earthquake precursor. The results show that there are 2–4% anomalous points and some outliers exceeding 0.7 (after normalization) in the AETA stations within 200 km of the epicenter of the Jiuzaigou earthquake (M. 7.0) and the Yibin earthquake (M. 6.0) half a month before the earthquakes. Therefore, the AETA electromagnetic disturbance signal can be used as an earthquake precursor and for further earthquake prediction.

Article
NPU RGB+D Dataset and a Feature-Enhanced LSTM-DGCN Method for Action Recognition of Basketball Players
Appl. Sci. 2021, 11(10), 4426; https://doi.org/10.3390/app11104426 - 13 May 2021
Viewed by 357
Abstract
Computer vision-based action recognition of basketball players in basketball training and competition has gradually become a research hotspot. However, owing to complex technical actions, diverse backgrounds, and limb occlusion, it remains a challenging task without effective solutions or public dataset benchmarks. In this study, we defined 32 kinds of atomic actions covering most of the complex actions for basketball players and built the dataset NPU RGB+D (a large-scale dataset of basketball action recognition with RGB image data and depth data captured at Northwestern Polytechnical University) for 12 kinds of actions of 10 professional basketball players with 2169 RGB+D videos and 75 thousand frames, including RGB frame sequences, depth maps, and skeleton coordinates. Through extracting the spatial features of the distances and angles between the joint points of basketball players, we created a new feature-enhanced skeleton-based method called LSTM-DGCN for basketball player action recognition based on the deep graph convolutional network (DGCN) and long short-term memory (LSTM) methods. Many advanced action recognition methods were evaluated on our dataset and compared with our proposed method. The experimental results show that the NPU RGB+D dataset is very competitive with the current action recognition algorithms and that our LSTM-DGCN outperforms the state-of-the-art action recognition methods in various evaluation criteria on our dataset. Our action classifications and this NPU RGB+D dataset are valuable for basketball player action recognition techniques. The feature-enhanced LSTM-DGCN has a more accurate action recognition effect, which improves the motion expression ability of the skeleton data.

Article
Learning-Based Dissimilarity for Clustering Categorical Data
Appl. Sci. 2021, 11(8), 3509; https://doi.org/10.3390/app11083509 - 14 Apr 2021
Viewed by 374
Abstract
Comparing data objects is at the heart of machine learning. For continuous data, object dissimilarity is usually taken to be object distance; however, for categorical data, there is no universal agreement, for categories can be ordered in several different ways. Most existing category dissimilarity measures characterize the distance among the values an attribute may take using precisely the number of different values the attribute takes (the attribute space) and the frequency at which they occur. These kinds of measures overlook attribute interdependence, which may provide valuable information when capturing per-attribute object dissimilarity. In this paper, we introduce a novel object dissimilarity measure that we call Learning-Based Dissimilarity, for comparing categorical data. Our measure characterizes the distance between two categorical values of a given attribute in terms of how likely it is that such values are confused or not when all the dataset objects with the remaining attributes are used to predict them. To that end, we provide an algorithm that, given a target attribute, first learns a classification model in order to compute a confusion matrix for the attribute. Then, our method transforms the confusion matrix into a per-attribute dissimilarity measure. We have successfully tested our measure on 55 datasets gathered from the University of California, Irvine (UCI) Machine Learning Repository. Our results show that it surpasses, in terms of various performance indicators for data clustering, the most prominent distance relations put forward in the literature.
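
The last step of this abstract, turning a confusion matrix into a per-attribute dissimilarity, can be illustrated with a small sketch. The abstract does not give the transformation, so the formula below (one minus the symmetrised confusion rate) is an assumption made purely for illustration, as are the function name and the toy counts; the paper's actual definition may differ.

```python
def confusion_to_dissimilarity(confusion):
    """Turn a confusion matrix into a dissimilarity between attribute values.

    `confusion[a][b]` is the number of objects whose true value is `a` and
    whose predicted value is `b`. The transformation used here is an
    illustrative assumption, not the formula from the paper.
    """
    values = list(confusion)
    # Row-normalise so that rate[a][b] estimates P(predicted = b | true = a).
    rate = {}
    for a in values:
        total = sum(confusion[a].get(b, 0) for b in values)
        rate[a] = {
            b: (confusion[a].get(b, 0) / total if total else 0.0) for b in values
        }
    dissim = {}
    for a in values:
        for b in values:
            if a == b:
                dissim[a, b] = 0.0
            else:
                # Values that are often confused with each other are "close".
                dissim[a, b] = 1.0 - 0.5 * (rate[a][b] + rate[b][a])
    return dissim


# Hypothetical confusion counts for a colour attribute.
confusion = {
    "red":    {"red": 8, "orange": 2, "blue": 0},
    "orange": {"red": 3, "orange": 7, "blue": 0},
    "blue":   {"red": 0, "orange": 0, "blue": 10},
}
dissimilarity = confusion_to_dissimilarity(confusion)
print(dissimilarity["red", "orange"], dissimilarity["red", "blue"])
```

In this toy example, red and orange are sometimes mistaken for each other and so end up closer than red and blue, which are never confused.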

Article
EvoSplit: An Evolutionary Approach to Split a Multi-Label Data Set into Disjoint Subsets
Appl. Sci. 2021, 11(6), 2823; https://doi.org/10.3390/app11062823 - 22 Mar 2021
Viewed by 475
Abstract
This paper presents a new evolutionary approach, EvoSplit, for the distribution of multi-label data sets into disjoint subsets for supervised machine learning. Currently, data set providers divide a data set either randomly or using iterative stratification, a method that aims to maintain the label (or label pair) distribution of the original data set in the different subsets. Following the same aim, this paper first introduces a single-objective evolutionary approach that tries to obtain a split that maximizes the similarity between those distributions independently. Second, a new multi-objective evolutionary algorithm is presented to maximize the similarity considering simultaneously both distributions (labels and label pairs). Both approaches are validated using well-known multi-label data sets as well as large image data sets currently used in computer vision and machine learning applications. EvoSplit improves the splitting of a data set in comparison to iterative stratification according to different measures: Label Distribution, Label Pair Distribution, Examples Distribution, folds and fold-label pairs with zero positive examples.

Article
Recommendation Systems: Algorithms, Challenges, Metrics, and Business Opportunities
Appl. Sci. 2020, 10(21), 7748; https://doi.org/10.3390/app10217748 - 02 Nov 2020
Cited by 10 | Viewed by 1429
Abstract
Recommender systems are widely used to provide users with recommendations based on their preferences. With the ever-growing volume of information online, recommender systems have been a useful tool to overcome information overload. The utilization of recommender systems cannot be overstated, given their potential to ameliorate many over-choice challenges. There are many types of recommendation systems with different methodologies and concepts. Various applications have adopted recommendation systems, including e-commerce, healthcare, transportation, agriculture, and media. This paper provides the current landscape of recommender systems research and identifies directions in the field in various applications. This article provides an overview of the current state of the art in recommendation systems, their types, challenges, limitations, and business adoptions. To assess the quality of a recommendation system, qualitative evaluation metrics are discussed in the paper.

Article
Transitional SAX Representation for Knowledge Discovery for Time Series
Appl. Sci. 2020, 10(19), 6980; https://doi.org/10.3390/app10196980 - 06 Oct 2020
Viewed by 792
Abstract
Numerous dimensionality-reducing representations of time series have been proposed in data mining and have proved to be useful, especially in handling a high volume of time series data. Among them, widely used symbolic representations such as symbolic aggregate approximation and piecewise aggregate approximation focus on information of local averages of time series. To compensate for such methods, several attempts were made to include trend information. However, the included trend information is quite simple, leading to great information loss. Such information is hardly extendable, so adjusting the level of simplicity to a higher complexity is difficult. In this paper, we propose a new symbolic representation method called transitional symbolic aggregate approximation that incorporates transitional information into symbolic aggregate approximations. We show that the proposed method, satisfying a lower bound of the Euclidean distance, is able to preserve meaningful information, including dynamic trend transitions in segmented time series, while still reducing dimensionality. We also show that this method is advantageous from theoretical aspects of interpretability, and practical and superior in terms of time-series classification tasks when compared with existing symbolic representation methods.
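
For readers unfamiliar with the baseline this abstract builds on, the sketch below shows plain symbolic aggregate approximation (SAX): z-normalise the series, average it over equal-length segments (piecewise aggregate approximation), and map each segment mean to a letter using equiprobable breakpoints of the standard normal distribution. It covers only the classic representation, not the transitional variant proposed in the paper, and the function name and toy series are assumptions.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Return the SAX word of `series`.

    Assumes len(series) is divisible by n_segments and alphabet_size <= 26.
    """
    # 1. z-normalise so the values are roughly standard normal.
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma for v in series] if sigma > 0 else [0.0] * len(series)

    # 2. Piecewise aggregate approximation: one mean per segment.
    seg_len = len(series) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # 3. Breakpoints that cut the standard normal into equiprobable regions.
    breakpoints = [
        NormalDist().inv_cdf(j / alphabet_size) for j in range(1, alphabet_size)
    ]

    # 4. A segment's letter index is the number of breakpoints below its mean.
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(letters[sum(v > b for b in breakpoints)] for v in paa)


word_step = sax([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2, alphabet_size=4)
word_ramp = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=2)
print(word_step, word_ramp)
```

Because each letter encodes only a local average, a rising and a falling segment with the same mean get the same symbol, which is the loss of trend information that the abstract sets out to address.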