Mach. Learn. Knowl. Extr., Volume 4, Issue 3 (September 2022) – 9 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
Article
Deep Learning Based Frequency-Aware Single Image Deraining by Extracting Knowledge from Rain and Background
Mach. Learn. Knowl. Extr. 2022, 4(3), 738-752; https://doi.org/10.3390/make4030035 - 16 Aug 2022
Abstract
Due to the requirements of video surveillance, machine-learning-based single image deraining has become a research hotspot in recent years. To efficiently obtain rain-free images that retain detailed information, this paper proposes a novel frequency-aware single image deraining network based on the separation of rain and background. In rainy images, most of the key background information lies in the low-frequency components, while the high-frequency components mix background image details with rain streaks. This paper attempts to decouple the background image details from the high-frequency components under the guidance of the restored low-frequency components. Compared with existing approaches, the proposed network makes three major contributions. (1) A residual dense network based on the Discrete Wavelet Transform (DWT) is proposed to learn the background information of rainy images. (2) A frequency channel attention module is introduced for the adaptive decoupling of high-frequency image detail signals. (3) A fusion module with an attention mechanism is introduced to make full use of multi-receptive-field information through a two-branch structure, exploiting context information over a large area. The proposed approach was evaluated on several representative datasets, and experimental results show that it outperforms other state-of-the-art deraining algorithms.
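As a rough, runnable illustration of the frequency separation this abstract describes (a one-level 1-D Haar transform, not the paper's residual dense network), the DWT splits a signal into a smooth low-frequency part and a high-frequency part where streak-like details live:

```python
import math

def haar_dwt_1level(row):
    """One-level 1-D Haar DWT: split a signal into low-frequency
    (approximation) and high-frequency (detail) coefficients."""
    lo, hi = [], []
    for a, b in zip(row[0::2], row[1::2]):
        lo.append((a + b) / math.sqrt(2))  # smooth background content
        hi.append((a - b) / math.sqrt(2))  # edges, details, rain streaks
    return lo, hi

# A constant region has no high-frequency content; the "streak"
# at index 5 shows up only in the detail coefficients.
signal = [10, 10, 10, 10, 10, 30, 10, 10]
lo, hi = haar_dwt_1level(signal)
```

In the paper's setting the same idea is applied in 2-D to images, so that rain streaks concentrate in the high-frequency sub-bands that the network then decouples from genuine image detail.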

Article
VLA-SMILES: Variable-Length-Array SMILES Descriptors in Neural Network-Based QSAR Modeling
Mach. Learn. Knowl. Extr. 2022, 4(3), 715-737; https://doi.org/10.3390/make4030034 - 05 Aug 2022
Abstract
Machine learning represents a milestone in data-driven research, including materials informatics, robotics, and computer-aided drug discovery. With the continuously growing virtual and synthetically available chemical space, efficient and robust quantitative structure–activity relationship (QSAR) methods are required to uncover molecules with desired properties. Herein, we propose variable-length-array SMILES-based (VLA-SMILES) structural descriptors that expand the conventional SMILES descriptors widely used in machine learning. This structural representation extends the family of numerically coded SMILES, particularly binary SMILES, to expedite the discovery of new deep learning QSAR models with high predictive ability. VLA-SMILES descriptors were shown to speed up the training of QSAR models based on multilayer perceptrons (MLPs) with optimized backpropagation (ATransformedBP), resilient propagation (iRPROP), and Adam optimization learning algorithms featuring rational train–test splitting, while improving predictive ability relative to the more compute-intensive binary SMILES representation format. All the tested MLPs using the same length-array-based SMILES descriptors showed similar predictive ability and training convergence rates in combination with the considered learning procedures. Validation with Kennard–Stone train–test splitting, based on structural descriptor similarity metrics, was found to be more effective than partitioning by ranking on biological activity values for the entire set of VLA-SMILES-featured QSAR models. The robustness and predictive ability of MLP models based on VLA-SMILES were assessed via the method of QSAR parametric model validation. In addition, statistical H0 hypothesis testing of the linear regression between real and observed activities, based on the F(2, n−2) criterion (with n being the size of the test set), was used to estimate predictability among the VLA-SMILES-featured QSAR-MLPs. Both approaches, QSAR parametric model validation and statistical hypothesis testing, were found to correlate when used for the quantitative evaluation of the predictability of the designed QSAR models with VLA-SMILES descriptors.
(This article belongs to the Section Learning)
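The abstract does not spell out the encoding, so the following is only a hypothetical sketch of what a variable-length-array descriptor could look like: each SMILES character maps to an integer code (the `CHAR_CODES` table and the chunk size below are invented for illustration, not the paper's scheme), and the codes are packed into a variable number of fixed-size chunks:

```python
# Hypothetical character map; the paper's actual encoding differs.
CHAR_CODES = {c: i + 1 for i, c in enumerate("CNOPS()=#123456")}

def smiles_to_vla(smiles, chunk=4):
    """Encode a SMILES string as a variable-length array of fixed-size
    numeric chunks (zero-padded), one sketch of a VLA-style descriptor."""
    codes = [CHAR_CODES.get(c, 0) for c in smiles]
    while len(codes) % chunk:
        codes.append(0)  # pad the final chunk
    return [codes[i:i + chunk] for i in range(0, len(codes), chunk)]

vec = smiles_to_vla("CC(=O)O")  # acetic acid
```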

Article
Data Mining Algorithms for Operating Pressure Forecasting of Crude Oil Distribution Pipelines to Identify Potential Blockages
Mach. Learn. Knowl. Extr. 2022, 4(3), 700-714; https://doi.org/10.3390/make4030033 - 21 Jul 2022
Abstract
The implementation of data mining has become very popular in many fields recently, including the petroleum industry. It is widely used to support decision-making processes that minimize oil losses during operations. One of the major causes of loss is the blockage of oil flow during transport to the gathering facility, known as the congeal phenomenon. To address this, real-time surveillance is used to monitor the oil flow condition inside pipes. However, such systems cannot forecast the pipeline pressure over the next several days. The objective of this study is to forecast the pressure several days in advance using real-time pressure data, as well as external factors recorded by nearby weather stations, such as ambient temperature and precipitation. Three machine learning algorithms are evaluated: multi-layer perceptron (MLP), long short-term memory (LSTM), and the nonlinear autoregressive exogenous model (NARX). They are compared with each other, and with a steady-state model, using standard regression evaluation metrics. With proper hyperparameters, the proposed NARX method with an MLP regressor achieved the best performance among the evaluated algorithms, indicated by the highest R2 and lowest RMSE values. This algorithm is capable of forecasting the pressure with high correlation to actual field data. By forecasting the pressure several days ahead, system owners can take pre-emptive actions to prevent congealing.
(This article belongs to the Section Learning)
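A NARX model predicts a future value from lagged autoregressive and exogenous inputs. The sketch below only builds such a design matrix from toy pressure and temperature series; in the study these rows would feed an MLP regressor with tuned hyperparameters:

```python
def narx_design_matrix(pressure, temperature, lags=3, horizon=1):
    """Build NARX-style training rows: past pressures (autoregressive
    terms) plus past exogenous weather readings predict a future value.
    A structural sketch only, not the study's full pipeline."""
    X, y = [], []
    for t in range(lags, len(pressure) - horizon + 1):
        X.append(pressure[t - lags:t] + temperature[t - lags:t])
        y.append(pressure[t + horizon - 1])
    return X, y

p = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]      # operating pressure series
temp = [20, 21, 22, 23, 24, 25]         # ambient temperature series
X, y = narx_design_matrix(p, temp)
```

Raising `horizon` shifts the target further ahead, which is how the "several days in advance" forecast would be framed.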

Article
Input/Output Variables Selection in Data Envelopment Analysis: A Shannon Entropy Approach
Mach. Learn. Knowl. Extr. 2022, 4(3), 688-699; https://doi.org/10.3390/make4030032 - 14 Jul 2022
Abstract
The purpose of this study is to provide an efficient method for selecting input–output indicators in the data envelopment analysis (DEA) approach, in order to improve the discriminatory power of the DEA method in the evaluation and performance analysis of homogeneous decision-making units (DMUs) in the presence of negative data. For this purpose, the Shannon entropy technique is used as one of the most important methods for determining the weights of indicators. Moreover, due to the presence of negative data in some indicators, the range directional measure (RDM) model is used as the base model of the research. Finally, to demonstrate the applicability of the proposed approach, the food and beverage industry was selected from the Tehran Stock Exchange (TSE) as a case study, and data related to 15 stocks were extracted from this industry. The numerical and experimental results indicate the efficacy of the hybrid data envelopment analysis–Shannon entropy (DEASE) approach for evaluating stocks under negative data. Furthermore, the discriminatory power of the proposed DEASE approach is greater than that of a classical DEA model.
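The Shannon entropy weighting step can be sketched as follows for strictly positive data (the study's RDM model, which handles negative data, is not reproduced here): indicators whose values vary more across DMUs carry more information and receive larger weights.

```python
import math

def entropy_weights(matrix):
    """Shannon-entropy indicator weights for a DMU-by-indicator matrix.
    Assumes strictly positive data; a sketch of the weighting idea only."""
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)
    weights = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [v / total for v in col]
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        weights.append(1.0 - e)  # degree of diversification
    s = sum(weights)
    return [w / s for w in weights]

# Column 0 is identical across DMUs, so it carries no information
# and its weight collapses to zero.
w = entropy_weights([[1, 5], [1, 1], [1, 9]])
```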

Article
Improving Deep Learning for Maritime Remote Sensing through Data Augmentation and Latent Space
Mach. Learn. Knowl. Extr. 2022, 4(3), 665-687; https://doi.org/10.3390/make4030031 - 07 Jul 2022
Abstract
Training deep learning models requires having the right data for the problem and understanding both your data and the models' performance on that data. Training deep learning models is difficult when data are limited, so in this paper we seek to answer the following question: how can we train a deep learning model to increase its performance on a targeted area with limited data? We do this by applying rotation data augmentations to a simulated synthetic aperture radar (SAR) image dataset. We use the Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction technique to understand the effects of the augmentations on the data in latent space. Using this latent space representation, we can understand the data and choose specific training samples aimed at boosting model performance in targeted under-performing regions, without the need to increase the training set size. Results show that using latent space to choose training data significantly improves model performance in some cases; however, in other cases no improvements are made. We show that patterns in latent space are a possible predictor of model performance, but that some experimentation and domain knowledge are required to determine the best options.
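The rotation augmentations can be illustrated with a minimal sketch (toy 2×2 grids stand in for SAR chips; the UMAP analysis of the augmented set is omitted):

```python
def rotate90(image):
    """Rotate a 2-D grid (list of lists) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def rotation_augment(image):
    """Return the four 90-degree rotations of an image, the kind of
    augmentation family the paper applies to simulated SAR imagery."""
    views, current = [image], image
    for _ in range(3):
        current = rotate90(current)
        views.append(current)
    return views

views = rotation_augment([[1, 2], [3, 4]])
```

Each original chip thus yields four training samples, and projecting all of them with UMAP shows where the rotations land relative to under-performing regions of latent space.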

Article
Do We Need a Specific Corpus and Multiple High-Performance GPUs for Training the BERT Model? An Experiment on COVID-19 Dataset
Mach. Learn. Knowl. Extr. 2022, 4(3), 641-664; https://doi.org/10.3390/make4030030 - 04 Jul 2022
Abstract
The COVID-19 pandemic has impacted daily lives around the globe. Since 2019, the amount of literature focusing on COVID-19 has risen exponentially. However, it is almost impossible for humans to read all of these studies and classify them. This article proposes a method for building an unsupervised zero-shot classification model based on the pre-trained BERT model. We used the CORD-19 dataset in conjunction with the LitCovid database to construct a new vocabulary and prepare the test dataset. For the NLI downstream task, we used three corpora: SNLI, MultiNLI, and MedNLI. We significantly reduced the training time, by 98.2639%, to build a task-specific machine learning model, using only one Nvidia Tesla V100. The final model runs faster and uses fewer resources than its comparators. Its accuracy of 27.84% is 6.73% lower than the best-achieved accuracy, but remains comparable. Finally, we found that a tokenizer and vocabulary more specific to COVID-19 could not outperform the generalized ones. Additionally, the BART architecture was found to affect the classification results.
(This article belongs to the Section Learning)
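Zero-shot classification via NLI treats each candidate label as a hypothesis to test against the document. The sketch below shows only the premise–hypothesis pairing and the final argmax; the template wording is a hypothetical choice, and the entailment scores would come from the trained NLI model, which is not run here:

```python
def zero_shot_pairs(text, labels, template="This paper is about {}."):
    """Pair a document (premise) with one NLI hypothesis per candidate
    label; an entailment model then scores each pair."""
    return [(text, template.format(lab)) for lab in labels]

def classify(entailment_scores, labels):
    """Pick the label whose hypothesis the NLI model found most entailed."""
    return max(zip(entailment_scores, labels))[1]

pairs = zero_shot_pairs("Spike protein binding ...", ["treatment", "mechanism"])
label = classify([0.2, 0.9], ["treatment", "mechanism"])
```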

Article
Semantic Image Segmentation Using Scant Pixel Annotations
Mach. Learn. Knowl. Extr. 2022, 4(3), 621-640; https://doi.org/10.3390/make4030029 - 01 Jul 2022
Abstract
The success of deep networks for the semantic segmentation of images is limited by the availability of annotated training data. The manual annotation of images for segmentation is a tedious and time-consuming task that often requires sophisticated users with significant domain expertise to create high-quality annotations over hundreds of images. In this paper, we propose the segmentation with scant pixel annotations (SSPA) approach to generate high-performing segmentation models from a scant set of expert-annotated images. The models are trained on images with automatically generated pseudo-labels, along with a scant set of expert-annotated images selected using an entropy-based algorithm. For each chosen image, experts are directed to assign labels to a particular group of pixels, while a set of replacement rules that leverage the patterns learned by the model automatically assigns labels to the remaining pixels. The SSPA approach integrates active learning and semi-supervised learning with pseudo-labels, where expert annotations are not essential but generated on demand. Extensive experiments on biomedical and biofilm datasets show that the SSPA approach achieves state-of-the-art performance with experts cumulatively annotating less than 5% of the pixels of the training data.
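The entropy-based selection idea can be sketched as follows: images whose predicted class probabilities have high Shannon entropy are the ones routed to the experts. The toy probability maps below illustrate the principle only, not the paper's exact algorithm:

```python
import math

def mean_pixel_entropy(prob_map):
    """Mean per-pixel Shannon entropy of a model's class probabilities;
    high entropy marks images the model is least certain about."""
    total = 0.0
    for probs in prob_map:
        total -= sum(p * math.log(p) for p in probs if p > 0)
    return total / len(prob_map)

def select_for_annotation(prob_maps, k):
    """Active-learning-style selection: send the k most uncertain
    images to the experts for scant pixel annotation."""
    ranked = sorted(range(len(prob_maps)),
                    key=lambda i: mean_pixel_entropy(prob_maps[i]),
                    reverse=True)
    return ranked[:k]

confident = [[0.99, 0.01]] * 4   # model already sure about every pixel
uncertain = [[0.5, 0.5]] * 4     # model undecided everywhere
picked = select_for_annotation([confident, uncertain], k=1)
```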

Article
Certifiable Unlearning Pipelines for Logistic Regression: An Experimental Study
Mach. Learn. Knowl. Extr. 2022, 4(3), 591-620; https://doi.org/10.3390/make4030028 - 22 Jun 2022
Abstract
Machine unlearning is the task of updating machine learning (ML) models after a subset of the training data they were trained on is deleted. Methods for the task should combine effectiveness and efficiency: they should effectively "unlearn" deleted data, but without requiring excessive computational effort (e.g., a full retraining) for a small number of deletions. Such a combination is typically achieved by tolerating some amount of approximation in the unlearning. In addition, laws and regulations in the spirit of "the right to be forgotten" have given rise to requirements for certifiability, i.e., the ability to demonstrate that the deleted data have indeed been unlearned by the ML model. In this paper, we present an experimental study of three state-of-the-art approximate unlearning methods for logistic regression and demonstrate the trade-offs between efficiency, effectiveness, and certifiability offered by each method. In conducting this study, we extend some of the existing works and describe a common unlearning pipeline to compare and evaluate the unlearning methods on six real-world datasets in a variety of settings. We provide insights into the effect of the quantity and distribution of the deleted data on ML models, and into the performance of each unlearning method in different settings. We also propose a practical online strategy to determine when the accumulated error from approximate unlearning is large enough to warrant a full retraining of the ML model.
(This article belongs to the Section Learning)
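The online retraining strategy the abstract mentions can be sketched as an error budget: each approximate deletion contributes an error estimate, and a full retrain is signaled once the accumulated error crosses the budget. The threshold and the per-deletion error estimates below are placeholders, not the paper's quantities:

```python
class UnlearningMonitor:
    """Accumulate an error estimate per approximate deletion and fall
    back to full retraining once it crosses a budget (a sketch of the
    online strategy described in the abstract)."""

    def __init__(self, budget):
        self.budget = budget
        self.accumulated = 0.0

    def record_deletion(self, error_estimate):
        """Return True when a full retrain is warranted."""
        self.accumulated += error_estimate
        if self.accumulated >= self.budget:
            self.accumulated = 0.0  # retraining resets the error
            return True
        return False

mon = UnlearningMonitor(budget=1.0)
flags = [mon.record_deletion(0.4) for _ in range(5)]
```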

Article
Real Quadratic-Form-Based Graph Pooling for Graph Neural Networks
Mach. Learn. Knowl. Extr. 2022, 4(3), 580-590; https://doi.org/10.3390/make4030027 - 21 Jun 2022
Abstract
Graph neural networks (GNNs) have developed rapidly in recent years because they can operate on non-Euclidean data and possess promising prediction power in many real-world applications. Graph classification is one of the central problems for graph neural networks; it aims to predict the label of a graph by training graph neural networks on graph-structured datasets. The graph pooling scheme is an important component of graph neural networks for the graph classification objective. Previous works typically apply the graph pooling scheme in a linear manner. In this paper, we propose a real quadratic-form-based graph pooling framework for graph neural networks in graph classification. The quadratic form can capture pairwise relationships, which yields stronger expressive power than existing linear forms. Experiments on benchmarks verify the effectiveness of the proposed quadratic-form-based graph pooling scheme on graph classification tasks.
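A quadratic-form readout scores pairwise feature interactions that a linear readout cannot. A minimal sketch with toy dimensions (not the paper's trainable pooling layer) shows the difference:

```python
def quadratic_form_pool(h, A):
    """Pool a feature vector h into the scalar h^T A h; unlike a linear
    readout w^T h, the quadratic form touches every pairwise product
    h_i * h_j, capturing pairwise relationships between features."""
    n = len(h)
    return sum(h[i] * A[i][j] * h[j] for i in range(n) for j in range(n))

h = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]   # recovers the squared norm
cross = [[0.0, 1.0], [1.0, 0.0]]      # scores only the cross term
norm_sq = quadratic_form_pool(h, identity)
interaction = quadratic_form_pool(h, cross)
```

In a GNN, A would be a learned matrix and h the aggregated node features, so the pooled score can depend on feature interactions rather than on a weighted sum alone.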
