Topic Editors

Department of Engineering and Architecture, University of Parma, Parco Area delle Scienze, 181/A, 43124 Parma, Italy
Department of Software Engineering and Artificial Intelligence (DISIA), Faculty of Computer Science and Engineering, Office 431, Universidad Complutense de Madrid (UCM), 28040 Madrid, Spain
Computer Science, Stockton University, Galloway, NJ 08205, USA

Machine and Deep Learning

Abstract submission deadline
closed (31 December 2022)
Manuscript submission deadline
closed (31 March 2023)
Viewed by
779,246

Topic Information

Dear Colleagues,

Our society is facing a new era of automation, not only in industry but also in our daily lives. Computers are everywhere, and their use is no longer confined to industry and work; it extends to entertainment and leisure. Computing and artificial intelligence are not simply scientific lab experiments for publishing papers in major journals and conferences but opportunities to make our lives better.

Among the different fields of artificial intelligence, machine learning is certainly one of the most studied in recent years. The last few decades have seen a gigantic shift due to the birth of deep learning, which has opened up unprecedented theoretical and applied opportunities.

In this context, advances in machine and deep learning are made on a daily basis, but much remains to be learned. For instance, the inner workings of deep learning architectures are still partially obscure, and explaining them will foster new applications, algorithms, and architectures. While deep learning is considered the hottest topic in artificial intelligence nowadays, considerable interest remains in “traditional” machine learning, especially in (but not limited to) new learning paradigms, scalability to big/huge data applications, and optimization.

Even more widespread are the (new) applications of machine and deep learning to finance, healthcare, sustainability, climate science, and neuroscience, to name a few. Continuing and improving research in machine and deep learning will not only be a chance for surprising new discoveries but also a way to contribute to our wellbeing and economic growth.

Prof. Dr. Andrea Prati
Dr. Luis Javier García Villalba
Prof. Dr. Vincent A. Cicirello
Topic Editors

Keywords

  • machine learning
  • deep learning
  • natural language processing
  • text mining
  • active learning
  • clustering
  • regression
  • data mining
  • web mining
  • online learning
  • ranking in machine learning
  • reinforcement learning
  • transfer learning
  • semi-supervised learning
  • zero- and few-shot learning
  • time series analysis
  • unsupervised learning
  • deep learning architectures
  • generative models
  • deep reinforcement learning
  • learning theory (bandits, game theory, statistical learning theory, etc.)
  • optimization (convex and non-convex optimization, matrix/tensor methods, sparsity, etc.)
  • probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
  • probabilistic inference (Bayesian methods, graphical models, Monte Carlo methods, etc.)
  • evolution-based methods
  • explanation-based learning
  • multi-agent learning
  • neuroscience and cognitive science (e.g., neural coding, brain–computer interfaces)
  • trustworthy machine learning (accountability, causality, fairness, privacy, robustness, etc.)
  • applications (e.g., speech processing, computational biology, computer vision, NLP)

Participating Journals

Journal Name                      Abbreviation  Impact Factor  CiteScore  Launch Year  First Decision (median)  APC
Applied Sciences                  applsci       2.5            5.3        2011         17.8 days                 CHF 2400
Big Data and Cognitive Computing  BDCC          3.7            7.1        2017         18 days                   CHF 1800
Mathematics                       mathematics   2.3            4.0        2013         17.1 days                 CHF 2600
Electronics                       electronics   2.6            5.3        2012         16.8 days                 CHF 2400
Entropy                           entropy       2.1            4.9        1999         22.4 days                 CHF 2600

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of the following benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (269 papers)

21 pages, 773 KiB  
Article
Parkinson’s Disease Detection Using Hybrid LSTM-GRU Deep Learning Model
by Amjad Rehman, Tanzila Saba, Muhammad Mujahid, Faten S. Alamri and Narmine ElHakim
Electronics 2023, 12(13), 2856; https://doi.org/10.3390/electronics12132856 - 28 Jun 2023
Cited by 17 | Viewed by 4319
Abstract
Parkinson’s disease is the second-most common cause of death and disability as well as the most prevalent neurological disorder. In the last 15 years, the number of cases of PD has doubled. Accurate detection of PD in the early stages is one of the most challenging tasks to ensure individuals can continue to live with as little interference as possible, yet there are not enough trained neurologists around the world to detect Parkinson’s disease in its early stages. Machine learning methods based on artificial intelligence have gained considerable popularity over the past few decades in medical disease detection; however, these methods often do not provide an accurate and timely diagnosis, and their overall detection accuracy remains inadequate. This study used data from 31 male and female patients, comprising 195 voice recordings. Approximately six recordings were created per patient, with the length of each recording ranging from 1 to 36 s. The voices were recorded in a soundproof studio using an Industrial Acoustics Company (IAC) AKG-C420 head-mounted microphone. The dataset was collected to investigate the diagnostic significance of speech and voice abnormalities caused by Parkinson’s disease. An imbalanced dataset, in which one class holds the majority of samples and the other only a minority, is a main contributor to model overfitting and generalization error. This problem is addressed in this study by utilizing three sampling techniques. After balancing, each class has the same number of samples, which proved valuable in improving the model’s performance and reducing overfitting. Four performance metrics (accuracy, precision, recall, and F1 score) are used to evaluate the effectiveness of the proposed hybrid model. Experiments demonstrated that the proposed model achieved 100% accuracy, recall, and F1 score using the dataset balanced with random oversampling, and 100% precision, 97% recall, a 99% AUC score, and a 91% F1 score with the SMOTE technique. Full article
(This article belongs to the Topic Machine and Deep Learning)
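
To make the balancing step concrete, the sketch below shows random oversampling and SMOTE with the imbalanced-learn library; the feature matrix and labels are random stand-ins for the study's 195 voice recordings, not the authors' data or code.

```python
# Sketch of the class-balancing step described above (not the authors' code).
# X stands in for acoustic features per recording, y for the diagnosis labels;
# both are illustrative random placeholders.
import numpy as np
from imblearn.over_sampling import RandomOverSampler, SMOTE
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(195, 22))                     # placeholder voice features
y = rng.choice([0, 1], size=195, p=[0.25, 0.75])   # imbalanced classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0)

# Random oversampling: duplicate minority-class samples until classes match.
X_ros, y_ros = RandomOverSampler(random_state=0).fit_resample(X_train, y_train)

# SMOTE: synthesize new minority samples by interpolating between neighbours.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X_train, y_train)

print(np.bincount(y_ros), np.bincount(y_sm))       # both now balanced
```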

10 pages, 868 KiB  
Article
Efficient Meta-Learning through Task-Specific Pseudo Labelling
by Sanghyuk Lee, Seunghyun Lee and Byung Cheol Song
Electronics 2023, 12(13), 2757; https://doi.org/10.3390/electronics12132757 - 21 Jun 2023
Cited by 2 | Viewed by 1273
Abstract
Meta-learning is attracting attention as a crucial tool for few-shot learning tasks. It involves the establishment and acquisition of “meta-knowledge”, enabling adaptation to a novel field using only limited data. Transductive meta-learning has garnered increasing attention as a solution to the sample bias problem arising from meta-learning’s reliance on a limited support set for adaptation; it surpasses the traditional inductive perspective by inferring the class of each test instance jointly, considering the relations among the instances in the test set. To enhance the effectiveness of transductive meta-learning, this paper introduces a novel technique called task-specific pseudo labelling. The main idea is to produce synthetic labels for unannotated query sets by propagating labels from annotated support sets. This approach retains the supervised setting as is while incorporating the unannotated query set into the adaptation procedure. Consequently, it can handle a larger number of examples during adaptation than inductive approaches, leading to improved classification performance of the model. Notably, this approach represents the first instance of employing task adaptation within the context of pseudo labelling. In the standard few-shot evaluation configurations, specifically the 5-way 1-shot setup, the proposed method demonstrates noteworthy enhancements over two existing meta-learning algorithms, with improvements of 6.75% and 5.03%, respectively, establishing a new state-of-the-art performance in transductive meta-learning. Full article
(This article belongs to the Topic Machine and Deep Learning)
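
A minimal sketch of the core pseudo-labelling idea follows: support-set labels are propagated to an unannotated query set via class prototypes. This illustrates the general transductive mechanism under stated assumptions, not the paper's implementation.

```python
# Illustrative pseudo-labelling of a query set from a labelled support set.
import torch
import torch.nn.functional as F

def pseudo_label(support_feats, support_labels, query_feats, n_way):
    """Assign each query embedding the label of the nearest class prototype."""
    # Class prototypes: mean embedding of each support class (ProtoNet-style).
    protos = torch.stack([support_feats[support_labels == c].mean(0)
                          for c in range(n_way)])
    # Cosine similarity between queries and prototypes.
    sims = F.normalize(query_feats, dim=1) @ F.normalize(protos, dim=1).t()
    return sims.argmax(dim=1)          # synthetic labels for the queries

# 5-way 1-shot example with random embeddings (stand-ins for a backbone).
feats = torch.randn(5, 64)             # one support embedding per class
labels = torch.arange(5)
queries = torch.randn(75, 64)          # 15 unlabelled queries per class
print(pseudo_label(feats, labels, queries, n_way=5).shape)  # torch.Size([75])
```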

12 pages, 1158 KiB  
Article
Machine Learning and Cochlear Implantation: Predicting the Post-Operative Electrode Impedances
by Yousef A. Alohali, Mahmoud Samir Fayed, Yassin Abdelsamad, Fida Almuhawas, Asma Alahmadi, Tamer Mesallam and Abdulrahman Hagr
Electronics 2023, 12(12), 2720; https://doi.org/10.3390/electronics12122720 - 18 Jun 2023
Cited by 2 | Viewed by 3438
Abstract
Cochlear implantation is the common treatment for severe to profound sensorineural hearing loss when there is no benefit from hearing aids. Measuring the electrode impedance along the electrode array at different time points after surgery is crucial for verifying the electrodes’ status, determining the compliance levels, and helping to identify the electric dynamic range. Increased impedance values without proper reprogramming can affect the patient’s performance. Predicting acceptable levels of electrode impedance at different time points after surgery could help clinicians during fitting sessions through a comparison of the predicted with the measured levels; accordingly, clinicians can decide whether the measured levels are within the predicted normal range. In this work, we used a dataset of 80 pediatric patients who had received cochlear implants with the MED-EL FLEX 28 electrode array. We predicted the impedance of the electrode arrays in each channel at different time points: one month, three months, six months, and one year after the date of surgery. We used different machine learning algorithms such as linear regression, Bayesian linear regression, decision forest regression, boosted decision tree regression, and neural networks. The features used include the patient’s age and the intra-operative electrode impedance at the different electrodes. Our results indicated that the best algorithm varies depending on the channel, while Bayesian linear regression and neural networks provide the best results for 75% of the channels. Furthermore, the accuracy level ranges between 83% and 100% in half of the channels one year after surgery, when an error range between 0 and 3 kΩ is defined as an acceptable threshold. Moreover, the patient’s age alone can provide the best prediction results for 50% of the channels at six months or one year after surgery, suggesting that the patient’s age could be a predictor of electrode impedance after surgery. Full article
(This article belongs to the Topic Machine and Deep Learning)
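
As an illustration of the per-channel regression set-up, the sketch below fits one scikit-learn BayesianRidge model per electrode channel on age plus intra-operative impedances. All data, shapes, and the 12-channel count are made-up stand-ins, not the study's dataset.

```python
# Illustrative per-channel impedance regression (not the authors' pipeline).
# Features: patient age plus intra-operative impedances of all channels;
# target: the same channel's impedance one year post-surgery. Data is random.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
n_patients, n_channels = 80, 12                    # assumed channel count
age = rng.uniform(1, 12, size=(n_patients, 1))     # years
intra_op = rng.uniform(3, 10, size=(n_patients, n_channels))   # kOhm
one_year = intra_op + rng.normal(0, 1, intra_op.shape)         # synthetic targets

X = np.hstack([age, intra_op])
models = [BayesianRidge().fit(X, one_year[:, ch])  # one regressor per channel
          for ch in range(n_channels)]

pred = models[0].predict(X[:5])
within_band = np.abs(pred - one_year[:5, 0]) <= 3.0   # the 0-3 kOhm threshold
print(pred.round(2), within_band)
```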

15 pages, 719 KiB  
Article
KNN-Based Machine Learning Classifier Used on Deep Learned Spatial Motion Features for Human Action Recognition
by Kalaivani Paramasivam, Mohamed Mansoor Roomi Sindha and Sathya Bama Balakrishnan
Entropy 2023, 25(6), 844; https://doi.org/10.3390/e25060844 - 25 May 2023
Cited by 7 | Viewed by 2110
Abstract
Human action recognition is an essential process in surveillance video analysis, which is used to understand the behavior of people to ensure safety. Most of the existing methods for HAR use computationally heavy networks such as 3D CNN and two-stream networks. To alleviate the challenges in the implementation and training of 3D deep learning networks, which have more parameters, a customized lightweight directed acyclic graph-based residual 2D CNN with fewer parameters was designed from scratch and named HARNet. A novel pipeline for the construction of spatial motion data from raw video input is presented for the latent representation learning of human actions. The constructed input is fed to the network for simultaneous operation over spatial and motion information in a single stream, and the latent representation learned at the fully connected layer is extracted and fed to the conventional machine learning classifiers for action recognition. The proposed work was empirically verified, and the experimental results were compared with those for existing methods. The results show that the proposed method outperforms state-of-the-art (SOTA) methods with a percentage improvement of 2.75% on UCF101, 10.94% on HMDB51, and 0.18% on the KTH dataset. Full article
(This article belongs to the Topic Machine and Deep Learning)
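
The sketch below illustrates the general "deep features + conventional classifier" pipeline the abstract describes: fully connected layer activations from a CNN are fed to a scikit-learn KNN. The tiny backbone is a stand-in for HARNet, whose exact architecture is the authors' own.

```python
# Illustrative deep-feature extraction + KNN classification (not HARNet itself).
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsClassifier

backbone = nn.Sequential(              # stand-in for HARNet up to its FC layer
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 128),                # "latent representation" layer
)

with torch.no_grad():
    clips = torch.randn(200, 3, 112, 112)    # stand-in spatial-motion inputs
    feats = backbone(clips).numpy()
labels = torch.randint(0, 6, (200,)).numpy()  # 6 hypothetical action classes

knn = KNeighborsClassifier(n_neighbors=5).fit(feats[:150], labels[:150])
print(knn.score(feats[150:], labels[150:]))   # held-out accuracy
```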

17 pages, 2957 KiB  
Article
Autonomous Driving Decision Control Based on Improved Proximal Policy Optimization Algorithm
by Qingpeng Song, Yuansheng Liu, Ming Lu, Jun Zhang, Han Qi, Ziyu Wang and Zijian Liu
Appl. Sci. 2023, 13(11), 6400; https://doi.org/10.3390/app13116400 - 24 May 2023
Cited by 1 | Viewed by 1626
Abstract
The decision-making control of autonomous driving in complex urban road environments is a difficult problem in autonomous driving research. To address the high-dimensional state space and sparse rewards of decision control in this environment, this paper proposes a Coordinated Convolution Multi-Reward Proximal Policy Optimization (CCMR-PPO) method. It reduces the dimension of the bird’s-eye-view data through a coordinated convolution network and then fuses the processed data with the vehicle state data as the input of the algorithm, optimizing the state space. The control commands acc (throttle and brake) and steer of the vehicle are used as the output of the algorithm. Comprehensively considering the lateral error, safety distance, speed, and other factors of the vehicle, a multi-objective reward mechanism was designed to alleviate the sparse reward. Experiments on the CARLA simulation platform show that the proposed method effectively increases performance: compared with the PPO algorithm, the number of line crossings is reduced by 24%, and the number of tasks completed is increased by 54%. Full article
(This article belongs to the Topic Machine and Deep Learning)

22 pages, 4078 KiB  
Article
Multi-Class Document Classification Using Lexical Ontology-Based Deep Learning
by Ilkay Yelmen, Ali Gunes and Metin Zontul
Appl. Sci. 2023, 13(10), 6139; https://doi.org/10.3390/app13106139 - 17 May 2023
Cited by 4 | Viewed by 2147
Abstract
With the recent growth of the Internet, the volume of data has also increased. In particular, the increase in the amount of unstructured data makes data difficult to manage, and classification is needed in order to use the data for various purposes. Since it is difficult to manually classify the ever-increasing volume of data for various types of analysis and evaluation, automatic classification methods are needed. In addition, imbalanced multi-class classification is a challenging task: as the number of classes increases, so does the number of decision boundaries a learning algorithm has to solve. Therefore, in this paper, an improved model is proposed that uses the WordNet lexical ontology and BERT to learn deeper features of text, thereby improving the classification performance of the model. Classification success increased when using 11 of WordNet’s general lexicographer files, which group words by synonym sets (synsets), syntactic categories, and logical groupings; WordNet was used for feature-dimension reduction. In the experimental studies, word embedding methods were first used without dimension reduction, and Random Forest (RF), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) algorithms were employed to perform classification. These studies were then repeated with dimension reduction performed by WordNet. In addition to the machine learning models, experiments were also conducted with the pretrained BERT model, with and without WordNet. The experimental results showed that, on an unstructured, seven-class, imbalanced dataset, the highest accuracy, 93.77%, was obtained with the proposed model. Full article
(This article belongs to the Topic Machine and Deep Learning)
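
The sketch below illustrates how WordNet's lexicographer files group word senses into a small set of categories, the kind of mapping that enables the feature-dimension reduction described above. It assumes NLTK with the wordnet corpus downloaded; the printed senses are examples, not guaranteed output.

```python
# Illustrative mapping from words to WordNet lexicographer-file categories.
# Requires: pip install nltk; then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def lexnames(word):
    """Return the lexicographer-file names covering a word's synsets."""
    return sorted({s.lexname() for s in wn.synsets(word)})

for w in ["bank", "run", "computer"]:
    print(w, "->", lexnames(w))
# e.g. "computer" -> ["noun.artifact", "noun.person"]: many surface words
# collapse onto a small set of lexicographer categories, shrinking the
# feature space before classification.
```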

19 pages, 19289 KiB  
Article
Improving Graphite Ore Grade Identification with a Novel FRCNN-PGR Method Based on Deep Learning
by Junchen Xiang, Haoyu Shi, Xueyu Huang and Daogui Chen
Appl. Sci. 2023, 13(8), 5179; https://doi.org/10.3390/app13085179 - 21 Apr 2023
Cited by 2 | Viewed by 1819
Abstract
Graphite is widely used in various industries, including the refractory, battery, steel, expanded graphite, brake pad, casting coating, and lubricant industries. For the mineral processing industry, an effective and accurate grade-identification method based on FRCNN-PGR is proposed and evaluated. The method cuts images to expand the dataset, combines high- and low-level feature layers in the Faster R-CNN model, and adds a global attention mechanism, the Relation-Aware Global Attention (RGA) network, to extract features of interest from both space and channel. The proposed model outperforms the original Faster R-CNN model with 80.21% mAP and 87.61% recall on the split graphite mine dataset. Full article
(This article belongs to the Topic Machine and Deep Learning)

26 pages, 1249 KiB  
Article
KHGCN: Knowledge-Enhanced Recommendation with Hierarchical Graph Capsule Network
by Fukun Chen, Guisheng Yin, Yuxin Dong, Gesu Li and Weiqi Zhang
Entropy 2023, 25(4), 697; https://doi.org/10.3390/e25040697 - 20 Apr 2023
Cited by 5 | Viewed by 3353
Abstract
Knowledge graphs as external information have become one of the mainstream directions of current recommendation systems. Various knowledge-graph-representation methods have been proposed to promote the development of knowledge graphs in related fields. Knowledge-graph-embedding methods can learn entity information and the complex relationships between entities in knowledge graphs, and recently proposed graph neural networks can learn higher-order representations of entities and relationships. The complete representation in the knowledge graph therefore enriches item information and alleviates the cold-start and data-sparsity problems of the recommendation process. However, representing the knowledge graph’s entire set of entities and relations in personalized recommendation tasks introduces unnecessary noise for different users. To learn the entity-relationship representation in the knowledge graph while effectively removing noise, we propose a model named knowledge-enhanced hierarchical graph capsule network (KHGCN), which can extract node embeddings in graphs while learning the hierarchical structure of graphs. Our model eliminates noisy entity and relationship representations in the knowledge graph through entity disentangling for recommendation and introduces an attentive mechanism to strengthen knowledge-graph aggregation. It learns the representation of entity relationships via an original graph capsule network; capsule neural networks represent the structured information between entities more completely. We validate the proposed model on real-world datasets, and the validation results demonstrate its effectiveness. Full article
(This article belongs to the Topic Machine and Deep Learning)

17 pages, 21723 KiB  
Article
Multi-Mode Data Generation and Fault Diagnosis of Bearings Based on STFT-SACGAN
by Hongxing Wang, Hua Zhu and Huafeng Li
Electronics 2023, 12(8), 1910; https://doi.org/10.3390/electronics12081910 - 18 Apr 2023
Cited by 5 | Viewed by 1461
Abstract
To achieve multi-mode fault sample generation and fault diagnosis of bearings in a complex operating environment with scarce labeled data, a semi-supervised auxiliary classifier generative adversarial network (SACGAN) is constructed in this paper by combining a semi-supervised generative adversarial network (SGAN) and an auxiliary classifier generative adversarial network (ACGAN), with improvements to the network structure and the loss function. A fault diagnosis method based on STFT-SACGAN is also proposed. The method uses a short-time Fourier transform (STFT) to convert one-dimensional time-domain vibration signals of bearings into two-dimensional time-frequency images, which are used as the input of SACGAN. Two multi-mode fault data generation and intelligent diagnosis cases for bearings are studied. The experimental results show that the proposed method generates high-quality multi-mode fault samples with high fault diagnosis accuracy, generalization, and stability. Full article
(This article belongs to the Topic Machine and Deep Learning)
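
A minimal sketch of the STFT preprocessing step follows, turning a one-dimensional vibration signal into a normalized time-frequency image. The sampling rate, window length, and test signal are illustrative, not the paper's settings.

```python
# Illustrative STFT preprocessing: 1-D vibration signal -> 2-D image input.
import numpy as np
from scipy.signal import stft

fs = 12_000                                   # assumed sampling rate, Hz
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 157 * t) + 0.3 * np.random.randn(t.size)

f, frames, Z = stft(signal, fs=fs, nperseg=256, noverlap=128)
image = np.log1p(np.abs(Z))                   # log-magnitude spectrogram
image = (image - image.min()) / (image.max() - image.min())  # scale to [0, 1]
print(image.shape)                            # about (129, 95): one "fault image"
```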

35 pages, 2329 KiB  
Review
A Survey on Deep Learning Based Segmentation, Detection and Classification for 3D Point Clouds
by Prasoon Kumar Vinodkumar, Dogus Karabulut, Egils Avots, Cagri Ozcinar and Gholamreza Anbarjafari
Entropy 2023, 25(4), 635; https://doi.org/10.3390/e25040635 - 10 Apr 2023
Cited by 11 | Viewed by 5679
Abstract
The computer vision, graphics, and machine learning research groups have given a significant amount of focus to 3D object recognition (segmentation, detection, and classification). Deep learning approaches have lately emerged as the preferred method for 3D segmentation problems as a result of their outstanding performance in 2D computer vision. As a result, many innovative approaches have been proposed and validated on multiple benchmark datasets. This study offers an in-depth assessment of the latest developments in deep learning-based 3D object recognition. We discuss the most well-known 3D object recognition models, along with evaluations of their distinctive qualities. Full article
(This article belongs to the Topic Machine and Deep Learning)

14 pages, 3251 KiB  
Article
CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement
by Haozhe Chen and Xiaojuan Zhang
Entropy 2023, 25(4), 628; https://doi.org/10.3390/e25040628 - 6 Apr 2023
Cited by 1 | Viewed by 2044
Abstract
In recent years, neural networks based on attention mechanisms have seen increasing use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it can combine the advantages of convolution and self-attention. Recently, the gated attention unit (GAU) was proposed; compared with traditional multi-head self-attention, approaches based on the GAU are effective and computationally efficient. In this paper, we propose a network for speech enhancement called CGA-MGAN, a kind of MetricGAN based on convolution-augmented gated attention. CGA-MGAN captures local and global correlations in speech signals simultaneously by fusing convolution and gated attention units. Experiments on Voice Bank + DEMAND show that our proposed CGA-MGAN model achieves excellent performance (3.47 PESQ, 0.96 STOI, and 11.09 dB SSNR) with a relatively small model size (1.14 M). Full article
(This article belongs to the Topic Machine and Deep Learning)

15 pages, 503 KiB  
Article
A Unified Approach to Nested and Non-Nested Slots for Spoken Language Understanding
by Xue Wan, Wensheng Zhang, Mengxing Huang, Siling Feng and Yuanyuan Wu
Electronics 2023, 12(7), 1748; https://doi.org/10.3390/electronics12071748 - 6 Apr 2023
Cited by 2 | Viewed by 1408
Abstract
As chatbots become more popular, multi-intent spoken language understanding (SLU) has received unprecedented attention. Multi-intent SLU, which primarily comprises the two subtasks of multiple intent detection (ID) and slot filling (SF), has the potential for widespread implementation. The two primary issues with the current approaches are as follows: (1) They cannot solve the problem of slot nesting; (2) The performance and inference rate of the model are not high enough. To address these issues, we suggest a multi-intent joint model based on global pointers to handle nested and non-nested slots. Firstly, we constructed a multi-dimensional type-slot label interaction network (MTLN) for subsequent intent decoding to enhance the implicit correlation between intents and slots, which allows for more adequate information about each other. Secondly, the global pointer network (GP) was introduced, which not only deals with nested and non-nested slots and slot incoherence but also has a faster inference rate and better performance than the baseline model. On two multi-intent datasets, the proposed model achieves state-of-the-art results on MixATIS with 1.6% improvement of intent Acc, 0.1% improvement of slot F1 values, 3.1% improvement of sentence Acc values, and 1.2%, 1.1% and 4.5% performance improvements on MixSNIPS, respectively. Meanwhile, the inference rate is also improved. Full article
(This article belongs to the Topic Machine and Deep Learning)

14 pages, 2378 KiB  
Article
Automated Segmentation to Make Hidden Trigger Backdoor Attacks Robust against Deep Neural Networks
by Saqib Ali, Sana Ashraf, Muhammad Sohaib Yousaf, Shazia Riaz and Guojun Wang
Appl. Sci. 2023, 13(7), 4599; https://doi.org/10.3390/app13074599 - 5 Apr 2023
Cited by 1 | Viewed by 2033
Abstract
The successful outcomes of deep learning (DL) algorithms in diverse fields have prompted researchers to consider backdoor attacks on DL models in order to defend them in practical applications. Adversarial examples could deceive a safety-critical system, which could lead to hazardous situations. To cope with this, we suggest a segmentation technique that makes hidden trigger backdoor attacks more robust. The tiny trigger patterns are conventionally established by a series of parameters encompassing their size, location, color, shape, and other defining attributes. From the original triggers, alternate triggers are generated that allow the backdoor patterns to be controlled by a third party in addition to their original designer, and these can produce a higher success rate than the original triggers. However, the significant downside of these approaches is the lack of automation in the scene segmentation phase, which results in poor optimization of the threat model. We developed a novel technique that automatically generates alternate triggers to increase their effectiveness. Image denoising is performed for this purpose, followed by scene segmentation techniques to make the poisoned classifier more robust. The experimental results demonstrated that our proposed technique achieved 99% to 100% accuracy and helped reduce the vulnerabilities of DL models by exposing their loopholes. Full article
(This article belongs to the Topic Machine and Deep Learning)

17 pages, 4060 KiB  
Article
Multi-Modal Fake News Detection via Bridging the Gap between Modals
by Peng Liu, Wenhua Qian, Dan Xu, Bingling Ren and Jinde Cao
Entropy 2023, 25(4), 614; https://doi.org/10.3390/e25040614 - 4 Apr 2023
Cited by 6 | Viewed by 2468
Abstract
Multi-modal fake news detection aims to identify fake information through text and corresponding images. The current methods purely combine images and text scenarios by a vanilla attention module but there exists a semantic gap between different scenarios. To address this issue, we introduce an image caption-based method to enhance the model’s ability to capture semantic information from images. Formally, we integrate image description information into the text to bridge the semantic gap between text and images. Moreover, to optimize image utilization and enhance the semantic interaction between images and text, we combine global and object features from the images for the final representation. Finally, we leverage a transformer to fuse the above multi-modal content. We carried out extensive experiments on two publicly available datasets, and the results show that our proposed method significantly improves performance compared to other existing methods. Full article
(This article belongs to the Topic Machine and Deep Learning)

13 pages, 2796 KiB  
Article
Utility Analysis about Log Data Anomaly Detection Based on Federated Learning
by Tae-Ho Shin and Soo-Hyung Kim
Appl. Sci. 2023, 13(7), 4495; https://doi.org/10.3390/app13074495 - 1 Apr 2023
Cited by 3 | Viewed by 1835
Abstract
Logs that record system information are used for anomaly detection, and as systems have grown in complexity and scale, more efficient anomaly detection methods have been proposed. Accordingly, deep learning models that automatically detect system anomalies by learning from log data have been introduced. However, in existing log anomaly detection models, user logs are collected on a central server, exposing the data collection process to the risk of leaking sensitive information. Federated learning, a distributed learning method, has been proposed for training artificial intelligence on sensitive information because it guarantees the anonymity of the collected user data: the central server collects only the weights learned on each local server. In this paper, we conducted an experiment on system log anomaly detection using federated learning. The results demonstrate the feasibility of applying federated learning to deep-learning-based system-log anomaly detection compared with the existing centralized learning method. Moreover, we present an efficient deep learning model based on federated learning for system log anomaly detection. Full article
(This article belongs to the Topic Machine and Deep Learning)
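
The sketch below shows the FedAvg-style aggregation at the heart of this setting: local servers send only their learned weights, and the central server averages them, weighted by local data size. It illustrates the general federated learning scheme, not the paper's code; real systems typically use frameworks such as Flower or TensorFlow Federated.

```python
# Illustrative FedAvg aggregation of local anomaly-detector weights.
import torch
import torch.nn as nn

def fed_avg(state_dicts, weights):
    """Weighted average of client model weights (FedAvg aggregation)."""
    total = sum(weights)
    return {key: sum(w * sd[key] for sd, w in zip(state_dicts, weights)) / total
            for key in state_dicts[0]}

clients = [nn.Linear(16, 2) for _ in range(3)]   # stand-in local detectors
sizes = [1200, 800, 500]                         # hypothetical local log counts

# Only weights leave the local servers; the raw logs never do.
global_state = fed_avg([c.state_dict() for c in clients], sizes)
global_model = nn.Linear(16, 2)
global_model.load_state_dict(global_state)
```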

14 pages, 3955 KiB  
Article
TS-CGANet: A Two-Stage Complex and Real Dual-Path Sub-Band Fusion Network for Full-Band Speech Enhancement
by Haozhe Chen and Xiaojuan Zhang
Appl. Sci. 2023, 13(7), 4431; https://doi.org/10.3390/app13074431 - 31 Mar 2023
Viewed by 1583
Abstract
Speech enhancement based on deep neural networks faces difficulties, as modeling more frequency bands can lead to a decrease in the resolution of low-frequency bands and increase the computational complexity. Previously, we proposed a convolution-augmented gated attention unit (CGAU), which captured local and global correlation in speech signals through the fusion of the convolution and gated attention unit. In this paper, we further improved the CGAU, and proposed a two-stage complex and real dual-path sub-band fusion network for full-band speech enhancement called TS-CGANet. Specifically, we proposed a dual-path CGA network to enhance low-band (0–8 kHz) speech signals. In the medium band (8–16 kHz) and high band (16–24 kHz), noise suppression is only performed in the magnitude domain. The Voice Bank+DEMAND dataset was used to conduct experiments on the proposed TS-CGANet, which consistently outperformed state-of-the-art full-band baselines, as evidenced by the results. Full article
(This article belongs to the Topic Machine and Deep Learning)

20 pages, 4334 KiB  
Article
Energy Dispatch for CCHP System in Summer Based on Deep Reinforcement Learning
by Wenzhong Gao and Yifan Lin
Entropy 2023, 25(3), 544; https://doi.org/10.3390/e25030544 - 21 Mar 2023
Cited by 4 | Viewed by 1609
Abstract
The combined cooling, heating, and power (CCHP) system is an effective solution to energy and environmental problems. However, due to demand-side load uncertainty, load-prediction error, environmental change, and demand charges, the energy dispatch optimization of the CCHP system is a tough challenge. In view of this, this paper proposes a dispatch method based on the deep reinforcement learning (DRL) algorithm DoubleDQN to generate an optimal dispatch strategy for the CCHP system in the summer. By integrating DRL, this method does not require any prediction information and can adapt to the load uncertainty. The simulation results show that, compared with strategies based on benchmark policies and DQN, the proposed dispatch strategy not only preserves thermal comfort well but also reduces the total intra-month cost by 0.13~31.32%, of which the demand charge is reduced by 2.19~46.57%. In addition, the method is shown to have the potential to be applied in the real world by testing under extended scenarios. Full article
(This article belongs to the Topic Machine and Deep Learning)
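
For reference, the sketch below shows the DoubleDQN target computation that distinguishes the algorithm from plain DQN: the online network selects the next action while the target network evaluates it. The network shapes and reward are generic stand-ins, not the paper's CCHP state and action design.

```python
# Illustrative Double DQN target: decouple action selection from evaluation.
import torch
import torch.nn as nn

online, target = nn.Linear(8, 4), nn.Linear(8, 4)   # stand-in Q-networks
gamma = 0.99

s2 = torch.randn(32, 8)          # batch of next states
r = torch.randn(32)              # rewards (e.g., negative energy cost)
done = torch.zeros(32)           # episode-termination flags

with torch.no_grad():
    a_star = online(s2).argmax(dim=1)                              # selection
    q_next = target(s2).gather(1, a_star.unsqueeze(1)).squeeze(1)  # evaluation
    y = r + gamma * (1 - done) * q_next                            # DoubleDQN target
print(y.shape)                   # torch.Size([32])
```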

18 pages, 6162 KiB  
Article
Extraction of Interconnect Parasitic Capacitance Matrix Based on Deep Neural Network
by Yaoyao Ma, Xiaoyu Xu, Shuai Yan, Yaxing Zhou, Tianyu Zheng, Zhuoxiang Ren and Lan Chen
Electronics 2023, 12(6), 1440; https://doi.org/10.3390/electronics12061440 - 17 Mar 2023
Cited by 4 | Viewed by 2498
Abstract
Interconnect parasitic capacitance extraction is crucial in analyzing the delay and crosstalk of VLSI circuits. This paper uses a deep neural network (DNN) to predict the parasitic capacitance matrix of a two-dimensional pattern. To save DNN training time, the neural network’s output includes only the coupling capacitances in the matrix, and the total capacitances are obtained by summing the corresponding predicted coupling capacitances. In this way, we can obtain coupling and total capacitances simultaneously using a single neural network. Moreover, we introduce a mirror-flip method to augment the datasets computed by the finite element method (FEM), which doubles the dataset size and reduces data preparation efforts. We then compare the prediction accuracy of the DNN with that of another neural network, ResNet; the result shows that the DNN performs better in this case. Furthermore, to verify our method’s efficiency, the total capacitances calculated from the trained DNN are compared with a network (named DNN-2) that takes the total capacitance as an extra output. The results show that the prediction accuracy of the two methods is very close, indicating that our method is reliable and can save the training workload for the total capacitance. Finally, a solving-efficiency comparison shows that the average computation time of the trained DNN for one case is no more than 2% of that of the FEM. Full article
(This article belongs to the Topic Machine and Deep Learning)
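
The summing trick is easy to state in code: given a predicted symmetric coupling-capacitance matrix, each conductor's total capacitance is recovered as a row sum, so no separate network output is needed. The numbers below are illustrative, not extracted values.

```python
# Illustrative recovery of total capacitances from predicted couplings.
import numpy as np

# Predicted coupling capacitances C[i, j] between conductors i and j (fF);
# symmetric, with zeros on the diagonal.
C = np.array([[0.0, 1.2, 0.4],
              [1.2, 0.0, 0.9],
              [0.4, 0.9, 0.0]])

total = C.sum(axis=1)   # total capacitance per conductor = sum of its couplings
print(total)            # [1.6 2.1 1.3] -- no separate "total" output needed
```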

17 pages, 6183 KiB  
Article
Feature Fusion and Metric Learning Network for Zero-Shot Sketch-Based Image Retrieval
by Honggang Zhao, Mingyue Liu and Mingyong Li
Entropy 2023, 25(3), 502; https://doi.org/10.3390/e25030502 - 14 Mar 2023
Cited by 3 | Viewed by 1702
Abstract
Zero-shot sketch-based image retrieval (ZS-SBIR) is an important computer vision problem. The image categories in the test phase are new categories that were not visible in the training stage. Because sketches are extremely abstract, the commonly used backbone networks (such as VGG-16 and ResNet-50) cannot handle both sketches and photos. Semantic similarities between the same features in photos and sketches are difficult to reflect in deep models without textual assistance. To solve this problem, we propose a novel and effective feature embedding model called Attention Map Feature Fusion (AMFF), which combines the excellent feature extraction capability of the ResNet-50 network with the excellent representation ability of the attention network. By processing the residuals of the ResNet-50 network, the attention map is obtained without introducing external semantic knowledge. Most previous approaches treat the ZS-SBIR problem as a classification problem, which ignores the huge domain gap between sketches and photos. This paper proposes an effective method to optimize the entire network, called domain-aware triplets (DAT), through which domain feature discrimination and semantic feature embedding can be learned. We also use the classification loss function to stabilize the training process and avoid getting trapped in a local optimum. Compared with the state-of-the-art methods, our method shows superior performance. For example, on the TU-Berlin dataset, we achieved 61.2 ± 1.2% Prec200; on the Sketchy_c100 dataset, we achieved 62.3 ± 3.3% mAPall and 75.5 ± 1.5% Prec100. Full article
(This article belongs to the Topic Machine and Deep Learning)

20 pages, 10865 KiB  
Article
Rock Image Classification Based on EfficientNet and Triplet Attention Mechanism
by Zhihao Huang, Lumei Su, Jiajun Wu and Yuhan Chen
Appl. Sci. 2023, 13(5), 3180; https://doi.org/10.3390/app13053180 - 1 Mar 2023
Cited by 15 | Viewed by 3870
Abstract
Rock image classification is a fundamental and crucial task in the creation of geological surveys. Traditional rock image classification methods mainly rely on manual operation, resulting in high costs and unstable accuracy. While existing methods based on deep learning models have overcome the limitations of traditional methods and achieved intelligent image classification, they still suffer from low accuracy due to suboptimal network structures. In this study, a rock image classification model based on EfficientNet and a triplet attention mechanism is proposed to achieve accurate end-to-end classification. The model was built on EfficientNet, which boasts an efficient network structure thanks to NAS technology and a compound model scaling method, thus achieving high accuracy for rock image classification. Additionally, the triplet attention mechanism was introduced to address the shortcoming of EfficientNet in feature expression and enable the model to fully capture the channel and spatial attention information of rock images, further improving accuracy. During network training, transfer learning was employed by loading pre-trained model parameters into the classification model, which accelerated convergence and reduced training time. The results show that the classification model with transfer learning achieved 92.6% accuracy in the training set and 93.2% Top-1 accuracy in the test set, outperforming other mainstream models and demonstrating strong robustness and generalization ability. Full article
(This article belongs to the Topic Machine and Deep Learning)

9 pages, 543 KiB  
Communication
Detecting Phishing Accounts on Ethereum Based on Transaction Records and EGAT
by Xuanchen Zhou, Wenzhong Yang and Xiaodan Tian
Electronics 2023, 12(4), 993; https://doi.org/10.3390/electronics12040993 - 16 Feb 2023
Cited by 6 | Viewed by 2247
Abstract
In recent years, the losses caused by scams on Ethereum have reached a level that cannot be ignored. As one of the most rampant crimes, phishing scams have caused huge economic losses to blockchain platforms and users. Under these circumstances, to address the threat to the financial security of blockchain, an Edge Aggregated Graph Attention Network (EGAT) based on the static subgraph representation of the transaction network is proposed. This study detects Ethereum phishing accounts through the classification of transaction network subgraphs with the following procedure. Firstly, accounts are used as nodes and the flow of transaction funds as directed edges to construct the transaction network graph. Secondly, the publicly available Ethereum transaction records of phishing accounts are analyzed, and statistical features of the Value, Gas, and Timestamp fields are manually constructed as node and edge features of the graph. Finally, the features are extracted and classified using the EGAT network. According to the experimental results, the recall of the proposed method is 99.3% on the phishing account dataset. As demonstrated, EGAT is more efficient and accurate than Graph2Vec and DeepWalk, and graph structure features can express semantics better than manual features and simple transaction networks, which effectively improves the performance of phishing account detection. Full article
(This article belongs to the Topic Machine and Deep Learning)

12 pages, 365 KiB  
Article
TKRM: Learning a Transfer Kernel Regression Model for Cross-Database Micro-Expression Recognition
by Zixuan Chen, Cheng Lu, Feng Zhou and Yuan Zong
Mathematics 2023, 11(4), 918; https://doi.org/10.3390/math11040918 - 11 Feb 2023
Viewed by 1288
Abstract
Cross-database micro-expression recognition (MER) is a more challenging task than the conventional one because its labeled training (source) and unlabeled testing (target) micro-expression (ME) samples are from different databases. In this circumstance, a large feature-distribution gap may exist between the source and target ME samples due to the different sample sources, which decreases the recognition performance of existing MER methods. In this paper, we focus on this challenging task by proposing a simple yet effective method called the transfer kernel regression model (TKRM). The basic idea of TKRM is to find an ME-discriminative, database-invariant and common reproduced kernel Hilbert space (RKHS) to bridge MEs belonging to different databases. For this purpose, TKRM has the ME discriminative ability of learning a kernel mapping operator to generate an RKHS and build the relationship between the kernelized ME features and labels in such RKHS. Meanwhile, an additional novel regularization term called target sample reconstruction (TSR) is also designed to benefit kernel mapping operator learning by improving the database-invariant ability of TKRM while preserving the ME-discriminative one. To evaluate the proposed TKRM method, we carried out extensive cross-database MER experiments on widely used micro-expression databases, including CASME II and SMIC. Experimental results obtained proved that the proposed TKRM method is indeed superior to recent state-of-the-art domain adaptation methods for cross-database MER. Full article
(This article belongs to the Topic Machine and Deep Learning)

21 pages, 2300 KiB  
Article
FAD: Fine-Grained Adversarial Detection by Perturbation Intensity Classification
by Jin-Tao Yang, Hao Jiang, Hao Li, Dong-Sheng Ye and Wei Jiang
Entropy 2023, 25(2), 335; https://doi.org/10.3390/e25020335 - 11 Feb 2023
Cited by 1 | Viewed by 1691
Abstract
Adversarial examples present a severe threat to deep neural networks’ application in safety-critical domains such as autonomous driving. Although there are numerous defensive solutions, they all have some flaws, such as the fact that they can only defend against adversarial attacks within a limited range of adversarial intensities. Therefore, there is a need for a detection method that can distinguish the adversarial intensity in a fine-grained manner, so that subsequent tasks can apply different defense processing against perturbations of various intensities. Based on the fact that adversarial attack samples of different intensities differ significantly in the high-frequency region, this paper proposes a method that amplifies the high-frequency component of the image and inputs it into a deep neural network based on the residual block structure. To the best of our knowledge, the proposed method is the first to classify adversarial intensities at a fine-grained level, thus providing an attack detection component for a general AI firewall. Experimental results show that our proposed method not only achieves advanced performance in AutoAttack detection by perturbation intensity classification, but can also be applied effectively to detect examples of unseen adversarial attack methods. Full article
(This article belongs to the Topic Machine and Deep Learning)
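
A minimal sketch of the high-frequency amplification step described above follows, using a radial mask in the 2D FFT domain; the cutoff and gain are illustrative parameters, not the paper's.

```python
# Illustrative amplification of an image's high-frequency component.
import numpy as np

def amplify_high_freq(img, cutoff=0.25, gain=4.0):
    """Boost frequencies beyond `cutoff` (fraction of Nyquist) by `gain`."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-h // 2:h // 2, -w // 2:w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    F = np.where(radius > cutoff, F * gain, F)       # amplify high frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

img = np.random.rand(224, 224)        # stand-in grayscale image
print(amplify_high_freq(img).shape)   # (224, 224), ready for the classifier
```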

26 pages, 9501 KiB  
Article
A Score-Based Approach for Training Schrödinger Bridges for Data Modelling
by Ludwig Winkler, Cesar Ojeda and Manfred Opper
Entropy 2023, 25(2), 316; https://doi.org/10.3390/e25020316 - 8 Feb 2023
Cited by 3 | Viewed by 2849
Abstract
A Schrödinger bridge is a stochastic process connecting two given probability distributions over time. It has recently been applied as an approach for generative data modelling. The computational training of such bridges requires the repeated estimation of the drift function for a time-reversed stochastic process using samples generated by the corresponding forward process. We introduce a modified score-function-based method for computing such reverse drifts, which can be efficiently implemented by a feed-forward neural network. We applied our approach to artificial datasets with increasing complexity. Finally, we evaluated its performance on genetic data, where Schrödinger bridges can be used to model the time evolution of single-cell RNA measurements. Full article
(This article belongs to the Topic Machine and Deep Learning)
(This article belongs to the Section Information Theory, Probability and Statistics)

17 pages, 343 KiB  
Article
A Reasonable Effectiveness of Features in Modeling Visual Perception of User Interfaces
by Maxim Bakaev, Sebastian Heil and Martin Gaedke
Big Data Cogn. Comput. 2023, 7(1), 30; https://doi.org/10.3390/bdcc7010030 - 8 Feb 2023
Viewed by 1774
Abstract
Training data for user behavior models that predict subjective dimensions of visual perception are often too scarce for deep learning methods to be applicable. With typical datasets in HCI limited to thousands or even hundreds of records, feature-based approaches are still widely used in the visual analysis of graphical user interfaces (UIs). In our paper, we benchmarked the predictive accuracy of two types of neural network (NN) models and explored the effects of the number of features and the dataset volume. To this end, we used two datasets that comprised over 4000 webpage screenshots, assessed by 233 subjects on the subjective dimensions of Complexity, Aesthetics, and Orderliness. With the experimental data, we constructed and trained 1908 models. The feature-based NNs demonstrated 16.2% lower mean squared error (MSE) than the convolutional NNs (a modified GoogLeNet architecture); however, the CNNs’ accuracy improved with larger dataset volumes, whereas the feature-based NNs’ did not. Provided that the effect of more data on the models’ error improvement is linear, the CNNs should therefore become superior at dataset sizes over 3000 UIs. Unexpectedly, adding more features to the NN models caused the MSE to increase by 1.23%; although the difference was not significant, this confirms the importance of careful feature engineering. Full article
(This article belongs to the Topic Machine and Deep Learning)

15 pages, 7944 KiB  
Article
Gradient Agreement Hinders the Memorization of Noisy Labels
by Shaotian Yan, Xiang Tian, Rongxin Jiang and Yaowu Chen
Appl. Sci. 2023, 13(3), 1823; https://doi.org/10.3390/app13031823 - 31 Jan 2023
Viewed by 1369
Abstract
The performance of deep neural networks (DNNs) critically relies on high-quality annotations, and training DNNs with noisy labels remains challenging owing to their incredible capacity to memorize the entire training set. In this work, we use two synchronously trained networks to reveal that noisy labels may result in more divergent gradients when updating the parameters. To overcome this, we propose a novel co-training framework named gradient agreement learning (GAL). By dynamically evaluating the gradient agreement coefficient of every pair of parameters from two identical DNNs to determine whether to update them during training, GAL can effectively hinder the memorization of noisy labels. Furthermore, we utilize the pseudo labels produced by the two DNNs as the supervision for training another network, thereby gaining further improvement by correcting some noisy labels while overcoming confirmation bias. Extensive experiments on various benchmark datasets demonstrate the superiority of the proposed GAL. Full article
(This article belongs to the Topic Machine and Deep Learning)

27 pages, 2357 KiB  
Article
Technical Study of Deep Learning in Cloud Computing for Accurate Workload Prediction
by Zaakki Ahamed, Maher Khemakhem, Fathy Eassa, Fawaz Alsolami and Abdullah S. Al-Malaise Al-Ghamdi
Electronics 2023, 12(3), 650; https://doi.org/10.3390/electronics12030650 - 28 Jan 2023
Cited by 8 | Viewed by 2820
Abstract
Proactive resource management in cloud services not only maximizes cost effectiveness but also enables issues such as Service Level Agreement (SLA) violations and the provisioning of resources to be overcome. Workload prediction using Deep Learning (DL) is a popular method of inferring the complicated multidimensional data of cloud environments to meet this requirement. The overall quality of the model depends on the quality of the data as much as on the architecture; therefore, the data used to train the model must be of good quality. However, existing works in this domain have either used a single data source or have not taken into account the importance of uniformity for unbiased and accurate analysis, and the efficacy of DL models suffers as a result. In this paper, we provide a technical analysis of DL models such as Recurrent Neural Networks (RNN), Multilayer Perceptron (MLP), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN) that exploit the time series characteristics of real-world workloads from the Parallel Workloads Archive, stored in the Standard Workload Format (SWF), with the aim of conducting an unbiased analysis. The robustness of these models is evaluated using the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) metrics. The findings highlight that the LSTM model exhibits the best performance compared to the other models. Additionally, to the best of our knowledge, insights into DL for workload prediction in cloud computing environments are insufficient in the literature. To address these challenges, we provide a comprehensive background on resource management and load prediction using DL, and then break down the models, error metrics, and data sources across different bodies of work. Full article
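To make the forecasting setup concrete, here is a minimal sketch of one of the compared model families: a small LSTM trained on sliding windows of a univariate workload series and scored with MAE and RMSE. The synthetic series and hyperparameters are illustrative, not the SWF traces used in the paper.

```python
import numpy as np
import torch
import torch.nn as nn

def windows(series, w):
    # turn a 1-D series into (window, next-value) training pairs
    X = np.stack([series[i:i + w] for i in range(len(series) - w)])
    y = series[w:]
    return torch.tensor(X, dtype=torch.float32).unsqueeze(-1), \
           torch.tensor(y, dtype=torch.float32)

series = np.sin(np.linspace(0, 50, 500)) + 0.1 * np.random.randn(500)  # stand-in workload
X, y = windows(series, w=24)

class Forecaster(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(1, 32, batch_first=True)
        self.head = nn.Linear(32, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1]).squeeze(-1)   # predict the next step

model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    nn.functional.mse_loss(model(X), y).backward()
    opt.step()

with torch.no_grad():
    err = model(X) - y
    print("MAE:", err.abs().mean().item(), "RMSE:", err.pow(2).mean().sqrt().item())
```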
(This article belongs to the Topic Machine and Deep Learning)

13 pages, 867 KiB  
Article
Maximum Entropy Exploration in Contextual Bandits with Neural Networks and Energy Based Models
by Adam Elwood, Marco Leonardi, Ashraf Mohamed and Alessandro Rozza
Entropy 2023, 25(2), 188; https://doi.org/10.3390/e25020188 - 18 Jan 2023
Cited by 1 | Viewed by 1841
Abstract
Contextual bandits can solve a huge range of real-world problems. However, current popular algorithms for solving them either rely on linear models or on unreliable uncertainty estimation in non-linear models, both of which are required to deal with the exploration–exploitation trade-off. Inspired by theories of human cognition, we introduce novel techniques that use maximum entropy exploration, relying on neural networks to find optimal policies in settings with both continuous and discrete action spaces. We present two classes of models: one with neural networks as reward estimators, and the other with energy-based models, which model the probability of obtaining an optimal reward given an action. We evaluate the performance of these models in static and dynamic contextual bandit simulation environments. We show that both techniques outperform standard baseline algorithms such as NN HMC, NN Discrete, Upper Confidence Bound, and Thompson Sampling, with the energy-based models achieving the best overall performance. This provides practitioners with new techniques that perform well in static and dynamic settings, and that are particularly well suited to non-linear scenarios with continuous action spaces. Full article
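A minimal sketch of maximum entropy exploration for the discrete-action case, under assumptions: actions are sampled from a softmax (Boltzmann) distribution over a neural reward estimator’s scores, so a higher temperature yields more exploration. The network and dimensions are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn

n_actions, ctx_dim, temperature = 4, 8, 0.5
# scores a (context, one-hot action) pair with an estimated reward
reward_net = nn.Sequential(nn.Linear(ctx_dim + n_actions, 32), nn.ReLU(), nn.Linear(32, 1))

def select_action(context):
    ctx = torch.tensor(context, dtype=torch.float32)
    scores = torch.cat([
        reward_net(torch.cat([ctx, torch.eye(n_actions)[a]])) for a in range(n_actions)
    ])
    probs = torch.softmax(scores / temperature, dim=0)   # max-entropy policy
    return torch.multinomial(probs, 1).item()

print(select_action(np.random.randn(ctx_dim)))
```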
(This article belongs to the Topic Machine and Deep Learning)

18 pages, 3117 KiB  
Article
Long-Range Dependence Involutional Network for Logo Detection
by Xingzhuo Li, Sujuan Hou, Baisong Zhang, Jing Wang, Weikuan Jia and Yuanjie Zheng
Entropy 2023, 25(1), 174; https://doi.org/10.3390/e25010174 - 15 Jan 2023
Cited by 7 | Viewed by 2504
Abstract
Logo detection is one of the crucial branches in computer vision due to various real-world applications, such as automatic logo detection and recognition, intelligent transportation, and trademark infringement detection. Compared with traditional handcrafted-feature-based methods, deep learning-based convolutional neural networks (CNNs) can learn both low-level and high-level image features. Recent decades have witnessed the great feature representation capabilities of deep CNNs and their variants, which have been very good at discovering intricate structures in high-dimensional data and are thereby applicable to many domains, including logo detection. However, logo detection remains challenging, as existing detection methods cannot handle well the problems of multiple scales and large aspect ratios. In this paper, we tackle these challenges by developing a novel long-range dependence involutional network (LDI-Net). Specifically, we designed long-range dependence involution (LD involution), a strategy that combines a new operator with a self-attention mechanism by rethinking the intrinsic principle of convolution, to alleviate the detection difficulties caused by large aspect ratios. We also introduce a multilevel representation neural architecture search (MRNAS) to detect multiscale logo objects by constructing a novel multipath topology. In addition, we implemented an adaptive RoI pooling module (ARM) to improve detection efficiency by addressing the problem of logo deformation. Comprehensive experiments on four benchmark logo datasets demonstrate the effectiveness and efficiency of the proposed approach. Full article
(This article belongs to the Topic Machine and Deep Learning)

24 pages, 1042 KiB  
Article
Constructing Traceability Links between Software Requirements and Source Code Based on Neural Networks
by Peng Dai, Li Yang, Yawen Wang, Dahai Jin and Yunzhan Gong
Mathematics 2023, 11(2), 315; https://doi.org/10.3390/math11020315 - 7 Jan 2023
Cited by 1 | Viewed by 2746
Abstract
Software requirement changes, code changes, software reuse, and testing are important activities in software engineering that involve the traceability links between software requirements and code. Software requirement documents, design documents, code documents, and test case documents are the intermediate products of software development, and the lack of interrelationships between these documents can make the software extremely difficult to change and maintain. Frequent requirement and code changes are inevitable in software development; software reuse, change impact analysis, and testing also require the relationship between software requirements and code. Using these traceability links can improve the efficiency and quality of the related software activities. Existing methods for constructing these links still lack automation and accuracy. To address these problems, we propose embedding software requirements and source code into feature vectors containing their semantic information based on four neural networks (NBOW, RNN, CNN, and self-attention). Accurate traceability links from requirements to code are established by comparing the similarity between these vectors. We developed a prototype tool, RCT, based on this method. The four networks’ performance in constructing links is explored on 18 open-source projects. The experimental results show that the self-attention network performs best, with an average Recall@50 value of 0.687 on the 18 projects, which is higher than the other three neural network models and much higher than previous approaches using information retrieval and machine learning. Full article
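A minimal sketch of the retrieval step described above, with the embedding networks abstracted away: code artifacts are ranked for each requirement by cosine similarity of their vectors, and Recall@50 is computed against known links. All data here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
req_vecs = rng.normal(size=(5, 64))     # stand-ins for requirement embeddings
code_vecs = rng.normal(size=(200, 64))  # stand-ins for code embeddings
true_links = {0: {3, 17}, 1: {42}, 2: {7}, 3: {99, 100}, 4: {150}}

def recall_at_k(k=50):
    sims = req_vecs @ code_vecs.T                      # cosine similarity matrix
    sims /= np.linalg.norm(req_vecs, axis=1, keepdims=True)
    sims /= np.linalg.norm(code_vecs, axis=1)
    hits, total = 0, 0
    for r, linked in true_links.items():
        top_k = set(np.argsort(-sims[r])[:k])          # top-k ranked code artifacts
        hits += len(top_k & linked)
        total += len(linked)
    return hits / total

print("Recall@50:", recall_at_k())
```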
(This article belongs to the Topic Machine and Deep Learning)
(This article belongs to the Section Network Science)

16 pages, 3697 KiB  
Article
FCKDNet: A Feature Condensation Knowledge Distillation Network for Semantic Segmentation
by Wenhao Yuan, Xiaoyan Lu, Rongfen Zhang and Yuhong Liu
Entropy 2023, 25(1), 125; https://doi.org/10.3390/e25010125 - 7 Jan 2023
Cited by 1 | Viewed by 2353
Abstract
As a popular research subject in the field of computer vision, knowledge distillation (KD) is widely used in semantic segmentation (SS). However, under the learning paradigm of the teacher–student model, the poor quality of teacher network feature knowledge still hinders the development of KD technology. In this paper, we investigate the output features of the teacher–student network and propose a feature condensation-based KD network (FCKDNet), which reduces pseudo-knowledge transfer in the teacher–student network. First, combined with the pixel information entropy calculation rule, we design a feature condensation method to separate the foreground feature knowledge from the background noise of the teacher network outputs. Then, the obtained feature condensation matrix is applied to the original outputs of the teacher and student networks to improve the feature representation capability. In addition, after performing feature condensation on the teacher network, we propose a soft feature enhancement method based on the spatial and channel dimensions to improve the dependency of pixels in the feature maps. Finally, we divide the outputs of the teacher network into spatial condensation features and channel condensation features and perform the distillation loss calculation with the student network separately to help the student network converge faster. Extensive experiments on the public datasets Pascal VOC and Cityscapes demonstrate that our proposed method improves the baseline by 3.16% and 2.98% in terms of mAcc, and by 2.03% and 2.30% in terms of mIoU, respectively, and has better segmentation performance and robustness than the mainstream methods. Full article
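A speculative sketch of the entropy-based separation idea, not the authors’ exact formulation: per-pixel entropy of the teacher’s class distribution yields a foreground mask that gates a standard KL distillation loss.

```python
import torch
import torch.nn.functional as F

def condensation_mask(teacher_logits, threshold=0.5):
    # per-pixel entropy of the teacher's class distribution
    p = F.softmax(teacher_logits, dim=1)
    entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=1, keepdim=True)
    entropy = entropy / entropy.amax(dim=(2, 3), keepdim=True)  # normalize to [0, 1]
    return (entropy < threshold).float()   # low entropy -> confident foreground

def distill_loss(student_logits, teacher_logits, T=2.0):
    mask = condensation_mask(teacher_logits)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="none").sum(dim=1, keepdim=True)
    return (kl * mask).sum() / mask.sum().clamp_min(1.0)

t = torch.randn(2, 19, 32, 32)   # teacher logits (19 classes, e.g., Cityscapes)
s = torch.randn(2, 19, 32, 32)   # student logits
print(distill_loss(s, t))
```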
(This article belongs to the Topic Machine and Deep Learning)

10 pages, 962 KiB  
Article
Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation
by Hongliang Fu, Zhihao Zhuang, Yang Wang, Chen Huang and Wenzhuo Duan
Entropy 2023, 25(1), 124; https://doi.org/10.3390/e25010124 - 7 Jan 2023
Cited by 6 | Viewed by 2279
Abstract
To solve the problem of feature distribution discrepancy in cross-corpus speech emotion recognition tasks, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation, which alleviates the impact of this discrepancy on emotion recognition. Existing methods have shortcomings in speech feature representation and cross-corpus feature distribution alignment. The proposed model uses a deep denoising auto-encoder as a shared feature extraction network for multi-task learning, and a fully connected layer and a softmax layer are added before each recognition task as task-specific layers. Subsequently, a subdomain adaptation algorithm for emotion and gender features is added to the shared network to obtain the shared emotion and gender features of the source and target domains, respectively. Multi-task learning effectively enhances the representation ability of the features, while the subdomain adaptation algorithm promotes their transferability and effectively alleviates the impact of feature distribution differences on the emotional features. The average results of six cross-corpus speech emotion recognition experiments show that, compared with other models, the weighted average recall rate is increased by 1.89~10.07%; the experimental results verify the validity of the proposed model. Full article
(This article belongs to the Topic Machine and Deep Learning)

19 pages, 597 KiB  
Article
Survey of Reinforcement-Learning-Based MAC Protocols for Wireless Ad Hoc Networks with a MAC Reference Model
by Zhichao Zheng, Shengming Jiang, Ruoyu Feng, Lige Ge and Chongchong Gu
Entropy 2023, 25(1), 101; https://doi.org/10.3390/e25010101 - 3 Jan 2023
Cited by 15 | Viewed by 3737
Abstract
In this paper, we conduct a survey of the literature on reinforcement learning (RL)-based medium access control (MAC) protocols. As the scale of the wireless ad hoc network (WANET) increases, traditional MAC solutions are becoming obsolete. Dynamic topology, resource allocation, interference management, limited bandwidth and energy constraints are crucial problems that need to be resolved when designing modern WANET architectures. In order for future MAC protocols to overcome the current limitations in frequently changing WANETs, more intelligence needs to be deployed to maintain efficient communications. After introducing some classic RL schemes, we investigate the existing state-of-the-art MAC protocols and related solutions for WANETs according to the MAC reference model, and discuss how each proposed protocol works as well as the challenging issues concerning the related MAC model components. Finally, this paper discusses future research directions on how RL can be used to enable high-performance MAC protocols. Full article
(This article belongs to the Topic Machine and Deep Learning)

13 pages, 1478 KiB  
Article
Deep Interest Network Based on Knowledge Graph Embedding
by Dehai Zhang, Haoxing Wang, Xiaobo Yang, Yu Ma, Jiashu Liang and Anquan Ren
Appl. Sci. 2023, 13(1), 357; https://doi.org/10.3390/app13010357 - 27 Dec 2022
Cited by 2 | Viewed by 1725
Abstract
Recommendation systems based on knowledge graphs often obtain user preferences through the user’s click matrix. However, the click matrix represents static data and cannot represent the dynamic preferences of users over time. Therefore, we propose DINK, a knowledge graph-based deep interest exploration network, to extract users’ dynamic interests. DINK can be divided into a knowledge graph embedding layer, an interest exploration layer, and a recommendation layer. The embedding layer expands the receptive field of the user’s click sequence through the knowledge graph, the interest exploration layer combines the GRU and the attention mechanism to explore the user’s dynamic interest, and the recommendation layer completes the prediction task. We demonstrate the effectiveness of DINK by conducting extensive experiments on three public datasets. Full article
(This article belongs to the Topic Machine and Deep Learning)

11 pages, 1977 KiB  
Article
A Study on the Prediction of Electrical Energy in Food Storage Using Machine Learning
by Sangoh Kim
Appl. Sci. 2023, 13(1), 346; https://doi.org/10.3390/app13010346 - 27 Dec 2022
Cited by 5 | Viewed by 2568
Abstract
This study discusses methods for improving the sustainability of the freezers used in frozen storage, a well-known long-term food storage method. Freezing preserves the quality of food for a long time, but storing food this way inevitably requires a freezer that consumes a large amount of electricity: to maintain food quality, lower temperatures are required, and therefore more electrical energy must be used. In this study, machine learning was performed using data obtained through a freezer test, and an optimal inference model was obtained from these data. If the inference model is applied to the selection of freezer control parameters, optimal food storage turns out to be possible using less electrical energy. This paper describes a method for obtaining a dataset for machine learning in a deep freezer and the process of performing single-layer perceptron (SLP) and multilayer perceptron (MLP) machine learning with the obtained dataset. In addition, a method for finding the optimal efficiency is presented by comparing the performance of the inference models obtained with each method. The application of such a development method can reduce electrical energy consumption in the food manufacturing equipment industry, and accordingly it will be possible to achieve carbon emission reductions. Full article
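A minimal sketch comparing the two model families named above: a single-layer perceptron (SLP) and a small MLP fitted to synthetic freezer control parameters and energy readings; the column meanings are invented for illustration.

```python
import torch
import torch.nn as nn

X = torch.randn(256, 3)   # stand-ins: e.g., set temperature, ambient temp, load
y = torch.randn(256, 1)   # stand-in: measured electrical energy

slp = nn.Linear(3, 1)                                            # single-layer perceptron
mlp = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))  # multilayer perceptron

for name, model in [("SLP", slp), ("MLP", mlp)]:
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
    print(name, "final MSE:", loss.item())
```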
(This article belongs to the Topic Machine and Deep Learning)

12 pages, 2635 KiB  
Brief Report
A Machine Learning Approach for the Forecasting of Computing Resource Requirements in Integrated Circuit Simulation
by Yue Wu, Hua Chen, Min Zhou and Faxin Yu
Electronics 2023, 12(1), 95; https://doi.org/10.3390/electronics12010095 - 26 Dec 2022
Cited by 1 | Viewed by 1625
Abstract
For the iterative development of the chip, ensuring that the simulation is completed in the shortest time is critical. To meet this demand, the common practice is to reduce simulation time by providing more computing resources. However, this acceleration method has an upper limit. After reaching the upper limit, providing more CPUs can no longer shorten the simulation time, but will instead waste a lot of computing resources. Unfortunately, the recommended values of the existing commercial tools are often higher than this upper limit. To better match this limit, a machine learning optimization algorithm trained with a custom loss function is proposed. Experimental results demonstrate that the proposed algorithm is superior to commercial tools in terms of both accuracy and stability. In addition, the simulations using the resources predicted by the proposed model maintain the same simulation completion time while reducing core hour consumption by approximately 30%. Full article
(This article belongs to the Topic Machine and Deep Learning)

20 pages, 13022 KiB  
Article
Citrus Tree Crown Segmentation of Orchard Spraying Robot Based on RGB-D Image and Improved Mask R-CNN
by Peichao Cong, Jiachao Zhou, Shanda Li, Kunfeng Lv and Hao Feng
Appl. Sci. 2023, 13(1), 164; https://doi.org/10.3390/app13010164 - 23 Dec 2022
Cited by 9 | Viewed by 2395
Abstract
Orchard spraying robots must visually obtain citrus tree crown growth information to meet variable, growth-stage-based spraying requirements. However, the complex environments and growth characteristics of fruit trees affect the accuracy of crown segmentation. Therefore, we propose a feature-map-based squeeze-and-excitation UNet++ (MSEU) region-based convolutional neural network (R-CNN) citrus tree crown segmentation method that takes in red–green–blue-depth (RGB-D) images that are pixel-aligned and visual distance-adjusted to eliminate noise. Our MSEU R-CNN achieves accurate crown segmentation using squeeze-and-excitation (SE) and UNet++. To fully fuse the feature map information, the SE block correlates image features and recalibrates their channel weights, and the UNet++ semantic segmentation branch replaces the original mask structure to maximize the interconnectivity between feature layers, achieving a near-real-time detection speed of 5 fps. Its bounding box (bbox) and segmentation (seg) AP50 scores are 96.6 and 96.2%, respectively, and the bbox average recall and F1-score are 73.0 and 69.4%, which are 3.4, 2.4, 4.9, and 3.5% higher than the original model, respectively. The MSEU R-CNN also provides better seg accuracy and speed than the previous-best Mask R-CNN and the box-supervised instance segmentation (BoxInst) and conditional convolution instance segmentation (CondInst) frameworks. These results provide the means to accurately employ autonomous spraying robots. Full article
(This article belongs to the Topic Machine and Deep Learning)
(This article belongs to the Section Agricultural Science and Technology)

14 pages, 483 KiB  
Article
An Efficient Hidden Markov Model with Periodic Recurrent Neural Network Observer for Music Beat Tracking
by Guangxiao Song and Zhijie Wang
Electronics 2022, 11(24), 4186; https://doi.org/10.3390/electronics11244186 - 14 Dec 2022
Cited by 7 | Viewed by 1870
Abstract
In music information retrieval (MIR), beat tracking is one of the most fundamental tasks. To obtain this critical component from rhythmic music signals, a previous beat tracking system combining a hidden Markov model (HMM) with a recurrent neural network (RNN) observer was developed. Although the frequency of a music beat is quite stable, existing HMM-based methods do not take this feature into account, so most of their hidden states are redundant, which is a disadvantage for time efficiency. In this paper, we propose an efficient HMM that uses fewer hidden states by exploiting the frequency content of the neural network’s observation via the Fourier transform, which greatly reduces the computational complexity. Observers used in previous works, such as the bi-directional recurrent neural network (Bi-RNN) and the temporal convolutional network (TCN), cannot perceive the frequency of the music beat. To obtain more reliable frequencies from music, a periodic recurrent neural network (PRNN) based on an attention mechanism is proposed as well, and is used as the observer in the HMM. Experimental results on open-source music datasets, such as GTZAN, Hainsworth, SMC, and Ballroom, show that our efficient HMM with the PRNN is competitive with state-of-the-art methods and has lower computational cost. Full article
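A minimal sketch of the frequency intuition only, not the paper’s HMM or PRNN: the dominant beat frequency of an onset-strength envelope is read off the FFT, the kind of information that lets redundant tempo hypotheses (hidden states) be pruned. The envelope here is synthetic.

```python
import numpy as np

fps = 100                        # frames per second of the onset envelope
t = np.arange(0, 30, 1 / fps)    # 30 s of (synthetic) onset strength
envelope = np.clip(np.sin(2 * np.pi * 2.0 * t), 0, None)  # 2 Hz = 120 BPM pulse

spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(len(envelope), d=1 / fps)
valid = (freqs > 0.5) & (freqs < 5.0)        # plausible beat range: 30-300 BPM
beat_hz = freqs[valid][np.argmax(spectrum[valid])]
print("Estimated tempo:", 60 * beat_hz, "BPM")
```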
(This article belongs to the Topic Machine and Deep Learning)

21 pages, 5165 KiB  
Article
Remaining Useful Life Prediction Using Dual-Channel LSTM with Time Feature and Its Difference
by Cheng Peng, Jiaqi Wu, Qilong Wang, Weihua Gui and Zhaohui Tang
Entropy 2022, 24(12), 1818; https://doi.org/10.3390/e24121818 - 13 Dec 2022
Cited by 10 | Viewed by 2445
Abstract
At present, research on predicting the remaining useful life (RUL) of machinery mainly focuses on multi-sensor feature extraction, with the extracted features then used to predict the RUL. In complex operations and abnormal environments, the impact of noise may result in increased model complexity and decreased accuracy of RUL predictions. At the same time, how to exploit the temporal characteristics of the sensors is also a problem. To overcome these issues, this paper proposes a dual-channel long short-term memory (LSTM) neural network model. Compared with existing methods, the advantage of this method is that it adaptively selects the time feature, performs first-order processing on the time feature values, and uses LSTM to extract both the time feature and the first-order time feature information. As the RUL curve predicted by the neural network is zigzag, we designed a momentum-smoothing module to smooth the predicted RUL curve and improve the prediction accuracy. Experimental verification on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset proves the effectiveness and stability of the proposed method. Full article
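A minimal sketch of momentum smoothing as described, assuming a simple exponential form: each smoothed value carries momentum from the previous estimate and is corrected by the new prediction, flattening the zigzag RUL curve.

```python
import numpy as np

def momentum_smooth(rul_pred, beta=0.9):
    smoothed = np.empty_like(rul_pred, dtype=float)
    smoothed[0] = rul_pred[0]
    for i in range(1, len(rul_pred)):
        # carry momentum from the previous estimate, correct with the new one
        smoothed[i] = beta * smoothed[i - 1] + (1 - beta) * rul_pred[i]
    return smoothed

noisy = np.linspace(120, 0, 200) + np.random.randn(200) * 8  # zigzag RUL curve
print(momentum_smooth(noisy)[:5])
```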
(This article belongs to the Topic Machine and Deep Learning)

13 pages, 543 KiB  
Article
Convolution Based Graph Representation Learning from the Perspective of High Order Node Similarities
by Xing Li, Qingsong Li, Wei Wei and Zhiming Zheng
Mathematics 2022, 10(23), 4586; https://doi.org/10.3390/math10234586 - 3 Dec 2022
Viewed by 1353
Abstract
Nowadays, graph representation learning methods, in particular graph neural network methods, have attracted great attention and performed well in many downstream tasks. However, most graph neural network methods have a single perspective, since they start from the edges (or the adjacency matrix) of a graph and thereby ignore its mesoscopic structure (high-order local structure). In this paper, we introduce HS-GCN (High-order Node Similarity Graph Convolutional Network), which can mine the potential structural features of graphs from different perspectives by combining multiple high-order node similarity methods. We analyze HS-GCN theoretically and show that it is a generalization of the convolution-based graph neural network methods from different normalization perspectives. A series of experiments show that, by combining high-order node similarities, our method can capture and utilize the high-order structural information of the graph more effectively, resulting in better results. Full article
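A minimal sketch of the general idea, under assumptions: a toy high-order similarity (shared two-hop neighbors) is mixed into the propagation matrix before the usual symmetric normalization of a graph convolution layer. This is one plausible instantiation, not HS-GCN itself.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)     # toy adjacency matrix
S = (A @ A > 0).astype(float)                 # two-hop (high-order) similarity
np.fill_diagonal(S, 0)

M = A + 0.5 * S + np.eye(len(A))              # mix edges, similarity, self-loops
d = M.sum(axis=1)
M_norm = M / np.sqrt(np.outer(d, d))          # symmetric normalization D^-1/2 M D^-1/2

X = np.random.randn(4, 8)                     # node features
W = np.random.randn(8, 16)                    # layer weights
H = np.maximum(M_norm @ X @ W, 0)             # one ReLU graph convolution layer
print(H.shape)
```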
(This article belongs to the Topic Machine and Deep Learning)

19 pages, 6928 KiB  
Article
Image Fundus Classification System for Diabetic Retinopathy Stage Detection Using Hybrid CNN-DELM
by Dian Candra Rini Novitasari, Fatmawati Fatmawati, Rimuljo Hendradi, Hetty Rohayani, Rinda Nariswari, Arnita Arnita, Moch Irfan Hadi, Rizal Amegia Saputra and Ardhin Primadewi
Big Data Cogn. Comput. 2022, 6(4), 146; https://doi.org/10.3390/bdcc6040146 - 1 Dec 2022
Cited by 7 | Viewed by 2437
Abstract
Diabetic retinopathy (DR) is the leading cause of blindness in working-age adults. The increase in the population diagnosed with DR can be prevented by screening and early treatment of eye damage. This screening process can be conducted by utilizing deep learning techniques. In this study, the detection of DR severity was carried out using the hybrid CNN-DELM method (CDELM). The CNN architectures used were ResNet-18, ResNet-50, ResNet-101, GoogLeNet, and DenseNet, and the learned features were further classified using the DELM algorithm. The comparison of CNN architectures aimed to find the best architecture for fundus image feature extraction. This research also compared the effect of the kernel function on the performance of DELM in fundus image classification. All experiments using CDELM showed maximum results, with an accuracy of 100% on the DRIVE data and the two-class MESSIDOR data, while the best result on the four-class MESSIDOR data reached 98.20%. The advantage of the DELM method compared to the conventional CNN method is that the training time is much shorter: CNN takes an average of 30 min for training, while the CDELM method takes only an average of 2.5 min. Based on accuracy and training time, the CDELM method performed better than the conventional CNN method. Full article
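The speed advantage reported above stems from the extreme learning machine family that DELM belongs to; here is a minimal sketch with stand-in CNN features: a random, fixed hidden layer plus a closed-form least-squares readout, so no gradient descent is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 512))          # stand-in CNN features of fundus images
labels = rng.integers(0, 4, size=300)        # 4 DR severity classes
Y = np.eye(4)[labels]                        # one-hot targets

W_in = rng.normal(size=(512, 1000))          # random input weights, never trained
H = np.tanh(feats @ W_in)                    # hidden activations
beta = np.linalg.pinv(H) @ Y                 # closed-form least-squares output weights

pred = (H @ beta).argmax(axis=1)
print("train accuracy:", (pred == labels).mean())
```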
(This article belongs to the Topic Machine and Deep Learning)

23 pages, 513 KiB  
Article
A Double Penalty Model for Ensemble Learning
by Wenjia Wang and Yi-Hui Zhou
Mathematics 2022, 10(23), 4532; https://doi.org/10.3390/math10234532 - 30 Nov 2022
Viewed by 1409
Abstract
Modern statistical learning techniques often include learning ensembles, for which the combination of multiple separate prediction procedures (ensemble components) can improve prediction accuracy. Although ensemble approaches are widely used, work remains to improve our understanding of their theoretical underpinnings, such as the identifiability and relative convergence rates of the ensemble components. By considering ensemble learning for two components as a double penalty model, we provide a framework to better understand their relative convergence and identifiability. In addition, under appropriate conditions, the framework provides convergence guarantees for a form of residual stacking when iterating between the two components as a cyclic coordinate ascent procedure. We conduct numerical experiments on three synthetic simulations and two real-world datasets to illustrate the performance of our approach and justify our theory. Full article
(This article belongs to the Topic Machine and Deep Learning)

26 pages, 10480 KiB  
Article
Joint Deep Reinforcement Learning and Unsupervised Learning for Channel Selection and Power Control in D2D Networks
by Ming Sun, Yanhui Jin, Shumei Wang and Erzhuang Mei
Entropy 2022, 24(12), 1722; https://doi.org/10.3390/e24121722 - 24 Nov 2022
Cited by 4 | Viewed by 1938
Abstract
Device-to-device (D2D) technology enables direct communication between devices, which can effectively alleviate the problem of insufficient spectrum resources in 5G communication. Since channels are shared among multiple D2D user pairs, serious interference may arise between them. In order to reduce interference, effectively increase network capacity, and improve wireless spectrum utilization, this paper proposes a distributed resource allocation algorithm that combines a deep Q-network (DQN) with an unsupervised learning network. Firstly, a DQN algorithm was constructed to solve the channel allocation in a dynamic, unknown environment in a distributed manner. Then, a deep power control neural network with an unsupervised learning strategy was constructed to output an optimized channel power control scheme that maximizes the spectrum transmit sum-rate through the corresponding constraint processing. As opposed to traditional centralized approaches that require the collection of instantaneous global network information, the algorithm proposed in this paper uses each transmitter as a learning agent that makes channel selection and power control decisions from a small amount of locally collected state information. The simulation results showed that the proposed algorithm was more effective in increasing the convergence speed and maximizing the transmit sum-rate than other traditional centralized and distributed algorithms. Full article
(This article belongs to the Topic Machine and Deep Learning)

16 pages, 622 KiB  
Article
Taxonomy-Aware Prototypical Network for Few-Shot Relation Extraction
by Mengru Wang, Jianming Zheng and Honghui Chen
Mathematics 2022, 10(22), 4378; https://doi.org/10.3390/math10224378 - 21 Nov 2022
Cited by 1 | Viewed by 1526
Abstract
Relation extraction aims to predict the relation triple between the tail entity and head entity in a given text. A large body of work adopts meta-learning to address the few-shot issue faced by relation extraction, where each relation category contains only a few labeled data for demonstration. Despite the promising results achieved by existing meta-learning methods, these methods still struggle to distinguish the subtle differences between relations with similar expressions. We argue that this is largely because these methods cannot capture unbiased and discriminative features in the very-few-shot scenario. To alleviate the above problems, we propose a taxonomy-aware prototype network, which consists of a category-aware calibration module and a task-aware training strategy module. The former implicitly and explicitly calibrates the representation of the prototypes to become sufficiently unbiased and discriminative. The latter balances the weight between easy and hard instances, which enables our proposal to focus on data with more information during the training stage. Finally, comprehensive experiments are conducted on four typical meta tasks, and our proposal outperforms the competitive baselines with an improvement of 3.30% in terms of average accuracy. Full article
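A minimal sketch of the prototypical backbone that such methods build on (the taxonomy-aware calibration itself is omitted): class prototypes are mean support embeddings, and each query takes the label of its nearest prototype. Dimensions and data are illustrative.

```python
import torch

n_way, k_shot, dim = 5, 3, 64
support = torch.randn(n_way, k_shot, dim)    # stand-in relation embeddings
queries = torch.randn(10, dim)

prototypes = support.mean(dim=1)             # one prototype per relation class
dists = torch.cdist(queries, prototypes)     # Euclidean distance to prototypes
pred = dists.argmin(dim=1)                   # nearest-prototype classification
print(pred)
```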
(This article belongs to the Topic Machine and Deep Learning)

20 pages, 2833 KiB  
Article
A Novel Drinking Category Detection Method Based on Wireless Signals and Artificial Neural Network
by Jie Zhang, Zhongmin Wang, Kexin Zhou and Ruohan Bai
Entropy 2022, 24(11), 1700; https://doi.org/10.3390/e24111700 - 21 Nov 2022
Viewed by 1842
Abstract
With the continuous improvement of people’s health awareness and the continuous progress of scientific research, consumers have higher requirements for the quality of their drinks. Compared with high-sugar concentrated juice, consumers are more willing to accept healthy and original Not From Concentrate (NFC) juice and packaged drinking water. At the same time, drinking category detection can be used for vending machine self-checkout. However, current drinking category systems rely on special equipment that requires professional operation, and also on signals that are not widely used, such as radar. This paper introduces a novel drinking category detection method based on wireless signals and an artificial neural network (ANN). Unlike past work, our design relies on WiFi signals that are widely used in daily life. The intuition is that when wireless signals propagate through the detected target, they arrive at the receiver through multiple paths, and different drinking categories result in distinct multipath propagation, which can be leveraged to detect the drinking category. We capture the WiFi signals of the detected drinks using wireless devices; then, we calculate the channel state information (CSI), perform noise removal and feature extraction, and apply an ANN for drinking category detection. Results demonstrate that our design detects the drinking category with high accuracy. Full article
(This article belongs to the Topic Machine and Deep Learning)

16 pages, 1073 KiB  
Article
Adaptive Dynamic Search for Multi-Task Learning
by Eunwoo Kim
Appl. Sci. 2022, 12(22), 11836; https://doi.org/10.3390/app122211836 - 21 Nov 2022
Viewed by 1557
Abstract
Multi-task learning (MTL) is a learning strategy for solving multiple tasks simultaneously while exploiting the commonalities and differences between tasks for improved learning efficiency and prediction performance. Despite its potential, several major challenges remain to be addressed. First of all, task performance degrades when the number of tasks to solve increases or when the tasks are less related. In addition, finding the prediction model for each task is typically laborious and can be suboptimal, and this manual design of architectures further aggravates the problem when multiple tasks must be solved under different computational budgets. In this work, we propose a novel MTL approach to address these issues. The proposed method learns to search a finely modularized base network dynamically and to discover an optimal prediction model for each instance of a task on the fly, while taking the computational costs of the discovered models into account. We evaluate our learning framework on a diverse set of MTL scenarios comprising standard benchmark datasets and achieve significant improvements in performance over existing MTL alternatives in all tested cases. Full article
(This article belongs to the Topic Machine and Deep Learning)

18 pages, 2567 KiB  
Article
Video Action Recognition Using Motion and Multi-View Excitation with Temporal Aggregation
by Yuri Yudhaswana Joefrie and Masaki Aono
Entropy 2022, 24(11), 1663; https://doi.org/10.3390/e24111663 - 15 Nov 2022
Cited by 1 | Viewed by 1906
Abstract
Spatiotemporal and motion feature representations are the key to video action recognition. Typical previous approaches utilize 3D CNNs to cope with both spatial and temporal features, but they suffer from huge computational costs. Other approaches utilize (2+1)D CNNs to learn spatial and temporal features in an efficient way, but they neglect the importance of motion representations. To overcome the problems with previous approaches, we propose a novel block that can capture spatial and temporal features more faithfully and learn motion features more efficiently. This proposed block includes Motion Excitation (ME), Multi-view Excitation (MvE), and Densely Connected Temporal Aggregation (DCTA). The purpose of ME is to encode feature-level frame differences; MvE is designed to enrich spatiotemporal features with multiple view representations adaptively; and DCTA is designed to model long-range temporal dependencies. We inject the proposed building block, which we refer to as the META block (or simply “META”), into 2D ResNet-50. Through extensive experiments, we demonstrate that our proposed architecture outperforms previous CNN-based methods in terms of the “Val Top-1 %” measure on the Something-Something v1 and Jester datasets, while META yields competitive results on the Moments-in-Time Mini dataset. Full article
(This article belongs to the Topic Machine and Deep Learning)

13 pages, 2145 KiB  
Article
Feature-Enhanced Document-Level Relation Extraction in Threat Intelligence with Knowledge Distillation
by Yongfei Li, Yuanbo Guo, Chen Fang, Yongjin Hu, Yingze Liu and Qingli Chen
Electronics 2022, 11(22), 3715; https://doi.org/10.3390/electronics11223715 - 13 Nov 2022
Cited by 1 | Viewed by 1461
Abstract
Relation extraction in the threat intelligence domain plays an important role in mining the internal associations between crucial threat elements and constructing a knowledge graph (KG). This study designed a novel document-level relation extraction model, FEDRE-KD, integrating additional features to take full advantage of the information in documents. The study also introduced a teacher–student model, realizing knowledge distillation, to further improve performance. Additionally, a threat intelligence ontology was constructed to standardize the entities and their relationships. To address the lack of publicly available datasets for threat intelligence, manual annotation was carried out on documents collected from social blogs, vendor bulletins, and hacking forums. After training the model, we constructed a threat intelligence knowledge graph in Neo4j. Experimental results indicate the effectiveness of the additional features and knowledge distillation: compared to the mainstream models SSAN, GAIN, and ATLOP, FEDRE-KD improved the F1-score by 22.07, 20.06, and 22.38, respectively. Full article
(This article belongs to the Topic Machine and Deep Learning)

14 pages, 3710 KiB  
Article
Prediction of Prospecting Target Based on ResNet Convolutional Neural Network
by Le Gao, Yongjie Huang, Xin Zhang, Qiyuan Liu and Zequn Chen
Appl. Sci. 2022, 12(22), 11433; https://doi.org/10.3390/app122211433 - 11 Nov 2022
Cited by 8 | Viewed by 2018
Abstract
In recent years, with the development of geological prospecting from shallow ore to deep and hidden ore, the difficulty of prospecting is increasing day by day, so the application of computer technology and new methods of geological and mineral exploration is receiving more and more attention. The mining and prediction of geological prospecting information based on deep learning have become a frontier field of earth science. However, deep learning still has many problems to solve in the big data mining and prediction of geological prospecting, such as the small number of training samples of geological and mineral images, the difficulty of building deep learning network models, and the universal applicability of deep learning models. In this paper, training samples and convolutional neural network models suitable for geochemical element data mining are constructed to solve the above problems, and the model is successfully applied to the prediction of gold, silver, lead and zinc polymetallic metallogenic areas in South China. Taking the Pangxidong research area in the west of Guangdong Province as an example, this paper carries out prospecting target prediction based on original data from a 1:50,000 stream sediment survey. Firstly, a support vector machine (SVM) model and statistical methods were used to determine the ore-related geochemical element assemblage. Secondly, the experimental geochemical element data were augmented and a dataset was established. Finally, a ResNet-50 neural network model was used for data training and prediction. The experimental results show that the areas numbered 9, 29, 38, 40, 95, 111, 114, 124, and 144 have great metallogenic potential, and this method would be a promising tool for metallogenic prediction. Applying the ResNet-50 neural network to metallogenic prediction can provide a new idea for the future exploration of mineral resources. In order to verify the generality of the research method in this paper, we conducted experimental tests on the geochemical dataset of area B, another deposit research area in South China. The results show that 100% of the prediction area obtained using the proposed method covers the known ore deposit area. This model also provides method support for further delineating the prospecting target areas in study area B. Full article
(This article belongs to the Topic Machine and Deep Learning)

18 pages, 6014 KiB  
Case Report
Comparative Study of Mortality Rate Prediction Using Data-Driven Recurrent Neural Networks and the Lee–Carter Model
by Yuan Chen and Abdul Q. M. Khaliq
Big Data Cogn. Comput. 2022, 6(4), 134; https://doi.org/10.3390/bdcc6040134 - 10 Nov 2022
Cited by 5 | Viewed by 2796
Abstract
The Lee–Carter model can be considered one of the most important stochastic mortality prediction models. With the recent developments in machine learning and deep learning, many studies have applied deep learning approaches to time series mortality rate prediction, but most of them focus only on a comparison between the Long Short-Term Memory model and traditional models. In this study, three different recurrent neural networks, Long Short-Term Memory, Bidirectional Long Short-Term Memory, and the Gated Recurrent Unit, are proposed for the task of mortality rate prediction. Unlike the standard country-level mortality rate comparison, this study compares the three deep learning models and the classic Lee–Carter model on nine divisions’ yearly mortality data by gender from 1966 to 2015 in the United States. In out-of-sample testing, we found that the Gated Recurrent Unit model showed better average MAE and RMSE values than the Lee–Carter model on 72.2% (13/18) and 66.7% (12/18) of the datasets, respectively, while the same measures for the Long Short-Term Memory and Bidirectional Long Short-Term Memory models are 50%/38.9% (MAE/RMSE) and 61.1%/61.1% (MAE/RMSE), respectively. If we consider forecasting accuracy, computing expense, and interpretability together, the Lee–Carter model with ARIMA exhibits the best overall performance, but the recurrent neural networks could also be good candidates for mortality forecasting for divisions of the United States. Full article
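For readers unfamiliar with the baseline, a minimal sketch of the classic Lee–Carter fit: the centered log-mortality matrix is factorized by SVD into an age pattern b_x and a mortality index k_t, and k_t is extrapolated (a naive drift stands in for ARIMA here). The data are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
log_m = rng.normal(-4, 0.5, size=(20, 50))   # stand-in: 20 ages x 50 years

a_x = log_m.mean(axis=1)                     # average age-specific log-mortality
U, s, Vt = np.linalg.svd(log_m - a_x[:, None], full_matrices=False)
b_x = U[:, 0] / U[:, 0].sum()                # normalized age sensitivities
k_t = s[0] * Vt[0] * U[:, 0].sum()           # mortality index over time

drift = (k_t[-1] - k_t[0]) / (len(k_t) - 1)  # random-walk-with-drift forecast
k_next = k_t[-1] + drift
forecast_log_m = a_x + b_x * k_next          # next-year log-mortality by age
print(forecast_log_m[:5])
```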
(This article belongs to the Topic Machine and Deep Learning)

14 pages, 2137 KiB  
Article
A Transfer Learning for Line-Based Portrait Sketch
by Hyungbum Kim, Junyoung Oh and Heekyung Yang
Mathematics 2022, 10(20), 3869; https://doi.org/10.3390/math10203869 - 18 Oct 2022
Cited by 2 | Viewed by 3178
Abstract
This paper presents a transfer learning-based framework that produces line-based portrait sketch images from portraits. The proposed framework produces sketch images using a GAN architecture, which is trained on a pseudo-sketch image dataset. The pseudo-sketch image dataset is constructed from a single artist-created portrait sketch using a style transfer model with a series of postprocessing schemes. The proposed framework successfully produces portrait sketch images for portraits of various poses, expressions and illuminations. The merit of the proposed model is demonstrated by comparing its results with those of existing works. Full article
(This article belongs to the Topic Machine and Deep Learning)

20 pages, 8722 KiB  
Article
A GAN-Based Face Rotation for Artistic Portraits
by Handong Kim, Junho Kim and Heekyung Yang
Mathematics 2022, 10(20), 3860; https://doi.org/10.3390/math10203860 - 18 Oct 2022
Cited by 2 | Viewed by 5418
Abstract
We present a GAN-based model that rotates the faces in artistic portraits to various angles. We build a dataset of artistic portraits for training our GAN-based model by applying a 3D face model to the artistic portraits. We also devise proper loss functions to preserve the styles of the artistic portraits as well as to rotate the faces in the portraits to the proper angles. These approaches enable us to construct a GAN-based face rotation model. We apply this model to various artistic portraits, including photorealistic oil-paint portraits, watercolor portraits, well-known portrait artworks and banknote portraits, and produce convincing rotated faces. Finally, we show that our model produces improved results compared with existing models by evaluating the similarity and the angles of the rotated faces through evaluation schemes including FID estimation, recognition ratio estimation, pose estimation and a user study. Full article
(This article belongs to the Topic Machine and Deep Learning)

14 pages, 3464 KiB  
Article
Hydrogen Storage Prediction in Dibenzyltoluene as Liquid Organic Hydrogen Carrier Empowered with Weighted Federated Machine Learning
by Ahsan Ali, Muhammad Adnan Khan and Hoimyung Choi
Mathematics 2022, 10(20), 3846; https://doi.org/10.3390/math10203846 - 17 Oct 2022
Cited by 7 | Viewed by 2217
Abstract
Hydrogen stored in liquid organic hydrogen carriers (LOHCs) has the advantage of a safe and convenient hydrogen storage system. Dibenzyltoluene (DBT), due to its low flammability, liquid nature and high hydrogen storage capacity, is an efficient LOHC system. It is imperative to identify the optimal reaction conditions for achieving the theoretical hydrogen storage density. Hence, a Hydrogen Storage Prediction System empowered with Weighted Federated Machine Learning (HSPS-WFML) is proposed in this study. The dataset was divided into three classes, i.e., low, medium and high, and the performance of the proposed HSPS-WFML was investigated. The accuracy of the medium class (99.90%) is higher than that of the other classes, while the accuracy of the low and high classes is 96.50% and 96.40%, respectively. Moreover, the overall accuracy and miss rate of the proposed HSPS-WFML are 96.40% and 3.60%, respectively. Our proposed model was compared with existing studies related to hydrogen storage prediction, and its accuracy is in agreement with these studies. Therefore, the proposed HSPS-WFML is an efficient model for hydrogen storage prediction. Full article
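A minimal sketch of the weighted aggregation principle behind federated schemes like WFML, under assumptions: each client’s weights are averaged in proportion to its local data size. Shapes and counts are illustrative.

```python
import numpy as np

# per-client model weights (same shapes) and local dataset sizes
clients = [
    {"w": [np.random.randn(8, 4), np.random.randn(4)], "n": 120},
    {"w": [np.random.randn(8, 4), np.random.randn(4)], "n": 80},
    {"w": [np.random.randn(8, 4), np.random.randn(4)], "n": 200},
]

total = sum(c["n"] for c in clients)
global_w = [
    sum(c["n"] / total * c["w"][i] for c in clients)   # data-size-weighted mean
    for i in range(len(clients[0]["w"]))
]
print([w.shape for w in global_w])
```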
(This article belongs to the Topic Machine and Deep Learning)

29 pages, 3781 KiB  
Article
Reservoir Prediction Model via the Fusion of Optimized Long Short-Term Memory Network (LSTM) and Bidirectional Random Vector Functional Link (RVFL)
by Guodong Li, Yongke Pan and Pu Lan
Electronics 2022, 11(20), 3343; https://doi.org/10.3390/electronics11203343 - 17 Oct 2022
Viewed by 1337
Abstract
An accurate and stable reservoir prediction model is essential for oil location and production. We propose a predictive hybrid model, ILSTM-BRVFL, based on an improved long short-term memory network (IAOS-LSTM) and a bidirectional random vector functional link (Bidirectional-RVFL) for this problem. Firstly, the Atomic Orbital Search algorithm (AOS) is used to perform collective optimization of the parameters, improving the stability and accuracy of the LSTM model for high-dimensional feature extraction. Since there is still room to improve the optimization capability of the AOS, an improvement scheme to further enhance this capability is also proposed. Then, the LSTM-extracted high-dimensional features are fed into a random vector functional link (RVFL), which is modified into a bidirectional RVFL to improve the prediction of the high-dimensional features. The experimental results show that the proposed ILSTM-BRVFL (IAOS) model achieves an average prediction accuracy of 95.28%. The model’s accuracy, recall values, and F1 values also show good performance, and the prediction ability achieved the expected results. The comparative analysis and the degree of improvement in the model results show that the high-dimensional extraction of the input data by the LSTM contributes most to the improvement in prediction accuracy, followed by the introduction of the improved AOS for parameter search in both the LSTM and the RVFL. Full article
(This article belongs to the Topic Machine and Deep Learning)

16 pages, 1297 KiB  
Article
Enhanced Sample Self-Revised Network for Cross-Dataset Facial Expression Recognition
by Xiaolin Xu, Yuan Zong, Cheng Lu and Xingxun Jiang
Entropy 2022, 24(10), 1475; https://doi.org/10.3390/e24101475 - 17 Oct 2022
Cited by 1 | Viewed by 1743
Abstract
Recently, cross-dataset facial expression recognition (FER) has received wide attention from researchers. Thanks to the emergence of large-scale facial expression datasets, cross-dataset FER has made great progress. Nevertheless, low image quality, subjective annotations, severe occlusions, and rare subject identities in large-scale datasets can lead to outlier samples in facial expression datasets. These outlier samples are usually far from the clustering center of the dataset in the feature space, resulting in considerable differences in feature distribution that severely restrict the performance of most cross-dataset facial expression recognition methods. To eliminate the influence of outlier samples on cross-dataset FER, we propose the enhanced sample self-revised network (ESSRN) with a novel outlier-handling mechanism, which first seeks out outlier samples and then suppresses them when dealing with cross-dataset FER. To evaluate the proposed ESSRN, we conduct extensive cross-dataset experiments across the RAF-DB, JAFFE, CK+, and FER2013 datasets. Experimental results demonstrate that the proposed outlier-handling mechanism effectively reduces the negative impact of outlier samples on cross-dataset FER and that our ESSRN outperforms classic deep unsupervised domain adaptation (UDA) methods and recent state-of-the-art cross-dataset FER results. Full article
(This article belongs to the Topic Machine and Deep Learning)
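
The core intuition — suppressing samples that sit far from their class centroid in feature space — can be sketched as follows. The actual ESSRN revision mechanism is learned end-to-end and more elaborate; the synthetic features, cutoff rule and suppression factor here are assumptions:

```python
# Sketch of centroid-distance outlier suppression on deep features.
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 64))   # deep features (synthetic)
labels = rng.integers(0, 7, size=200)    # 7 expression classes

weights = np.ones(len(feats))            # per-sample loss weights
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    center = feats[idx].mean(axis=0)
    dist = np.linalg.norm(feats[idx] - center, axis=1)
    # Down-weight samples beyond mean + 2*std of their class distances.
    cut = dist.mean() + 2 * dist.std()
    weights[idx[dist > cut]] = 0.1       # suppression factor (assumed)
```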

12 pages, 915 KiB  
Article
Lipreading Using Liquid State Machine with STDP-Tuning
by Xuhu Yu, Zhong Wan, Zehao Shi and Lei Wang
Appl. Sci. 2022, 12(20), 10484; https://doi.org/10.3390/app122010484 - 17 Oct 2022
Cited by 2 | Viewed by 2132
Abstract
Lipreading refers to the task of decoding the text content of a speaker based on visual information about the movement of the speaker’s lips. With the development of deep learning in recent years, lipreading has attracted extensive research interest. However, deep learning methods require substantial computing resources, which is not conducive to migrating such systems to edge devices. Inspired by the work of Spiking Neural Networks (SNNs) in recognizing human actions and gestures, we propose a lipreading system based on SNNs. Specifically, we construct the front-end feature extractor of the system using a Liquid State Machine (LSM), while a heuristic algorithm is used to select appropriate parameters for the back-end classifier. On small-scale lipreading datasets, our system achieves good recognition accuracy. Compared to other networks, ours performs better in terms of accuracy and the ratio of learned parameters, and has clear advantages in network complexity and training cost. On the AVLetters dataset, our model achieves a 5% improvement in accuracy over traditional methods and a 90% reduction in parameters over the state of the art. Full article
(This article belongs to the Topic Machine and Deep Learning)
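
A liquid state machine reduces to a fixed random recurrent reservoir of spiking neurons whose activity is read out by a trainable classifier. A toy sketch under that view is given below; the STDP tuning and the heuristic parameter search from the paper are omitted, and all sizes and constants are assumptions:

```python
# Toy liquid state machine: fixed random reservoir of leaky
# integrate-and-fire neurons; spike counts serve as the "liquid state".
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 50                           # reservoir size, time steps
W_in = rng.standard_normal((N, 10)) * 0.5
W_res = rng.standard_normal((N, N)) * 0.1

def liquid_state(inp_spikes):
    """Run the reservoir; return per-neuron spike counts as features."""
    v = np.zeros(N)                      # membrane potentials
    spikes = np.zeros(N)
    counts = np.zeros(N)
    for t in range(T):
        v = 0.9 * v + W_in @ inp_spikes[t] + W_res @ spikes  # leaky update
        spikes = (v > 1.0).astype(float) # threshold crossing -> spike
        v[spikes > 0] = 0.0              # reset fired neurons
        counts += spikes
    return counts

inp = (rng.random((T, 10)) < 0.2).astype(float)  # random input spike train
state = liquid_state(inp)                # feed to any back-end classifier
```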

24 pages, 5704 KiB  
Article
PN-BBN: A Petri Net-Based Bayesian Network for Anomalous Behavior Detection
by Ke Lu, Xianwen Fang and Na Fang
Mathematics 2022, 10(20), 3790; https://doi.org/10.3390/math10203790 - 14 Oct 2022
Cited by 3 | Viewed by 1717
Abstract
Business process anomalous behavior detection reveals unexpected cases from event logs to ensure the trusted operation of information systems. Anomalous behavior is mainly identified through log-to-model alignment analysis or numerical outlier detection. However, both approaches ignore the influence of probability distributions or activity relationships in process activities. Based on this concern, this paper incorporates the behavioral relationships characterized by the process model and the joint probability distribution of nodes related to suspected anomalous behaviors. Moreover, a Petri Net-Based Bayesian Network (PN-BBN) is proposed to detect anomalous behaviors based on the probabilistic inference of behavioral contexts. First, the process model is filtered based on the process structure of the process activities to identify the key regions where the suspected anomalous behaviors are located. Then, the behavioral profile of the activities is used to prune the model and locate the unavoidable paths that trigger these activities. Further, the model is used as the architecture for parameter learning to construct the PN-BBN. Based on this, anomaly scores are inferred from the joint probabilities of activities related to suspected anomalous behaviors, enabling anomaly detection under the constraints of control flow and probability distributions. Finally, PN-BBN is implemented based on the open-source frameworks PM4PY and PGMPY and evaluated with multiple metrics on synthetic and real process data. The experimental results demonstrate that PN-BBN effectively identifies anomalous process behaviors and improves the reliability of information systems. Full article
(This article belongs to the Topic Machine and Deep Learning)
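
As a rough stand-in for this kind of probabilistic scoring, one can estimate transition probabilities between activities from an event log and score a trace by its negative log joint probability. The PN-BBN's Petri-net-derived structure and Bayesian parameter learning are far richer than this sketch; the toy log below is invented for illustration:

```python
# Crude trace-scoring stand-in: first-order transition model over activities;
# higher negative log joint probability = more anomalous.
import math
from collections import Counter, defaultdict

log = [["a", "b", "c", "d"], ["a", "b", "d"], ["a", "c", "b", "d"]] * 30
pairs, starts = Counter(), Counter()
for trace in log:
    starts[trace[0]] += 1
    for u, v in zip(trace, trace[1:]):
        pairs[(u, v)] += 1

out = defaultdict(int)                    # outgoing counts per activity
for (u, _), n in pairs.items():
    out[u] += n

def anomaly_score(trace, eps=1e-6):
    """Negative log joint probability of a trace under the model."""
    p = starts[trace[0]] / sum(starts.values())
    score = -math.log(max(p, eps))
    for u, v in zip(trace, trace[1:]):
        p = pairs[(u, v)] / out[u] if out[u] else 0.0
        score -= math.log(max(p, eps))
    return score

print(anomaly_score(["a", "b", "c", "d"]))  # frequent path -> low score
print(anomaly_score(["a", "d", "c", "b"]))  # unseen path -> high score
```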

17 pages, 4844 KiB  
Article
MobileNetV2 Combined with Fast Spectral Kurtosis Analysis for Bearing Fault Diagnosis
by Tian Xue, Huaiguang Wang and Dinghai Wu
Electronics 2022, 11(19), 3176; https://doi.org/10.3390/electronics11193176 - 3 Oct 2022
Cited by 4 | Viewed by 1880
Abstract
Bearings are important components in mechanical equipment, and their health detection and fault diagnosis are of great significance. To meet the speed and recognition accuracy requirements of bearing fault diagnosis, this paper uses the lightweight MobileNetV2 network combined with fast spectral kurtosis to diagnose bearing faults. On the basis of the original MobileNetV2 network, a progressive classifier is used to compress the feature information layer by layer along the network structure to achieve high-precision and rapid identification and classification. A cross-local connection structure is added to the network to increase the extracted feature information and improve accuracy. At the same time, the original fault signal of the bearing is a one-dimensional vibration signal that contains a large amount of non-Gaussian noise and incidental shock components. To extract fault features more efficiently, this paper uses the fast spectral kurtosis algorithm to process the signal, extract the center frequency of the original signal, and calculate the spectral kurtosis value. The kurtosis map generated by this signal preprocessing is used as the input of the MobileNetV2 network for fault classification. To verify the effectiveness and generality of the proposed method, this paper uses the XJTU-SY bearing fault dataset and the CWRU bearing dataset for experiments. Through data preprocessing methods, such as data expansion for the different fault types in the original datasets, input data that meet the experimental requirements are generated and fault diagnosis experiments are carried out. Comparison with other typical classification networks shows that the proposed method has significant advantages in terms of accuracy, model size and training speed, demonstrating the effectiveness and generality of the proposed network model in the field of fault diagnosis. Full article
(This article belongs to the Topic Machine and Deep Learning)
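
The spectral kurtosis quantity that the fast algorithm approximates can be computed directly from an STFT as SK(f) = E[|X(t,f)|^4] / E[|X(t,f)|^2]^2 − 2, with impulsive fault energy showing up as high-SK frequency bands. A sketch on synthetic data follows; the sampling rate, impact spacing and window length are assumptions:

```python
# STFT-based spectral kurtosis on a synthetic impulsive signal.
import numpy as np
from scipy.signal import stft

fs = 12_000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.3 * np.random.randn(t.size)
x[::1200] += 5.0                        # synthetic periodic impacts

f, _, Z = stft(x, fs=fs, nperseg=256)
mag2 = np.abs(Z) ** 2
sk = (mag2 ** 2).mean(axis=1) / mag2.mean(axis=1) ** 2 - 2
print("max-SK band centre: %.0f Hz" % f[np.argmax(sk)])
```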

19 pages, 5104 KiB  
Article
Automatic Medical Face Mask Detection Based on Cross-Stage Partial Network to Combat COVID-19
by Christine Dewi and Rung-Ching Chen
Big Data Cogn. Comput. 2022, 6(4), 106; https://doi.org/10.3390/bdcc6040106 - 30 Sep 2022
Cited by 11 | Viewed by 2333
Abstract
According to the World Health Organization (WHO), the COVID-19 coronavirus pandemic has resulted in a worldwide public health crisis. One effective method of protection is to use a mask in public places. Recent advances in object detection based on deep learning models have yielded promising results in finding objects in images. Annotating and finding medical face mask objects in real-life images is the aim of this paper. In public places, people can be protected from the transmission of COVID-19 by wearing medical masks made of medical materials. Our work employs Yolo V4 CSP SPP to identify medical masks. Our experiments combine the Face Mask Dataset (FMD) and the Medical Mask Dataset (MMD) into one dataset for this study. The proposed model improves the detection performance of a previous research study on the FMD and MMD datasets from 81% to 99.26%. We show that our proposed Yolo V4 CSP SPP model scheme is an accurate mechanism for identifying medically masked faces. We also provide a comprehensive analysis and detailed description of the benefits of using Cross Stage Partial (CSP) and Spatial Pyramid Pooling (SPP). Furthermore, a comparison between our findings and those of similar works is provided. In terms of accuracy and precision, the suggested detector surpasses earlier works. Full article
(This article belongs to the Topic Machine and Deep Learning)
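
The SPP block the paper builds on concatenates the input with parallel max-pools of different kernel sizes, enlarging the receptive field at negligible cost. A sketch of that block alone is given below (the CSP wiring and the detection head are omitted, and the feature-map size is an assumption):

```python
# Spatial Pyramid Pooling block in the style used by YOLOv4.
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        # stride=1 with matching padding keeps the spatial size unchanged.
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels
        )

    def forward(self, x):
        # Output channels = in_channels * (len(kernels) + 1).
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

feat = torch.randn(1, 512, 13, 13)      # backbone feature map (assumed size)
print(SPP()(feat).shape)                # torch.Size([1, 2048, 13, 13])
```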

14 pages, 1262 KiB  
Article
An Asymmetric Contrastive Loss for Handling Imbalanced Datasets
by Valentino Vito and Lim Yohanes Stefanus
Entropy 2022, 24(9), 1303; https://doi.org/10.3390/e24091303 - 15 Sep 2022
Cited by 4 | Viewed by 2497
Abstract
Contrastive learning is a representation learning method performed by contrasting a sample with other similar samples so that they are brought close together, forming clusters in the feature space. The learning process is typically conducted using a two-stage training architecture, and it utilizes the contrastive loss (CL) for its feature learning. Contrastive learning has been shown to be quite successful in handling imbalanced datasets, in which some classes are overrepresented while others are underrepresented. However, previous studies have not specifically modified CL for imbalanced datasets. In this work, we introduce an asymmetric version of CL, referred to as ACL, to directly address the problem of class imbalance. In addition, we propose the asymmetric focal contrastive loss (AFCL) as a further generalization of both ACL and the focal contrastive loss (FCL). Results on the imbalanced FMNIST and ISIC 2018 datasets show that AFCL is capable of outperforming CL and FCL in terms of both weighted and unweighted classification accuracies. Full article
(This article belongs to the Topic Machine and Deep Learning)
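
A focal modulation of the supervised contrastive loss — up-weighting hard positive pairs via a (1 − p)^γ term — conveys the flavor of FCL-style losses. The exact ACL/AFCL asymmetry introduced in the paper is not reproduced here, and the temperature and gamma values are assumptions:

```python
# Sketch of a focal-modulated supervised contrastive loss.
import torch
import torch.nn.functional as F

def focal_contrastive_loss(z, labels, tau=0.1, gamma=2.0):
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                       # pairwise similarities
    n = z.size(0)
    mask_self = torch.eye(n, dtype=torch.bool)
    logits = sim.masked_fill(mask_self, float("-inf"))
    p = logits.softmax(dim=1)                   # pair probabilities
    pos = ((labels[:, None] == labels[None, :]) & ~mask_self).float()
    focal = (1 - p) ** gamma                    # up-weight hard positives
    return -(focal * p.clamp_min(1e-12).log() * pos).sum() / pos.sum()

z = torch.randn(16, 64, requires_grad=True)    # embeddings (placeholder)
y = torch.randint(0, 2, (16,))                 # binary labels (placeholder)
print(focal_contrastive_loss(z, y))
```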

17 pages, 4251 KiB  
Article
STSM: Spatio-Temporal Shift Module for Efficient Action Recognition
by Zhaoqilin Yang, Gaoyun An and Ruichen Zhang
Mathematics 2022, 10(18), 3290; https://doi.org/10.3390/math10183290 - 10 Sep 2022
Cited by 5 | Viewed by 1902
Abstract
The modeling, computational complexity, and accuracy of spatio-temporal models are the three major foci in the field of video action recognition. Traditional 2D convolution has low computational complexity, but it cannot capture temporal relationships. Although 3D convolution can obtain good performance, it comes with both high computational complexity and a large number of parameters. In this paper, we propose a plug-and-play Spatio-Temporal Shift Module (STSM), which is both an effective and a high-performance module. STSM can be easily inserted into other networks to add or enhance the ability to learn spatio-temporal features, effectively improving performance without increasing the number of parameters or the computational complexity. In particular, when 2D CNNs and STSM are integrated, the new network can learn spatio-temporal features and outperform networks based on 3D convolutions. We revisit the shift operation from the perspective of matrix algebra, i.e., the spatio-temporal shift operation is a convolution operation with a sparse convolution kernel. Furthermore, we extensively evaluate the proposed module on the Kinetics-400 and Something-Something V2 datasets. The experimental results show the effectiveness of the proposed STSM, and the proposed action recognition networks also achieve state-of-the-art results on the two benchmarks. Full article
(This article belongs to the Topic Machine and Deep Learning)
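
A shift module of this kind moves a fraction of the channels along the temporal (and, here, also a spatial) axis, so that a plain 2D convolution afterwards mixes information across frames at zero parameters and zero FLOPs. A sketch in the spirit of STSM follows; the exact channel split is an assumption following TSM conventions, not necessarily the paper's:

```python
# Zero-parameter spatio-temporal shift on a video tensor.
import torch

def spatio_temporal_shift(x, fold_div=8):
    """x: (batch, time, channels, height, width)."""
    b, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                    # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]    # shift backward in time
    out[:, :, 2 * fold:3 * fold, 1:] = x[:, :, 2 * fold:3 * fold, :-1]  # spatial shift
    out[:, :, 3 * fold:] = x[:, :, 3 * fold:]               # remainder untouched
    return out

clip = torch.randn(2, 8, 64, 56, 56)    # (batch, frames, C, H, W), assumed
print(spatio_temporal_shift(clip).shape)
```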

14 pages, 3010 KiB  
Article
Micro-Expression Recognition Using Uncertainty-Aware Magnification-Robust Networks
by Mengting Wei, Yuan Zong, Xingxun Jiang, Cheng Lu and Jiateng Liu
Entropy 2022, 24(9), 1271; https://doi.org/10.3390/e24091271 - 9 Sep 2022
Cited by 2 | Viewed by 1840
Abstract
A micro-expression (ME) is a kind of involuntary facial expression that commonly occurs with subtle intensity. Accurately recognizing MEs, a.k.a. micro-expression recognition (MER), has a number of potential applications, e.g., interrogation and clinical diagnosis. Therefore, the subject has received a high level of attention among researchers in the affective computing and pattern recognition communities. In this paper, we propose a straightforward and effective deep learning method called uncertainty-aware magnification-robust networks (UAMRN) for MER, which attempts to address two key issues in MER: the low intensity of MEs and the imbalance of ME samples. Specifically, to better distinguish subtle ME movements, we reconstruct a new sequence by magnifying the ME intensity. Furthermore, a sparse self-attention (SSA) block is implemented that rectifies standard self-attention with locality-sensitive hashing (LSH), suppressing the artefacts generated during magnification. For the class imbalance problem, on the other hand, we guide the network optimization based on the confidence of the estimation, through which samples from rare classes are allotted greater uncertainty and thus trained more carefully. We conducted experiments on three public ME databases, i.e., CASME II, SAMM and SMIC-HS, the results of which demonstrate improvement compared to recent state-of-the-art methods. Full article
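
The imbalance-handling idea — training rare classes more carefully — can be approximated crudely with per-class loss weights. The paper's confidence-based uncertainty scheme is more refined than this inverse-frequency sketch, and the class counts below are assumptions:

```python
# Crude stand-in for uncertainty-guided training on imbalanced classes:
# rare classes receive larger per-sample loss weights.
import torch
import torch.nn.functional as F

def inverse_frequency_ce(logits, targets, class_counts):
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    weight = counts.sum() / (len(counts) * counts)  # rare class -> big weight
    return F.cross_entropy(logits, targets, weight=weight)

logits = torch.randn(32, 3)                 # model outputs (placeholder)
targets = torch.randint(0, 3, (32,))
print(inverse_frequency_ce(logits, targets, [500, 50, 20]))
```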