MDPI - Publisher of Open Access Journals

28 pages, 7500 KiB

Open Access Article

Lightweight Multi-Head MambaOut with CosTaylorFormer for Hyperspectral Image Classification

by Yi Liu, Yanjun Zhang and Jianhong Zhang

Remote Sens. 2025, 17(11), 1864; https://doi.org/10.3390/rs17111864 - 27 May 2025

Viewed by 372

Abstract

Unmanned aerial vehicles (UAVs) equipped with hyperspectral hardware systems are widely used in urban planning and land classification. However, hyperspectral sensors generate large volumes of data that are rich in both spatial and spectral information, which makes efficient processing on resource-constrained devices challenging. While transformers have been widely adopted for hyperspectral image classification due to their global feature extraction capabilities, their quadratic computational complexity limits their applicability on resource-constrained devices. To address this limitation and enable the real-time processing of hyperspectral data on UAVs, we propose a lightweight multi-head MambaOut with a CosTaylorFormer (LMHMambaOut-CosTaylorFormer). First, a 3D-2D CNN is used to extract shallow spatial and spectral features from hyperspectral images. Following this, one branch employs a linear transformer, CosTaylorFormer, to extract global spectral information. More specifically, CosTaylorFormer uses a cosine function to adjust the weights based on the spectral curve distribution, which is more conducive to establishing long-distance spectral dependencies. Compared with other linearized transformers, the proposed CosTaylorFormer yields a larger improvement in model performance. For the other branch, we propose multi-head MambaOut to extract global spatial features and enhance the network's classification performance. Moreover, a dynamic information fusion strategy is proposed to adaptively fuse spatial and spectral information. The proposed network is validated on four datasets (IP, WHU-Longkou, SA, and PU) and compared with several models, demonstrating superior classification accuracy with only 0.22 M model parameters, thus achieving a better balance between model complexity and accuracy. Full article

(This article belongs to the Special Issue 3D Information Recovery and 2D Image Processing for Remotely Sensed Optical Images (Third Edition))
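The abstract above contrasts quadratic attention with linearized transformers. The following is a minimal sketch of the generic linear-attention idea only, assuming an `elu(x) + 1` feature map; it does not reproduce CosTaylorFormer's cosine or Taylor weighting, which the abstract does not specify. All function names are hypothetical.

```python
import math

def feature_map(x):
    """elu(x) + 1 keeps every feature strictly positive (assumed feature map)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_attention(Q, K, V):
    """Reference form: computes all n * n similarity weights explicitly."""
    out = []
    for q in Q:
        fq = feature_map(q)
        w = [dot(fq, feature_map(k)) for k in K]
        total = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / total
                    for c in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    """Same result, but keys and values are summarized once, so cost grows linearly with n."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * d                       # running sum of phi(k_j)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = feature_map(q)
        norm = dot(fq, z)
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / norm
                    for c in range(dv)])
    return out
```

Both functions return the same values; the linear form simply reorders the matrix products so that no n-by-n weight matrix is ever built.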

20 pages, 2611 KiB

Open Access Article

Focal Cosine-Enhanced EfficientNetB0: A Novel Approach to Classifying Breast Histopathological Images

by Min Liu, Yuzhen Pei, Minghu Wu and Juan Wang

Information 2025, 16(6), 444; https://doi.org/10.3390/info16060444 - 27 May 2025

Cited by 1 | Viewed by 520

Abstract

Early and accurate breast cancer diagnosis is critical in enhancing patient survival rates, with histopathological image analysis serving as a key diagnostic tool. To address challenges in breast histopathology image analysis, including multi-magnification characteristics, insufficient feature extraction in traditional CNNs, and high inter-class similarity coupled with significant intra-class variation among tumor subtypes, this work proposes a focal cosine-enhanced EfficientNetB0 (FCE-EfficientNetB0) classification model. The framework incorporates a multiscale efficient attention mechanism into a multiscale efficient mobile inverted bottleneck conv, where parallel 1D convolutional branches extract features across magnification levels, while the attention mechanism prioritizes clinically relevant patterns. A focal cosine hybrid loss function further optimizes classification by enlarging interclass distances and reducing intraclass variations in the feature space. Experimental results demonstrate state-of-the-art performance, with the model achieving 99.34% accuracy for benign/malignant classification and 95.97% accuracy for eight-subtype classification on the BreakHis dataset, confirming its effectiveness in breast cancer histopathology analysis. Full article

14 pages, 1816 KiB

Open Access Article

Cosine Distance Loss for Open-Set Image Recognition

by Xiaolin Li, Binbin Chen, Jianxiang Li, Shuwu Chen and Shiguo Huang

Electronics 2025, 14(1), 180; https://doi.org/10.3390/electronics14010180 - 4 Jan 2025

Viewed by 966

Abstract

Traditional image classification often misclassifies unknown samples as known classes during testing, degrading recognition accuracy. Open-set image recognition can simultaneously detect known classes (KCs) and unknown classes (UCs), but its recognition performance is still limited by open space risk. Therefore, we introduce a cosine distance loss function (CDLoss), which exploits the orthogonality of one-hot encoding vectors to align known samples with their corresponding one-hot encoding directions. This reduces the overlap between the feature spaces of KCs and UCs, mitigating open space risk. CDLoss was incorporated into both Softmax-based and prototype-learning-based frameworks to evaluate its effectiveness. Experimental results show that CDLoss improves AUROC, OSCR, and accuracy across both frameworks and different datasets. Furthermore, various weight combinations of ARPL and CDLoss were explored, revealing optimal performance with a 1:2 ratio. t-SNE analysis confirms that CDLoss reduces the overlap between the feature spaces of KCs and UCs. These results demonstrate that CDLoss helps mitigate open space risk, enhancing recognition performance in open-set image classification tasks. Full article
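To illustrate the idea of aligning a sample with its one-hot direction, here is a minimal sketch of a cosine distance between a feature vector and the one-hot vector of its class. It assumes the feature dimension equals the number of classes; the function name is hypothetical and the exact CDLoss formulation may differ from this.

```python
import math

def cosine_distance_to_class_axis(features, label):
    """1 - cos(features, e_label), where e_label is the one-hot vector of the class.

    Because e_label has a single 1, the cosine reduces to features[label] / ||features||.
    """
    norm = math.sqrt(sum(v * v for v in features))
    if norm == 0.0:
        return 1.0  # assumption: a zero vector is treated as orthogonal to every axis
    return 1.0 - features[label] / norm
```

The value is 0 when the feature lies exactly on its class axis and 1 when it is orthogonal to it, so minimizing it pulls known samples toward mutually orthogonal directions.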

19 pages, 2634 KiB

Open Access Article

GDSMOTE: A Novel Synthetic Oversampling Method for High-Dimensional Imbalanced Financial Data

by Libin Hu and Yunfeng Zhang

Mathematics 2024, 12(24), 4036; https://doi.org/10.3390/math12244036 - 23 Dec 2024

Cited by 1 | Viewed by 778

Abstract

Synthetic oversampling methods for dealing with imbalanced classification problems have been widely studied. However, current synthetic oversampling methods still perform poorly on high-dimensional imbalanced financial data. The failure of distance measurement in high-dimensional space, error accumulation caused by noise samples, and the reduced recognition accuracy of majority samples caused by the distribution of synthetic samples are the main factors that limit the performance of current methods. Taking these factors into consideration, a novel synthetic oversampling method is proposed, namely the gradient distribution-based synthetic minority oversampling technique (GDSMOTE). Firstly, the concept of gradient contribution is used to assign the minority-class samples to different gradient intervals instead of relying on spatial distance. Secondly, the root sample selection strategy of GDSMOTE avoids the error accumulation caused by noise samples, and a new concept of nearest neighbor is proposed to determine the auxiliary samples. Finally, a safety gradient distribution approximation strategy based on cosine similarity is designed to determine the number of samples to be synthesized in each safety gradient interval. Experiments on high-dimensional imbalanced financial datasets show that GDSMOTE achieves higher F1-score and MCC values than baseline methods while also achieving a higher recall. This means that our method improves the recognition accuracy of minority-class samples without sacrificing the recognition accuracy of majority-class samples and adapts well to data decision-making tasks in the financial field. Full article

(This article belongs to the Special Issue Advancement of Mathematical Methods in Feature Representation Learning for Artificial Intelligence, Data Mining and Robotics, 2nd Edition)
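For context, the sketch below shows the classic SMOTE interpolation step that synthetic oversampling methods build on. It is not GDSMOTE: it picks the neighbor by Euclidean distance, which is exactly the step the abstract says degrades in high dimensions. Function names are hypothetical.

```python
import random

def nearest_neighbor(x, others):
    """Closest minority sample by squared Euclidean distance (classic SMOTE choice)."""
    return min(others, key=lambda o: sum((a - b) ** 2 for a, b in zip(x, o)))

def synthesize(x, neighbor, rng):
    """Draw one synthetic sample on the segment between x and its neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]
```

A short usage example: `synthesize([0.0, 0.0], [2.0, 4.0], random.Random(0))` returns a point on the line segment joining the two samples.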

13 pages, 7103 KiB

Open Access Article

An Action Evaluation Method for Virtual Reality Simulation Power Training Based on an Improved Dynamic Time Warping Algorithm

by Qingjie Xu, Yong Liu and Shuo Li

Energies 2024, 17(24), 6242; https://doi.org/10.3390/en17246242 - 11 Dec 2024

Cited by 1 | Viewed by 784

Abstract

To address the shortcomings in action evaluation within VR simulation power training, this paper introduces a novel action recognition and evaluation method based on dynamic recognition of finger keypoints combined with an improved Dynamic Time Warping (DTW) algorithm. By constructing an action recognition model centered on hand keypoints, the proposed method integrates distance similarity and cosine similarity to account comprehensively for both numerical differences and directional consistency of action features. This approach effectively tackles the challenges of feature extraction and recognition for complex actions in VR power training. Furthermore, a scoring mechanism based on the improved DTW algorithm is proposed, which employs Gaussian-weighted feature-derivative Euclidean distance combined with cosine similarity. This method significantly reduces computational complexity while improving scoring accuracy and consistency. Experimental results indicated that the improved DTW algorithm outperformed traditional methods in terms of classification accuracy and robustness. In particular, cosine similarity demonstrated superior performance in capturing dynamic variations and assessing the consistency of fine hand movements. This study provides an essential technical reference for action evaluation in VR simulation power training and offers a scientific basis for advancing the intelligence and digitalization of power VR training environments. Full article

(This article belongs to the Section F: Electrical Engineering)
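The combination of distance similarity and cosine similarity inside DTW can be sketched as follows. This is an assumption-laden illustration: the local cost is a plain weighted sum of Euclidean distance and cosine distance with a hypothetical weight `alpha`, and the Gaussian-weighted feature-derivative term described in the abstract is not reproduced.

```python
import math

def local_cost(a, b, alpha=0.5):
    """Weighted sum of Euclidean distance (magnitude) and cosine distance (direction)."""
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        cos_dist = 0.0  # assumption: direction is undefined for a zero vector, so ignore it
    else:
        cos_dist = max(0.0, 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb))
    return alpha * euclid + (1.0 - alpha) * cos_dist

def dtw(seq_a, seq_b, alpha=0.5):
    """Standard dynamic time warping over two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = local_cost(seq_a[i - 1], seq_b[j - 1], alpha) + step
    return D[n][m]
```

A lower accumulated cost means the performed action follows the reference more closely, even when the two are executed at different speeds.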

13 pages, 1061 KiB

Open Access Article

Swin-Fake: A Consistency Learning Transformer-Based Deepfake Video Detector

by Liang Yu Gong, Xue Jun Li and Peter Han Joo Chong

Electronics 2024, 13(15), 3045; https://doi.org/10.3390/electronics13153045 - 1 Aug 2024

Cited by 7 | Viewed by 3082

Abstract

Deepfake has become an emerging technology affecting cyber-security with its illegal applications in recent years. Most deepfake detectors utilize CNN-based models such as the Xception Network to distinguish real or fake media; however, their cross-dataset performance is not ideal because they currently suffer from over-fitting. Therefore, this paper proposes a spatial consistency learning method to relieve this issue in three aspects. Firstly, we increased the number of data augmentation methods to five, which is more than in our previous study. Specifically, we captured several equally spaced video frames of one video and randomly selected five different data augmentations to obtain different data views and enrich the input variety. Secondly, we chose Swin Transformer as the feature extractor instead of a CNN-based backbone, which means that our approach did not utilize it for downstream tasks and could encode these data using an end-to-end Swin Transformer, aiming to learn the correlation between different image patches. Finally, this was combined with consistency learning, which was able to determine more data relationships than supervised classification. We explored the consistency of video frames' features by calculating their cosine distance and applied traditional cross-entropy loss to regulate this classification loss. Extensive in-dataset and cross-dataset experiments demonstrated that Swin-Fake could produce relatively good results on some open-source deepfake datasets, including FaceForensics++, DFDC, Celeb-DF and FaceShifter. Compared with several benchmark models, our approach shows relatively strong robustness in detecting deepfake media. Full article

(This article belongs to the Special Issue Neural Networks and Deep Learning in Computer Vision)

25 pages, 2248 KiB

Open Access Article

SCMs: Systematic Conglomerated Models for Audio Cough Signal Classification

by Sunil Kumar Prabhakar and Dong-Ok Won

Algorithms 2024, 17(7), 302; https://doi.org/10.3390/a17070302 - 8 Jul 2024

Cited by 1 | Viewed by 1427

Abstract

Coughing is a common and natural physiological response of the human body that tries to push air and other waste out of the airways. It occurs due to environmental factors, allergic responses, pollution or some diseases. A cough can be either dry or wet depending on the amount of mucus produced. A characteristic feature of a cough is its sound, which is mostly a quacking sound. Human cough sounds can be monitored continuously, and so cough sound classification has attracted a lot of interest in the research community in the last decade. In this research, three systematic conglomerated models (SCMs) are proposed for audio cough signal classification. The first conglomerated technique utilizes robust models such as the Cross-Correlation Function (CCF) and Partial Cross-Correlation Function (PCCF) model, the Least Absolute Shrinkage and Selection Operator (LASSO) model, and the elastic net regularization model with Gabor dictionary analysis and efficient ensemble machine learning techniques. The second technique utilizes stacked conditional autoencoders (SAEs). The third technique utilizes efficient feature extraction schemes such as Tunable Q Wavelet Transform (TQWT), sparse TQWT, Maximal Information Coefficient (MIC) and Distance Correlation Coefficient (DCC), together with feature selection techniques such as the Binary Tunicate Swarm Algorithm (BTSA), aggregation functions (AFs), factor analysis (FA) and explanatory factor analysis (EFA), classified with machine learning classifiers, kernel extreme learning machine (KELM), arc-cosine ELM, Rat Swarm Optimization (RSO)-based KELM, etc. The techniques are applied to publicly available datasets, and the results show that the highest classification accuracy of 98.99% was obtained when sparse TQWT with AF was implemented with an arc-cosine ELM classifier. Full article

(This article belongs to the Special Issue Quantum and Classical Artificial Intelligence)

23 pages, 2938 KiB

Open Access Article

An Improved Expeditious Meta-Heuristic Clustering Method for Classifying Student Psychological Issues with Homogeneous Characteristics

by Muhammad Suhail Shaikh, Xiaoqing Dong, Gengzhong Zheng, Chang Wang and Yifan Lin

Mathematics 2024, 12(11), 1620; https://doi.org/10.3390/math12111620 - 22 May 2024

Cited by 6 | Viewed by 1509

Abstract

Nowadays, cluster analyses are widely used in mental health research to categorize student stress levels. However, conventional clustering methods experience challenges with large datasets and complex issues, such as converging to local optima and sensitivity to initial random states. To address these limitations, this research work introduces an Improved Grey Wolf Clustering Algorithm (iGWCA). This improved approach aims to adjust the convergence rate and mitigate the risk of being trapped in local optima. The iGWCA algorithm provides a balanced technique for exploration and exploitation phases, alongside a local search mechanism around the optimal solution. To assess its efficiency, the proposed algorithm is verified on two different datasets. Dataset-I comprises 1100 individuals obtained from the Kaggle database, while dataset-II is based on 824 individuals obtained from the Mendeley database. The results demonstrate the competence of iGWCA in classifying student stress levels. The algorithm outperforms other methods in terms of lower intra-cluster distances, obtaining a reduction rate of 1.48% compared to Grey Wolf Optimization (GWO), 8.69% compared to Mayfly Optimization (MOA), 8.45% compared to the Firefly Algorithm (FFO), 2.45% compared to Particle Swarm Optimization (PSO), 3.65% compared to Hybrid Sine Cosine with Cuckoo Search (HSCCS), 8.20% compared to Hybrid Firefly and Genetic Algorithm (FAGA) and 8.68% compared to the Gravitational Search Algorithm (GSA). This demonstrates the effectiveness of the proposed algorithm in minimizing intra-cluster distances, making it a better choice for student stress classification. This research contributes to the advancement of understanding and managing student well-being within academic communities by providing a robust tool for stress level classification. Full article

(This article belongs to the Special Issue Deep Learning and Adaptive Control, 3rd Edition)
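The intra-cluster distance objective and the reduction rates quoted above can be made concrete with a short sketch. The abstract does not state the exact formulas, so both functions below are assumptions: the objective is taken as the summed Euclidean distance of each point to its assigned centroid, and the reduction rate as the relative decrease against a baseline.

```python
import math

def intra_cluster_distance(points, labels, centroids):
    """Sum of Euclidean distances from each point to the centroid of its cluster."""
    return sum(math.dist(p, centroids[k]) for p, k in zip(points, labels))

def reduction_rate(baseline, proposed):
    """Relative decrease of the objective, in percent (assumed definition)."""
    return 100.0 * (baseline - proposed) / baseline
```

For example, a baseline objective of 200.0 and a proposed objective of 197.04 correspond to a reduction rate of about 1.48%; these two input values are illustrative and not taken from the paper.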

17 pages, 4881 KiB

Open Access Article

Intelligent Packet Priority Module for a Network of Unmanned Aerial Vehicles Using Manhattan Long Short-Term Memory

by Dino Budi Prakoso, Jauzak Hussaini Windiatmaja, Agus Mulyanto, Riri Fitri Sari and Rosdiadee Nordin

Drones 2024, 8(5), 183; https://doi.org/10.3390/drones8050183 - 7 May 2024

Cited by 1 | Viewed by 1828

Abstract

Unmanned aerial vehicles (UAVs) are becoming more common in wireless communication networks. Using UAVs can lead to network problems. An issue arises when the UAVs function in a network-access-limited environment with nodes causing interference. This issue could potentially hinder UAV network connectivity. This paper introduces an intelligent packet priority module (IPPM) to minimize network latency. This study analyzed Network Simulator–3 (NS-3) network modules utilizing Manhattan long short-term memory (MaLSTM) for packet classification of critical UAV, ground control station (GCS), or interfering nodes. To minimize network latency and packet delivery ratio (PDR) issues caused by interfering nodes, packets from prioritized nodes are transmitted first. Simulation results and evaluation show that our proposed intelligent packet priority module (IPPM) method outperformed previous approaches. The proposed IPPM based on MaLSTM implementation for the priority packet module led to a lower network delay and a higher packet delivery ratio. The performance of the IPPM averaged 62.2 ms network delay and 0.97 packet delivery ratio (PDR). The MaLSTM peaked at 97.5% accuracy. Upon further evaluation, the stability of LSTM Siamese models was observed to be consistent across diverse similarity functions, including cosine and Euclidean distances. Full article

(This article belongs to the Special Issue UAV-Assisted Mobile Wireless Networks and Applications)
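A Manhattan LSTM compares the final hidden states of two weight-sharing LSTM encoders with an exponentiated negative L1 distance. The sketch below shows only that similarity function; the encoders themselves are omitted, and the hidden-state vectors passed in are placeholders rather than outputs of the paper's model.

```python
import math

def manhattan_similarity(h1, h2):
    """exp(-||h1 - h2||_1): equals 1.0 for identical states and decays toward 0."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h1, h2)))
```

Because the output lies in (0, 1], it can be thresholded or used directly as a match score between two encoded packets.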

21 pages, 12467 KiB

Open Access Article

AL-MRIS: An Active Learning-Based Multipath Residual Involution Siamese Network for Few-Shot Hyperspectral Image Classification

by Jinghui Yang, Jia Qin, Jinxi Qian, Anqi Li and Liguo Wang

Remote Sens. 2024, 16(6), 990; https://doi.org/10.3390/rs16060990 - 12 Mar 2024

Cited by 6 | Viewed by 1692

Abstract

In hyperspectral image (HSI) classification scenarios, deep learning-based methods have achieved excellent classification performance, but often rely on large-scale training datasets to ensure accuracy. However, in practical applications, the acquisition of hyperspectral labeled samples is time consuming, labor intensive and costly, which leads to a scarcity of obtained labeled samples. Suffering from insufficient training samples, few-shot sample conditions limit model training and ultimately affect HSI classification performance. To solve the above issues, an active learning (AL)-based multipath residual involution Siamese network for few-shot HSI classification (AL-MRIS) is proposed. First, an AL-based Siamese network framework is constructed. The Siamese network, which has relatively low demand for sample data, is adopted for classification, and the AL strategy is integrated to select more representative samples to improve the model’s discriminative ability and reduce the costs of labeling samples in practice. Then, the multipath residual involution (MRIN) module is designed for the Siamese subnetwork to obtain the comprehensive features of the HSI. The involution operation was used to capture the fine-grained features and effectively aggregate the contextual semantic information of the HSI through dynamic weights. The MRIN module comprehensively considers the local features, dynamic features and global features through multipath residual connections, which improves the representation ability of HSIs. Moreover, a cosine distance-based contrastive loss is proposed for the Siamese network. By utilizing the directional similarity of high-dimensional HSI data, the discriminability of the Siamese classification network is improved. A large number of experimental results show that the proposed AL-MRIS method can achieve excellent classification performance with few-shot training samples, and compared with several state-of-the-art classification methods, the AL-MRIS method obtains the highest classification accuracy. Full article

(This article belongs to the Special Issue New Advances in Hyperspectral–Multispectral Image Classification and Fusion Applications)
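A cosine distance-based contrastive loss for a Siamese pair can be sketched as below. This assumes the standard margin-based contrastive form with the Euclidean distance swapped for cosine distance; the margin value and function names are hypothetical, and the paper's exact loss may differ.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; inputs are assumed to be non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def contrastive_loss(f1, f2, same_class, margin=0.5):
    """Pull same-class pairs together; push different-class pairs beyond the margin."""
    d = cosine_distance(f1, f2)
    if same_class:
        return d * d
    return max(0.0, margin - d) ** 2
```

Because cosine distance ignores vector magnitude, the loss depends only on the direction of the two embeddings, which matches the abstract's emphasis on directional similarity.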

24 pages, 3271 KiB

Open Access Article

Ontology-Driven Semantic Analysis of Tabular Data: An Iterative Approach with Advanced Entity Recognition

by Madina Mansurova, Vladimir Barakhnin, Assel Ospan and Roman Titkov

Appl. Sci. 2023, 13(19), 10918; https://doi.org/10.3390/app131910918 - 2 Oct 2023

Cited by 1 | Viewed by 2133

Abstract

This study focuses on the extraction and semantic analysis of data from tables, emphasizing the importance of understanding the semantics of tables to obtain useful information. The main goal was to develop a technology using the ontology for the semantic analysis of tables. An iterative algorithm has been proposed that can parse the contents of a table and determine cell types based on the ontology. The study presents an automated method for extracting data in various languages in various fields, subject to the availability of an appropriate ontology. Advanced techniques such as cosine distance search and table subject classification based on a neural network have been integrated to increase efficiency. The result is a software application capable of semantically classifying tabular data, facilitating the rapid transition of information from tables to ontologies. Rigorous testing, including 30 tables in the field of water resources and socio-economic indicators of Kazakhstan, confirmed the reliability of the algorithm. The results demonstrate high accuracy with a notable triple extraction recall of 99.4%. The use of Levenshtein distance for matching entities and ontology as a source of information was key to achieving these metrics. The study offers a promising tool for efficiently extracting data from tables. Full article

(This article belongs to the Section Computing and Artificial Intelligence)
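The Levenshtein matching of table entities against ontology labels mentioned above can be illustrated with a short sketch. The edit-distance routine is the standard dynamic program; `best_match` and the sample labels are hypothetical and only show how a cell value might be mapped to its closest ontology label.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def best_match(term, ontology_labels):
    """Return the label with the smallest case-insensitive edit distance to term."""
    return min(ontology_labels,
               key=lambda label: levenshtein(term.lower(), label.lower()))
```

In practice a maximum allowed distance would probably be needed so that unrelated cell values are left unmatched instead of being forced onto the nearest label.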

14 pages, 3744 KiB

Open Access Article

Evaluation Method and Application of Cold Rolled Strip Flatness Quality Based on Multi-Objective Decision-Making

by Qiuna Wang, Jingdong Li, Xiaochen Wang, Quan Yang and Zedong Wu

Metals 2022, 12(11), 1977; https://doi.org/10.3390/met12111977 - 19 Nov 2022

Cited by 7 | Viewed by 2347

Abstract

Flatness is a vital quality index that determines the dimensional accuracy of the cold-rolled strip. This paper designs a local shape wave extraction algorithm and a fuzzy classification algorithm for overall flatness defect classification based on cosine distance. By introducing the small displacement buckling theory of thin plates, the plate stress buckling model of overall and local shape waves is studied, and the critical buckling elongation difference of the overall shape and the local shape under the given conditions are obtained. Finally, using the multi-objective decision-making evaluation method, a comprehensive evaluation model of the flatness quality is established. The model is applied to the actual cold rolling production. The on-site flatness data are used to verify the flatness quality determination model both locally and overall. The results show that the model can accurately identify the local and overall flatness defects of cold-rolled strips, realizes the accurate identification and evaluation of the cold-rolled flatness quality, and provides strong support for the optimization of rolling process parameters and the improvement of the quality of thin strip products. Full article

(This article belongs to the Section Computation and Simulation on Metals)

23 pages, 5081 KiB

Open Access Article

Machine Vision-Based Human Action Recognition Using Spatio-Temporal Motion Features (STMF) with Difference Intensity Distance Group Pattern (DIDGP)

by Jawaharlalnehru Arunnehru, Sambandham Thalapathiraj, Ravikumar Dhanasekar, Loganathan Vijayaraja, Raju Kannadasan, Arfat Ahmad Khan, Mohd Anul Haq, Mohammed Alshehri, Mohamed Ibrahim Alwanain and Ismail Keshta

Electronics 2022, 11(15), 2363; https://doi.org/10.3390/electronics11152363 - 28 Jul 2022

Cited by 28 | Viewed by 2877

Abstract

In recent years, human action recognition has been modeled as a spatial-temporal video volume. This field has expanded greatly owing to its rapidly evolving real-world uses, such as visual surveillance, autonomous driving, and entertainment. Specifically, the spatio-temporal interest points (STIPs) approach has been widely and efficiently used in action representation for recognition. In this work, a novel STIP-based approach is proposed with two action descriptors, i.e., the Two Dimensional-Difference Intensity Distance Group Pattern (2D-DIDGP) and the Three Dimensional-Difference Intensity Distance Group Pattern (3D-DIDGP), for representing and recognizing human actions in video sequences. Initially, this approach captures the local motion in a video that is invariant to size and shape changes. It is then extended to build unique and discriminative feature description methods that enhance the action recognition rate. Transformation methods such as DCT (discrete cosine transform), DWT (discrete wavelet transform), and hybrid DWT+DCT are utilized. The proposed approach is validated on the UT-Interaction dataset, which has been extensively studied by past researchers. Classification is then performed with Support Vector Machine (SVM) and Random Forest (RF) classifiers. The results show that the proposed descriptors, especially the DIDGP-based descriptors, yield promising results on action recognition. Notably, the 3D-DIDGP predominantly outperforms the state-of-the-art algorithm. Full article

(This article belongs to the Section Computer Science & Engineering)
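As a reminder of what the DCT transformation step computes, here is a minimal, unnormalized one-dimensional DCT-II. It is a direct O(n^2) evaluation of the textbook formula, intended only for illustration; how the paper applies the transform to its descriptors is not stated in the abstract.

```python
import math

def dct_ii(signal):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi / n * (i + 0.5) * k)."""
    n = len(signal)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(signal))
            for k in range(n)]
```

For a constant signal all of the energy lands in the first coefficient, which is the energy-compaction property that makes DCT coefficients useful as compact features.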

16 pages, 1792 KiB

Open Access Article

DMnet: A New Few-Shot Framework for Wind Turbine Surface Defect Detection

by Jinyun Yu, Kaipei Liu, Liang Qin, Qiang Li, Feng Zhao, Qiulin Wang, Haofeng Liu, Boqiang Li, Jing Wang and Kexin Li

Machines 2022, 10(6), 487; https://doi.org/10.3390/machines10060487 - 16 Jun 2022

Cited by 7 | Viewed by 2621

Abstract

In the field of wind turbine surface defect detection, most existing defect detection algorithms have a single solution with poor generalization to the dilemma of insufficient defect samples and have unsatisfactory precision for small and concealed defects. Inspired by meta-learning ideology, we devised a cross-task training strategy. By exploring the common properties between tasks, the hypothesis space shrinks so that the needed sample size that satisfies a reliable empirical risk minimizer is reduced. To improve the training efficiency, a depth metric-based classification method is specially designed to find a sample-matching feature space with a good similarity measure by cosine distance. Additionally, a real-time feedback session is innovatively added to the model training loop, which performs information enhancement and filtering according to the task relevance. With dynamic activation mapping, it alleviates the information loss during traditional pooling operations, thus helping to avoid the missed detection of small-scale targets. Experimental results show that the proposed method has significantly improved the defect recognition ability under few-shot training conditions. Full article

(This article belongs to the Section Machines Testing and Maintenance)

17 pages, 1864 KiB

Open Access Article

Deep Learning-Based Total Kidney Volume Segmentation in Autosomal Dominant Polycystic Kidney Disease Using Attention, Cosine Loss, and Sharpness Aware Minimization

by Anish Raj, Fabian Tollens, Laura Hansen, Alena-Kathrin Golla, Lothar R. Schad, Dominik Nörenberg and Frank G. Zöllner

Diagnostics 2022, 12(5), 1159; https://doi.org/10.3390/diagnostics12051159 - 7 May 2022

Cited by 29 | Viewed by 3706

Abstract

Early detection of the autosomal dominant polycystic kidney disease (ADPKD) is crucial as it is one of the most common causes of end-stage renal disease (ESRD) and kidney failure. The total kidney volume (TKV) can be used as a biomarker to quantify disease progression. The TKV calculation requires accurate delineation of kidney volumes, which is usually performed manually by an expert physician. However, this is time-consuming and automated segmentation is warranted. Furthermore, the scarcity of large annotated datasets hinders the development of deep learning solutions. In this work, we address this problem by implementing three attention mechanisms into the U-Net to improve TKV estimation. Additionally, we implement a cosine loss function that works well on image classification tasks with small datasets. Lastly, we apply a technique called sharpness aware minimization (SAM) that helps improve the generalizability of networks. Our results show significant improvements (p-value < 0.05) over the reference kidney segmentation U-Net. We show that the attention mechanisms and/or the cosine loss with SAM can achieve a dice score (DSC) of 0.918, a mean symmetric surface distance (MSSD) of 1.20 mm with the mean TKV difference of −1.72%, and R² of 0.96 while using only 100 MRI datasets for training and testing. Furthermore, we tested four ensembles and obtained improvements over the best individual network, achieving a DSC and MSSD of 0.922 and 1.09 mm, respectively. Full article

(This article belongs to the Special Issue Machine Learning for Computer-Aided Diagnosis in Biomedical Imaging)
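Two of the metrics reported above are simple enough to sketch. The Dice score below is the standard overlap measure on flattened binary masks; the relative volume difference uses an assumed sign convention (predicted minus reference, divided by reference), which the abstract does not state. Function names are hypothetical.

```python
def dice_score(pred, truth):
    """2 * |A and B| / (|A| + |B|) for two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * intersection / total

def relative_volume_difference(pred_volume, true_volume):
    """Signed difference in percent; negative means the prediction underestimates."""
    return 100.0 * (pred_volume - true_volume) / true_volume
```

Under this convention, a predicted volume of 98.28 against a reference of 100.0 gives about −1.72%; the two volumes are illustrative values, not data from the study.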

Search Results (28)
