Search Results (3,198)

Search Parameters:
Keywords = D-CNNs

17 pages, 3640 KB  
Article
A 3D Global-Patch Transformer for Brain Age Prediction Using T1-Weighted MRI with Gray and White Matter Maps
by Seung-Jun Lee, Myungeun Lee, Yoo Ri Kim and Hyung-Jeong Yang
Appl. Sci. 2026, 16(6), 3004; https://doi.org/10.3390/app16063004 - 20 Mar 2026
Abstract
With the increasing prevalence of neurodegenerative diseases driven by population aging, imaging-based biomarkers are needed to quantify brain aging at an early stage. Brain age, which estimates structural brain aging relative to chronological age, has emerged as a useful indicator. Prior work has mainly used T1-weighted MRI with deep learning models such as convolutional neural networks (CNNs) or transformers; however, many approaches insufficiently capture three-dimensional structural continuity and localized anatomical patterns, and tissue-specific aging in gray matter (GM) and white matter (WM) is often treated as auxiliary. To address these limitations, we propose a 3D Global–Patch Transformer framework for brain age prediction that directly processes volumetric data while jointly learning global brain structure and local anatomical features. Our model runs global and patch pathways in parallel and explicitly incorporates GM and WM structural maps alongside T1-weighted MRI to encode tissue-specific aging signals. Experiments on multiple public datasets, including IXI and OASIS, show that the proposed method reduces mean absolute error (MAE) by approximately 10–15% compared with CNN-based and single-input transformer baselines, with notably improved performance in older populations, highlighting the value of tissue-level structural information for brain age estimation. Full article
(This article belongs to the Special Issue MR-Based Neuroimaging, 2nd Edition)
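The patch pathway this abstract describes starts from tokenizing the volume into fixed-size 3D patches. A minimal numpy sketch of that step, assuming non-overlapping cubic patches (function name, patch size, and volume shape are illustrative, not taken from the paper):

```python
import numpy as np

def extract_patches_3d(volume, patch=16):
    """Tile a 3D volume into non-overlapping cubic patches.

    Hypothetical helper showing the kind of patch tokenization a
    global-patch transformer might use; remainder voxels are cropped.
    """
    d, h, w = (s - s % patch for s in volume.shape)
    v = volume[:d, :h, :w]
    # reshape into a grid of patches, then flatten the grid dimensions
    v = v.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    v = v.transpose(0, 2, 4, 1, 3, 5)
    return v.reshape(-1, patch, patch, patch)

vol = np.random.rand(34, 40, 48)            # e.g. a cropped T1 volume
tokens = extract_patches_3d(vol, patch=16)  # 2 * 2 * 3 = 12 patch tokens
```

In a real model each patch would then be flattened and linearly embedded before entering the transformer's patch branch.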

18 pages, 3377 KB  
Article
Can 3D T1 Post-Contrast T1 MRI Radiomics-Machine Learning Model to Distinguish Infective from Neoplastic Ring-Enhancing Brain Lesions: An Exploratory Study
by Edwin Chong Yu Sng, Minh Bao Kha, Min Jia Wong, Nicholas Kuan Hsien Lee, Jonathan Cheng Yao Goh, So Jeong Park, Darren Cheng Han Teo, Wei Ming Chua, May Yi Shan Lim, Septian Hartono, Lester Chee Hoe Lee, Candice Yuen Yue Chan, Hwee Kuan Lee and Ling Ling Chan
Diagnostics 2026, 16(6), 926; https://doi.org/10.3390/diagnostics16060926 - 20 Mar 2026
Abstract
Background/Objectives: Rapid and accurate classification of ring-enhancing brain lesions (REBLs) into infection or neoplasm is key to clinical triaging for expedited diagnostics in the former to enhance treatment outcomes, especially in immunocompromised patients. High-resolution three-dimensional (3D) T1 post-contrast (T1+C) MRI provides high-dimensional volumetric data for radiomics analysis. While radiomics is useful in brain neoplasm characterization, its utility in central nervous system infection remains under-explored. In this exploratory study, we aim to determine if a radiomics-machine learning model, based solely on a 3D T1+C MRI dataset, can distinguish infective from neoplastic REBLs. Methods: 92 patients (infection, n = 26; neoplasm, n = 66) with 402 REBLs, who fulfilled criteria for “definite” or “probable” infective or neoplastic REBLs, were identified from scans performed at our hospital over four years and formed the training/validation dataset. All REBLs were manually annotated on T1+C MRI images under radiological supervision. In total, 1197 radiomics features were extracted, feature selection was performed using mutual information, and nine machine learning classifiers were applied to assess patient-level infection vs. neoplasm classification performance. End-to-end 2D CNN baselines and hybrid radiomics–CNN configurations were additionally evaluated under the same protocol for comparative benchmarking. Model performance was tested on an external holdout dataset of 57 patients (infection, n = 25; neoplasm, n = 32) with 454 REBLs from another hospital. Results: The Multi-layer Perceptron (MLP) model using the Original + LoG + Wavelet feature group demonstrated superior performance. In the cross-validation cohort, it achieved a mean AUC of 0.80 ± 0.02, sensitivity of 0.83 ± 0.09, specificity of 0.77 ± 0.08, and balanced accuracy of 0.80 ± 0.02.
On external holdout data, the same configuration showed stable and sustainable performance with an AUC of 0.84, sensitivity of 0.84, specificity of 0.75, and balanced accuracy of 0.80. Conclusions: Our radiomics-machine learning model, based solely on a high-resolution 3D T1+C dataset, shows potential for distinguishing infective REBLs from neoplastic REBLs. Further study, with additional MR sequences and clinical data in a multimodal MRI radiomics-machine learning model, is warranted. Full article
(This article belongs to the Special Issue Neurological Diseases: Biomarkers, Diagnosis and Prognosis)
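The mutual-information feature ranking this abstract mentions can be illustrated with a simple histogram-based estimator. This is a hedged sketch, not the authors' pipeline; the bin count and data are invented for demonstration:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Estimate MI (in nats) between a continuous feature x and binary
    labels y by histogram discretization -- a simple stand-in for the
    feature-selection step described in the abstract."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])
    mi = 0.0
    for xv in np.unique(xd):
        for yv in np.unique(y):
            pxy = np.mean((xd == xv) & (y == yv))
            px, py = np.mean(xd == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 400)                   # e.g. infection vs. neoplasm
informative = y + 0.3 * rng.normal(size=400)  # a feature tracking the label
noise = rng.normal(size=400)                  # an unrelated feature
```

Ranking all 1197 radiomics features by such a score and keeping the top scorers is the usual filter-style selection the abstract's wording suggests.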

30 pages, 9811 KB  
Article
Audio-Based Screening of Respiratory Diseases Using Machine Learning: A Methodological Framework Evaluated on a Clinically Validated COVID-19 Cough Dataset
by Arley Magnolia Aquino-García, Humberto Pérez-Espinosa, Javier Andreu-Perez and Ansel Y. Rodríguez González
Mach. Learn. Knowl. Extr. 2026, 8(3), 80; https://doi.org/10.3390/make8030080 - 20 Mar 2026
Abstract
The development of AI-driven computational methods has enabled rapid and non-invasive analysis of respiratory sounds using acoustic data, particularly cough recordings. Although the COVID-19 pandemic accelerated research on cough-based acoustic analysis, many early studies were limited by insufficient data quality, lack of standardized protocols, and limited reproducibility due to data scarcity. In this study, we propose an audio analysis framework for cough-based respiratory disease screening research using COVID-19 as a clinically validated case dataset. All analyses were conducted on a single clinically acquired multicentric dataset collected under standardized conditions in certified laboratories in Mexico and Spain, comprising cough recordings from 1105 individuals. Model training and testing were performed exclusively within this dataset. The framework incorporates signal preprocessing and a comparative evaluation of segmentation strategies, showing that segmented cough analysis significantly outperforms full-signal analysis. Class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE) for CNN2D models and the supervised Resample filter implemented in WEKA for classical machine learning models, both applied exclusively to the training subset to generate balanced training sets and prevent data leakage. Feature extraction and classification were carried out using Random Forest, Support Vector Machine (SVM), XGBoost, and a 2D Convolutional Neural Network (CNN2D), with hyperparameter optimization via AutoML. The proposed framework achieved a best balanced screening performance of 85.58% sensitivity and 86.65% specificity (Random Forest with GeMAPSvB01), while the highest-specificity configuration reached 93.90% specificity with 18.14% sensitivity (CNN2D with SMOTE and AutoML). These results demonstrate the methodological feasibility of the proposed framework under the evaluated conditions. Full article
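The abstract stresses that SMOTE was applied only to the training subset to prevent leakage. A minimal SMOTE-style sketch makes the idea concrete; this is an illustration of the interpolation principle, not the implementation the authors used, and all data here is synthetic:

```python
import numpy as np

def smote_like(X, y, minority=1, k=3, seed=0):
    """Minimal SMOTE-style oversampling: each synthetic point lies on the
    segment between a minority sample and one of its k nearest minority
    neighbours. Apply to the training subset only, so no synthetic point
    carries information about held-out test data."""
    rng = np.random.default_rng(seed)
    Xm = X[y == minority]
    need = int(np.sum(y != minority)) - len(Xm)
    if need <= 0:
        return X, y
    synth = []
    for _ in range(need):
        i = rng.integers(len(Xm))
        dist = np.linalg.norm(Xm - Xm[i], axis=1)
        j = rng.choice(np.argsort(dist)[1:k + 1])  # one of the k nearest
        synth.append(Xm[i] + rng.random() * (Xm[j] - Xm[i]))
    return (np.vstack([X, synth]),
            np.concatenate([y, np.full(need, minority)]))

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 5))
y_train = (rng.random(100) < 0.2).astype(int)   # imbalanced labels
X_bal, y_bal = smote_like(X_train, y_train)
```

The key design point the abstract makes is ordering: split first, oversample second, so balanced classes are seen in training while evaluation remains on untouched real recordings.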

27 pages, 28242 KB  
Article
Physics-Informed Side-Scan Sonar Perception: Tackling Weak Targets and Sparse Debris via Geometric and Frequency Decoupling
by Bojian Yu, Rongsheng Lin, Hanxiang Zhou, Jianxiong Zhang and Xinwei Zhang
Sensors 2026, 26(6), 1938; https://doi.org/10.3390/s26061938 - 19 Mar 2026
Abstract
Side-scan sonar (SSS) serves as the primary perceptual instrument for Autonomous Underwater Vehicles (AUVs) in large-scale marine search and rescue (SAR) operations. However, the detection of critical targets is frequently hindered by severe hydro-acoustic noise, the spatial discontinuity of wreckage, and the weak visual signatures of small targets. To surmount these challenges, this paper presents WPG-DetNet. First, we introduce a Wavelet-Embedded Residual Backbone (WERB) to reconstruct the conventional downsampling paradigm. By substituting standard pooling with the Discrete Wavelet Transform (DWT), this architecture explicitly disentangles high-frequency noise from structural information in the frequency domain, thereby achieving the adaptive preservation of edge fidelity for large human-made targets while filtering out speckle interference. Then, addressing the distinct challenge of discontinuous aircraft wreckage, the framework further incorporates a Debris Graph Reasoning Module (D-GRM). This module models scattered fragments as nodes in a topological graph to capture long-range semantic dependencies, transforming isolated instance recognition into context-aware scene understanding. Finally, to bridge the gap between AI and underwater physics, we design a Shadow-Aided Decoupling Head (SADH) equipped with a physics-informed geometric loss. By enforcing mathematical consistency between target height and acoustic shadow length, this mechanism establishes a rigorous discriminative criterion capable of distinguishing weak-echo human bodies from seabed rocks based on shadow geometry. Experiments on the SCTD dataset demonstrate that WPG-DetNet achieves a mean Average Precision (mAP50) of 97.5% and a Recall of 96.9%. Quantitative analysis reveals that our framework outperforms the classic Faster R-CNN by a margin of 12.8% in mAP50 and surpasses the Transformer-based RT-DETR-R18 by 5.6% in high-precision localization metrics (mAP50:95). 
Simultaneously, WPG-DetNet maintains superior efficiency with an inference speed of 62.5 FPS and a lightweight parameter count of 16.8 M, striking an optimal balance between robust perception and the real-time constraints of AUV operations. Full article
(This article belongs to the Section Physical Sensors)
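The wavelet-in-place-of-pooling idea in this abstract can be sketched with one level of a 2D Haar DWT: resolution halves, but high-frequency content lands in separate detail bands instead of being discarded. A minimal numpy illustration (normalization and usage are assumptions, not the paper's exact WERB design):

```python
import numpy as np

def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT: returns the low-pass band
    LL plus the detail bands (LH, HL, HH), each at half resolution."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # structure (keeps edges of large targets)
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail (speckle tends here)
    return ll, lh, hl, hh

img = np.random.rand(8, 8)       # a patch of a side-scan sonar image
ll, lh, hl, hh = haar_dwt2(img)
```

Because the transform is orthonormal, no energy is lost at downsampling time; a network can then learn to attenuate the noisy high-frequency bands rather than having pooling average them away.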

22 pages, 1509 KB  
Article
ICTD: Combination of Improved CNN–Transformer and Enhanced Deep Canonical Correlation Analysis for Eye-Movement Emotion Classification
by Cong Zhang, Xisheng Li, Jiannan Chi, Ming Cao, Qingfeng Gu and Jiahui Liu
Brain Sci. 2026, 16(3), 330; https://doi.org/10.3390/brainsci16030330 - 19 Mar 2026
Abstract
Background/Objectives: Emotion classification based on eye-movement features has become a widely adopted approach due to the simplicity of data acquisition and the strong association between ocular responses and emotional states. However, several challenges remain with regard to existing emotion recognition methods, including the relatively weak correlation between eye-movement features and emotional labels and the fact that the key features are not prominently presented. Methods: To address the above limitations, this study proposes an improved CNN–transformer combined with an enhanced deep canonical correlation analysis network (ICTD). The proposed method first performs preprocessing and reconstruction of raw eye-movement signals to extract informative features. Subsequently, convolutional neural networks (CNNs) and transformer architectures are employed to capture local and global features, respectively. In addition, an incremental feature feedforward network is incorporated to enhance the transformer, enabling the model to assign higher importance to salient feature information. Finally, the extracted representations are processed through deep canonical correlation analysis based on cosine similarity in order to generate classification outcomes. Results: Experiments conducted on the SEED-IV, SEED-V, and eSEE-d datasets demonstrate that the proposed ICTD framework consistently outperforms baseline approaches and attains optimal classification results. (1) On the eSEE-d dataset, the results of three-category arousal and valence classification reach 81.8% and 85.2%, respectively; (2) on the SEED-IV dataset, the emotion four-category classification result reaches 91.2%; (3) finally, on the SEED-V dataset, the emotion five-category classification result reaches 85.1%. Conclusions: The proposed ICTD framework effectively improves feature representation and classification performance, showing strong potential for practical emotion recognition and physiological signal analysis.
Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
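The cosine-similarity measure that this abstract says replaces plain correlation in the deep CCA stage reduces to row-normalized dot products. A small illustrative sketch (the surrounding network is omitted; this is only the similarity computation):

```python
import numpy as np

def cosine_similarity_matrix(A, B):
    """Pairwise cosine similarity between rows of A and rows of B --
    the scale-invariant similarity the abstract describes for comparing
    learned representations (illustrative only)."""
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    return An @ Bn.T

feats = np.array([[1.0, 0.0], [0.0, 2.0]])   # two toy feature vectors
S = cosine_similarity_matrix(feats, feats)
```

Unlike raw correlation, the cosine form is insensitive to each vector's magnitude, which is why it is often preferred when feature scales vary across views.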

30 pages, 1965 KB  
Article
Joint Denoising and Motion-Correction for Low-Dose CT Myocardial Perfusion Imaging Using Deep Learning
by Mahmud Hasan, Aaron So and Mahmoud R. El-Sakka
Electronics 2026, 15(6), 1286; https://doi.org/10.3390/electronics15061286 - 19 Mar 2026
Abstract
Computed Tomography (CT) is a widely used imaging modality that employs X-rays and computational reconstruction to visualize internal anatomy. Although higher radiation doses produce higher-quality images, they also increase long-term cancer risk, motivating the use of low-dose protocols. However, low-dose CT data inherently suffer from elevated Poisson–Gaussian noise, necessitating effective denoising strategies. In myocardial CT perfusion (CTP) imaging, this challenge is compounded by residual cardiac motion, which misaligns consecutive time points and impairs accurate estimation of perfusion maps for diagnosing coronary artery disease. Traditional approaches typically treat these two problems, noise and motion, separately, denoising the reconstructed images first or applying the registration first. Such serial pipelines often degrade clinically significant features; e.g., denoising may destroy structural details essential for registration, while motion correction can distort subtle intensity cues needed for noise modelling. To overcome these limitations, we propose a unified deep learning framework that performs noise suppression and motion correction jointly for low-dose myocardial CTP. The method integrates two complementary components through a parallel ensemble strategy: (i) a modified Fast and Flexible Denoising Network (FFDNet) that incorporates noise-level maps to mitigate blended noise effectively, and (ii) a CNN-based registration model, extended with Time Enhancement Curve (TEC) correction and 4D physiological consistency constraints to estimate temporally coherent and anatomically plausible motion fields. By combining their outputs without iterative dependencies, the proposed framework produces motion-corrected and denoised CTP sequences in a single unified processing step, thereby better preserving myocardial structure and perfusion dynamics than conventional serial pipelines. 
The model has been evaluated using both reference-based (MSE, PSNR, SSIM, PCC, Noise Variance, TRE) and no-reference (NIQE, FID, KID, AUC) image quality metrics, supplemented by expert human assessment. Results demonstrate that jointly learning noise characteristics and motion patterns enables restoration of low-dose CTP images while minimizing feature corruption, thereby advancing the clinical utility of low-dose myocardial CTP imaging. Full article
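The Poisson–Gaussian mixture this abstract attributes to low-dose CT is easy to simulate when building paired clean/noisy training data. A hedged numpy sketch; the photon budget and Gaussian sigma are illustrative values, not the paper's settings:

```python
import numpy as np

def simulate_low_dose(img, photons=50, sigma=0.02, seed=0):
    """Corrupt a clean [0, 1] image with dose-dependent Poisson shot
    noise plus additive Gaussian electronic noise -- the blended noise
    model named in the abstract (parameters are assumptions)."""
    rng = np.random.default_rng(seed)
    shot = rng.poisson(img * photons) / photons       # quantum noise
    return shot + rng.normal(0.0, sigma, img.shape)   # electronic noise

clean = np.full((64, 64), 0.5)   # stand-in for a clean CTP frame
noisy = simulate_low_dose(clean)
```

Lowering `photons` mimics a lower dose: Poisson variance scales as intensity/photons, which is why the denoiser benefits from an explicit noise-level map as input.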

17 pages, 4972 KB  
Article
Seismic Attribute Fusion and Reservoir Prediction Using Multiscale Convolutional Neural Networks and Self-Attention: A Case Study of the B Gas Field, South Sumatra Basin
by Ziyun Cheng, Wensong Huang, Xiaoling Zhang, Zhanxiang Lei, Guoliang Hong, Wenwen Wang, Mengyang Zhang, Linze Li and Jian Li
Processes 2026, 14(6), 981; https://doi.org/10.3390/pr14060981 - 19 Mar 2026
Abstract
Strong heterogeneity and ambiguous seismic responses hinder reliable sandstone thickness prediction when using a single seismic attribute in the lower sandstone interval of the Talang Akar Formation (hereafter abbreviated as the LTAF interval) in the B gas field, South Sumatra Basin. To address this challenge, we propose a seismic attribute fusion and reservoir sweet-spot prediction framework based on a multiscale convolutional neural network (CNN) integrated with a self-attention module. Multiple seismic attribute volumes are organized as multi-channel 2D attribute slices, and parallel convolutions with kernel sizes of 3 × 3, 5 × 5, and 7 × 7 are employed to capture spatial features ranging from thin-bed boundaries and channel morphology to sand-body assemblage distribution. The self-attention module explicitly models inter-attribute dependencies and performs adaptive weighted fusion to suppress noise and emphasize informative attributes. The network adopts a dual-output design, producing (i) a sandstone thickness prediction map at the same spatial resolution as the input and (ii) attribute importance scores for quantitative attribute selection and geological interpretation. Using 3D seismic data and well-constrained thickness labels, the proposed model achieves an R2 of 0.8954, outperforming linear regression (R2 = 0.8281) and random forest regression (R2 ≈ 0.8453). The learned importance scores indicate that amplitude-related attributes (e.g., RMS amplitude and maximum amplitude) contribute most to thickness prediction, whereas frequency- and energy-related attributes show relatively lower contributions, which is consistent with bandwidth-limited resolution effects. Overall, the proposed framework unifies attribute fusion, thickness prediction, and interpretability within a single model, providing practical support for fine reservoir characterization and development optimization in heterogeneous sandstone reservoirs. Full article
(This article belongs to the Special Issue Applications of Intelligent Models in the Petroleum Industry)
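The parallel 3 × 3 / 5 × 5 / 7 × 7 branches this abstract describes can be sketched with fixed mean filters standing in for learned kernels, with the branch outputs stacked channel-wise. This is a structural illustration only; real branches would carry trained weights:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_filter(x, k):
    """'Same'-padded k x k mean filter -- a stand-in for one learned
    convolution branch."""
    p = k // 2
    xp = np.pad(x, p, mode="edge")
    return sliding_window_view(xp, (k, k)).mean(axis=(2, 3))

def multiscale_features(attr_slice, scales=(3, 5, 7)):
    """Run parallel branches at the abstract's three kernel sizes over
    one attribute slice and stack the responses channel-wise."""
    return np.stack([box_filter(attr_slice, k) for k in scales], axis=0)

slice2d = np.random.rand(32, 32)     # one seismic-attribute slice
feats = multiscale_features(slice2d)  # shape: (scales, H, W)
```

Keeping every branch at the input's spatial resolution is what lets the network emit a thickness map at the same resolution as the attribute slices, as the dual-output design requires.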

19 pages, 7323 KB  
Article
Mathematical Benchmarking of Convolutional Neural Networks for Thai Dialect Recognition: A Spectrogram Texture Classification Approach
by Porawat Visutsak, Duongduen Ongrungruaeng, Surapong Wiriya and Keun Ho Ryu
Electronics 2026, 15(6), 1271; https://doi.org/10.3390/electronics15061271 - 18 Mar 2026
Abstract
This study rigorously evaluates 13 Convolutional Neural Network (CNN) architectures for Thai dialect recognition. By treating Automatic Speech Recognition (ASR) as a computer vision texture classification task, we processed an extensive 840-h dataset from the Spoken Language Systems, Chulalongkorn University (SLSCU) corpus. Raw audio from four major dialects—Central, Northern (Khummuang), Northeastern (Korat), and Southern (Pattani)—was transformed into 2D Mel-spectrograms using the Short-Time Fourier Transform (STFT). We analyzed a diverse range of architectures, including the VGG, Inception, ResNet, DenseNet, and MobileNet families, to establish the optimal trade-off between mathematical complexity and spectral feature extraction. Our experimental results identify NASNet-Mobile as the most effective model, achieving a macro-average F1-score of 0.9425. The analysis suggests that NASNet’s search-optimized cell structure is uniquely capable of capturing the multiscale texture of phonetic formants. In contrast, we observed a catastrophic mode collapse in VGG16 (32.97% accuracy), likely due to excessive parameter bloat, while Xception and MobileNetV2 maintained robust generalization. Confusion matrix analysis reveals high acoustic distinctiveness for Southern Thai (96.7% recall), whereas Northern Thai exhibits significant spectral overlap with Central Thai. These results support the hypothesis that CNNs interpret spectrograms as textures rather than discrete objects, positioning NASNet-Mobile as a high-performance, low-latency baseline for edge-device deployment in resource-constrained environments. Full article
(This article belongs to the Special Issue Advances in Machine Learning for Image Classification)
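The STFT step this abstract names—framing the waveform, windowing, and taking per-frame FFT magnitudes—can be sketched directly; the Mel filterbank that follows is omitted for brevity, and the frame/hop sizes are illustrative assumptions:

```python
import numpy as np

def stft_magnitude(signal, n_fft=256, hop=128):
    """Frame a waveform, apply a Hann window, and take the FFT magnitude
    of each frame -- the first step of the spectrogram pipeline the
    abstract describes."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # (frames, freq bins)

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)              # 1 kHz test tone
spec = stft_magnitude(tone)
peak_bin = spec.mean(axis=0).argmax()            # bin width = sr / n_fft
```

Treating the resulting 2D magnitude array as an image texture is exactly what lets image CNNs such as NASNet-Mobile be applied to dialect classification.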

32 pages, 5375 KB  
Article
Deep Learning-Enabled Nondestructive Prediction of Moisture Content in Post-Heading Paddy Rice (Oryza sativa L.) Using Near-Infrared Spectroscopy
by Ha-Eun Yang, Hong-Gu Lee, Jeong-Eun Lee, Jeong-Yong Shin, Wan-Gyu Sang, Byoung-Kwan Cho and Changyeun Mo
Agriculture 2026, 16(6), 679; https://doi.org/10.3390/agriculture16060679 - 17 Mar 2026
Abstract
Rapid non-destructive evaluation of the moisture content of freshly harvested paddy rice in the field is essential for determining the optimal harvest timing, ensuring high-quality rice production and energy savings. This study developed a non-destructive prediction model for the moisture content of paddy rice using near-infrared (NIR) spectroscopy combined with machine learning and deep learning techniques. Rice samples were collected weekly during the ripening period after heading, and NIR reflectance spectra were acquired in the range of 950–2200 nm. Seven spectral preprocessing techniques were applied, and prediction models were developed using partial least squares regression, support vector regression, a deep neural network, and one-dimensional convolutional neural networks (1D-CNNs) based on VGGNet and EfficientNet architectures. Among these, the EfficientNet-based 1D-CNN combined with Savitzky–Golay 1st order derivative preprocessing showed the highest performance, achieving an Rp2 of 0.999 and an RMSEP of 0.001 (Friedman test, p < 0.001; Kendall’s W = 0.97), significantly outperforming the traditional machine learning models. The results demonstrate that the proposed prediction model enables highly accurate estimation of moisture content in freshly harvested paddy rice without requiring drying or milling. The proposed approach can be implemented across various agricultural operations, enabling optimal harvest timing, quality control during storage, energy efficient drying, and real-time monitoring via on-combine sensor systems. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
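The Savitzky–Golay first-derivative preprocessing named in this abstract is a standard spectral transform; a short sketch using `scipy.signal.savgol_filter`, where the window length, polynomial order, and synthetic spectrum are illustrative choices, not the paper's settings:

```python
import numpy as np
from scipy.signal import savgol_filter

# Savitzky-Golay 1st-derivative preprocessing of a reflectance spectrum.
# The quadratic "spectrum" below is synthetic, chosen so the derivative
# has a known closed form for checking.
wavelengths = np.linspace(950, 2200, 626)          # nm, 2 nm spacing
spectrum = 0.5 + 1e-4 * (wavelengths - 950) ** 2   # synthetic reflectance
step = wavelengths[1] - wavelengths[0]
deriv = savgol_filter(spectrum, window_length=11, polyorder=2,
                      deriv=1, delta=step)         # d(reflectance)/d(nm)
```

The derivative suppresses additive baseline offsets between samples, which is why it commonly improves NIR regression models before they are fed to a 1D-CNN.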

15 pages, 896 KB  
Article
Enhancing Network Intrusion Detection Under Class Imbalance Using a Three-Discriminator Generative Adversarial Network
by Taesu Kim, Hyoseong Park, Dongil Shin and Dongkyoo Shin
Electronics 2026, 15(6), 1253; https://doi.org/10.3390/electronics15061253 - 17 Mar 2026
Abstract
Network Intrusion Detection Systems (NIDS) play a crucial role in protecting network environments against cyberattacks. However, traditional NIDS rely heavily on predefined attack signatures, which limits their ability to detect zero-day attacks. Although machine learning-based intrusion detection techniques have been widely adopted in Network Intrusion Prevention Systems (NIPS), publicly available network traffic datasets often suffer from severe class imbalance, leading to biased learning and degraded detection performance. To address this issue, this study proposes a data augmentation framework based on a 3D-GAN (Three-Discriminator Generative Adversarial Network). The proposed architecture integrates an autoencoder, a CNN (Convolutional Neural Network), and an LSTM (Long Short-Term Memory) network as parallel discriminators to capture the statistical, spatial, and temporal characteristics of network traffic. By jointly optimizing multiple discriminator losses, the framework enhances training stability and generates high-quality synthetic samples. Experiments were conducted on the CIC-UNSW-NB15 dataset using Random Forest-, XGBoost (eXtreme Gradient Boosting)-, and BiGRU (Bidirectional Gated Recurrent Unit)-based classifiers. Two augmented datasets were constructed to address class imbalance, containing approximately 100,000 and 350,000 samples, respectively. Among them, Dataset 2, augmented using the proposed 3D-GAN, demonstrated the most significant performance improvement. Compared to the original imbalanced dataset, the XGBoost classifier trained on Dataset 2 achieved approximately a 4% increase in both accuracy and F1-score, while reducing the false positive rate and false negative rate by approximately 3.5%. Furthermore, the optimal configuration attained an F1-score of 0.9816, indicating superior capability in modeling complex network traffic patterns.
Overall, this study highlights the potential of GAN-based data augmentation for alleviating class imbalance and improving the robustness and generalization of intrusion detection systems. Full article
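Before any generator runs, balancing a dataset requires deciding how many synthetic samples each class needs. A tiny stdlib sketch of that bookkeeping step; the class names and counts are invented stand-ins, not figures from the paper:

```python
from collections import Counter

def augmentation_targets(labels):
    """Number of synthetic samples a generator must add per class to
    equalize the dataset -- the bookkeeping preceding GAN-based
    augmentation such as the abstract's 3D-GAN (illustrative)."""
    counts = Counter(labels)
    ceiling = max(counts.values())
    return {cls: ceiling - n for cls, n in counts.items()}

# Hypothetical class distribution mimicking network-traffic imbalance.
y = ["Normal"] * 9000 + ["DoS"] * 700 + ["Worms"] * 30
targets = augmentation_targets(y)
```

Rare classes (here "Worms") dominate the generation budget, which is exactly where high-quality synthetic samples matter most for the downstream classifier.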

25 pages, 3328 KB  
Article
End-to-End Acoustic Classification of Respiratory Sounds Using Multi-Architecture Deep Neural Networks
by Btissam Bouzammour, Ghita Zaz, Malika Alami Marktani, Abdellah Touhafi, Anas El Ouali and Mohammed Jorio
Technologies 2026, 14(3), 178; https://doi.org/10.3390/technologies14030178 - 16 Mar 2026
Abstract
Respiratory diseases constitute a major global health burden, necessitating accurate and reliable diagnostic support tools. Conventional auscultation, despite its widespread clinical use, remains inherently subjective and susceptible to inter-observer variability. In this study, we propose a unified deep learning framework for the automated classification of respiratory sound recordings into four clinically relevant categories: Normal, Crackles, Wheezes, and Crackles + Wheezes. The experimental evaluation was conducted on a publicly available dataset comprising heterogeneous respiratory recordings collected from both patients with pulmonary pathologies and healthy individuals. All audio signals were subjected to standardized preprocessing procedures to enhance signal consistency and ensure reliable feature extraction across acquisition conditions. To ensure methodological rigor and prevent optimistic bias, a strict subject-independent validation strategy was adopted using 5-fold GroupKFold cross-validation based on patient identifiers. Six deep learning architectures were systematically implemented and comparatively evaluated under a controlled and reproducible training protocol, including convolutional (1D-CNN, Deep-CNN), recurrent hybrid (CNN–LSTM, CNN–BiLSTM), and attention-based (CNN–Attention, CNN–Transformer) models. Performance metrics were reported as mean ± standard deviation across folds. The CNN–Attention architecture achieved the best overall performance, yielding a Balanced Accuracy of 90.1% ± 1.8% and a macro F1-score of 89.7% ± 2.1%, demonstrating stable inter-patient generalization. These findings indicate that attention-enhanced hybrid architectures effectively capture both local spectral structures and long-range temporal dependencies inherent in respiratory signals. 
The proposed framework provides a robust foundation for subject-independent automated lung sound classification and contributes to the development of clinically reliable decision-support systems. Full article
(This article belongs to the Section Assistive Technologies)
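The subject-independent GroupKFold protocol this abstract adopts groups recordings by patient, so no patient's audio appears in both train and test folds. A minimal sketch without sklearn (patient IDs and counts are illustrative):

```python
import numpy as np

def group_kfold(patient_ids, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no patient appears on
    both sides -- the leakage-free validation the abstract describes."""
    rng = np.random.default_rng(seed)
    groups = np.array(sorted(set(patient_ids)))
    rng.shuffle(groups)
    ids = np.asarray(patient_ids)
    for fold in np.array_split(groups, n_splits):
        test = np.flatnonzero(np.isin(ids, fold))
        train = np.flatnonzero(~np.isin(ids, fold))
        yield train, test

patients = [f"p{i % 20}" for i in range(200)]   # 20 patients, 200 clips
folds = list(group_kfold(patients))
```

Splitting at the clip level instead would let a model memorize per-patient acoustics and report optimistically biased accuracy, which is the bias the authors' protocol guards against.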

23 pages, 2965 KB  
Article
Hybrid Supervised Classification and Deep Embedding–Based Profiling Framework for Electricity Consumption Analysis
by Mihriban Gunay, Ozal Yildirim, Yakup Demir, Marin Zhilevski, Mikho Mikhov and Nikolay Yordanov
Appl. Sci. 2026, 16(6), 2827; https://doi.org/10.3390/app16062827 - 16 Mar 2026
Abstract
This study proposes a hybrid deep learning framework that integrates supervised classification and unsupervised profiling for electricity consumption analysis. In the supervised phase, a one-dimensional Convolutional Neural Network combined with Long Short-Term Memory (1D CNN–LSTM) architecture is developed to classify daily load patterns. The performance of the proposed model is compared with traditional machine learning and deep learning approaches, including Support Vector Machine (SVM), k-Nearest Neighbors (KNN), a standalone Long Short-Term Memory (LSTM) model, a Transformer-based model, and a standalone 1D CNN model. Experimental results on the Precon house dataset and the CU-BEMS dataset demonstrate that the proposed hybrid architecture outperforms the benchmark models, achieving classification accuracies of 87.59% and 86.40%, respectively. In the unsupervised phase, the trained CNN–LSTM encoder is utilized as a deep feature extractor. The resulting 32-dimensional latent embeddings are clustered using K-Means, Gaussian Mixture Model (GMM), Agglomerative, Spectral, and Ensemble methods. Clustering robustness is evaluated through bootstrap-based stability analysis using the Adjusted Rand Index (ARI) and the Normalized Mutual Information (NMI). The results demonstrate stable and interpretable electricity consumption profiles, particularly in the residential dataset, where near-perfect clustering stability is observed for K-Means. The proposed framework provides both improved classification performance and robust consumption profiling based on deep embedding, offering a practical tool for energy management. Full article
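The unsupervised phase described above (clustering latent embeddings and checking stability via bootstrap ARI) can be illustrated as follows. The 32-dimensional embeddings here are synthetic stand-ins for the CNN–LSTM encoder outputs, and the number of clusters and resamples are assumptions for the sketch.

```python
# Illustrative sketch: K-Means on 32-d embeddings plus a bootstrap
# stability check with the Adjusted Rand Index (ARI).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(42)
# Two well-separated synthetic consumption profiles in a 32-d latent space.
emb = np.vstack([rng.normal(0, 0.3, (50, 32)),
                 rng.normal(3, 0.3, (50, 32))])

base = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)

aris = []
for _ in range(10):
    idx = rng.integers(0, len(emb), len(emb))            # bootstrap resample
    boot = KMeans(n_clusters=2, n_init=10,
                  random_state=0).fit_predict(emb[idx])
    aris.append(adjusted_rand_score(base[idx], boot))    # agreement on resample

stability = float(np.mean(aris))  # approaches 1.0 for stable clusterings
```

NMI-based stability follows the same pattern with `normalized_mutual_info_score` in place of the ARI.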

28 pages, 12746 KB  
Article
PSTNet: A Hyperspectral Image Classification Method Based on Adaptive Spectral–Spatial Tokens and Parallel Attention
by Shaokang Yu, Yong Mei, Xiangsuo Fan, Song Guo, Wujun Xu and Jinlong Fan
Remote Sens. 2026, 18(6), 901; https://doi.org/10.3390/rs18060901 - 15 Mar 2026
Abstract
Hyperspectral image classification holds significant applications across multiple domains due to its rich spectral and spatial information. However, it faces challenges such as spectral variation within the same object, spectral similarity across different objects, and noise interference. Existing methods such as convolutional neural networks perform well in local feature extraction but inadequately model long-range dependencies, while Transformers can capture global relationships but struggle to effectively coordinate spectral and spatial information modeling. To address these limitations, this paper proposes a dual-branch collaborative Transformer network (PSTNet). This architecture integrates an adaptive spectral–spatial token (ASST) module, a Parallel Attention-Augmented lightweight CNN branch (PA-SSCNN), and a collaborative fusion layer (CHIB). The ASST constructs joint representation tokens through local spectral smoothing and learnable spatial embedding; PA-SSCNN employs 3D–2D cascaded convolutions and channel–spatial attention mechanisms to enhance local texture and spatial feature extraction; and CHIB enables deep interaction and synergistic fusion of dual-branch features across different levels and scales. Experimental results demonstrate that, with only 2% labeled samples, PSTNet achieves overall classification accuracies of 96.31%, 96.59%, 95.27%, and 89.06% on the Salinas and Whuhh datasets and on the two complex urban scene datasets Qingyun and Houston, respectively. It exhibits strong robustness, especially for fine-grained categories and complex scenes. Ablation experiments further validated the effectiveness and complementarity of each module. This study provides an efficient collaborative modeling framework for hyperspectral image classification that balances global dependencies and local details. Full article
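The "local spectral smoothing" step behind the adaptive spectral–spatial tokens can be sketched as a moving average along the spectral axis of a hyperspectral patch, with one token per pixel. The cube dimensions, window size, and function name below are illustrative assumptions; the paper's learnable spatial embedding is omitted.

```python
# Hedged sketch of local spectral smoothing for per-pixel token construction.
import numpy as np

def spectral_smooth(cube: np.ndarray, window: int = 3) -> np.ndarray:
    """Smooth each pixel's spectrum with a centered moving average.
    cube: (H, W, B) hyperspectral patch with B spectral bands."""
    pad = window // 2
    padded = np.pad(cube, ((0, 0), (0, 0), (pad, pad)), mode="edge")
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="valid"), 2, padded)

cube = np.random.default_rng(1).normal(size=(7, 7, 30))
tokens = spectral_smooth(cube).reshape(-1, 30)  # one 30-band token per pixel
```

Each row of `tokens` would then receive the learnable spatial embedding before entering the Transformer branch.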

32 pages, 1763 KB  
Article
Deep Learning-Based Visual Analytics for Efficiency and Safety Optimization in Power Infrastructure
by Olga Vladimirovna Afanaseva, Timur Faritovich Tulyakov and Artur Airatovich Shaimardanov
Eng 2026, 7(3), 135; https://doi.org/10.3390/eng7030135 - 15 Mar 2026
Abstract
The paper presents a comprehensive deep learning-based framework for automated visual inspection of overhead power line infrastructure using unmanned aerial vehicles. Traditional manual and helicopter inspections are costly, time-consuming, and hazardous for maintenance personnel. The proposed approach integrates UAV imaging with advanced computer vision models such as YOLOv8, EfficientDet-D2, and Faster R-CNN to automatically detect defects in critical components, including insulators, conductors, and transmission towers. Several open datasets (InsPLAD, TTPLA, MPID) were used for training and validation, ensuring robustness under diverse lighting and environmental conditions. Experimental results demonstrate that YOLOv8 achieved the best performance, reaching 88.5% mAP@0.5 with real-time inference capabilities (over 50 FPS on GPU). The system significantly enhances inspection efficiency, allowing for a threefold increase in coverage capacity and an up to 70% reduction in defect remediation time. The integration of AI-powered visual analytics with maintenance and SCADA systems enables a shift from reactive to predictive maintenance, improving the safety, reliability, and resilience of power transmission infrastructure. Full article
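The mAP@0.5 figure reported above rests on an intersection-over-union test: a predicted box counts as a true positive when its IoU with a ground-truth box reaches 0.5. A minimal sketch of that test, with illustrative `[x1, y1, x2, y2]` boxes rather than data from the paper:

```python
# Hedged sketch of the IoU matching rule behind mAP@0.5.
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

gt = [10, 10, 50, 50]      # ground-truth insulator box
pred = [12, 12, 52, 52]    # slightly shifted detection
hit = iou(gt, pred) >= 0.5  # true positive at the 0.5 threshold
```

Averaging precision over recall levels per class, then over classes, yields the mAP@0.5 value used to compare the detectors.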
(This article belongs to the Section Electrical and Electronic Engineering)

27 pages, 5361 KB  
Article
Dual-Stream 2D and 3D-SE-ResNet Architectures for Crop Mapping Using EnMAP Hyperspectral Time-Series
by László Mucsi, Márkó Sóti, Dorottya Litkey-Kovács, János Mészáros, Dóra Vigh-Szabó, Elemér Szalma, Zalán Tobak and József Szatmári
Remote Sens. 2026, 18(6), 884; https://doi.org/10.3390/rs18060884 - 13 Mar 2026
Abstract
Deep learning-based crop mapping from hyperspectral satellite data offers immense potential for capturing subtle phenological differences, yet leveraging sparse time series remains a major methodological challenge. This study evaluates the ability of the EnMAP sensor to identify nine major crop types in the intensive agricultural landscape of Southeastern Hungary. We utilized a limited time series (November, March, August) to benchmark two modeling strategies: a single-date dual-stream spatial–spectral 2D-CNN (DSS-2D) and a multi-temporal 3D-SE-ResNet. Model performance was assessed using parcel-level spatial cross-validation to ensure realistic accuracy estimates and reduce spatial autocorrelation bias. The results demonstrate that the DSS-2D model achieved superior single-date accuracy (OA > 97%), significantly outperforming pixel-based baselines. Furthermore, the multi-temporal 3D-SE-ResNet achieved a robust seasonal accuracy of 92.9%, effectively compensating for temporal sparsity by exploiting the deep spectral information of the SWIR domain. This study confirms that treating hyperspectral data as a 3D volume enables the extraction of phenological traits even from limited observations. These findings provide a strong proof-of-concept for the operational feasibility of future missions such as Copernicus CHIME for continental-scale food security monitoring. Full article
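Treating the sparse multi-temporal hyperspectral series as a single 3D volume, as the abstract describes, can be sketched by stacking the acquisition dates along the spectral axis and cutting a spatial–spectral patch per labeled pixel as 3D-CNN input. The shapes below are illustrative, not EnMAP's actual band counts or scene sizes.

```python
# Hedged sketch: stack three dates into one cube and extract 3D patches.
import numpy as np

rng = np.random.default_rng(7)
dates = [rng.normal(size=(20, 20, 50)) for _ in range(3)]  # 3 dates, 50 bands each
cube = np.concatenate(dates, axis=2)                       # (20, 20, 150)

def patch(cube, row, col, size=5):
    """size x size spatial neighborhood with the full stacked spectral axis."""
    half = size // 2
    return cube[row - half:row + half + 1, col - half:col + half + 1, :]

sample = patch(cube, 10, 10)   # (5, 5, 150) volume for one labeled pixel
```

Parcel-level spatial cross-validation would then assign whole parcels (not individual pixels) to folds, mirroring the evaluation protocol above.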
