Search Results (77)

Search Parameters:
Keywords = joint fusion convolutional neural network

31 pages, 10745 KB  
Article
CNN-GCN Coordinated Multimodal Frequency Network for Hyperspectral Image and LiDAR Classification
by Haibin Wu, Haoran Lv, Aili Wang, Siqi Yan, Gabor Molnar, Liang Yu and Minhui Wang
Remote Sens. 2026, 18(2), 216; https://doi.org/10.3390/rs18020216 - 9 Jan 2026
Viewed by 167
Abstract
Existing multimodal image classification methods often suffer from several key limitations: difficulty in effectively balancing local detail and global topological relationships in hyperspectral image (HSI) feature extraction; insufficient multi-scale characterization of terrain features from light detection and ranging (LiDAR) elevation data; and neglect of deep inter-modal interactions in traditional fusion methods, often accompanied by high computational complexity. To address these issues, this paper proposes a comprehensive deep learning framework combining a convolutional neural network (CNN), a graph convolutional network (GCN), and the wavelet transform for the joint classification of HSI and LiDAR data. The framework includes several novel components: a Spectral Graph Mixer Block (SGMB), in which a CNN branch captures fine-grained spectral–spatial features through multi-scale convolutions while a parallel GCN branch models long-range contextual features through an enhanced gated graph network, enabling simultaneous extraction of local detail and global topological features from HSI data; a Spatial Coordinate Block (SCB) that enhances spatial awareness and improves the perception of object contours and distribution patterns; a Multi-Scale Elevation Feature Extraction Block (MSFE) that captures terrain representations across varying scales; and a Bidirectional Frequency Attention Encoder (BiFAE) that enables efficient and deep interaction between multimodal features. These modules work in concert as a cohesive end-to-end framework, which not only achieves a more effective balance between local details and global contexts but also enables deep yet computationally efficient cross-feature interaction, significantly strengthening the discriminability and robustness of the learned representations. To evaluate the proposed method, we conducted experiments on three multimodal remote sensing datasets: Houston2013, Augsburg, and Trento. Quantitative results demonstrate that our framework outperforms state-of-the-art methods, achieving overall accuracy (OA) values of 98.93%, 88.05%, and 99.59% on the respective datasets.
(This article belongs to the Section AI Remote Sensing)
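
The SGMB's dual-path design lends itself to a compact illustration: one branch convolves an HSI patch for local spectral–spatial detail, the other propagates node features over a graph for long-range topology, and the two outputs are fused for classification. The PyTorch sketch below is a minimal reading of that idea; the layer sizes, identity adjacency, and concatenation fusion are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):     # h: (N, in_dim), a_hat: (N, N) normalized adjacency
        return torch.relu(a_hat @ self.lin(h))

class DualPathBlock(nn.Module):
    def __init__(self, bands, n_classes, hidden=64):
        super().__init__()
        # CNN branch: multi-scale convolutions over a spatial patch of the HSI cube
        self.cnn = nn.Sequential(
            nn.Conv2d(bands, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # GCN branch: two graph convolutions over pixel/superpixel nodes
        self.gcn1 = SimpleGCNLayer(bands, hidden)
        self.gcn2 = SimpleGCNLayer(hidden, hidden)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, patch, node_feats, a_hat):
        local_feat = self.cnn(patch)                # (B, hidden) local detail
        g = self.gcn2(self.gcn1(node_feats, a_hat), a_hat)
        global_feat = g.mean(dim=0, keepdim=True)   # pool the graph to one vector
        global_feat = global_feat.expand(local_feat.size(0), -1)
        return self.head(torch.cat([local_feat, global_feat], dim=1))

# Toy usage: 8 patches of a 30-band HSI, a 50-node graph with identity adjacency.
model = DualPathBlock(bands=30, n_classes=6)
patch = torch.randn(8, 30, 9, 9)
nodes, a_hat = torch.randn(50, 30), torch.eye(50)
print(model(patch, nodes, a_hat).shape)             # torch.Size([8, 6])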

28 pages, 13623 KB  
Article
PAFNet: A Parallel Attention Fusion Network for Water Body Extraction of Remote Sensing Images
by Shaochuan Chen, Chenlong Ding, Mutian Li, Xin Lyu, Xin Li, Zhennan Xu, Yiwei Fang and Heng Li
Remote Sens. 2026, 18(1), 153; https://doi.org/10.3390/rs18010153 - 3 Jan 2026
Viewed by 170
Abstract
Water body extraction plays a crucial role in remote sensing, supporting applications such as environmental monitoring and disaster prevention. Although Deep Convolutional Neural Networks (DCNNs) have achieved remarkable progress, their hierarchical architectures often introduce channel redundancy and hinder the joint representation of fine spatial structures and high-level semantics, leading to ineffective feature fusion and poor discrimination of water features. To address these limitations, a Parallel Attention Fusion Network (PAFNet) is proposed to achieve more effective multi-scale feature aggregation through parallel attention and adaptive fusion. First, the Feature Refinement Module (FRM) employs multi-branch asymmetric convolutions to extract multi-scale features, which are subsequently fused to suppress channel redundancy and preserve fine spatial details. Then, the Parallel Attention Module (PAM) applies spatial and channel attention in parallel, improving the discriminative representation of water features while mitigating interference from spectrally similar land covers. Finally, a Semantic Feature Fusion Module (SFM) integrates adjacent multi-level features through adaptive channel weighting, thereby achieving precise boundary recovery and robust noise suppression. Extensive experiments conducted on four representative datasets (GID, LandCover.ai, QTPL, and LoveDA) demonstrate the superiority of PAFNet over existing state-of-the-art methods. Specifically, the proposed model achieves 94.29% OA and 95.95% F1-Score on GID, 86.17% OA and 88.70% F1-Score on LandCover.ai, 98.99% OA and 98.96% F1-Score on QTPL, and 89.01% OA and 85.59% F1-Score on LoveDA.
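
The "parallel rather than sequential" attention idea in the PAM can be sketched briefly: channel and spatial attention both see the same input, and their refinements are combined instead of chained. The exact attention forms and the additive combination below are assumptions for illustration, not PAFNet's actual module.

import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per channel
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        # Spatial attention: 7x7 conv over pooled per-pixel channel statistics
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        ca = self.channel(x)                    # (B, C, 1, 1)
        stats = torch.cat([x.mean(1, keepdim=True),
                           x.amax(1, keepdim=True)], dim=1)
        sa = self.spatial(stats)                # (B, 1, H, W)
        # Parallel application: both branches see the same input,
        # and their refinements are summed instead of chained.
        return x * ca + x * sa

feat = torch.randn(2, 64, 32, 32)
print(ParallelAttention(64)(feat).shape)        # torch.Size([2, 64, 32, 32])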

25 pages, 2546 KB  
Article
From Joint Distribution Alignment to Spatial Configuration Learning: A Multimodal Financial Governance Diagnostic Framework to Enhance Capital Market Sustainability
by Wenjuan Li, Xinghua Liu, Ziyi Li, Zulei Qin, Jinxian Dong and Shugang Li
Sustainability 2025, 17(24), 11236; https://doi.org/10.3390/su172411236 - 15 Dec 2025
Viewed by 267
Abstract
Financial fraud, as a salient manifestation of corporate governance failure, erodes investor confidence and threatens the long-term sustainability of capital markets. This study develops and validates SFG-2DCNN, a multimodal deep learning framework that adopts a configurational perspective to diagnose financial fraud under class-imbalanced conditions and support sustainable corporate governance. Conventional diagnostic approaches struggle to capture the higher-order interactions within covert fraud patterns due to scarce fraud samples and complex multimodal signals. To overcome these limitations, SFG-2DCNN adopts a systematic two-stage mechanism. First, to ensure a logically consistent data foundation, the framework builds a domain-adaptive generative model (SMOTE-FraudGAN) that enforces joint distribution alignment so that synthetic samples remain economically coherent. Subsequently, the framework introduces a feature topology mapping strategy that spatializes extracted multimodal covert signals, including non-traditional indicators (e.g., Total Liabilities/Operating Costs) and affective dissonance in managerial narratives, into an ordered two-dimensional matrix, enabling a two-dimensional convolutional neural network (2D-CNN) to efficiently identify potential governance failure patterns through deep spatial fusion. Experiments on Chinese A-share listed firms demonstrate that SFG-2DCNN achieves an F1-score of 0.917 and an AUC of 0.942, significantly outperforming baseline models. By advancing the analytical paradigm from isolated variable assessment to holistic multimodal configurational analysis, this research provides a high-fidelity tool for strengthening sustainable corporate governance and market transparency.
(This article belongs to the Section Economic and Business Aspects of Sustainability)
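
The feature topology mapping step can be pictured with a small sketch: a firm's 1-D indicator vector is laid out on an ordered 2-D grid (so that related indicators sit adjacently) and fed to a 2D-CNN. The grid size, zero-padding, and network below are hypothetical; the paper's actual mapping strategy is not reproduced here.

import torch
import torch.nn as nn

def to_feature_matrix(x, side):
    """Map a (B, F) indicator vector to a (B, 1, side, side) image,
    zero-padding when F < side * side. The feature order encodes the
    'topology': correlated indicators should be placed adjacently."""
    b, f = x.shape
    x = torch.nn.functional.pad(x, (0, side * side - f))
    return x.view(b, 1, side, side)

cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                    # fraud / non-fraud logits
)

indicators = torch.randn(4, 60)          # 60 multimodal indicators per firm
logits = cnn(to_feature_matrix(indicators, side=8))
print(logits.shape)                      # torch.Size([4, 2])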

27 pages, 5068 KB  
Article
Grape Leaf Cultivar Identification in Complex Backgrounds with an Improved MobileNetV3-Small Model
by Liuyun Deng, Zhiguo Du, Xiaoyong Liu, Zhihui Wu, Xudong Lin and Bin Wen
Plants 2025, 14(23), 3581; https://doi.org/10.3390/plants14233581 - 24 Nov 2025
Viewed by 562
Abstract
Accurate identification of grape leaf varieties is an important prerequisite for effective viticulture management, contributing to breeding programs, cultivation strategies, and precision field operations. However, reliable recognition in complex field environments remains challenging. Subtle interclass morphological variations among leaves, background interference under natural conditions, and the need to balance recognition accuracy with computational efficiency for mobile applications are key obstacles that limit practical deployment. This study proposes an improved lightweight convolutional neural network, termed ICS-MobileNetV3-Small (ICS-MS), specifically designed for grape leaf variety recognition. The model's core innovations, detailed in the section Key Innovations of the Proposed ICS-MS Model, comprise three key components. First, a coordinate attention mechanism is embedded to enhance the network's ability to capture spatially distributed features while suppressing irrelevant background noise. Second, a multi-branch ICS-Inception structure is integrated to accomplish effective multi-scale feature fusion, allowing the model to discern minute textural variations among cultivars. Third, the feature representation is further optimized by adopting a joint loss function, which improves the feature space distribution and enhances classification robustness. Experimental evaluations were conducted on a dataset comprising eleven grape leaf varieties. The proposed ICS-MS model achieves a recognition accuracy of 96.53% with only 1.17 M parameters. Compared with the baseline MobileNetV3-Small model, the standalone integration of the Coordinate Attention (CA) mechanism improves accuracy by 0.17% while reducing the number of parameters by 10.4%; incorporating the ICS-Inception structure yields an additional 4.78% accuracy improvement with only a marginal increase in parameter count; and the joint loss function provides a further 0.23% gain in accuracy, for an overall parameter reduction of approximately 23.5% relative to the baseline. Three core contributions are highlighted: (1) an integrated technical framework of “spatial feature enhancement—multi-scale fusion—feature distribution optimization” that systematically addresses insufficient fine-grained feature extraction and the trade-off between lightweight design and accuracy; (2) a lightweight CA-Block module that reduces parameters by 18.7% while enhancing spatial feature discrimination; (3) superior performance with fewer parameters, providing a practical solution for mobile deployment in precision viticulture. Precision, recall, and F1-score were consistently near 96%, indicating a good trade-off between efficiency and accuracy. These findings suggest that ICS-MS provides a practical and reliable approach for grape leaf identification and may serve as a useful tool to support intelligent management in precision viticulture.
(This article belongs to the Section Plant Modeling)
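
The joint-loss idea, softmax cross-entropy plus a center loss that pulls same-class embeddings toward learnable class centers, follows a standard pattern; a minimal sketch is below. The weighting lam, the embedding size, and the plain (rather than improved) center-loss form are assumptions.

import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, n_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_classes, feat_dim))

    def forward(self, feats, labels):           # feats: (B, D), labels: (B,)
        # squared distance of each embedding to its class center
        return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

ce = nn.CrossEntropyLoss()
center = CenterLoss(n_classes=11, feat_dim=128)   # 11 grape cultivars
lam = 0.01                                        # assumed loss weighting

feats = torch.randn(16, 128, requires_grad=True)  # penultimate-layer features
logits = torch.randn(16, 11, requires_grad=True)
labels = torch.randint(0, 11, (16,))

loss = ce(logits, labels) + lam * center(feats, labels)
loss.backward()                                   # class centers train jointly
print(float(loss))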

24 pages, 2221 KB  
Article
Multi-Scale Frequency-Aware Transformer for Pipeline Leak Detection Using Acoustic Signals
by Menghan Chen, Yuchen Lu, Wangyu Wu, Yanchen Ye, Bingcai Wei and Yao Ni
Sensors 2025, 25(20), 6390; https://doi.org/10.3390/s25206390 - 16 Oct 2025
Cited by 3 | Viewed by 1020
Abstract
Pipeline leak detection through acoustic signal measurement faces critical challenges, including insufficient utilization of time-frequency domain features, poor adaptability to noisy environments, and inadequate exploitation of frequency-domain prior knowledge in existing deep learning approaches. This paper proposes a Multi-Scale Frequency-Aware Transformer (MSFAT) architecture that integrates measurement-based acoustic signal analysis with artificial intelligence techniques. The MSFAT framework consists of four core components: a frequency-aware embedding layer that achieves joint representation learning of time-frequency dual-domain features through parallel temporal convolution and frequency transformation; a multi-head frequency attention mechanism that dynamically adjusts attention weights based on spectral distribution, using frequency features as modulation signals; an adaptive noise filtering module that integrates noise detection, signal enhancement, and adaptive fusion through end-to-end joint optimization; and a multi-scale feature aggregation mechanism that extracts discriminative global representations through complementary pooling strategies. The proposed method addresses the fundamental limitations of traditional measurement-based detection systems by incorporating domain-specific prior knowledge into the neural network architecture. Experimental validation demonstrates that MSFAT achieves 97.2% accuracy, with accuracy and F1-score improvements of 10.5% and 10.9%, respectively, over standard Transformer approaches. The model maintains robust detection performance across signal-to-noise ratios ranging from 5 to 30 dB, demonstrating adaptability to complex industrial measurement environments. Ablation studies confirm the effectiveness of each module, with the frequency-aware mechanisms contributing most significantly to the enhanced measurement precision and reliability in pipeline leak detection.
(This article belongs to the Section Intelligent Sensors)
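
The frequency-aware embedding layer's "parallel temporal convolution and frequency transformation" can be sketched as a 1-D convolution branch running alongside an rFFT-magnitude projection, concatenated per window. Dimensions and the concatenation rule below are assumptions, not MSFAT's actual layer.

import torch
import torch.nn as nn

class FreqAwareEmbedding(nn.Module):
    def __init__(self, win, d_model):
        super().__init__()
        half = d_model // 2
        self.time_conv = nn.Conv1d(1, half, kernel_size=7, padding=3)
        self.freq_proj = nn.Linear(win // 2 + 1, half)   # rFFT bins -> half

    def forward(self, x):                  # x: (B, T, win) windowed waveform
        b, t, w = x.shape
        time_feat = self.time_conv(x.reshape(b * t, 1, w)).mean(-1)
        time_feat = time_feat.view(b, t, -1)             # (B, T, half)
        spec = torch.fft.rfft(x, dim=-1).abs()           # (B, T, win//2+1)
        freq_feat = self.freq_proj(spec)                 # (B, T, half)
        return torch.cat([time_feat, freq_feat], dim=-1) # (B, T, d_model)

emb = FreqAwareEmbedding(win=256, d_model=128)
acoustic = torch.randn(2, 50, 256)         # 50 windows of a leak recording
print(emb(acoustic).shape)                 # torch.Size([2, 50, 128])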

20 pages, 3591 KB  
Article
Abnormal Gait Phase Recognition and Limb Angle Prediction in Lower-Limb Exoskeletons
by Sheng Wang, Chunjie Chen and Xiaojun Wu
Biomimetics 2025, 10(9), 623; https://doi.org/10.3390/biomimetics10090623 - 16 Sep 2025
Viewed by 1097
Abstract
The phase detection of abnormal gait and the prediction of lower-limb angles are key challenges in controlling lower-limb exoskeletons. This study simulated three types of abnormal gaits: scissor gait, foot-drop gait, and staggering gait. To enhance the recognition of abnormal gait phases, a four-discrete-phase division for a single leg is proposed: pre-swing, swing, swing termination, and stance. The four phases of both legs further constitute four stages of walking. Using the Euler angles of the ankle joints as inputs, the capabilities of a Convolutional Neural Network and a Support Vector Machine in recognizing discrete gait phases are verified. Based on these discrete gait phases, continuous phase estimation is then performed using an adaptive frequency oscillator. For predicting the lower-limb motion angle, this study proposes an input scheme that integrates three-axis ankle joint angles and continuous gait phases. Comparative experiments confirmed that this information fusion scheme improves limb angle prediction accuracy, with the Convolutional Neural Network–Long Short-Term Memory network yielding the best prediction results.
(This article belongs to the Section Locomotion and Bioinspired Robotics)
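
For the continuous-phase step, a classic adaptive frequency (Hopf) oscillator illustrates how phase and frequency can lock onto a periodic gait signal; the Righetti-style formulation below is a generic example driven by a synthetic 1.2 Hz input, not necessarily the paper's exact oscillator.

import numpy as np

def adaptive_hopf(signal, dt, eps=2.0, gamma=8.0, mu=1.0, omega0=4.0):
    """Adaptive Hopf oscillator: returns per-sample phase and final omega."""
    x, y, omega = 1.0, 0.0, omega0
    phases = np.empty(len(signal))
    for i, f in enumerate(signal):
        r2 = x * x + y * y
        dx = gamma * (mu - r2) * x - omega * y + eps * f
        dy = gamma * (mu - r2) * y + omega * x
        domega = -eps * f * y / (np.sqrt(r2) + 1e-9)   # frequency adaptation
        x, y, omega = x + dt * dx, y + dt * dy, omega + dt * domega
        phases[i] = np.arctan2(y, x)      # continuous phase in (-pi, pi]
    return phases, omega

dt = 0.01
t = np.arange(0.0, 300.0, dt)
gait = np.sin(2 * np.pi * 1.2 * t)        # synthetic 1.2 Hz ankle-angle proxy
phases, omega = adaptive_hopf(gait, dt)
print(f"adapted frequency: {omega / (2 * np.pi):.2f} Hz")  # should settle near 1.2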

25 pages, 21209 KB  
Article
Hyperspectral Image Classification Using a Spectral-Cube Gated Harmony Network
by Nana Li, Wentao Shen and Qiuwen Zhang
Electronics 2025, 14(17), 3553; https://doi.org/10.3390/electronics14173553 - 6 Sep 2025
Cited by 1 | Viewed by 855
Abstract
In recent years, hybrid models that integrate Convolutional Neural Networks (CNNs) with Vision Transformers (ViTs) have achieved significant improvements in hyperspectral image classification (HSIC). Nevertheless, their complex architectures often lead to computational redundancy and inefficient feature fusion, and they particularly struggle to balance global modeling and local detail extraction in high-dimensional spectral data. To address these issues, this paper proposes a Spectral-Cube Gated Harmony Network (SCGHN) that achieves efficient spectral–spatial joint feature modeling through a dynamic gating mechanism and a hierarchical feature decoupling strategy. This paper makes three primary contributions. First, we design a Spectral Cooperative Parallel Convolution (SCPC) module that combines dynamic gating in the spectral dimension with spatially deformable convolution. This module adopts a dual-path parallel architecture that adaptively enhances key bands and captures local textures, thereby significantly improving feature discriminability at mixed ground-object boundaries. Second, we propose a Dual-Gated Fusion (DGF) module that achieves cross-scale contextual complementarity through group convolution and lightweight attention, thereby enhancing hierarchical semantic representations at significantly lower computational complexity. Finally, through the coordinated design of 3D convolution and lightweight classification decision blocks, we construct an end-to-end lightweight framework that effectively alleviates the structural redundancy of traditional hybrid models. Extensive experiments on three standard hyperspectral datasets show that SCGHN requires fewer parameters and exhibits lower computational complexity than existing HSIC methods.
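
The gating idea behind the DGF module can be illustrated minimally: two feature streams produce sigmoid gates conditioned on both inputs and mask each other before merging. Gate shapes and the merge rule below are assumptions about the module, not its published design.

import torch
import torch.nn as nn

class DualGatedFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate_a = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, a, b):               # a, b: (B, dim) from two scales
        joint = torch.cat([a, b], dim=-1)
        a_gated = a * self.gate_a(joint)   # each gate sees both streams
        b_gated = b * self.gate_b(joint)
        return self.out(torch.cat([a_gated, b_gated], dim=-1))

fuse = DualGatedFusion(dim=64)
fine, coarse = torch.randn(8, 64), torch.randn(8, 64)
print(fuse(fine, coarse).shape)            # torch.Size([8, 64])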

17 pages, 1294 KB  
Article
SPARSE-OTFS-Net: A Sparse Robust OTFS Signal Detection Algorithm for 6G Ubiquitous Coverage
by Yunzhi Ling and Jun Xu
Electronics 2025, 14(17), 3532; https://doi.org/10.3390/electronics14173532 - 4 Sep 2025
Viewed by 929
Abstract
With the evolution of 6G technology toward global coverage and multidimensional integration, OTFS modulation has become a research focus due to its advantages in high-mobility scenarios. However, existing OTFS signal detection algorithms face challenges such as pilot contamination, Doppler spread degradation, and diverse interference in complex environments. This paper proposes the SPARSE-OTFS-Net algorithm, which establishes a comprehensive signal detection solution by integrating sparse random pilot design, compressive sensing-based frequency offset estimation with closed-loop cancellation, and joint denoising that combines an autoencoder, residual learning, and multi-scale feature fusion. The algorithm employs deep learning to dynamically generate non-uniform pilot distributions, reducing pilot contamination by 60%. Through orthogonal matching pursuit, it achieves super-resolution frequency offset estimation with tracking errors kept within 20 Hz, effectively addressing Doppler spread degradation. The multi-stage denoising mechanism of the deep neural networks suppresses various interferences while preserving time-frequency domain signal sparsity. Simulation results demonstrate that, under large frequency offset, multipath, and low-SNR conditions, multi-kernel convolution achieves a significant reduction in computational complexity while delivering outstanding tracking-error and weak-multipath detection performance. In 1000 km/h high-speed mobility scenarios, Doppler error estimation accuracy reaches ±25 Hz (approaching the Cramér-Rao bound), with a BER of 5.0 × 10⁻⁶ (a 7× improvement over the single-Gaussian CNN's 3.5 × 10⁻⁵). In 1024-user interference scenarios requiring BER = 10⁻⁵, the SNR demand decreases from 11.4 dB to 9.2 dB (a 2.2 dB reduction), while EVM is maintained at 6.5% under 1024-user concurrency (versus 16.5% for conventional MMSE), effectively increasing concurrent user capacity in 6G ultra-massive connectivity scenarios. These results validate the superior performance of SPARSE-OTFS-Net in 6G ultra-massive connectivity applications and provide critical technical support for realizing integrated space–air–ground networks.
(This article belongs to the Section Microwave and Wireless Communications)
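
Orthogonal matching pursuit, the greedy sparse-recovery routine underlying the frequency-offset estimator, is standard; below is a generic OMP for recovering a k-sparse vector from y = A x with a random dictionary, not the paper's OTFS-specific measurement model.

import numpy as np

def omp(A, y, k):
    """Greedy recovery of a k-sparse x with measurement matrix A."""
    residual, support = y.copy(), []
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.conj().T @ residual)))
        support.append(j)
        As = A[:, support]
        x_s, *_ = np.linalg.lstsq(As, y, rcond=None)   # least squares on support
        residual = y - As @ x_s
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = x_s
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256)) / np.sqrt(64)      # random dictionary
x_true = np.zeros(256, dtype=complex)
x_true[[10, 90, 200]] = [1.0, -0.7, 0.4]              # 3 sparse channel taps
x_hat = omp(A, A @ x_true, k=3)
print(np.flatnonzero(np.abs(x_hat) > 1e-6))           # should recover [10 90 200]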

16 pages, 1386 KB  
Article
Balancing Energy Consumption and Detection Accuracy in Cardiovascular Disease Diagnosis: A Spiking Neural Network-Based Approach with ECG and PCG Signals
by Guihao Ran, Yijing Wang, Han Zhang, Jiahui Cheng and Dakun Lai
Sensors 2025, 25(17), 5263; https://doi.org/10.3390/s25175263 - 24 Aug 2025
Viewed by 1324
Abstract
Electrocardiogram (ECG) and phonocardiogram (PCG) signals are widely used in the early prevention and diagnosis of cardiovascular diseases (CVDs) due to their ability to accurately reflect cardiac conditions from different physiological perspectives and their ease of acquisition. Some studies have explored the joint use of ECG and PCG signals for disease screening, but few have considered the trade-off between classification performance and energy consumption in model design. In this study, we propose a multimodal CVD detection framework based on Spiking Neural Networks (SNNs) that integrates ECG and PCG signals. A signal-level differential fusion strategy is employed to generate a fused EPCG signal, from which time–frequency features are extracted using the Adaptive Superlets Transform (ASLT). Two separate Spiking Convolutional Neural Network (SCNN) models are then trained on the ECG and EPCG signals, respectively, and a confidence-based dynamic decision-level (CDD) fusion strategy performs the final classification. The proposed method is validated on the PhysioNet/CinC Challenge 2016 dataset, achieving an accuracy of 89.74%, an AUC of 89.08%, and an energy consumption of 209.6 μJ. The method not only outperforms unimodal baselines but also strikes an effective balance between energy consumption and classification performance, offering a practical path toward low-power, multimodal medical diagnostic systems.
(This article belongs to the Special Issue Sensors for Heart Rate Monitoring and Cardiovascular Disease)
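
The confidence-based dynamic decision-level fusion can be sketched as a simple rule: trust the branch whose softmax is most confident, and fall back to averaging when neither clears a threshold. The threshold tau and the exact rule below are assumptions about the CDD strategy.

import torch

def cdd_fusion(logits_ecg, logits_epcg, tau=0.8):
    p_ecg = torch.softmax(logits_ecg, dim=-1)
    p_epcg = torch.softmax(logits_epcg, dim=-1)
    conf_ecg = p_ecg.max(-1).values
    conf_epcg = p_epcg.max(-1).values
    fused = (p_ecg + p_epcg) / 2                  # default: average both branches
    use_ecg = (conf_ecg >= tau) & (conf_ecg >= conf_epcg)
    use_epcg = (conf_epcg >= tau) & (conf_epcg > conf_ecg)
    fused[use_ecg] = p_ecg[use_ecg]               # confident branch wins
    fused[use_epcg] = p_epcg[use_epcg]
    return fused.argmax(-1)

ecg = torch.tensor([[2.5, 0.1], [0.2, 0.3]])      # confident / unsure
epcg = torch.tensor([[0.0, 0.4], [0.1, 1.9]])     # unsure / confident
print(cdd_fusion(ecg, epcg))                      # tensor([0, 1])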

19 pages, 738 KB  
Article
Short-Term Multi-Energy Load Forecasting Method Based on Transformer Spatio-Temporal Graph Neural Network
by Heng Zhou, Qing Ai and Ruiting Li
Energies 2025, 18(17), 4466; https://doi.org/10.3390/en18174466 - 22 Aug 2025
Cited by 1 | Viewed by 1738
Abstract
To tackle the limitations of existing methods in simultaneously modeling long-term dependencies in the time dimension and nonlinear interactions in the feature dimension, as well as their inability to fully reflect the impact of real-time load changes on spatial dependencies, a short-term multi-energy load forecasting method based on a Transformer Spatio-Temporal Graph neural network (TSTG) is proposed. The method employs a multi-head spatio-temporal attention module to model long-term temporal dependencies and nonlinear feature interactions in parallel across multiple subspaces. Additionally, a dynamic adaptive graph convolution module constructs adaptive adjacency matrices by combining physical topology and feature similarity, dynamically adjusting node connection weights based on real-time load characteristics to more accurately characterize the spatial dynamics of multi-energy interactions. Furthermore, TSTG adopts an end-to-end spatio-temporal joint optimization framework, achieving synchronous extraction and fusion of spatio-temporal features through an encoder–decoder architecture. Experimental results show that TSTG significantly outperforms existing methods in short-term load forecasting tasks, providing an effective solution for refined forecasting in integrated energy systems.
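
The dynamic adaptive graph convolution can be illustrated by blending a fixed physical topology with a feature-similarity graph and applying one normalized graph convolution. The blending weight alpha, the use of cosine similarity, and the toy four-node energy graph below are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def adaptive_adjacency(a_phys, node_feats, alpha=0.5):
    # similarity graph from current load features
    sim = F.cosine_similarity(node_feats.unsqueeze(1),
                              node_feats.unsqueeze(0), dim=-1)
    a = alpha * a_phys + (1 - alpha) * sim.clamp(min=0)
    a = a + torch.eye(a.size(0))                    # self-loops
    d_inv_sqrt = a.sum(-1).pow(-0.5).diag()
    return d_inv_sqrt @ a @ d_inv_sqrt              # symmetric normalization

class GraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):
        return torch.relu(a_hat @ self.lin(x))

# 4 energy nodes (e.g., electricity, heat, cooling, gas), 12 load features each
a_phys = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1],
                       [1, 0, 0, 0], [0, 1, 0, 0]], dtype=torch.float)
feats = torch.randn(4, 12)
a_hat = adaptive_adjacency(a_phys, feats)
print(GraphConv(12, 8)(feats, a_hat).shape)         # torch.Size([4, 8])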

26 pages, 36602 KB  
Article
FE-MCFN: Fuzzy-Enhanced Multi-Scale Cross-Modal Fusion Network for Hyperspectral and LiDAR Joint Data Classification
by Shuting Wei, Mian Jia and Junyi Duan
Algorithms 2025, 18(8), 524; https://doi.org/10.3390/a18080524 - 18 Aug 2025
Viewed by 945
Abstract
With the rapid advancement of remote sensing technologies, the joint classification of hyperspectral image (HSI) and LiDAR data has become a key research focus in the field. Classification is complicated by inherent uncertainties in hyperspectral images, such as the “same spectrum, different materials” and “same material, different spectra” phenomena, as well as by the complexity of spectral features; moreover, existing multimodal fusion approaches often fail to fully leverage the complementary advantages of hyperspectral and LiDAR data. We propose a fuzzy-enhanced multi-scale cross-modal fusion network (FE-MCFN) designed to achieve joint classification of hyperspectral and LiDAR data. FE-MCFN enhances convolutional neural networks through the application of fuzzy theory and effectively integrates global contextual information via a cross-modal attention mechanism. The fuzzy learning module utilizes a Gaussian membership function to assign weights to features, thereby capturing uncertainties and subtle distinctions within the data. To maximize the complementary advantages of multimodal data, a fuzzy fusion module is designed, which is grounded in fuzzy rules and integrates multimodal features across various scales while taking into account both local features and global information, ultimately enhancing the model's classification performance. Experimental results on the Houston2013, Trento, and MUUFL datasets demonstrate that the proposed method outperforms current state-of-the-art classification techniques, validating its effectiveness and applicability across diverse scenarios.
(This article belongs to the Section Databases and Data Structures)
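
The Gaussian-membership weighting of the fuzzy learning module can be sketched in a few lines: each feature dimension gets a Gaussian membership with a learnable center and width, and the membership value rescales the feature, down-weighting atypical responses. The initialization and per-dimension parameterization are assumptions.

import torch
import torch.nn as nn

class GaussianFuzzyLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(dim))         # membership centers
        self.log_sigma = nn.Parameter(torch.zeros(dim))  # membership widths

    def forward(self, x):                                # x: (B, dim)
        sigma = self.log_sigma.exp()
        membership = torch.exp(-((x - self.mu) ** 2) / (2 * sigma ** 2))
        # features far from their center get low membership and are damped
        return x * membership

fuzzy = GaussianFuzzyLayer(dim=16)
spectral = torch.randn(4, 16)
print(fuzzy(spectral).shape)                             # torch.Size([4, 16])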

25 pages, 26404 KB  
Review
Review of Deep Learning Applications for Detecting Special Components in Agricultural Products
by Yifeng Zhao and Qingqing Xie
Computers 2025, 14(8), 309; https://doi.org/10.3390/computers14080309 - 30 Jul 2025
Cited by 2 | Viewed by 1650
Abstract
The rapid evolution of deep learning (DL) has fundamentally transformed the paradigm for detecting special components in agricultural products, addressing critical challenges in food safety, quality control, and precision agriculture. This comprehensive review systematically analyzes many seminal studies to evaluate cutting-edge DL applications across three core domains: contaminant surveillance (heavy metals, pesticides, and mycotoxins), nutritional component quantification (soluble solids, polyphenols, and pigments), and structural/biomarker assessment (disease symptoms, gel properties, and physiological traits). Emerging hybrid architectures—including attention-enhanced convolutional neural networks (CNNs) for lesion localization, wavelet-coupled autoencoders for spectral denoising, and multi-task learning frameworks for joint parameter prediction—demonstrate unprecedented accuracy in decoding complex agricultural matrices. Particularly noteworthy are sensor fusion strategies integrating hyperspectral imaging (HSI), Raman spectroscopy, and microwave detection with deep feature extraction, achieving industrial-grade performance (RPD > 3.0) while reducing detection time by 30–100× versus conventional methods. Nevertheless, persistent barriers in the “black-box” nature of complex models, severe lack of standardized data and protocols, computational inefficiency, and poor field robustness hinder the reliable deployment and adoption of DL for detecting special components in agricultural products. This review provides an essential foundation and roadmap for future research to bridge the gap between laboratory DL models and their effective, trusted application in real-world agricultural settings.
(This article belongs to the Special Issue Deep Learning and Explainable Artificial Intelligence)

35 pages, 4940 KB  
Article
A Novel Lightweight Facial Expression Recognition Network Based on Deep Shallow Network Fusion and Attention Mechanism
by Qiaohe Yang, Yueshun He, Hongmao Chen, Youyong Wu and Zhihua Rao
Algorithms 2025, 18(8), 473; https://doi.org/10.3390/a18080473 - 30 Jul 2025
Cited by 3 | Viewed by 2779
Abstract
Facial expression recognition (FER) is a critical research direction in artificial intelligence and is widely used in intelligent interaction, medical diagnosis, security monitoring, and other domains. These applications highlight its considerable practical value and social significance. FER models often need to run efficiently on mobile or edge devices, so research on lightweight FER is particularly important. However, the feature extraction and classification methods of most current lightweight convolutional neural network FER algorithms are not specifically optimized for the characteristics of facial expression images and fail to make full use of the feature information these images contain. To address the lack of FER models that are both lightweight and effectively optimized for expression-specific feature extraction, this study proposes a novel network tailored to the characteristics of facial expressions. Building on the backbone architecture of MobileNet V2, we design LightExNet, a lightweight convolutional neural network based on the fusion of deep and shallow layers, an attention mechanism, and a joint loss function, all adapted to facial expression features. In the LightExNet architecture, deep and shallow features are first fused to fully extract the shallow features of the original image, reduce information loss, alleviate vanishing gradients as the number of convolutional layers increases, and achieve multi-scale feature fusion; the MobileNet V2 architecture is also streamlined to integrate the deep and shallow networks seamlessly. Second, drawing on the characteristics of facial expression features, a new channel and spatial attention mechanism is proposed to encode as much feature information as possible from the different expression regions, effectively improving recognition accuracy. Finally, an improved center loss function is added to further improve classification accuracy, with corresponding measures taken to significantly reduce the computational cost of the joint loss. LightExNet is evaluated on three mainstream facial expression datasets: Fer2013, CK+, and RAF-DB. It has 3.27 M parameters and 298.27 M FLOPs, and achieves accuracies of 69.17%, 97.37%, and 85.97% on the three datasets, respectively. Its overall performance surpasses current mainstream lightweight expression recognition algorithms such as MobileNet V2, IE-DBN, Self-Cure Net, Improved MobileViT, MFN, Ada-CM, and Parallel CNN. The experimental results confirm that LightExNet improves recognition accuracy and computational efficiency while reducing energy consumption and enhancing deployment flexibility, underscoring its strong potential for real-world lightweight FER applications.
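
The deep and shallow layer fusion admits a short sketch: an early feature map is projected and pooled to match the deep map, then merged so fine facial detail survives the deep stack. The layer sizes, additive merge, and 48×48 grayscale input (Fer2013-sized) are illustrative, not LightExNet's actual configuration.

import torch
import torch.nn as nn

class DeepShallowFusion(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.shallow = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.deep = nn.Sequential(
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.shallow_proj = nn.Conv2d(16, 64, 1)        # match channel count
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, n_classes))

    def forward(self, x):                               # x: (B, 1, 48, 48)
        s = self.shallow(x)                             # fine facial detail
        d = self.deep(s)                                # semantic features
        s_down = nn.functional.adaptive_avg_pool2d(self.shallow_proj(s),
                                                   d.shape[-2:])
        return self.head(d + s_down)                    # fused representation

model = DeepShallowFusion()
faces = torch.randn(8, 1, 48, 48)                       # Fer2013-sized input
print(model(faces).shape)                               # torch.Size([8, 7])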

22 pages, 2525 KB  
Article
mmHSE: A Two-Stage Framework for Human Skeleton Estimation Using mmWave FMCW Radar Signals
by Jiake Tian, Yi Zou and Jiale Lai
Appl. Sci. 2025, 15(15), 8410; https://doi.org/10.3390/app15158410 - 29 Jul 2025
Viewed by 1515
Abstract
We present mmHSE, a two-stage framework for human skeleton estimation using dual millimeter-wave (mmWave) Frequency-Modulated Continuous-Wave (FMCW) radar signals. To enable data-driven model design and evaluation, we collect and process over 30,000 range–angle maps from 12 users across three representative indoor environments using a dual-node radar acquisition platform. Leveraging the collected data, we develop a two-stage neural architecture for human skeleton estimation. The first stage employs a dual-branch network with depthwise separable convolutions and self-attention to extract multi-scale spatiotemporal features from dual-view radar inputs; a cross-modal attention fusion module then generates initial estimates of 21 skeletal keypoints. The second stage refines these estimates using a skeletal topology module based on graph convolutional networks, which captures spatial dependencies among joints to enhance localization accuracy. Experiments show that mmHSE achieves a Mean Absolute Error (MAE) of 2.78 cm. In cross-domain evaluations, the MAE remains at 3.14 cm, demonstrating the method's generalization ability and robustness for non-intrusive human pose estimation from mmWave FMCW radar signals.
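
The cross-modal attention fusion of the two radar views can be sketched with one multi-head attention layer in which one view queries the other, followed by a projection to 21 keypoints. The single-layer design, dimensions, and regression head below are assumptions, not mmHSE itself.

import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, dim=64, heads=4, n_joints=21):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_joints = nn.Linear(dim, n_joints * 3)

    def forward(self, view_a, view_b):        # (B, T, dim) feature sequences
        # view A queries view B's features (cross-modal attention)
        fused, _ = self.attn(query=view_a, key=view_b, value=view_b)
        fused = (fused + view_a).mean(dim=1)  # residual + temporal pooling
        return self.to_joints(fused).view(-1, 21, 3)   # xyz per keypoint

fusion = CrossViewFusion()
va, vb = torch.randn(2, 10, 64), torch.randn(2, 10, 64)
print(fusion(va, vb).shape)                  # torch.Size([2, 21, 3])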

22 pages, 4882 KB  
Article
Dual-Branch Spatio-Temporal-Frequency Fusion Convolutional Network with Transformer for EEG-Based Motor Imagery Classification
by Hao Hu, Zhiyong Zhou, Zihan Zhang and Wenyu Yuan
Electronics 2025, 14(14), 2853; https://doi.org/10.3390/electronics14142853 - 17 Jul 2025
Cited by 1 | Viewed by 1565
Abstract
The decoding of motor imagery (MI) electroencephalogram (EEG) signals is crucial for motor control and rehabilitation. However, as feature extraction is the core component of the decoding process, traditional methods, often limited to single-feature domains or shallow time-frequency fusion, struggle to comprehensively capture the spatio-temporal-frequency characteristics of the signals, thereby limiting decoding accuracy. To address these limitations, this paper proposes a dual-branch neural network architecture with multi-domain feature fusion, the dual-branch spatio-temporal-frequency fusion convolutional network with Transformer (DB-STFFCNet). The DB-STFFCNet model consists of three modules: the spatiotemporal feature extraction module (STFE), the frequency feature extraction module (FFE), and the feature fusion and classification module. The STFE module employs a lightweight multi-dimensional attention network combined with a temporal Transformer encoder, capable of simultaneously modeling local fine-grained features and global spatiotemporal dependencies, effectively integrating spatiotemporal information and enhancing feature representation. The FFE module constructs a hierarchical feature refinement structure by leveraging the fast Fourier transform (FFT) and multi-scale frequency convolutions, while a frequency-domain Transformer encoder captures the global dependencies among frequency domain features, thus improving the model's ability to represent key frequency information. Finally, the fusion module effectively consolidates the spatiotemporal and frequency features to achieve accurate classification. To evaluate the feasibility of the proposed method, experiments were conducted on the BCI Competition IV-2a and IV-2b public datasets, achieving accuracies of 83.13% and 89.54%, respectively, outperforming existing methods. This study provides a novel solution for joint time-frequency representation learning in EEG analysis.
(This article belongs to the Special Issue Artificial Intelligence Methods for Biomedical Data Processing)
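
The dual-branch pattern, a temporal-convolution branch and an FFT-based frequency branch each refined by a small Transformer encoder and then fused, can be sketched as follows; the channel counts, token definitions, and single-layer encoders are assumptions rather than DB-STFFCNet's configuration (the input shape mirrors the 22-channel BCI IV-2a setting).

import torch
import torch.nn as nn

class DualBranchEEG(nn.Module):
    def __init__(self, ch=22, t_len=256, d=32, n_classes=4):
        super().__init__()
        self.temporal = nn.Conv1d(ch, d, kernel_size=15, padding=7)
        self.freq = nn.Linear(t_len // 2 + 1, d)          # per-channel rFFT bins
        self.enc_t = nn.TransformerEncoder(nn.TransformerEncoderLayer(
            d_model=d, nhead=4, batch_first=True), num_layers=1)
        self.enc_f = nn.TransformerEncoder(nn.TransformerEncoderLayer(
            d_model=d, nhead=4, batch_first=True), num_layers=1)
        self.head = nn.Linear(2 * d, n_classes)

    def forward(self, x):                     # x: (B, ch, t_len) raw EEG
        t_tokens = self.temporal(x).transpose(1, 2)            # (B, t_len, d)
        f_tokens = self.freq(torch.fft.rfft(x, dim=-1).abs())  # (B, ch, d)
        t_feat = self.enc_t(t_tokens).mean(1)  # pool time tokens
        f_feat = self.enc_f(f_tokens).mean(1)  # pool channel tokens
        return self.head(torch.cat([t_feat, f_feat], dim=-1))

eeg = torch.randn(4, 22, 256)                 # BCI IV-2a-like: 22 channels
print(DualBranchEEG()(eeg).shape)             # torch.Size([4, 4])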
