Search Results (1,981)

Search Parameters:
Keywords = feature level fusion

16 pages, 13271 KB  
Article
Smartphone-Based Estimation of Cotton Leaf Nitrogen: A Learning Approach with Multi-Color Space Fusion
by Shun Chen, Shizhe Qin, Yu Wang, Lulu Ma and Xin Lv
Agronomy 2025, 15(10), 2330; https://doi.org/10.3390/agronomy15102330 - 2 Oct 2025
Abstract
To address the limitations of traditional cotton leaf nitrogen content estimation methods, which include low efficiency, high cost, poor portability, and challenges in vegetation index acquisition owing to environmental interference, this study focused on emerging non-destructive nutrient estimation technologies. This study proposed an innovative method that integrates multi-color space fusion with deep and machine learning to estimate cotton leaf nitrogen content using smartphone-captured digital images. A dataset comprising smartphone-acquired cotton leaf images was processed through threshold segmentation and preprocessing, then converted into RGB, HSV, and Lab color spaces. The models were developed using deep-learning architectures including AlexNet, VGGNet-11, and ResNet-50. The conclusions of this study are as follows: (1) The optimal single-color-space nitrogen estimation model achieved a validation set R2 of 0.776. (2) Feature-level fusion by concatenation of multidimensional feature vectors extracted from three color spaces using the optimal model, combined with an attention learning mechanism, improved the validation R2 to 0.827. (3) Decision-level fusion by concatenating nitrogen estimation values from optimal models of different color spaces into a multi-source decision dataset, followed by machine learning regression modeling, increased the final validation R2 to 0.830. The dual fusion method effectively enabled rapid and accurate nitrogen estimation in cotton crops using smartphone images, achieving an accuracy 5–7% higher than that of single-color-space models. The proposed method provides scientific support for efficient cotton production and promotes sustainable development in the cotton industry. Full article
(This article belongs to the Special Issue Crop Nutrition Diagnosis and Efficient Production)
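
The feature-level fusion step described above can be illustrated with a short sketch. This is a minimal, hypothetical example rather than the authors' code: three ResNet-50 backbones extract feature vectors from RGB, HSV, and Lab renderings of the same leaf image, the vectors are concatenated, and a small attention-weighted head regresses nitrogen content. Names such as `MultiColorSpaceFusion` and the attention head are illustrative simplifications.

```python
# Minimal sketch of feature-level fusion across RGB / HSV / Lab inputs (illustrative only).
import torch
import torch.nn as nn
from torchvision import models

class MultiColorSpaceFusion(nn.Module):
    def __init__(self, feat_dim=2048):
        super().__init__()
        # One backbone per color space; weights are not shared here.
        self.backbones = nn.ModuleList([
            nn.Sequential(*list(models.resnet50(weights=None).children())[:-1])
            for _ in range(3)
        ])
        # Simple attention over the three color-space feature blocks.
        self.attn = nn.Sequential(nn.Linear(3 * feat_dim, 3), nn.Softmax(dim=1))
        self.head = nn.Linear(3 * feat_dim, 1)  # regress leaf nitrogen content

    def forward(self, rgb, hsv, lab):
        feats = [b(x).flatten(1) for b, x in zip(self.backbones, (rgb, hsv, lab))]
        fused = torch.cat(feats, dim=1)                    # feature-level fusion by concatenation
        w = self.attn(fused)                               # per-color-space attention weights
        weighted = torch.cat([w[:, i:i+1] * f for i, f in enumerate(feats)], dim=1)
        return self.head(weighted).squeeze(1)

x = torch.randn(2, 3, 224, 224)       # dummy batch; real inputs would be RGB/HSV/Lab renderings
model = MultiColorSpaceFusion()
print(model(x, x, x).shape)           # torch.Size([2])
```

Decision-level fusion, by contrast, would stack the three single-color-space nitrogen predictions into a three-column matrix and fit a separate machine-learning regressor on it.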

28 pages, 32809 KB  
Article
LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
by Boya Wang, Shuo Wang, Yibin Han, Linfeng Xu and Dong Ye
Remote Sens. 2025, 17(19), 3349; https://doi.org/10.3390/rs17193349 - 1 Oct 2025
Abstract
We present a (Light)weight (S)atellite–(A)erial feature (M)atching framework (LiteSAM) for robust UAV absolute visual localization (AVL) in GPS-denied environments. Existing satellite–aerial matching methods struggle with large appearance variations, texture-scarce regions, and limited efficiency for real-time UAV applications. LiteSAM integrates three key components to address these issues. First, efficient multi-scale feature extraction optimizes representation, reducing inference latency for edge devices. Second, a Token Aggregation–Interaction Transformer (TAIFormer) with a convolutional token mixer (CTM) models inter- and intra-image correlations, enabling robust global–local feature fusion. Third, a MinGRU-based dynamic subpixel refinement module adaptively learns spatial offsets, enhancing subpixel-level matching accuracy and cross-scenario generalization. The experiments show that LiteSAM achieves competitive performance across multiple datasets. On UAV-VisLoc, LiteSAM attains an RMSE@30 of 17.86 m, outperforming state-of-the-art semi-dense methods such as EfficientLoFTR. Its optimized variant, LiteSAM (opt., without dual softmax), delivers inference times of 61.98 ms on standard GPUs and 497.49 ms on NVIDIA Jetson AGX Orin, which are 22.9% and 19.8% faster than EfficientLoFTR (opt.), respectively. With 6.31M parameters, which is 2.4× fewer than EfficientLoFTR’s 15.05M, LiteSAM proves to be suitable for edge deployment. Extensive evaluations on natural image matching and downstream vision tasks confirm its superior accuracy and efficiency for general feature matching. Full article
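
The "opt., without dual softmax" variant mentioned above refers to the dual-softmax step commonly used in semi-dense matchers to turn a feature-similarity matrix into mutual matching probabilities. A generic sketch of that step (not LiteSAM's implementation; names and thresholds are illustrative):

```python
# Generic dual-softmax + mutual-nearest-neighbour matching over dense descriptors (illustrative).
import torch

def dual_softmax_match(desc_a, desc_b, temperature=0.1, threshold=0.2):
    """desc_a: (Na, D), desc_b: (Nb, D) L2-normalised local descriptors."""
    sim = desc_a @ desc_b.t() / temperature            # (Na, Nb) similarity matrix
    p = sim.softmax(dim=1) * sim.softmax(dim=0)        # dual softmax -> joint match probability
    # Keep mutual nearest neighbours above the confidence threshold.
    mask = (p == p.max(dim=1, keepdim=True).values) \
         & (p == p.max(dim=0, keepdim=True).values) \
         & (p > threshold)
    idx_a, idx_b = mask.nonzero(as_tuple=True)
    return idx_a, idx_b, p[idx_a, idx_b]

a = torch.nn.functional.normalize(torch.randn(500, 256), dim=1)  # e.g. aerial-patch descriptors
b = torch.nn.functional.normalize(torch.randn(600, 256), dim=1)  # e.g. satellite-patch descriptors
ia, ib, conf = dual_softmax_match(a, b)
print(ia.shape, conf.numel())
```
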
26 pages, 4710 KB  
Article
Research on Safe Multimodal Detection Method of Pilot Visual Observation Behavior Based on Cognitive State Decoding
by Heming Zhang, Changyuan Wang and Pengbo Wang
Multimodal Technol. Interact. 2025, 9(10), 103; https://doi.org/10.3390/mti9100103 - 1 Oct 2025
Abstract
Pilot visual behavior safety assessment is a cross-disciplinary technology that analyzes pilots’ gaze behavior and neurocognitive responses. This paper proposes a multimodal analysis method for pilot visual behavior safety, specifically for cognitive state decoding. This method aims to achieve a quantitative and efficient assessment of pilots’ observational behavior. Addressing the subjective limitations of traditional methods, this paper proposes an observational behavior detection model that integrates facial images to achieve dynamic and quantitative analysis of observational behavior. It addresses the “Midas contact” problem of observational behavior by constructing a cognitive analysis method using multimodal signals. We propose a bidirectional long short-term memory (LSTM) network that matches physiological signal rhythmic features to address the problem of isolated features in multidimensional signals. This method captures the dynamic correlations between multiple physiological behaviors, such as prefrontal theta and chest-abdominal coordination, to decode the cognitive state of pilots’ observational behavior. Finally, the paper uses a decision-level fusion method based on an improved Dempster–Shafer (DS) evidence theory to provide a quantifiable detection strategy for aviation safety standards. This dual-dimensional quantitative assessment system of “visual behavior–neurophysiological cognition” reveals the dynamic correlations between visual behavior and cognitive state among pilots of varying experience. This method can provide a new paradigm for pilot neuroergonomics training and early warning of vestibular-visual integration disorders. Full article
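
The decision-level fusion above rests on Dempster's rule of combination. A compact sketch of the classical (unimproved) rule for two evidence sources over the same frame of discernment, with hypothetical mass values, is:

```python
# Classical Dempster's rule of combination for two basic probability assignments (illustrative).
from itertools import product

def dempster_combine(m1, m2):
    """m1, m2: dicts mapping frozenset hypotheses -> mass; returns combined masses and conflict."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb                     # mass falling on contradictory evidence
    k = 1.0 - conflict                              # normalisation factor
    return {h: v / k for h, v in combined.items()}, conflict

# Two hypothetical evidence sources for "observation behavior is safe" vs "unsafe".
SAFE, UNSAFE = frozenset({"safe"}), frozenset({"unsafe"})
EITHER = SAFE | UNSAFE
eye_tracking = {SAFE: 0.6, UNSAFE: 0.1, EITHER: 0.3}
physiology   = {SAFE: 0.5, UNSAFE: 0.2, EITHER: 0.3}
fused, conflict = dempster_combine(eye_tracking, physiology)
print(fused, conflict)
```

The paper's improved DS variant would modify how conflicting mass is redistributed; the classical rule above only shows the basic combination step.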

19 pages, 7222 KB  
Article
Multi-Channel Spectro-Temporal Representations for Speech-Based Parkinson’s Disease Detection
by Hadi Sedigh Malekroodi, Nuwan Madusanka, Byeong-il Lee and Myunggi Yi
J. Imaging 2025, 11(10), 341; https://doi.org/10.3390/jimaging11100341 - 1 Oct 2025
Abstract
Early, non-invasive detection of Parkinson’s Disease (PD) using speech analysis offers promise for scalable screening. In this work, we propose a multi-channel spectro-temporal deep-learning approach for PD detection from sentence-level speech, a clinically relevant yet underexplored modality. We extract and fuse three complementary time–frequency representations—mel spectrogram, constant-Q transform (CQT), and gammatone spectrogram—into a three-channel input analogous to an RGB image. This fused representation is evaluated across CNNs (ResNet, DenseNet, and EfficientNet) and Vision Transformer using the PC-GITA dataset, under 10-fold subject-independent cross-validation for robust assessment. Results showed that fusion consistently improves performance over single representations across architectures. EfficientNet-B2 achieves the highest accuracy (84.39% ± 5.19%) and F1-score (84.35% ± 5.52%), outperforming recent methods using handcrafted features or pretrained models (e.g., Wav2Vec2.0, HuBERT) on the same task and dataset. Performance varies with sentence type, with emotionally salient and prosodically emphasized utterances yielding higher AUC, suggesting that richer prosody enhances discriminability. Our findings indicate that multi-channel fusion enhances sensitivity to subtle speech impairments in PD by integrating complementary spectral information. Our approach implies that multi-channel fusion could enhance the detection of discriminative acoustic biomarkers, potentially offering a more robust and effective framework for speech-based PD screening, though further validation is needed before clinical application. Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
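
The three-channel fusion above amounts to stacking aligned time-frequency representations the way an RGB image stacks color planes. A rough sketch with librosa follows; since librosa has no gammatone transform, a mel spectrogram stands in for that channel here, and a dedicated gammatone package would be needed in practice:

```python
# Sketch: build a 3-channel "image" from mel, CQT, and (stand-in) gammatone spectrograms.
import numpy as np
import librosa

def three_channel_input(path, sr=16000, n_bins=128, frames=256):
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bins))
    cqt = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr, n_bins=n_bins, bins_per_octave=24)))
    gamma = mel.copy()   # placeholder: a real gammatone spectrogram would come from another library
    chans = []
    for spec in (mel, cqt, gamma):
        spec = librosa.util.fix_length(spec, size=frames, axis=1)       # pad/crop to a fixed width
        spec = (spec - spec.min()) / (spec.max() - spec.min() + 1e-8)   # per-channel min-max scaling
        chans.append(spec[:n_bins])
    return np.stack(chans, axis=0)   # (3, n_bins, frames), an RGB-style input for a CNN or ViT

# x = three_channel_input("sentence.wav"); print(x.shape)  # (3, 128, 256)
```
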

24 pages, 1034 KB  
Article
MMFD-Net: A Novel Network for Image Forgery Detection and Localization via Multi-Stream Edge Feature Learning and Multi-Dimensional Information Fusion
by Haichang Yin, KinTak U, Jing Wang and Zhuofan Gan
Mathematics 2025, 13(19), 3136; https://doi.org/10.3390/math13193136 - 1 Oct 2025
Abstract
With the rapid advancement of image processing techniques, digital image forgery detection has emerged as a critical research area in information forensics. This paper proposes a novel deep learning model based on Multi-view Multi-dimensional Forgery Detection Networks (MMFD-Net), designed to simultaneously determine whether an image has been tampered with and precisely localize the forged regions. By integrating a Multi-stream Edge Feature Learning module with a Multi-dimensional Information Fusion module, MMFD-Net employs joint supervised learning to extract semantics-agnostic forgery features, thereby enhancing both detection performance and model generalization. Extensive experiments demonstrate that MMFD-Net achieves state-of-the-art results on multiple public datasets, excelling in both pixel-level localization and image-level classification tasks, while maintaining robust performance in complex scenarios. Full article
(This article belongs to the Special Issue Applied Mathematics in Data Science and High-Performance Computing)
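
The multi-stream idea can be caricatured as feeding an explicit edge-response map alongside the RGB image so that downstream layers can learn forgery cues that do not depend on scene semantics. The toy input builder below is only an illustration of that intuition, not the learned multi-stream edge feature module of MMFD-Net:

```python
# Toy two-stream input: RGB image plus a Sobel edge-magnitude map (illustrative, not MMFD-Net).
import cv2
import numpy as np

def rgb_and_edge_streams(path):
    bgr = cv2.imread(path)                              # (H, W, 3), BGR order
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    edge = cv2.magnitude(gx, gy)
    edge = edge / (edge.max() + 1e-8)                   # normalise edge response to [0, 1]
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    return rgb, edge                                    # two streams for a downstream fusion network
```
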

25 pages, 13841 KB  
Article
Adaptive Energy–Gradient–Contrast (EGC) Fusion with AIFI-YOLOv12 for Improving Nighttime Pedestrian Detection in Security
by Lijuan Wang, Zuchao Bao and Dongming Lu
Appl. Sci. 2025, 15(19), 10607; https://doi.org/10.3390/app151910607 - 30 Sep 2025
Abstract
In security applications, visible-light pedestrian detectors are highly sensitive to changes in illumination and fail under low-light or nighttime conditions, while infrared sensors, though resilient to lighting, often produce blurred object boundaries that hinder precise localization. To address these complementary limitations, we propose a practical multimodal pipeline—Adaptive Energy–Gradient–Contrast (EGC) Fusion with AIFI-YOLOv12—that first fuses infrared and low-light visible images using per-pixel weights derived from local energy, gradient magnitude and contrast measures, then detects pedestrians with an improved YOLOv12 backbone. The detector integrates an AIFI attention module at high semantic levels, replaces selected modules with A2C2f blocks to enhance cross-channel feature aggregation, and preserves P3–P5 outputs to improve small-object localization. We evaluate the complete pipeline on the LLVIP dataset and report Precision, Recall, mAP@50, mAP@50–95, GFLOPs, FPS and detection time, comparing against YOLOv8, YOLOv10–YOLOv12 baselines (n and s scales). Quantitative and qualitative results show that the proposed fusion restores complementary thermal and visible details and that the AIFI-enhanced detector yields more robust nighttime pedestrian detection while maintaining a competitive computational profile suitable for real-world security deployments. Full article
(This article belongs to the Special Issue Advanced Image Analysis and Processing Technologies and Applications)
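
The per-pixel weighting described above can be sketched directly: local energy, gradient magnitude, and contrast are computed for each modality, combined into a weight map, normalised across the two modalities, and used for a weighted sum. This is a plain reading of the abstract, not the authors' code; window size and normalisation are assumptions.

```python
# Sketch of energy/gradient/contrast (EGC) weighted fusion of IR and low-light visible images.
import cv2
import numpy as np

def egc_weight(img, win=7):
    img = img.astype(np.float32)
    energy = cv2.boxFilter(img * img, -1, (win, win))                    # local energy
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    gradient = cv2.boxFilter(cv2.magnitude(gx, gy), -1, (win, win))      # local gradient magnitude
    mean = cv2.boxFilter(img, -1, (win, win))
    contrast = np.sqrt(np.maximum(cv2.boxFilter(img * img, -1, (win, win)) - mean**2, 0))  # local std
    def norm(x): return (x - x.min()) / (x.max() - x.min() + 1e-8)
    return norm(energy) + norm(gradient) + norm(contrast)

def egc_fuse(ir, vis):
    w_ir, w_vis = egc_weight(ir), egc_weight(vis)
    w = w_ir / (w_ir + w_vis + 1e-8)                                     # per-pixel weight for IR
    return (w * ir + (1 - w) * vis).astype(np.uint8)

# fused = egc_fuse(cv2.imread("ir.png", 0), cv2.imread("vis.png", 0))
```
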
24 pages, 5484 KB  
Article
TFI-Fusion: Hierarchical Triple-Stream Feature Interaction Network for Infrared and Visible Image Fusion
by Mingyang Zhao, Shaochen Su and Hao Li
Information 2025, 16(10), 844; https://doi.org/10.3390/info16100844 - 30 Sep 2025
Abstract
As a key technology in multimodal information processing, infrared and visible image fusion holds significant application value in fields such as military reconnaissance, intelligent security, and autonomous driving. To address the limitations of existing methods, this paper proposes the Hierarchical Triple-Feature Interaction Fusion Network (TFI-Fusion). Based on a hierarchical triple-stream feature interaction mechanism, the network achieves high-quality fusion through a two-stage, separate-model processing approach: In the first stage, a single model extracts low-rank components (representing global structural features) and sparse components (representing local detail features) from source images via the Low-Rank Sparse Decomposition (LSRSD) module, while capturing cross-modal shared features using the Shared Feature Extractor (SFE). In the second stage, another model performs fusion and reconstruction: it first enhances the complementarity between low-rank and sparse features through the innovatively introduced Bi-Feature Interaction (BFI) module, realizes multi-level feature fusion via the Triple-Feature Interaction (TFI) module, and finally generates fused images with rich scene representation through feature reconstruction. This separate-model design reduces memory usage and improves operational speed. Additionally, a multi-objective optimization function is designed based on the network’s characteristics. Experiments demonstrate that TFI-Fusion exhibits excellent fusion performance, effectively preserving image details and enhancing feature complementarity, thus providing reliable visual data support for downstream tasks. Full article
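
The first-stage decomposition above separates each source image into a low-rank (global structure) component and a sparse (local detail) component. As a crude stand-in for the paper's learned LSRSD module, the same idea can be illustrated with a truncated SVD plus soft-thresholding of the residual:

```python
# Crude low-rank + sparse split of an image: truncated SVD for structure, soft threshold for details.
import numpy as np

def low_rank_sparse(img, rank=10, tau=10.0):
    x = img.astype(np.float64)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]                      # global structural component
    residual = x - low_rank
    sparse = np.sign(residual) * np.maximum(np.abs(residual) - tau, 0)   # soft-thresholded details
    return low_rank, sparse

img = np.random.rand(128, 128) * 255
L, S = low_rank_sparse(img)
print(np.linalg.matrix_rank(L), np.count_nonzero(S) / S.size)
```
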

18 pages, 2459 KB  
Article
FFMamba: Feature Fusion State Space Model Based on Sound Event Localization and Detection
by Yibo Li, Dongyuan Ge, Jieke Xu and Xifan Yao
Electronics 2025, 14(19), 3874; https://doi.org/10.3390/electronics14193874 - 29 Sep 2025
Abstract
Previous studies on Sound Event Localization and Detection (SELD) have primarily focused on CNN- and Transformer-based designs. While CNNs possess local receptive fields, making it difficult to capture global dependencies over long sequences, Transformers excel at modeling long-range dependencies but have limited sensitivity to local time–frequency features. Recently, the VMamba architecture, built upon the Visual State Space (VSS) model, has shown great promise in handling long sequences, yet it remains limited in modeling local spatial details. To address this issue, we propose a novel state space model with an attention-enhanced feature fusion mechanism, termed FFMamba, which balances both local spatial modeling and long-range dependency capture. At a fine-grained level, we design two key modules: the Multi-Scale Fusion Visual State Space (MSFVSS) module and the Wavelet Transform-Enhanced Downsampling (WTED) module. Specifically, the MSFVSS module integrates a Multi-Scale Fusion (MSF) component into the VSS framework, enhancing its ability to capture both long-range temporal dependencies and detailed local spatial information. Meanwhile, the WTED module employs a dual-branch design to fuse spatial and frequency domain features, improving the richness of feature representations. Comparative experiments were conducted on the DCASE2021 Task 3 and DCASE2022 Task 3 datasets. The results demonstrate that the proposed FFMamba model outperforms recent approaches in capturing long-range temporal dependencies and effectively integrating multi-scale audio features. In addition, ablation studies confirmed the effectiveness of the MSFVSS and WTED modules. Full article
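
One half of the WTED idea, keeping a frequency-domain view while downsampling, can be sketched with a single-level 2-D discrete wavelet transform: the four sub-bands are stacked as channels, giving a 2x downsampled, frequency-aware feature map. This is a hypothetical reading of the module using PyWavelets; the paper's dual-branch spatial/frequency fusion is not reproduced here.

```python
# Sketch: wavelet-based downsampling that keeps frequency detail as extra channels (PyWavelets).
import numpy as np
import pywt

def wavelet_downsample(feature_map, wavelet="haar"):
    """feature_map: (C, H, W) -> (4*C, H/2, W/2) by stacking LL, LH, HL, HH sub-bands."""
    out = []
    for c in feature_map:
        ll, (lh, hl, hh) = pywt.dwt2(c, wavelet)
        out.extend([ll, lh, hl, hh])
    return np.stack(out, axis=0)

x = np.random.rand(8, 64, 64).astype(np.float32)
y = wavelet_downsample(x)
print(y.shape)   # (32, 32, 32): half the spatial resolution, four times the channels
```
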

24 pages, 1826 KB  
Article
Cloud and Snow Segmentation via Transformer-Guided Multi-Stream Feature Integration
by Kaisheng Yu, Kai Chen, Liguo Weng, Min Xia and Shengyan Liu
Remote Sens. 2025, 17(19), 3329; https://doi.org/10.3390/rs17193329 - 29 Sep 2025
Abstract
Cloud and snow often share comparable visual and structural patterns in satellite observations, making their accurate discrimination and segmentation particularly challenging. To overcome this, we design an innovative Transformer-guided architecture with complementary feature-extraction capabilities. The encoder adopts a dual-path structure, integrating a Transformer Encoder Module (TEM) for capturing long-range semantic dependencies and a ResNet18-based convolutional branch for detailed spatial representation. A Feature-Enhancement Module (FEM) is introduced to promote bidirectional interaction and adaptive feature integration between the two pathways. To improve delineation of object boundaries, especially in visually complex areas, we embed a Deep Feature-Extraction Module (DFEM) at the deepest layer of the convolutional stream. This component refines channel-level information to highlight critical features and enhance edge clarity. Additionally, to address noise from intricate backgrounds and ambiguous cloud-snow transitions, we incorporate both a Transformer Fusion Module (TFM) and a Strip Pooling Auxiliary Module (SPAM) in the decoding phase. These modules collaboratively enhance structural recovery and improve robustness in segmentation. Extensive experiments on the CSWV and SPARCS datasets show that our method consistently outperforms state-of-the-art baselines, demonstrating its strong effectiveness and applicability in real-world cloud and snow-detection scenarios. Full article

20 pages, 1488 KB  
Article
Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification
by Kalpeshkumar Ranipa, Wei-Ping Zhu and M. N. S. Swamy
Bioengineering 2025, 12(10), 1033; https://doi.org/10.3390/bioengineering12101033 - 26 Sep 2025
Abstract
Vision Transformers (ViTs), inspired by their success in natural language processing, have recently gained attention for heart sound classification (HSC). However, most of the existing studies on HSC rely on single-stream architectures, overlooking the advantages of multi-resolution features. While multi-stream architectures employing early or late fusion strategies have been proposed, they often fall short of effectively capturing cross-modal feature interactions. Additionally, conventional fusion methods, such as concatenation, averaging, or max pooling, frequently result in information loss. To address these limitations, this paper presents a novel attention fusion-based two-stream Vision Transformer (AFTViT) architecture for HSC that leverages two-dimensional mel-cepstral domain features. The proposed method employs a ViT-based encoder to capture long-range dependencies and diverse contextual information at multiple scales. A novel attention block is then used to integrate cross-context features at the feature level, enhancing the overall feature representation. Experiments conducted on the PhysioNet2016 and PhysioNet2022 datasets demonstrate that the AFTViT outperforms state-of-the-art CNN-based methods in terms of accuracy. These results highlight the potential of the AFTViT framework for early diagnosis of cardiovascular diseases, offering a valuable tool for cardiologists and researchers in developing advanced HSC techniques. Full article
(This article belongs to the Section Biosignal Processing)
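
The attention-based fusion of two feature streams (as opposed to concatenation, averaging, or max pooling) can be sketched with standard cross-attention: tokens from each stream query the other, and the attended sequences are pooled and merged. This is a generic sketch, not the AFTViT attention block; dimensions and names are illustrative.

```python
# Generic cross-attention fusion of two token streams (illustrative, not the AFTViT block).
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4, num_classes=2):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, tok_a, tok_b):
        # Each stream attends to the other, so cross-context interactions are preserved.
        a, _ = self.a_to_b(tok_a, tok_b, tok_b)
        b, _ = self.b_to_a(tok_b, tok_a, tok_a)
        fused = torch.cat([a.mean(dim=1), b.mean(dim=1)], dim=-1)   # pooled, fused representation
        return self.classifier(fused)

tok_a = torch.randn(4, 50, 256)   # e.g. tokens from one mel-cepstral resolution
tok_b = torch.randn(4, 30, 256)   # e.g. tokens from another resolution
print(CrossAttentionFusion()(tok_a, tok_b).shape)   # torch.Size([4, 2])
```
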

22 pages, 2395 KB  
Article
Multimodal Alignment and Hierarchical Fusion Network for Multimodal Sentiment Analysis
by Jiasheng Huang, Huan Li and Xinyue Mo
Electronics 2025, 14(19), 3828; https://doi.org/10.3390/electronics14193828 - 26 Sep 2025
Abstract
The widespread emergence of multimodal data on social platforms has presented new opportunities for sentiment analysis. However, previous studies have often overlooked the issue of detail loss during modal interaction fusion. They also exhibit limitations in addressing semantic alignment challenges and the sensitivity of modalities to noise. To enhance analytical accuracy, a novel model named MAHFNet is proposed. The proposed architecture is composed of three main components. Firstly, an attention-guided gated interaction alignment module is developed for modeling the semantic interaction between text and image using a gated network and a cross-modal attention mechanism. Next, a contrastive learning mechanism is introduced to encourage the aggregation of semantically aligned image-text pairs. Subsequently, an intra-modality emotion extraction module is designed to extract local emotional features within each modality. This module serves to compensate for detail loss during interaction fusion. The intra-modal local emotion features and cross-modal interaction features are then fed into a hierarchical gated fusion module, where the local features are fused through a cross-gated mechanism to dynamically adjust the contribution of each modality while suppressing modality-specific noise. Then, the fusion results and cross-modal interaction features are further fused using a multi-scale attention gating module to capture hierarchical dependencies between local and global emotional information, thereby enhancing the model’s ability to perceive and integrate emotional cues across multiple semantic levels. Finally, extensive experiments have been conducted on three public multimodal sentiment datasets, with results demonstrating that the proposed model outperforms existing methods across multiple evaluation metrics. Specifically, on the TumEmo dataset, our model achieves improvements of 2.55% in ACC and 2.63% in F1 score compared to the second-best method. On the HFM dataset, these gains reach 0.56% in ACC and 0.9% in F1 score, respectively. On the MVSA-S dataset, these gains reach 0.03% in ACC and 1.26% in F1 score. These findings collectively validate the overall effectiveness of the proposed model. Full article
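
The cross-gated idea of dynamically weighting each modality while suppressing modality-specific noise can be illustrated with a minimal gated fusion layer: each modality's contribution is scaled by a sigmoid gate computed from the other modality. A toy sketch, not the MAHFNet module:

```python
# Toy cross-gated fusion of text and image feature vectors (illustrative, not MAHFNet).
import torch
import torch.nn as nn

class CrossGatedFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate_for_text  = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())  # driven by image
        self.gate_for_image = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())  # driven by text
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, text_feat, image_feat):
        gated_text  = self.gate_for_text(image_feat) * text_feat     # image decides what text passes
        gated_image = self.gate_for_image(text_feat) * image_feat    # text decides what image passes
        return self.proj(torch.cat([gated_text, gated_image], dim=-1))

t, v = torch.randn(8, 512), torch.randn(8, 512)
print(CrossGatedFusion()(t, v).shape)   # torch.Size([8, 512])
```
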

17 pages, 4074 KB  
Article
Groundwater Level Prediction Using a Hybrid TCN–Transformer–LSTM Model and Multi-Source Data Fusion: A Case Study of the Kuitun River Basin, Xinjiang
by Yankun Liu, Mingliang Du, Xiaofei Ma, Shuting Hu and Ziyun Tuo
Sustainability 2025, 17(19), 8544; https://doi.org/10.3390/su17198544 - 23 Sep 2025
Viewed by 238
Abstract
Groundwater level (GWL) prediction in arid regions faces two fundamental challenges in conventional numerical modeling: (i) irreducible parameter uncertainty, which systematically reduces predictive accuracy; (ii) oversimplification of nonlinear process interactions, which leads to error propagation. Although machine learning (ML) methods demonstrate strong nonlinear mapping capabilities, their standalone applications often encounter prediction bias and face the accuracy–generalization trade-off. This study proposes a hybrid TCN–Transformer–LSTM (TTL) model designed to address three key challenges in groundwater prediction: high-frequency fluctuations, medium-range dependencies, and long-term memory effects. The TTL framework integrates TCN layers for short-term features, Transformer blocks to model cross-temporal dependencies, and LSTM to preserve long-term memory, with residual connections facilitating hierarchical feature fusion. The results indicate that (1) at the monthly scale, TTL reduced RMSE by 20.7% (p < 0.01) and increased R2 by 0.15 compared with the Groundwater Modeling System (GMS); (2) during abrupt hydrological events, TTL achieved superior performance (R2 = 0.96–0.98, MAE < 0.6 m); (3) PCA revealed site-specific responses, corroborating the adaptability and interpretability of TTL; (4) Grad-CAM analysis demonstrated that the model captures physically interpretable attention mechanisms—particularly evapotranspiration and rainfall—thereby providing clear cause–effect explanations and enhancing transparency beyond black-box models. This transferable framework supports groundwater forecasting, risk warning, and practical deployment in arid regions, thereby contributing to sustainable water resource management. Full article
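
The division of labour described above (TCN layers for short-term patterns, Transformer blocks for mid-range dependencies, LSTM for long-term memory, joined by residual connections) can be sketched as a stacked PyTorch model. This is a schematic under assumed input sizes, not the paper's TTL configuration.

```python
# Schematic TCN -> Transformer -> LSTM stack with a residual connection (illustrative, not TTL).
import torch
import torch.nn as nn

class TTLSketch(nn.Module):
    def __init__(self, in_features=6, dim=64):
        super().__init__()
        self.tcn = nn.Sequential(                      # dilated 1-D convolutions for local patterns
            nn.Conv1d(in_features, dim, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(dim, dim, 3, padding=2, dilation=2), nn.ReLU(),
        )
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 1)                  # next-step groundwater level

    def forward(self, x):                              # x: (batch, time, features)
        h = self.tcn(x.transpose(1, 2)).transpose(1, 2)
        h = h + self.transformer(h)                    # residual connection around the Transformer
        out, _ = self.lstm(h)
        return self.head(out[:, -1])                   # predict from the final time step

x = torch.randn(16, 24, 6)   # e.g. 24 months of 6 hydro-meteorological drivers (assumed)
print(TTLSketch()(x).shape)  # torch.Size([16, 1])
```
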

28 pages, 14783 KB  
Article
HSSTN: A Hybrid Spectral–Structural Transformer Network for High-Fidelity Pansharpening
by Weijie Kang, Yuan Feng, Yao Ding, Hongbo Xiang, Xiaobo Liu and Yaoming Cai
Remote Sens. 2025, 17(19), 3271; https://doi.org/10.3390/rs17193271 - 23 Sep 2025
Viewed by 133
Abstract
Pansharpening fuses multispectral (MS) and panchromatic (PAN) remote sensing images to generate outputs with high spatial resolution and spectral fidelity. Nevertheless, conventional methods relying primarily on convolutional neural networks or unimodal fusion strategies frequently fail to bridge the sensor modality gap between MS and PAN data. Consequently, spectral distortion and spatial degradation often occur, limiting high-precision downstream applications. To address these issues, this work proposes a Hybrid Spectral–Structural Transformer Network (HSSTN) that enhances multi-level collaboration through comprehensive modelling of spectral–structural feature complementarity. Specifically, the HSSTN implements a three-tier fusion framework. First, an asymmetric dual-stream feature extractor employs a residual block with channel attention (RBCA) in the MS branch to strengthen spectral representation, while a Transformer architecture in the PAN branch extracts high-frequency spatial details, thereby reducing modality discrepancy at the input stage. Subsequently, a target-driven hierarchical fusion network utilises progressive crossmodal attention across scales, ranging from local textures to multi-scale structures, to enable efficient spectral–structural aggregation. Finally, a novel collaborative optimisation loss function preserves spectral integrity while enhancing structural details. Comprehensive experiments conducted on QuickBird, GaoFen-2, and WorldView-3 datasets demonstrate that HSSTN outperforms existing methods in both quantitative metrics and visual quality. Consequently, the resulting images exhibit sharper details and fewer spectral artefacts, showcasing significant advantages in high-fidelity remote sensing image fusion. Full article
(This article belongs to the Special Issue Artificial Intelligence in Hyperspectral Remote Sensing Data Analysis)
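
The RBCA unit in the MS branch is described as a residual block with channel attention. A minimal squeeze-and-excitation-style version, offered as a plausible sketch rather than the paper's exact block, looks like this:

```python
# Minimal residual block with channel attention (SE-style), a plausible stand-in for RBCA.
import torch
import torch.nn as nn

class RBCASketch(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.channel_attn = nn.Sequential(             # squeeze-and-excitation over channels
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.body(x)
        return x + h * self.channel_attn(h)            # re-weight channels, then residual add

ms_feat = torch.randn(2, 64, 64, 64)   # e.g. multispectral feature maps
print(RBCASketch(64)(ms_feat).shape)   # torch.Size([2, 64, 64, 64])
```
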

18 pages, 1617 KB  
Article
GNN-MFF: A Multi-View Graph-Based Model for RTL Hardware Trojan Detection
by Senjie Zhang, Shan Zhou, Panpan Xue, Lu Kong and Jinbo Wang
Appl. Sci. 2025, 15(19), 10324; https://doi.org/10.3390/app151910324 - 23 Sep 2025
Viewed by 196
Abstract
The globalization of hardware design flows has increased the risk of Hardware Trojan (HT) insertion during the design phase. Graph-based learning methods have shown promise for HT detection at the Register Transfer Level (RTL). However, most existing approaches rely on representing RTL designs through a single graph structure. This single-view modeling paradigm inherently constrains the model’s ability to perceive complex behavioral patterns, consequently limiting detection performance. To address these limitations, we propose GNN-MFF, an innovative multi-view feature fusion model based on Graph Neural Networks (GNNs). Our approach centers on joint multi-view modeling of RTL designs to achieve a more comprehensive representation. Specifically, we construct complementary graph-structural views: the Abstract Syntax Tree (AST) capturing structure information, and the Data Flow Graph (DFG) modeling logical dependency relationships. For each graph structure, customized GNN architectures are designed to effectively extract its features. Furthermore, we develop a feature fusion framework that leverages a multi-head attention mechanism to deeply explore and integrate heterogeneous features from distinct views, thereby enhancing the model’s capacity to structurally perceive anomalous logic patterns. Evaluated on an extended Trust-Hub-based HT benchmark dataset, our model achieves an average F1-score of 97.08% in automated detection of unseen HTs, surpassing current state-of-the-art methods. Full article
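
The fusion stage, where AST-view and DFG-view embeddings are merged with multi-head attention, can be sketched independently of any particular GNN library: treat the per-view design embeddings as a short token sequence and let attention mix them before classification. Names and dimensions below are illustrative, and the per-view GNN encoders are assumed to exist upstream.

```python
# Sketch: multi-head attention fusion of AST-view and DFG-view embeddings (illustrative).
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, 2)             # Trojan-free vs Trojan-infected

    def forward(self, ast_emb, dfg_emb):
        views = torch.stack([ast_emb, dfg_emb], dim=1)  # (batch, 2 views, dim)
        mixed, _ = self.attn(views, views, views)       # let each view attend to the other
        return self.classifier(mixed.mean(dim=1))

ast_emb = torch.randn(4, 128)   # graph-level embedding from an AST-view GNN (assumed)
dfg_emb = torch.randn(4, 128)   # graph-level embedding from a DFG-view GNN (assumed)
print(MultiViewFusion()(ast_emb, dfg_emb).shape)   # torch.Size([4, 2])
```
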

19 pages, 2794 KB  
Article
Estimating Soil Moisture Content in Winter Wheat in Southern Xinjiang by Fusing UAV Texture Feature with Novel Three-Dimensional Texture Indexes
by Tao Sun, Zhijun Li, Zijun Tang, Wei Zhang, Wangyang Li, Zhiying Liu, Jinqi Wu, Shiqi Liu, Youzhen Xiang and Fucang Zhang
Plants 2025, 14(19), 2948; https://doi.org/10.3390/plants14192948 - 23 Sep 2025
Viewed by 165
Abstract
Winter wheat is a major staple crop worldwide, and real-time monitoring of soil moisture content (SMC) is critical for yield security. Targeting the monitoring needs under arid conditions in southern Xinjiang, this study proposes a UAV multispectral-based SMC estimation method that constructs novel three-dimensional (3-D) texture indices. Field experiments were conducted over two consecutive growing seasons in Kunyu City, southern Xinjiang, China, with four irrigation and four fertilization levels. High-resolution multispectral imagery was acquired at the jointing stage using a UAV-mounted camera. From the imagery, conventional texture features were extracted, and six two-dimensional (2-D) and four 3-D texture indices were constructed. A correlation matrix approach was used to screen feature combinations significantly associated with SMC. Random forest (RF), partial least squares regression (PLSR), and back-propagation neural networks (BPNN) were then used to develop SMC models for three soil depths (0–20, 20–40, and 40–60 cm). Results showed that estimation accuracy for the shallow layer (0–20 cm) was markedly higher than for the middle and deep layers. Under single-source input, using 3-D texture indices (Combination 3) with RF achieved the best shallow-layer performance (validation R2 = 0.827, RMSE = 0.534, MRE = 2.686%). With multi-source fusion inputs (Combination 7: texture features + 2-D texture indices + 3-D texture indices) combined with RF, shallow-layer SMC estimation further improved (R2 = 0.890, RMSE = 0.395, MRE = 1.91%). Relative to models using only conventional texture features, fusion increased R2 by approximately 11.4%, 11.7%, and 18.1% for the shallow, middle, and deep layers, respectively. The findings indicate that 3-D texture indices (e.g., DTTI), which integrate multi-band texture information, more comprehensively capture canopy spatial structure and are more sensitive to shallow-layer moisture dynamics. Multi-source fusion provides complementary information and substantially enhances model accuracy. The proposed approach offers a new pathway for accurate SMC monitoring in arid croplands and is of practical significance for remote sensing-based moisture estimation and precision irrigation. Full article
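
The texture-feature side of this pipeline can be reproduced in outline with scikit-image and scikit-learn: compute grey-level co-occurrence texture statistics per band, form a simple cross-band texture index, and regress SMC with a random forest. The index below is a generic two-band normalized difference for brevity, not the paper's 3-D DTTI, and the data are synthetic.

```python
# Outline: GLCM texture features per band + a simple texture index + random forest regression.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestRegressor

def band_texture(band, prop="contrast"):
    band = np.uint8(255 * (band - band.min()) / (band.max() - band.min() + 1e-8))
    glcm = graycomatrix(band, distances=[1], angles=[0], levels=256, symmetric=True, normed=True)
    return graycoprops(glcm, prop)[0, 0]

def texture_features(bands):
    t = np.array([band_texture(b) for b in bands])
    nd_index = (t[0] - t[1]) / (t[0] + t[1] + 1e-8)   # generic normalized-difference texture index
    return np.concatenate([t, [nd_index]])

# Hypothetical dataset: one multispectral patch (list of bands) per plot, with measured SMC labels.
rng = np.random.default_rng(0)
patches = [[rng.random((32, 32)) for _ in range(5)] for _ in range(40)]
smc = rng.uniform(5, 25, size=40)                     # soil moisture content (%), synthetic
X = np.array([texture_features(p) for p in patches])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, smc)
print(model.predict(X[:3]))
```
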
