Search Results (1,783)

Search Parameters:
Keywords = multi-scale information fusion

21 pages, 3314 KB  
Article
MGF-DTA: A Multi-Granularity Fusion Model for Drug–Target Binding Affinity Prediction
by Zheng Ni, Bo Wei and Yuni Zeng
Int. J. Mol. Sci. 2026, 27(2), 947; https://doi.org/10.3390/ijms27020947 (registering DOI) - 18 Jan 2026
Abstract
Drug–target affinity (DTA) prediction is one of the core components of drug discovery. Despite considerable advances in previous research, DTA tasks still face several limitations: insufficient multi-modal information on drugs, the inherent sequence-length limitation of protein language models, and single attention mechanisms that fail to capture critical multi-scale features. To alleviate these limitations, we developed a multi-granularity fusion model for drug–target binding affinity prediction, termed MGF-DTA. The model is composed of three fusion modules. First, it extracts deep semantic features of SMILES strings through ChemBERTa-2 and integrates them with molecular fingerprints via gated fusion to enrich the multi-modal information of drugs. Second, it employs a residual fusion mechanism to integrate the global embeddings from ESM-2 with the local features obtained by the k-mer and principal component analysis (PCA) method. Finally, a hierarchical attention mechanism extracts multi-granularity features from both drug SMILES strings and protein sequences. Comparative analysis with other mainstream methods on the Davis, KIBA, and BindingDB datasets shows that MGF-DTA exhibits outstanding performance. Further, ablation studies confirm the effectiveness of the model components, and a case study illustrates its robust generalization capability.
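As a rough illustration of the gated fusion described above, the following PyTorch sketch mixes a SMILES embedding with a fingerprint embedding through a learned sigmoid gate. All dimensions and layer choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse a learned SMILES embedding with a fingerprint embedding
    via a learned sigmoid gate (illustrative sketch, not the paper's code)."""
    def __init__(self, smiles_dim=384, fp_dim=1024, hidden_dim=256):
        super().__init__()
        self.proj_smiles = nn.Linear(smiles_dim, hidden_dim)  # e.g. a ChemBERTa-2 pooled vector
        self.proj_fp = nn.Linear(fp_dim, hidden_dim)          # e.g. a Morgan fingerprint (assumed)
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, smiles_emb, fp_emb):
        s = self.proj_smiles(smiles_emb)
        f = self.proj_fp(fp_emb)
        g = torch.sigmoid(self.gate(torch.cat([s, f], dim=-1)))  # per-dimension mixing weights
        return g * s + (1.0 - g) * f

fusion = GatedFusion()
drug = fusion(torch.randn(8, 384), torch.randn(8, 1024))  # -> (8, 256) fused drug features
```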
29 pages, 19178 KB  
Article
Dual-Task Learning for Fine-Grained Bird Species and Behavior Recognition via Token Re-Segmentation, Multi-Scale Mixed Attention, and Feature Interleaving
by Cong Zhang, Zhichao Chen, Ye Lin, Xiuping Huang and Chih-Wei Lin
Appl. Sci. 2026, 16(2), 966; https://doi.org/10.3390/app16020966 (registering DOI) - 17 Jan 2026
Abstract
In the ecosystem, birds are important indicators that sensitively reflect changes in the ecological environment and its health. However, bird monitoring is challenging due to species diversity, variable behaviors, and distinct morphological characteristics. We therefore propose a parallel dual-branch hybrid CNN–Transformer architecture for feature extraction that simultaneously captures local and global image features, addressing the "local feature similarity" issue in the dual tasks of bird species and behavior recognition. The dual-task framework comprises three main components: the Token Re-segmentation Module (TRM), the Multi-scale Adaptive Module (MAM), and the Feature Interleaving Structure (FIS). The MAM fuses hybrid attention to handle birds at different scales: it models the interdependencies between the spatial and channel dimensions of features from different scales, enabling the model to adaptively choose scale-specific feature representations and accommodate inputs of different scales. The FIS is an efficient feature-sharing mechanism between the parallel CNN branches: it interleaves, delivers, and fuses CNN feature maps across parallel layers, combining them with the features of the corresponding Transformer layer to share local and global information at different depths and promote deep feature fusion across the parallel networks. Finally, the TRM addresses the challenge of visually similar but distinct bird species, and of similar poses with distinct behaviors, through a two-step approach: it first locates discriminative regions and then performs fine segmentation on them. This enables the network to allocate relatively more attention to key areas while merging non-essential information and reducing interference from irrelevant details. Experiments on a self-constructed dataset demonstrate that, compared with state-of-the-art classification networks, the proposed network achieves the best performance: 79.70% accuracy in bird species recognition, 76.21% in behavior recognition, and the best results in dual-task recognition.
(This article belongs to the Section Computing and Artificial Intelligence)
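A minimal PyTorch sketch of the feature-interleaving idea: a CNN feature map and a Transformer token sequence exchange information at matching depth. The shapes, the linear projections, and the additive exchange are assumptions made for illustration, not the paper's FIS code.

```python
import torch
import torch.nn as nn

class FeatureInterleave(nn.Module):
    """Share a CNN feature map with a Transformer token sequence at matching
    depth (a minimal sketch of the interleaving idea; names and dims assumed)."""
    def __init__(self, cnn_channels=256, token_dim=384):
        super().__init__()
        self.to_tokens = nn.Linear(cnn_channels, token_dim)
        self.to_map = nn.Linear(token_dim, cnn_channels)

    def forward(self, feat_map, tokens):
        # feat_map: (B, C, H, W); tokens: (B, H*W, D)
        b, c, h, w = feat_map.shape
        local = feat_map.flatten(2).transpose(1, 2)           # (B, H*W, C)
        tokens = tokens + self.to_tokens(local)               # inject local detail into tokens
        global_map = self.to_map(tokens).transpose(1, 2).reshape(b, c, h, w)
        feat_map = feat_map + global_map                      # inject global context into the map
        return feat_map, tokens

fi = FeatureInterleave()
fm, tk = fi(torch.randn(2, 256, 14, 14), torch.randn(2, 196, 384))
```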
25 pages, 1708 KB  
Article
Distribution Network Electrical Equipment Defect Identification Based on Multi-Modal Image Voiceprint Data Fusion and Channel Interleaving
by An Chen, Junle Liu, Wenhao Zhang, Jiaxuan Lu, Jiamu Yang and Bin Liao
Processes 2026, 14(2), 326; https://doi.org/10.3390/pr14020326 - 16 Jan 2026
Abstract
With the explosive growth in the quantity of electrical equipment in distribution networks, traditional manual inspection struggles to achieve comprehensive coverage due to limited manpower and low efficiency. This has led to frequent equipment failures, including partial discharge, insulation aging, and poor contact, which seriously compromise the safe and stable operation of distribution networks. Real-time monitoring and defect identification of equipment operating status are therefore critical to ensuring the safety and stability of power systems. Commonly used methods for defect identification in distribution network electrical equipment rely mainly on features from a single modality, either images or voiceprints; they ignore the complementarity and interleaved nature of image and voiceprint features, which reduces identification accuracy and reliability. To address these limitations, this paper proposes a defect identification method for distribution network electrical equipment based on multi-modal image–voiceprint data fusion and channel interleaving. First, image and voiceprint feature models are constructed using two-dimensional principal component analysis (2DPCA) and the Mel scale, respectively, and multi-modal feature fusion is achieved with an improved transformer that integrates intra-domain self-attention units and an inter-domain cross-attention mechanism. Second, an image–voiceprint multi-channel interleaving model combines channel adaptability and confidence to dynamically adjust weights, generating defect identification results through a weighting approach based on the information content of the output probabilities. Finally, simulation results show that, on a dataset of 3,300 samples, the proposed algorithm achieves an 8.96–33.27% improvement in defect recognition accuracy over baseline algorithms and maintains over 86.5% accuracy under 20% random noise interference, thanks to the improved transformer and the multi-channel interleaving mechanism, verifying its advantages in accuracy and noise robustness.
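The weighting "based on the information content of the output probabilities" suggests entropy-based confidence weights. The sketch below fuses two classifiers' softmax outputs that way; it is one plausible reading of the rule, with the class count and the normalization assumed.

```python
import torch

def entropy_weighted_fusion(p_image, p_voice, eps=1e-8):
    """Fuse two classifiers' softmax outputs, weighting each channel by the
    information content (low entropy = high confidence) of its prediction.
    A plausible reading of the paper's weighting rule, not its exact formula."""
    def confidence(p):
        ent = -(p * (p + eps).log()).sum(dim=-1)              # Shannon entropy per sample
        max_ent = torch.log(torch.tensor(float(p.shape[-1])))
        return 1.0 - ent / max_ent                            # confidence in [0, 1]

    w_img, w_voc = confidence(p_image), confidence(p_voice)
    total = w_img + w_voc + eps
    fused = (w_img / total).unsqueeze(-1) * p_image \
          + (w_voc / total).unsqueeze(-1) * p_voice
    return fused.argmax(dim=-1), fused

p_img = torch.softmax(torch.randn(4, 6), dim=-1)   # 6 defect classes, assumed
p_voc = torch.softmax(torch.randn(4, 6), dim=-1)
labels, probs = entropy_weighted_fusion(p_img, p_voc)
```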
28 pages, 32251 KB  
Article
A Dual-Resolution Network Based on Orthogonal Components for Building Extraction from VHR PolSAR Images
by Songhao Ni, Fuhai Zhao, Mingjie Zheng, Zhen Chen and Xiuqing Liu
Remote Sens. 2026, 18(2), 305; https://doi.org/10.3390/rs18020305 - 16 Jan 2026
Abstract
Sub-meter-resolution Polarimetric Synthetic Aperture Radar (PolSAR) imagery enables precise building footprint extraction but introduces complex scattering correlated with fine spatial structures. This renders both traditional methods, which rely on simplified scattering models, and existing deep learning approaches, which sacrifice spatial detail through multi-looking, inadequate for high-precision extraction tasks. To address this, we propose an Orthogonal Dual-Resolution Network (ODRNet) for end-to-end, precise segmentation directly from single-look complex (SLC) data. Unlike complex-valued neural networks, which suffer from high computational cost and optimization difficulties, our approach decomposes complex-valued data into its orthogonal real and imaginary components, which are fed concurrently into a Dual-Resolution Branch (DRB) with Bilateral Information Fusion (BIF) to balance the trade-off between semantic context and spatial detail. Crucially, we introduce an auxiliary Polarization Orientation Angle (POA) regression task to enforce physical consistency between the orthogonal branches. To tackle diverse building scales, we design a Multi-scale Aggregation Pyramid Pooling Module (MAPPM) to enhance contextual awareness and a Pixel-attention Fusion (PAF) module to adaptively fuse the dual-branch features. We have also constructed a VHR PolSAR building footprint segmentation dataset to support related research. Experimental results show that ODRNet achieves 64.3% IoU and a 78.27% F1-score on our dataset, and 73.61% IoU with an 84.8% F1-score on a large-scale SLC scene, confirming the method's potential and effectiveness for high-precision building extraction directly from SLC data.
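Splitting SLC data into orthogonal real and imaginary inputs, as described, takes only a few lines of PyTorch; the channel count and the stand-in convolution below are assumptions, not the actual ODRNet layout.

```python
import torch
import torch.nn as nn

# Decompose single-look complex (SLC) PolSAR data into orthogonal real and
# imaginary channel stacks feeding two real-valued branches (sketch only).
slc = torch.randn(1, 4, 512, 512, dtype=torch.complex64)  # 4 polarimetric channels, assumed
real_part = slc.real    # (1, 4, 512, 512) -> real branch input
imag_part = slc.imag    # (1, 4, 512, 512) -> imaginary branch input

branch = nn.Conv2d(4, 32, kernel_size=3, padding=1)       # stand-in for a branch stem
feat_real, feat_imag = branch(real_part), branch(imag_part)
```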
23 pages, 3847 KB  
Article
DRPU-YOLO11: A Multi-Scale Model for Detecting Rice Panicles in UAV Images with Complex Infield Background
by Dongchen Huang, Zhipeng Chen, Jiajun Zhuang, Ge Song, Huasheng Huang, Feilong Li, Guogang Huang and Changyu Liu
Agriculture 2026, 16(2), 234; https://doi.org/10.3390/agriculture16020234 - 16 Jan 2026
Abstract
In precision agriculture, accurately detecting rice panicles is crucial for monitoring rice growth and managing rice production. To address the challenges posed by complex field backgrounds, including variety differences, variations across growth stages, background interference, and occlusion due to dense distribution, this study develops an improved YOLO11-based rice panicle detection model, termed DRPU-YOLO11. The model incorporates a task-oriented CSP-PGMA module in the backbone to enhance multi-scale feature extraction and provide richer representations for downstream detection. In the neck network, DySample and CGDown are adopted to strengthen global contextual feature aggregation and suppress background interference for small targets. Fine-grained P2-level information is integrated with higher-level features through a cross-scale fusion module (CSP-ONMK) to improve detection robustness in dense and occluded scenes. In addition, the PowerTAL strategy adapts quality-aware label assignment to emphasize high-quality predictions during training. Experimental results on a self-constructed dataset demonstrate that DRPU-YOLO11 significantly outperforms baseline models in rice panicle detection under complex field environments, achieving an accuracy of 82.5%. Compared with the baseline YOLO11 and with RT-DETR, mAP50 increases by 2.4% and 5.0%, respectively. These results indicate that the proposed task-driven design provides a practical, high-precision solution for rice panicle detection, with potential applications in rice growth monitoring and yield estimation.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
19 pages, 4395 KB  
Article
An Attention-Based Bidirectional Feature Fusion Algorithm for Insulator Detection
by Binghao Gao, Jinyu Guo, Yongyue Wang, Dong Li and Xiaoqiang Jia
Sensors 2026, 26(2), 584; https://doi.org/10.3390/s26020584 - 15 Jan 2026
Abstract
To maintain reliability, safety, and sustainability in power transmission, insulator defect detection has become a critical task in power line inspection. Because insulator defect images feature complex backgrounds and small defect sizes, false detections and missed detections often occur. The You Only Look Once (YOLO) family of object detectors is currently the mainstream approach to image-based insulator defect detection in power lines, but existing models suffer from low detection accuracy. To address this, this paper presents MC-YOLO, an improved YOLOv5-based insulator detection algorithm. To effectively extract multi-scale information and enhance the model's feature representation, a multi-scale attention convolutional fusion (MACF) module is proposed: it applies parallel convolutions with different kernel sizes to extract features at various scales and highlights key targets through an attention mechanism, improving detection accuracy. Additionally, a cross-context feature fusion module (CCFM) is designed, in which shallow features gain partial deep semantic supplementation and deep features absorb shallow spatial information, achieving bidirectional information flow. A Spatial-Channel Dual Attention Module (SCDAM) is further introduced into the CCFM; its dynamic attention-guided bidirectional cross-fusion mechanism resolves the feature deviation between shallow details and deep semantics during multi-scale feature fusion. Experimental results show that MC-YOLO achieves an mAP@0.5 of 67.4% on the dataset used in this study, a 4.1% improvement over the original YOLOv5. Although the FPS is slightly reduced compared with the original model, the algorithm remains practical and can rapidly and accurately detect insulator defects.
(This article belongs to the Section Industrial Sensors)
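A hedged reconstruction of what a multi-scale attention convolutional fusion block could look like: parallel 1x1/3x3/5x5 convolutions merged under squeeze-and-excitation-style channel attention. The kernel sizes, the attention form, and the residual connection are assumptions inferred from the abstract, not the released MACF.

```python
import torch
import torch.nn as nn

class MACF(nn.Module):
    """Multi-scale attention convolutional fusion: parallel convolutions with
    different kernel sizes, merged under channel attention (an illustrative
    reconstruction from the abstract, not the authors' code)."""
    def __init__(self, channels=128):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5)
        )
        # squeeze-and-excitation-style channel attention over the concatenation
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(3 * channels, channels // 4, 1), nn.ReLU(),
            nn.Conv2d(channels // 4, 3 * channels, 1), nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)   # (B, 3C, H, W)
        return self.merge(multi * self.attn(multi)) + x           # attention-weighted residual fusion

out = MACF()(torch.randn(2, 128, 40, 40))
```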
14 pages, 2106 KB  
Article
A Hierarchical Multi-Modal Fusion Framework for Alzheimer’s Disease Classification Using 3D MRI and Clinical Biomarkers
by Ting-An Chang, Chun-Cheng Yu, Yin-Hua Wang, Zi-Ping Lei and Chia-Hung Chang
Electronics 2026, 15(2), 367; https://doi.org/10.3390/electronics15020367 - 14 Jan 2026
Abstract
Accurate and interpretable staging of Alzheimer's disease (AD) remains challenging due to the heterogeneous progression of neurodegeneration and the complementary nature of imaging and clinical biomarkers. This study implements and evaluates an optimized Hierarchical Multi-Modal Fusion Framework (HMFF) that systematically integrates 3D structural MRI with clinical assessment scales for robust three-class classification of cognitively normal (CN), mild cognitive impairment (MCI), and AD subjects. A standardized preprocessing pipeline, including N4 bias field correction, nonlinear registration to MNI space, ANTsNet-based skull stripping, voxel normalization, and spatial resampling, was employed to ensure anatomically consistent, high-quality MRI inputs. Within the framework, volumetric imaging features were extracted using a 3D DenseNet-121 architecture, while structured clinical information was modeled via an XGBoost classifier to capture nonlinear clinical priors. These heterogeneous representations were hierarchically fused through a lightweight multilayer perceptron, enabling effective cross-modal interaction. To further enhance discriminative capability and model efficiency, a hierarchical feature selection strategy progressively refines the high-dimensional imaging features. Experimental results showed that performance consistently improved with feature refinement and reached an optimal balance at approximately 90 selected features. Under this configuration, HMFF achieved an accuracy of 0.94 (95% confidence interval: [0.918, 0.951]), a recall of 0.91, a precision of 0.94, and an F1-score of 0.92, outperforming unimodal and conventional multimodal baselines under comparable settings. Moreover, Grad-CAM visualization confirmed that the model focused on clinically relevant neuroanatomical regions, including the hippocampus and medial temporal lobe, enhancing interpretability and clinical plausibility. These findings indicate that hierarchical multimodal fusion with interpretable feature refinement offers a promising, extensible solution for reliable and explainable automated AD staging.
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
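The hierarchical fusion step, a lightweight MLP over the concatenated imaging embedding and XGBoost class probabilities, could look like the sketch below. The 90-feature input mirrors the abstract's stated optimum, while the hidden size and dropout are assumed.

```python
import torch
import torch.nn as nn

class HierarchicalFusionHead(nn.Module):
    """Fuse a 3D-CNN imaging embedding with XGBoost class probabilities through
    a lightweight MLP for 3-class CN/MCI/AD staging (sketch under assumed dims)."""
    def __init__(self, img_dim=90, n_classes=3, hidden=64):
        super().__init__()
        # img_dim=90 mirrors the ~90 selected imaging features in the abstract
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + n_classes, hidden), nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_feats, clin_probs):
        # img_feats: (B, img_dim); clin_probs: (B, n_classes) from XGBoost
        return self.mlp(torch.cat([img_feats, clin_probs], dim=-1))

head = HierarchicalFusionHead()
logits = head(torch.randn(16, 90), torch.softmax(torch.randn(16, 3), -1))
```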
24 pages, 5801 KB  
Article
MEANet: A Novel Multiscale Edge-Aware Network for Building Change Detection in High-Resolution Remote Sensing Images
by Tao Chen, Linjin Huang, Wenyi Zhao, Shengjie Yu, Yue Yang and Antonio Plaza
Remote Sens. 2026, 18(2), 261; https://doi.org/10.3390/rs18020261 - 14 Jan 2026
Abstract
Remote sensing building change detection (RSBCD) is critical for land surface monitoring and for understanding interactions between human activities and the ecological environment. However, existing deep learning-based RSBCD methods often produce mis-detected pixels concentrated around object boundaries, mainly due to ambiguous object shapes and complex spatial distributions. To address this problem, we propose a Multiscale Edge-Aware change detection Network (MEANet) that accurately locates the edge pixels of changed objects and enhances the separability between changed and unchanged pixels. Specifically, a high-resolution feature fusion network preserves spatial details while integrating deep semantic information, and a multi-scale supervised contrastive loss (MSCL) jointly optimizes pixel-level discrimination and embedding-space separability. To improve the handling of difficult samples, hard negative sampling is adopted in the contrastive learning process. We conduct comparative experiments on three benchmark datasets. Both visual and quantitative results demonstrate that MEANet significantly reduces misclassified pixels at object boundaries and achieves superior detection accuracy compared with existing methods. On the GZ-CD dataset in particular, MEANet improves the F1-score and mIoU by more than 2% over ChangeFormer, demonstrating strong robustness in complex scenarios. Note that the performance of MEANet may still be affected by extremely complex edge textures or highly blurred boundaries; future work will focus on further improving robustness under such conditions and extending the method to broader RSBCD scenarios.
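The multi-scale form of the MSCL is not specified in the abstract, but a generic supervised contrastive loss over sampled pixel embeddings, as below, conveys the core mechanism: same-label pixels attract, different-label pixels repel. The temperature and the sampling scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def pixel_supcon_loss(emb, labels, temperature=0.1):
    """Supervised contrastive loss over sampled pixel embeddings: pixels with
    the same change label attract, others repel. A generic SupCon sketch, not
    MEANet's exact multi-scale formulation."""
    emb = F.normalize(emb, dim=1)                     # (N, D) sampled pixel embeddings
    sim = emb @ emb.t() / temperature                 # (N, N) scaled cosine similarities
    mask_pos = (labels[:, None] == labels[None, :]).float()
    mask_pos.fill_diagonal_(0)                        # exclude self-pairs

    logits = sim - sim.max(dim=1, keepdim=True).values.detach()  # numerical stability
    exp = torch.exp(logits)
    exp = exp * (1 - torch.eye(len(emb), device=emb.device))     # drop self from denominator
    log_prob = logits - torch.log(exp.sum(dim=1, keepdim=True))
    pos_count = mask_pos.sum(dim=1).clamp(min=1)
    return -((mask_pos * log_prob).sum(dim=1) / pos_count).mean()

loss = pixel_supcon_loss(torch.randn(64, 128), torch.randint(0, 2, (64,)))
```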
24 pages, 5237 KB  
Article
DCA-UNet: A Cross-Modal Ginkgo Crown Recognition Method Based on Multi-Source Data
by Yunzhi Guo, Yang Yu, Yan Li, Mengyuan Chen, Wenwen Kong, Yunpeng Zhao and Fei Liu
Plants 2026, 15(2), 249; https://doi.org/10.3390/plants15020249 - 13 Jan 2026
Abstract
Wild ginkgo, an endangered species, holds significant value for genetic resource conservation, yet monitoring it in practice faces numerous challenges. Traditional field surveys are inefficient in mountainous mixed forests, while satellite remote sensing is limited by spatial resolution. Current deep learning approaches that rely on single-source data or simple multi-source fusion fail to fully exploit the available information, leading to suboptimal recognition performance. This study presents a multimodal ginkgo crown dataset comprising RGB and multispectral images acquired from a UAV platform. To achieve precise crown segmentation with these data, we propose a dual-branch dynamic weighting fusion network, termed dual-branch cross-modal attention-enhanced UNet (DCA-UNet). We design a dual-branch encoder (DBE) with a two-stream architecture for independent feature extraction from each modality. We further develop a cross-modal interaction fusion module (CIF), which employs cross-modal attention and learnable dynamic weights to strengthen multi-source information fusion. Additionally, we introduce an attention-enhanced decoder (AED) that combines progressive upsampling with a hybrid channel-spatial attention mechanism, effectively utilizing multi-scale features and enhancing boundary semantic consistency. Evaluation on the ginkgo dataset shows that DCA-UNet achieves 93.42% IoU (Intersection over Union), 96.82% PA (Pixel Accuracy), 96.38% Precision, and a 96.60% F1-score. These results outperform the differential feature attention fusion network (DFAFNet) by 12.19%, 6.37%, 4.62%, and 6.95%, respectively, and surpass the single-modality baselines (RGB or multispectral) on all metrics. Superior performance on cross-flight-altitude data further validates the model's generalization capability and robustness in complex scenarios. Together, these results demonstrate the superiority of DCA-UNet for UAV-based multimodal ginkgo crown recognition, offering a reliable and efficient solution for monitoring wild endangered tree species.
(This article belongs to the Special Issue Advanced Remote Sensing and AI Techniques in Agriculture and Forestry)
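A sketch of bidirectional cross-modal attention with a learnable modality weight, in the spirit of the CIF module described above; the head count, the scalar weight, and the residual wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Cross-modal attention with a learnable dynamic weight between RGB and
    multispectral feature maps (an illustrative sketch of the CIF idea)."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.attn_rgb = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.attn_ms = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.alpha = nn.Parameter(torch.tensor(0.5))    # learnable modality weight

    def forward(self, rgb, ms):
        # rgb, ms: (B, C, H, W) -> token sequences (B, H*W, C)
        b, c, h, w = rgb.shape
        r = rgb.flatten(2).transpose(1, 2)
        m = ms.flatten(2).transpose(1, 2)
        r_enh, _ = self.attn_rgb(r, m, m)               # RGB queries multispectral
        m_enh, _ = self.attn_ms(m, r, r)                # multispectral queries RGB
        fused = self.alpha * (r + r_enh) + (1 - self.alpha) * (m + m_enh)
        return fused.transpose(1, 2).reshape(b, c, h, w)

fused = CrossModalFusion()(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```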
26 pages, 5686 KB  
Article
MAFMamba: A Multi-Scale Adaptive Fusion Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Boxu Li, Xiaobing Yang and Yingjie Fan
Sensors 2026, 26(2), 531; https://doi.org/10.3390/s26020531 - 13 Jan 2026
Abstract
With rapid advances in sub-meter satellite and aerial imaging, high-resolution remote sensing imagery has become a pivotal source of geospatial information. However, current semantic segmentation models face two primary challenges: (1) the inherent trade-off between capturing long-range global context and preserving precise local structural details, where excessive reliance on downsampled deep semantics often blurs boundaries and loses small objects; and (2) the difficulty of modeling complex scenes with extreme scale variations, where objects of the same category exhibit drastically different morphological features. To address these issues, this paper introduces MAFMamba, a multi-scale adaptive fusion visual Mamba network tailored to high-resolution remote sensing images. To mitigate scale variation, we design a lightweight hybrid encoder that incorporates an Adaptive Multi-scale Mamba Block (AMMB) in each stage. Driven by a Multi-scale Adaptive Fusion (MSAF) mechanism, the AMMB dynamically generates pixel-level weights to recalibrate cross-level features, establishing a robust multi-scale representation. To balance local details and global semantics, we introduce a Global–Local Feature Enhancement Mamba (GLMamba) module in the decoder, which integrates local fine-grained features extracted by convolutions with global long-range dependencies modeled by the Visual State Space (VSS) layer. Furthermore, we propose a Multi-Scale Cross-Attention Fusion (MSCAF) module to bridge the semantic gap between the encoder's shallow details and the decoder's high-level semantics via an efficient cross-attention mechanism. Extensive experiments on the ISPRS Potsdam and Vaihingen datasets show that MAFMamba surpasses state-of-the-art Convolutional Neural Network (CNN), Transformer, and Mamba-based methods in mIoU and mF1 scores. Notably, it achieves this accuracy while maintaining linear computational complexity and low memory usage, underscoring its efficiency in complex remote sensing scenarios.
(This article belongs to the Special Issue Intelligent Sensors and Artificial Intelligence in Building)
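Pixel-level adaptive fusion weights, as the MSAF mechanism is described, can be generated with a 1x1 convolution and a softmax over scales. The sketch below assumes the scale features have already been resized to a common resolution; channel counts are arbitrary.

```python
import torch
import torch.nn as nn

class MSAF(nn.Module):
    """Multi-scale adaptive fusion: generate pixel-level softmax weights over
    several same-size feature maps and blend them (sketch of the stated idea)."""
    def __init__(self, channels=96, n_scales=3):
        super().__init__()
        self.weight_head = nn.Conv2d(n_scales * channels, n_scales, kernel_size=1)

    def forward(self, feats):
        # feats: list of n_scales tensors, each (B, C, H, W), already resized
        stacked = torch.cat(feats, dim=1)
        w = torch.softmax(self.weight_head(stacked), dim=1)   # (B, S, H, W) per-pixel weights
        return sum(w[:, i:i + 1] * f for i, f in enumerate(feats))

feats = [torch.randn(2, 96, 64, 64) for _ in range(3)]
fused = MSAF()(feats)   # (2, 96, 64, 64)
```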
21 pages, 23946 KB  
Article
Infrared Image Denoising Algorithm Based on Wavelet Transform and Self-Attention Mechanism
by Hongmei Li, Yang Zhang, Luxia Yang and Hongrui Zhang
Sensors 2026, 26(2), 523; https://doi.org/10.3390/s26020523 - 13 Jan 2026
Abstract
Infrared images are often degraded by complex noise arising from hardware and environmental factors, posing challenges for subsequent processing and target detection. To overcome the shortcomings of existing denoising methods in balancing noise removal against detail preservation, this paper proposes a Wavelet Transform Enhanced Infrared Denoising Model (WTEIDM). First, a Wavelet Transform Self-Attention (WTSA) module is designed that combines the frequency-domain decomposition of the discrete wavelet transform (DWT) with the dynamic weighting of self-attention to effectively separate noise from detail. Second, a Multi-Scale Gated Linear Unit (MSGLU) is devised to better capture detail information and dynamically control features through dual-branch multi-scale depth-wise convolution and a gating strategy. Finally, a Parallel Hybrid Attention Module (PHAM) enhances cross-dimensional feature fusion through the parallel cross-interaction of spatial and channel attention. Extensive experiments on five infrared datasets under different noise levels (σ = 15, 25, and 50) demonstrate that WTEIDM outperforms several state-of-the-art denoising algorithms on both PSNR and SSIM metrics, confirming its generalization capability and robustness.
(This article belongs to the Section Sensing and Imaging)
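A one-level Haar DWT, shown below via even/odd subsampling, is the kind of frequency split the WTSA pairs with self-attention. The Haar choice is an assumption (the paper does not name its wavelet here), and production code would typically use a wavelet library instead.

```python
import torch

def haar_dwt(x):
    """One-level 2D Haar DWT by even/odd subsampling, returning the low-pass
    band and three detail bands. A minimal stand-in for the DWT the paper
    pairs with self-attention."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2    # approximation: smooth image content
    lh = (a - b + c - d) / 2    # horizontal detail (carries edges and noise)
    hl = (a + b - c - d) / 2    # vertical detail
    hh = (a - b - c + d) / 2    # diagonal detail
    return ll, lh, hl, hh

ll, lh, hl, hh = haar_dwt(torch.randn(1, 1, 256, 256))
# attention can then reweight (lh, hl, hh) to suppress noise while keeping edges
```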
24 pages, 5571 KB  
Article
Bearing Fault Diagnosis Based on a Depthwise Separable Atrous Convolution and ASPP Hybrid Network
by Xiaojiao Gu, Chuanyu Liu, Jinghua Li, Xiaolin Yu and Yang Tian
Machines 2026, 14(1), 93; https://doi.org/10.3390/machines14010093 - 13 Jan 2026
Abstract
To address the computational redundancy, inadequate multi-scale feature capture, and poor noise robustness of traditional deep networks used for bearing vibration and acoustic signal feature extraction, this paper proposes a fault diagnosis method based on Depthwise Separable Atrous Convolution (DSAC) and Acoustic Spatial Pyramid Pooling (ASPP). First, the Continuous Wavelet Transform (CWT) converts the vibration and acoustic signals into time–frequency representations. The vibration CWT is fed into a multi-scale feature extraction module to obtain preliminary vibration features, whereas the acoustic CWT is processed by a Deep Residual Shrinkage Network (DRSN). The two feature streams are concatenated in a feature fusion module and then passed to the DSAC and ASPP modules, which together expand the effective receptive field and aggregate multi-scale contextual information. Finally, global pooling followed by a classifier outputs the bearing fault category, enabling high-precision fault identification. Experimental results show that, on clean data and under multiple low signal-to-noise ratio (SNR) noise conditions, the proposed DSAC-ASPP method achieves higher accuracy and lower variance than baselines such as ResNet, VGG, and MobileNet, while requiring fewer parameters and FLOPs and exhibiting superior robustness and deployability.
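Depthwise separable atrous convolution itself is a standard building block: a dilated depthwise convolution enlarges the receptive field at low cost, followed by a pointwise convolution to mix channels. The sketch below is generic, with channel counts and dilation chosen arbitrarily rather than taken from the paper.

```python
import torch
import torch.nn as nn

class DSAC(nn.Module):
    """Depthwise separable atrous convolution: a dilated depthwise conv widens
    the receptive field cheaply, then a 1x1 pointwise conv mixes channels
    (generic block matching the abstract's description)."""
    def __init__(self, in_ch=64, out_ch=64, dilation=2):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

y = DSAC(dilation=4)(torch.randn(2, 64, 56, 56))   # spatial size is preserved
```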
20 pages, 3283 KB  
Article
Small-Target Pest Detection Model Based on Dynamic Multi-Scale Feature Extraction and Dimensionally Selected Feature Fusion
by Junjie Li, Wu Le, Zhenhong Jia, Gang Zhou, Jiajia Wang, Guohong Chen, Yang Wang and Yani Guo
Appl. Sci. 2026, 16(2), 793; https://doi.org/10.3390/app16020793 - 13 Jan 2026
Abstract
Pest detection in the field is crucial for realizing smart agriculture. Deep learning-based target detection algorithms have become an important means of pest identification due to their high detection accuracy, but existing methods still suffer from misdetections and omissions when detecting small-target pests, particularly against complex backgrounds. This study therefore builds on YOLO11 and proposes MSDS-YOLO, a new model for enhanced detection of small-target pests. First, a new dynamic multi-scale feature extraction module (C3k2_DMSFE) is introduced, which adapts to different input features and thus effectively captures multi-scale, diverse feature information. Next, a novel Dimensional Selective Feature Pyramid Network (DSFPN) is proposed, which employs adaptive feature selection and multi-dimensional fusion mechanisms to enhance small-target saliency. Finally, the ability to fit small targets is strengthened by adding a 160 × 160 detection head, removing the 20 × 20 detection head, and using the Normalized Gaussian Wasserstein Distance (NWD) combined with CIoU as the position loss for measuring prediction error. In addition, a real small-target pest dataset, Cottonpest2, is constructed to validate the proposed model. Experimental results show that MSDS-YOLO achieves a mAP50 of 86.7% on Cottonpest2, a 3.0% improvement over the baseline, and better detection accuracy than other YOLO models on public datasets. Evaluation across these datasets shows that MSDS-YOLO has excellent robustness and generalization ability.
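NWD has a closed form when boxes are modeled as 2D Gaussians: the squared 2-Wasserstein distance between N([cx, cy], diag((w/2)², (h/2)²)) distributions reduces to a Euclidean distance between (cx, cy, w/2, h/2) vectors, which is then normalized by an exponential. The constant C below is dataset-dependent and assumed, not taken from this paper.

```python
import torch

def nwd(box1, box2, C=12.8):
    """Normalized Gaussian Wasserstein Distance between (cx, cy, w, h) boxes.
    Each box is modeled as a 2D Gaussian; C is a dataset-dependent constant
    (12.8 here is an assumption, tuned per dataset in practice)."""
    g1 = torch.stack([box1[..., 0], box1[..., 1],
                      box1[..., 2] / 2, box1[..., 3] / 2], dim=-1)
    g2 = torch.stack([box2[..., 0], box2[..., 1],
                      box2[..., 2] / 2, box2[..., 3] / 2], dim=-1)
    w2 = ((g1 - g2) ** 2).sum(dim=-1)            # squared 2-Wasserstein distance
    return torch.exp(-torch.sqrt(w2) / C)        # similarity in (0, 1]

pred = torch.tensor([[50.0, 50.0, 8.0, 8.0]])    # a small target
gt = torch.tensor([[52.0, 51.0, 9.0, 7.0]])
loss = 1.0 - nwd(pred, gt)                       # usable as a position loss term
```

Unlike IoU, this similarity stays smooth and non-zero for tiny, slightly misaligned boxes, which is why it suits small-target regression.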
39 pages, 2940 KB  
Article
Trustworthy AI-IoT for Citizen-Centric Smart Cities: The IMTPS Framework for Intelligent Multimodal Crowd Sensing
by Wei Li, Ke Li, Zixuan Xu, Mengjie Wu, Yang Wu, Yang Xiong, Shijie Huang, Yijie Yin, Yiping Ma and Haitao Zhang
Sensors 2026, 26(2), 500; https://doi.org/10.3390/s26020500 - 12 Jan 2026
Abstract
The fusion of Artificial Intelligence and the Internet of Things (AI-IoT, also widely referred to as AIoT) offers transformative potential for smart cities, yet presents a critical challenge: how to process heterogeneous data streams from intelligent sensing, particularly crowd sensing data derived from citizen interactions such as text, voice, and system logs, into reliable intelligence for sustainable urban governance. To address this challenge, we introduce the Intelligent Multimodal Ticket Processing System (IMTPS), a novel AI-IoT smart system. Unlike ad hoc solutions, the novelty of IMTPS resides in its theoretically grounded architecture, which orchestrates Information Theory and Game Theory for efficient, verifiable extraction, and employs Causal Inference and Meta-Learning for robust reasoning, thereby converting noisy, heterogeneous data streams into reliable governance intelligence. This principled design endows IMTPS with four foundational capabilities essential for modern smart city applications: (1) sustainable and efficient AI-IoT operations: guided by Information Theory, the IMTPS compression module achieves provably efficient semantic-preserving compression, drastically reducing data storage and energy costs; (2) trustworthy data extraction: a Game Theory-based adversarial verification network ensures high reliability in extracting critical information, mitigating the risk of model hallucination in high-stakes citizen services; (3) robust multimodal fusion: the fusion engine leverages Causal Inference to distinguish true causality from spurious correlations, enabling trustworthy integration of complex, multi-source urban data; and (4) an adaptive intelligent system: a Meta-Learning-based retrieval mechanism allows the system to rapidly adapt to new and evolving query patterns, ensuring long-term effectiveness in dynamic urban environments. We validate IMTPS on a large-scale, publicly released benchmark dataset of 14,230 multimodal records. IMTPS demonstrates state-of-the-art performance, achieving a 96.9% reduction in storage footprint and a 47% decrease in critical data extraction errors. By open-sourcing our implementation, we aim to provide a replicable blueprint for the next generation of trustworthy, sustainable AI-IoT systems for citizen-centric smart cities.
(This article belongs to the Special Issue AI-IoT for New Challenges in Smart Cities)
25 pages, 4395 KB  
Article
Correlation-Aware Multimodal Fusion Network for Fashion Compatibility Modeling
by Yan Fang, Jiangnan Ge, Ran Xiao and Yidan Zhang
Electronics 2026, 15(2), 332; https://doi.org/10.3390/electronics15020332 - 12 Jan 2026
Abstract
The rapid growth of e-commerce and the booming online fashion industry are driving growing user demand for sophisticated, compatible fashion outfits. As an emerging multimodal information retrieval technology, fashion compatibility modeling aims to predict the compatibility of any given outfit and to recommend complementary items for incomplete outfits. Although existing research has made significant progress on fashion compatibility tasks from a multimodal perspective, it has yet to fully exploit the multimodal information and correlations among fashion items. To tackle these challenges, a correlation-aware multimodal fusion network for fashion compatibility modeling is proposed. Long-distance correlated visual features are exploited during multimodal processing to improve the quality of visual features, and an improved dual-interaction mechanism achieves deep multimodal fusion. Furthermore, we explore both negative and multi-scale correlations to capture complex correlations among items, thereby improving the accuracy of fashion compatibility assessment. Extensive experiments on real-world fashion datasets demonstrate that our method outperforms existing advanced benchmark models on AUC and ACC metrics, indicating its effectiveness in improving fashion compatibility evaluation.
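One way to realize a dual-interaction fusion: each modality's embedding gates the other before the two are merged into a single item vector. This is a hedged sketch of the general idea; the gating form and the downstream compatibility scoring are assumptions, not the paper's mechanism.

```python
import torch
import torch.nn as nn

class DualInteraction(nn.Module):
    """Dual-interaction fusion of an item's visual and text embeddings: each
    modality is modulated by the other before pooling into one item vector
    (a hedged sketch of the idea, not the paper's exact mechanism)."""
    def __init__(self, dim=128):
        super().__init__()
        self.v_from_t = nn.Linear(dim, dim)
        self.t_from_v = nn.Linear(dim, dim)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, v, t):
        v2 = v * torch.sigmoid(self.v_from_t(t))   # text gates vision
        t2 = t * torch.sigmoid(self.t_from_v(v))   # vision gates text
        return self.out(torch.cat([v2, t2], dim=-1))

fuse = DualInteraction()
items = fuse(torch.randn(4, 128), torch.randn(4, 128))   # 4 items in one outfit
# outfit compatibility could then be scored from pairwise item similarities
score = torch.sigmoid((items @ items.t()).triu(1).sum())
```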