Search Results (240)

Search Parameters:
Keywords = multi-entropy fusion

38 pages, 585 KB  
Review
A Unified Information Bottleneck Framework for Multimodal Biomedical Machine Learning
by Liang Dong
Entropy 2026, 28(4), 445; https://doi.org/10.3390/e28040445 - 14 Apr 2026
Abstract
Multimodal biomedical machine learning increasingly integrates heterogeneous data sources (including medical imaging, multi-omics profiles, electronic health records, and wearable sensor signals) to support clinical diagnosis, prognosis, and treatment response prediction. Despite strong empirical performance, most existing multimodal systems lack a principled theoretical foundation for understanding why fusion improves prediction, how information is distributed across modalities, and when models can be trusted under incomplete or shifting data. This paper develops a unified information-theoretic framework that formalizes multimodal biomedical learning as an information optimization problem. We formulate multimodal representation learning through the information bottleneck principle, deriving a variational objective that balances predictive sufficiency against informational compression in an architecture-agnostic manner. Building on this foundation, we introduce information-theoretic tools for decomposing modality contributions via conditional mutual information, quantifying redundancy and synergy, and diagnosing fusion collapse. We further show that robustness to missing modalities can be cast as an information consistency problem and extend the framework to longitudinal disease modeling through transfer entropy and sequential information bottleneck objectives. Applications to multimodal foundation models, uncertainty quantification, calibration, and out-of-distribution detection are developed. 
Empirical case studies across three biomedical datasets (TCGA breast cancer multi-omics, TCGA glioma clinical-plus-molecular data, and OASIS-2 longitudinal Alzheimer’s data) show that the framework’s key quantities are computable and interpretable on real data: MI decomposition identifies modality dominance and redundancy; the VMIB traces a compression–prediction tradeoff in the information plane; entropy-based selective prediction raises accuracy from 0.787 to 0.939 at 50% coverage; transfer entropy reveals stage-dependent modality influence in disease progression; and pretraining/adaptation diagnostics distinguish efficient from wasteful fine-tuning strategies. Together, these results develop entropy and mutual information as organizing principles for the design, analysis, and evaluation of multimodal biomedical AI systems.
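The entropy-based selective prediction mentioned in this abstract (answer only on the most confident fraction of cases) follows a standard recipe: rank samples by predictive Shannon entropy and evaluate only the lowest-entropy fraction. A minimal sketch — the function name and interface are ours, not the paper's code:

```python
import numpy as np

def selective_accuracy(probs, labels, coverage=0.5):
    """Accuracy when the model answers only on the `coverage` fraction of
    samples whose predictive (Shannon) entropy is lowest."""
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)  # per-sample entropy
    n_keep = int(np.ceil(coverage * len(labels)))
    keep = np.argsort(entropy)[:n_keep]                     # most confident samples
    preds = probs[keep].argmax(axis=1)
    return float((preds == labels[keep]).mean())
```

Because low-entropy predictions tend to be correct more often, accuracy on the retained subset typically exceeds full-coverage accuracy, which is the tradeoff the reported 0.787 → 0.939 figure quantifies.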
18 pages, 1357 KB  
Article
Fault Diagnosis for Hydropower Units Based on Multi-Sensor Data with Multi-Scale Fusion
by Di Zhou, Xiangqu Xiao and Chaoshun Li
Water 2026, 18(8), 915; https://doi.org/10.3390/w18080915 - 11 Apr 2026
Abstract
Accurate fault diagnosis of hydropower units is crucial for ensuring the efficient and complete utilization of hydropower resources. Existing diagnostic methods predominantly consider either single-sensor or single-scale multi-sensor fusion, failing to fully exploit the effective information within monitoring data. Furthermore, they neglect the correlation between different sensors and faults during fusion diagnosis, thereby limiting the diagnostic performance of fusion models. To address these issues, this paper proposes a multi-sensor data fault diagnosis method based on multi-scale fusion. First, a feature extraction model is constructed to extract shallow-level features from multi-sensor signals across multiple dimensions. Subsequently, an attention-based feature fusion network is designed to extract and fuse multi-depth features, yielding high-quality deep-fused features. Finally, an information-entropy-based decision fusion strategy is established to effectively enhance the model’s diagnostic performance. Experimental validation on the public rotating machinery fault dataset and the hydropower unit fault dataset yielded diagnostic accuracies of 96.42% and 99.28%, respectively, demonstrating the significant effectiveness and robustness of the proposed method.
(This article belongs to the Section Water-Energy Nexus)
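Information-entropy-based decision fusion of the general kind this abstract describes weights each branch's class-probability vector inversely to its prediction entropy, so confident branches dominate the fused decision. A minimal illustrative sketch (our own simplification, not the paper's strategy):

```python
import numpy as np

def entropy_weighted_fusion(prob_list):
    """Fuse per-branch class-probability vectors; each branch is weighted
    by the inverse of its Shannon entropy (confident = low entropy)."""
    eps = 1e-12
    probs = np.asarray(prob_list, dtype=float)          # (branches, classes)
    ent = -np.sum(probs * np.log(probs + eps), axis=1)  # per-branch entropy
    w = 1.0 / (ent + eps)                               # low entropy -> high weight
    w /= w.sum()
    fused = (w[:, None] * probs).sum(axis=0)
    return fused / fused.sum()
```

A confident branch then outvotes a near-uniform one even when both contribute to the final decision.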
35 pages, 856 KB  
Article
Stock Forecasting Based on Informational Complexity Representation: A Framework of Wavelet Entropy, Multiscale Entropy, and Dual-Branch Network
by Guisheng Tian, Chengjun Xu and Yiwen Yang
Entropy 2026, 28(4), 424; https://doi.org/10.3390/e28040424 - 10 Apr 2026
Abstract
Stock price sequences are characterized by pronounced nonlinearity, non-stationarity, and multi-scale volatility. They are further influenced by complex, multi-source factors, such as macroeconomic conditions and market behavior, making high-precision forecasting highly challenging. Existing approaches are limited by noise and multi-dimensional market features, as well as difficulties in balancing prediction accuracy with model complexity. To address these challenges, we propose Wavelet Entropy and Cross-Attention Network (WECA-Net), which combines wavelet decomposition with a multimodal cross-attention mechanism. From an information-theoretic perspective, stock price dynamics reflect the time-varying uncertainty and informational complexity of the market. We employ wavelet entropy to quantify the dispersion and uncertainty of energy distribution across frequency bands, and multiscale entropy to measure the scale-dependent complexity and regularity of the time series. These entropy-derived descriptors provide an interpretable prior of “information content” for cross-modal attention fusion, thereby improving robustness and generalization under non-stationary market conditions. Experiments on Chinese stock indices, A-Share, and CSI 300 component stock datasets demonstrate that WECA-Net consistently outperforms mainstream models in Mean Absolute Error (MAE) and R² across all datasets. Notably, on the CSI 300 dataset, WECA-Net achieves an R² of 0.9895, underscoring its strong predictive accuracy and practical applicability. This framework is also well aligned with sensor data fusion and intelligent perception paradigms, offering a robust solution for financial signal processing and real-time market state awareness.
(This article belongs to the Section Complexity)
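Multiscale entropy of the kind invoked here is conventionally computed by coarse-graining the series at each scale and taking the sample entropy of the coarse-grained version. A minimal numpy sketch — our own O(n²) illustration for short series, not WECA-Net's implementation:

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy: -log(A/B), where B counts template pairs of length m
    and A of length m+1 within Chebyshev distance r * std(x)."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    def count(mm):
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm + 1)])
        d = np.abs(templ[:, None, :] - templ[None, :, :]).max(axis=2)
        n = len(templ)
        return ((d <= tol).sum() - n) / 2      # exclude self-matches
    b, a = count(m), count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def multiscale_entropy(x, scales=(1, 2, 4)):
    """Coarse-grain the series at each scale, then take its sample entropy."""
    out = []
    for s in scales:
        n = len(x) // s
        coarse = np.asarray(x[:n * s], dtype=float).reshape(n, s).mean(axis=1)
        out.append(sample_entropy(coarse))
    return out
```

An irregular series (e.g. white noise) scores higher than a regular one (e.g. a sinusoid), which is what makes the profile across scales a useful complexity descriptor.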
24 pages, 36350 KB  
Article
Partial Multi-Label Feature Selection via Entropy-Weighted Multi-Scale Neighborhood Granular Label Distribution Learning
by Yifan Cao, Mao Li, Cong Wang, Shuyu Fan, Ziqiao Yin and Binghui Guo
Entropy 2026, 28(4), 422; https://doi.org/10.3390/e28040422 - 9 Apr 2026
Abstract
Partial multi-label feature selection aims to identify discriminative features from data where each instance is associated with an ambiguous candidate label set. Existing methods are typically built upon single-scale modeling assumptions and may fail to fully exploit the multi-granularity structure underlying instance–label relationships. To address this limitation, we propose a novel framework termed PML-FSMNG, which integrates entropy-weighted multi-scale neighborhood granules with label distribution learning. Specifically, multi-scale neighborhood systems are constructed to estimate label distinguishability at multiple structural scales, and Shannon entropy is employed to adaptively fuse scale-specific label distributions into a robust soft supervisory signal. Based on the learned label distribution, an embedded sparse regression model with ℓ2,1-norm regularization is developed for discriminative feature selection, together with an entropy-regularized adaptive graph learning mechanism to preserve intrinsic geometric structure. Extensive experiments on benchmark datasets demonstrate that the proposed method consistently outperforms several state-of-the-art approaches, validating the effectiveness of multi-scale modeling and entropy-guided adaptive learning under label ambiguity.
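The ℓ2,1-norm regularizer used for embedded feature selection sums the ℓ2 norms of the rows of the projection matrix; penalizing it drives entire rows to zero, which deselects the corresponding features. A small sketch with illustrative names:

```python
import numpy as np

def l21_norm(W):
    """ℓ2,1 norm of W (features x outputs): sum of row-wise ℓ2 norms.
    Minimizing it zeroes out whole rows, deselecting those features."""
    return float(np.sqrt((W ** 2).sum(axis=1)).sum())

def top_features(W, k):
    """Rank features by the ℓ2 norm of their row in W and keep the top k."""
    scores = np.sqrt((W ** 2).sum(axis=1))
    return np.argsort(scores)[::-1][:k]
```

After training, the surviving row norms serve directly as feature-importance scores.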
25 pages, 4248 KB  
Article
A Spatial Post-Multiscale Fusion Entropy and Multi-Feature Synergy Model for Disturbance Identification of Charging Stations
by Hui Zhou, Xiujuan Zeng, Tong Liu, Wei Wu, Bolun Du and Yinglong Diao
Energies 2026, 19(8), 1837; https://doi.org/10.3390/en19081837 - 8 Apr 2026
Abstract
The large-scale integration and grid connection of renewable energy sources and charging stations introduce a multitude of nonlinear and impact loads, resulting in more severe distortion and higher complexity of disturbance signals in power systems. As a consequence, power quality disturbances (PQDs) in active distribution networks, including overvoltage and harmonics, display greater randomness and diversity, which increases the challenge of PQD identification. To tackle this problem, this study presents a dual-channel early-fusion approach for PQD recognition based on Spatial Post-MultiScale Fusion Entropy (SMFE). SMFE is used as an entropy-based feature-construction pipeline in which a time–frequency representation is formed prior to spatial post-multiscale aggregation to produce a compact complexity map complementary to waveform morphology. Subsequently, a dual-channel model is constructed by integrating waveform-morphology input with SMFE-derived complexity features for joint learning. By leveraging the ConvNeXt architecture and a Squeeze-and-Excitation (SE) mechanism, a multimodal channel-recalibration model is implemented to emphasize informative feature responses during PQD recognition. Experimental verification with simulated signals shows that the proposed approach achieves an identification accuracy of 97.83% under an SNR of 30 dB, indicating robust performance under the tested noise settings.
25 pages, 7467 KB  
Article
Double Cost-Volume Stereo Matching with Entropy-Difference-Guided Fusion
by Huanchun Yang, Hongshe Dang, Xuande Zhang and Quanping Chen
Electronics 2026, 15(7), 1525; https://doi.org/10.3390/electronics15071525 - 6 Apr 2026
Abstract
To address the reduced accuracy of stereo matching networks near object boundaries and disparity discontinuities, a double cost–volume stereo matching network with entropy-difference-guided fusion is proposed. The proposed network was built based on RAFT-Stereo. It employs a pretrained backbone to extract multi-scale features and uses deformable attention for cross-scale feature fusion. A shallow image-guided branch was used to generate pixel-wise constraint information to limit the magnitude of sampling offsets and alleviate cross-structure sampling. Based on the extracted features, a group-wise correlation cost–volume and a normalized correlation cost–volume were constructed. Both cost–volumes were regularized by 3D Hourglass networks, and a structure-consistent intra-scale aggregation module was introduced during the regularization of the group-wise correlation cost–volume. The two aggregated results were then fused by the entropy-difference-guided fusion module to obtain the final cost–volume. The experimental results show the effectiveness of the proposed network on the Scene Flow, KITTI, and ETH3D datasets, achieving an endpoint error of 0.45 px and a >3 px error rate of 2.41% on the Scene Flow dataset.
(This article belongs to the Section Artificial Intelligence)
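The core idea behind entropy-difference-guided fusion — at each pixel, trust the cost volume whose disparity distribution is more peaked — can be illustrated with a toy numpy sketch. This is our simplification of the general idea, not the paper's module:

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entropy_guided_fuse(cv_a, cv_b):
    """Fuse two cost volumes of shape (D, H, W). Per pixel, each volume is
    weighted by a sigmoid of the entropy difference of its disparity
    distribution, so the more peaked (lower-entropy) volume dominates."""
    eps = 1e-12
    pa, pb = softmax(cv_a), softmax(cv_b)
    ha = -(pa * np.log(pa + eps)).sum(axis=0)   # (H, W) entropy of volume A
    hb = -(pb * np.log(pb + eps)).sum(axis=0)   # (H, W) entropy of volume B
    w_a = 1.0 / (1.0 + np.exp(ha - hb))         # sigmoid(hb - ha)
    return w_a[None] * cv_a + (1.0 - w_a)[None] * cv_b
```

Near boundaries, where one matching cue is ambiguous (flat distribution, high entropy), the sharper volume's evidence carries the fused result.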
32 pages, 43664 KB  
Article
MVFF: Multi-View Feature Fusion Network for Small UAV Detection
by Kunlin Zou, Haitao Zhao, Xingwei Yan, Wei Wang, Yan Zhang and Yaxiu Zhang
Drones 2026, 10(4), 264; https://doi.org/10.3390/drones10040264 - 4 Apr 2026
Abstract
With the widespread adoption of various types of Unmanned Aerial Vehicles (UAVs), their non-compliant operations pose a severe challenge to public safety, necessitating the urgent identification and detection of UAV targets. However, in complex backgrounds, UAV targets exhibit small-scale dimensions and low contrast, coupled with extremely low signal-to-noise ratios. As a result, conventional target detection methods suffer from feature convergence, missed detections, and false alarms. To address these challenges, we propose a Multi-View Feature Fusion Network (MVFF) that achieves precise identification of small, low-contrast UAV targets by leveraging complementary multi-view information. First, we design a collaborative view alignment fusion module. This module employs a cross-map feature fusion attention mechanism to establish pixel-level mapping relationships and perform deep fusion, effectively resolving geometric distortion and semantic overlap caused by imaging angle differences. Furthermore, we introduce a view feature smoothing module that employs displacement operators to construct a lightweight long-range modeling mechanism. This overcomes the limitations of traditional convolutional local receptive fields, effectively eliminating ghosting artifacts and response discontinuities arising from multi-view fusion. Additionally, we developed a small object binary cross-entropy loss function. By incorporating scale-adaptive gain factors and confidence-aware weights, this function enhances the learning capability of edge features in small objects, significantly reducing prediction uncertainty caused by background noise. Comparative experiments conducted on a multi-perspective UAV dataset demonstrate that our approach consistently outperforms existing state-of-the-art methods across multiple performance metrics. Specifically, it achieves a Structure-measure of 91.50% and an F-measure of 85.14%, validating the effectiveness and superiority of the proposed method.
18 pages, 1085 KB  
Article
Self-Learning Multimodal Emotion Recognition Based on Multi-Scale Dilated Attention
by Xiuli Du and Luyao Zhu
Brain Sci. 2026, 16(4), 350; https://doi.org/10.3390/brainsci16040350 - 25 Mar 2026
Abstract
Background/Objectives: Emotions can be recognized through external behavioral cues and internal physiological signals. Owing to the inherently complex psychological and physiological nature of emotions, models relying on a single modality often suffer from limited robustness. This study aims to improve emotion recognition performance by effectively integrating electroencephalogram (EEG) signals and facial expressions through a multimodal framework. Methods: We propose a multimodal emotion recognition model that employs a Multi-Scale Dilated Attention Convolution (MSDAC) network tailored for facial expression recognition, integrates an EEG emotion recognition method based on three-dimensional features, and adopts a self-learning decision-level fusion strategy. MSDAC incorporates Multi-Scale Dilated Convolutions and a Dual-Branch Attention (D-BA) module to capture discontinuous facial action units. For EEG processing, raw signals are converted into a multidimensional time–frequency–spatial representation to preserve temporal, spectral, and spatial information. To overcome the limitations of traditional stitching or fixed-weight fusion approaches, a self-learning weight fusion mechanism is introduced at the decision level to adaptively adjust modality contributions. Results: The facial analysis branch achieved average accuracies of 74.1% on FER2013, 99.69% on CK+, and 98.05% (valence)/96.15% (arousal) on DEAP. On the DEAP dataset, the complete multimodal model reached 98.66% accuracy for valence and 97.49% for arousal classification. Conclusions: The proposed framework enhances emotion recognition by improving facial feature extraction and enabling adaptive multimodal fusion, demonstrating the effectiveness of combining EEG and facial information for robust emotion analysis.
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
23 pages, 10822 KB  
Article
Off-Road Autonomous Vehicle Semantic Segmentation and Spatial Overlay Video Assembly
by Itai Dror, Omer Aviv and Ofer Hadar
Sensors 2026, 26(6), 1944; https://doi.org/10.3390/s26061944 - 19 Mar 2026
Abstract
Autonomous systems are expanding rapidly, driving a demand for robust perception technologies capable of navigating challenging, unstructured environments. While urban autonomy has made significant progress, off-road environments pose unique challenges, including dynamic terrain and limited communication infrastructure. This research addresses these challenges by introducing a novel three-part solution for off-road autonomous vehicles. First, we present a large-scale off-road dataset curated to capture the visual complexity and variability of unstructured environments, providing a realistic training ground that supports improved model generalization. Second, we propose a Confusion-Aware Loss (CAL) that dynamically penalizes systematic misclassifications based on class-level confusion statistics. When combined with cross-entropy, CAL improves segmentation mean Intersection over Union (mIoU) on the off-road test set from 68.66% to 70.06% and achieves cross-domain gains of up to ~0.49% mIoU on the Cityscapes dataset. Third, leveraging semantic segmentation as an intermediate representation, we introduce a spatial overlay video encoding scheme that preserves high-fidelity RGB information in semantically critical regions while compressing non-essential background regions. Experimental results demonstrate Peak Signal-to-Noise Ratio (PSNR) improvements of up to +5 dB and Video Multi-Method Assessment Fusion (VMAF) gains of up to +40 points under lossy compression, enabling efficient and reliable off-road autonomous operation. This integrated approach provides a robust framework for real-time remote operation in bandwidth-constrained environments.
(This article belongs to the Special Issue Machine Learning in Image/Video Processing and Sensing)
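One simple way to realize a confusion-aware penalty of this flavor is to up-weight the cross-entropy of samples whose (true, predicted) pair is a historically frequent confusion, using confusion statistics from a previous epoch. The exact form below is our own illustration, not the paper's CAL:

```python
import numpy as np

def confusion_aware_ce(probs, labels, conf_mat, alpha=1.0):
    """Mean cross-entropy plus a term that up-weights samples whose
    (true, predicted) pair is a frequent confusion. conf_mat rows are
    row-normalized confusion frequencies from earlier training."""
    eps = 1e-12
    n = len(labels)
    ce = -np.log(probs[np.arange(n), labels] + eps)   # standard CE per sample
    preds = probs.argmax(axis=1)
    rates = conf_mat[labels, preds]                   # how common this mistake is
    penalty = np.where(preds != labels, rates, 0.0)   # only penalize errors
    return float((ce * (1.0 + alpha * penalty)).mean())
```

With `alpha = 0` this reduces to plain cross-entropy, so the confusion term acts as a tunable correction on top of the baseline objective.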
21 pages, 4667 KB  
Article
MM-WAE: Multimodal Wasserstein Autoencoders for Semi-Supervised Wafer Map Defect Recognition
by Yifeng Zhang, Qingqing Sun, Ziyu Liu and David Wei Zhang
Micromachines 2026, 17(3), 367; https://doi.org/10.3390/mi17030367 - 18 Mar 2026
Abstract
Wafer map defect pattern recognition is a key task for ensuring yield in integrated circuit manufacturing. However, in real production lines it commonly suffers from scarce labeled data, long-tailed class distributions, and limited feature representations, which cause existing deep learning models to degrade in performance, particularly for minority defect classes and complex defect morphologies. To address these challenges, we propose a semi-supervised classification method for wafer maps based on a multimodal Wasserstein autoencoder (MM-WAE). The framework constructs three parallel feature branches in the spatial, frequency, and texture domains, using a multi-head attention mechanism and gating mechanism for adaptive multimodal fusion. This allows defect patterns to be comprehensively characterized by macroscopic geometric distributions, spectral periodic structures, and microscopic texture details. The Wasserstein autoencoder is introduced, with the latent space distribution regularized by a maximum mean discrepancy (MMD) loss using an inverse multiquadratic kernel. Additionally, an inverse class-frequency weighted cross-entropy loss and a modality consistency loss between the encoder and classifier jointly optimize the reconstruction and classification paths while leveraging large amounts of unlabeled wafer maps for semi-supervised learning. Experimental results show that MM-WAE mitigates performance limitations caused by insufficient labels and class imbalance, significantly improving the accuracy and robustness of wafer defect classification, with promising potential for industrial application and further development.
(This article belongs to the Section E: Engineering and Technology)
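The MMD regularizer with an inverse multiquadratic kernel, k(x, y) = c / (c + ‖x − y‖²), admits a compact plug-in estimate. A minimal numpy sketch in the biased V-statistic form (kernel scale `c` and shapes are illustrative):

```python
import numpy as np

def imq_kernel(x, y, c=1.0):
    """Inverse multiquadratic kernel k(x, y) = c / (c + ||x - y||^2)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    return c / (c + d2)

def mmd2(x, y, c=1.0):
    """Biased estimate of squared MMD between samples x ~ P and y ~ Q."""
    return imq_kernel(x, x, c).mean() + imq_kernel(y, y, c).mean() \
        - 2.0 * imq_kernel(x, y, c).mean()
```

In a WAE, `x` would be samples from the prior and `y` the encoder's latent codes; the penalty vanishes when the two samples coincide and grows as the distributions separate.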
36 pages, 11911 KB  
Article
Soil Moisture Retrieval Using Multi-Satellite Dual-Frequency GNSS-IR Considering Environmental Factors
by Shihai Nie, Yongjun Jia, Peng Li, Xing Wu and Yuchao Tang
Remote Sens. 2026, 18(6), 917; https://doi.org/10.3390/rs18060917 - 18 Mar 2026
Abstract
Global Navigation Satellite System Interferometric Reflectometry (GNSS-IR) provides a low-cost, all-weather approach for continuous soil moisture content (SMC) retrieval. However, in single-constellation, multi-satellite applications, the optimal satellite number and the combined effects of multiple environmental factors on retrieval accuracy and stability remain insufficiently quantified. To address these issues, this study develops a dual-frequency GNSS-IR SMC retrieval framework that explicitly incorporates multiple environmental factors. Entropy-based fusion (EFM) is used to adaptively weight dual-frequency phase-delay observations, and a marginal-gain criterion is introduced to determine a suitable number of participating satellites. On this basis, univariate linear regression (ULR) and random forest (RF) models are established, and the Normalized Difference Vegetation Index (NDVI), temperature, and precipitation are incorporated into the RF model to improve retrieval robustness and quantify the relative contributions of environmental factors. The results show that multi-satellite combinations significantly improve SMC retrieval performance, while the incremental gain exhibits clearly diminishing returns and converges when the number of participating satellites reaches about 5–6 within a single constellation. Dual-frequency fusion consistently outperforms single-frequency schemes across different GNSS constellations, demonstrating the complementary value of multi-frequency information under multi-satellite conditions. In addition, the environmentally informed nonlinear model achieves higher accuracy and stability than the linear model, and the dominant environmental drivers differ across stations. Overall, this study provides quantitative support for configuring single-constellation multi-satellite GNSS-IR soil moisture monitoring schemes and for improving retrieval robustness under complex environmental conditions.
(This article belongs to the Special Issue Remote Sensing in Monitoring Coastal and Inland Waters)
19 pages, 880 KB  
Article
A Hybrid Model for Copper Futures Price Forecasting Utilizing Complexity-Aware Variational Mode Decomposition and Reconstruction and Multi-Behavior-Triggered Interaction Modeling
by Yan Li and Dezhi Liu
Entropy 2026, 28(3), 320; https://doi.org/10.3390/e28030320 - 12 Mar 2026
Abstract
Accurate forecasting of copper futures prices is crucial for risk management and investment decisions. However, existing approaches primarily rely on historical prices and incorporate behavioral signals without a unified modeling framework. To address this limitation, we propose MBTI-Net (Multi-source Behavior-Triggered Interaction Network), a behavior-aware forecasting framework for heterogeneous copper market data. We first construct a compact behavioral factor from Baidu search indices via a multi-view projection strategy that preserves structural and predictive information. We then develop a complexity-aware reconstruction mechanism that aggregates intrinsic mode functions into multi-frequency components based on fuzzy entropy and energy. To accommodate distributional and volatility differences between behavioral and market variables, we introduce VB-ReVIN (Volatility- and Behavior-aware Reversible Instance Normalization). Building upon these representations, MBTI-Net models dynamic multi-source interactions triggered by behavioral intensity and market conditions, enabling adaptive cross-source information fusion. Experiments on LME and SHFE copper futures datasets demonstrate consistent improvements over state-of-the-art benchmarks, highlighting the importance of explicitly modeling behavior-driven dependencies in financial forecasting.
(This article belongs to the Special Issue Time Series Analysis for Signal Processing)
22 pages, 3475 KB  
Article
Cross-Layer Feature Fusion and Attention-Based Class Feature Alignment Network for Unsupervised Cross-Domain Remote Sensing Scene Classification
by Jiahao Wei, Erzhu Li and Ce Zhang
Remote Sens. 2026, 18(6), 859; https://doi.org/10.3390/rs18060859 - 11 Mar 2026
Abstract
Remote sensing scene classification is one of the crucial techniques for high-resolution remote sensing image interpretation and has received widespread attention in recent years. However, acquiring high-quality labeled data is both costly and time-consuming, making unsupervised domain adaptation (UDA) an important research focus in scene classification. Existing UDA methods focus primarily on aligning the overall feature distributions across domains but neglect class feature alignment, resulting in the loss of critical class information. To address this issue, a cross-layer feature fusion and attention-based class feature alignment network (CFACA-NET) is proposed for unsupervised cross-domain remote sensing scene classification. Specifically, a multi-layer feature extraction module (MFEM) consisting of a cross-layer feature fusion module (CFFM), a multi-scale dynamic attention module (MSDAM), and a fused feature optimization module (FFOM) is designed to enhance the representation ability of scene features. A high-confidence sample selection module is further introduced, which utilizes evidence theory and information entropy to obtain reliable pseudo-labels. Finally, a class feature alignment module is proposed, incorporating a two-stage training strategy to achieve effective class feature alignment. Experimental results on three remote sensing scene classification datasets demonstrate that CFACA-NET outperforms existing state-of-the-art methods in cross-domain classification performance, effectively enhancing cross-domain adaptation capability.
27 pages, 7489 KB  
Article
A Novel CNN–ViT Model with Cascade Upsampling for Efficient Crack Segmentation
by Ahmed Tibermacine, Imad Eddine Tibermacine, Zineddine S. Kahhoul, Ilyes Naidji, Abdelaziz Rabehi and Mustapha Habib
Sensors 2026, 26(5), 1667; https://doi.org/10.3390/s26051667 - 6 Mar 2026
Abstract
Accurate crack segmentation in civil infrastructure imagery remains challenging because of the prevalence of thin, low-contrast, and spatially discontinuous defects that often appear amid textured surfaces, shadows, and acquisition noise. Although Transformer-based models improve global context modeling, many existing solutions incur substantial computational and memory overhead, which limits their use in practical, resource-constrained inspection settings. In this work, we introduce an efficient hybrid segmentation architecture that combines a convolutional encoder for high-fidelity local representation with a lightweight Transformer bottleneck for global dependency modeling, followed by a progressive decoder that restores spatial resolution through multi-level skip-feature fusion. To better accommodate severe foreground sparsity and preserve fine crack structures, the framework is trained with a composite Dice–Binary Cross-Entropy objective and employs a tokenization strategy designed to preserve fine spatial details while enabling efficient global context modeling. We validate the proposed approach on four public benchmarks, demonstrating consistent improvements over representative convolutional, Transformer-based, and hybrid baselines, while ablation studies confirm the contribution of each design component. Finally, runtime profiling shows favorable latency and memory characteristics, supporting real-time or near real-time deployment on embedded and edge inspection platforms.
(This article belongs to the Section Sensing and Imaging)
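A composite Dice–BCE objective of the kind described combines a soft Dice term, which is insensitive to the overwhelming background class, with per-pixel binary cross-entropy. A minimal numpy sketch on probability maps (the 0.5/0.5 weighting and smoothing constant are illustrative, not the paper's settings):

```python
import numpy as np

def dice_bce_loss(pred, target, w_dice=0.5, eps=1e-6):
    """Composite of soft Dice loss (robust to foreground sparsity) and
    binary cross-entropy; pred and target are maps of values in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)                 # avoid log(0)
    bce = -(target * np.log(pred)
            + (1 - target) * np.log(1 - pred)).mean()
    inter = (pred * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    return w_dice * dice + (1.0 - w_dice) * bce
```

For thin cracks the Dice term keeps the loss from being dominated by the easy background pixels, while BCE preserves per-pixel gradients.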
26 pages, 3000 KB  
Article
Material Classification from Non-Line-of-Sight Acoustic Echoes Using Wavelet-Acoustic Hybrid Feature Fusion
by Dilan Onat Alakuş and İbrahim Türkoğlu
Sensors 2026, 26(5), 1577; https://doi.org/10.3390/s26051577 - 3 Mar 2026
Abstract
Acoustic material classification under non-line-of-sight (NLOS) conditions—where direct sound paths are obstructed—is a challenging task due to echo attenuation, complex reflections, and noise effects. This study aims to improve NLOS material recognition by introducing a novel wavelet–acoustic hybrid feature fusion method integrated with deep recurrent neural network architectures. Echo signals from nine different materials were collected using the newly developed ANLOS-R (Acoustic Non-Line-of-Sight Recognition) dataset, which was specifically designed to simulate realistic NLOS propagation environments. From these recordings, time-domain acoustic features and multi-scale wavelet-based energy and entropy statistics were extracted using ten wavelet families. The resulting 70-dimensional hybrid feature set was used to train several deep learning architectures, including Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network–LSTM (CNN–LSTM). Among these, the CNN–LSTM achieved the highest balanced accuracy and macro-F1 score of 0.99, showing strong generalization and convergence performance. SHapley Additive exPlanations (SHAP) analysis indicated that Mel-Frequency Cepstral Coefficients (MFCCs) and wavelet entropy–energy features play complementary roles in material discrimination. The proposed approach provides a robust and interpretable framework for real-time NLOS acoustic sensing, bridging data-driven deep learning with the physical understanding of acoustic material behavior.
(This article belongs to the Section Sensor Materials)
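Wavelet energy–entropy features of this kind reduce a signal to the Shannon entropy of its normalized subband energy distribution. A minimal sketch using a hand-rolled Haar decomposition — the study extracts features from ten wavelet families with standard toolboxes, so this Haar-only version is purely illustrative:

```python
import numpy as np

def haar_subband_energies(x, levels=4):
    """Multilevel Haar DWT: repeatedly split into approximation/detail,
    recording the detail-band energy at each level plus the final
    approximation energy."""
    a = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        n = len(a) // 2 * 2
        pairs = a[:n].reshape(-1, 2)
        d = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)   # detail coefficients
        a = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)   # approximation
        energies.append((d ** 2).sum())
    energies.append((a ** 2).sum())
    return np.array(energies)

def wavelet_entropy(x, levels=4):
    """Shannon entropy of the normalized subband energy distribution: low
    when energy concentrates in one band, high when it is spread out."""
    e = haar_subband_energies(x, levels)
    p = e / e.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```

A broadband echo spreads energy across bands (high wavelet entropy), while a smooth, low-frequency return concentrates it in the approximation band (low entropy), which is what makes the statistic discriminative.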