Search Results (117)

Search Parameters:
Keywords = Multimodality and multiscale approaches

26 pages, 7609 KB  
Article
MMDFRNet: Dynamic Cross-Modal Decoupling and Alignment for Robust Rice Mapping
by Tingyan Fu, Jia Ge and Shufang Tian
Remote Sens. 2026, 18(9), 1413; https://doi.org/10.3390/rs18091413 - 2 May 2026
Abstract
Accurate rice mapping is critical for grain yield estimation and food security, yet traditional methods often struggle with asynchronous data quality and the inherent statistical gap between SAR and optical signals. To bridge this gap, we propose MMDFRNet, a novel multi-modal deep learning framework that synergistically integrates Sentinel-1 SAR and Sentinel-2 optical imagery. Unlike conventional static fusion approaches, MMDFRNet features a dual-stream modality-specific encoder architecture designed to decouple structural backscattering signals from spectral reflectance. Central to this framework is the multi-modal feature fusion (MMF) module, which employs an adaptive attention mechanism to dynamically align and recalibrate features based on their reliability, effectively mitigating noise from compromised modalities. Additionally, a multi-scale feature fusion (MSF) module is incorporated to coordinate hierarchical semantic information, enhancing boundary delineation in fragmented landscapes. Extensive experiments conducted across multiple study areas in China demonstrate the superiority of MMDFRNet. The model achieves a Precision of 0.9234, an IoU of 0.8612, and an F1-score of 0.9252. Notably, it consistently outperforms state-of-the-art benchmarks (e.g., UNetFormer, STMA, and CCRNet) by margins of up to 11.72% (Precision) and 7.39% (IoU) compared to classic baselines. Furthermore, rigorous ablation studies and degradation analyses confirm the model’s robustness, verifying its ability to transform the degradation paradox into a performance booster through pixel-wise adaptive alignment. Consequently, MMDFRNet offers a promising solution for precise rice area statistics and long-term monitoring in complex agricultural landscapes. Full article
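The reliability-driven alignment in the MMF module can be illustrated with a toy sketch: softmax weights derived from hypothetical per-modality reliability scores blend SAR and optical feature vectors, so that a compromised modality contributes less. This is an illustration only, not the authors' implementation; the scores and vectors are made up.

```python
import math

def reliability_weighted_fusion(features, reliabilities):
    """Fuse per-modality feature vectors with softmax weights derived
    from scalar reliability scores (illustrative sketch only)."""
    exps = [math.exp(r) for r in reliabilities]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * f[i] for w, f in zip(weights, features))
             for i in range(len(features[0]))]
    return fused, weights

# Toy SAR-derived and optical-derived feature vectors; SAR is
# assumed more reliable here (score 2.0 vs. 0.5).
sar, opt = [1.0, 0.0, 2.0], [0.0, 1.0, 0.0]
fused, w = reliability_weighted_fusion([sar, opt], [2.0, 0.5])
```

In the paper the weighting is learned per pixel by an attention mechanism; here it is a single global softmax to show the recalibration idea.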

5 pages, 843 KB  
Proceeding Paper
Spatial Scaling Effects in Water Demand
by Roberto Magini, Maria Antonietta Boniforti and Roberto Guercio
Eng. Proc. 2026, 135(1), 3; https://doi.org/10.3390/engproc2026135003 - 29 Apr 2026
Viewed by 136
Abstract
Residential water demand exhibits stochastic variability across all spatial and temporal scales, making probabilistic approaches essential for realistic modelling. Using scaling laws enables the derivation of statistics on aggregated demand from single-user data while maintaining information about spatial correlation. This study highlights how scaling laws define the dependence of mean and variance on the number of users and also reveals multiscale dynamics. In particular, variance growth at variable exponents and the emergence of multimodal distributions are shown. Furthermore, the Poisson Rectangular Pulse (PRP) model can reproduce these features by introducing frequency-use patterns that reflect the non-homogeneous nature of water demand. Full article
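The scaling laws discussed above can be illustrated with the standard second-order result for n identical users with pairwise correlation rho: the mean of the aggregated demand grows linearly in n, while the variance-growth exponent moves from 1 (uncorrelated users) to 2 (fully correlated users). A minimal sketch under these textbook assumptions, not the paper's calibrated model:

```python
import math

def aggregate_stats(n, mu1, var1, rho):
    """Mean and variance of the total demand of n identical users
    with pairwise correlation rho (second-order scaling law)."""
    mean_n = n * mu1
    var_n = n * var1 + n * (n - 1) * rho * var1
    return mean_n, var_n

def variance_exponent(n, var1, rho):
    """Empirical scaling exponent of the variance between n and 2n users."""
    _, v1 = aggregate_stats(n, 1.0, var1, rho)
    _, v2 = aggregate_stats(2 * n, 1.0, var1, rho)
    return math.log(v2 / v1) / math.log(2)

e_uncorr = variance_exponent(100, 0.5, 0.0)  # exponent 1: linear growth
e_corr = variance_exponent(100, 0.5, 1.0)    # exponent 2: quadratic growth
```

Intermediate correlations yield exponents between 1 and 2, which is the "variance growth at variable exponents" behaviour the abstract refers to.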

22 pages, 5140 KB  
Article
Application of Deep Multi-Scale Representation Learning Based on Eye-Tracking and Facial Expression Data in Cognitive Decline Assessment
by Yanfeng Xue, Xianpeng Luo, Shuai Guo and Tao Song
Sensors 2026, 26(9), 2600; https://doi.org/10.3390/s26092600 - 23 Apr 2026
Viewed by 408
Abstract
Digital biomarkers derived from eye-tracking and facial expression hold significant potential for the non-invasive screening of cognitive decline (CD). However, existing approaches predominantly rely on single-task or feature engineering-based unimodal methods, which struggle to capture complex temporal behavioral patterns. While deep learning (DL) excels at extracting hierarchical features and intricate temporal dynamics from behavioral sequences, its application in this specific multimodal sensing domain remains exploratory. Addressing this gap, this study designed an assessment system integrating five multi-dimensional cognitive paradigms and collected eye-tracking and facial expression data from 20 healthy controls (HC) and 20 individuals with CD. For these multimodal sequences, we propose a deep neural network capable of multi-scale representation learning. By utilizing subspace exploration and multi-scale convolutions, this architecture extracts deep representations directly from data and incorporates a decision fusion mechanism to enhance diagnostic robustness. Experimental results demonstrate that our method achieves a 90% classification accuracy, outperforming machine learning models. Furthermore, statistical analyses conducted in this study validated several features associated with CD and also explored some novel potential behavioral patterns. This study confirms the feasibility of a DL framework based on eye-tracking and facial expression signals for identifying CD, providing a reference for developing objective and efficient digital screening tools. Full article
(This article belongs to the Section Biomedical Sensors)
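The decision-fusion mechanism can be sketched as a weighted combination of class probabilities from the two modality branches. The weights below are fixed for illustration, whereas the paper learns them; the probability values are toy numbers, not results from the study.

```python
def fuse_decisions(probs_a, probs_b, w_a, w_b):
    """Decision-level fusion: weighted sum of per-class probabilities
    from two modality branches, renormalized to sum to 1."""
    combined = [w_a * pa + w_b * pb for pa, pb in zip(probs_a, probs_b)]
    s = sum(combined)
    return [c / s for c in combined]

eye = [0.6, 0.4]   # toy P(HC), P(CD) from the eye-tracking branch
face = [0.3, 0.7]  # toy P(HC), P(CD) from the facial-expression branch
fused = fuse_decisions(eye, face, w_a=0.7, w_b=0.3)
pred = max(range(len(fused)), key=fused.__getitem__)
```

When the branches disagree, the modality with the larger learned weight dominates the final decision, which is what makes the fusion adaptive once the weights are trained.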

34 pages, 2037 KB  
Article
Stock Forecasting Based on Informational Complexity Representation: A Framework of Wavelet Entropy, Multiscale Entropy, and Dual-Branch Network
by Guisheng Tian, Chengjun Xu and Yiwen Yang
Entropy 2026, 28(4), 424; https://doi.org/10.3390/e28040424 - 10 Apr 2026
Viewed by 233
Abstract
Stock price sequences are characterized by pronounced nonlinearity, non-stationarity, and multi-scale volatility. They are further influenced by complex, multi-source factors, such as macroeconomic conditions and market behavior, making high-precision forecasting highly challenging. Existing approaches are limited by noise and multi-dimensional market features, as well as difficulties in balancing prediction accuracy with model complexity. To address these challenges, we propose Wavelet Entropy and Cross-Attention Network (WECA-Net), which combines wavelet decomposition with a multimodal cross-attention mechanism. From an information-theoretic perspective, stock price dynamics reflect the time-varying uncertainty and informational complexity of the market. We employ wavelet entropy to quantify the dispersion and uncertainty of energy distribution across frequency bands, and multiscale entropy to measure the scale-dependent complexity and regularity of the time series. These entropy-derived descriptors provide an interpretable prior of “information content” for cross-modal attention fusion, thereby improving robustness and generalization under non-stationary market conditions. Experiments on Chinese stock indices, A-Share, and CSI 300 component stock datasets demonstrate that WECA-Net consistently outperforms mainstream models in Mean Absolute Error (MAE) and R2 across all datasets. Notably, on the CSI 300 dataset, WECA-Net achieves an R2 of 0.9895, underscoring its strong predictive accuracy and practical applicability. This framework is also well aligned with sensor data fusion and intelligent perception paradigms, offering a robust solution for financial signal processing and real-time market state awareness. Full article
(This article belongs to the Section Complexity)
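As a rough illustration of wavelet entropy, the sketch below applies a Haar decomposition and takes the Shannon entropy of the relative band energies: a concentrated energy distribution gives low entropy, a uniform one gives entropy log(number of bands). The signal and decomposition depth are arbitrary; this is not the authors' pipeline.

```python
import math

def haar_step(x):
    """One level of the Haar DWT: approximation and detail halves."""
    a = [(x[2*i] + x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def wavelet_entropy(x, levels):
    """Shannon entropy of the relative wavelet energies across bands."""
    energies = []
    for _ in range(levels):
        x, d = haar_step(x)
        energies.append(sum(v * v for v in d))
    energies.append(sum(v * v for v in x))  # final approximation band
    total = sum(energies)
    p = [e / total for e in energies if e > 0]
    return -sum(pi * math.log(pi) for pi in p)

# A deterministic two-tone test series, length 64
series = [math.sin(0.3 * i) + 0.5 * math.sin(2.1 * i) for i in range(64)]
we = wavelet_entropy(series, levels=3)
```

With 3 levels there are 4 bands, so the entropy is bounded by log(4); multiscale entropy follows the same spirit but coarse-grains the series and measures sample entropy at each scale instead.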

25 pages, 4248 KB  
Article
A Spatial Post-Multiscale Fusion Entropy and Multi-Feature Synergy Model for Disturbance Identification of Charging Stations
by Hui Zhou, Xiujuan Zeng, Tong Liu, Wei Wu, Bolun Du and Yinglong Diao
Energies 2026, 19(8), 1837; https://doi.org/10.3390/en19081837 - 8 Apr 2026
Viewed by 357
Abstract
The large-scale integration and grid connection of renewable energy sources and charging stations introduce a multitude of nonlinear and impact loads, resulting in more severe distortion and higher complexity of disturbance signals in power systems. As a consequence, power quality disturbances (PQDs) in active distribution networks, including overvoltage and harmonics, display greater randomness and diversity, which increases the challenge of PQD identification. To tackle this problem, this study presents a dual-channel early-fusion approach for PQD recognition based on Spatial Post-MultiScale Fusion Entropy (SMFE). SMFE is used as an entropy-based feature-construction pipeline in which a time–frequency representation is formed prior to spatial post-multiscale aggregation to produce a compact complexity map complementary to waveform morphology. Subsequently, a dual-channel model is constructed by integrating waveform-morphology input with SMFE-derived complexity features for joint learning. By leveraging the ConvNeXt architecture and a Squeeze-and-Excitation (SE) mechanism, a multimodal channel-recalibration model is implemented to emphasize informative feature responses during PQD recognition. Experimental verification with simulated signals shows that the proposed approach achieves an identification accuracy of 97.83% under an SNR of 30 dB, indicating robust performance under the tested noise settings. Full article
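The Squeeze-and-Excitation recalibration referenced above follows a standard squeeze/excite/rescale pattern: pool each channel to a scalar, pass the descriptor through a small bottleneck, and gate the channels with sigmoids. A toy sketch with hand-picked (not learned) weights w1 and w2:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def se_recalibrate(channels, w1, w2):
    """SE sketch: squeeze each channel to its mean, excite through two
    tiny dense layers, then rescale channels by the resulting gates."""
    squeezed = [sum(c) / len(c) for c in channels]            # global average pool
    hidden = [max(0.0, sum(wi * s for wi, s in zip(row, squeezed)))
              for row in w1]                                  # FC + ReLU
    gates = [sigmoid(sum(wi * h for wi, h in zip(row, hidden)))
             for row in w2]                                   # FC + sigmoid
    return [[g * v for v in c] for g, c in zip(gates, channels)]

chans = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]  # two toy channels
w1 = [[0.5, 0.5]]                           # 2 -> 1 bottleneck
w2 = [[1.0], [-1.0]]                        # 1 -> 2 expansion
out = se_recalibrate(chans, w1, w2)
```

With these weights the first channel is amplified toward its original values and the second is suppressed, which is exactly the "emphasize informative feature responses" behaviour the abstract describes.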

27 pages, 24041 KB  
Article
PMDet: Patch-Aware Enhancement and Fusion for Multispectral Object Detection
by Jie Li, Chenhong Sui, Jing Wang and Jun Zhou
Remote Sens. 2026, 18(7), 1068; https://doi.org/10.3390/rs18071068 - 2 Apr 2026
Viewed by 386
Abstract
Multispectral object detection addresses the limitations of single-modal approaches by fusing complementary information from visible and infrared images, thereby improving robustness in complex environments. However, the inter-modal representations are inherently misaligned due to sensing discrepancies, and the complementary cues they provide are often imbalanced, making it difficult to exploit modality-specific information effectively. Moreover, directly merging features from different modalities can introduce noise and artifacts that deteriorate the detection performance. To this end, this paper proposes a patch-aware enhancement and fusion network for multispectral object detection (PMDet). This method employs a dual-stream backbone equipped with the patch-aware Feature Enhancer (FE) module for cross-modal features alignment and enhancement. FE not only reinforces the feature representation of key regions but also helps to suppress local noise and enhance the model’s perception of fine textures and differences. Building on these enriched features, the patch-based Feature Aggregator (FA) module allows for efficient inter-modal feature interaction and semantic fusion with noise resistance. Specifically, both FE and FA modules leverage the shifted-patch design to preserve computational efficiency while enabling long-range modeling. In this regard, PMDet couples multi-scale cross-modal semantic enhancement with deep semantic fusion to form a stable and discriminative multimodal representation pipeline. Experiments on FLIR, LLVIP, and VEDAI demonstrate that the method outperforms mainstream approaches in detection accuracy and robustness, and ablation studies further verify the effectiveness of each module. Full article
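The shifted-patch idea can be illustrated with plain index arithmetic: cyclically shifting the patch grid changes which pixels share a patch, so pixels separated by a patch boundary in one block can interact in the next. A minimal sketch; the exact shifting scheme in PMDet may differ.

```python
def num_patches(h, w, p):
    """Number of non-overlapping p x p patches in an h x w map."""
    return (h // p) * (w // p)

def patch_index(y, x, h, w, p, shift=0):
    """Patch id of pixel (y, x) under a cyclic shift of the patch grid."""
    ys, xs = (y + shift) % h, (x + shift) % w
    return (ys // p) * (w // p) + (xs // p)

# Pixels (3,3) and (4,4) sit on opposite sides of a patch boundary
# in the unshifted grid, but share a patch once the grid is shifted.
same_unshifted = patch_index(3, 3, 8, 8, 4) == patch_index(4, 4, 8, 8, 4)
same_shifted = patch_index(3, 3, 8, 8, 4, 2) == patch_index(4, 4, 8, 8, 4, 2)
```

Alternating shifted and unshifted partitions is what lets patch-local attention approximate long-range modeling at low cost.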

33 pages, 10259 KB  
Article
Multimodal Remote Sensing Image Classification Based on Dynamic Group Convolution and Bidirectional Guided Cross-Attention Fusion
by Lu Zhang, Yaoguang Yang, Zhaoshuang He, Guolong Li, Feng Zhao, Wenqiang Hua, Gongwei Xiao and Jingyan Zhang
Remote Sens. 2026, 18(7), 1066; https://doi.org/10.3390/rs18071066 - 2 Apr 2026
Viewed by 423
Abstract
The synergistic integration of Hyperspectral Imaging (HSI) and Light Detection and Ranging (LiDAR) data has become a pivotal strategy in remote sensing for precise land-cover classification. However, existing multimodal deep learning frameworks frequently suffer from intrinsic limitations, including rigid feature extraction protocols, underutilization of LiDAR-derived textural information, and asymmetric fusion mechanisms that fail to balance the contribution of spectral and elevation features effectively. To address these challenges, this paper proposes a novel framework named DGC-BCAF, which integrates Dynamic Group Convolution and Bidirectional Guided Cross-Attention Fusion to achieve adaptive feature representation and robust cross-modal interaction. First, a Dynamic Group Convolution (DGConv) module embedded within a ResNet18 backbone is designed to function as the central spatial context extractor. Unlike traditional group convolution, this module learns a dynamic relationship matrix to automatically group input channels, thereby facilitating flexible and context-aware feature representation that adapts to complex spatial distributions. Second, to overcome the insufficient exploitation of elevation data, we introduce a dedicated LiDAR texture encoding branch. This branch innovatively fuses Gray-Level Co-occurrence Matrix (GLCM) statistical features with multi-scale convolutional representations, capturing both geometric height information and fine-grained surface textural details that are critical for distinguishing objects with similar elevations. Finally, central to our architecture is the Bidirectional Cross-Attention Fusion (BCAF) module. Unlike standard unidirectional fusion approaches, BCAF employs LiDAR geometry to guide the selection of salient spectral bands, while simultaneously utilizing spectral signatures to emphasize informative LiDAR channels. This mutual guidance ensures a balanced contribution from both modalities.
Extensive experiments conducted on three benchmark datasets—Houston 2013, Trento, and MUUFL—demonstrate that DGC-BCAF consistently outperforms state-of-the-art methods in terms of overall accuracy, average accuracy, and Kappa coefficient. The results confirm that the proposed adaptive grouping and bidirectional guidance strategies significantly improve classification performance, particularly in distinguishing spectrally similar materials and delineating complex urban structures. Full article
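The GLCM statistics used by the LiDAR texture branch reduce to simple pair counting: tally how often gray levels i and j co-occur at a fixed pixel offset, normalize, and derive texture measures such as contrast. A self-contained sketch for a tiny 4-level image; the offset and statistic are chosen for illustration only.

```python
def glcm(img, dx, dy, levels):
    """Normalized gray-level co-occurrence matrix for offset (dx, dy)."""
    m = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    pairs = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y][x]][img[y2][x2]] += 1
                pairs += 1
    return [[v / pairs for v in row] for row in m]

def contrast(m):
    """GLCM contrast: expected squared gray-level difference of pairs."""
    n = len(m)
    return sum(m[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [2, 2, 3, 3],
       [2, 2, 3, 3]]
g = glcm(img, dx=1, dy=0, levels=4)
c = contrast(g)
```

Smooth elevation surfaces yield GLCMs concentrated on the diagonal (low contrast), while rough textures spread mass off-diagonal, which is the cue the texture branch exploits.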

31 pages, 13534 KB  
Article
CSFADet: Dual-Modal Anti-UAV Detection via Cross-Spectral Feature Alignment and Adaptive Multi-Scale Refinement
by Heqin Yuan and Yuheng Li
Algorithms 2026, 19(4), 254; https://doi.org/10.3390/a19040254 - 26 Mar 2026
Viewed by 474
Abstract
Anti-unmanned aerial vehicle (Anti-UAV) detection is critical for airspace security, yet existing single-modality approaches suffer from severe performance degradation under adverse illumination, thermal crossover, and extreme scale variation. In this paper, we propose CSFADet, a dual-modal detection framework that jointly exploits visible and infrared imagery through four tightly integrated modules. First, a Cross-Spectral Feature Alignment (CSFA) module performs early-stage spectral calibration by computing cross-modal query–value attention maps, generating modality-aware channel descriptors that re-weight and concatenate the two spectral streams. Second, a Dual-path Texture Enhancement Module (DTEM) enriches fine-grained spatial details via cascaded convolutions with residual connections. Third, a Dual-path Cross-Attention Module (DCAM) introduces a feature-shrinking token generation strategy followed by symmetric cross-attention branches with learnable scaling factors, Squeeze-and-Excitation recalibration, and a 1×1 convolution fusion head, enabling deep bidirectional interaction between modalities. Fourth, a Dual-path Information Refinement Module (DIRM) embeds Adaptive Residual Groups (ARGs) that cascade Multi-modal Spatial Attention Blocks (MSABs) with channel and dynamic spatial attention, culminating in a Multi-scale Scale-aware Fusion Refinement (MSFR) unit that employs three parallel multi-head attention branches with a Scale Reasoning Gate and Channel Fusion Layer to produce scale-discriminative enhanced features. Experiments on the public Anti-UAV300 benchmark show that CSFADet achieves 91.4% mAP@0.5 and 58.7% mAP@0.5:0.95, surpassing fifteen representative detectors spanning single-stage, two-stage, YOLO-family, and Transformer-based categories. Ablation studies confirm the complementary contributions of each module, and heatmap visualizations verify the model’s capacity to focus on small, distant UAV targets under challenging conditions. Full article
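The cross-modal query–value attention at the heart of CSFA follows the standard scaled dot-product pattern; the single-head toy version below has one visible-stream query attending over infrared-stream keys and values. Dimensions and inputs are illustrative, not the paper's configuration.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query is matched against the
    keys, and the softmax scores blend the corresponding values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

q = [[1.0, 0.0]]                      # one visible-stream query
k = [[1.0, 0.0], [0.0, 1.0]]          # infrared-stream keys
v = [[10.0, 0.0], [0.0, 10.0]]        # infrared-stream values
attended = cross_attention(q, k, v)
```

Because the query aligns with the first key, the output is pulled toward the first value; in CSFA the resulting modality-aware descriptors then re-weight the two spectral streams.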

18 pages, 1085 KB  
Article
Self-Learning Multimodal Emotion Recognition Based on Multi-Scale Dilated Attention
by Xiuli Du and Luyao Zhu
Brain Sci. 2026, 16(4), 350; https://doi.org/10.3390/brainsci16040350 - 25 Mar 2026
Viewed by 452
Abstract
Background/Objectives: Emotions can be recognized through external behavioral cues and internal physiological signals. Owing to the inherently complex psychological and physiological nature of emotions, models relying on a single modality often suffer from limited robustness. This study aims to improve emotion recognition performance by effectively integrating electroencephalogram (EEG) signals and facial expressions through a multimodal framework. Methods: We propose a multimodal emotion recognition model that employs a Multi-Scale Dilated Attention Convolution (MSDAC) network tailored for facial expression recognition, integrates an EEG emotion recognition method based on three-dimensional features, and adopts a self-learning decision-level fusion strategy. MSDAC incorporates Multi-Scale Dilated Convolutions and a Dual-Branch Attention (D-BA) module to capture discontinuous facial action units. For EEG processing, raw signals are converted into a multidimensional time–frequency–spatial representation to preserve temporal, spectral, and spatial information. To overcome the limitations of traditional stitching or fixed-weight fusion approaches, a self-learning weight fusion mechanism is introduced at the decision level to adaptively adjust modality contributions. Results: The facial analysis branch achieved average accuracies of 74.1% on FER2013, 99.69% on CK+, and 98.05% (valence)/96.15% (arousal) on DEAP. On the DEAP dataset, the complete multimodal model reached 98.66% accuracy for valence and 97.49% for arousal classification. Conclusions: The proposed framework enhances emotion recognition by improving facial feature extraction and enabling adaptive multimodal fusion, demonstrating the effectiveness of combining EEG and facial information for robust emotion analysis. Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
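The multi-scale dilated convolutions in MSDAC widen the receptive field without adding parameters: a kernel of size k with dilation d covers (k-1)*d + 1 inputs, which is what lets the network capture discontinuous facial action units. A 1-D sketch of the mechanism, not the authors' 2-D implementation:

```python
def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D convolution with a dilated kernel; each output taps
    inputs spaced `dilation` apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field of one output
    return [sum(kernel[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)]

x = [float(i) for i in range(10)]          # 0.0 .. 9.0
kernel = [1.0, 1.0, 1.0]
y1 = dilated_conv1d(x, kernel, dilation=1)  # span 3: local context
y3 = dilated_conv1d(x, kernel, dilation=3)  # span 7: wider context
```

Running several dilation rates in parallel and concatenating the outputs gives the multi-scale branch structure the abstract describes.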

21 pages, 4953 KB  
Article
Bifurcation Analysis and Vibration Control of a Top-Tensioned Riser Under Parametric Resonance with a Tuned Mass Damper
by Hai-Su Wang, Guang Liu and Zhong-Rong Lu
J. Mar. Sci. Eng. 2026, 14(7), 602; https://doi.org/10.3390/jmse14070602 - 25 Mar 2026
Viewed by 361
Abstract
This paper presents a dynamic model of a top-tensioned riser (TTR) subjected to combined vortex-induced vibration (VIV) and time-varying tension excitation. The model employs a van der Pol oscillator to simulate load excitation caused by vortex shedding and incorporates a tuned mass damper (TMD) to suppress nonlinear vibrations in the riser. The key contributions include, first, employing the Galerkin method to obtain a multi-mode approximate solution and analyzing it using single-mode approximate equations, and subsequently, applying a multi-scale approach to investigate the vibration reduction effect of the TMD under two typical resonance scenarios. By introducing a complex impedance term derived from the complex transfer function, the physical effect of the TMD is transformed into a frequency-dependent dynamic reaction force coupled to the riser’s equation of motion. The first involves 1:1 internal resonance between the structural frequency and vortex-induced frequency, while the second involves 1:2 parametric resonance between the structural frequency and the top tension frequency. Results indicate that when the structural frequency exhibits 1:2 parametric resonance with the top tension frequency, complex bifurcation behavior occurs, leading to large-amplitude structural responses. Findings demonstrate that TMDs effectively alter the system’s stability distribution and exhibit outstanding vibration-reduction efficiency under both typical resonance conditions. Full article
(This article belongs to the Special Issue Analysis of Strength, Fatigue, and Vibration in Marine Structures)
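The van der Pol wake oscillator used to model vortex-shedding excitation settles onto a limit cycle of amplitude close to 2 regardless of initial conditions, which is what makes it a convenient self-excited forcing term. A minimal RK4 integration illustrating this; the parameters are generic, not the paper's calibrated values.

```python
def vdp_rk4(mu, x0, v0, dt, steps):
    """Integrate the van der Pol oscillator x'' - mu*(1 - x^2)*x' + x = 0
    with the classical fourth-order Runge-Kutta scheme."""
    def f(x, v):
        return v, mu * (1 - x * x) * v - x

    x, v = x0, v0
    xs = []
    for _ in range(steps):
        k1x, k1v = f(x, v)
        k2x, k2v = f(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = f(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = f(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        xs.append(x)
    return xs

xs = vdp_rk4(mu=1.0, x0=0.1, v0=0.0, dt=0.01, steps=5000)
amplitude = max(abs(val) for val in xs[-1000:])  # settled limit-cycle amplitude
```

Starting from a small perturbation, the oscillation grows until it locks onto the limit cycle; in the riser model this self-sustained oscillation is coupled to the structural equation of motion as the vortex-shedding load.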

33 pages, 3891 KB  
Article
Correlation and Semantic Prior-Guided Multi-Scale Cross-Modal Interaction Network for SAR-OPT Image Fusion
by Xiaoyang Hou, Lingxi Zhou, Chenguo Feng, Hao Cha, Yang Liu, Liguo Liu and Haibo Liu
Remote Sens. 2026, 18(7), 975; https://doi.org/10.3390/rs18070975 - 24 Mar 2026
Viewed by 461
Abstract
Synthetic aperture radar (SAR) and optical (OPT) image fusion aims to leverage their complementary information to obtain a more comprehensive representation of ground objects. However, significant discrepancies exist between the two modalities in terms of imaging mechanisms and feature distributions. Consequently, existing multi-modal image fusion methods struggle to achieve robust cross-modal feature alignment and deep semantic consistency between the fused results and the source modalities. To address the above challenges, this paper proposes a correlation and semantic prior-guided multi-scale cross-modal interaction network (CSP-MCIN) for effective SAR-OPT image fusion. Specifically, CSP-MCIN first employs two modality-specific encoders based on ResNet-18 to extract low-level details and high-level semantic features from SAR and OPT images, respectively. Subsequently, a multi-scale interactive decoder integrating cross-modal Transformers and gated fusion units is constructed to align and aggregate semantic and detail information from both encoders. Finally, to strengthen source-modality representations, a novel loss function combining a pixel-domain correlation loss and a CLIP-guided semantic consistency loss is designed and optimized under a PCGrad-based multi-objective optimization scheme. Experimental results on public SAR-OPT image datasets demonstrate that the proposed CSP-MCIN achieves superior fusion performance and computational efficiency compared with state-of-the-art approaches. Full article
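A gated fusion unit of the kind used in the decoder can be sketched as a sigmoid gate interpolating element-wise between the two modality features: the gate decides, per element, how much SAR versus optical information passes through. The weights below are hand-picked for illustration, not learned.

```python
import math

def gated_fusion(a, b, w, bias):
    """Gated fusion sketch: a sigmoid gate g interpolates element-wise
    between modality features a and b as g*a + (1-g)*b."""
    fused = []
    for ai, bi in zip(a, b):
        g = 1.0 / (1.0 + math.exp(-(w[0] * ai + w[1] * bi + bias)))
        fused.append(g * ai + (1 - g) * bi)
    return fused

sar_feat, opt_feat = [0.2, 0.9, -0.4], [0.8, 0.1, 0.3]
out = gated_fusion(sar_feat, opt_feat, w=[1.0, -1.0], bias=0.0)
```

Because each output is a convex combination of the two inputs, the fused feature always stays between the SAR and OPT values, so neither modality can be ignored outright; the gate only shifts the balance.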

30 pages, 11585 KB  
Article
Study on Low-Carbon Planning and Design Strategies for University Campus Built Environment
by Long Ma, Xinge Du, Feng Gao, Yang Yang and Rui Gao
Buildings 2026, 16(7), 1274; https://doi.org/10.3390/buildings16071274 - 24 Mar 2026
Cited by 1 | Viewed by 360
Abstract
With the wave of new campus construction gradually receding, the focus of green campus planning and design is shifting toward the low-carbon retrofitting of the existing built environment. University campuses often face challenges such as dispersed land use, inadequate spatial planning, disorganized road layouts, suboptimal landscape design, and low energy efficiency. Grounded in a review of current research on campus carbon emissions, this study integrates green technology indicators with planning and design approaches to establish a multi-scale, context-adaptive planning framework for carbon control, spanning five dimensions: intensive land use, spatial layout, transportation systems, landscape development, and facility integration. Employing a combined approach of bibliometric analysis and case studies, this research examines and compares typical university campuses both domestically and internationally to validate the effectiveness of the synergistic “technology-system-behavior” pathway in mitigating high-carbon lock-in. Through a systematic comparative analysis of representative low-carbon campuses, the synthesized results indicate that under optimal operational conditions, the clustered reorganization of functional zones demonstrates the potential to reduce transportation carbon emissions by approximately 25%; comprehensive retrofitting of building envelopes can decrease building energy consumption intensity by an estimated 30%; a multimodal coordinated transport system can increase the share of non-motorized travel to around 65%; establishing high carbon-sequestration plant communities can enhance carbon sink capacity by up to 30%; and smart facility integration can reduce overall campus carbon emissions by a projected range of 25–40%. It should be noted that these quantitative outcomes represent high-probability potential ranges, with actual performance subject to behavioral and operational fluctuations. 
This study provides theoretical support and practical pathways for achieving “near-zero carbon campuses” and underscores the important demonstrative role that higher education institutions can play in addressing climate change. Full article
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

47 pages, 3035 KB  
Review
A Review of Photovoltaic Uncertainty Modeling Based on Statistical Relational AI
by Linfeng Yang and Xueqian Fu
Energies 2026, 19(6), 1509; https://doi.org/10.3390/en19061509 - 18 Mar 2026
Viewed by 445
Abstract
With the growing penetration of photovoltaic (PV) generation, robust uncertainty characterization is essential for secure operation, economic dispatch, and flexibility planning. This review surveys PV scenario generation from three perspectives: (i) explicit probabilistic approaches (distribution fitting, Copula-based dependence modeling, autoregressive moving average (ARMA)-type time-series methods, and clustering/dimensionality reduction), (ii) deep generative models (GANs, VAEs, and diffusion models), and (iii) hybrid Statistical Relational AI (SRAI) frameworks. We discuss the strengths of explicit models in interpretability and tractability, and their limitations in representing high-dimensional nonlinear, multimodal, and multiscale spatiotemporal dependencies. We also examine the ability of deep generative methods to synthesize diverse scenarios across meteorological regimes and multiple sites, while noting persistent challenges in interpretability, physical consistency, and deployment. To bridge these gaps, we outline an SRAI-oriented integration pathway that embeds statistical structure, meteorology–power relations, spatiotemporal coupling, and operational constraints into generative architectures. Finally, we highlight directions for future research, including unified evaluation protocols, cross-regional data collaboration, controllable extreme-scenario generation, and computationally efficient generative designs. Full article
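As a reminder of how the ARMA-type explicit models in category (i) generate scenarios, the sketch below samples AR(1) paths, whose stationary variance sigma^2/(1 - phi^2) can be checked empirically. Parameters are illustrative; real PV scenario generators condition on meteorology and diurnal structure.

```python
import random
import statistics

def ar1_scenarios(phi, sigma, length, n_scenarios, seed=0):
    """Sample AR(1) scenario paths x_t = phi * x_{t-1} + eps_t with
    eps_t ~ N(0, sigma^2); stationary variance is sigma^2 / (1 - phi^2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_scenarios):
        x, path = 0.0, []
        for _ in range(length):
            x = phi * x + rng.gauss(0.0, sigma)
            path.append(x)
        out.append(path)
    return out

paths = ar1_scenarios(phi=0.8, sigma=1.0, length=2000, n_scenarios=5)
# Discard a burn-in, then estimate the stationary variance
var_hat = statistics.pvariance([v for p in paths for v in p[200:]])
```

The theoretical stationary variance here is 1 / (1 - 0.64) ≈ 2.78; the empirical estimate converges to it as the paths lengthen, which is exactly the tractability that makes explicit models easy to validate but hard to extend to multimodal, multiscale dependence.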

21 pages, 10378 KB  
Article
A Method for Detecting Slow-Moving Landslides Based on the Integration of Surface Deformation and Texture
by Xuerong Chen, Cuiying Zhou, Zhen Liu, Chaoying Zhao, Xiaojie Liu and Zhong Lu
Remote Sens. 2026, 18(6), 899; https://doi.org/10.3390/rs18060899 - 15 Mar 2026
Viewed by 468
Abstract
Slow-moving landslides can trigger severe disasters when activated by earthquakes, torrential rains, or typhoons. Early detection is crucial for mitigating loss of life and property damage. Interferometric Synthetic Aperture Radar (InSAR) technology is among the most effective techniques for detecting slow-moving landslides, though its accuracy can be further improved through integration with optical imagery and Digital Elevation Models (DEMs). Current machine learning approaches that combine InSAR and optical data suffer from limited efficiency, poor transferability, and challenges in regional-scale application. To address these limitations, this study proposes a multimodal dual-path network that integrates InSAR products with textural information from optical imagery to detect slow-moving landslides. One path processes InSAR deformation rates and topographic factors, while the other incorporates texture information and auxiliary data. Together, these paths extract semantic information from high-dimensional spatial features and condense it into low-dimensional representations. A pyramid pooling module is employed to capture multi-scale features during low-level semantic extraction. For feature fusion, a rate-constrained adaptive module is introduced to enhance the contribution of deformation rates to slow-moving landslide detection. The proposed method improves the F1-score for landslide detection by 6% compared to using InSAR products alone. These results provide reliable technical support for regional landslide inventory compilation and disaster management, as well as new insights for regional-scale surveys in slow-moving landslide-prone areas. Full article
(This article belongs to the Special Issue Advances in AI-Driven Remote Sensing for Geohazard Perception)
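The pyramid pooling module mentioned in this abstract can be sketched minimally in NumPy: the feature map is average-pooled into several bin sizes, upsampled back, and concatenated with the input. Bin sizes, shapes, and function names are illustrative assumptions, not the article's implementation:

```python
import numpy as np

def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) feature map into (C, bins, bins)."""
    c, h, w = x.shape
    out = np.empty((c, bins, bins))
    hs = np.linspace(0, h, bins + 1).astype(int)
    ws = np.linspace(0, w, bins + 1).astype(int)
    for i in range(bins):
        for j in range(bins):
            out[:, i, j] = x[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]].mean(axis=(1, 2))
    return out

def upsample_nearest(x, h, w):
    """Nearest-neighbor upsample a (C, bh, bw) map back to (C, h, w)."""
    c, bh, bw = x.shape
    ri = np.arange(h) * bh // h
    ci = np.arange(w) * bw // w
    return x[:, ri][:, :, ci]

def pyramid_pooling(x, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled context branches."""
    c, h, w = x.shape
    branches = [x]
    for b in bin_sizes:
        branches.append(upsample_nearest(adaptive_avg_pool(x, b), h, w))
    return np.concatenate(branches, axis=0)

feat = np.random.default_rng(1).normal(size=(16, 24, 24))
out = pyramid_pooling(feat)
print(out.shape)  # (16 * 5, 24, 24) = (80, 24, 24)
```

The coarse branches summarize context at several scales, which is what lets a detector keep multi-scale cues when condensing high-dimensional spatial features.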
24 pages, 1883 KB  
Article
A Multi-Scale Vision–Sensor Collaborative Framework for Small-Target Insect Pest Management
by Chongyu Wang, Yicheng Chen, Shangshan Chen, Ranran Chen, Ziqi Xia, Ruoyu Hu and Yihong Song
Insects 2026, 17(3), 281; https://doi.org/10.3390/insects17030281 - 4 Mar 2026
Abstract
In complex agricultural production environments, small-target pests—characterized by tiny scales, strong background confusion, and close dependence on environmental conditions—pose major challenges to precise monitoring and green pest control. To facilitate the transition from experience-driven to data-driven pest management, a multi-scale vision–sensor collaborative recognition method is proposed for field and protected agriculture scenarios to improve the accuracy and stability of small-target pest recognition under complex conditions. The method jointly models multi-scale visual representations and pest ecological mechanisms: a multi-scale visual feature module enhances fine-grained texture and morphological cues of small targets in deep networks, alleviating feature sparsity and scale mismatch, while environmental sensor data, including temperature, humidity, and illumination, are introduced as priors to modulate visual features and explicitly incorporate ecological constraints into the discrimination process. Stable multimodal fusion and pest category prediction are then achieved through a vision–sensor collaborative discrimination module. Experiments on a multimodal dataset collected from real farmland and greenhouse environments in Linhe District, Bayannur City, Inner Mongolia, demonstrate that the proposed method achieves approximately 93.1% accuracy, 92.0% precision, 91.2% recall, and a 91.6% F1-score on the test set, significantly outperforming traditional machine learning approaches, single-scale deep learning models, and multi-scale vision baselines without environmental priors. Category-level evaluations show balanced performance across multiple small-target pests, including aphids, thrips, whiteflies, leafhoppers, spider mites, and leaf beetles, while ablation studies confirm the critical contributions of multi-scale visual modeling, environmental prior modulation, and vision–sensor collaborative discrimination. Full article
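The environmental-prior modulation described in this abstract can be illustrated with a FiLM-style scale-and-shift sketch, where sensor readings produce per-channel scale and shift terms applied to visual features. The modulation form, weights, and sensor values are hypothetical assumptions for illustration, not the article's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

def film_modulate(visual_feat, sensor_vec, w_gamma, w_beta):
    """FiLM-style modulation: sensor priors yield per-channel
    scale (gamma) and shift (beta) applied to visual features."""
    gamma = sensor_vec @ w_gamma  # (C,)
    beta = sensor_vec @ w_beta    # (C,)
    return (1.0 + gamma)[:, None, None] * visual_feat + beta[:, None, None]

C, H, W = 8, 4, 4
visual = rng.normal(size=(C, H, W))
sensor = np.array([25.0, 0.6, 800.0])  # temperature, humidity, illumination (hypothetical units)
sensor = (sensor - sensor.mean()) / sensor.std()  # crude normalization
w_gamma = rng.normal(scale=0.1, size=(3, C))
w_beta = rng.normal(scale=0.1, size=(3, C))
out = film_modulate(visual, sensor, w_gamma, w_beta)
print(out.shape)
```

In a trained model the projection weights would be learned, so that, for example, high humidity amplifies the channels most diagnostic of humidity-loving pests before the collaborative discrimination step.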