Search Results (3,994)

Search Parameters:
Keywords = Semantic Segmentation

33 pages, 6678 KB  
Article
A Systematic Study on Pretraining Strategies for Low-Label Remote Sensing Image Semantic Segmentation
by Peizhuo Liu, Hongbo Zhu, Xiaofei Mi, Jian Yang, Yuke Meng, Huijie Zhao and Xingfa Gu
Sensors 2026, 26(4), 1385; https://doi.org/10.3390/s26041385 (registering DOI) - 22 Feb 2026
Abstract
This paper addresses the critical challenge of semantic segmentation for remote sensing images (RSIs) under extremely limited labeled data. A high-quality initial model is paramount for downstream semi-supervised or weakly supervised learning paradigms, as it mitigates error propagation from the outset. We conducted a systematic investigation into self-supervised pretraining to serve this precise need. Within the low-label regime, we identify and tackle two pivotal factors limiting performance: (1) the domain shift between large-scale pretraining data and specific target tasks, and (2) the deficiency in local feature learning caused by large-window masking in visual foundation model (VFM) pretraining. To resolve these issues, we first benchmark various pretraining strategies, demonstrating that a two-phase General-Purpose Pretraining (GPPT) followed by Domain-Adaptive Pretraining (DAPT) framework is optimal, significantly outperforming both single-phase methods and the existing two-phase paradigm initialized from ImageNet. Subsequently, we propose an Edge-Guided Masked Image Modeling (EGMIM) method for the DAPT phase, which explicitly integrates edge priors to guide the masking and reconstruction process, thereby enhancing the model’s capability to capture fine-grained local structures. Extensive experiments on four RSI benchmarks validate the effectiveness of our approach, showing consistent and substantial gains, particularly in extreme low-label scenarios. Beyond empirical results, we provide in-depth mechanistic analyses to explain the synergistic roles of GPPT and DAPT. Full article
(This article belongs to the Special Issue Remote Sensing Image Processing, Analysis and Application)
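The edge-guided masking idea in the abstract can be sketched in a few lines: bias the choice of masked patches toward regions with high edge density, so reconstruction is forced to model local structure. This is an illustrative reconstruction, not the paper's EGMIM procedure — the patch size, sampling weights, and mask ratio are all assumptions.

```python
import numpy as np

def edge_guided_mask(edge_map, patch=4, mask_ratio=0.5, rng=None):
    """Pick patches to mask, biased toward high edge density.

    A minimal sketch of edge-prior-guided masking (not the authors'
    exact EGMIM design): patches containing more edge pixels are more
    likely to be masked and reconstructed.
    """
    rng = np.random.default_rng(rng)
    H, W = edge_map.shape
    gh, gw = H // patch, W // patch
    # Edge density per patch: mean edge strength inside each patch.
    density = edge_map[:gh * patch, :gw * patch] \
        .reshape(gh, patch, gw, patch).mean(axis=(1, 3))
    probs = density.ravel() + 1e-6      # small floor so flat patches stay eligible
    probs /= probs.sum()
    n_mask = int(mask_ratio * gh * gw)
    chosen = rng.choice(gh * gw, size=n_mask, replace=False, p=probs)
    mask = np.zeros(gh * gw, dtype=bool)
    mask[chosen] = True
    return mask.reshape(gh, gw)

edges = np.zeros((16, 16))
edges[8, :] = 1.0                       # a single horizontal edge
m = edge_guided_mask(edges, patch=4, mask_ratio=0.25, rng=0)
```

With the edge concentrated in one row of patches, nearly all sampled mask positions fall on that row.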
16 pages, 2796 KB  
Article
MiMics-Net: A Multimodal Interaction Network for Blastocyst Component Segmentation
by Adnan Haider, Muhammad Arsalan and Kyungeun Cho
Diagnostics 2026, 16(4), 631; https://doi.org/10.3390/diagnostics16040631 (registering DOI) - 21 Feb 2026
Abstract
Objectives: Global infertility rates are rapidly increasing. Assisted reproductive technologies combined with artificial intelligence are the next hope for overcoming infertility. In vitro fertilization (IVF) is gaining popularity owing to its increasing success rates. The success rate of IVF essentially depends on the assessment and inspection of blastocysts. Blastocysts can be segmented into several important compartments, and advanced and precise assessment of these compartments is strongly associated with successful pregnancies. However, currently, embryologists must manually analyze blastocysts, which is a time-consuming, subjective, and error-prone process. Several AI-based techniques, including segmentation, have been recently proposed to fill this gap. However, most existing methods rely only on raw grayscale intensity and do not perform well under challenging blastocyst image conditions, such as low contrast, similarity in textures, shape variability, and class imbalance. Methods: To overcome this limitation, we developed a novel and lightweight architecture, the microscopic multimodal interaction segmentation network (MiMics-Net), to accurately segment blastocyst components. MiMics-Net employs a multimodal blastocyst stem to decompose and process each frame into three modalities (photometric intensity, local textures, and directional orientation), followed by feature fusion to enhance segmentation performance. Moreover, MiMic dual-path grouped blocks have been designed, in which parallel-grouped convolutional paths are fused through point-wise convolutional layers to increase diverse learning. A lightweight refinement decoder is employed to refine and restore the spatial features while maintaining computational efficiency. Finally, semantic skip pathways are induced to transfer low- and mid-level spatial features after passing through the grouped and point-wise convolutional layers. 
Results/Conclusions: MiMics-Net was evaluated using a publicly available human blastocyst dataset and achieved a Jaccard index score of 87.9% while requiring only 0.65 million trainable parameters. Full article
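The three-modality decomposition described for the MiMics-Net stem (photometric intensity, local texture, directional orientation) can be approximated with elementary image operations. The operators below — patch standard deviation for texture and gradient angle for orientation — are illustrative stand-ins, not the paper's stem.

```python
import numpy as np

def multimodal_stem(gray, k=3):
    """Decompose a grayscale frame into three crude 'modalities':
    intensity (the image itself), local texture (k x k patch standard
    deviation), and orientation (intensity-gradient angle).
    """
    H, W = gray.shape
    pad = k // 2
    g = np.pad(gray, pad, mode="edge")
    # Local texture: standard deviation over a k x k neighborhood.
    texture = np.empty_like(gray, dtype=float)
    for i in range(H):
        for j in range(W):
            texture[i, j] = g[i:i + k, j:j + k].std()
    # Orientation: angle of the intensity gradient at each pixel.
    gy, gx = np.gradient(gray.astype(float))
    orientation = np.arctan2(gy, gx)
    return np.stack([gray.astype(float), texture, orientation])

img = np.tile(np.arange(8, dtype=float), (8, 1))   # horizontal ramp
stem = multimodal_stem(img)
```

A horizontal ramp yields a purely horizontal gradient, so the orientation channel is identically zero while the texture channel is positive.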

25 pages, 29828 KB  
Article
Self-Training Based Image–Text Multimodal Unsupervised Domain Adaptation Segmentation Model for Remote Sensing Images
by Qianqian Liu and Xili Wang
Remote Sens. 2026, 18(4), 651; https://doi.org/10.3390/rs18040651 - 20 Feb 2026
Abstract
Deep self-training-based unsupervised domain adaptation (UDA) semantic segmentation methods learn from labeled source domain images and unlabeled target domain images, performing more stably than those based on adversarial training. We propose a self-training-based image–text multimodal unsupervised domain adaptation semantic segmentation model (SIT-UDA) for remote sensing images. Unlike UDA methods, which rely solely on images, SIT-UDA enhances generalization performance by integrating category hint information from textual descriptions with image data to segment images. SIT-UDA employs a teacher–student self-training framework consisting of two components: the teacher multimodal segmentation model, which predicts pseudo-labels for target domain images, and the student multimodal segmentation model, which is trained to learn feature representations from both the source and target domains with guidance from the teacher model. To enhance the adaptability of image–text pretrained models in remote sensing domains, SIT-UDA introduces text prompt tuning to optimize the text features in the student model, and two learning strategies are proposed to optimize the model’s training objectives: One is the entropy-guided pixel-level weighting (EGPW) strategy, which adaptively weights the loss obtained by self-training on target domain images, leveraging the pseudo-labels rationally according to the entropy value at the pixel level. The other is the contrastive text constraint (CTC) strategy, which maximizes the similarity of text features for the same category between teacher and student models while minimizing the similarity of text features across different categories, improving text feature discriminability to promote cross-domain image–text alignment. 
Experiments across various domain adaptation scenarios on three remote sensing datasets (Potsdam, Vaihingen and LoveDA) demonstrate that SIT-UDA outperforms the compared domain adaptation semantic segmentation methods in both qualitative and quantitative segmentation results. Full article
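Entropy-guided pixel-level weighting admits a compact sketch: weight each pixel's self-training loss by how far its predictive entropy falls below the maximum log C. SIT-UDA's exact functional form is not given in the abstract, so this normalization is an assumption.

```python
import numpy as np

def entropy_pixel_weights(probs):
    """Per-pixel confidence weights from prediction entropy.

    A minimal sketch of entropy-guided weighting (the exact EGPW form
    in SIT-UDA may differ): pixels with low-entropy softmax outputs
    (confident pseudo-labels) get weight near 1, high-entropy pixels
    near 0.  probs: (C, H, W) softmax probabilities.
    """
    C = probs.shape[0]
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=0)   # (H, W)
    return 1.0 - entropy / np.log(C)   # normalize by max entropy log C

p = np.zeros((4, 2, 2))
p[0, 0, 0] = 1.0        # fully confident pixel -> weight ~ 1
p[1, 0, 1] = 1.0
p[2, 1, 0] = 1.0
p[:, 1, 1] = 0.25       # fully uncertain pixel -> weight ~ 0
w = entropy_pixel_weights(p)
```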
26 pages, 6887 KB  
Article
Decoding Urban Riverscape Perception: An Interpretable Machine Learning Approach Integrating Computer Vision and High-Fidelity 3D Models
by Yuzhen Tang, Shensheng Chen, Wenhui Xu, Jinxuan Ren and Junjie Luo
ISPRS Int. J. Geo-Inf. 2026, 15(2), 91; https://doi.org/10.3390/ijgi15020091 - 20 Feb 2026
Abstract
Visual perception serves as a crucial interface connecting human psychology with the built environment. However, current studies on urban riverscapes often rely on static 2D imagery, failing to capture the spatial depth and immersive experience essential for ecological validity. Furthermore, the “black box” nature of traditional machine learning models hinders the understanding of how specific environmental features drive public perception. To address these gaps, this study proposes an innovative framework integrating high-fidelity 3D models, computer vision (CV), and interpretable artificial intelligence (XAI). Using the River Thames (London) and the River Seine (Paris) as diverse case studies, we constructed high-precision 3D digital twins to quantify 3D spatial metrics (e.g., Viewshed Area, H/W Ratio) and applied the SegFormer model to extract 2D visual elements (e.g., Green View Index) from water-based panoramic imagery. Subjective perception data were collected via immersive Virtual Reality (VR) experiments. Random Forest models combined with SHAP were employed to decode the non-linear driving mechanisms of perception. The results reveal three universal principles: (1) Sense of Affluence and Vibrancy are primarily driven by high building density and vertical enclosure, challenging the traditional preference for openness in waterfronts; (2) Scenic Beauty is determined by a synergy of high Green View Index and quality artificial interfaces, suggesting a preference for nature-culture integration; (3) Sense of Boredom is significantly positively correlated with Viewshed Area, indicating that empty prospects without visual foci lead to monotony. This study demonstrates the efficacy of integrating Digital Twins and XAI in revealing robust perception mechanisms across different urban contexts, providing a scientific, evidence-based tool for precision urban planning and riverside regeneration. Full article

30 pages, 50904 KB  
Article
A Realistic Instance-Level Data Augmentation Method for Small-Object Detection Based on Scene Understanding
by Chuwei Li, Zhilong Zhang, Ping Zhong and Jun He
Remote Sens. 2026, 18(4), 647; https://doi.org/10.3390/rs18040647 - 20 Feb 2026
Abstract
Instance-level data augmentation methods, exemplified by “copy-paste”, serve as a conventional strategy for improving the performance of small object detectors. The core idea involves leveraging background redundancy by compositing object instances with suitable backgrounds—drawn either from the same image or from different images—to increase both the quantity and diversity of training samples. However, existing methods often struggle with mismatches in background, scale, illumination, and viewpoint between instances and backgrounds. More critically, their predominant reliance on background information, without a joint understanding of instance-background characteristics, results in augmented images lacking visual realism. Empirical studies have demonstrated that such unrealistic images not only fail to improve detection performance but can even be detrimental. To tackle this problem, we propose a scene-understanding-driven approach that systematically addresses these mismatches via joint instance-background understanding. This is achieved through a unified framework that integrates image inpainting, image tagging, open-set object detection, the Segment Anything Model (SAM), and pose estimation to jointly model instance attributes, background semantics, and their interrelationships, thereby abandoning the random operation paradigm of existing methods and synthesizing highly realistic augmented images while preserving data diversity. On the VisDrone dataset, our method improves the mAP@0.5:0.95 and mAP@0.5 of the baseline detector by 1.6% and 2.2%, respectively. Both quantitative gains and qualitative visualizations confirm that the systematic resolution of these mismatches directly translates into significantly higher visual realism and detection performance improvements. Full article
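The naive "copy-paste" baseline that this paper improves on fits in a few lines: composite a masked instance crop onto a background at a chosen location. The scene-understanding pipeline itself (inpainting, tagging, open-set detection, SAM, pose estimation) is far beyond this sketch.

```python
import numpy as np

def copy_paste(bg, instance, mask, top, left):
    """Naive copy-paste augmentation: write the masked pixels of an
    instance crop into a background image at (top, left).
    bg: (H, W) image; instance, mask: (h, w); mask is boolean.
    """
    out = bg.copy()
    h, w = instance.shape
    region = out[top:top + h, left:left + w]   # view into the copy
    region[mask] = instance[mask]              # paste only masked pixels
    return out

bg = np.zeros((6, 6))
inst = np.full((2, 2), 9.0)
mask = np.array([[True, False],
                 [True, True]])
aug = copy_paste(bg, inst, mask, 1, 1)
```

Only the three masked pixels are written; the background outside the mask (and the original array) is untouched — exactly the random-placement paradigm whose realism the paper's mismatch handling addresses.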
24 pages, 4248 KB  
Article
Multi-Scale Feature Learning for Farmland Segmentation Under Complex Spatial Structures
by Yongqi Han, Yuqing Wang, Yun Zhang, Hongfu Ai, Chuan Qin and Xinle Zhang
Entropy 2026, 28(2), 242; https://doi.org/10.3390/e28020242 - 19 Feb 2026
Abstract
Fragmented, irregular, and scale-heterogeneous farmland parcels introduce high spatial complexity into high-resolution remote sensing imagery, leading to boundary ambiguity and inter-class spectral confusion that hinder effective feature discrimination in semantic segmentation. To address these challenges, we propose CSMNet, which adopts a ConvNeXt V2 encoder for hierarchical representation learning and a multi-scale fusion architecture with redesigned skip connections and lateral outputs to reduce semantic gaps and preserve cross-scale information. An adaptive multi-head attention module dynamically integrates channel-wise, spatial, and global contextual cues through a lightweight gating mechanism, enhancing boundary awareness in structurally complex regions. To further improve robustness, a hybrid loss combining Binary Cross-Entropy and Dice loss is employed to alleviate class imbalance and ensure reliable extraction of small and fragmented parcels. Experimental results from Nong’an County demonstrate that the proposed model achieves superior performance compared with several state-of-the-art segmentation methods, attaining a Precision of 95.91%, a Recall of 93.95%, an F1-score of 94.92%, and an IoU of 90.85%. The IoU exceeds that of Unet++ by 8.92% and surpasses PSPNet, SegNet, DeepLabv3+, TransUNet, SeaFormer and SegMAN by more than 15%, 10%, 7%, 6%, 5% and 2%, respectively. These results indicate that CSMNet effectively improves information utilization and boundary delineation in complex agricultural landscapes. Full article
(This article belongs to the Section Multidisciplinary Applications)
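A hybrid Binary Cross-Entropy + Dice loss of the kind CSMNet employs against class imbalance can be sketched as below; the 50/50 weighting `alpha` is an assumption, not the paper's setting.

```python
import numpy as np

def hybrid_bce_dice(pred, target, alpha=0.5, eps=1e-6):
    """Hybrid loss sketch: alpha * BCE + (1 - alpha) * Dice loss.
    pred: predicted probabilities in (0, 1); target: binary {0, 1}.
    """
    pred = np.clip(pred, eps, 1 - eps)          # numerical stability
    bce = -(target * np.log(pred)
            + (1 - target) * np.log(1 - pred)).mean()
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return alpha * bce + (1 - alpha) * dice

t = np.array([1.0, 1.0, 0.0, 0.0])
perfect = hybrid_bce_dice(np.array([1.0, 1.0, 0.0, 0.0]), t)
poor = hybrid_bce_dice(np.array([0.0, 0.0, 1.0, 1.0]), t)
```

The Dice term rewards overlap directly (robust to imbalance), while BCE supplies dense per-pixel gradients; combining them is a standard remedy when small parcels occupy few pixels.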
15 pages, 6426 KB  
Article
Adaptive Multiple-Attribute Scenario LoRA Merge for Robust Perception in Autonomous Driving
by Ryosuke Kawata, Joonho Lee, Yanlei Gu and Shunsuke Kamijo
Sensors 2026, 26(4), 1336; https://doi.org/10.3390/s26041336 - 19 Feb 2026
Abstract
Perception models for autonomous driving are predominantly trained on clear, daytime data, leaving their performance under rare conditions—particularly in multiple-attribute (joint weather–lighting) conditions such as night × rainy or night × snowy—an open challenge. To address this, we propose a parameter-efficient fine-tuning (PEFT) framework that dynamically applies lightweight, scenario-specific Low-Rank Adaptation (LoRA) experts. At its core, our method features an adaptive pipeline that dynamically determines the LoRA experts to apply based on the encountered environmental conditions. We validate our framework on a unified semantic segmentation benchmark (MUSES, BDD100K, and Cityscapes) covering six scenarios (day/night × weather). Our approach improves the mIoU by up to 3.23 points over a strong baseline in single-attribute settings, and in data-scarce multiple-attribute cases, merged LoRA experts outperform the baseline expert by up to 5.99 points, demonstrating effective generalization across compounded conditions. Full article
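Merging scenario-specific LoRA experts reduces to summing their low-rank weight deltas. The sketch below uses the standard W + B @ A parameterization with hand-supplied coefficients; the paper's adaptive, condition-driven selection of experts is replaced here by explicit `coeffs`.

```python
import numpy as np

def merge_lora(base_w, loras, coeffs):
    """Merge several LoRA experts into one effective weight matrix.
    loras: list of (B, A) pairs with B: (d, r), A: (r, k);
    coeffs: per-expert mixing weights.
    """
    delta = sum(c * (B @ A) for c, (B, A) in zip(coeffs, loras))
    return base_w + delta

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
# Two hypothetical single-attribute experts, e.g. 'night' and 'rainy'.
night = (rng.standard_normal((4, 2)), rng.standard_normal((2, 4)))
rainy = (rng.standard_normal((4, 2)), rng.standard_normal((2, 4)))
merged = merge_lora(W, [night, rainy], coeffs=[0.5, 0.5])
```

Because each delta is rank-r, storing and swapping experts is cheap, which is what makes per-scenario adaptation at inference time practical.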

27 pages, 18819 KB  
Article
DSAFNet: Dilated–Separable Convolution and Attention Fusion Network for Real-Time Semantic Segmentation
by Wencong Lv, Xin Liu, Jianjun Zhang, Dongmei Luo and Ping Han
Electronics 2026, 15(4), 866; https://doi.org/10.3390/electronics15040866 - 19 Feb 2026
Abstract
Real-time semantic segmentation has been widely adopted in resource-constrained applications such as mobile devices, autonomous driving, and drones due to its high efficiency. However, existing lightweight networks often compromise segmentation accuracy to reduce parameter count and improve inference speed. To achieve an optimal balance among accuracy, latency, and model size, we propose the Dilated–Separable Convolution and Attention Fusion Network (DSAFNet), a lightweight real-time semantic segmentation network based on an asymmetric encoder–decoder framework. DSAFNet integrates three core components: (i) the Double-Layer Multi-Branch Depthwise Convolution (DL-MBDC) module that fuses channel splitting and multi-branch depthwise convolutions to efficiently extract multi-scale features with minimal parameters; (ii) the Multi-scale Dilated Fusion Attention (MDFA) module that utilizes factorized dilated convolutions and channel-spatial collaborative attention to expand the receptive field and reinforce key contextual features; (iii) the Multi-scale Attention Lightweight Decoder (MALD) that integrates multi-scale feature maps to generate attention-guided segmentation results. Experiments conducted on an RTX 3090 platform demonstrate that DSAFNet, with only 1.00 M parameters, achieves 74.78% mIoU and a frame rate of 74.74 FPS on the Cityscapes dataset, while achieving 70.5% mIoU and a frame rate of 89.5 FPS on the CamVid dataset. Full article
(This article belongs to the Section Artificial Intelligence)
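The parameter savings that motivate depthwise-convolution designs like DL-MBDC are easy to verify by counting weights: a standard k x k convolution couples every input channel to every output channel, while the depthwise-separable factorization does not.

```python
def conv_params(c_in, c_out, k):
    """Weight counts (biases ignored) for a standard k x k convolution
    versus a depthwise-separable one (depthwise k x k followed by a
    1 x 1 pointwise convolution) — the factorization lightweight
    segmentation networks build on.
    """
    standard = c_in * c_out * k * k
    separable = c_in * k * k + c_in * c_out   # depthwise + pointwise
    return standard, separable

std, sep = conv_params(128, 128, 3)   # a typical mid-network layer
```

For 128 -> 128 channels with 3 x 3 kernels this is 147,456 versus 17,536 weights, roughly an 8x reduction per layer, which is how sub-1 M-parameter networks become feasible.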

29 pages, 14455 KB  
Review
Few-Shot Semantic Segmentation in Remote Sensing: A Review on Definitions, Methods, Datasets, Advances and Future Trends
by Marko Petrov, Ema Pandilova, Ivica Dimitrovski, Dimitar Trajanov, Vlatko Spasev and Ivan Kitanovski
Remote Sens. 2026, 18(4), 637; https://doi.org/10.3390/rs18040637 - 18 Feb 2026
Abstract
Semantic segmentation in remote sensing images, which is the task of classifying each pixel of the image in a specific category, is widely used in areas such as disaster management, environmental monitoring, precision agriculture, and many others. However, traditional semantic segmentation methods face a major challenge: they require large amounts of annotated data to train effectively. To tackle this challenge, few-shot semantic segmentation has been introduced, where the models can learn and adapt quickly to new classes from just a few annotated samples. This paper presents a comprehensive review of recent advances in few-shot semantic segmentation (FSSS) for remote sensing, covering datasets, methods, and emerging research directions. We first outline the fundamental principles of few-shot learning and summarize commonly used remote-sensing benchmarks, emphasizing their scale, geographic diversity, and relevance to episodic evaluation. Next, we categorize FSSS methods into major families (meta-learning, conditioning-based, and foundation-assisted approaches) and analyze how architectural choices, pretraining strategies, and inference protocols influence performance. The discussion highlights empirical trends across datasets, the behavior of different conditioning mechanisms, the impact of self-supervised and multimodal pretraining, and the role of reproducibility and evaluation design. Finally, we identify key challenges and future trends, including benchmark standardization, integration with foundation and multimodal models, efficiency at scale, and uncertainty-aware adaptation. Collectively, they signal a shift toward unified, adaptive models capable of segmenting novel classes across sensors, regions, and temporal domains with minimal supervision. Full article
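The episodic evaluation protocol referenced throughout the review pairs a small labeled support set with a query set in each episode. A generic N-way K-shot sampler looks like this; the class and sample identifiers are illustrative, not drawn from any benchmark in the review.

```python
import random

def sample_episode(images_by_class, n_way=2, k_shot=1, q_query=1, rng=None):
    """Sample one few-shot episode: k_shot labeled support samples and
    q_query query samples for each of n_way randomly chosen classes.
    images_by_class: dict mapping class name -> list of sample ids.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = {}, {}
    for c in classes:
        picks = rng.sample(images_by_class[c], k_shot + q_query)
        support[c] = picks[:k_shot]     # the model adapts on these
        query[c] = picks[k_shot:]       # ...and is scored on these
    return support, query

data = {"road": ["r1", "r2", "r3"], "water": ["w1", "w2"], "crop": ["c1", "c2"]}
sup, qry = sample_episode(data, n_way=2, k_shot=1, q_query=1,
                          rng=random.Random(0))
```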

29 pages, 19866 KB  
Article
GCF-Net: A Geometric Context and Frequency Domain Fusion Network for Landslide Segmentation in Remote Sensing Imagery
by Chunlong Du, Shaoqun Qi, Luhe Wan, Yin Chen, Zhiwei Lin, Ling Zhu and Xiaona Yu
Remote Sens. 2026, 18(4), 635; https://doi.org/10.3390/rs18040635 - 18 Feb 2026
Abstract
Remote sensing-based landslide segmentation is of great significance for geological hazard assessment and post-disaster rescue. Existing convolutional neural network methods, constrained by the inherent limitations of spatial convolution, tend to lose high-frequency edge details during deep semantic extraction, while frequency-domain analysis, although capable of globally preserving high-frequency components, struggles to perceive local multi-scale features. The lack of an effective synergistic mechanism between them makes it difficult for networks to balance regional integrity and boundary precision. To address these issues, this paper proposes the Geometric Context and Frequency Domain Fusion Network (GCF-Net), which achieves explicit edge enhancement through a three-stage progressive framework. First, the Pyramid Lightweight Fusion (PGF) block is proposed to aggregate multi-scale context and provide rich hierarchical features for subsequent stages. Second, the Geometric Context and Frequency Domain Fusion (GCF) module is designed, where the frequency-domain branch generates dynamic high-frequency masks via the Fourier transform to locate boundary positions, while the spatial branch models foreground–background relationships to understand boundary semantics, with both branches fused through an adaptive gating mechanism. Finally, Edge-aware Detail Consistency Improvement (EDCI) module is designed to balance boundary preservation and noise suppression based on edge confidence, achieving adaptive output refinement. Under the joint supervision of Focal loss, Dice loss, and Edge loss, experiments on the mixed dataset and LMHLD dataset demonstrate that GCF-Net achieves OAs of 96.42% and 96.71%, respectively. Ablation experiments and visualization results further validate the effectiveness of each module and the significant improvement in boundary segmentation. Full article
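The frequency-domain branch's use of the Fourier transform to isolate high-frequency (boundary) content can be illustrated with a hard high-pass mask. GCF-Net's masks are dynamic, so the fixed radial cutoff here is an assumption.

```python
import numpy as np

def highfreq_component(img, cutoff=0.25):
    """High-frequency component of an image via a hard FFT mask — a
    static stand-in for GCF-Net's dynamic high-frequency masks.
    Frequencies within `cutoff` (fraction of Nyquist) of the spectrum
    center are zeroed; what survives is mostly edge content.
    """
    F = np.fft.fftshift(np.fft.fft2(img))
    H, W = img.shape
    y, x = np.ogrid[:H, :W]
    r = np.hypot(y - H // 2, x - W // 2)       # radial frequency
    mask = r > cutoff * min(H, W) / 2          # keep only high frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

flat = np.ones((16, 16))                       # no edges -> nothing survives
hf_flat = highfreq_component(flat)
step = np.zeros((16, 16)); step[:, 8:] = 1.0   # a vertical boundary
hf_step = highfreq_component(step)
```

A constant region is annihilated by the mask while a step edge leaves a strong response, which is why such masks can localize landslide boundaries.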

22 pages, 1743 KB  
Article
WMCA-Net: Wavelet Multi-Scale Contextual Attention Network for Segmentation of the Intercondylar Notch
by Yi Wu, Xiangxin Wang, Hu Liu, Quan Zhou, Lingyan Zhang, Yujia Zhou and Qianjin Feng
Bioengineering 2026, 13(2), 236; https://doi.org/10.3390/bioengineering13020236 - 18 Feb 2026
Abstract
Accurate segmentation of the intercondylar notch of the femur is of great significance for the diagnosis of knee joint diseases, surgical planning, and anterior cruciate ligament (ACL) reconstruction. In particular, the pronounced anatomical heterogeneity, the interference of structurally similar tissues, and the blurred boundaries in MRI images make the segmentation of the intercondylar notch challenging. The segmentation of the intercondylar notch is often regarded as a standard semantic segmentation problem, but doing so leaves the inherent high-order internal variation and low-contrast features of its anatomical structure unresolved. We propose a new Wavelet Multi-scale Contextual Attention Network (WMCA-Net). The network coordinates the Shallow High-frequency Feature Dense Extraction Block (SHFDEB) and Wavelet Split and Fusion Block (WSFB) modules with each other. The SHFDEB intensively extracts high-frequency detailed features at the shallowest layer of the network, while the WSFB effectively splits and fuses features at various resolutions, suppressing noise while better preserving the high-frequency detailed structural information we need. The Multi-scale Depth-wise Convolution Block (MDCB) captures cross-scale features from the narrow intercondylar notch (5–8 mm wide) to the surrounding femoral structure (approximately 50 mm diameter), dynamically adapting to different morphologies, including pathological changes caused by osteophyte formation. The Contextual-Weighted Attention Module (CWAM) establishes long-term semantic associations between fuzzy regions and clear anatomical landmarks by precisely locating uncertain regions through foreground and background decomposition. The Dice Similarity Coefficient of WMCA-Net on the intercondylar notch dataset is 93.16%, and the 95% Hausdorff Distance is 1.42 mm, demonstrating its advanced segmentation performance and good anatomical adaptability. Full article
(This article belongs to the Special Issue Application of Bioengineering to Orthopedics)
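The wavelet split underlying blocks like WSFB is, at its simplest, a one-level Haar decomposition into one low-pass and three detail sub-bands; WSFB's actual split-and-fuse design is not reproduced here.

```python
import numpy as np

def haar2d(img):
    """One-level 2D Haar wavelet split into approximation (LL) and
    detail (LH, HL, HH) sub-bands.  img: (H, W) with even H, W.
    """
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4    # low-pass: local averages
    lh = (a + b - c - d) / 4    # horizontal-edge detail
    hl = (a - b + c - d) / 4    # vertical-edge detail
    hh = (a - b - c + d) / 4    # diagonal detail
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)   # smooth linear ramp
ll, lh, hl, hh = haar2d(img)
```

Processing the detail sub-bands separately is what lets such blocks suppress noise while keeping the high-frequency structure that delineates thin anatomy.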
28 pages, 9652 KB  
Article
A Heritage Information System Based on Point-Clouds: Research and Intervention Analyses Made Accessible
by Paula Redweik, Manuel Sánchez-Fernández, María José Marín-Miranda and José Juan Sanjosé-Blasco
Heritage 2026, 9(2), 77; https://doi.org/10.3390/heritage9020077 - 17 Feb 2026
Abstract
Heritage buildings can now be surveyed in great detail using geospatial techniques such as photogrammetry and TLS to produce dense point-clouds. For the purposes of research and building analyses, data about interventions and other relevant semantic data from the building are available from many sources, though not always in a well-organized way. Allying semantic data to point-clouds requires the elaboration of an ontology and the segmentation and classification of the point-clouds in accordance with that ontology. The present paper deals with an approach to make semantic classified point-clouds accessible to researchers, heritage managers and members of the public who wish to explore the 3D point-cloud data with ease and without the need for geospatial expertise. The app presented here, ‘HISTERIA’ (Heritage Information System Tool to Enable Research and Intervention Analysis), was developed with MATLAB 2023 App Designer, an object-oriented programming software module. HISTERIA has an interface in which the user can choose which parts of the heritage building s/he wishes to analyze according to several criteria presented in pre-defined queries. The result of most queries is shown in a point-cloud viewer window inside the app. A point can also be selected in the viewer, and all the values attached to it can be accessed in the different classes. HISTERIA is intended to give to the exploration of semantic heritage data in 3D added value in a simplified way. Full article

21 pages, 1581 KB  
Article
DCANet: Disentanglement and Category-Aware Aggregation for Medical Image Segmentation
by Xiaoqing Li, Hua Huo and Chen Zhang
Sensors 2026, 26(4), 1300; https://doi.org/10.3390/s26041300 - 17 Feb 2026
Abstract
Medical image segmentation is essential for clinical decision-making, treatment planning, and disease monitoring. However, ambiguous boundaries and complex anatomical structures continue to pose challenges for accurate segmentation. To address these issues, we propose DCANet (Disentangled and Category-Aware Network), a novel framework that effectively integrates local and global feature representations while enhancing category-aware feature interactions. In DCANet, features from convolutional and Transformer layers are fused using the Feature Coupling Unit (FCU), which aligns and combines local and global information across multiple semantic levels. The Decoupled Feature Module (DFM) then separates high-level representations into multi-class foreground and background features, improving discriminability and mitigating boundary ambiguity. Finally, the Category-Aware Integration Aggregator (CAIA) guides multi-level feature fusion, emphasizes critical regions, and refines segmentation boundaries. Extensive experiments on four public datasets—Synapse, ACDC, GlaS, and MoNuSeg—demonstrate the superior performance of DCANet, achieving Dice scores of 84.80%, 94.07%, 94.60%, and 79.85%, respectively. These results confirm the effectiveness and generalizability of DCANet in accurately segmenting complex anatomical structures and resolving boundary ambiguities across diverse medical image segmentation tasks. Full article
27 pages, 5880 KB  
Article
The Impact of Blue–Green Visual Composition in Waterfront Walkway on Psychophysiological Recovery: Evidence from First-Person Dynamic VR Exposure and Semantic Segmentation Quantification
by Wei Nie, Zhaotian Li, Jing Liu, Yongchao Jin, Gang Li and Jie Xu
Buildings 2026, 16(4), 819; https://doi.org/10.3390/buildings16040819 - 17 Feb 2026
Abstract
Urban waterfront walkways are everyday public built environments where people commonly engage in slow walking, yet evidence remains limited that links what pedestrians see to immediate psychophysiological responses under controlled first-person dynamic exposure. To address this gap, we developed a fixed-speed, fixed-duration VR walk-through model using real-world 360° panoramic video and quantified scene visual composition via computer vision-based image analysis. Based on the visible shares of key components (greenery, water, sky, hardscape, and built structures), clips were grouped into four interpretable waterfront typologies: Vegetation-Enclosed, Built-Dominant, Hardscape-Plaza, and Blue-Open. Fifty healthy adults completed within-subject VR exposures to the four typologies (50 s per clip), while multimodal physiological signals and brief affect and landscape ratings were collected before and after exposure. The results showed that scenes with greater water and vegetation coverage and more expansive views were associated with stronger calming responses of the autonomic nervous system, whereas scenes with fewer natural elements and higher built-structure density were more likely to induce tension responses. Negative emotions decreased significantly across all four scene experiences, although the artificial scenes paired this emotional improvement with physiological tension. Overall, brief first-person dynamic VR exposure can yield immediate emotional benefits, and waterfront designs combining water proximity, abundant greenery, and expansive vistas may maximize short-term restorative potential, offering quantitative targets for health-supportive planning and retrofitting. Full article
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)
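The per-component visible shares described in the abstract above can be computed directly from a semantic segmentation label map. The sketch below is illustrative only — the class names match the abstract, but the typology thresholds are invented for demonstration and are not the study's actual grouping rule.

```python
from collections import Counter

CLASSES = ("greenery", "water", "sky", "hardscape", "built")

def visual_composition(label_map):
    """Fraction of pixels per semantic class in a segmentation map.
    label_map: 2-D list of class-name strings (one entry per pixel)."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {c: counts.get(c, 0) / total for c in CLASSES}

def classify_typology(shares):
    """Toy grouping rule mapping component shares to a waterfront typology.
    The 0.4 thresholds are hypothetical, not taken from the paper."""
    if shares["greenery"] >= 0.4:
        return "Vegetation-Enclosed"
    if shares["built"] >= 0.4:
        return "Built-Dominant"
    if shares["hardscape"] >= 0.4:
        return "Hardscape-Plaza"
    return "Blue-Open"
```

For example, a frame dominated by water and sky with modest greenery would fall into the Blue-Open typology under these toy thresholds.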
29 pages, 4367 KB  
Article
Contrastive Masked Feature Modeling for Self-Supervised Representation Learning of High-Resolution Remote Sensing Images
by Shiyan Pang, Jianwu Xiang, Zhiqi Zuo, Hanchun Hu and Huiwei Jiang
Remote Sens. 2026, 18(4), 626; https://doi.org/10.3390/rs18040626 - 17 Feb 2026
Abstract
As an emerging learning paradigm, self-supervised learning (SSL) has attracted extensive attention due to its ability to mine features with effective representation from massive unlabeled data. In particular, SSL, driven by contrastive learning and masked modeling, shows great potential in general visual tasks. However, because of the diversity of ground target types, the complexity of spectral radiation characteristics, and changes in environmental conditions, existing SSL frameworks exhibit limited feature extraction accuracy and generalization ability when applied to complex remote sensing scenarios. To address this issue, we propose a hybrid SSL framework that integrates the advantages of contrastive learning and masked modeling to extract more robust and reliable features from remote sensing images. The proposed framework includes two parallel branches: one branch uses a contrastive learning strategy to strengthen global feature representation and capture image structural information by constructing positive and negative sample pairs; the other branch adopts a masked modeling strategy, focusing on the fine analysis of local details and predicting the features of masked areas, thereby establishing connections between global and local features. Additionally, to better integrate local and global features, we adopt a hybrid CNN+Transformer architecture, which is particularly suitable for dense prediction downstream tasks such as semantic segmentation. Extensive experimental results demonstrate that the proposed framework not only exhibits superior feature extraction ability and higher accuracy in small-sample scenarios but also outperforms state-of-the-art mainstream SSL frameworks on large-scale datasets. Full article
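The two-branch objective described above — a contrastive term over positive/negative pairs plus a masked-feature prediction term — can be sketched as a weighted sum of an InfoNCE-style loss and an MSE on masked positions. This is a generic illustration of the hybrid-loss pattern, not the paper's actual loss; the temperature, the weighting factor `lam`, and all function names are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors (lists of floats)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def info_nce(anchor, positive, negatives, tau=0.1):
    """Contrastive branch: pull anchor toward its positive view, push it
    away from negatives (numerically stable log-sum-exp form)."""
    logits = [cosine(anchor, positive) / tau] + [
        cosine(anchor, n) / tau for n in negatives
    ]
    m = max(logits)
    return -(logits[0] - m) + math.log(sum(math.exp(l - m) for l in logits))

def masked_feature_loss(pred, target, mask):
    """Masked-modeling branch: MSE computed only on masked positions."""
    terms = [
        (p - t) ** 2
        for p_vec, t_vec, masked in zip(pred, target, mask) if masked
        for p, t in zip(p_vec, t_vec)
    ]
    return sum(terms) / len(terms)

def hybrid_loss(anchor, positive, negatives, pred, target, mask, lam=1.0):
    """Joint objective combining both branches with weight lam."""
    return info_nce(anchor, positive, negatives) + lam * masked_feature_loss(pred, target, mask)
```

When the positive view matches the anchor and the masked predictions match the target features, both terms approach zero; mismatches in either branch raise the joint loss, which is what couples the global (contrastive) and local (masked) signals.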