Search Results (5,477)

Search Parameters:
Keywords = generic labels

31 pages, 11416 KB  
Article
A Reliability-Guided Unsupervised Domain Adaptation Framework for Robust Semantic Segmentation Under Adverse Driving Conditions
by Nan Xia and Guoqing Hu
Appl. Sci. 2026, 16(6), 3036; https://doi.org/10.3390/app16063036 - 20 Mar 2026
Abstract
Adverse weather and low illumination remain major challenges for autonomous driving perception, where semantic segmentation must stay reliable despite severe appearance degradation. In unsupervised domain adaptation without target annotations, self-training is widely used, but it is often limited by the inconsistent quality of teacher-generated pseudo labels across samples, regions, and training stages. This paper presents RaDA, a reliability-aware self-training framework that regulates pseudo supervision at three levels. First, a progressive exposure strategy determines which target images are admitted for training. Second, spatial reliability weighting suppresses gradients from degraded regions while retaining informative supervision. Third, adaptive teacher update scheduling stabilizes pseudo label generation over time. Experiments on real-world adverse driving benchmarks show that RaDA improves robustness, training stability, and cross-dataset generalization compared with strong baselines. Compared with the previous state-of-the-art method MIC, RaDA achieves mIoU gains of 10.6 percentage points on Foggy Zurich and 8.8 percentage points on the Foggy Driving benchmark. These results indicate that explicit reliability regulation can strengthen self-training domain adaptation for semantic segmentation in autonomous driving under challenging environmental conditions.
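The abstract gives no code, but the reliability-weighting idea it describes can be pictured with a short sketch. Everything below (the function names, the use of teacher softmax confidence as the per-pixel weight, and the EMA momentum value) is an illustrative assumption, not the authors' implementation of RaDA.

import torch
import torch.nn.functional as F

def reliability_weighted_loss(student_logits, teacher_logits):
    # Hypothetical sketch: weight the pseudo-label cross-entropy by teacher confidence
    # so that gradients from degraded, unreliable regions are suppressed.
    with torch.no_grad():
        teacher_prob = F.softmax(teacher_logits, dim=1)      # (B, C, H, W)
        confidence, pseudo_label = teacher_prob.max(dim=1)   # both (B, H, W)
    per_pixel_ce = F.cross_entropy(student_logits, pseudo_label, reduction="none")
    return (confidence * per_pixel_ce).mean()

@torch.no_grad()
def ema_teacher_update(teacher, student, momentum=0.999):
    # Illustrative exponential-moving-average update of the teacher from the student;
    # the paper's adaptive update scheduling is omitted here.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)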
20 pages, 19133 KB  
Article
Uncovering Several Degrees of Anxiety in Mexican Students Through Advanced Deep Learning Techniques
by Marco A. Moreno-Armendáriz, Arturo Lara-Cázares, Jared Castillo-González and Halder V. Galdo-Navarro
Algorithms 2026, 19(3), 235; https://doi.org/10.3390/a19030235 - 20 Mar 2026
Abstract
Emotion identification via computer vision has made continuous progress over the last few years. Although images have been the gold standard for the past two decades, video is increasingly common. Video is particularly suitable for the study of emotions, as it allows them to be considered as spatiotemporal phenomena. In particular, identifying anxiety among Mexican students is a key element for improving their learning in the classroom. In pursuit of this goal, we addressed the following challenges: first, the scarcity of specialized datasets for this task prompted us to develop an experimental protocol to generate a specific dataset; second, we conducted a thorough study of the appropriate number of emotional intensity levels; and third, we designed a suitable deep learning architecture. Our pivotal results include the development of a new dataset labeled with three different emotion levels and appropriate ConvNet architectures, complemented by a study of various intensity levels. The optimal architecture achieved an F1-score of 0.7620 across five intensity levels and provides an adequate baseline for multiclass classification.
(This article belongs to the Special Issue Modern Algorithms for Image Processing and Computer Vision)
21 pages, 5649 KB  
Article
Analysis of Generalization Performance of Tornado Detection Models: A Cross-Domain Evaluation from U.S. to Chinese Weather Radar Observations
by Biao Jiang, Shuai Zhang, Yubao Chen, Xuehua Li and Yancheng Wang
Remote Sens. 2026, 18(6), 948; https://doi.org/10.3390/rs18060948 - 20 Mar 2026
Abstract
Tornadoes pose severe threats, yet their low frequency in China creates a labeled data scarcity that hinders training robust detection models. Leveraging abundant U.S. data offers a solution, though cross-domain generalization remains challenging due to distinct climatic environments and heterogeneous radar systems. This study systematically evaluates the generalization capability of three representative models—TORP, TORP-XGB, and TDA-CNN—trained on the U.S. TorNet dataset and applied to Chinese CINRAD observations (2020–2024) via a zero-shot transfer strategy. The results indicate that while all models demonstrated robust performance in the source domain (with POD values of 0.75, 0.72, and 0.71 for TORP, TORP-XGB, and TDA-CNN, respectively), they experienced varying degrees of performance attenuation in the target domain (with POD values dropping to 0.56, 0.48, and 0.41, respectively). Notably, the TORP model exhibited superior robustness with minimal performance degradation. Further analysis primarily attributes this cross-domain degradation to three factors: disparities in radar systems, magnitude differences in tornado rotational features, and data quality issues. Crucially, sensitivity experiments confirm that linear feature enhancement substantially improves the detection rate and effectively mitigates the cross-domain performance gap, albeit at the cost of increased false alarms. These findings provide a reference for the cross-domain deployment of tornado identification models and future improvements in transfer learning strategies.
(This article belongs to the Special Issue State-of-the-Art Remote Sensing in Precipitation and Thunderstorm)
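As a purely illustrative aside (not from the paper), the linear feature enhancement mentioned above amounts to rescaling target-domain rotational features so their magnitudes resemble the source-domain statistics the models were trained on; the function name, statistics, and values below are assumptions.

import numpy as np

def linear_feature_enhancement(target_feature, source_mean, source_std):
    # Shift and scale a target-domain feature (e.g., a radial-velocity-derived rotation
    # magnitude) so its mean and spread match those seen in the source training data.
    t_mean = target_feature.mean()
    t_std = target_feature.std() + 1e-6
    return (target_feature - t_mean) / t_std * source_std + source_mean

# Example usage with made-up statistics:
rotation = np.array([4.0, 6.5, 5.2], dtype=np.float32)
enhanced = linear_feature_enhancement(rotation, source_mean=12.0, source_std=3.0)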
26 pages, 857 KB  
Article
A Hybrid Machine Learning Approach for Classifying Indonesian Cybercrime Discourse Using a Localized Threat Taxonomy
by Firman Arifman, Teddy Mantoro and Dini Oktarina Dwi Handayani
Information 2026, 17(3), 301; https://doi.org/10.3390/info17030301 - 20 Mar 2026
Abstract
Indonesia’s rapid digital growth has been accompanied by escalating cyber threats, with public discourse on social media emerging as a critical but underutilized source of threat intelligence. This discourse is characterized by informal language and local nuances that render existing international cybercrime taxonomies ineffective, creating a gap in scalable, locally relevant threat analytics. This study introduces the Indonesian Cybercrime Threat Taxonomy (ICTT), a novel five-dimensional framework tailored to Indonesian online environments. An end-to-end OSINT pipeline was developed to collect 2344 samples from X (formerly Twitter) and YouTube, employing weak supervision with 12 high-precision regex patterns to generate training labels. A state-of-the-art IndoBERT model was fine-tuned on this data, and its performance was compared against rule-based and hybrid classification models. On a manually annotated gold-standard dataset of 600 samples, both the IndoBERT and hybrid models achieved 96.8% accuracy, significantly outperforming the rule-based baseline (66.7%). The models demonstrated strong generalization across both social media platforms, and the hybrid approach provided an effective balance of high performance and interpretability. This research demonstrates that informal public discourse can be systematically transformed into structured threat intelligence. The ICTT and the accompanying hybrid classification system provide a scalable, interpretable, and locally relevant foundation for cyber threat analytics in Indonesia, establishing a methodological blueprint for other low-resource language contexts.
(This article belongs to the Special Issue Information Extraction and Language Discourse Processing)
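The regex-based weak supervision described above can be illustrated with a minimal sketch. The categories and patterns below are invented for illustration and are not the paper's 12 high-precision rules.

import re

WEAK_RULES = {
    "phishing": re.compile(r"\b(phishing|link palsu|akun diretas)\b", re.IGNORECASE),
    "fraud":    re.compile(r"\b(penipuan|rekening penipu)\b", re.IGNORECASE),
    "malware":  re.compile(r"\b(ransomware|trojan|virus)\b", re.IGNORECASE),
}

def weak_label(text):
    # Return the first matching category, or None if no high-precision rule fires.
    for category, pattern in WEAK_RULES.items():
        if pattern.search(text):
            return category
    return None

# Posts that match a rule become (text, label) pairs for fine-tuning the classifier;
# posts with no match are excluded from the weakly supervised training set.
posts = ["Hati-hati, link palsu dari bank!", "Cuaca hari ini cerah."]
labeled = [(t, weak_label(t)) for t in posts if weak_label(t) is not None]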
18 pages, 802 KB  
Article
Multi-Source-Free Domain Adaptation via Proxy Domain Adversarial Learning with Nuclear-Norm Maximization
by Liran Yang, Jinrong Qu, Tianyu Su, Zaishan Qi and Pan Su
Appl. Sci. 2026, 16(6), 3006; https://doi.org/10.3390/app16063006 - 20 Mar 2026
Abstract
Deep neural networks suffer performance drops when source and target domains differ in distribution, motivating research into domain adaptation (DA). Traditional DA approaches presume that source samples come from a single domain and are available during adaptation. Nevertheless, in real-world applications, multiple source domains often exist, and source samples may be inaccessible owing to privacy and storage limitations. To address this combined multi-source and source-free setting, multi-source-free domain adaptation (MSFDA) has been proposed, which captures transferable information from a set of pre-trained source models to boost model performance on the target domain. Most MSFDA methods address these challenges by utilizing pseudo-labeling. However, pseudo-labels generated by distinct source models may contain noise and even be contradictory, which weakens their efficacy in helping source models adapt to the target domain. Moreover, these methods do not consider class imbalance, which can lead to biased predictions for minority classes and undermine adaptation. Therefore, we propose a novel MSFDA method that extends adversarial learning to the multi-source-free setting. This method presents proxy multi-source domain adversarial learning, which aligns target features extracted by different source models in an adversarial manner, enhancing the capability of source models to extract domain-invariant features and to obtain higher-quality pseudo-labels. Moreover, a nuclear-norm maximization regularization is employed to constrain the prediction matrices, which reduces prediction uncertainty and enhances the discriminability of the model while mitigating prediction bias and improving accuracy for minority classes. Finally, comprehensive evaluations on four benchmark datasets demonstrate the effectiveness of the proposed method.
(This article belongs to the Section Computing and Artificial Intelligence)
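Nuclear-norm maximization on a batch of predictions has a compact form that a short sketch can make concrete. The code below is a generic illustration of the technique, assuming PyTorch; the loss weight and how it is combined with the paper's adversarial objective are assumptions.

import torch
import torch.nn.functional as F

def nuclear_norm_loss(logits):
    # Negative nuclear norm of the batch softmax matrix (B x C).
    # Minimizing this term maximizes the nuclear norm, which encourages confident
    # (low-uncertainty) yet diverse predictions across the batch, counteracting
    # the bias toward majority classes.
    probs = F.softmax(logits, dim=1)                   # (B, C)
    nuclear_norm = torch.linalg.svdvals(probs).sum()   # sum of singular values
    return -nuclear_norm / probs.shape[0]

# Illustrative combination with a primary adaptation loss:
# total_loss = adversarial_loss + 0.1 * nuclear_norm_loss(target_logits)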
26 pages, 9198 KB  
Article
Towards Pseudo-Labeling with Dynamic Thresholds for Cross-View Image Geolocalization
by Yuanyuan Yuan, Jianzhong Guo, Ruoxin Zhu, Ning Li, Ziwei Li and Weiran Luo
Remote Sens. 2026, 18(6), 944; https://doi.org/10.3390/rs18060944 - 20 Mar 2026
Abstract
Cross-view image geolocalization aims to accurately localize images that lack geo-tags by matching ground-view images with geo-tagged satellite images. However, there are large imaging differences between ground and satellite viewpoints, and existing methods usually rely on a large number of accurately labeled cross-view image pairs. To address significant perspective differences, high annotation costs, and low utilization of unpaired data, this paper proposes a cross-view generation model that integrates multi-scale contrastive learning and dynamic optimization. The model uses a multi-scale contrastive loss function to strengthen semantic consistency between the generated images and the target domain, adaptively balances the quality and quantity of pseudo-labels through a dynamic threshold screening mechanism, and introduces a hard-sample triplet loss to enhance the model's discriminative ability. Ablation experiments on the CVUSA and CVACT datasets show that the proposed BEV-CycleGAN+CL (Bird’s-Eye View Cycle-Consistent Generative Adversarial Network with Contrastive Learning) model significantly outperforms the comparative models in PSNR, SSIM, and RMSE metrics. Specifically, on the CVACT dataset, compared with the BEV-CycleGAN, BEV, and CycleGAN baselines, PSNR increased by 2.83%, 16.02%, and 42.30%, SSIM increased by 6.12%, 8.00%, and 18.48%, and RMSE decreased by 9.28%, 15.51%, and 25.35%, respectively. Similar advantages are observed on the CVUSA dataset. Compared with current state-of-the-art models, the dynamic threshold pseudo-label localization method demonstrates overall superiority in recall metrics such as R@1, R@5, R@10, and R@1%, for example achieving an R@1 of 98.94% on CVUSA, outperforming the best comparative model, Sample4G, which reached 98.68%. This study provides innovative methodological support for disaster emergency response, high-precision map construction for autonomous driving, military reconnaissance, and other applications.
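A dynamic-threshold pseudo-label filter of the kind described above can be sketched in a few lines. The quantile schedule, function name, and the use of retrieval similarity scores are assumptions for illustration, not the paper's exact mechanism.

import torch

def select_pseudo_pairs(similarities, epoch, max_epochs, q_start=0.9, q_end=0.6):
    # Keep candidate ground-satellite pairs whose similarity exceeds a threshold that
    # relaxes from a high quantile to a lower one as training progresses, trading
    # pseudo-label quality for quantity over time.
    q = q_start + (q_end - q_start) * epoch / max(max_epochs - 1, 1)
    threshold = torch.quantile(similarities, q)
    keep = similarities >= threshold
    return keep, threshold

# Example usage: early epochs keep only the most confident pairs.
scores = torch.rand(1000)
mask, thr = select_pseudo_pairs(scores, epoch=0, max_epochs=30)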
22 pages, 2263 KB  
Article
Acridinium Chemiluminogenic Labels—Synthesis, Analytical Performance, and Mechanism of Light Generation—A Comparison in View of Biomedical Diagnostics
by Karol Krzymiński, Beata Zadykowicz, Justyna Czechowska, Paweł Rudnicki-Velasquez, Illia Serdiuk, Adam K. Sieradzan and Lucyna Holec-Gąsior
Molecules 2026, 31(6), 1041; https://doi.org/10.3390/molecules31061041 - 20 Mar 2026
Abstract
This paper presents the synthesis, physicochemical characterisation, and analytical applications of chemiluminescent (CL) labels based on acridinium salts (ALs) for biomedical diagnostics. These compounds emit light as a result of oxidative reactions and represent an established class of reagents widely employed in chemiluminescence immunochemical assays (CLIAs) today. A series of structurally differentiated acridinium labels (AL1–AL5) was synthesised, applying mostly original synthetic routes, and purified to chromatographic purity (>90%, RP-HPLC). The compounds, including a commercial product treated as a reference, were successfully conjugated to anti-human IgG, yielding stable immunochemical reagents suitable for immunoassays with CL detection. The chemiluminescence properties of the obtained labels and their protein conjugates were investigated in aqueous buffers and in the presence of surfactants. The emission profiles exhibited characteristic flash-type kinetics with emission maxima occurring within 0.15–0.25 s after reaction initiation. Surfactants enhanced the emission intensity to varying degrees, with signal increases of up to approximately 2-fold compared to surfactant-free systems. Analytical calibration demonstrated a linear signal response for the native labels over at least one order of magnitude of concentration, with detection limits falling in the range of 10⁻⁹–10⁻¹⁰ M, confirming the high sensitivity of the developed compounds. The experimental results were supported by theoretical studies using density functional theory (DFT), which confirmed the energetic feasibility of the CL reaction pathway and identified structural factors influencing activation barriers. Additional semiempirical calculations (PM7) indicated that the dielectric environment and the proximity of ionic species can influence the reaction energetics, providing mechanistic support for the experimentally observed effects of surfactants. The results demonstrate that both molecular structure and microenvironment influence the CL efficiency and kinetics of the investigated systems. The developed acridinium labels exhibit analytical performance better than or comparable to commercial reagents and are fully compatible with standard immunodiagnostic conjugation protocols, confirming their suitability for use in modern chemiluminescent immunoassays.
(This article belongs to the Special Issue Chemiluminescence and Photoluminescence of Advanced Compounds)
15 pages, 4876 KB  
Article
Prediction of Cataract Severity Using Slit Lamp Images from a Portable Smartphone Device: A Pilot Study
by David Z. Chen, Changshuo Liu, Junran Wu, Lei Zhu and Beng Chin Ooi
Sensors 2026, 26(6), 1954; https://doi.org/10.3390/s26061954 - 20 Mar 2026
Abstract
Cataract diagnosis requires a comprehensive dilated examination by an ophthalmologist using a slit lamp; there is currently no effective means to objectively screen for cataracts in the community using portable devices without dilation. We hypothesized that it would be possible to predict cataract severity using deep learning on images taken with a portable smartphone-based slit lamp prototype, with and without dilation. In this prospective cross-sectional pilot study, slit lamp images were captured from eligible patients with cataracts in a tertiary clinic using a portable slit lamp prototype attached to a smartphone. The Pentacam nuclear staging score (PNS, Pentacam®, Oculus, Inc., Arlington, WA, USA) was obtained from the dilated pupils and served as ground truth. A prototypical network with a Swin transformer backbone was trained on the images to assign the class label corresponding to the highest predicted probability. Heat maps were generated based on attribution masks to identify the anatomical areas of concern. A total of 1900 images from 198 eyes of 99 patients were captured. The average age was 65.3 ± 10.4 years (range, 41.0 to 88.0 years) and the average PNS score was 1.57 ± 0.81 (range, 0 to 4). The model achieved an average accuracy of 81.25% and 74.38% for undilated and dilated eyes, respectively. Heat map visualization using the integrated gradient method successfully identified the anatomical area of interest in certain images. This study suggests the possibility of estimating cataract density using a portable smartphone slit lamp device without dilation. Further work is under way to validate this technique in a larger and more diverse group of eyes with cataracts.
(This article belongs to the Special Issue Smartphone Sensors and Their Applications)
22 pages, 8609 KB  
Article
Integrating SimAM Attention and S-DRU Feature Reconstruction for Sentinel-2 Imagery-Based Soybean Planting Area Extraction
by Haotong Wu, Xinwen Wan, Rong Qian, Chao Ruan, Jinling Zhao and Chuanjian Wang
Agriculture 2026, 16(6), 693; https://doi.org/10.3390/agriculture16060693 - 19 Mar 2026
Abstract
Accurate and stable acquisition of the spatial distribution of soybean planting areas is essential for supporting precision agricultural monitoring and ensuring food security. However, crop remote-sensing mapping for specific regions still faces a critical data bottleneck: high-precision, large-scale pixel-level annotation is costly, resulting in scarce labeled samples that make it difficult to construct large-scale training datasets. Although parameter-intensive models such as FCN and SegNet can be sufficiently trained end-to-end on large-scale public remote sensing datasets like LoveDA, when directly applied to the data-limited dataset of this study area, such models are prone to overfitting, leading to a significant decline in generalization ability. To address these issues, this study proposes a lightweight U-shaped semantic segmentation model, SimSDRU-Net. The model utilizes a pre-trained VGG-16 backbone to extract shallow texture and deep semantic features; the pre-trained weights mitigate overfitting in data-limited settings. In the decoding stage, a parameter-free lightweight SimAM attention module enhances effective soybean features and suppresses redundant soil background, while an embedded S-DRU unit fuses multi-scale features for deep complementary reconstruction to improve edge detail capture. A label dataset was constructed using Sentinel-2 images as the data source and Menard County (USA) as the study area. The USDA CDL served as the foundation for the dataset, with Google high-resolution images serving as visual interpretation aids. In the experiments, DeepLabv3+, U-Net++, and U-Net were compared under identical conditions. The results demonstrated that SimSDRU-Net exhibited optimal performance, with an MIoU of 89.03%, an MPA of 93.81%, and an OA of 95.96%. Specifically, SimSDRU-Net uses the SimAM attention module to generate spatial attention weights by analyzing feature statistical differences through an energy function, adaptively enhancing soybean texture features, while the S-DRU unit groups, dynamically weights, and cross-branch reconstructs multi-scale convolutional features to preserve fine boundary details and achieve accurate segmentation of soybean plots. The present study demonstrates that SimSDRU-Net combines lightweight design with high precision in data-limited scenarios, thereby providing effective technical support for the rapid extraction of soybean planting areas in North America.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
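SimAM is a published, parameter-free attention module with a closed-form energy function; the minimal PyTorch version below follows the commonly used reference formulation. Whether SimSDRU-Net uses exactly this variant and this λ value is an assumption.

import torch

def simam(x, lam=1e-4):
    # x: feature map of shape (B, C, H, W).
    # Each activation's weight comes from an energy function comparing it with the
    # mean and variance of its channel, then passing the inverse energy through a sigmoid.
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n
    e_inv = d / (4 * (v + lam)) + 0.5
    return x * torch.sigmoid(e_inv)

# Usage: drop-in refinement of any convolutional feature map, adding no parameters.
features = torch.randn(2, 64, 32, 32)
refined = simam(features)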
27 pages, 7096 KB  
Article
From Simulation to Reality: GAN-Based Transformation of Pavement Defect Images for YOLO Detection
by Jiangang Yang, Shukai Yu, Yuquan Yao, Shiji Cao and Xiaojuan Ai
Appl. Sci. 2026, 16(6), 2978; https://doi.org/10.3390/app16062978 - 19 Mar 2026
Abstract
The application of three-dimensional ground-penetrating radar (3D GPR) for intelligent pavement defect analysis is often constrained by the limited availability of labeled samples. To address this challenge, this study employed Ground Penetrating Radar Maxwell (GprMax) to simulate typical pavement defects, including cracks, loose materials, and interlayer debonding. A Cycle-Consistent Generative Adversarial Network (Cycle-GAN) was then introduced to perform style transfer on the simulated images, thereby reducing the domain gap between simulated and real radar images. Furthermore, four You Only Look Once (YOLO) models—YOLO version 5, YOLOX, YOLO version 7, and YOLO version 8—were systematically compared using real datasets to identify the best-performing model, which was subsequently used to evaluate the effect of different proportions of synthetic data on detection performance. The results demonstrated that the moderate inclusion of synthetic data improved the recognition accuracy of loose defects (from 76.7% to 78.9%), whereas its impact on crack and debonding detection was negative. Moreover, excessive reliance on synthetic data led to overfitting, thereby reducing the model’s generalization capability. Among the four models, YOLOv7 achieved the best overall performance, with a mean Average Precision (mAP) of 83.4% and a crack detection rate of 88.2%. This study thus provides a feasible technical pathway and model selection reference for automated GPR-based pavement defect identification, offering practical value for efficient and accurate road maintenance inspections.
27 pages, 2690 KB  
Article
S2A-Swin: Spectral Smoothing–Guided Spectral–Spatial Windows with Generative Augmentation for Hyperspectral Image Classification Under Class Imbalance and Limited Labels
by Baisen Liu, Jianxin Chen, Wulin Zhang, Zhiming Dang, Xinyao Li and Weili Kong
Remote Sens. 2026, 18(6), 935; https://doi.org/10.3390/rs18060935 - 19 Mar 2026
Abstract
Hyperspectral image (HSI) classification faces the challenges of scarce labeled data and severe class imbalance, which limit the effective training and generalization capabilities of models. To address these issues, we propose S2A-Swin, a joint spatial–spectral hybrid Swin Transformer framework. First, we develop a spectral–spatial conditional generative adversarial network (SSC-cGAN), which combines spectral and spatial smoothing regularizers to synthesize class-specific image patches, thus alleviating data scarcity and class imbalance while maintaining spectral continuity and local spatial structure consistent with real data. Second, we introduce a dimension-aware hybrid Transformer module, which adds local windows along the spectral dimension to the standard spatial window, thereby facilitating cross-dimensional feature interactions and ensuring that each spectral band is modeled using its local spatial context for more efficient joint spatial–spectral modeling. In this module, attention over spectral and spatial windows is applied alternately (“cross-sequence” attention), with the execution order guided by hyperspectral prior knowledge to enhance cross-dimensional representation learning. The module is embedded in a lightweight Swin backbone and extends the traditional spatial window mechanism through spectral window attention, capturing spectral continuity while maintaining spatial structure consistency. Extensive experiments on four benchmark datasets demonstrate that, compared with mainstream CNN and Transformer baselines, the proposed method achieves overall accuracy (OA) improvements of 2.45%, 7.05%, 5.17%, and 0.85%.
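The spectral and spatial smoothing regularizers mentioned above can be pictured as simple penalties on a generated hyperspectral patch. The sketch below is an assumption about their general form; the paper's exact definitions and loss weights may differ.

import torch

def spectral_smoothness(patch):
    # Penalize abrupt changes between adjacent spectral bands of a generated patch
    # of shape (B, Bands, H, W), encouraging spectral continuity.
    return (patch[:, 1:] - patch[:, :-1]).pow(2).mean()

def spatial_smoothness(patch):
    # Total-variation-style penalty on neighbouring pixels within each band,
    # encouraging locally coherent spatial structure.
    dh = (patch[:, :, 1:, :] - patch[:, :, :-1, :]).abs().mean()
    dw = (patch[:, :, :, 1:] - patch[:, :, :, :-1]).abs().mean()
    return dh + dw

# Illustrative generator objective with assumed weights alpha and beta:
# g_loss = adversarial_term + alpha * spectral_smoothness(fake) + beta * spatial_smoothness(fake)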
22 pages, 1509 KB  
Article
ICTD: Combination of Improved CNN–Transformer and Enhanced Deep Canonical Correlation Analysis for Eye-Movement Emotion Classification
by Cong Zhang, Xisheng Li, Jiannan Chi, Ming Cao, Qingfeng Gu and Jiahui Liu
Brain Sci. 2026, 16(3), 330; https://doi.org/10.3390/brainsci16030330 - 19 Mar 2026
Abstract
Background/Objectives: Emotion classification based on eye-movement features has become a widely adopted approach due to the simplicity of data acquisition and the strong association between ocular responses and emotional states. However, several challenges remain with existing emotion recognition methods, including the relatively weak correlation between eye-movement features and emotional labels and the fact that key features are not prominently represented. Methods: To address the above limitations, this study proposes an improved CNN–Transformer combined with an enhanced deep canonical correlation analysis network (ICTD). The proposed method first performs preprocessing and reconstruction of raw eye-movement signals to extract informative features. Subsequently, convolutional neural networks (CNNs) and Transformer architectures are employed to capture local and global features, respectively. In addition, an incremental feature feedforward network is incorporated to enhance the Transformer, enabling the model to assign higher importance to salient feature information. Finally, the extracted representations are processed through deep canonical correlation analysis based on cosine similarity in order to generate classification outcomes. Results: Experiments conducted on the SEED-IV, SEED-V, and eSEE-d datasets demonstrate that the proposed ICTD framework consistently outperforms baseline approaches and attains optimal classification results. (1) On the eSEE-d dataset, the three-category arousal and valence classification results reach 81.8% and 85.2%, respectively; (2) on the SEED-IV dataset, the four-category emotion classification result reaches 91.2%; and (3) on the SEED-V dataset, the five-category emotion classification result reaches 85.1%. Conclusions: The proposed ICTD framework effectively improves feature representation and classification performance, showing strong potential for practical emotion recognition and physiological signal analysis.
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
14 pages, 245 KB  
Review
The Fate of Borderline Pathology in Dimensional Classification Systems: A Narrative Review
by Danilo Pesic, Dusica Lecic-Tosevski, Bojana Pejuskovic, Ana Munjiza-Jovanovic and Olivera Vukovic
Brain Sci. 2026, 16(3), 326; https://doi.org/10.3390/brainsci16030326 - 19 Mar 2026
Abstract
Recent revisions of personality disorder (PD) classifications have moved from categorical diagnoses toward dimensional models, raising renewed questions about the nosological status and clinical utility of borderline personality disorder (BPD). This narrative review traces the development of the borderline construct from early descriptions of patients positioned between neurosis and psychosis, through its theoretical consolidation within the concept of borderline personality organization, to the operationalization of BPD in DSM-III and subsequent diagnostic revisions. A central section summarizes contemporary controversies regarding the validity and utility of BPD features. Arguments for abandoning the diagnosis emphasize the absence of a distinct borderline factor in factor-analytic studies, the tendency of the construct to capture fluctuating symptoms and patterns of behaviour rather than stable maladaptive personality traits, the stigmatizing and non-selective use of the label, and the lack of disorder-specific treatment approaches. In contrast, converging evidence supports the view that core borderline symptoms frequently function as markers of general PD pathology and of the severity of impairments in self and interpersonal functioning. The paper integrates the concept of the borderline level of personality functioning, conceptualizing borderline pathology as a dynamic dimension of dysfunction with potential transient regressions, and links this concept to the Level of Personality Functioning (LPF, Criterion A) within the DSM-5 Alternative Model for Personality Disorders (AMPD). Retaining borderline pathology as a dimension may support contemporary PD assessment by offering a clinically recognizable marker of overall dysfunction, a guide for rating severity, and an indicator of personality structure and of the need for psychotherapy, without disrupting continuity with an extensive clinical and research tradition.
22 pages, 4742 KB  
Article
PromptSeg: An End-to-End Universal Medical Image Segmentation Method via Visual Prompts
by Minfan Zhao, Bingxun Wang, Jun Shi and Hong An
Entropy 2026, 28(3), 342; https://doi.org/10.3390/e28030342 - 18 Mar 2026
Abstract
Deep learning has achieved remarkable advancements in medical image segmentation, yet its generalization capability across unseen tasks remains a significant challenge. The variety of task objectives, disease-dependent labeling variations, and multi-center data contribute to the high uncertainty of task-specific models on unseen distributions. In this study, we propose PromptSeg, an innovative Transformer-based unified framework for universal 2D medical image segmentation. From an information-theoretic perspective, PromptSeg formulates the segmentation process as a conditional entropy minimization problem, utilizing visual prompts as side information to reduce the uncertainty of the target task. Guided by the information bottleneck principle, PromptSeg aims to utilize the provided visual prompts to filter out redundant noise and learn contextual representations, thereby breaking the restrictions of the task-specific paradigm. When faced with unseen datasets or segmentation targets, our method only requires a few annotated visual prompt pairs to extract task-specific semantics and segment the query images without retraining. Extensive experiments on CT and MRI datasets demonstrate that PromptSeg not only outperforms state-of-the-art methods but also exhibits strong multi-modality generalization capabilities.
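The conditional-entropy view described above can be made concrete with a short sketch: given prompt-conditioned logits, the quantity being driven down is the average per-pixel entropy of the predictions. The code below is illustrative only; how PromptSeg actually injects the visual prompts into the network is not reproduced here.

import torch
import torch.nn.functional as F

def prediction_entropy(logits):
    # Mean per-pixel entropy of a segmentation output of shape (B, C, H, W).
    # Lower values indicate that the prompt-conditioned predictions are less uncertain,
    # i.e., the side information has reduced the conditional entropy of the task.
    log_p = F.log_softmax(logits, dim=1)
    p = log_p.exp()
    return -(p * log_p).sum(dim=1).mean()

# Example: compare uncertainty with and without (hypothetical) prompt conditioning.
logits = torch.randn(1, 4, 64, 64)
print(prediction_entropy(logits))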
22 pages, 2166 KB  
Article
Sound-to-Image Translation Through Direct Cross-Modal Connection Using a Convolutional–Attention Generative Model
by Leonardo A. Fanzeres, Climent Nadeu and José A. R. Fonollosa
Appl. Sci. 2026, 16(6), 2942; https://doi.org/10.3390/app16062942 - 18 Mar 2026
Abstract
Sound plays a fundamental role in human perception, conveying information about events, objects, and spatial dynamics that may not be visually accessible. However, current technologies such as Acoustic Event Detection typically reduce complex soundscapes to textual labels, often failing to preserve their semantic richness. This limitation motivates the exploration of sound-to-image (S2I) translation as an alternative connection between audio and visual modalities. Unlike multimodal approaches guided by intermediary constraints during the learning process, we investigate S2I translation without class supervision, cluster-based alignment, or textual mediation, a paradigm we refer to as direct S2I translation. To the best of our knowledge, apart from our previous work, no prior study addresses S2I translation under this fully direct setting. We propose a convolutional–attention generative framework composed of an audio encoder and a densely connected GAN integrating self-attention and cross-attention mechanisms. The attention-based model is systematically compared with a purely convolutional baseline. Results show that introducing attention at early stages of the generator significantly improves translation performance, increasing the likelihood of producing interpretable and semantically coherent visual representations of sound. These findings indicate that attention strengthens semantic correspondence between audio and vision while preserving the fully direct nature of the translation process.
(This article belongs to the Section Computing and Artificial Intelligence)