Search Results (270)

Search Parameters:
Keywords = vision-based paradigm

41 pages, 3214 KB  
Review
The Intelligent Home: A Systematic Review of Technological Pillars, Emerging Paradigms, and Future Directions
by Khalil M. Abdelnaby, Mohammed A. F. Al-Husainy, Mohammad O. Alhawarat, Mohamed A. Rohaim, Khairy M. Assar and Khaled A. Elshafey
Symmetry 2026, 18(5), 718; https://doi.org/10.3390/sym18050718 - 24 Apr 2026
Abstract
Home automation is undergoing a paradigm shift from connected IoT environments with rule-based control to intelligent homes exhibiting ambient intelligence and proactive adaptation. Artificial intelligence, privacy-preserving sensing, and converging connectivity standards are the primary forces driving this transition. This systematic literature review synthesizes the technological foundations, architectural developments, emerging paradigms, and socio-technical challenges characterizing the next generation of smart homes, evaluated against the original Ambient Intelligence (AmI) vision. Following PRISMA 2020 guidelines, searches were conducted across four databases—IEEE Xplore, ACM Digital Library, Scopus, and Web of Science—covering studies published between January 2020 and June 2025. From 3450 records, 113 studies were selected through a two-reviewer screening procedure with inter-rater reliability assessments. Quality was assessed using a modified JBI Critical Appraisal Checklist, and findings were synthesized through thematic analysis. Three converging technological pillars were identified: multi-modal privacy-preserving sensing including mmWave radar; a hierarchical cloud-edge TinyML intelligence engine; and unified connectivity through the Matter/Thread standard. Emerging paradigms include LLM-based cognitive orchestration, hyper-personalization, Digital Twin simulation, and grid-interactive prosumer energy management. Realizing the intelligent home vision requires addressing the privacy–security–trust trilemma, algorithmic bias, system reliability, and human–agent collaboration; accordingly, a research roadmap encompassing explainable AI, privacy-by-design, lifelong learning, and standardized ethical auditing is proposed.
20 pages, 5677 KB  
Article
Robust Image Watermarking via Clustered Visual State-Space Modeling
by Bo Liu and Jianhua Ren
Appl. Sci. 2026, 16(9), 4166; https://doi.org/10.3390/app16094166 - 24 Apr 2026
Abstract
Most existing DNN-based image watermarking methods adopt an “encoder–noise–decoder” paradigm, where the watermark is typically replicated and expanded in a straightforward manner and then directly fused with image features, which limits robustness under complex distortions. Although Transformers improve fusion via attention mechanisms, their quadratic computational complexity makes high-resolution processing prohibitively expensive. To address these issues, we propose CCViM, a robust watermarking framework built on Vision Mamba, which leverages the linear-complexity property of state-space models (SSMs) to enable efficient global interactions. We design a Watermark Representation Learning Module (WRLM) that performs hierarchical feature extraction and structured expansion of the watermark through cascaded VSS blocks, yielding semantically rich and perturbation-resistant watermark representations. In addition, we introduce an Interwoven Fusion Enhancement Module (IFEM), which employs a CCS6 structure to treat the watermark as a dynamic guidance signal. By combining contextual clustering with the Mamba mechanism, IFEM deeply interweaves the watermark into host features at both local and global levels. Experiments on COCO, DIV2K, and ImageNet demonstrate that CCViM consistently improves imperceptibility, robustness, and efficiency to varying degrees, and maintains stable, high-quality performance under attacks such as JPEG compression, cropping, and Gaussian blur.
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision, 2nd Edition)
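
The baseline this paper departs from is easy to make concrete. Below is a minimal PyTorch sketch of the generic "encoder–noise–decoder" watermarking setup the abstract critiques, not the authors' CCViM model; the network sizes, the naive message replication, and the additive-noise stand-in for the distortion layer are all illustrative assumptions.

```python
# Generic encoder-noise-decoder watermarking sketch (NOT CCViM).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, msg_len=30):
        super().__init__()
        # Image plus naively replicated message -> watermarked image residual.
        self.net = nn.Sequential(
            nn.Conv2d(3 + msg_len, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )
    def forward(self, img, msg):
        b, _, h, w = img.shape
        # "Replicated and expanded in a straightforward manner":
        msg_map = msg[:, :, None, None].expand(b, msg.size(1), h, w)
        return img + self.net(torch.cat([img, msg_map], dim=1))

class Decoder(nn.Module):
    def __init__(self, msg_len=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, msg_len),
        )
    def forward(self, img):
        return self.net(img)

enc, dec = Encoder(), Decoder()
img = torch.rand(4, 3, 128, 128)
msg = torch.randint(0, 2, (4, 30)).float()
marked = enc(img, msg)
noised = marked + 0.05 * torch.randn_like(marked)  # stand-in distortion layer
loss = nn.functional.mse_loss(marked, img) + \
       nn.functional.binary_cross_entropy_with_logits(dec(noised), msg)
loss.backward()
```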

21 pages, 1958 KB  
Article
Adapter-Based Vision Transformer for Cross Domain Few-Shot Classification Using Prototypical Networks
by Sahar Gull and Juntae Kim
Appl. Sci. 2026, 16(8), 3994; https://doi.org/10.3390/app16083994 - 20 Apr 2026
Abstract
Cross-domain few-shot learning (CD-FSL) remains challenging in medical imaging, where labeled data are scarce and source–target domain gaps are often large due to modality differences. In particular, existing few-shot learning methods rely on source–target domain similarity, which limits their effectiveness in cross-modality settings such as MRI-to-CT transfer. To address this problem, this paper proposes an adapter-based Vision Transformer framework for cross-domain few-shot brain tumor classification. Lightweight adapter modules are inserted into a pretrained Vision Transformer to enable parameter-efficient domain adaptation without fine-tuning the entire backbone. In addition, a Prototypical Network is employed to construct class prototypes from limited labeled samples, while a prototype-level Maximum Mean Discrepancy (MMD) loss is introduced to align feature distributions across domains. Unlike prior approaches, the proposed framework introduces a unified prototype-level alignment strategy within an episodic learning paradigm, enabling direct class-wise cross-modal alignment. This design improves generalization under large modality gaps and limited labeled data by jointly optimizing representation learning and domain adaptation. The proposed framework is evaluated on MRI-to-CT brain tumor classification as well as several heterogeneous cross-domain benchmarks, including Chest X-ray, ISIC, CropDisease, and EuroSAT. Experimental results demonstrate that the proposed method achieves competitive performance compared to existing few-shot learning baselines, showing strong robustness under significant domain shifts.
(This article belongs to the Special Issue Artificial Intelligence Techniques for Medical Data Analytics)
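
For readers unfamiliar with the two components named here, the following PyTorch sketch shows class-prototype construction and a prototype-level discrepancy penalty. The linear-kernel MMD, feature dimension, and episode sizes are assumptions; the paper's adapter modules and ViT backbone are omitted.

```python
# Prototypical-network episode with a prototype-level MMD penalty (sketch).
import torch

def prototypes(feats, labels, n_way):
    # feats: (N, D) support embeddings; labels: (N,) in [0, n_way)
    return torch.stack([feats[labels == c].mean(0) for c in range(n_way)])

def proto_loss(query, q_labels, protos):
    # Negative squared Euclidean distance to prototypes as class logits.
    logits = -torch.cdist(query, protos) ** 2
    return torch.nn.functional.cross_entropy(logits, q_labels)

def mmd_linear(src, tgt):
    # Linear-kernel MMD between source/target prototype sets (assumption).
    d = src.mean(0) - tgt.mean(0)
    return d @ d

n_way, dim = 5, 384
src_feats, tgt_feats = torch.randn(25, dim), torch.randn(25, dim)
labels = torch.arange(n_way).repeat_interleave(5)
p_src = prototypes(src_feats, labels, n_way)
p_tgt = prototypes(tgt_feats, labels, n_way)
query, q_labels = torch.randn(15, dim), torch.randint(0, n_way, (15,))
loss = proto_loss(query, q_labels, p_tgt) + 0.1 * mmd_linear(p_src, p_tgt)
```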

17 pages, 5384 KB  
Review
Hyperspectral Sensing Enabled by Optics-Free Sensor Architectures
by Yicheng Wang, Xueyi Wang, Xintong Guo and Yining Mu
Nanomanufacturing 2026, 6(2), 8; https://doi.org/10.3390/nanomanufacturing6020008 - 20 Apr 2026
Abstract
Hyperspectral sensing allows for the capture of spatially resolved spectral data, a capability critical for applications spanning from remote sensing to biomedical diagnostics. Nevertheless, the widespread adoption of this technology is hindered by the bulk and complexity of traditional systems based on diffractive optics. To overcome these hurdles, substantial research efforts have been dedicated to system miniaturization via component scaling and computational imaging. This review outlines the technological progression of compact hyperspectral imaging, ranging from miniaturized dispersive elements and tunable filters to computational snapshot designs using optical multiplexing. Although these approaches decrease system volume, they generally treat the sensor as a passive intensity recorder requiring external encoding. Therefore, we focus here on the rising paradigm of sensor-level integration made possible by nanomanufacturing. We examine optics-free architectures where spectral discrimination is embedded directly into the pixel, distinguishing between pixel-level nanophotonic filtering and intrinsic material-based selectivity. We specifically highlight emerging platforms such as compositionally engineered and cavity-enhanced perovskites, as well as electrically tunable organic or two-dimensional (2D) material heterostructures. To conclude, this review discusses persistent challenges regarding fabrication uniformity and stability, providing an outlook on the future of scalable and fully integrated hyperspectral vision systems.
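
The computational half of the miniaturization story reduces, in its simplest form, to inverting a linear sensing model: per-pixel readings are broadband filter responses applied to the scene spectrum, and the spectrum is recovered numerically. A hedged NumPy sketch, with made-up filter responses and dimensions:

```python
# Toy spectral reconstruction for multiplexed snapshot sensing:
# y = A @ s, recovered by ridge-regularized least squares.
import numpy as np

n_filters, n_bands = 16, 64
rng = np.random.default_rng(0)
A = rng.uniform(size=(n_filters, n_bands))        # assumed filter response curves
s_true = np.exp(-0.5 * ((np.arange(n_bands) - 30) / 5.0) ** 2)  # test spectrum
y = A @ s_true + 0.01 * rng.standard_normal(n_filters)          # noisy readings

lam = 1e-2                                         # Tikhonov regularizer
s_hat = np.linalg.solve(A.T @ A + lam * np.eye(n_bands), A.T @ y)
print(f"reconstruction error: {np.linalg.norm(s_hat - s_true):.3f}")
```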

33 pages, 5543 KB  
Article
The New Frontier of Quality Evaluation for Visual Sensors: A Survey of Large Multimodal Model-Based Methods
by Qihang Ge, Xiongkuo Min, Sijing Wu, Yunhao Li and Guangtao Zhai
Sensors 2026, 26(8), 2530; https://doi.org/10.3390/s26082530 - 20 Apr 2026
Abstract
Visual quality assessment is entering a new frontier as media evolve from static images to temporally dynamic videos and 3D content. These visual signals are typically captured by sensing devices such as cameras and depth sensors, whose acquisition characteristics significantly influence perceptual quality. Traditional quality models, including distortion-centric and regression-based approaches, perform well on conventional degradations but struggle to evaluate higher-level attributes such as semantic plausibility and structural coherence in modern AI-generated and multimodal scenarios. The emergence of large multimodal models (LMMs), including vision–language models (VLMs) and multimodal large language models (MLLMs), reshapes the evaluation paradigm by enabling semantic grounding, instruction-driven assessment, and explainable reasoning. This survey presents a unified perspective on visual quality assessment for sensor-captured visual data across image, video, and 3D modalities. We review conventional deep learning approaches and recent LMM-based methods, highlighting how multimodal fusion and language-conditioned reasoning transform quality assessment from scalar prediction to perceptual intelligence. Finally, we discuss key challenges and future opportunities for building efficient, robust, and sensor-aware visual quality assessment systems.
(This article belongs to the Special Issue Perspectives in Intelligent Sensors and Sensing Systems)
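
The "scalar prediction" baseline that the survey contrasts with LMM-based evaluation can be stated in a few lines: a network regresses a single mean-opinion score from an image. A toy PyTorch sketch with an assumed architecture, not any specific published model:

```python
# Minimal regression-based no-reference quality model (scalar prediction).
import torch
import torch.nn as nn

iqa = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
)
img = torch.rand(8, 3, 224, 224)
mos = torch.rand(8, 1) * 5                  # ground-truth mean-opinion scores
loss = nn.functional.mse_loss(iqa(img), mos)
loss.backward()
```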

39 pages, 6816 KB  
Article
Automatic Calibration of Robotic 3D Printer Swarms for Cooperative 3D Printing
by Swaleh Owais, Charith Oshadi Nanayakkara Ratnayake, Ali Ugur, Zhenghui Sha and Wenchao Zhou
Machines 2026, 14(4), 443; https://doi.org/10.3390/machines14040443 - 16 Apr 2026
Abstract
Cooperative 3D printing (C3DP) is an additive manufacturing paradigm in which a swarm of robotic 3D printers works cooperatively in a shared environment to fabricate continuous parts. Reliable operation requires both accurate per-printer kinematic calibration and cross-printer spatial alignment. This paper presents an automatic vision-based XY calibration workflow for C3DP using ArUco fiducials and low-cost monocular cameras. The method performs intra-printer kinematic calibration and inter-printer alignment through peer-to-peer observations without fixed global infrastructure. In a two-printer Selective Compliance Assembly Robot Arm (SCARA) Fused Filament Fabrication (FFF) testbed, the automatic workflow reduced total calibration time from 157.19 min (manual) to 36.49 min while improving positional consistency and print accuracy. For individual-printer artifacts, the mean Euclidean error was 0.03 ± 0.02 mm, whereas cooperative artifacts exhibited a mean Euclidean error of 0.078 ± 0.002 mm. These results show that practical and repeatable C3DP calibration can be achieved with low-cost vision hardware.
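
The fiducial building block of such a workflow is standard enough to sketch with OpenCV's aruco module (4.7+ API): detect a marker and recover its pose from a monocular view. The intrinsics, marker size, and synthetic test image below are assumptions, not the paper's calibrated setup.

```python
# ArUco detection + single-marker pose recovery with a monocular camera.
import cv2
import numpy as np

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics
dist = np.zeros(5)                                           # assume no distortion
marker_len = 0.04                                            # assumed 40 mm marker
obj_pts = marker_len / 2 * np.array(
    [[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]], dtype=np.float32)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

# Synthetic fronto-parallel test view: a generated marker on a white border.
marker = cv2.aruco.generateImageMarker(dictionary, 0, 400)
gray = cv2.copyMakeBorder(marker, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=255)

corners, ids, _ = detector.detectMarkers(gray)
if ids is not None:
    for c, marker_id in zip(corners, ids.ravel()):
        ok, rvec, tvec = cv2.solvePnP(obj_pts, c.reshape(4, 2), K, dist)
        if ok:
            print(f"marker {marker_id}: camera-frame XY = {tvec[:2].ravel()} m")
```
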
28 pages, 3232 KB  
Article
Fisher-DARTS: A Neural Architecture Search Framework with Fisher Information Optimization
by Yu Zhang and Changyuan Wang
Appl. Sci. 2026, 16(8), 3808; https://doi.org/10.3390/app16083808 - 14 Apr 2026
Abstract
Differentiable Neural Architecture Search has emerged as a powerful paradigm for automated network design, yet it suffers from a fundamental optimization inconsistency problem: architectures optimized under continuous relaxation often fail to maintain their performance after discretization. To address this challenge, we propose Fisher-DARTS—a Fisher information-driven differentiable NAS framework. The proposed method introduces three key innovations: (1) a Fisher information-based momentum update mechanism that guides architectural parameters toward statistically significant operations, aligning the search objective with discrete deployment; (2) a progressive three-region pruning strategy that adaptively eliminates redundant operations with low Fisher information, ensuring architectural compactness; and (3) a cell-weighted fusion module that preserves multi-scale features across stacked cells. Additionally, the search space is expanded by incorporating attention mechanisms to enhance feature representation capability. The proposed framework is generic and applicable to a wide range of vision tasks. To validate its effectiveness, we apply it to gaze estimation—a core technology in multimodal human–computer interaction. Experimental results on three public datasets, MPIIFaceGaze, RT-GENE, and ETH-XGaze, demonstrate that Fisher-DARTS achieves mean angular errors of 3.22°, 5.45°, and 4.12°, respectively, outperforming hand-designed networks and existing NAS-based gaze estimation models. These results validate the effectiveness of the proposed Fisher-driven NAS framework and its generalization capability across diverse scenarios.
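
To make the Fisher-information idea concrete: a diagonal empirical Fisher score can be estimated as a running mean of squared gradients of the architecture weights, then used to prune low-information operations. The PyTorch sketch below illustrates the principle only; the stand-in loss, momentum coefficient, and pruning threshold are assumptions, not the paper's exact update.

```python
# Diagonal empirical Fisher estimate over architecture mixing weights.
import torch

alpha = torch.zeros(8, requires_grad=True)   # mixing weights for 8 candidate ops
fisher = torch.zeros(8)
beta = 0.9                                   # momentum for the Fisher estimate

for step in range(100):
    # Stand-in search loss over softmax-relaxed operation weights.
    loss = ((torch.softmax(alpha, 0) * torch.randn(8)).sum() - 1.0) ** 2
    grad, = torch.autograd.grad(loss, alpha)
    fisher = beta * fisher + (1 - beta) * grad ** 2  # running mean of grad^2

# Prune operations whose Fisher score falls in the lowest quartile.
keep = fisher >= fisher.quantile(0.25)
print("surviving ops:", keep.nonzero().ravel().tolist())
```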

38 pages, 5277 KB  
Review
Artificial Intelligence in Pulmonary Endoscopy: Current Evidence, Limitations, and Future Directions
by Sara Lopes, Miguel Mascarenhas, João Fonseca and Adelino F. Leite-Moreira
J. Imaging 2026, 12(4), 167; https://doi.org/10.3390/jimaging12040167 - 12 Apr 2026
Abstract
Background: Artificial intelligence (AI) is increasingly applied in pulmonary endoscopy, including diagnostic bronchoscopy, interventional pulmonology and endobronchial imaging. Advances in computer vision, machine learning and robotic systems have expanded the potential for automated lesion detection, navigation to peripheral pulmonary lesions, and real-time procedural support. However, the current evidence base remains heterogeneous, and translational challenges persist. Methods: This review summarizes current applications and developments of AI across white-light bronchoscopy (WLB), image-enhanced bronchoscopy (e.g., narrow-band imaging and autofluorescence imaging), endobronchial ultrasound (EBUS), virtual and robotic bronchoscopies, and workflow optimization and training. The authors also examine the methodological limitations, regulatory considerations, and implementation barriers that affect translation into routine practice. Results: Reported developments include deep learning-based models for mucosal abnormality detection, lymph-node characterization during EBUS-guided transbronchial needle aspiration (EBUS-TBNA), improved lesion localization, and reduction in operator-dependent variability. Additionally, AI-assisted simulation platforms and decision-support tools are reshaping training paradigms. Nevertheless, most studies remain retrospective or single-center, with limited external validation, dataset heterogeneity, unclear model explainability, and incomplete integration into clinical workflows. Conclusions: AI has the potential to support lesion detection, navigation, and training in pulmonary endoscopy. However, robust prospective validation, standardized datasets, transparent model reporting, robust data governance, multidisciplinary collaboration, and careful integration into clinical practice are required before widespread adoption.
(This article belongs to the Section AI in Imaging)

17 pages, 570 KB  
Perspective
Towards a Closed-Loop Bioengineering Framework for Immersive VR-Based Telerehabilitation Integrating Wearable Biosensing and Adaptive Feedback
by Gaia Roccaforte, Arianna Sinardi, Sofia Ruello, Carmela Lipari, Flavio Corpina, Antonio Epifanio, Anna Isgrò, Francesco Davide Russo, Alfio Puglisi, Giovanni Pioggia and Flavia Marino
Bioengineering 2026, 13(4), 439; https://doi.org/10.3390/bioengineering13040439 - 9 Apr 2026
Abstract
Telerehabilitation—the remote delivery of rehabilitation services—is undergoing a paradigm shift with the convergence of immersive virtual reality (VR) and wearable biosensor technologies. This perspective article outlines a vision for home-based motor and cognitive rehabilitation that is engaging, personalized, and data-driven. We describe how immersive VR environments (for example, simulations of home settings or supermarkets) coupled with wearable sensors can address current challenges in rehabilitation by increasing patient motivation, enabling real-time biofeedback, and supporting remote clinician supervision. Gamification mechanisms and rich sensory feedback in VR are highlighted as key strategies to enhance user engagement and adherence to therapy. We discuss conceptual innovations such as multi-sensor data integration, dynamic difficulty adaptation, and AI-driven personalization of exercises, derived from recent research and our development experience, and consider their potential benefits for patients with neuro-cognitive-motor impairments (e.g., stroke, Parkinson’s disease, and multiple sclerosis). Implementation scenarios for home-based therapy are presented, emphasizing scalability, standardized digital metrics for monitoring progress, and seamless involvement of clinicians via telehealth platforms. We also critically examine the current limitations of VR and telehealth rehabilitation and how an integrative model could overcome these barriers. More specifically, this perspective defines the engineering requirements of a closed-loop VR-based telerehabilitation framework, including multimodal data synchronization, calibration, signal-quality management, interpretable adaptive control, digital biomarker validation, and practical strategies to improve accessibility, privacy, and scalability in home-based neurological rehabilitation.
(This article belongs to the Special Issue Physical Therapy and Rehabilitation)
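
One concrete piece of such a closed loop, dynamic difficulty adaptation, can be as simple as a proportional controller on the recent success rate. A toy Python sketch with assumed gains and targets, not the framework's adaptive controller:

```python
# Proportional dynamic-difficulty adaptation on a sliding trial window.
from collections import deque

TARGET, GAIN = 0.75, 0.5          # desired success rate, adaptation gain
window = deque(maxlen=20)         # recent trial outcomes (1.0 = success)
difficulty = 1.0

def update_difficulty(trial_success: bool) -> float:
    """Nudge task difficulty toward the target success rate."""
    global difficulty
    window.append(1.0 if trial_success else 0.0)
    rate = sum(window) / len(window)
    difficulty = max(0.1, difficulty + GAIN * (rate - TARGET))
    return difficulty
```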

26 pages, 12156 KB  
Article
Precision Micro-Vibration Measurement for Linear Array Imaging via Complex Morlet Wavelet Phase Magnification
by Meiyi Zhu, Dezhi Zheng, Ying Zhang and Shuai Wang
Appl. Sci. 2026, 16(7), 3518; https://doi.org/10.3390/app16073518 - 3 Apr 2026
Abstract
Traditional vision-based vibration measurement is fundamentally constrained by the low sampling rates of area-scan cameras and the noise sensitivity of existing motion magnification algorithms. To overcome these spatiotemporal barriers, we propose a high-fidelity framework that integrates ultra-high-speed line-scan imaging with a 1D Complex Morlet Wavelet Phase-Based Video Magnification (CMW-PVM) algorithm. By extracting and manipulating the localized phase of 1D spatial signals, CMW-PVM effectively decouples structural dynamics from background noise while eliminating the computational redundancy associated with 2D spatial pyramid methods. Simulations demonstrate that CMW-PVM significantly extends the linear magnification range (up to α = 35) while preserving exceptional structural fidelity (FSIM > 0.87) under severe noise conditions (SNR = 10 dB). Experimental validation against a laser Doppler vibrometer (LDV) reveals near-perfect kinematic accuracy, with a relative amplitude error of only 1.65%. Furthermore, at a 100 Hz high-frequency excitation, the system successfully resolves microscopic displacements (≈10 μm) without temporal aliasing—enabled not by violating sampling theory but by leveraging the high physical line rate of the line-scan sensor. This establishes a robust, non-contact, and computationally efficient paradigm for broadband, micro-amplitude vibration monitoring in industrial environments.
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
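
The core mechanism, magnifying the temporal phase of complex-wavelet coefficients of 1D line signals, can be sketched at a single scale in NumPy. This shows the general phase-magnification principle, not the authors' implementation; the wavelet parameters and synthetic line sequence are assumptions.

```python
# Single-scale complex-Morlet phase magnification on a sequence of 1D lines.
import numpy as np

def cmorlet(n=64, s=8.0, w=5.0):
    # Complex Morlet: carrier exp(i*w*t/s) under a Gaussian envelope.
    t = np.arange(n) - n / 2
    return np.exp(1j * w * t / s) * np.exp(-t**2 / (2 * s**2))

alpha, psi = 20.0, cmorlet()
t_axis = np.arange(512)
# Synthetic line signals with ~0.05 px micro-shift per frame.
frames = [np.sin(2 * np.pi * (t_axis - 0.05 * k) / 64) for k in range(30)]

ref = np.convolve(frames[0], psi, mode="same")
magnified = []
for f in frames:
    r = np.convolve(f, psi, mode="same")
    dphi = np.angle(r) - np.angle(ref)          # local phase shift vs. reference
    # Amplify the phase deviation, keep the amplitude, take the real part.
    magnified.append(np.real(np.abs(r) * np.exp(1j * (np.angle(ref) + (1 + alpha) * dphi))))
```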

38 pages, 2287 KB  
Article
Universal Comparison Methodology for Hough Transform Approaches
by Danil Kazimirov, Vitalii Gulevskii, Alexey Kroshnin, Ekaterina Rybakova, Arseniy Terekhin, Elena Limonova and Dmitry Nikolaev
Mathematics 2026, 14(7), 1136; https://doi.org/10.3390/math14071136 - 28 Mar 2026
Abstract
The Hough transform (HT) is widely used in computer vision, tomography, and neural networks. Numerous algorithms for HT computation have been proposed, making their systematic comparison essential. However, existing comparative methodologies are either non-universal and limited to certain HT formulations or task-oriented, relying on application-specific criteria that do not fully capture algorithmic properties. This paper introduces a novel unified methodology for the systematic comparison of HT algorithms. It evaluates key characteristics, including computational complexity, accuracy, and auxiliary space complexity, while explicitly accounting for the property of self-adjointness. The methodology integrates both implementation-level and theoretical considerations related to the interpretation of HT as a discrete approximation of the Radon transform. A set of mathematically justified evaluation functions, not previously described in the literature, is proposed to support our methodology. Importantly, the methodology is universal, applicable across diverse HT paradigms, encompasses pattern-based and Fourier-based fast HT (FHT) algorithms, and offers a comprehensive alternative to existing task-specific methodologies. Its application to several state-of-the-art FHT algorithms (FHT2DT, FHT2SP, ASD2, KHM, and Fast Slant Stack) yields new experimentally confirmed theoretical insights, identifies ASD2 as the most balanced algorithm, and provides practical guidelines for algorithm selection. In particular, the methodology reveals that for image sizes up to 3000, the maximum normalized computational complexity increases as follows: FHT2DT (1.1), ASD2 (15.3), and KHM (30.6), while the remaining algorithms exhibit at least 1.1 times higher values. The maximum orthotropic approximation error equals 0.5 for ASD2, KHM, and Fast Slant Stack; lies between 0.5 and 1.5 for FHT2SP; and reaches 2.1 for FHT2DT. In terms of worst-case normalized auxiliary space complexity, the lowest values are achieved by FHT2DT (2.0), Fast Slant Stack (4.0, lower bound), and ASD2 (6.8), with all other algorithms requiring at least 8.2 times more memory.
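
As a grounding reference for what these fast algorithms approximate, here is the classical brute-force line Hough accumulator in NumPy; the discretization choices are assumptions.

```python
# Reference (slow) line Hough transform: vote (rho, theta) per edge pixel.
import numpy as np

def hough_lines(binary_img, n_theta=180):
    h, w = binary_img.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    ys, xs = np.nonzero(binary_img)
    for y, x in zip(ys, xs):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1   # one vote per theta bin
    return acc

img = np.zeros((64, 64), dtype=np.uint8)
img[np.arange(64), np.arange(64)] = 1               # test: a 45-degree diagonal
acc = hough_lines(img)
rho_idx, theta_idx = np.unravel_index(acc.argmax(), acc.shape)
print(f"peak at theta = {theta_idx} deg")           # ~135 deg for this line
```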

26 pages, 2135 KB  
Article
Mapping Research Trends in Road Safety: A Topic Modeling Perspective
by Iulius Alexandru Tudor and Florin Gîrbacia
Vehicles 2026, 8(4), 69; https://doi.org/10.3390/vehicles8040069 - 27 Mar 2026
Abstract
Over the past decade, road safety research has developed rapidly, driven by the expansion of large crash databases, the adoption of artificial intelligence techniques, and the demand for proactive and predictive safety solutions. This study conducts a data-driven review of recent research trends in transport safety. It focuses on main domains including crash severity analysis, human factors, vulnerable road users (VRUs), spatial modeling, and artificial intelligence applications. A systematic search of the Scopus database identified 15,599 relevant scientific papers published between 2016 and 2025. After constructing this corpus, titles, abstracts, and keywords were preprocessed using a natural language pipeline. The analysis employed BERTopic, a transformer-based topic modeling framework. The analysis identified 29 distinct research topics, further synthesized into five major thematic areas: (1) crash severity and injury analysis, (2) driver behavior and human factors, (3) vulnerable road users, (4) artificial intelligence, machine learning, and computer vision in intelligent transportation systems, and (5) spatial analysis and hotspot detection. A notable increase in publications related to artificial intelligence and machine learning has been evident since 2020. The results show a transition from descriptive, post-crash studies to integrated, multimodal, predictive analysis. Overall, the findings reveal a paradigm shift in the field. This study also identifies ethical and economic issues associated with the use of artificial intelligence in intelligent transportation systems, including data management, infrastructure requirements, system security, and model transparency. The results signify a transition from intuition-based models to explainable, spatially explicit, and data-intensive models, ultimately facilitating proactive risk assessment and informed decision-making.
(This article belongs to the Special Issue Intelligent Mobility and Sustainable Automotive Technologies)
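
BERTopic is an open-source library, so the pipeline the study describes has a compact API shape, sketched below. The toy document list only illustrates the calls; BERTopic's default UMAP/HDBSCAN stages need a corpus of at least a few hundred documents (the study used 15,599 preprocessed Scopus records), and min_topic_size is an assumption.

```python
# API-shape sketch of a BERTopic topic-modeling run (library: bertopic).
from bertopic import BERTopic

docs = [
    "crash severity and injury analysis using logistic regression",
    "pedestrian and cyclist safety at urban intersections",
    "computer vision for driver drowsiness detection",
    # ...preprocessed titles, abstracts, and keywords would go here
]

topic_model = BERTopic(min_topic_size=10)      # cluster-size floor (assumption)
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info())            # one row per discovered topic
```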

23 pages, 51743 KB  
Article
Debiased Multiplex Tokenization Using Mamba-Based Pointers for Efficient and Versatile Map-Free Visual Relocalization
by Wenshuai Wang, Hong Liu, Shengquan Li, Peifeng Jiang, Dandan Che and Runwei Ding
Mach. Learn. Knowl. Extr. 2026, 8(3), 83; https://doi.org/10.3390/make8030083 - 23 Mar 2026
Abstract
Visual localization plays a critical role in enabling mobile robots to estimate their position and orientation in GPS-denied environments. However, its efficiency, robustness, and generalization are fundamentally undermined by severe viewpoint changes and dramatic appearance variations, which present persistent challenges for image-based feature representation and pose estimation under real-world conditions. Recently, map-free visual relocalization (MFVR) has emerged as a promising paradigm for lightweight deployment and privacy isolation on edge devices, yet learning compact, invariant image tokens without relying on structural 3D maps remains a core problem, particularly in highly dynamic or long-term scenarios. In this paper, we propose the Debiased Multiplex Tokenizer (termed DMT-Loc), a novel method for efficient and versatile MFVR that addresses these issues. Specifically, DMT-Loc is built upon a pretrained vision Mamba encoder and integrates three key modules for relative pose regression: First, Multiplex Interactive Tokenization yields robust image tokens with non-local affinities and cross-domain descriptions. Second, Debiased Anchor Registration facilitates anchor token matching through proximity graph retrieval and autoregressive pointer attribution. Third, Geometry-Informed Pose Regression empowers multi-layer perceptrons with a symmetric swap gating mechanism operating inside each decoupled regression head to support accurate and flexible pose prediction in both pair-wise and multi-view modes. Extensive evaluations across seven public datasets demonstrate that DMT-Loc substantially outperforms existing baselines and ablation variants in diverse indoor and outdoor environments.
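
The output stage common to relative-pose-regression methods of this kind is straightforward to sketch: an MLP maps a pair of image embeddings to a translation and a unit quaternion. The dimensions and single-head design below are assumptions, not the paper's decoupled gated heads.

```python
# Generic relative-pose-regression head: (embedding pair) -> (t, q).
import torch
import torch.nn as nn

class RelPoseHead(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 512), nn.ReLU(),
                                 nn.Linear(512, 7))   # 3 translation + 4 quaternion
    def forward(self, tok_a, tok_b):
        out = self.mlp(torch.cat([tok_a, tok_b], dim=-1))
        t, q = out[..., :3], out[..., 3:]
        return t, q / q.norm(dim=-1, keepdim=True)    # normalize the quaternion

head = RelPoseHead()
t, q = head(torch.randn(2, 256), torch.randn(2, 256))
```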

22 pages, 2166 KB  
Article
Sound-to-Image Translation Through Direct Cross-Modal Connection Using a Convolutional–Attention Generative Model
by Leonardo A. Fanzeres, Climent Nadeu and José A. R. Fonollosa
Appl. Sci. 2026, 16(6), 2942; https://doi.org/10.3390/app16062942 - 18 Mar 2026
Abstract
Sound plays a fundamental role in human perception, conveying information about events, objects, and spatial dynamics that may not be visually accessible. However, current technologies such as Acoustic Event Detection typically reduce complex soundscapes to textual labels, often failing to preserve their semantic richness. This limitation motivates the exploration of sound-to-image (S2I) translation as an alternative connection between audio and visual modalities. Unlike multimodal approaches guided by intermediary constraints during the learning process, we investigate S2I translation without class supervision, cluster-based alignment, or textual mediation, a paradigm we refer to as direct S2I translation. To the best of our knowledge, apart from our previous work, no prior study addresses S2I translation under this fully direct setting. We propose a convolutional–attention generative framework composed of an audio encoder and a densely connected GAN integrating self-attention and cross-attention mechanisms. The attention-based model is systematically compared with a purely convolutional baseline. Results show that introducing attention at early stages of the generator significantly improves translation performance, increasing the likelihood of producing interpretable and semantically coherent visual representations of sound. These findings indicate that attention strengthens semantic correspondence between audio and vision while preserving the fully direct nature of the translation process.
(This article belongs to the Section Computing and Artificial Intelligence)
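
The kind of generator-side self-attention the abstract credits can be sketched as a SAGAN-style block in PyTorch; the channel sizes and gating are assumptions, not the authors' exact architecture.

```python
# SAGAN-style self-attention block for a convolutional generator (sketch).
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight
    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)    # (B, HW, C/8)
        k = self.k(x).flatten(2)                    # (B, C/8, HW)
        attn = torch.softmax(q @ k, dim=-1)         # (B, HW, HW) pairwise weights
        v = self.v(x).flatten(2)                    # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                 # gated residual connection

feat = torch.randn(1, 64, 16, 16)
print(SelfAttention2d(64)(feat).shape)              # torch.Size([1, 64, 16, 16])
```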

20 pages, 48606 KB  
Article
GMUD-Net: Global Modulated Unbalanced Dual-Branch Network for Image Restoration in Various Degraded Environments
by Shengchun Wang, Yingjie Liu and Huijie Zhu
Appl. Sci. 2026, 16(6), 2854; https://doi.org/10.3390/app16062854 - 16 Mar 2026
Abstract
Image restoration has wide applications in the field of computer vision, yet existing methods suffer from limitations. CNNs struggle to capture long-range dependencies, while transformers handle local details poorly and incur high computational complexity. Additionally, existing dual-branch networks fail to define a clear dominant–auxiliary role between branches, leading to redundancy and high computational costs. This paper proposes a Global Modulated Unbalanced Dual-Branch Network (GMUD-Net), which innovatively adopts an unbalanced structure with a CNN as the main branch and a transformer as the auxiliary branch. Specifically, the CNN branch achieves strong restoration capability by integrating the global–local hybrid backbone block (GLBB) and the frequency-based global attention module (FGAM). As the key building block in the CNN branch, GLBB integrates a local backbone branch, a global Fourier branch, and a residual branch to fuse local details with global context. Meanwhile, FGAM leverages the fast Fourier transform at the bottleneck to enhance cross-channel interaction and improve global restoration performance. In addition, the lightweight transformer branch employs efficient cross-channel attention to provide complementary global cues, which are filtered and injected into the CNN branch via the global attention guidance block (GAG). These designs integrate the advantages of both CNNs and transformers while significantly reducing computational burden, offering a new paradigm to address the limitations of traditional dual-branch architectures. Experimental results demonstrate that compared with existing algorithms, the proposed method achieves state-of-the-art or highly competitive performance in both quantitative evaluations and qualitative results across nine datasets.
(This article belongs to the Special Issue AI-Driven Image and Signal Processing)
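
The FFT-based global idea can be illustrated with a frequency-domain block in the spirit of FGAM: transform the feature map, apply a learnable per-frequency filter, and invert, giving an image-wide receptive field at FFT cost. The shapes and elementwise filter below are assumptions, not the authors' exact module.

```python
# Frequency-domain global mixing block (GFNet-style sketch).
import torch
import torch.nn as nn

class FourierGlobalBlock(nn.Module):
    def __init__(self, ch, h, w):
        super().__init__()
        # Complex-valued per-frequency filter stored as (real, imag) pairs.
        self.weight = nn.Parameter(torch.randn(ch, h, w // 2 + 1, 2) * 0.02)
    def forward(self, x):                           # x: (B, C, H, W)
        spec = torch.fft.rfft2(x, norm="ortho")     # (B, C, H, W//2+1) complex
        spec = spec * torch.view_as_complex(self.weight)
        return x + torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

x = torch.randn(2, 32, 64, 64)
print(FourierGlobalBlock(32, 64, 64)(x).shape)      # torch.Size([2, 32, 64, 64])
```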
