Search Results (1,251)

Search Parameters:
Keywords = vision interaction

25 pages, 3616 KB  
Article
A Deep Learning-Driven Semantic Mapping Strategy for Robotic Inspection of Desalination Facilities
by Albandari Alotaibi, Reem Alrashidi, Hanan Alatawi, Lamaa Duwayriat, Aseel Binnouh, Tareq Alhmiedat and Ahmad Al-Qerem
Machines 2025, 13(12), 1129; https://doi.org/10.3390/machines13121129 - 8 Dec 2025
Viewed by 100
Abstract
Autonomous robot navigation has become essential for reducing labor-intensive tasks. Current navigation systems for these robots rely on sensed geometrical structures of the environment, using an array of sensor units such as laser scanners, range-finders, and light detection and ranging (LiDAR) to obtain the environment layout. Scene understanding is an important task in the development of robots that need to act autonomously. Hence, this paper presents an efficient semantic mapping system that integrates LiDAR, RGB-D, and odometry data to generate precise and information-rich maps. The proposed system enables the automatic detection and labeling of critical infrastructure components, while preserving high spatial accuracy. As a case study, the system was applied to a desalination plant, where it interactively labeled key entities by integrating Simultaneous Localization and Mapping (SLAM) with vision-based techniques in order to determine the location of installed pipes. The developed system was validated using the Robot Operating System (ROS) development environment and a two-wheel-drive robot platform. Several simulations and real-world experiments were conducted to evaluate the efficiency of the developed semantic mapping system. The obtained results are promising, as the developed semantic map generation system achieves an average object detection accuracy of 84.97% and an average localization error of 1.79 m. Full article
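
The abstract above describes attaching semantic labels (e.g., pipes) detected by a vision pipeline to positions in the SLAM-built map. As a rough illustration of that idea only, not the authors' ROS implementation, the sketch below projects a labeled detection, given as a range and bearing in the robot frame, into map coordinates using the robot's 2D pose; all class and field names are hypothetical.

```python
# Minimal sketch (not the authors' code): attaching a semantic label to a
# SLAM map coordinate, given the robot's 2D pose and a detection expressed
# as range/bearing in the robot frame. All names are illustrative.
import math
from dataclasses import dataclass, field

@dataclass
class Pose2D:
    x: float      # metres, map frame
    y: float      # metres, map frame
    theta: float  # radians, heading in map frame

@dataclass
class SemanticMap:
    entries: list = field(default_factory=list)

    def add_detection(self, pose: Pose2D, label: str, rng: float, bearing: float):
        """Project a (range, bearing) detection into the map frame and store it."""
        mx = pose.x + rng * math.cos(pose.theta + bearing)
        my = pose.y + rng * math.sin(pose.theta + bearing)
        self.entries.append({"label": label, "x": mx, "y": my})

# Example: a pipe detected 2.5 m ahead and slightly to the left of the robot.
smap = SemanticMap()
smap.add_detection(Pose2D(x=4.0, y=1.5, theta=math.pi / 2), "pipe", rng=2.5, bearing=0.1)
print(smap.entries)
```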

28 pages, 29492 KB  
Article
RSAM: Vision-Language Two-Way Guidance for Referring Remote Sensing Image Segmentation
by Zilong Zhao, Xin Xu, Bingxin Huang, Hongjia Chen and Fangling Pu
Remote Sens. 2025, 17(24), 3960; https://doi.org/10.3390/rs17243960 - 8 Dec 2025
Viewed by 142
Abstract
Referring remote sensing image segmentation (RRSIS) aims to accurately segment target objects in remote sensing images based on natural language instructions. Despite its growing relevance, progress in this field is constrained by limited datasets and weak cross-modal alignment. To support RRSIS research, we construct referring image segmentation in optical remote sensing (RISORS), a large-scale benchmark containing 36,697 instruction–mask pairs. RISORS provides diverse and high-quality samples that enable comprehensive experiments in remote sensing contexts. Building on this foundation, we propose Referring-SAM (RSAM), a novel framework that extends Segment Anything Model 2 to support text-prompted segmentation. RSAM integrates a Two-Way Guidance Module (TWGM) and a Multimodal Mask Decoder (MMMD). TWGM facilitates a two-way guidance mechanism that mutually refines image and text features, with positional encodings incorporated across all attention layers to significantly enhance relational reasoning. MMMD effectively separates textual prompts from spatial prompts, improving segmentation accuracy in complex multimodal settings. Extensive experiments on RISORS, as well as on the RefSegRS and RRSIS-D datasets, demonstrate that RSAM achieves state-of-the-art performance, particularly in segmenting small and diverse targets. Ablation studies further validate the individual contributions of TWGM and MMMD. This work provides a solid foundation for further developments in integrated vision-language analysis within remote sensing applications. Full article
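
To make the "two-way guidance" idea concrete, here is a hedged PyTorch sketch of one bidirectional cross-attention step in which image and text features refine each other and positional encodings are added at the attention layer; the module name, dimensions, and layout are assumptions, not the TWGM described in the paper.

```python
# Hedged sketch of a two-way (bidirectional) cross-attention step, in the spirit
# of the described Two-Way Guidance Module; this is not the paper's implementation.
import torch
import torch.nn as nn

class TwoWayGuidance(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.img_from_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_txt = nn.LayerNorm(dim)

    def forward(self, img, txt, img_pos, txt_pos):
        # Positional encodings are added to queries/keys at every attention call.
        q_img, k_txt = img + img_pos, txt + txt_pos
        img = self.norm_img(img + self.img_from_txt(q_img, k_txt, txt)[0])
        q_txt, k_img = txt + txt_pos, img + img_pos
        txt = self.norm_txt(txt + self.txt_from_img(q_txt, k_img, img)[0])
        return img, txt

# Toy shapes: 1 sample, 196 image tokens and 20 text tokens of width 256.
img, txt = torch.randn(1, 196, 256), torch.randn(1, 20, 256)
img_pos, txt_pos = torch.randn(1, 196, 256), torch.randn(1, 20, 256)
img, txt = TwoWayGuidance(256)(img, txt, img_pos, txt_pos)
```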

48 pages, 11913 KB  
Article
A Symbiotic Digital Environment Framework for Industry 4.0 and 5.0: Enhancing Lifecycle Circularity
by Pedro Ponce, Javier Maldonado-Romo, Brian W. Anthony, Russel Bradley and Luis Montesinos
Eng 2025, 6(12), 355; https://doi.org/10.3390/eng6120355 - 6 Dec 2025
Viewed by 249
Abstract
This paper introduces a Symbiotic Digital Environment Framework (SDEF) that integrates Human Digital Twins (HDTs) and Machine Digital Twins (MDTs) to advance lifecycle circularity across all stages of the CADMID model (i.e., Concept, Assessment, Design, Manufacture, In-Service, and Disposal). Unlike existing frameworks that address either digital twins or sustainability in isolation, SDEF establishes a bidirectional adaptive system where human, machine, and environmental digital entities continuously interact to co-optimize performance, resource efficiency, and well-being. The framework’s novelty lies in unifying human-centric adaptability (via HDTs) with circular economy principles to enable real-time symbiosis between industrial processes and their operators. Predictive analytics, immersive simulation, and continuous feedback loops dynamically adjust production parameters based on operator states and environmental conditions, extending asset lifespan while minimizing waste. Two simulation-based scenarios in VR using synthetic data demonstrate the framework’s capacity to integrate circularity metrics (material throughput, energy efficiency, remanufacturability index) with human-machine interaction variables in virtual manufacturing environments. SDEF bridges Industry 4.0’s automation capabilities and Industry 5.0’s human-centric vision, offering a scalable pathway toward sustainable and resilient industrial ecosystems by closing the loop between physical and digital realms. Full article
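
As a purely illustrative reading of the "continuous feedback loops" mentioned above, and not part of the SDEF itself, the sketch below adapts a production parameter from hypothetical operator-state and environment readings; the thresholds, field names, and adjustment rule are invented for the example.

```python
# Illustrative-only sketch of the kind of feedback loop the framework describes:
# production parameters adapted from operator-state and environment readings.
# Thresholds, field names, and the adjustment rule are all hypothetical.
from dataclasses import dataclass

@dataclass
class OperatorState:
    fatigue: float          # 0.0 (rested) .. 1.0 (exhausted), e.g. from an HDT

@dataclass
class EnvironmentState:
    ambient_temp_c: float   # degrees Celsius, e.g. from an MDT sensor feed

def adjust_line_speed(base_speed: float, op: OperatorState, env: EnvironmentState) -> float:
    """Scale line speed down when the operator is fatigued or the shop floor is hot."""
    speed = base_speed
    if op.fatigue > 0.7:
        speed *= 0.8                      # relieve a tired operator
    if env.ambient_temp_c > 30.0:
        speed *= 0.9                      # protect equipment and people in heat
    return speed

print(adjust_line_speed(100.0, OperatorState(fatigue=0.8), EnvironmentState(ambient_temp_c=32.0)))
# -> 72.0 units/hour when both stress conditions apply
```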

17 pages, 3220 KB  
Article
ArecaNet: Robust Facial Emotion Recognition via Assembled Residual Enhanced Cross-Attention Networks for Emotion-Aware Human–Computer Interaction
by Jaemyung Kim and Gyuho Choi
Sensors 2025, 25(23), 7375; https://doi.org/10.3390/s25237375 - 4 Dec 2025
Viewed by 260
Abstract
Recently, the convergence of advanced sensor technologies and innovations in artificial intelligence and robotics has highlighted facial emotion recognition (FER) as an essential component of human–computer interaction (HCI). Traditional FER studies based on handcrafted features and shallow machine learning have shown limited performance, while convolutional neural networks (CNNs) have improved nonlinear emotion pattern analysis but have been constrained by local feature extraction. Vision transformers (ViTs) have addressed this by leveraging global correlations, yet both CNN- and ViT-based single networks often suffer from overfitting, single-network dependency, and information loss in ensemble operations. To overcome these limitations, we propose ArecaNet, an assembled residual enhanced cross-attention network that integrates multiple feature streams without information loss. The framework comprises (i) channel and spatial feature extraction via SCSESResNet, (ii) landmark feature extraction from specialized sub-networks, (iii) iterative fusion through residual enhanced cross-attention, and (iv) final emotion classification from the fused representation. Our research introduces a novel approach by integrating pre-trained sub-networks specialized in facial recognition with an attention mechanism and our uniquely designed main network, which is optimized for size reduction and efficient feature extraction. The extracted features are fused through an iterative residual enhanced cross-attention mechanism, which minimizes information loss and preserves complementary representations across networks. This strategy overcomes the limitations of conventional ensemble methods, enabling seamless feature integration and robust recognition. The experimental results show that the proposed ArecaNet achieved accuracies of 97.0% and 97.8% on the public FER-2013 and RAF-DB databases, outperforming the existing state-of-the-art method, PAtt-Lite, by 4.5% on FER-2013 and 2.75% on RAF-DB, and establishing a new state-of-the-art accuracy for each database. Full article
(This article belongs to the Special Issue Sensor-Based Behavioral Biometrics)
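
The "iterative residual enhanced cross-attention" fusion can be pictured with a short PyTorch sketch: a main feature stream repeatedly attends to a landmark stream, with residual connections preserving the original features. This is an assumption-laden illustration, not the ArecaNet architecture; the class name, dimensions, and iteration count are placeholders.

```python
# Hedged sketch of iterative residual cross-attention fusion between a main
# feature stream and a landmark feature stream; not the ArecaNet implementation.
import torch
import torch.nn as nn

class ResidualCrossAttentionFusion(nn.Module):
    def __init__(self, dim: int, heads: int = 4, iterations: int = 3):
        super().__init__()
        self.iterations = iterations
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, main_feats, landmark_feats):
        fused = main_feats
        for _ in range(self.iterations):
            # The residual connection keeps the original stream, so information
            # from earlier iterations is never discarded during fusion.
            attended, _ = self.attn(fused, landmark_feats, landmark_feats)
            fused = self.norm(fused + attended)
        return fused

main = torch.randn(2, 49, 128)       # e.g. pooled spatial tokens from the main network
landmarks = torch.randn(2, 68, 128)  # e.g. embedded facial-landmark features
out = ResidualCrossAttentionFusion(128)(main, landmarks)
print(out.shape)  # torch.Size([2, 49, 128])
```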

18 pages, 445 KB  
Article
Exploring the Coordination of Cancer Care for Teenagers and Young Adults in England and Wales: BRIGHTLIGHT_2021 Rapid Qualitative Study
by Elysse Bautista-Gonzalez, Rachel M. Taylor, Lorna A. Fern, Julie A. Barber, Jamie Cargill, Rozalia Dobrogowska, Richard G. Feltbower, Laura Haddad, Nicolas Hall, Maria Lawal, Martin G. McCabe, Sophie Moniz, Louise Soanes, Dan P. Stark and Cecilia Vindrola-Padros
Cancers 2025, 17(23), 3874; https://doi.org/10.3390/cancers17233874 - 3 Dec 2025
Viewed by 215
Abstract
Background: Commissioning of ‘joint care’ across teenage and young adult (TYA) principal treatment centres (PTC) and regional designated hospitals was introduced to enable cancer care closer to home, while providing support through the TYA multidisciplinary team. We aimed to explore the processes being used to enable inter-organisational collaboration under joint care models through rapid ethnography. Methods: Healthcare professionals in TYA PTCs in England and Wales between June 2022 and December 2023 were identified by the TYA lead in each PTC as delivering TYA cancer care. Semi-structured interviews were conducted virtually or by telephone based on the structuration model of collaboration proposed by D’Amour. Data were analysed against the model through framework analysis. Results: Our study highlighted variation across the different dimensions of inter-organisational collaboration. We found that healthcare professionals delivering TYA cancer care were working toward a shared goal but this was not always achieved. Social interaction between professionals was required to develop relationships and trust, but opportunities for social interaction were not regularly available. Processes for sharing information were not streamlined, so there were instances when information could not be shared between organisations. Interventions to achieve coordinated care, such as an outreach team, supported the delivery of joint care but these were not available in every region. While there were some levels of leadership within aspects of services, there were limited examples nationally or across geographical regions, which hindered the development of coordinated care. Conclusions: Coordination of care is mostly developing; however, the shared vision and goals dimension did achieve full active collaboration. The implementation of a service specification will address regional leadership requirements, but resources are required to extend the delivery of interventions to support coordination and collaboration, allowing the commissioned model of care to be delivered safely. Full article
(This article belongs to the Special Issue New Developments in Adolescent and Young Adult Oncology)

44 pages, 10088 KB  
Article
NAIA: A Robust Artificial Intelligence Framework for Multi-Role Virtual Academic Assistance
by Adrián F. Pabón M., Kenneth J. Barrios Q., Samuel D. Solano C. and Christian G. Quintero M.
Systems 2025, 13(12), 1091; https://doi.org/10.3390/systems13121091 - 3 Dec 2025
Viewed by 308
Abstract
Virtual assistants in academic environments often lack comprehensive multimodal integration and specialized role-based architecture. This paper presents NAIA (Nimble Artificial Intelligence Assistant), a robust artificial intelligence framework designed for multi-role virtual academic assistance through a modular monolithic approach. The system integrates Large Language Models (LLMs), Computer Vision, voice processing, and animated digital avatars within five specialized roles: researcher, receptionist, personal skills trainer, personal assistant, and university guide. NAIA’s architecture implements simultaneous voice, vision, and text processing through a three-model LLM system for optimized response quality, Redis-based conversation state management for context-aware interactions, and strategic third-party service integration with OpenAI, Backblaze B2, and SerpAPI. The framework seamlessly connects with the institutional ecosystem through Microsoft Graph API integration, while the frontend delivers immersive experiences via 3D avatar rendering using Ready Player Me and Mixamo. System effectiveness is evaluated through a comprehensive mixed-methods approach involving 30 participants from Universidad del Norte, employing Technology Acceptance Model (TAM2/TAM3) constructs and System Usability Scale (SUS) assessments. Results demonstrate strong user acceptance: 93.3% consider NAIA useful overall, 93.3% find it easy to use and learn, 100% intend to continue using and recommend it, and 90% report confident independent operation. Qualitative analysis reveals high satisfaction with role specialization, intuitive interface design, and institutional integration. The comparative analysis positions NAIA’s distinctive contributions through its synthesis of institutional knowledge integration with enhanced multimodal capabilities and specialized role architecture, establishing a comprehensive framework for intelligent human-AI interaction in modern educational environments. Full article
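
For the Redis-based conversation state management mentioned in the abstract, a minimal redis-py sketch follows; it assumes a local Redis server, and the key schema, turn format, and expiry policy are illustrative guesses rather than NAIA's actual design.

```python
# Minimal sketch (assumed schema, not NAIA's actual one) of Redis-backed
# conversation state: each session keeps a capped list of role-tagged turns.
import json
import redis  # pip install redis; assumes a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def append_turn(session_id: str, role: str, text: str, max_turns: int = 20) -> None:
    key = f"conversation:{session_id}"
    r.rpush(key, json.dumps({"role": role, "text": text}))
    r.ltrim(key, -max_turns, -1)     # keep only the most recent turns
    r.expire(key, 3600)              # drop idle sessions after an hour

def load_history(session_id: str) -> list:
    return [json.loads(item) for item in r.lrange(f"conversation:{session_id}", 0, -1)]

append_turn("demo-session", "user", "What are the library's opening hours?")
append_turn("demo-session", "assistant", "The library opens at 7 a.m. on weekdays.")
print(load_history("demo-session"))
```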

28 pages, 2792 KB  
Article
Multimodal Deep Learning Framework for Automated Usability Evaluation of Fashion E-Commerce Sites
by Nahed Alowidi
J. Theor. Appl. Electron. Commer. Res. 2025, 20(4), 343; https://doi.org/10.3390/jtaer20040343 - 3 Dec 2025
Viewed by 328
Abstract
Effective website usability assessment is crucial for improving user experience, driving customer satisfaction, and ensuring business success, particularly in the competitive e-commerce sector. Traditional methods, such as expert reviews and user testing, are resource-intensive and often fail to fully capture the complex interplay between a site’s aesthetic design and its technical performance. This paper introduces an end-to-end multimodal deep learning framework that automates the usability assessment of fashion e-commerce websites. The framework fuses structured numerical indicators (e.g., load time, mobile compatibility) with high-level visual features extracted from full-page screenshots. The proposed framework employs a comprehensive set of visual backbones—including modern architectures such as ConvNeXt and Vision Transformers (ViT, Swin) alongside established CNNs—and systematically evaluates three fusion strategies: early fusion, late fusion, and a state-of-the-art cross-modal fusion strategy that enables deep, bidirectional interactions between modalities. Extensive experiments demonstrate that the cross-modal fusion approach, particularly when paired with a ConvNeXt backbone, achieves superior performance with a 0.92 accuracy and 0.89 F1-score, outperforming both unimodal and simpler fusion baselines. Model interpretability is provided through SHAP and LIME, confirming that the predictions align with established usability principles and generate actionable insights. Although validated on fashion e-commerce sites, the framework is highly adaptable to other domains—such as e-learning and e-government—via domain-specific data and light fine-tuning. It provides a robust, explainable benchmark for data-driven, multimodal website usability assessment and paves the way for more intelligent, automated user-experience optimization. Full article
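
To clarify the fusion strategies being compared, the following PyTorch sketch contrasts a simple early-fusion head (concatenate both modalities, then classify) with a late-fusion head (score each modality, then average); the feature dimensions and layer sizes are placeholders, and this is not the paper's ConvNeXt/ViT cross-modal model.

```python
# Hedged sketch contrasting early and late fusion of tabular usability metrics
# with a visual embedding; backbones and dimensions are placeholders.
import torch
import torch.nn as nn

class EarlyFusionHead(nn.Module):
    """Concatenate both modalities first, then classify."""
    def __init__(self, vis_dim=512, tab_dim=8, n_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(vis_dim + tab_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))

    def forward(self, vis, tab):
        return self.mlp(torch.cat([vis, tab], dim=-1))

class LateFusionHead(nn.Module):
    """Score each modality separately, then average the logits."""
    def __init__(self, vis_dim=512, tab_dim=8, n_classes=2):
        super().__init__()
        self.vis_head = nn.Linear(vis_dim, n_classes)
        self.tab_head = nn.Linear(tab_dim, n_classes)

    def forward(self, vis, tab):
        return 0.5 * (self.vis_head(vis) + self.tab_head(tab))

vis = torch.randn(4, 512)   # screenshot embedding from a visual backbone
tab = torch.randn(4, 8)     # load time, mobile compatibility, and similar indicators
print(EarlyFusionHead()(vis, tab).shape, LateFusionHead()(vis, tab).shape)
```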

38 pages, 3741 KB  
Article
Hybrid Convolutional Vision Transformer for Robust Low-Channel sEMG Hand Gesture Recognition: A Comparative Study with CNNs
by Ruthber Rodriguez Serrezuela, Roberto Sagaro Zamora, Daily Milanes Hermosilla, Andres Eduardo Rivera Gomez and Enrique Marañon Reyes
Biomimetics 2025, 10(12), 806; https://doi.org/10.3390/biomimetics10120806 - 3 Dec 2025
Viewed by 329
Abstract
Hand gesture classification using surface electromyography (sEMG) is fundamental for prosthetic control and human–machine interaction. However, most existing studies focus on high-density recordings or large gesture sets, leaving limited evidence on performance in low-channel, reduced-gesture configurations. This study addresses this gap by comparing a classical convolutional neural network (CNN), inspired by Atzori’s design, with a Convolutional Vision Transformer (CViT) tailored for compact sEMG systems. Two datasets were evaluated: a proprietary Myo-based collection (10 subjects, 8 channels, six gestures) and a subset of NinaPro DB3 (11 transradial amputees, 12 channels, same gestures). Both models were trained using standardized preprocessing, segmentation, and balanced windowing procedures. Results show that the CNN performs robustly on homogeneous signals (Myo: 94.2% accuracy) but exhibits increased variability in amputee recordings (NinaPro: 92.0%). In contrast, the CViT consistently matches or surpasses the CNN, reaching 96.6% accuracy on Myo and 94.2% on NinaPro. Statistical analyses confirm significant differences in the Myo dataset. The objective of this work is to determine whether hybrid CNN–ViT architectures provide superior robustness and generalization under low-channel sEMG conditions. Rather than proposing a new architecture, this study delivers the first systematic benchmark of CNN and CViT models across amputee and non-amputee subjects using short windows, heterogeneous signals, and identical protocols, highlighting their suitability for compact prosthetic–control systems. Full article
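
Both models are trained on short, balanced windows cut from multi-channel sEMG; a minimal NumPy sketch of sliding-window segmentation follows, with the 200 ms window and 50% overlap chosen only for illustration rather than taken from the paper's protocol.

```python
# Minimal sketch of sliding-window segmentation for multi-channel sEMG; the
# 200 ms window and 50% overlap are illustrative, not the paper's exact settings.
import numpy as np

def segment_windows(emg: np.ndarray, fs: int, win_ms: int = 200, overlap: float = 0.5):
    """emg: (n_samples, n_channels) -> (n_windows, win_len, n_channels)."""
    win_len = int(fs * win_ms / 1000)
    step = max(1, int(win_len * (1.0 - overlap)))
    starts = range(0, emg.shape[0] - win_len + 1, step)
    return np.stack([emg[s:s + win_len] for s in starts])

emg = np.random.randn(2000, 8)          # 10 s of 8-channel Myo data at 200 Hz
windows = segment_windows(emg, fs=200)
print(windows.shape)                    # (99, 40, 8)
```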

21 pages, 3387 KB  
Article
Development of an Autonomous and Interactive Robot Guide for Industrial Museum Environments Using IoT and AI Technologies
by Andrés Arteaga-Vargas, David Velásquez, Juan Pablo Giraldo-Pérez and Daniel Sanin-Villa
Sci 2025, 7(4), 175; https://doi.org/10.3390/sci7040175 - 1 Dec 2025
Viewed by 314
Abstract
This paper presents the design of an autonomous robot guide for a museum-like environment in a motorcycle assembly plant. The system integrates Industry 4.0 technologies such as artificial vision, indoor positioning, generative artificial intelligence, and cloud connectivity to enhance the visitor experience. The development follows the Design Inclusive Research (DIR) methodology and the VDI 2206 standard to ensure a structured scientific and engineering process. A key innovation is the integration of mmWave sensors alongside LiDAR and RGB-D cameras, enabling reliable human detection and improved navigation safety in reflective indoor environments, as well as the deployment of an open-source large language model for natural, on-device interaction with visitors. The current results include the complete mechanical, electronic, and software architecture; simulation validation; and a preliminary implementation in the real museum environment, where the system demonstrated consistent autonomous navigation, stable performance, and effective user interaction. Full article
(This article belongs to the Section Computer Sciences, Mathematics and AI)

36 pages, 895 KB  
Review
Robotic Motion Techniques for Socially Aware Navigation: A Scoping Review
by Jesus Eduardo Hermosilla-Diaz, Ericka Janet Rechy-Ramirez and Antonio Marin-Hernandez
Future Internet 2025, 17(12), 552; https://doi.org/10.3390/fi17120552 - 1 Dec 2025
Viewed by 312
Abstract
The increasing inclusion of robots in social areas requires continuous improvement of the behavioral strategies that robots must follow. Although behavioral strategies mainly focus on operational efficiency, other aspects should be considered to provide a reliable interaction in terms of sociability (e.g., methods for detection and interpretation of human behaviors, how and where human–robot interaction is performed, and participant evaluation of robot behavior). This scoping review aims to answer seven research questions related to robotic motion in socially aware navigation, considering aspects such as the type of robots used, the characteristics and types of sensors used to detect human behavioral cues, the type of environment, and the situations studied. Articles were collected from the ACM Digital Library, Emerald Insight, IEEE Xplore, ScienceDirect, MDPI, and SpringerLink databases. The PRISMA-ScR protocol was used to conduct the searches. Selected articles met the following inclusion criteria: they (1) were published between January 2018 and August 2025, (2) were written in English, (3) were published in journals or conference proceedings, (4) focused on social robots, (5) addressed Socially Aware Navigation (SAN), and (6) involved the participation of volunteers in experiments. As a result, 22 studies were included; 77.27% of them employed mobile wheeled robots. Platforms using differential and omnidirectional drive systems were each used in 36.36% of the articles. Half of the studies (50%) used a functional robot appearance, in contrast to bio-inspired appearances used in 31.80% of the cases. Among the sensors used to collect data from participants, vision-based technologies were the most common (monocular cameras and 3D-vision systems were each reported in 7 articles). Processing was mainly performed on board the robot (50%). A total of 59.1% of the studies were performed in real-world environments rather than simulations (36.36%), and a few studies were performed in hybrid environments (4.54%). Robot interactive behaviors varied across experiments: physical behaviors were present in all of them, while visual behaviors appeared in only two studies. In just over half of the studies (13 studies), participants were asked to provide post-experiment feedback. Full article
(This article belongs to the Special Issue Mobile Robotics and Autonomous System)

22 pages, 1821 KB  
Article
Generative AI for Video Translation: A Scalable Architecture for Multilingual Video Conferencing
by Amirkia Rafiei Oskooei, Eren Caglar, Ibrahim Şahin, Ayse Kayabay and Mehmet S. Aktas
Appl. Sci. 2025, 15(23), 12691; https://doi.org/10.3390/app152312691 - 30 Nov 2025
Viewed by 310
Abstract
The real-time deployment of cascaded generative AI pipelines for applications like video translation is constrained by significant system-level challenges. These include the cumulative latency of sequential model inference and the quadratic (O(N²)) computational complexity that renders multi-user video conferencing applications unscalable. This paper proposes and evaluates a practical system-level framework designed to mitigate these critical bottlenecks. The proposed architecture incorporates a turn-taking mechanism to reduce computational complexity from quadratic to linear in multi-user scenarios, and a segmented processing protocol to manage inference latency for a perceptually real-time experience. We implement a proof-of-concept pipeline and conduct a rigorous performance analysis across a multi-tiered hardware setup, including commodity (NVIDIA RTX 4060), cloud (NVIDIA T4), and enterprise (NVIDIA A100) GPUs. Our objective evaluation demonstrates that the system achieves real-time throughput (τ < 1.0) on modern hardware. A subjective user study further validates the approach, showing that a predictable, initial processing delay is highly acceptable to users in exchange for a smooth, uninterrupted playback experience. The work presents a validated, end-to-end system design that offers a practical roadmap for deploying scalable, real-time generative AI applications in multilingual communication platforms. Full article
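
The complexity claim is easy to verify with a small worked example: translating every speaker for every other listener requires N(N-1) simultaneous streams, whereas a turn-taking policy with a single active speaker needs only N-1. The sketch below counts both; it reproduces the standard argument, not the paper's code.

```python
# Worked sketch of the stream-count argument behind turn-taking: translating
# every speaker for every other listener needs N*(N-1) simultaneous streams,
# while letting only the active speaker talk needs just N-1.
def pairwise_streams(n_participants: int) -> int:
    return n_participants * (n_participants - 1)      # O(N^2)

def turn_taking_streams(n_participants: int) -> int:
    return n_participants - 1                          # O(N), one active speaker

for n in (2, 4, 8, 16):
    print(f"N={n:2d}  all-talk={pairwise_streams(n):3d}  turn-taking={turn_taking_streams(n):2d}")
```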

29 pages, 1924 KB  
Article
VT-MFLV: Vision–Text Multimodal Feature Learning V Network for Medical Image Segmentation
by Wenju Wang, Jiaqi Li, Zinuo Ye, Yuyang Cai, Zhen Wang and Renwei Zhang
J. Imaging 2025, 11(12), 425; https://doi.org/10.3390/jimaging11120425 - 28 Nov 2025
Viewed by 166
Abstract
Existing multimodal segmentation methods face limitations in effectively leveraging medical text to guide visual feature learning: they often suffer from insufficient multimodal fusion and inadequate accuracy in fine-grained lesion segmentation. To address these challenges, the Vision–Text Multimodal Feature Learning V Network (VT-MFLV) is proposed. This model exploits the complementarity between medical images and text to enhance multimodal fusion, which consequently improves critical lesion recognition accuracy. VT-MFLV introduces three key modules: a Diagnostic Image–Text Residual Multi-Head Semantic Encoding (DIT-RMHSE) module that preserves critical semantic cues while reducing preprocessing complexity; a Fine-Grained Multimodal Fusion Local Attention Encoding (FG-MFLA) module that strengthens local cross-modal interaction; and an Adaptive Global Feature Compression and Focusing (AGCF) module that emphasizes clinically relevant lesion regions. Experiments are conducted on two publicly available pulmonary infection datasets. On the MosMedData dataset, VT-MFLV achieved Dice and mIoU scores of 75.61 ± 0.32% and 63.98 ± 0.29%. On the QaTa-COV1 dataset, VT-MFLV achieved Dice and mIoU scores of 83.34 ± 0.36% and 72.09 ± 0.30%, both reaching world-leading levels. Full article
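
The reported Dice and mIoU values follow the standard binary-segmentation definitions, sketched below in NumPy as a generic reference (not the authors' evaluation script).

```python
# Standard binary Dice and IoU definitions, useful for interpreting the reported
# scores; this is a generic sketch, not the authors' evaluation code.
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(f"Dice = {dice_score(pred, gt):.3f}, IoU = {iou_score(pred, gt):.3f}")  # 0.667, 0.500
```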

19 pages, 5932 KB  
Article
FACMamba: Frequency-Aware Coupled State Space Modeling for Underwater Image Enhancement
by Li Wang, Keyong Shen, Haiyang Sun, Xiaoling Cheng, Jun Zhu and Bixuan Wang
J. Mar. Sci. Eng. 2025, 13(12), 2258; https://doi.org/10.3390/jmse13122258 - 27 Nov 2025
Viewed by 238
Abstract
Recent advances in underwater image enhancement (UIE) have achieved notable progress using deep learning techniques; however, existing methods often struggle with limited receptive fields, inadequate frequency modeling, and poor structural perception, leading to sub-optimal visual quality and weak generalization in complex underwater environments. To tackle these issues, we propose FACMamba, a Mamba-based framework augmented with frequency-aware mechanisms, enabling efficient modeling of long-range spatial relations for underwater image restoration. Specifically, FACMamba incorporates three key components: a Multi-Directional Vision State-Space Module (MVSM) to model directional spatial context via the proposed 8-direction selective scan block (SS8D), a Frequency-Aware Guidance Module (FAGM) for learning informative frequency representations with low overhead, and a Structure-Aware Fusion Module (SAFM) to preserve fine-grained structural cues through adaptive multi-scale integration. Recognizing the importance of spatial-frequency interaction, our model fuses these representations via lightweight architecture to enhance both texture and color fidelity. Experiments on standard UIE benchmarks demonstrate that FACMamba achieves a favorable balance between enhancement quality and computational efficiency, outperforming many existing UIE methods. Full article
(This article belongs to the Section Ocean Engineering)
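
As one simple way to picture the frequency information a frequency-aware module might consume, the sketch below splits an image into low- and high-frequency components with an FFT low-pass mask; this is a generic stand-in chosen for illustration, not FACMamba's FAGM, and the cutoff ratio is arbitrary.

```python
# Hedged sketch of splitting an image into low- and high-frequency components
# with an FFT low-pass mask, as a simple stand-in for the kind of frequency
# information a frequency-aware module consumes; not FACMamba's FAGM.
import numpy as np

def frequency_split(img: np.ndarray, cutoff_ratio: float = 0.1):
    """img: (H, W) grayscale -> (low_freq, high_freq), both (H, W)."""
    H, W = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.ogrid[:H, :W]
    dist = np.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2)
    lowpass = dist <= cutoff_ratio * min(H, W)          # circular low-pass mask
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * lowpass)).real
    high = img - low                                    # residual = high frequencies
    return low, high

img = np.random.rand(64, 64)
low, high = frequency_split(img)
print(low.shape, high.shape, np.allclose(low + high, img))  # (64, 64) (64, 64) True
```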

38 pages, 8601 KB  
Article
Vision Control of a Vehicle Intended for Tourist Routes Designed for People with Special Needs
by Marcin Staniek and Ireneusz Celiński
Appl. Sci. 2025, 15(23), 12573; https://doi.org/10.3390/app152312573 - 27 Nov 2025
Viewed by 181
Abstract
Off-road vehicles, including those intended for mountain tourism, are also designed for people with special needs. These designs primarily concern the vehicle's drive, which can be manual, foot-powered, electric, or a combination of these. Unusual forms of control are also used, relying on various parts of the body, including the torso. As an alternative to controlling the vehicle with specific parts of the body, vision itself can be used, for example through eye tracking and similar techniques. The problem with such applications is the high price of the devices and software involved, which are mainly implemented in military solutions; the cost of these forms of vehicle control is often higher than the price of the vehicle itself. This article presents an overview of the broad concept and of technical solutions for controlling various vehicles using hardware that interacts with the organ of vision. It also proposes an extremely low-cost prototype of such a solution, costing several dozen EUR. The device uses vision-based methods built on the OpenCV 4_11 library. The presented research results show that such control is effective. Full article
(This article belongs to the Section Transportation and Future Mobility)
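
Since the prototype relies on OpenCV-based vision techniques, a minimal pupil-centre estimate via thresholding and contour moments is sketched below as an illustration of the general approach; the threshold, kernel size, and synthetic test image are assumptions, not the article's pipeline.

```python
# Minimal OpenCV sketch (illustrative parameters, not the article's pipeline):
# estimate the pupil centre in a cropped eye image by thresholding the darkest
# region and taking the centroid of the largest contour.
import cv2
import numpy as np

def pupil_center(eye_bgr: np.ndarray):
    gray = cv2.cvtColor(eye_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (7, 7), 0)
    # The pupil is the darkest blob; an inverse binary threshold isolates it.
    _, mask = cv2.threshold(gray, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])

# Synthetic test: a dark disc on a bright background stands in for the pupil.
eye = np.full((120, 160, 3), 220, dtype=np.uint8)
cv2.circle(eye, (80, 60), 15, (20, 20, 20), -1)
print(pupil_center(eye))  # approximately (80, 60)
```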

34 pages, 5784 KB  
Article
Linking Megalin, Cubilin, Caveolin-1, GIPC1 and Dab2IP Expression to Ocular Tumorigenesis: Profiles in Retinoblastoma, Choroidal Melanoma, and the Normal Human Eye
by Petra Kovačević, Petar Todorović, Nela Kelam, Suzana Konjevoda, Nenad Kunac, Josipa Marin Lovrić and Katarina Vukojević
Cancers 2025, 17(23), 3785; https://doi.org/10.3390/cancers17233785 - 26 Nov 2025
Viewed by 165
Abstract
Background/Objectives: Retinoblastoma (RB) and uveal melanoma (UM) remain vision-threatening and lethal ocular malignancies with limited molecular markers of differentiation state and prognosis. We investigated whether proteins governing endocytosis and signaling, including Megalin (LRP2), Cubilin (CUBN), Caveolin-1, GAIP-interacting protein C-terminus 1 (GIPC1), and Disabled homolog 2-interacting protein (DAB2IP), exhibit subtype-specific expression patterns in ocular tumors and whether these patterns are related to transcriptomic profiles and survival. Methods: Formalin-fixed, paraffin-embedded human ocular tissues included controls (n = 10), retinoblastoma (n = 10), and UM subtypes (epithelioid, spindle, mixoid; total n = 30). Immunofluorescence for LRP2, CUBN, CAV1, GIPC1, and DAB2IP was quantified using ImageJ (version 1.54g) across standardized high-power fields; per-specimen means were used for statistical analysis (Shapiro–Wilk test; one-way ANOVA with Tukey’s post hoc test). Public data analyses comprised: (i) overall survival in TCGA-UVM using GEPIA2; (ii) differential expression in GEO datasets (GSE62075: melanocytes vs. UM cell lines; GSE208143: retinoblastoma vs. pediatric control retina) and (iii) multivariate Cox proportional hazards regression analysis using the GEPIA3 online platform. Results: LRP2 expression was uniformly reduced across retinoblastoma and all UM subtypes versus control. CUBN expression decreased in retinoblastoma and epithelioid melanoma, was retained in spindle melanoma, and increased in mixoid-cell melanoma. CAV1 expression was increased in epithelioid melanoma but reduced in retinoblastoma, mixoid, and spindle melanomas. GIPC1 and DAB2IP expression were preserved in epithelioid melanoma yet significantly reduced in retinoblastoma and mixoid/spindle melanomas. In TCGA-UVM, higher CAV1 and GIPC1 mRNA expression was associated with worse overall survival (p ≈ 0.025 and 0.036), whereas LRP2, CUBN, and DAB2IP expression were not significant. GEO analyses revealed no significant differences for the five genes in UM cell lines versus melanocytes (GSE62075). However, in retinoblastoma (GSE208143), LRP2 was downregulated, while CUBN, CAV1, GIPC1, and DAB2IP were upregulated. Conclusions: Endocytic/signaling proteins exhibit distinct, subtype-linked expression in ocular tumors. Integration with public datasets highlights CAV1 and GIPC1 as adverse survival correlates in UM and positions LRP2/CUBN/DAB2IP dysregulation as features of ocular tumor biology, nominating candidate biomarkers and mechanistic targets. Full article
(This article belongs to the Special Issue Current Progress and Research Trends in Ocular Oncology—2nd Edition)
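
The quantification workflow described (Shapiro–Wilk normality check, one-way ANOVA, Tukey's post hoc test) can be reproduced generically with SciPy and statsmodels, as sketched below on made-up expression values; none of the numbers relate to the study's data.

```python
# Generic sketch of the described statistical workflow (Shapiro-Wilk normality
# check, one-way ANOVA, Tukey's HSD post hoc) on made-up expression values;
# not the study's data or analysis scripts.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
groups = {
    "control": rng.normal(1.0, 0.15, 10),
    "retinoblastoma": rng.normal(0.6, 0.15, 10),
    "epithelioid_um": rng.normal(0.9, 0.15, 10),
}

for name, values in groups.items():
    w, p = stats.shapiro(values)                 # normality check per group
    print(f"Shapiro-Wilk {name}: W={w:.3f}, p={p:.3f}")

f, p = stats.f_oneway(*groups.values())          # one-way ANOVA across groups
print(f"ANOVA: F={f:.2f}, p={p:.4f}")

values = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
print(pairwise_tukeyhsd(values, labels, alpha=0.05))  # pairwise group comparisons
```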
