Search Results (5,401)

Search Parameters:
Keywords = visual localization

19 pages, 1299 KB  
Article
Experimental Study on the Proppant Transport and Deposition Behavior of CO2 Dry Fracturing Fluid
by Quanhuai Shen, Meilong Fu, Jun Chen, Yuhao Zhu and Yuxin Bai
Processes 2026, 14(10), 1611; https://doi.org/10.3390/pr14101611 (registering DOI) - 15 May 2026
Abstract
Supercritical carbon dioxide (SC-CO2) fracturing has emerged as an environmentally friendly alternative to conventional water-based hydraulic fracturing; however, its inherently low viscosity restricts proppant-carrying efficiency and reduces fracture conductivity. To address this limitation, this study systematically investigates the rheological behavior and sand-carrying mechanisms of CO2 dry fracturing fluid under various thermodynamic and compositional conditions. Rheological measurements were conducted to evaluate the effects of thickener concentration, temperature, and pressure on viscosity, while visualized experiments were performed to examine the influence of injection rate, sand ratio, thickener concentration, and temperature on proppant migration and deposition. A numerical model developed in Fluent was further employed to simulate the temporal evolution of proppant transport within the fracture. The results show that higher thickener concentrations and injection rates significantly enhance proppant transport distance and uniformity, whereas elevated temperature and sand ratio promote localized settling. The simulation results agree well with the experimental observations, validating the model’s reliability. This study elucidates the coupled effects of rheology and operating parameters on CO2 dry fracturing behavior and provides theoretical and experimental guidance for optimizing CO2-based fracturing fluids in low-permeability reservoirs. Full article
(This article belongs to the Section Petroleum and Low-Carbon Energy Process Engineering)
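The viscosity limitation this abstract addresses can be illustrated with a back-of-the-envelope Stokes settling estimate. This is a generic laminar-regime formula, not the study's model, and the fluid properties below are illustrative assumptions:

```python
def stokes_settling_velocity(d, rho_p, rho_f, mu, g=9.81):
    """Stokes terminal settling velocity of a small sphere in a viscous
    fluid (laminar regime): v = g * d**2 * (rho_p - rho_f) / (18 * mu).
    Lower fluid viscosity mu means faster proppant settling, which is the
    carrying-capacity problem that SC-CO2 thickeners try to mitigate."""
    return g * d ** 2 * (rho_p - rho_f) / (18.0 * mu)

# Illustrative comparison (assumed properties): a 0.4 mm sand grain in
# low-viscosity SC-CO2 (~600 kg/m^3, 5e-5 Pa*s) versus in water.
v_co2 = stokes_settling_velocity(4e-4, 2650.0, 600.0, 5e-5)
v_water = stokes_settling_velocity(4e-4, 2650.0, 1000.0, 1e-3)
```

Under these assumed values the grain settles orders of magnitude faster in the SC-CO2 case, which is why thickener concentration dominates transport distance in the experiments.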
25 pages, 5573 KB  
Review
A Review of Synergistic Acoustic Mechanisms in Porous Media: Microfluidic Insights for Geo-Energy Applications
by Han Ge, Ziling Teng, Shibo Liu, Xiulei Chen and Jiawang Chen
Appl. Sci. 2026, 16(10), 4949; https://doi.org/10.3390/app16104949 (registering DOI) - 15 May 2026
Abstract
Geothermal energy extraction, hydrocarbon recovery, and CO2 geological sequestration are frequently hindered by interfacial barriers and slow mass transfer. While high-power ultrasound offers a sustainable, purely physical method for reservoir stimulation, its field effectiveness remains debated because traditional macroscopic experiments fail to isolate mechanisms like acoustic streaming and cavitation. This review systematically examines acoustic mechanisms in porous media via microfluidic visualization, focusing on pore-scale fluid dynamics during enhanced oil recovery, hydrate dissociation, and CO2 sequestration. Microscopic evidence reveals that fluid transport mechanisms depend heavily on pore geometry and local acoustic intensity. In wider channels, nonlinear acoustic flow provides sustained, directed convection to strip away concentration boundary layers; in narrow throats, microjets and pulsed stresses generated by transient cavitation are responsible for physically breaking capillary barriers. The spatiotemporal synergy of these mechanisms is critical for multiphase fluid transport in tight porous networks. Pore geometry serves not only as the application context but also as a core physical variable. To translate microfluidic results into reservoir-scale applications, future research must address two-dimensional simplifications, thermodynamic discrepancies under high-temperature and high-pressure conditions, and bubble cluster interactions, alongside the development of adaptive frequency-modulated control and multiscale computational models. Full article
(This article belongs to the Section Fluid Science and Technology)
35 pages, 14993 KB  
Article
A Unified Deep Learning-Based Corridor Following with Image-Based Obstacle Avoidance for Autonomous Wheelchair Navigation
by A. H. Abdul Hafez
Mathematics 2026, 14(10), 1698; https://doi.org/10.3390/math14101698 - 15 May 2026
Abstract
Autonomous wheelchair navigation requires both reliable global guidance and safe local interaction with the environment, typically addressed using separate perception and control strategies. This paper presents a unified vision-based control framework that combines learning-based corridor following with image-based obstacle avoidance under a common visual servoing perspective. This work provides a unified interpretation of learning-based and analytical control as complementary realizations of visual servoing. A convolutional neural network (CNN) is employed to directly predict steering commands from monocular images, enabling robust corridor following without explicit feature extraction. In parallel, obstacle avoidance is formulated as an image-based visual servoing (IBVS) task, where detected obstacles are represented as image features and regulated toward safe regions. A supervisory control strategy coordinates these components by prioritizing safety-critical avoidance when necessary, while maintaining nominal navigation otherwise. The system is implemented using a single monocular camera and deployed on a low-cost embedded platform. Experimental results demonstrate that the CNN-based module maintains stable performance under challenging visual conditions, while the IBVS controller provides predictable and reliable avoidance behavior. The proposed framework highlights the complementary roles of learning-based and analytical visual servoing, offering a practical and scalable solution for assistive autonomous mobility. Full article
27 pages, 12822 KB  
Article
Positive-Guided Local Supervision for Robust Road Extraction from Remote Sensing Imagery
by Hao He, Shuyang Wang, Lei Huang, Xiaohu Fan, Yongfei Li and Dongfang Yang
Remote Sens. 2026, 18(10), 1589; https://doi.org/10.3390/rs18101589 - 15 May 2026
Abstract
Road extraction from high-resolution remote sensing imagery is fundamental to numerous practical applications, yet still faces notable challenges caused by label noise, particularly the underlabeling of rural roads within training datasets. End-to-end dense prediction networks deliver high efficiency and strong global context capture capability, yet they are highly vulnerable to such label noise. In contrast, patch-based methods achieve better robustness but sacrifice global reasoning ability and computational efficiency. This paper proposes a novel training strategy named Positive-guided Local Supervision (PLS), which integrates the strengths of the two aforementioned paradigms. PLS preserves the full end-to-end forward pass to leverage global context, while restricting loss computation to local patches centered on reliably annotated road pixels (positive samples) via a standard dense segmentation loss. By isolating the model from misleading gradients generated in underlabeled regions, PLS effectively mitigates the negative impact of underlabeling without compromising computational efficiency and prediction quality. We evaluate the proposed PLS on two datasets: the public DeepGlobe benchmark and a newly constructed challenging dataset, namely China Four Provinces (CH4P). CH4P includes 13,498 high-resolution images of rural China, which suffers from severe underlabeling inherited from public web maps. Extensive quantitative evaluations on DeepGlobe and the newly built CH4P dataset validate that our PLS strategy surpasses conventional end-to-end baselines and competitive state-of-the-art methods under both noisy original labels and manually refined annotations. On the refined DeepGlobe-mini-test and CH4P-mini-test subsets, PLS obtains prominent absolute IoU improvements of 0.127 and 0.104 over baseline models, respectively, showing distinct superiority in handling severe real-world underlabeling. 
Qualitative visualizations and cross-dataset generalization tests further demonstrate that PLS can effectively retrieve road segments omitted in raw annotations, delivers strong robustness against practical label noise, and introduces no extra computational burden in the inference stage. Full article
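The loss-masking idea behind PLS can be sketched as follows. This is a minimal reconstruction from the abstract; the patch size, the choice of binary cross-entropy, and all names are assumptions:

```python
import math

def supervision_mask(labels, patch=1):
    # Build a binary mask that keeps loss only inside (2*patch+1)-sized
    # windows centered on annotated road pixels; underlabeled regions
    # outside these windows contribute no gradient.
    h, w = len(labels), len(labels[0])
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if labels[i][j] == 1:  # reliably annotated positive pixel
                for di in range(-patch, patch + 1):
                    for dj in range(-patch, patch + 1):
                        y, x = i + di, j + dj
                        if 0 <= y < h and 0 <= x < w:
                            mask[y][x] = 1
    return mask

def masked_bce(pred, labels, mask, eps=1e-7):
    # Dense binary cross-entropy restricted to the supervised region.
    num = den = 0.0
    for pr, lr, mr in zip(pred, labels, mask):
        for p, l, m in zip(pr, lr, mr):
            if m:
                num += -(l * math.log(p + eps) + (1 - l) * math.log(1 - p + eps))
                den += 1
    return num / max(den, 1)
```

The forward pass stays fully dense (so global context is preserved); only the loss is localized.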
24 pages, 16415 KB  
Article
Decoding Spatial Non-Stationarity in Coastal–Mountainous Housing Markets: A Sustainable Urban Informatics Framework Using Explainable STGCN
by Jong-Hwa Lee and Sung Jae Kim
Sustainability 2026, 18(10), 4986; https://doi.org/10.3390/su18104986 (registering DOI) - 15 May 2026
Abstract
Traditional linear models in urban informatics struggle to capture the complex, non-linear spatial non-stationarity inherent in metropolitan housing markets. To overcome these constraints, this study introduces a data-driven computational framework integrating a Spatio-Temporal Graph Convolutional Network (STGCN) with gradient-based Explainable Artificial Intelligence (XAI) and Geographically Weighted Regression (GWR). This framework is empirically tested using 217,598 apartment transactions in Busan, the Republic of Korea, augmented with high-resolution micro-demographic grids and Digital Elevation Model (DEM) topographical data. Utilizing unsupervised K-Means clustering, the region is spatially stratified into a dense Urban Core and a dispersed Suburban Periphery. The STGCN demonstrates overwhelming predictive superiority (R2=0.802) over the traditional Spatial Error Model (R2=0.437). Crucially, gradient-based XAI and localized GWR coefficients successfully unpack the deep learning “black box,” visualizing hyper-localized economic realities that global linear models obscure. The analysis exposes stark regional market segmentation driven by environmental topography, mathematically quantifying non-linear dynamics such as coastal high-floor premiums, severe mountainous altitude penalties, and latent urban reconstruction premiums. Ultimately, this research bridges the gap between predictive computational power and spatial economic interpretability, offering a robust informatics framework for equitable urban planning. Full article
(This article belongs to the Section Sustainable Urban and Rural Development)
17 pages, 28992 KB  
Article
Object Recognition-Based Grasping with a Soft Modular Gripper
by Yu Zhang, Fengwen Zhang, Zhihui Guo, Lingkai Luan, Dongbao Sui, Tianshuo Wang, Jiangyu Zhou, Fuyue Zhang, Chen Chen, Dongjie Li and Bo You
Biomimetics 2026, 11(5), 347; https://doi.org/10.3390/biomimetics11050347 - 15 May 2026
Abstract
Soft modular grippers play a significant role in multiple fields due to their excellent adaptability and flexibility. This paper proposes a modular soft gripper driven by pneumatically actuated multi-chambers. The designed soft modular gripper features three operational modes, with its modular fingers employing independently controlled dual chambers. The distal and proximal dual-chamber structure enhances the fingertip force of the modular fingers. Based on classical laminated plate theory and incorporating the large deformation characteristics of soft materials, a relationship between the bending centerline of the fingers and the driving pressure is established, providing a theoretical foundation for grasping tasks executed by the soft modular gripper. The Denavit-Hartenberg (D-H) parameter method is utilized to develop the coordinate system of the soft modular gripper, thereby defining its operational workspace. Visual sensing technology is introduced, incorporating improvements to the YOLOv8-based object recognition and localization framework, which enhances recognition accuracy for target objects and ensures grasping stability. Full article
(This article belongs to the Section Locomotion and Bioinspired Robotics)
24 pages, 17355 KB  
Article
A Deep Feature Approach to Visual Similarity Analysis of Ethnic Brocades in Southwest China
by Quan Li, Huaxing Lu, Shichen Liu, Dengwei Sun and Biao Zhang
Appl. Sci. 2026, 16(10), 4928; https://doi.org/10.3390/app16104928 - 15 May 2026
Abstract
Visual similarity analysis of ethnic brocades is valuable for image retrieval, style comparison, and digital archiving in cultural heritage informatics. However, although deep neural networks provide powerful visual representations, their encoded similarity structures are often difficult to interpret. This study presents an interpretable deep feature framework for analyzing inter-ethnic visual similarity in brocade images from ten minority groups in Southwest China. Four convolutional neural network backbones, including AlexNet, VGG-16, ResNet-18, and an SE-enhanced ResNet-18 (SResNet-18), were first evaluated to identify a reliable feature extractor. The best-performing model was then used to construct deep feature-based similarity and distance relationships among ethnic categories. To interpret this structure, five handcrafted descriptor types, namely color, texture, geometric, local-structure, and frequency-domain features, were compared with the deep feature similarity matrix using Spearman correlation analysis and weighted descriptor fusion. Experimental results showed that SResNet-18 achieved the best classification performance, with an accuracy of 95.15% and an F1-score of 95.14%. Among the handcrafted descriptors, color showed the strongest correspondence with the RGB-based deep similarity structure (r=0.643), followed by local-structure descriptors (r=0.416), whereas classical texture descriptors showed near-zero correspondence (r=0.063). The optimal weighted fusion further improved the correlation to r=0.731. These findings suggest that the SResNet-18 feature space is more strongly associated with color composition and local motif organization than with the specific grayscale texture, global geometric, or frequency-domain descriptors used in this study. 
The proposed framework provides an interpretable approach for understanding deep visual similarity in cultural heritage images and offers methodological support for pattern-based retrieval, comparative style analysis, and digital documentation. Full article
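The descriptor-versus-deep-feature comparison above reduces to correlating two category-level similarity matrices over their off-diagonal entries. A dependency-free Spearman sketch (not the authors' code) might look like this:

```python
def rankdata(xs):
    # 1-based average ranks; ties receive the mean of their positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    # Spearman rho = Pearson correlation of the two rank vectors.
    ra, rb = rankdata(a), rankdata(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

def upper_triangle(sim):
    # Off-diagonal upper-triangular entries of a square similarity matrix,
    # so each ethnic-category pair is counted once.
    n = len(sim)
    return [sim[i][j] for i in range(n) for j in range(i + 1, n)]
```

Comparing a handcrafted-descriptor similarity matrix against the deep-feature matrix is then `spearman(upper_triangle(A), upper_triangle(B))`.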
35 pages, 11720 KB  
Article
Effects of Street-Level Visual Perception on Different Types of Leisure Activity Intensity in Waterfront Spaces: A Case Study of the Core Section of the Pearl River, Guangzhou
by Yudan Pan, Yang Chen and Jin Cao
Land 2026, 15(5), 849; https://doi.org/10.3390/land15050849 (registering DOI) - 15 May 2026
Abstract
As urban waterfront public spaces have increasingly become important settings for residents’ daily leisure activities, there remains a lack of empirical evidence based on objective image data regarding how street-level visual environments influence different types of leisure activities. The existing studies have largely relied on macro-scale built environment indicators and paid limited attention to micro-scale visual perception from the pedestrian perspective. To address this gap, this study focuses on the core waterfront section of the Pearl River in Guangzhou. Behavioral observations were conducted across nine spatial units during different time periods on weekdays and weekends, yielding 54 samples of passive, active, and social activity intensity. Meanwhile, 109 street-view sampling points were established, generating 436 pedestrian-view images. Using Mask2Former with an ADE20K pre-trained model, visual environmental indicators—including the Green View Index (GVI), Sky View Index (SVI), built environment proportion, road proportion, and visual diversity (Entropy)—were extracted. Spearman correlation and multiple linear regression were applied to examine their effects on activity intensity. The results show that leisure activities are generally more active in the evening and on weekends, with social activities exhibiting the strongest temporal variation. Active activities remain relatively stable, passive activities show temporal dependence, and social activities display localized high-intensity clustering. Regression results reveal differentiated environmental responses: visual diversity has a stable positive effect on passive activities, active activities show weak associations with visual variables, and social activities are the most sensitive, with GVI, SVI, and built proportion showing significant negative effects, while visual diversity shows a significant positive effect. The social activity model also demonstrates the highest explanatory power (Adj. R2 = 0.488). Overall, this study develops a street-view semantic segmentation-based method for quantifying waterfront visual environments, demonstrates the critical role of visual environmental composition in shaping activity patterns, and provides empirical support for the fine-grained and activity-oriented optimization of waterfront public spaces. Full article
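The visual indicators named in this abstract are simple functions of a semantic segmentation mask. A schematic version, with made-up class labels standing in for ADE20K categories:

```python
import math
from collections import Counter

def view_index(seg_mask, target_classes):
    # Fraction of pixels assigned to the target classes: e.g. vegetation
    # classes for the Green View Index, sky for the Sky View Index.
    total = hits = 0
    for row in seg_mask:
        for label in row:
            total += 1
            if label in target_classes:
                hits += 1
    return hits / total if total else 0.0

def visual_entropy(seg_mask):
    # Shannon entropy over class proportions: one common proxy for the
    # "visual diversity (Entropy)" indicator.
    counts = Counter(label for row in seg_mask for label in row)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

On a toy 2x2 mask of `["tree", "sky"], ["tree", "road"]`, the tree-based index is 0.5 and the sky-based index is 0.25.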
18 pages, 1439 KB  
Article
Unsupervised Segmentation of Wear Surface Defects in Hydroturbine Bearing Pads Guided by Local Anomaly Scores
by Xiaolong Yang, Jingxuan Han, Gang Wan, Fengdi Zhu, Chuangji Qin, Ning Xu and Shuo Wang
Lubricants 2026, 14(5), 202; https://doi.org/10.3390/lubricants14050202 - 14 May 2026
Abstract
Vision-based defect detection on bearing-pad wear surfaces is essential for quantifying damage geometry and assessing condition in hydroturbine units. Compared with 2D color images, depth images can suppress disturbances caused by complex textures, surface color variations, and specular reflections, thereby providing a more reliable basis for precise damage localization. Nevertheless, depth-based damage segmentation under a large field of view remains challenging, mainly due to fine-scale texture noise and weak defect saliency; moreover, robust defect probability estimation is often hindered by limited labeled data. To address these challenges, this paper proposes an unsupervised defect segmentation framework for hydroturbine friction components guided by local anomaly score distributions. First, a salient damage detection module is developed based on topography–texture separation, which mitigates the interference of local micro-texture noise on defect segmentation. Then, a normal reference dataset is constructed using defect-free bearing-pad depth images, and an unsupervised network is employed as the core to generate anomaly score representations of potential damage regions for coarse localization. Finally, the obtained anomaly score distribution is used as adaptive weights to fuse depth-based defect cues with morphological processing, enabling self-adaptive refinement of the damage regions. Experiments on real depth images acquired from hydroturbine bearing pads demonstrate that the proposed method achieves accurate defect extraction and reliable geometric quantification. Quantitative evaluations on the testing set yield a mean surface area error of 9.39% ± 4.25% and a volume error of 4.91% ± 2.85%, with best-case errors dropping as low as 3.67% and 1.03%, respectively. 
Crucially, these results demonstrate that our framework goes beyond mere visual detection; by operating entirely without pixel-level annotations, it offers a highly practical tool for diagnosing specific lubrication failure modes and driving predictive maintenance in actual hydroturbine engineering. Full article
(This article belongs to the Special Issue Advanced Methods for Wear Monitoring)
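The final fusion step, using normalized anomaly scores as adaptive per-pixel weights, can be caricatured as follows. The linear weighting form is a guess for illustration, not the paper's exact rule:

```python
def refine_defect_map(depth_cue, morph_cue, anomaly):
    # Blend a depth-based defect map with a morphologically processed map,
    # weighting each pixel by its max-normalized anomaly score: pixels the
    # unsupervised network flags strongly trust the depth cue, the rest
    # fall back to the morphological result. (Assumed weighting form.)
    a_max = max(s for row in anomaly for s in row) or 1.0
    out = []
    for dr, mr, ar in zip(depth_cue, morph_cue, anomaly):
        out.append([(a / a_max) * d + (1.0 - a / a_max) * m
                    for d, m, a in zip(dr, mr, ar)])
    return out
```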
37 pages, 7759 KB  
Article
Research on Visual Recognition and Harvesting Point Localization System for Grape-Picking Robots in Smart Agriculture
by Tao Lin, Qiurong Lv, Fuchun Sun, Wei Ma and Xiaoxiao Li
Agriculture 2026, 16(10), 1073; https://doi.org/10.3390/agriculture16101073 - 14 May 2026
Abstract
To improve grape target perception and picking-point positioning for intelligent harvesting robots, this study develops a vision-based method for orchard grape detection and harvesting-point localization. The method is intended to address missed detections, insufficient recognition accuracy, and unsatisfactory peduncle segmentation caused by illumination variation, occlusion, and interference from branches and leaves in complex orchard scenes. For grape cluster and peduncle detection, a lightweight YOLOv7-derived model, termed YOLO-FES, was established. In this model, FasterNet and SCConv were introduced to refine the backbone and neck structures, and the EMA mechanism was incorporated to lower parameter complexity and computational cost while improving detection performance. For suspended grape structure association and peduncle extraction, the GJK algorithm was combined with nearest-neighbor rectangular discrimination, and an improved YOLACT-based peduncle segmentation network, named M-YOLACT, was constructed. With the integration of the MLCA mechanism and the Mish activation function, accurate peduncle segmentation was achieved. In addition, a stereo depth camera was employed to obtain two-dimensional picking-point information and further recover the corresponding three-dimensional spatial coordinates. Experimental results showed that the mAP@0.5 of YOLO-FES for grape clusters and peduncles reached 95.37%. For grape peduncle segmentation, the mAP@0.5 values of the bounding boxes and masks produced by M-YOLACT reached 95.73% and 94.36%, respectively. The proposed method achieved an overall harvesting success rate of 89.2%, with an average time consumption of 11 s for a single harvesting operation. By integrating deep-learning-based detection and segmentation with binocular-vision localization, this study provides a practical technical solution and useful reference for the visual system design of grape-harvesting robots. Full article
28 pages, 125254 KB  
Article
Bridging Image-Based Detection and Field Evaluation: A Semi-Automated Pavement Distress Assessment Framework
by Betül Değer Şitilbay and Mehmet Ozan Yılmaz
Sustainability 2026, 18(10), 4935; https://doi.org/10.3390/su18104935 - 14 May 2026
Abstract
Accurate, rapid, and consistent evaluation of pavement condition across large-scale road networks is critical for sustainable maintenance and rehabilitation planning. However, conventional approaches largely rely on manual visual inspections, which are time-consuming, subjective, and difficult to implement at the network level. In this study, a semi-automated pavement distress evaluation framework that integrates field-based assessment with computer vision techniques is proposed. The study was conducted on a 3 km roadway network located within the Yıldız Technical University Davutpaşa Campus. Field-based distress observations were used as reference data, while street-level images obtained from the Mapillary platform were analyzed using a deep learning-based YOLOv8 model trained on the RDD2022 dataset, which was specifically developed for road distress detection. The analysis focuses on crack and pothole distress, which have a dominant influence on PCR and are highly distinguishable in image-based approaches. Correlation analyses between automated detection results and field-based data demonstrate a strong agreement, reaching values of approximately ρ = 0.90 on some routes. These findings indicate that these distress types effectively represent variations in pavement condition. The results demonstrate that multi-source image data and deep learning-based detection methods can be reliably used for section-level pavement condition assessment. The proposed approach addresses a key gap in the literature by transforming image-level detections into engineering-based decision-support information. Furthermore, by leveraging publicly available data sources, the framework offers a low-cost and scalable solution that enables rapid preliminary assessment over large road networks, thereby providing significant potential for sustainable infrastructure management and the development of data-driven maintenance strategies.
Several practical challenges encountered during the detection process—including sensitivity to contrast enhancement parameters, false positives from shadows and surface reflections, heterogeneous image resolution across crowdsourced imagery, and training distribution gaps for locally prevalent infrastructure features—are discussed, and directions for reducing human intervention through adaptive preprocessing and targeted model refinement are identified. Full article
20 pages, 1704 KB  
Article
Digital Twin-Driven Trajectory and Resource Optimization for UAV Swarms in Low-Altitude Urban Logistics and Communication Environments
by Hanyang Tong, Ziyang Song, Zhenyan Zhu and Jinlong Sun
Drones 2026, 10(5), 376; https://doi.org/10.3390/drones10050376 - 14 May 2026
Abstract
Unmanned aerial vehicles (UAVs) serve as both communication relays and aerial couriers in modern urban logistics networks. Conventional trajectory optimization methods assume perfect localization and isotropic free-space tracking signal propagation, which limits their effectiveness in urban canyons. To address the positional uncertainty and signal blockage from buildings, we propose a digital twin-driven framework for continuous trajectory and resource optimization in UAV swarms. We model an urban environment containing random high-rise structures, applying a non-line-of-sight (NLoS) uncertainty to reflect realistic communication degradation. The digital twin (DT) architecture utilizes a dual-layer spatial representation that captures a dynamically decaying positional uncertainty radius of the recipient. We define a strict visual localization boundary that initiates deterministic target tracking with a state transition mechanism. To manage the complexity of swarm routing, we apply Density-Based Spatial Clustering of Applications with Noise (DBSCAN), assigning one UAV courier and one logistics transfer station to each cluster. The system executes a continuous re-optimization loop using an adaptive multi-objective Genetic Algorithm. This framework jointly minimizes cumulative outage probability and total flight time while enforcing a signal-to-noise ratio threshold and throughput constraints. This continuous adaptation mechanism mitigates NLoS blockage risks, supporting reliable communication and efficient delivery in Global Navigation Satellite System (GNSS)-degraded and obstacle-dense urban environments. Full article
(This article belongs to the Section Innovative Urban Mobility)
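The cluster-based courier assignment the abstract describes can be illustrated with a minimal pure-Python DBSCAN. This is a sketch under stated assumptions: the delivery coordinates, `eps`, and `min_pts` values below are invented for illustration and are not taken from the paper.

```python
from math import hypot

def dbscan(points, eps, min_pts):
    """Minimal 2-D DBSCAN: returns one cluster label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if hypot(points[i][0] - q[0], points[i][1] - q[1]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # noise (a cluster may still claim it)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:          # former noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:    # j is a core point: expand the cluster
                queue.extend(reach)
    return labels

# Two spatially separated delivery groups yield two clusters, so the scheme
# in the abstract would assign two courier UAVs and two transfer stations.
deliveries = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels = dbscan(deliveries, eps=2.0, min_pts=2)
print(labels)  # -> [0, 0, 0, 1, 1, 1]
```

One UAV courier per returned cluster label then matches the paper's one-courier-per-cluster assignment rule.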

20 pages, 606 KB  
Article
Retrieval-Guided and Semantically Grounded Image Captioning for Open-Domain Scenes
by Shanshan Lin, Xiaoxuan Xie, Zexian Yang and Chao Chen
Mathematics 2026, 14(10), 1667; https://doi.org/10.3390/math14101667 - 13 May 2026
Abstract
Recent image captioning methods based on pre-trained vision–language models can generate fluent and coherent descriptions, yet they still struggle in open-domain scenes that contain long-tail concepts, uncommon object combinations, and ambiguous visual evidence. Two limitations are especially important. First, the knowledge needed to recognize and name rare or domain-specific entities is only weakly represented in model parameters, causing captions to be generic, incomplete, or biased toward frequent concepts. Second, token generation is typically grounded mainly by local visual matching, making it sensitive to clutter, occlusion, and visually similar distractors, and therefore prone to attribute errors, relation confusion, and object hallucination. To address these issues, we propose R2G (retrieval- and grounding-guided captioning), a lightweight plug-in framework for frozen image captioning backbones. R2G consists of two complementary components. The first, retrieval-guided visual prompting, retrieves image-relevant concepts from an external visual concept memory, converts them into a continuous prompt representation, and injects this representation into selected layers of the visual encoder, so that external semantic information can influence visual feature formation before decoding begins. The second, global–local semantic grounding, derives a global semantic prior from an auxiliary vision–language encoder and adaptively fuses it with token-level local visual evidence through a decoder-state-dependent gating mechanism, thereby improving semantic stability while preserving fine-grained visual support. The resulting framework is lightweight, compatible with frozen pre-trained backbones, and designed to improve both concept coverage and semantic faithfulness. Experimental results on MS-COCO and NoCaps show that R2G consistently improves caption quality over the baseline and yields particularly clear gains in open-domain and out-of-domain settings. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
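The decoder-state-dependent gating in R2G's global–local semantic grounding can be sketched as a scalar gate that interpolates between the global semantic prior and the token-level local evidence. The gate weights `w`, bias `b`, and all feature values below are illustrative placeholders, not the paper's learned parameters.

```python
import math

def gated_fusion(decoder_state, global_prior, local_evidence, w, b):
    """Fuse a global semantic prior with token-level local visual evidence.

    A gate g in (0, 1) is computed from the current decoder state, then the
    two feature vectors are mixed as g * global + (1 - g) * local.
    """
    logit = sum(wi * hi for wi, hi in zip(w, decoder_state)) + b
    g = 1.0 / (1.0 + math.exp(-logit))          # sigmoid gate
    fused = [g * gv + (1.0 - g) * lv
             for gv, lv in zip(global_prior, local_evidence)]
    return fused, g

# With zero gate weights the gate is sigmoid(0) = 0.5: an even blend of the
# global prior and the local evidence.
fused, g = gated_fusion(decoder_state=[0.1, -0.2],
                        global_prior=[1.0, 0.0],
                        local_evidence=[0.0, 1.0],
                        w=[0.0, 0.0], b=0.0)
print(g, fused)  # -> 0.5 [0.5, 0.5]
```

Because the gate depends on the decoder state, each generated token can lean on the global prior when local evidence is ambiguous and on local features when fine-grained support is available, which is the stabilizing behavior the abstract claims.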

23 pages, 5936 KB  
Article
Siamese-ViT: A Local–Global Feature Fusion Method for Real-Time Visual Navigation of UAVs in Real-World Environments
by Yu Cheng, Xixiang Liu, Shuai Chen and Chuan Xu
Remote Sens. 2026, 18(10), 1556; https://doi.org/10.3390/rs18101556 - 13 May 2026
Abstract
Visual scene matching navigation (VSMN) for unmanned aerial vehicles (UAVs) boasts advantages such as high precision, high reliability, and autonomy. The biggest challenge lies in the tension between local fine-grained information and global semantics, as well as limited generalization ability in real-world environments. While existing Transformer-based cross-view geolocation methods enhance global context modeling capabilities, they still generally face issues such as high demands on training data and computational resources, insufficient fusion of local fine-grained information and global semantics, and limited real-time performance in complex real-world environments. To address these problems, we propose a scene matching and localization algorithm based on the Siamese-ViT. For feature extraction, we use the ViT model to extract global features and K-means clustering to aggregate local features; the aggregated local features are combined with the ViT's global features to generate a robust local–global feature representation vector. For feature matching, incremental principal component analysis (IPCA) is used to reduce the dimensionality of the high-dimensional feature space, and a KD-tree is constructed for fast feature retrieval to improve matching efficiency. We validated our algorithm on the University-1652 dataset and a dataset of real-world satellite-drone image pairs. The results show that our Siamese-ViT outperforms other models in both Recall and AP. We conducted flight experiments in real-world environments, capturing drone images of complex scenes, including farmland, urban buildings, and waterways. The results show that, at a flight altitude of 350 m, our algorithm achieves average absolute errors of 6.2063 m in latitude and 6.7552 m in longitude, with a mean horizontal error of 10.1922 m. Therefore, our Siamese-ViT demonstrates satisfactory overall positioning accuracy. Full article
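The matching stage described in the abstract retrieves, for a drone-image descriptor, the most similar satellite-tile descriptor and returns its geotag. The sketch below uses a brute-force cosine-similarity search as a stand-in for the paper's IPCA-plus-KD-tree pipeline; the three-dimensional descriptors and the coordinates are invented for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length descriptor vectors."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def locate(query_vec, ref_db):
    """Return the geotag of the reference descriptor most similar to the
    drone-image descriptor (brute force stand-in for the KD-tree query)."""
    best = max(ref_db, key=lambda entry: cosine(query_vec, entry[0]))
    return best[1]

# Hypothetical reference database: (descriptor, (latitude, longitude)) pairs
# for pre-indexed satellite tiles.
refs = [([1.0, 0.0, 0.0], (31.95, 118.84)),
        ([0.0, 1.0, 0.0], (31.96, 118.85)),
        ([0.7, 0.7, 0.1], (31.97, 118.86))]

print(locate([0.9, 0.1, 0.0], refs))  # -> (31.95, 118.84)
```

In the full method, IPCA would first compress the local–global descriptors so that the KD-tree query stays fast in high-dimensional feature spaces.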
33 pages, 1423 KB  
Review
Non-Prosthetic Assistive Technologies for Persons with Hearing Losses: A Survey
by Reemas Alsubaiei, Farah AlHayek, Mariam Alsahhaf, Ghadah Alajmi, Aliah Almutairi, Karim Youssef, Ghina El Mir, Sherif Said, Taha Beyrouthy and Samer Al Kork
Technologies 2026, 14(5), 302; https://doi.org/10.3390/technologies14050302 - 13 May 2026
Abstract
Millions of persons worldwide experience varying degrees of hearing loss, traditionally addressed through prosthetic solutions such as hearing aids and cochlear implants. However, a significant proportion of individuals cannot benefit from these technologies, cannot access them, or choose not to use them. In this context, non-prosthetic assistive technologies have emerged as a complementary paradigm, leveraging advances in sensing, artificial intelligence, and wearable computing to transform acoustic information into alternative perceptual representations rather than restoring auditory function. This survey provides a review of such systems, focusing on technologies that enhance environmental awareness, communication, and social interaction. Existing approaches are categorized along two main dimensions: the tasks they perform and the platforms on which they operate. Task-oriented analysis includes sound recognition (speech and non-speech), sound source localization, emotion recognition, sign language recognition, and related emerging functionalities. Platform-based analysis emphasizes wearable devices and mobile solutions enabling real-time and context-aware assistance. The survey further highlights key research trends, including real-time auditory scene analysis, portable processing, and artificial intelligence. It shows that recent studies increasingly demonstrate that combining auditory, visual, and haptic modalities improves robustness and usability in real-world conditions, particularly in noisy and dynamic environments. Finally, open challenges such as energy efficiency, latency, evaluation methodologies, and user acceptance are discussed. By synthesizing existing work and identifying open research directions, this survey aims to provide a structured foundation for future developments in intelligent, non-prosthetic assistive systems that redefine how auditory information is accessed and interpreted. Full article
(This article belongs to the Section Assistive Technologies)
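One of the tasks the survey covers, sound source localization, commonly starts from the time difference of arrival between two microphones, estimated by cross-correlating the channels. The toy signals below are hypothetical; a real system would use sampled audio and the known microphone spacing to convert the lag into a bearing angle.

```python
def cross_correlation_lag(left, right, max_lag):
    """Estimate the delay (in samples) of `right` relative to `left`:
    the lag that maximizes sum(left[n] * right[n + lag])."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[n] * right[n + lag]
                    for n in range(len(left))
                    if 0 <= n + lag < len(right))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# The same pulse reaches the right microphone 3 samples after the left one,
# so the source sits on the left microphone's side.
left  = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = cross_correlation_lag(left, right, max_lag=5)
print(lag)  # -> 3
```

Wearable assistive devices of the kind surveyed can then map this lag to a haptic or visual direction cue rather than to restored hearing.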
