Search Results (11)

Search Parameters:
Keywords = view-invariant geometric descriptor

17 pages, 4323 KB  
Article
Render-Rank-Refine: Accurate 6D Indoor Localization via Circular Rendering
by Haya Monawwar and Guoliang Fan
J. Imaging 2026, 12(1), 10; https://doi.org/10.3390/jimaging12010010 - 25 Dec 2025
Abstract
Accurate six-degree-of-freedom (6-DoF) camera pose estimation is essential for augmented reality, robotics navigation, and indoor mapping. Existing pipelines often depend on detailed floorplans, strict Manhattan-world priors, and dense structural annotations, which lead to failures in ambiguous room layouts where multiple rooms appear in a query image and their boundaries may overlap or be partially occluded. We present Render-Rank-Refine, a two-stage framework operating on coarse semantic meshes without requiring textured models or per-scene fine-tuning. First, panoramas rendered from the mesh enable global retrieval of coarse pose hypotheses. Then, perspective views from the top-k candidates are compared to the query via rotation-invariant circular descriptors, which re-ranks the matches before final translation and rotation refinement. Our method increases camera localization accuracy compared to the state-of-the-art SPVLoc baseline by reducing the translation error by 40.4% and the rotation error by 29.7% in ambiguous layouts, as evaluated on the Zillow Indoor Dataset. In terms of inference throughput, our method achieves 25.8–26.4 queries per second (QPS), which is significantly faster than other recent comparable methods, while maintaining accuracy comparable to or better than the SPVLoc baseline. These results demonstrate robust, near-real-time indoor localization that overcomes structural ambiguities and heavy geometric assumptions. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
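
As a concrete illustration of the re-ranking idea, the following minimal sketch scores a rendered candidate against the query by the maximum circular cross-correlation over all cyclic shifts of a 1-D descriptor ring, computed with an FFT. The ring construction and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def circular_similarity(query_ring: np.ndarray, cand_ring: np.ndarray) -> float:
    """Max normalized circular cross-correlation over all cyclic shifts (FFT-based).
    Assumes both rings are equal-length 1-D feature vectors sampled around the view."""
    q = (query_ring - query_ring.mean()) / (query_ring.std() + 1e-8)
    c = (cand_ring - cand_ring.mean()) / (cand_ring.std() + 1e-8)
    corr = np.fft.irfft(np.fft.rfft(q) * np.conj(np.fft.rfft(c)), n=len(q))
    return float(corr.max() / len(q))

def rerank(query_ring, candidate_rings):
    """Re-rank candidate poses by descending rotation-invariant similarity."""
    scores = [circular_similarity(query_ring, r) for r in candidate_rings]
    return np.argsort(scores)[::-1], scores
```

The index of the best cyclic shift (available via `corr.argmax()`) could additionally seed the subsequent rotation refinement.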

16 pages, 2127 KB  
Article
VIPS: Learning-View-Invariant Feature for Person Search
by Hexu Wang, Wenlong Luo, Wei Wu, Fei Xie, Jindong Liu, Jing Li and Shizhou Zhang
Sensors 2025, 25(17), 5362; https://doi.org/10.3390/s25175362 - 29 Aug 2025
Viewed by 723
Abstract
Unmanned aerial vehicles (UAVs) have become indispensable tools for surveillance, enabled by their ability to capture multi-perspective imagery in dynamic environments. Among critical UAV-based tasks, cross-platform person search—detecting and identifying individuals across distributed camera networks—presents unique challenges. Severe viewpoint variations, occlusions, and cluttered backgrounds in UAV-captured data degrade the performance of conventional discriminative models, which struggle to maintain robustness under such geometric and semantic disparities. To address this, we propose view-invariant person search (VIPS), a novel two-stage framework combining Faster R-CNN with a view-invariant re-Identification (VIReID) module. Unlike conventional discriminative models, VIPS leverages the semantic flexibility of large vision–language models (VLMs) and adopts a two-stage training strategy to decouple and align text-based ID descriptors and visual features, enabling robust cross-view matching through shared semantic embeddings. To mitigate noise from occlusions and cluttered UAV-captured backgrounds, we introduce a learnable mask generator for feature purification. Furthermore, drawing from vision–language models, we design view prompts to explicitly encode perspective shifts into feature representations, enhancing adaptability to UAV-induced viewpoint changes. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, with ablation studies validating the efficacy of each component. Beyond technical advancements, this work highlights the potential of VLM-derived semantic alignment for UAV applications, offering insights for future research in real-time UAV-based surveillance systems. Full article
(This article belongs to the Section Remote Sensors)
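
Two components mentioned above lend themselves to a brief sketch: a learnable mask generator that gates feature maps to suppress background clutter, and cross-view matching by cosine similarity between visual embeddings and text-based ID descriptors in a shared space. Module sizes and names are assumed for illustration and do not reproduce the VIPS architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskGenerator(nn.Module):
    """Illustrative feature-purification mask: 1x1 conv + sigmoid gate over feature maps."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        mask = torch.sigmoid(self.gate(feat))   # (B, 1, H, W) soft mask
        return feat * mask                      # suppress background/occlusion responses

def cross_view_match(img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity matching of image embeddings against text-based ID descriptors."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    return img_emb @ txt_emb.t()                # (num_queries, num_ids) similarity matrix
```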

25 pages, 24232 KB  
Article
Topology-Aware Multi-View Street Scene Image Matching for Cross-Daylight Conditions Integrating Geometric Constraints and Semantic Consistency
by Haiqing He, Wenbo Xiong, Fuyang Zhou, Zile He, Tao Zhang and Zhiyuan Sheng
ISPRS Int. J. Geo-Inf. 2025, 14(6), 212; https://doi.org/10.3390/ijgi14060212 - 29 May 2025
Cited by 3 | Viewed by 1091
Abstract
While deep learning-based image matching methods excel at extracting high-level semantic features from remote sensing data, their performance degrades significantly under cross-daylight conditions and wide-baseline geometric distortions, particularly in multi-source street-view scenarios. This paper presents a novel illumination-invariant framework that synergistically integrates geometric topology and semantic consistency to achieve robust multi-view matching for cross-daylight urban perception. We first design a self-supervised learning paradigm to extract illumination-agnostic features by jointly optimizing local descriptors and global geometric structures across multi-view images. To address extreme perspective variations, a homography-aware transformation module is introduced to stabilize feature representation under large viewpoint changes. Leveraging a graph neural network with hierarchical attention mechanisms, our method dynamically aggregates contextual information from both local keypoints and semantic topology graphs, enabling precise matching in occluded regions and repetitive-textured urban scenes. A dual-branch learning strategy further refines similarity metrics through supervised patch alignment and unsupervised spatial consistency constraints derived from Delaunay triangulation. Finally, a topology-guided multi-plane expansion mechanism propagates initial matches by exploiting the inherent structural regularity of street scenes, effectively suppressing mismatches while expanding coverage. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods, achieving a 6.4% improvement in matching accuracy and a 30.5% reduction in mismatches under cross-daylight conditions. These advancements establish a new benchmark for reliable multi-source image retrieval and localization in dynamic urban environments, with direct applications in autonomous driving systems and large-scale 3D city reconstruction. Full article
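
A hedged sketch of the Delaunay-based spatial-consistency idea: triangulate the matched keypoints in one image and keep matches whose incident triangulation edges have consistent lengths at the matched points in the other image. The threshold, the majority vote, and the assumption of roughly scale-preserving geometry are simplifications of the paper's constraint.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_consistency_filter(pts_a, pts_b, ratio_tol=0.3):
    """Keep matches whose Delaunay edges (built on image A) have consistent
    lengths when mapped to the matched points in image B."""
    tri = Delaunay(pts_a)
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            edges.add(tuple(sorted((int(simplex[i]), int(simplex[(i + 1) % 3])))))
    votes = np.zeros(len(pts_a))
    counts = np.zeros(len(pts_a))
    for i, j in edges:
        la = np.linalg.norm(pts_a[i] - pts_a[j])
        lb = np.linalg.norm(pts_b[i] - pts_b[j])
        consistent = abs(la - lb) / max(la, lb, 1e-8) < ratio_tol
        for k in (i, j):
            votes[k] += consistent
            counts[k] += 1
    return votes / np.maximum(counts, 1) > 0.5   # majority of incident edges consistent
```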

19 pages, 3011 KB  
Article
A Novel Object-Level Building-Matching Method across 2D Images and 3D Point Clouds Based on the Signed Distance Descriptor (SDD)
by Chunhui Zhao, Wenxuan Wang, Yiming Yan, Nan Su, Shou Feng, Wei Hou and Qingyu Xia
Remote Sens. 2023, 15(12), 2974; https://doi.org/10.3390/rs15122974 - 7 Jun 2023
Cited by 1 | Viewed by 2595
Abstract
In this work, a novel object-level building-matching method using cross-dimensional data, including 2D images and 3D point clouds, is proposed. The core of this method is a newly proposed plug-and-play Joint Descriptor Extraction Module (JDEM) that is used to extract descriptors containing buildings’ three-dimensional shape information from object-level remote sensing data of different dimensions for matching. The descriptor is named Signed Distance Descriptor (SDD). Due to differences in the inherent properties of different dimensional data, it is challenging to match buildings’ 2D images and 3D point clouds on the object level. In addition, features extracted from the same building in images taken at different angles are usually not exactly identical, which will also affect the accuracy of cross-dimensional matching. Therefore, the question of how to extract accurate, effective, and robust joint descriptors is key to cross-dimensional matching. Our JDEM maps different dimensions of data to the same 3D descriptor SDD space through the 3D geometric invariance of buildings. In addition, Multi-View Adaptive Loss (MAL), proposed in this paper, aims to improve the adaptability of the image encoder module to images with different angles and enhance the robustness of the joint descriptors. Moreover, a cross-dimensional object-level data set was created to verify the effectiveness of our method. The data set contains multi-angle optical images, point clouds, and the corresponding 3D models of more than 400 buildings. A large number of experimental results show that our object-level cross-dimensional matching method achieves state-of-the-art outcomes. Full article
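
To give a feel for the signed-distance representation that the SDD builds on, the sketch below computes a signed distance grid (negative inside, positive outside) for a 2-D building footprint using Shapely; the paper's learned 3-D descriptor space is, of course, far richer. Grid size, extent, and the footprint are placeholder values.

```python
import numpy as np
from shapely.geometry import Point, Polygon

def signed_distance_grid(footprint, grid_size=64, extent=2.0):
    """Signed distance field of a 2-D building footprint: negative inside, positive outside."""
    poly = Polygon(footprint)
    xs = np.linspace(-extent, extent, grid_size)
    ys = np.linspace(-extent, extent, grid_size)
    sdf = np.empty((grid_size, grid_size))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            p = Point(x, y)
            d = poly.exterior.distance(p)       # distance to the footprint boundary
            sdf[i, j] = -d if poly.contains(p) else d
    return sdf

# e.g. a unit-square footprint centred at the origin
sdf = signed_distance_grid([(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)])
```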

17 pages, 6416 KB  
Article
A Pseudoinverse Siamese Convolutional Neural Network of Transformation Invariance Feature Detection and Description for a SLAM System
by Chaofeng Yuan, Yuelei Xu, Jingjing Yang, Zhaoxiang Zhang and Qing Zhou
Machines 2022, 10(11), 1070; https://doi.org/10.3390/machines10111070 - 12 Nov 2022
Cited by 3 | Viewed by 2090
Abstract
Simultaneous localization and mapping (SLAM) systems play an important role in automated robotics and artificial intelligence. Feature detection and matching are crucial aspects affecting the overall accuracy of a SLAM system. However, detection and matching accuracy cannot be guaranteed under changes in viewing angle, illumination, texture, and similar conditions, and deep learning methods remain sensitive to perspective change because they lack invariance to geometric transformations. Therefore, a novel pseudo-Siamese convolutional network for transformation-invariant feature detection and description in SLAM systems is proposed in this paper. By learning transformation-invariant features and descriptors, the proposed method improves the front-end landmark detection and tracking module of the SLAM system. The input image is first converted to a transform field, and a backbone network extracts feature maps; a feature detection subnetwork and a feature description subnetwork are then designed on top of these maps, yielding a convolutional network for transformation-invariant feature detection and description for visual SLAM. Extensive experiments on multiple datasets demonstrate that our method achieves state-of-the-art global tracking performance compared to traditional visual SLAM systems. Full article
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
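
A minimal skeleton of a pseudo-Siamese detection-and-description network in PyTorch, shown only to clarify the layout: two branches with identical architecture but unshared weights, each producing a per-pixel keypoint score map and dense descriptors. Layer sizes are arbitrary and the transform-field preprocessing is omitted; this is not the paper's network.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One branch: tiny backbone + keypoint-score head + dense-descriptor head."""
    def __init__(self, desc_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.detect = nn.Conv2d(64, 1, 1)            # per-pixel keypoint score
        self.describe = nn.Conv2d(64, desc_dim, 1)   # per-pixel descriptor

    def forward(self, x):
        f = self.backbone(x)
        return torch.sigmoid(self.detect(f)), nn.functional.normalize(self.describe(f), dim=1)

class PseudoSiamese(nn.Module):
    """Pseudo-Siamese: same architecture, unshared weights for the two views."""
    def __init__(self):
        super().__init__()
        self.branch_a, self.branch_b = Branch(), Branch()

    def forward(self, img_a, img_b):
        return self.branch_a(img_a), self.branch_b(img_b)
```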

25 pages, 16837 KB  
Article
Automatic Registration for Panoramic Images and Mobile LiDAR Data Based on Phase Hybrid Geometry Index Features
by Genyi Wan, Yong Wang, Tao Wang, Ningning Zhu, Ruizhuo Zhang and Ruofei Zhong
Remote Sens. 2022, 14(19), 4783; https://doi.org/10.3390/rs14194783 - 24 Sep 2022
Cited by 6 | Viewed by 3701
Abstract
The registration of panoramic images and mobile light detection and ranging (LiDAR) data is quite challenging because different imaging mechanisms and viewing angle differences generate significant geometric and radiation distortions between the two multimodal data sources. To address this problem, we propose a registration method for panoramic images and mobile LiDAR data based on the hybrid geometric structure index feature of phase. We use the initial GPS/IMU to transform the mobile LiDAR data into an intensity map and align the two images to complete registration. Firstly, a novel feature descriptor called a hybrid geometric structure index of phase (HGIFP) is built to capture the structural information of the images. Then, a set of corresponding feature points is obtained from the two images using the constructed feature descriptor combined with a robust false-match elimination algorithm. The average pixel distance of the corresponding feature points is used as the error function. Finally, in order to complete the accurate registration of the mobile LiDAR data and panoramic images and improve computational efficiency, we propose the assumption of local motion invariance of 3D–2D corresponding feature points and minimize the error function through multiple reprojections to achieve the best registration parameters. The experimental results show that the method in this paper can complete the registration of panoramic images and the mobile LiDAR data under a rotation error within 12° and a translation error within 2 m. After registration, the average error of rotation is about 0.15°, and the average error of translation is about 1.27 cm. Moreover, it achieves a registration accuracy of less than 3 pixels in all cases, which outperforms the current five state-of-the-art methods, demonstrating its superior registration performance. Full article
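
The error function described above is the average pixel distance of reprojected 3D–2D correspondences; the sketch below assumes an equirectangular panorama model and a standard camera-frame axis convention (both assumptions, since the paper's exact projection model is not reproduced here).

```python
import numpy as np

def project_equirectangular(points_world, R, t, width, height):
    """Project 3-D points into an equirectangular panorama given rotation R and translation t."""
    p = (R @ points_world.T).T + t                     # world -> camera frame
    r = np.linalg.norm(p, axis=1)
    lon = np.arctan2(p[:, 0], p[:, 2])                 # azimuth in [-pi, pi]
    lat = np.arcsin(np.clip(p[:, 1] / r, -1.0, 1.0))   # elevation in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * width
    v = (lat / np.pi + 0.5) * height
    return np.stack([u, v], axis=1)

def mean_reprojection_error(points_world, pixels_obs, R, t, width, height):
    """Average pixel distance used as the registration error function."""
    proj = project_equirectangular(points_world, R, t, width, height)
    return float(np.mean(np.linalg.norm(proj - pixels_obs, axis=1)))
```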

20 pages, 10652 KB  
Article
An Efficient Point-Matching Method Based on Multiple Geometrical Hypotheses
by Miguel Carrasco, Domingo Mery, Andrés Concha, Ramiro Velázquez, Roberto De Fazio and Paolo Visconti
Electronics 2021, 10(3), 246; https://doi.org/10.3390/electronics10030246 - 22 Jan 2021
Cited by 4 | Viewed by 4095
Abstract
Point matching in multiple images is an open problem in computer vision because of the numerous geometric transformations and photometric conditions that a pixel or point might exhibit in the set of images. Over the last two decades, different techniques have been proposed to address this problem. The most relevant are those that explore the analysis of invariant features. Nonetheless, their main limitation is that invariant analysis all alone cannot reduce false alarms. This paper introduces an efficient point-matching method for two and three views, based on the combined use of two techniques: (1) the correspondence analysis extracted from the similarity of invariant features and (2) the integration of multiple partial solutions obtained from 2D and 3D geometry. The main strength and novelty of this method is the determination of the point-to-point geometric correspondence through the intersection of multiple geometrical hypotheses weighted by the maximum likelihood estimation sample consensus (MLESAC) algorithm. The proposal not only extends the methods based on invariant descriptors but also generalizes the correspondence problem to a perspective projection model in multiple views. The developed method has been evaluated on three types of image sequences: outdoor, indoor, and industrial. Our developed strategy discards most of the wrong matches and achieves remarkable F-scores of 97%, 87%, and 97% for the outdoor, indoor, and industrial sequences, respectively. Full article
(This article belongs to the Special Issue Applications of Computer Vision)
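
The hypothesis weighting relies on MLESAC, which scores a geometric hypothesis by the likelihood of its residuals under a Gaussian-inlier / uniform-outlier mixture rather than by an inlier count. A generic sketch of that scoring follows; the noise scale, outlier range, and EM schedule are assumed parameters, not values from the paper.

```python
import numpy as np

def mlesac_score(residuals, sigma=1.0, outlier_range=100.0, max_em_iters=20):
    """MLESAC log-likelihood of residuals under a Gaussian-inlier / uniform-outlier mixture.
    The inlier fraction gamma is estimated with a few EM iterations."""
    r2 = residuals ** 2
    gauss = np.exp(-r2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    uniform = 1.0 / outlier_range
    gamma = 0.5
    for _ in range(max_em_iters):
        p_in = gamma * gauss
        p_out = (1 - gamma) * uniform
        w = p_in / (p_in + p_out + 1e-12)      # responsibility of the inlier component
        gamma = w.mean()
    return float(np.sum(np.log(gamma * gauss + (1 - gamma) * uniform + 1e-12)))
```

Hypotheses (e.g., candidate epipolar or trifocal constraints) with higher scores would contribute more weight when the partial solutions are intersected.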

17 pages, 6626 KB  
Article
Template Matching for Wide-Baseline Panoramic Images from a Vehicle-Borne Multi-Camera Rig
by Shunping Ji, Dawen Yu, Yong Hong and Meng Lu
ISPRS Int. J. Geo-Inf. 2018, 7(7), 236; https://doi.org/10.3390/ijgi7070236 - 21 Jun 2018
Cited by 2 | Viewed by 4443
Abstract
Automatic detection and localization of objects such as poles, traffic signs, and building corners in street scenes captured by a mobile mapping system has many applications. Template matching is a technique that can automatically recognise the counterparts or correspondents of an object across multi-view images. In this study, we aim to find correspondents of an object in wide-baseline panoramic images that exhibit large geometric deformations from sphere projection and significant systematic errors from the multi-camera rig geometry. Firstly, we deduce the camera model and epipolar model of a multi-camera rig system. Then, epipolar errors are analysed to determine the search area for pixelwise matching. A low-cost laser scanner is optionally used to constrain the depth of an object. Lastly, several classic feature descriptors are introduced to template matching and evaluated on the multi-view panoramic image dataset. We propose a template matching method combining a fast variant of the scale-invariant feature transform (SIFT) descriptor. Our method experimentally achieved the best performance in terms of accuracy and efficiency compared to other feature descriptors and the most recent robust template matching methods. Full article
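
A small sketch of the final stage under these constraints: normalized cross-correlation template matching restricted to the search window derived from the epipolar and depth analysis, using OpenCV. The window coordinates and image variables are placeholders.

```python
import cv2
import numpy as np

def match_in_window(template, target_img, window):
    """Template matching restricted to a search window.
    `window` = (x, y, w, h) in the target image, assumed large enough to contain the template."""
    x, y, w, h = window
    roi = target_img[y:y + h, x:x + w]
    scores = cv2.matchTemplate(roi, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    top_left = (x + max_loc[0], y + max_loc[1])   # back to full-image coordinates
    return top_left, float(max_val)
```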

19 pages, 1426 KB  
Article
Fall Detection for Elderly from Partially Observed Depth-Map Video Sequences Based on View-Invariant Human Activity Representation
by Rami Alazrai, Mohammad Momani and Mohammad I. Daoud
Appl. Sci. 2017, 7(4), 316; https://doi.org/10.3390/app7040316 - 24 Mar 2017
Cited by 34 | Viewed by 8625
Abstract
This paper presents a new approach for fall detection from partially-observed depth-map video sequences. The proposed approach utilizes the 3D skeletal joint positions obtained from the Microsoft Kinect sensor to build a view-invariant descriptor for human activity representation, called the motion-pose geometric descriptor (MPGD). Furthermore, we have developed a histogram-based representation (HBR) based on the MPGD to construct a length-independent representation of the observed video subsequences. Using the constructed HBR, we formulate the fall detection problem as a posterior-maximization problem in which the posterior probability for each observed video subsequence is estimated using a multi-class SVM (support vector machine) classifier. Then, we combine the computed posterior probabilities from all of the observed subsequences to obtain an overall class posterior probability for the entire partially-observed depth-map video sequence. To evaluate the performance of the proposed approach, we utilized the Kinect sensor to record a dataset of depth-map video sequences that simulates four fall-related activities of elderly people: walking, sitting, falling from standing, and falling from sitting. Then, using the collected dataset, we developed three evaluation scenarios based on the number of unobserved video subsequences in the testing videos: a fully-observed video sequence scenario, a scenario with a single unobserved video subsequence of random length, and a scenario with two unobserved video subsequences of random lengths. Experimental results show that the proposed approach achieved average recognition accuracies of 93.6%, 77.6%, and 65.1% in the first, second, and third evaluation scenarios, respectively. These results demonstrate the feasibility of the proposed approach for detecting falls from partially-observed videos. Full article
(This article belongs to the Special Issue Human Activity Recognition)
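
The combination of per-subsequence posteriors can be illustrated with scikit-learn: a probability-calibrated multi-class SVM scores each observed subsequence, and the sequence-level class is the one maximizing the combined posterior. Combining by the mean of log-probabilities is an assumption here; the abstract only states posterior maximization.

```python
import numpy as np
from sklearn.svm import SVC

def train_activity_svm(X_train, y_train):
    """Multi-class SVM with calibrated posterior probabilities over HBR feature vectors."""
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X_train, y_train)
    return clf

def classify_sequence(clf, subsequence_features):
    """Combine per-subsequence posteriors (mean log-probability here) and return
    the maximum-a-posteriori activity class for the partially observed sequence."""
    log_post = np.log(clf.predict_proba(subsequence_features) + 1e-12)
    return clf.classes_[np.argmax(log_post.mean(axis=0))]
```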

21 pages, 11246 KB  
Article
Automatic Registration Method for Optical Remote Sensing Images with Large Background Variations Using Line Segments
by Xiaolong Shi and Jie Jiang
Remote Sens. 2016, 8(5), 426; https://doi.org/10.3390/rs8050426 - 19 May 2016
Cited by 25 | Viewed by 6608
Abstract
Image registration is an essential step in the process of image fusion, environment surveillance and change detection. Finding correct feature matches during the registration process proves to be difficult, especially for remote sensing images with large background variations (e.g., images taken pre and post an earthquake or flood). Traditional registration methods based on local intensity probably cannot maintain steady performances, as differences are significant in the same area of the corresponding images, and ground control points are not always available in many disaster images. In this paper, an automatic image registration method based on the line segments on the main shape contours (e.g., coastal lines, long roads and mountain ridges) is proposed for remote sensing images with large background variations because the main shape contours can hold relatively more invariant information. First, a line segment detector called EDLines (Edge Drawing Lines), which was proposed by Akinlar et al. in 2011, is used to extract line segments from two corresponding images, and a line validation step is performed to remove meaningless and fragmented line segments. Then, a novel line segment descriptor with a new histogram binning strategy, which is robust to global geometrical distortions, is generated for each line segment based on the geometrical relationships, including both the locations and orientations of the remaining line segments relative to it. As a result of the invariance of the main shape contours, correct line segment matches will have similar descriptors and can be obtained by cross-matching among the descriptors. Finally, a spatial consistency measure is used to remove incorrect matches, and transformation parameters between the reference and sensed images can be figured out. Experiments with images from different types of satellite datasets, such as Landsat7, QuickBird, WorldView, and so on, demonstrate that the proposed algorithm is automatic, fast (4 ms faster than the second fastest method, i.e., the rotation- and scale-invariant shape context) and can achieve a recall of 79.7%, a precision of 89.1% and a root mean square error (RMSE) of 1.0 pixels on average for remote sensing images with large background variations. Full article
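
A hedged sketch of the descriptor idea: for each reference segment, histogram the relative bearings and relative orientations of the remaining segments with respect to the reference's midpoint and direction; matching then reduces to nearest-neighbour search over these histograms. Bin counts and normalization below are illustrative, not the paper's binning strategy.

```python
import numpy as np

def segment_descriptor(ref_seg, other_segs, n_loc_bins=8, n_ori_bins=8):
    """Histogram descriptor: for every other segment, bin the bearing of its midpoint
    relative to the reference midpoint and its orientation relative to the reference
    direction. Each segment is a 4-vector (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = ref_seg
    mid = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    ref_ori = np.arctan2(y2 - y1, x2 - x1)
    hist = np.zeros((n_loc_bins, n_ori_bins))
    for ox1, oy1, ox2, oy2 in other_segs:
        omid = np.array([(ox1 + ox2) / 2.0, (oy1 + oy2) / 2.0])
        d = omid - mid
        loc = (np.arctan2(d[1], d[0]) - ref_ori) % (2 * np.pi)     # relative bearing
        ori = (np.arctan2(oy2 - oy1, ox2 - ox1) - ref_ori) % np.pi  # relative orientation
        i = int(loc / (2 * np.pi) * n_loc_bins) % n_loc_bins
        j = int(ori / np.pi * n_ori_bins) % n_ori_bins
        hist[i, j] += 1
    return (hist / max(hist.sum(), 1)).ravel()
```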

20 pages, 1880 KB  
Article
Characteristic Number: Theory and Its Application to Shape Analysis
by Xin Fan, Zhongxuan Luo, Jielin Zhang, Xinchen Zhou, Qi Jia and Daiyun Luo
Axioms 2014, 3(2), 202-221; https://doi.org/10.3390/axioms3020202 - 15 May 2014
Cited by 7 | Viewed by 6905
Abstract
Geometric invariants are important for shape recognition and matching. Existing invariants in projective geometry are typically defined on a limited number of collinear planar points (e.g., five for the classical cross-ratio) and also lack the ability to characterize the curve or surface underlying the given points. In this paper, we present a projective invariant named after the characteristic number of planar algebraic curves. The characteristic number in this work reveals an intrinsic property of an algebraic hypersurface or curve and, unlike its planar version, no longer relies on the existence of the underlying surface or curve. The new definition also generalizes the cross-ratio by relaxing the collinearity requirement and the restriction on the number of points. We employ the characteristic number to construct more informative shape descriptors that improve the performance of shape recognition, especially when severe affine and perspective deformations occur. In addition to the application to shape recognition, we incorporate geometric constraints on facial feature points derived from the characteristic number into facial feature matching. The experiments show improvements in accuracy and robustness to pose and view changes over the method based on collinearity and cross-ratio constraints. Full article
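
The property being generalized is the projective invariance of the cross-ratio of four collinear points, which a short numeric check makes concrete (the homography coefficients below are arbitrary):

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """Cross-ratio (AC/BC)/(AD/BD) of four collinear points given as scalars along the line."""
    return ((c - a) / (c - b)) / ((d - a) / (d - b))

def homography_1d(x, h=(2.0, 1.0, 0.5, 3.0)):
    """Projective map x -> (a*x + b) / (c*x + d); coefficients are arbitrary for the demo."""
    a, b, c, d = h
    return (a * x + b) / (c * x + d)

pts = np.array([0.0, 1.0, 2.0, 4.0])
print(cross_ratio(*pts))                  # cross-ratio before the projective map
print(cross_ratio(*homography_1d(pts)))   # identical value after the map
```

Both printed values are 1.5, confirming the invariance that the characteristic number extends beyond collinear configurations.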
