Search Results (16)

Search Parameters:
Keywords = NetVLAD

25 pages, 16833 KB  
Article
R2SCAT-LPR: Rotation-Robust Network with Self- and Cross-Attention Transformers for LiDAR-Based Place Recognition
by Weizhong Jiang, Hanzhang Xue, Shubin Si, Liang Xiao, Dawei Zhao, Qi Zhu, Yiming Nie and Bin Dai
Remote Sens. 2025, 17(6), 1057; https://doi.org/10.3390/rs17061057 - 17 Mar 2025
Cited by 2 | Viewed by 926
Abstract
LiDAR-based place recognition (LPR) is crucial for the navigation and localization of autonomous vehicles and mobile robots in large-scale outdoor environments and plays a critical role in loop closure detection for simultaneous localization and mapping (SLAM). Existing LPR methods, which utilize 2D bird’s-eye view (BEV) projections of 3D point clouds, achieve competitive performance in efficiency and recognition accuracy. However, these methods often struggle with capturing global contextual information and maintaining robustness to viewpoint variations. To address these challenges, we propose R2SCAT-LPR, a novel transformer-based model that leverages self-attention and cross-attention mechanisms to extract rotation-robust place feature descriptors from BEV images. R2SCAT-LPR consists of three core modules: (1) R2MPFE, which employs weight-shared cascaded multi-head self-attention (MHSA) to extract multi-level spatial contextual patch features from both the original BEV image and its randomly rotated counterpart; (2) DSCA, which integrates dual-branch self-attention and multi-head cross-attention (MHCA) to capture intrinsic correspondences between multi-level patch features before and after rotation, enhancing the extraction of rotation-robust local features; and (3) a combined NetVLAD module, which aggregates patch features from both the original feature space and the rotated interaction space into a compact and viewpoint-robust global descriptor. Extensive experiments conducted on the KITTI and NCLT datasets validate the effectiveness of the proposed model, demonstrating its robustness to rotation variations and its generalization ability across diverse scenes and LiDAR sensor types. Furthermore, we evaluate the generalization performance and computational efficiency of R2SCAT-LPR on our self-constructed OffRoad-LPR dataset for off-road autonomous driving, verifying its deployability on resource-constrained platforms. Full article
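
Several of the results below build on the NetVLAD aggregation layer. As background, here is a minimal PyTorch sketch of such a layer, which turns dense local features into a single L2-normalized global descriptor; it follows the original NetVLAD formulation, with illustrative defaults for the cluster count and feature dimension rather than this paper's configuration.

```python
# Minimal NetVLAD aggregation layer (background sketch; cluster count and
# feature dimension are illustrative, not taken from R2SCAT-LPR).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    def __init__(self, num_clusters: int = 64, dim: int = 256):
        super().__init__()
        self.num_clusters = num_clusters
        # 1x1 conv produces per-location soft assignments to the K clusters
        self.assign = nn.Conv2d(dim, num_clusters, kernel_size=1)
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, D, H, W) dense local features
        B, D, _, _ = x.shape
        K = self.num_clusters
        soft_assign = F.softmax(self.assign(x).view(B, K, -1), dim=1)  # (B, K, N)
        x_flat = x.view(B, D, -1)                                      # (B, D, N)
        # residuals between every local feature and every cluster centroid
        residual = x_flat.unsqueeze(1) - self.centroids.view(1, K, D, 1)
        vlad = (soft_assign.unsqueeze(2) * residual).sum(dim=-1)       # (B, K, D)
        vlad = F.normalize(vlad, dim=2)                # intra-normalization
        return F.normalize(vlad.view(B, -1), dim=1)    # (B, K*D) global descriptor
```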

16 pages, 2096 KB  
Article
CamGNN: Cascade Graph Neural Network for Camera Re-Localization
by Li Wang, Jiale Jia, Hualin Dai and Guoyan Li
Electronics 2024, 13(9), 1734; https://doi.org/10.3390/electronics13091734 - 1 May 2024
Viewed by 1633
Abstract
In response to the inaccurate positioning of traditional camera relocation methods in scenes with large-scale or severe viewpoint changes, this study proposes a camera relocation method based on a cascaded graph neural network to achieve accurate scene relocation. First, the NetVLAD retrieval method, which has advantages in image feature representation and similarity calculation, is used to retrieve the images most similar to a given query image. Then, a feature pyramid is employed to extract features of these images at different scales, and the features at the same scale are treated as nodes of a graph neural network to construct a single-layer graph structure. Next, a top–down connection is used to cascade the single-layer graph structures, where the information of nodes in the previous graph is fused into a message node to improve the accuracy of camera pose estimation. To better capture the topological relationships and spatial geometric constraints between images, an attention mechanism is introduced in the single-layer graph structure, which helps to effectively propagate information to the next graph during the cascading process, thereby enhancing the robustness of camera relocation. Experimental results on the public 7-Scenes dataset demonstrate that the proposed method can effectively improve the accuracy of absolute camera pose localization, with average translation and rotation errors of 0.19 m and 6.9°, respectively. Compared to other deep learning-based methods, the proposed method achieves more than a 10% improvement in both average translation and rotation accuracy, demonstrating highly competitive localization precision. Full article
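
The retrieval step described above reduces to nearest-neighbor search over global descriptors. A minimal sketch, assuming descriptors have already been extracted (e.g., by a NetVLAD layer like the one sketched earlier); the function name and k are illustrative:

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(query_desc: torch.Tensor, db_descs: torch.Tensor, k: int = 5):
    """Indices of the k database images most similar to the query.
    query_desc: (D,), db_descs: (N, D)."""
    q = F.normalize(query_desc, dim=-1)
    db = F.normalize(db_descs, dim=-1)
    sims = db @ q                      # cosine similarities, (N,)
    return torch.topk(sims, k).indices
```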

20 pages, 5360 KB  
Article
An Appearance-Semantic Descriptor with Coarse-to-Fine Matching for Robust VPR
by Jie Chen, Wenbo Li, Pengshuai Hou, Zipeng Yang and Haoyu Zhao
Sensors 2024, 24(7), 2203; https://doi.org/10.3390/s24072203 - 29 Mar 2024
Cited by 1 | Viewed by 1372
Abstract
In recent years, semantic segmentation has made significant progress in visual place recognition (VPR) by using semantic information that is relatively invariant to appearance and viewpoint, demonstrating great potential. However, in some extreme scenarios, there may be semantic occlusion and semantic sparsity, which can lead to confusion when relying solely on semantic information for localization. Therefore, this paper proposes a novel VPR framework that employs a coarse-to-fine image matching strategy, combining semantic and appearance information to improve algorithm performance. First, we construct SemLook global descriptors using semantic contours, which can preliminarily screen images to enhance the accuracy and real-time performance of the algorithm. Based on this, we introduce SemLook local descriptors for fine screening, combining robust appearance information extracted by deep learning with semantic information. These local descriptors can address issues such as semantic overlap and sparsity in urban environments, further improving the accuracy of the algorithm. Through this refined screening process, we can effectively handle the challenges of complex image matching in urban environments and obtain more accurate results. The performance of SemLook descriptors is evaluated on three public datasets (Extended-CMU Season, Robot-Car Seasons v2, and SYNTHIA) and compared with six state-of-the-art VPR algorithms (HOG, CoHOG, AlexNet_VPR, Region VLAD, Patch-NetVLAD, Forest). In the experimental comparison, considering both real-time performance and evaluation metrics, the SemLook descriptors are found to outperform the other six algorithms. Evaluation metrics include the area under the curve (AUC) based on the precision–recall curve, Recall@100%Precision, and Precision@100%Recall. On the Extended-CMU Season dataset, SemLook descriptors achieve a 100% AUC value, and on the SYNTHIA dataset, they achieve a 99% AUC value, demonstrating outstanding performance. The experimental results indicate that introducing global descriptors for initial screening and utilizing local descriptors combining both semantic and appearance information for precise matching can effectively address the issue of location recognition in scenarios with semantic ambiguity or sparsity. This algorithm enhances descriptor performance, making it more accurate and robust in scenes with variations in appearance and viewpoint. Full article
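
The coarse-to-fine strategy described above (cheap global screening followed by more expensive local matching on a shortlist) can be outlined as below. This is a generic sketch assuming L2-normalized global descriptors; `local_score_fn` stands in for SemLook's semantic-plus-appearance local matching, whose details are the paper's own.

```python
import numpy as np

def coarse_to_fine_match(query_global, db_globals, query_local, db_locals,
                         local_score_fn, top_n: int = 20) -> int:
    """Return the index of the best-matching database image.
    Coarse stage: cosine similarity on L2-normalized global descriptors.
    Fine stage: run the expensive local-descriptor score only on a shortlist."""
    sims = db_globals @ query_global                 # (N,) global similarities
    shortlist = np.argsort(-sims)[:top_n]            # top-N candidates
    fine_scores = [local_score_fn(query_local, db_locals[i]) for i in shortlist]
    return int(shortlist[int(np.argmax(fine_scores))])
```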

13 pages, 839 KB  
Article
Contextual Patch-NetVLAD: Context-Aware Patch Feature Descriptor and Patch Matching Mechanism for Visual Place Recognition
by Wenyuan Sun, Wentang Chen, Runxiang Huang and Jing Tian
Sensors 2024, 24(3), 855; https://doi.org/10.3390/s24030855 - 28 Jan 2024
Cited by 4 | Viewed by 3056
Abstract
The goal of visual place recognition (VPR) is to determine the location of a query image by identifying its place in a collection of image databases. Visual sensor technologies are crucial for visual place recognition as they allow for precise identification and location of query images within a database. Global descriptor-based VPR methods face the challenge of accurately capturing locally specific regions within a scene, which increases the probability of confusion during localization. To tackle feature extraction and feature matching challenges in VPR, we propose a modified patch-NetVLAD strategy that includes two new modules: a context-aware patch descriptor and a context-aware patch matching mechanism. Firstly, we propose a context-driven patch feature descriptor to overcome the limitations of global and local descriptors in visual place recognition. This descriptor aggregates features from each patch’s surrounding neighborhood. Secondly, we introduce a context-driven feature matching mechanism that utilizes cluster and saliency context-driven weighting rules to assign higher weights to patches that are less similar to densely populated or locally similar regions, improving localization performance. We further incorporate both of these modules into the patch-NetVLAD framework, resulting in a new approach called contextual patch-NetVLAD. Experimental results show that our proposed approach outperforms other state-of-the-art methods, achieving a Recall@10 score of 99.82 on Pittsburgh30k, 99.82 on FMDataset, and 97.68 on our benchmark dataset. Full article
(This article belongs to the Special Issue Vision Sensors: Image Processing Technologies and Applications)
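
One plausible instantiation of a descriptor that "aggregates features from each patch's surrounding neighborhood" is to pool every patch descriptor with its 3×3 neighbors and concatenate the result, as in the hedged sketch below; this illustrates the idea, not the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def contextual_patch_descriptors(patch_feats: torch.Tensor) -> torch.Tensor:
    """patch_feats: (B, D, H, W) grid of per-patch descriptors.
    Concatenate each patch's own feature with the mean of its 3x3
    neighborhood, injecting surrounding context (illustrative variant)."""
    context = F.avg_pool2d(patch_feats, kernel_size=3, stride=1, padding=1)
    return F.normalize(torch.cat([patch_feats, context], dim=1), dim=1)
```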

27 pages, 15549 KB  
Article
Loop Closure Detection Based on Compressed ConvNet Features in Dynamic Environments
by Shuhai Jiang, Zhongkai Zhou and Shangjie Sun
Appl. Sci. 2024, 14(1), 8; https://doi.org/10.3390/app14010008 - 19 Dec 2023
Cited by 2 | Viewed by 1603
Abstract
In dynamic environments, convolutional neural networks (CNNs) often produce image feature maps with significant redundancy due to external factors such as moving objects and occlusions. These feature maps are inadequate as precise image descriptors for similarity measurement, hindering loop closure detection. To address this issue, this paper proposes compressing the features output by the convolutional neural network. The approach is detailed as follows: (1) employing ResNet152 as the backbone feature-extraction network, a Siamese neural network is constructed to enhance the efficiency of feature extraction; (2) utilizing the Karhunen–Loève (KL) transform to extract principal components from the backbone network’s output, thereby eliminating redundant information; (3) employing the compressed features as input to NetVLAD to construct a spatially informed feature descriptor for similarity measurement. Experimental results demonstrate that, on the New College dataset, the proposed method exhibits an approximately 9.98% improvement in average accuracy compared to the original network. On the City Center dataset, there is an improvement of approximately 2.64%, with an overall increase of about 23.51% in time performance. These findings indicate that the enhanced ResNet152 performs better than the original network in environments with more moving objects and occlusions. Full article
(This article belongs to the Section Robotics and Automation)
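
Step (2) is essentially principal component analysis. A minimal sketch of KL/PCA compression of flattened CNN features; in practice the basis would likely be fitted on a training set rather than per batch as here, and the component count is a free parameter.

```python
import torch

def kl_compress(feats: torch.Tensor, num_components: int) -> torch.Tensor:
    """Karhunen-Loeve (PCA) compression. feats: (N, D) feature vectors;
    returns (N, num_components) projections onto the principal directions."""
    centered = feats - feats.mean(dim=0, keepdim=True)
    # rows of vh are the principal directions of the centered data
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    basis = vh[:num_components].T          # (D, num_components)
    return centered @ basis
```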

19 pages, 5782 KB  
Article
Landmark Topology Descriptor-Based Place Recognition and Localization under Large View-Point Changes
by Guanhong Gao, Zhi Xiong, Yao Zhao and Ling Zhang
Sensors 2023, 23(24), 9775; https://doi.org/10.3390/s23249775 - 12 Dec 2023
Cited by 1 | Viewed by 1445
Abstract
Accurate localization between cameras is a prerequisite for vision-based heterogeneous robot system tasks. The core issue is how to accurately perform place recognition from different view-points. Traditional appearance-based methods have a high probability of failure in place recognition and localization under large view-point changes. In recent years, semantic graph matching-based place recognition methods have been proposed to solve the above problem. However, these methods rely on high-precision semantic segmentation results and have a high time complexity in node extraction or graph matching. In addition, these methods only utilize the semantic labels of the landmarks themselves to construct graphs and descriptors, causing such approaches to fail in some challenging scenarios (e.g., scene repetition). In this paper, we propose a graph-matching method based on a novel landmark topology descriptor, which is robust to view-point changes. According to experiments on real-world data, our algorithm can run in real time and is approximately four times and three times faster than state-of-the-art algorithms in the graph extraction and matching phases, respectively. In terms of place recognition performance, our algorithm achieves the best place recognition precision at a recall of 0–70% compared with classic appearance-based algorithms and an advanced graph-based algorithm in scenes with significant view-point changes. In terms of positioning accuracy, compared to the traditional appearance-based DBoW2 and NetVLAD algorithms, our method improves the mean translation error by 95% on average and the mean RMSE by 95%. Compared to the state-of-the-art SHM algorithm, our method improves the mean translation error by 30% on average and the mean RMSE by 29%. In addition, our method outperforms the current state-of-the-art algorithm even in challenging scenarios where the benchmark algorithms fail. Full article
(This article belongs to the Section Sensors and Robotics)
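
As a toy illustration of a topology-style signature, one can describe each landmark not only by its own semantic label but also by the labels and relative distances of its nearest neighbors. This sketches the general idea only; the paper's actual descriptor and graph matching are more elaborate.

```python
import numpy as np

def landmark_topology_descriptor(centroids: np.ndarray, labels, k: int = 4):
    """centroids: (N, 3) landmark positions; labels: length-N semantic labels.
    Each landmark is summarized by its own label plus (label, distance) pairs
    of its k nearest neighbors -- a simple, viewpoint-tolerant signature."""
    labels = np.asarray(labels)
    descs = []
    for i, c in enumerate(centroids):
        d = np.linalg.norm(centroids - c, axis=1)
        nn = np.argsort(d)[1:k + 1]          # skip the landmark itself
        descs.append((labels[i], list(zip(labels[nn], np.round(d[nn], 2)))))
    return descs
```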

19 pages, 4252 KB  
Article
SaMfENet: Self-Attention Based Multi-Scale Feature Fusion Coding and Edge Information Constraint Network for 6D Pose Estimation
by Zhuoxiao Li, Xiaobing Li, Shihao Chen, Jialong Du and Yong Li
Mathematics 2022, 10(19), 3671; https://doi.org/10.3390/math10193671 - 7 Oct 2022
Cited by 2 | Viewed by 2648
Abstract
Accurate estimation of an object’s 6D pose is one of the crucial technologies for robotic manipulators. The task becomes even more challenging when lighting conditions change or the object is occluded, since object information is then missing or corrupted. To estimate the 6D pose of an object accurately, a self-attention-based multi-scale feature fusion coding and edge information constraint 6D pose estimation network is proposed, which achieves accurate 6D pose estimation from RGB-D images. The proposed algorithm first introduces an edge reconstruction module into the pose estimation network, which improves the attention of the feature extraction network to edge features. Furthermore, a self-attention multi-scale point cloud feature extraction module, i.e., MSPNet, is proposed to extract geometric features from point clouds reconstructed from depth maps. Finally, a clustering feature encoding module, i.e., SE-NetVLAD, is proposed to encode multi-modal dense feature sequences to construct more expressive global features. The proposed method is evaluated on the LineMOD and YCB-Video datasets, and the experimental results illustrate that the proposed method achieves outstanding performance, close to that of current state-of-the-art methods. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning)

17 pages, 4788 KB  
Article
Unifying Deep ConvNet and Semantic Edge Features for Loop Closure Detection
by Jie Jin, Jiale Bai, Yan Xu and Jiani Huang
Remote Sens. 2022, 14(19), 4885; https://doi.org/10.3390/rs14194885 - 30 Sep 2022
Cited by 3 | Viewed by 2200
Abstract
Loop closure detection is an important component of Simultaneous Localization and Mapping (SLAM). In this paper, a novel two-branch loop closure detection algorithm unifying deep Convolutional Neural Network (ConvNet) features and semantic edge features is proposed. In detail, we use one feature extraction module to extract both ConvNet and semantic edge features simultaneously. The deep ConvNet features are subjected to a Context Feature Enhancement (CFE) module in the global feature ranking branch to generate a representative global feature descriptor. Concurrently, to reduce the interference of dynamic features, the extracted semantic edge information of landmarks is encoded through the Vector of Locally Aggregated Descriptors (VLAD) framework in the semantic edge feature ranking branch to form semantic edge descriptors. Finally, semantic, visual, and geometric information is integrated by the similarity score fusion calculation. Extensive experiments on six public datasets show that the proposed approach can achieve competitive recall rates at 100% precision compared to other state-of-the-art methods. Full article
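
The final fusion step, combining per-candidate similarity scores from the global-feature branch and the semantic-edge branch, might look like the sketch below, where `alpha` is a hypothetical hand-set weight rather than the paper's actual fusion rule.

```python
import numpy as np

def rank_loop_candidates(global_scores, edge_scores, alpha: float = 0.6):
    """Fuse similarity scores from the two branches and rank loop-closure
    candidates (best first). alpha is illustrative, not the paper's value."""
    fused = alpha * np.asarray(global_scores) + (1 - alpha) * np.asarray(edge_scores)
    return np.argsort(-fused)
```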

11 pages, 1190 KB  
Article
Monocular Depth Estimation from a Single Infrared Image
by Daechan Han and Yukyung Choi
Electronics 2022, 11(11), 1729; https://doi.org/10.3390/electronics11111729 - 30 May 2022
Cited by 4 | Viewed by 4502
Abstract
Thermal infrared imaging is attracting much attention due to its robustness against illuminance variation. However, because of the spectral difference between thermal infrared images and RGB images, existing research on self-supervised monocular depth estimation has performance limitations. Therefore, in this study, we propose a novel Self-Guided Framework using a Pseudolabel predicted from RGB images. Our proposed framework, which addresses the appearance-matching-loss problem of existing frameworks, transfers the high accuracy of the Pseudolabel to the thermal depth estimation network by comparing low- and high-level pixels. Furthermore, we propose a Patch-NetVLAD Loss, which strengthens local detail and global context information in the depth map from thermal infrared imaging by comparing locally-global patch-level descriptors. Finally, we introduce an Image Matching Loss to estimate a more accurate depth map in the thermal depth network by enhancing the performance of the Pseudolabel. We demonstrate that the proposed framework shows significant performance improvements even when applied to various depth networks on the KAIST Multispectral Dataset. Full article
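
To make the patch-level comparison concrete, below is a heavily simplified sketch of a patch-descriptor consistency loss between a predicted thermal depth map and its Pseudolabel. The paper's Patch-NetVLAD Loss uses learned locally-global descriptors; this version only illustrates the patch-wise structure, using raw normalized patches.

```python
import torch
import torch.nn.functional as F

def patch_consistency_loss(pred_depth: torch.Tensor,
                           pseudo_depth: torch.Tensor,
                           patch: int = 8) -> torch.Tensor:
    """pred_depth, pseudo_depth: (B, 1, H, W). Split both maps into patches,
    L2-normalize each patch as a crude local descriptor, and penalize
    descriptor disagreement (1 - cosine similarity), averaged over patches."""
    def descriptors(d):
        p = F.unfold(d, kernel_size=patch, stride=patch)  # (B, patch*patch, N)
        return F.normalize(p, dim=1)
    cos = (descriptors(pred_depth) * descriptors(pseudo_depth)).sum(dim=1)
    return (1.0 - cos).mean()
```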

17 pages, 25326 KB  
Article
Fine-Grained Pests Recognition Based on Truncated Probability Fusion Network via Internet of Things in Forestry and Agricultural Scenes
by Kai Ma, Ming-Jun Nie, Sen Lin, Jianlei Kong, Cheng-Cai Yang and Jinhao Liu
Algorithms 2021, 14(10), 290; https://doi.org/10.3390/a14100290 - 30 Sep 2021
Cited by 3 | Viewed by 3044
Abstract
Accurate identification of insect pests is key to improving crop yield and ensuring quality and safety. However, under the influence of environmental conditions, the same kind of pest shows obvious intraclass differences in appearance, while different kinds of pests can appear deceptively similar. Traditional methods struggle with such fine-grained pest identification and are difficult to deploy in practice. To solve this problem, this paper uses a variety of terminal devices in the agricultural Internet of Things to obtain a large number of pest images and proposes a fine-grained pest identification model based on the probability fusion network FPNT. This model designs a fine-grained feature extractor based on an optimized CSPNet backbone network, mining different levels of local feature expression that can distinguish subtle differences. After the integration of the NetVLAD aggregation layer, the gated probability fusion layer gives full play to the advantages of information complementarity and confidence coupling of multi-model fusion. Comparative tests show that the FPNT model achieves an average recognition accuracy of 93.18% across all kinds of pests, outperforming other deep-learning methods, with the average processing time dropping to 61 ms. It can thus meet the needs of fine-grained pest image recognition in agricultural and forestry Internet of Things practice and provides a technical reference for intelligent early warning and prevention of pests. Full article
(This article belongs to the Special Issue Algorithms for Machine Learning and Pattern Recognition Tasks)
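
The "confidence coupling" idea, letting each model's certainty gate its contribution to the fused class probabilities, can be sketched as follows. The gating here is hand-set for illustration; the paper's gated fusion layer is learned.

```python
import torch

def gated_probability_fusion(probs_a: torch.Tensor,
                             probs_b: torch.Tensor) -> torch.Tensor:
    """probs_a, probs_b: (B, C) class probabilities from two classifiers.
    Weight each sample by model confidence (max class probability); the
    output rows remain valid probability distributions."""
    conf_a = probs_a.max(dim=1, keepdim=True).values
    conf_b = probs_b.max(dim=1, keepdim=True).values
    gate = conf_a / (conf_a + conf_b)      # per-sample mixing weight
    return gate * probs_a + (1 - gate) * probs_b
```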

22 pages, 10732 KB  
Article
Intelligent Recognition Method of Decorative Openwork Windows with Sustainable Application for Suzhou Traditional Private Gardens in China
by Rui Zhang, Yuwei Zhao, Jianlei Kong, Chen Cheng, Xinyan Liu and Chang Zhang
Sustainability 2021, 13(15), 8439; https://doi.org/10.3390/su13158439 - 28 Jul 2021
Cited by 20 | Viewed by 4134
Abstract
Decorative openwork windows (DO-Ws) in Suzhou traditional private gardens play a vital role in Chinese traditional garden art. Owing to their delicate and elegant patterns, as well as their rich cultural meaning, DO-Ws have quite high protection and utilization value. In this study, we first visited 15 extant traditional gardens in Suzhou and took almost 3000 photos to establish the DO-W datasets. We then present an effective visual recognition method named CSV-Net to classify different DO-W patterns in Suzhou traditional gardens. On the basis of a cross stage partial network backbone optimized with the Soft-VLAD architecture, the proposed CSV-Net achieves a preferable representation ability for distinguishing different DO-Ws in practical scenes. The comparative experimental results show that the CSV-Net model achieves a good balance between performance, robustness and complexity in identifying DO-Ws, and shows further potential for sustainable application in traditional gardens. Moreover, the Canglang Pavilion and the Humble Administrator’s Garden were selected as cases to analyze, with intelligent approaches, the relation between DO-W types and their locations, which further reveals the design rules of the sustainable culture contained in Chinese traditional gardens. This work ultimately promotes the sustainable application of artificial intelligence technology in the fields of garden design and the inheritance of garden art. Full article

18 pages, 5821 KB  
Article
Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition
by Bogdan Mocanu, Ruxandra Tapu and Titus Zaharia
Sensors 2021, 21(12), 4233; https://doi.org/10.3390/s21124233 - 20 Jun 2021
Cited by 22 | Viewed by 4618
Abstract
Emotion is a form of high-level paralinguistic information that is intrinsically conveyed by human speech. Automatic speech emotion recognition is an essential challenge for various applications, including mental disease diagnosis, audio surveillance, human behavior understanding, e-learning, and human–machine/robot interaction. In this paper, we introduce a novel speech emotion recognition method based on the Squeeze and Excitation ResNet (SE-ResNet) model fed with spectrogram inputs. In order to overcome the limitations of state-of-the-art techniques, which fail to provide a robust feature representation at the utterance level, the CNN architecture is extended with a trainable discriminative GhostVLAD clustering layer that aggregates the audio features into a compact, single-utterance vector representation. In addition, an end-to-end neural embedding approach is introduced, based on an emotionally constrained triplet loss function. The loss function integrates the relations between the various emotional patterns and thus improves the latent space data representation. The proposed methodology achieves 83.35% and 64.92% global accuracy rates on the publicly available RAVDESS and CREMA-D datasets, respectively. When compared with the results provided by human observers, the gains in global accuracy scores exceed 24%. Finally, an objective comparative evaluation with state-of-the-art techniques demonstrates accuracy gains of more than 3%. Full article
(This article belongs to the Special Issue Emotion Monitoring System Based on Sensors and Data Analysis)
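
For context, the base form of the triplet loss that the paper constrains is shown below; the emotionally constrained variant additionally encodes relations between emotion classes, which this sketch omits.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negative: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """Pull same-emotion utterance embeddings together and push
    different-emotion embeddings apart by at least `margin`.
    anchor, positive, negative: (B, D) embeddings."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```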

18 pages, 6162 KB  
Article
KVGCN: A KNN Searching and VLAD Combined Graph Convolutional Network for Point Cloud Segmentation
by Nan Luo, Hongquan Yu, Zhenfeng Huo, Jinhui Liu, Quan Wang, Ying Xu and Yun Gao
Remote Sens. 2021, 13(5), 1003; https://doi.org/10.3390/rs13051003 - 6 Mar 2021
Cited by 14 | Viewed by 3824
Abstract
Semantic segmentation of sensed point cloud data plays a significant role in scene understanding and reconstruction, robot navigation, etc. This work presents a Graph Convolutional Network integrating K-Nearest Neighbor (KNN) searching and the Vector of Locally Aggregated Descriptors (VLAD). KNN searching is utilized to construct the topological graph of each point and its neighbors. Then, we perform convolution on the edges of the constructed graph to extract representative local features using multiple Multilayer Perceptrons (MLPs). Afterwards, a trainable VLAD layer, NetVLAD, is embedded in the feature encoder to aggregate the local and global contextual features. The designed feature encoder is repeated multiple times, and the extracted features are concatenated in a jump-connection style to strengthen the distinctiveness of features and thereby improve the segmentation. Experimental results on two datasets show that the proposed work addresses the shortcoming of insufficient local feature extraction and improves the accuracy of semantic segmentation (mIoU 60.9% and oAcc 87.4% on S3DIS) compared to existing models. Full article
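
The first two steps, KNN graph construction and building edge features for the shared MLPs, can be sketched as below. The edge features follow an EdgeConv-style formulation, which is an assumed reading of "convolution on the edges of the constructed graph".

```python
import torch

def knn_graph(points: torch.Tensor, k: int) -> torch.Tensor:
    """points: (N, 3). Returns (N, k) indices of each point's k nearest
    neighbors -- the topological graph over the point cloud."""
    dists = torch.cdist(points, points)                       # (N, N)
    return dists.topk(k + 1, largest=False).indices[:, 1:]    # drop self

def edge_features(feats: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """feats: (N, D) per-point features, idx: (N, k) neighbor indices.
    Concatenate each point's feature with the offset to each neighbor's
    feature, giving (N, k, 2*D) inputs for a shared MLP."""
    neighbors = feats[idx]                        # (N, k, D)
    center = feats.unsqueeze(1).expand_as(neighbors)
    return torch.cat([center, neighbors - center], dim=-1)
```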

24 pages, 3959 KB  
Article
A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance
by Yicheng Fang, Kailun Yang, Ruiqi Cheng, Lei Sun and Kaiwei Wang
Sensors 2020, 20(15), 4177; https://doi.org/10.3390/s20154177 - 27 Jul 2020
Cited by 8 | Viewed by 4112
Abstract
Visual Place Recognition (VPR) addresses visual instance retrieval tasks against discrepant scenes and gives precise localization. During a traverse, captured images (query images) are traced back to already-visited positions in the database images, enabling vehicles or pedestrian navigation devices to distinguish ambient environments. Unfortunately, diverse appearance variations pose huge challenges for VPR, such as illumination changes, viewpoint variation, seasonal cycles, and disparate traverses (forward and backward). In addition, the majority of current VPR algorithms are designed for forward-facing images, which provide only a narrow Field of View (FoV) and suffer severe viewpoint influences. In this paper, we propose a panoramic localizer based on coarse-to-fine descriptors, leveraging panoramas for omnidirectional perception and a sufficient FoV of up to 360°. We adopt NetVLAD descriptors for coarse matching in a panorama-to-panorama way, for their robust performance in distinguishing different appearances, while utilizing Geodesc keypoint descriptors in the fine stage, for their capacity to detect detailed information, together forming powerful coarse-to-fine descriptors. A comprehensive set of experiments is conducted on several datasets, including both public benchmarks and our real-world campus scenes. Our system achieves high recall and strong generalization capacity across various appearances. The proposed panoramic localizer can be integrated into mobile navigation devices and is applicable to a variety of localization scenarios. Full article
(This article belongs to the Section Intelligent Sensors)

17 pages, 2346 KB  
Article
Image Representation Method Based on Relative Layer Entropy for Insulator Recognition
by Zhenbing Zhao, Hongyu Qi, Xiaoqing Fan, Guozhi Xu, Yincheng Qi, Yongjie Zhai and Ke Zhang
Entropy 2020, 22(4), 419; https://doi.org/10.3390/e22040419 - 8 Apr 2020
Cited by 5 | Viewed by 3565
Abstract
Deep convolutional neural networks (DCNNs) with alternating convolutional, pooling and decimation layers are widely used in computer vision, yet current works tend to focus on deeper networks with many layers and neurons, resulting in high computational complexity. However, recognition remains challenging for objects with insufficient and unrepresentative appearance and training sample types, such as infrared insulators. In view of this, more attention has been focused on applying pretrained networks for image feature representation, but rules on how to select the feature representation layer are scarce. In this paper, we propose two new concepts, layer entropy and relative layer entropy, which form the basis of an image representation method based on relative layer entropy (IRM_RLE). The method is designed to identify the convolutional layer most suitable for image recognition. First, the image is fed into an ImageNet-pretrained DCNN model and deep convolutional activations are extracted. Then, the appropriate feature layer is selected by calculating the layer entropy and relative layer entropy of each convolutional layer. Finally, feature maps of the selected convolutional layer are chosen according to their importance, then vectorized and pooled by VLAD (vector of locally aggregated descriptors) coding and quantization for the final image representation. The experimental results show that the proposed approach performs competitively against previous methods across all datasets. Furthermore, on the indoor scenes and actions datasets, the proposed approach outperforms the state-of-the-art methods. Full article
(This article belongs to the Section Multidisciplinary Applications)
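
As a rough illustration of the selection criterion, the sketch below computes a Shannon entropy over one layer's activation histogram; a "relative layer entropy" could then compare this quantity across consecutive layers. This is one plausible reading of the concept, not the paper's exact definition.

```python
import torch

def activation_entropy(fmap: torch.Tensor, bins: int = 64) -> float:
    """Shannon entropy of a convolutional layer's activations for one image.
    fmap: (C, H, W). Histogram the activations, normalize to a probability
    distribution, and return -sum(p * log p)."""
    hist = torch.histc(fmap.flatten(), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins to avoid log(0)
    return float(-(p * p.log()).sum())
```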
