Search Results (16)

Search Parameters:
Keywords = NetVLAD

25 pages, 16833 KB  
Article
R2SCAT-LPR: Rotation-Robust Network with Self- and Cross-Attention Transformers for LiDAR-Based Place Recognition
by Weizhong Jiang, Hanzhang Xue, Shubin Si, Liang Xiao, Dawei Zhao, Qi Zhu, Yiming Nie and Bin Dai
Remote Sens. 2025, 17(6), 1057; https://doi.org/10.3390/rs17061057 - 17 Mar 2025
Cited by 2 | Viewed by 926
Abstract
LiDAR-based place recognition (LPR) is crucial for the navigation and localization of autonomous vehicles and mobile robots in large-scale outdoor environments and plays a critical role in loop closure detection for simultaneous localization and mapping (SLAM). Existing LPR methods, which utilize 2D bird’s-eye view (BEV) projections of 3D point clouds, achieve competitive performance in efficiency and recognition accuracy. However, these methods often struggle with capturing global contextual information and maintaining robustness to viewpoint variations. To address these challenges, we propose R2SCAT-LPR, a novel transformer-based model that leverages self-attention and cross-attention mechanisms to extract rotation-robust place feature descriptors from BEV images. R2SCAT-LPR consists of three core modules: (1) R2MPFE, which employs weight-shared cascaded multi-head self-attention (MHSA) to extract multi-level spatial contextual patch features from both the original BEV image and its randomly rotated counterpart; (2) DSCA, which integrates dual-branch self-attention and multi-head cross-attention (MHCA) to capture intrinsic correspondences between multi-level patch features before and after rotation, enhancing the extraction of rotation-robust local features; and (3) a combined NetVLAD module, which aggregates patch features from both the original feature space and the rotated interaction space into a compact and viewpoint-robust global descriptor. Extensive experiments conducted on the KITTI and NCLT datasets validate the effectiveness of the proposed model, demonstrating its robustness to rotation variations and its generalization ability across diverse scenes and LiDAR sensor types. Furthermore, we evaluate the generalization performance and computational efficiency of R2SCAT-LPR on our self-constructed OffRoad-LPR dataset for off-road autonomous driving, verifying its deployability on resource-constrained platforms. Full article
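
Several of the results below build on the NetVLAD aggregation layer. As background, here is a minimal PyTorch sketch of such a layer, which turns dense local features into a single L2-normalized global descriptor; it follows the original NetVLAD formulation, with illustrative defaults for the cluster count and feature dimension rather than this paper's configuration.

```python
# Minimal NetVLAD aggregation layer (background sketch; cluster count and
# feature dimension are illustrative, not taken from R2SCAT-LPR).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    def __init__(self, num_clusters: int = 64, dim: int = 256):
        super().__init__()
        self.num_clusters = num_clusters
        # 1x1 conv produces per-location soft assignments to the K clusters
        self.assign = nn.Conv2d(dim, num_clusters, kernel_size=1)
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, D, H, W) dense local features
        B, D, _, _ = x.shape
        K = self.num_clusters
        soft_assign = F.softmax(self.assign(x).view(B, K, -1), dim=1)  # (B, K, N)
        x_flat = x.view(B, D, -1)                                      # (B, D, N)
        # residuals between every local feature and every cluster centroid
        residual = x_flat.unsqueeze(1) - self.centroids.view(1, K, D, 1)
        vlad = (soft_assign.unsqueeze(2) * residual).sum(dim=-1)       # (B, K, D)
        vlad = F.normalize(vlad, dim=2)                # intra-normalization
        return F.normalize(vlad.view(B, -1), dim=1)    # (B, K*D) global descriptor
```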

16 pages, 2096 KB  
Article
CamGNN: Cascade Graph Neural Network for Camera Re-Localization
by Li Wang, Jiale Jia, Hualin Dai and Guoyan Li
Electronics 2024, 13(9), 1734; https://doi.org/10.3390/electronics13091734 - 1 May 2024
Viewed by 1633
Abstract
In response to the inaccurate positioning of traditional camera relocation methods in scenes with large-scale or severe viewpoint changes, this study proposes a camera relocation method based on a cascaded graph neural network to achieve accurate scene relocation. First, the NetVLAD retrieval method, which has advantages in image feature representation and similarity calculation, is used to retrieve the images most similar to a given query image. Then, a feature pyramid is employed to extract features of these images at different scales, and the features at the same scale are treated as nodes of a graph neural network to construct a single-layer graph structure. Next, a top–down connection is used to cascade the single-layer graph structures, where the information of nodes in the previous graph is fused into a message node to improve the accuracy of camera pose estimation. To better capture the topological relationships and spatial geometric constraints between images, an attention mechanism is introduced in the single-layer graph structure, which helps to effectively propagate information to the next graph during the cascading process, thereby enhancing the robustness of camera relocation. Experimental results on the public 7-Scenes dataset demonstrate that the proposed method can effectively improve the accuracy of absolute camera pose localization, with average translation and rotation errors of 0.19 m and 6.9°, respectively. Compared to other deep learning-based methods, the proposed method achieves more than a 10% improvement in both average translation and rotation accuracy, demonstrating highly competitive localization precision. Full article
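
The retrieval step described above reduces to nearest-neighbor search over global descriptors. A minimal sketch, assuming descriptors have already been extracted (e.g., by a NetVLAD layer like the one sketched earlier); the function name and k are illustrative:

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(query_desc: torch.Tensor, db_descs: torch.Tensor, k: int = 5):
    """Indices of the k database images most similar to the query.
    query_desc: (D,), db_descs: (N, D)."""
    q = F.normalize(query_desc, dim=-1)
    db = F.normalize(db_descs, dim=-1)
    sims = db @ q                      # cosine similarities, (N,)
    return torch.topk(sims, k).indices
```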

20 pages, 5360 KB  
Article
An Appearance-Semantic Descriptor with Coarse-to-Fine Matching for Robust VPR
by Jie Chen, Wenbo Li, Pengshuai Hou, Zipeng Yang and Haoyu Zhao
Sensors 2024, 24(7), 2203; https://doi.org/10.3390/s24072203 - 29 Mar 2024
Cited by 1 | Viewed by 1372
Abstract
In recent years, semantic segmentation has made significant progress in visual place recognition (VPR) by using semantic information that is relatively invariant to appearance and viewpoint, demonstrating great potential. However, in some extreme scenarios, there may be semantic occlusion and semantic sparsity, which can lead to confusion when relying solely on semantic information for localization. Therefore, this paper proposes a novel VPR framework that employs a coarse-to-fine image matching strategy, combining semantic and appearance information to improve algorithm performance. First, we construct SemLook global descriptors using semantic contours, which can preliminarily screen images to enhance the accuracy and real-time performance of the algorithm. Based on this, we introduce SemLook local descriptors for fine screening, combining robust appearance information extracted by deep learning with semantic information. These local descriptors can address issues such as semantic overlap and sparsity in urban environments, further improving the accuracy of the algorithm. Through this refined screening process, we can effectively handle the challenges of complex image matching in urban environments and obtain more accurate results. The performance of SemLook descriptors is evaluated on three public datasets (Extended-CMU Season, Robot-Car Seasons v2, and SYNTHIA) and compared with six state-of-the-art VPR algorithms (HOG, CoHOG, AlexNet_VPR, Region VLAD, Patch-NetVLAD, Forest). In the experimental comparison, considering both real-time performance and evaluation metrics, the SemLook descriptors are found to outperform the other six algorithms. Evaluation metrics include the area under the curve (AUC) based on the precision–recall curve, Recall@100%Precision, and Precision@100%Recall. On the Extended-CMU Season dataset, SemLook descriptors achieve a 100% AUC value, and on the SYNTHIA dataset, they achieve a 99% AUC value, demonstrating outstanding performance. The experimental results indicate that introducing global descriptors for initial screening and utilizing local descriptors combining both semantic and appearance information for precise matching can effectively address the issue of location recognition in scenarios with semantic ambiguity or sparsity. This algorithm enhances descriptor performance, making it more accurate and robust in scenes with variations in appearance and viewpoint. Full article
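
The coarse-to-fine strategy described above (cheap global screening followed by more expensive local matching on a shortlist) can be outlined as below. This is a generic sketch assuming L2-normalized global descriptors; `local_score_fn` stands in for SemLook's semantic-plus-appearance local matching, whose details are the paper's own.

```python
import numpy as np

def coarse_to_fine_match(query_global, db_globals, query_local, db_locals,
                         local_score_fn, top_n: int = 20) -> int:
    """Return the index of the best-matching database image.
    Coarse stage: cosine similarity on L2-normalized global descriptors.
    Fine stage: run the expensive local-descriptor score only on a shortlist."""
    sims = db_globals @ query_global                 # (N,) global similarities
    shortlist = np.argsort(-sims)[:top_n]            # top-N candidates
    fine_scores = [local_score_fn(query_local, db_locals[i]) for i in shortlist]
    return int(shortlist[int(np.argmax(fine_scores))])
```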

13 pages, 839 KB  
Article
Contextual Patch-NetVLAD: Context-Aware Patch Feature Descriptor and Patch Matching Mechanism for Visual Place Recognition
by Wenyuan Sun, Wentang Chen, Runxiang Huang and Jing Tian
Sensors 2024, 24(3), 855; https://doi.org/10.3390/s24030855 - 28 Jan 2024
Cited by 4 | Viewed by 3056
Abstract
The goal of visual place recognition (VPR) is to determine the location of a query image by identifying its place in a collection of image databases. Visual sensor technologies are crucial for visual place recognition as they allow for precise identification and location of query images within a database. Global descriptor-based VPR methods face the challenge of accurately capturing locally specific regions within a scene, which increases the probability of confusion during localization. To tackle feature extraction and feature matching challenges in VPR, we propose a modified patch-NetVLAD strategy that includes two new modules: a context-aware patch descriptor and a context-aware patch matching mechanism. Firstly, we propose a context-driven patch feature descriptor to overcome the limitations of global and local descriptors in visual place recognition. This descriptor aggregates features from each patch’s surrounding neighborhood. Secondly, we introduce a context-driven feature matching mechanism that utilizes cluster and saliency context-driven weighting rules to assign higher weights to patches that are less similar to densely populated or locally similar regions, improving localization performance. We further incorporate both of these modules into the patch-NetVLAD framework, resulting in a new approach called contextual patch-NetVLAD. Experimental results show that our proposed approach outperforms other state-of-the-art methods, achieving a Recall@10 score of 99.82 on Pittsburgh30k, 99.82 on FMDataset, and 97.68 on our benchmark dataset. Full article
(This article belongs to the Special Issue Vision Sensors: Image Processing Technologies and Applications)
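
One plausible instantiation of a descriptor that "aggregates features from each patch's surrounding neighborhood" is to pool every patch descriptor with its 3×3 neighbors and concatenate the result, as in the hedged sketch below; this illustrates the idea, not the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def contextual_patch_descriptors(patch_feats: torch.Tensor) -> torch.Tensor:
    """patch_feats: (B, D, H, W) grid of per-patch descriptors.
    Concatenate each patch's own feature with the mean of its 3x3
    neighborhood, injecting surrounding context (illustrative variant)."""
    context = F.avg_pool2d(patch_feats, kernel_size=3, stride=1, padding=1)
    return F.normalize(torch.cat([patch_feats, context], dim=1), dim=1)
```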

27 pages, 15549 KB  
Article
Loop Closure Detection Based on Compressed ConvNet Features in Dynamic Environments
by Shuhai Jiang, Zhongkai Zhou and Shangjie Sun
Appl. Sci. 2024, 14(1), 8; https://doi.org/10.3390/app14010008 - 19 Dec 2023
Cited by 2 | Viewed by 1603
Abstract
In dynamic environments, convolutional neural networks (CNNs) often produce image feature maps with significant redundancy due to external factors such as moving objects and occlusions. These feature maps are inadequate as precise image descriptors for similarity measurement, hindering loop closure detection. To address this issue, this paper proposes compressing the features output by the convolutional neural network. The approach is detailed as follows: (1) employing ResNet152 as the backbone feature-extraction network, a Siamese neural network is constructed to enhance the efficiency of feature extraction; (2) utilizing the Karhunen–Loève (KL) transform to extract principal components from the backbone network’s output, thereby eliminating redundant information; (3) employing the compressed features as input to NetVLAD to construct a spatially informed feature descriptor for similarity measurement. Experimental results demonstrate that, on the New College dataset, the proposed method exhibits an approximately 9.98% improvement in average accuracy compared to the original network. On the City Center dataset, there is an improvement of approximately 2.64%, with an overall increase of about 23.51% in time performance. These findings indicate that the enhanced ResNet152 performs better than the original network in environments with more moving objects and occlusions. Full article
(This article belongs to the Section Robotics and Automation)
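
Step (2) is essentially principal component analysis. A minimal sketch of KL/PCA compression of flattened CNN features; in practice the basis would likely be fitted on a training set rather than per batch as here, and the component count is a free parameter.

```python
import torch

def kl_compress(feats: torch.Tensor, num_components: int) -> torch.Tensor:
    """Karhunen-Loeve (PCA) compression. feats: (N, D) feature vectors;
    returns (N, num_components) projections onto the principal directions."""
    centered = feats - feats.mean(dim=0, keepdim=True)
    # rows of vh are the principal directions of the centered data
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    basis = vh[:num_components].T          # (D, num_components)
    return centered @ basis
```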

19 pages, 5782 KB  
Article
Landmark Topology Descriptor-Based Place Recognition and Localization under Large View-Point Changes
by Guanhong Gao, Zhi Xiong, Yao Zhao and Ling Zhang
Sensors 2023, 23(24), 9775; https://doi.org/10.3390/s23249775 - 12 Dec 2023
Cited by 1 | Viewed by 1445
Abstract
Accurate localization between cameras is a prerequisite for vision-based heterogeneous robot system tasks. The core issue is how to accurately perform place recognition from different view-points. Traditional appearance-based methods have a high probability of failure in place recognition and localization under large view-point changes. In recent years, semantic graph matching-based place recognition methods have been proposed to solve the above problem. However, these methods rely on high-precision semantic segmentation results and have a high time complexity in node extraction or graph matching. In addition, these methods only utilize the semantic labels of the landmarks themselves to construct graphs and descriptors, causing such approaches to fail in some challenging scenarios (e.g., scene repetition). In this paper, we propose a graph-matching method based on a novel landmark topology descriptor, which is robust to view-point changes. According to experiments on real-world data, our algorithm can run in real time and is approximately four times and three times faster than state-of-the-art algorithms in the graph extraction and matching phases, respectively. In terms of place recognition performance, our algorithm achieves the best place recognition precision at a recall of 0–70% compared with classic appearance-based algorithms and an advanced graph-based algorithm in scenes with significant view-point changes. In terms of positioning accuracy, compared to the traditional appearance-based DBoW2 and NetVLAD algorithms, our method improves the mean translation error by 95% on average and the mean RMSE by 95%. Compared to the state-of-the-art SHM algorithm, our method improves the mean translation error by 30% on average and the mean RMSE by 29%. In addition, our method outperforms the current state-of-the-art algorithm even in challenging scenarios where the benchmark algorithms fail. Full article
(This article belongs to the Section Sensors and Robotics)
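
As a toy illustration of a topology-style signature, one can describe each landmark not only by its own semantic label but also by the labels and relative distances of its nearest neighbors. This sketches the general idea only; the paper's actual descriptor and graph matching are more elaborate.

```python
import numpy as np

def landmark_topology_descriptor(centroids: np.ndarray, labels, k: int = 4):
    """centroids: (N, 3) landmark positions; labels: length-N semantic labels.
    Each landmark is summarized by its own label plus (label, distance) pairs
    of its k nearest neighbors -- a simple, viewpoint-tolerant signature."""
    labels = np.asarray(labels)
    descs = []
    for i, c in enumerate(centroids):
        d = np.linalg.norm(centroids - c, axis=1)
        nn = np.argsort(d)[1:k + 1]          # skip the landmark itself
        descs.append((labels[i], list(zip(labels[nn], np.round(d[nn], 2)))))
    return descs
```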

19 pages, 4252 KB  
Article
SaMfENet: Self-Attention Based Multi-Scale Feature Fusion Coding and Edge Information Constraint Network for 6D Pose Estimation
by Zhuoxiao Li, Xiaobing Li, Shihao Chen, Jialong Du and Yong Li
Mathematics 2022, 10(19), 3671; https://doi.org/10.3390/math10193671 - 7 Oct 2022
Cited by 2 | Viewed by 2648
Abstract
Accurate estimation of an object’s 6D pose is one of the crucial technologies for robotic manipulators. The task becomes even more challenging when lighting conditions change or the object is occluded, since object information is then missing or corrupted. To estimate the 6D pose of an object accurately, a self-attention-based multi-scale feature fusion coding and edge information constraint 6D pose estimation network is proposed, which achieves accurate 6D pose estimation from RGB-D images. The proposed algorithm first introduces an edge reconstruction module into the pose estimation network, which improves the attention of the feature extraction network to edge features. Furthermore, a self-attention multi-scale point cloud feature extraction module, i.e., MSPNet, is proposed to extract geometric features from point clouds reconstructed from depth maps. Finally, a clustering feature encoding module, i.e., SE-NetVLAD, is proposed to encode multi-modal dense feature sequences to construct more expressive global features. The proposed method is evaluated on the LineMOD and YCB-Video datasets, and the experimental results illustrate that the proposed method achieves outstanding performance, close to that of current state-of-the-art methods. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning)

17 pages, 4788 KB  
Article
Unifying Deep ConvNet and Semantic Edge Features for Loop Closure Detection
by Jie Jin, Jiale Bai, Yan Xu and Jiani Huang
Remote Sens. 2022, 14(19), 4885; https://doi.org/10.3390/rs14194885 - 30 Sep 2022
Cited by 3 | Viewed by 2200
Abstract
Loop closure detection is an important component of Simultaneous Localization and Mapping (SLAM). In this paper, a novel two-branch loop closure detection algorithm unifying deep Convolutional Neural Network (ConvNet) features and semantic edge features is proposed. In detail, we use one feature extraction module to extract both ConvNet and semantic edge features simultaneously. The deep ConvNet features are subjected to a Context Feature Enhancement (CFE) module in the global feature ranking branch to generate a representative global feature descriptor. Concurrently, to reduce the interference of dynamic features, the extracted semantic edge information of landmarks is encoded through the Vector of Locally Aggregated Descriptors (VLAD) framework in the semantic edge feature ranking branch to form semantic edge descriptors. Finally, semantic, visual, and geometric information is integrated by the similarity score fusion calculation. Extensive experiments on six public datasets show that the proposed approach can achieve competitive recall rates at 100% precision compared to other state-of-the-art methods. Full article
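
The final fusion step, combining per-candidate similarity scores from the global-feature branch and the semantic-edge branch, might look like the sketch below, where `alpha` is a hypothetical hand-set weight rather than the paper's actual fusion rule.

```python
import numpy as np

def rank_loop_candidates(global_scores, edge_scores, alpha: float = 0.6):
    """Fuse similarity scores from the two branches and rank loop-closure
    candidates (best first). alpha is illustrative, not the paper's value."""
    fused = alpha * np.asarray(global_scores) + (1 - alpha) * np.asarray(edge_scores)
    return np.argsort(-fused)
```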

11 pages, 1190 KB  
Article
Monocular Depth Estimation from a Single Infrared Image
by Daechan Han and Yukyung Choi
Electronics 2022, 11(11), 1729; https://doi.org/10.3390/electronics11111729 - 30 May 2022
Cited by 4 | Viewed by 4502
Abstract
Thermal infrared imaging is attracting much attention due to its robustness against illuminance variation. However, because of the spectral difference between thermal infrared images and RGB images, existing research on self-supervised monocular depth estimation has performance limitations. Therefore, in this study, we propose a novel Self-Guided Framework using a Pseudolabel predicted from RGB images. Our proposed framework, which addresses the appearance-matching-loss problem of existing frameworks, transfers the high accuracy of the Pseudolabel to the thermal depth estimation network by comparing low- and high-level pixels. Furthermore, we propose a Patch-NetVLAD Loss, which strengthens local detail and global context information in the depth map from thermal infrared imaging by comparing locally-global patch-level descriptors. Finally, we introduce an Image Matching Loss to estimate a more accurate depth map in the thermal depth network by enhancing the performance of the Pseudolabel. We demonstrate that the proposed framework shows significant performance improvements even when applied to various depth networks on the KAIST Multispectral Dataset. Full article
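
To make the patch-level comparison concrete, below is a heavily simplified sketch of a patch-descriptor consistency loss between a predicted thermal depth map and its Pseudolabel. The paper's Patch-NetVLAD Loss uses learned locally-global descriptors; this version only illustrates the patch-wise structure, using raw normalized patches.

```python
import torch
import torch.nn.functional as F

def patch_consistency_loss(pred_depth: torch.Tensor,
                           pseudo_depth: torch.Tensor,
                           patch: int = 8) -> torch.Tensor:
    """pred_depth, pseudo_depth: (B, 1, H, W). Split both maps into patches,
    L2-normalize each patch as a crude local descriptor, and penalize
    descriptor disagreement (1 - cosine similarity), averaged over patches."""
    def descriptors(d):
        p = F.unfold(d, kernel_size=patch, stride=patch)  # (B, patch*patch, N)
        return F.normalize(p, dim=1)
    cos = (descriptors(pred_depth) * descriptors(pseudo_depth)).sum(dim=1)
    return (1.0 - cos).mean()
```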

17 pages, 25326 KB  
Article
Fine-Grained Pests Recognition Based on Truncated Probability Fusion Network via Internet of Things in Forestry and Agricultural Scenes
by Kai Ma, Ming-Jun Nie, Sen Lin, Jianlei Kong, Cheng-Cai Yang and Jinhao Liu
Algorithms 2021, 14(10), 290; https://doi.org/10.3390/a14100290 - 30 Sep 2021
Cited by 3 | Viewed by 3044
Abstract
Accurate identification of insect pests is key to improving crop yield and ensuring quality and safety. However, under the influence of environmental conditions, the same kind of pest shows obvious intraclass differences in appearance, while different kinds of pests can appear deceptively similar. Traditional methods struggle with such fine-grained pest identification and are difficult to deploy in practice. To solve this problem, this paper uses a variety of terminal devices in the agricultural Internet of Things to obtain a large number of pest images and proposes a fine-grained pest identification model based on the probability fusion network FPNT. This model designs a fine-grained feature extractor based on an optimized CSPNet backbone network, mining different levels of local feature expression that can distinguish subtle differences. After the integration of the NetVLAD aggregation layer, the gated probability fusion layer gives full play to the advantages of information complementarity and confidence coupling of multi-model fusion. Comparative tests show that the FPNT model achieves an average recognition accuracy of 93.18% across all kinds of pests, outperforming other deep-learning methods, with the average processing time dropping to 61 ms. It can thus meet the needs of fine-grained pest image recognition in agricultural and forestry Internet of Things practice and provides a technical reference for intelligent early warning and prevention of pests. Full article
(This article belongs to the Special Issue Algorithms for Machine Learning and Pattern Recognition Tasks)
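
The "confidence coupling" idea, letting each model's certainty gate its contribution to the fused class probabilities, can be sketched as follows. The gating here is hand-set for illustration; the paper's gated fusion layer is learned.

```python
import torch

def gated_probability_fusion(probs_a: torch.Tensor,
                             probs_b: torch.Tensor) -> torch.Tensor:
    """probs_a, probs_b: (B, C) class probabilities from two classifiers.
    Weight each sample by model confidence (max class probability); the
    output rows remain valid probability distributions."""
    conf_a = probs_a.max(dim=1, keepdim=True).values
    conf_b = probs_b.max(dim=1, keepdim=True).values
    gate = conf_a / (conf_a + conf_b)      # per-sample mixing weight
    return gate * probs_a + (1 - gate) * probs_b
```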

22 pages, 10732 KB  
Article
Intelligent Recognition Method of Decorative Openwork Windows with Sustainable Application for Suzhou Traditional Private Gardens in China
by Rui Zhang, Yuwei Zhao, Jianlei Kong, Chen Cheng, Xinyan Liu and Chang Zhang
Sustainability 2021, 13(15), 8439; https://doi.org/10.3390/su13158439 - 28 Jul 2021
Cited by 20 | Viewed by 4134
Abstract
Decorative openwork windows (DO-Ws) in Suzhou traditional private gardens play a vital role in Chinese traditional garden art. Owing to their delicate and elegant patterns, as well as their rich cultural meaning, DO-Ws have quite high protection and utilization value. In this study, we first visited 15 extant traditional gardens in Suzhou and took almost 3000 photos to establish the DO-W datasets. We then present an effective visual recognition method named CSV-Net to classify different DO-W patterns in Suzhou traditional gardens. On the basis of a cross stage partial network backbone optimized with the Soft-VLAD architecture, the proposed CSV-Net achieves a preferable representation ability for distinguishing different DO-Ws in practical scenes. The comparative experimental results show that the CSV-Net model achieves a good balance between performance, robustness and complexity in identifying DO-Ws, and shows further potential for sustainable application in traditional gardens. Moreover, the Canglang Pavilion and the Humble Administrator’s Garden were selected as cases to analyze, with intelligent approaches, the relation between DO-W types and their locations, which further reveals the design rules of the sustainable culture contained in Chinese traditional gardens. This work ultimately promotes the sustainable application of artificial intelligence technology in the fields of garden design and the inheritance of garden art. Full article

18 pages, 5821 KB  
Article
Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition
by Bogdan Mocanu, Ruxandra Tapu and Titus Zaharia
Sensors 2021, 21(12), 4233; https://doi.org/10.3390/s21124233 - 20 Jun 2021
Cited by 22 | Viewed by 4618
Abstract
Emotion is a form of high-level paralinguistic information that is intrinsically conveyed by human speech. Automatic speech emotion recognition is an essential challenge for various applications, including mental disease diagnosis, audio surveillance, human behavior understanding, e-learning, and human–machine/robot interaction. In this paper, we introduce a novel speech emotion recognition method based on the Squeeze and Excitation ResNet (SE-ResNet) model fed with spectrogram inputs. In order to overcome the limitations of state-of-the-art techniques, which fail to provide a robust feature representation at the utterance level, the CNN architecture is extended with a trainable discriminative GhostVLAD clustering layer that aggregates the audio features into a compact, single-utterance vector representation. In addition, an end-to-end neural embedding approach is introduced, based on an emotionally constrained triplet loss function. The loss function integrates the relations between the various emotional patterns and thus improves the latent space data representation. The proposed methodology achieves 83.35% and 64.92% global accuracy rates on the publicly available RAVDESS and CREMA-D datasets, respectively. When compared with the results provided by human observers, the gains in global accuracy scores exceed 24%. Finally, an objective comparative evaluation with state-of-the-art techniques demonstrates accuracy gains of more than 3%. Full article
(This article belongs to the Special Issue Emotion Monitoring System Based on Sensors and Data Analysis)
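
For context, the base form of the triplet loss that the paper constrains is shown below; the emotionally constrained variant additionally encodes relations between emotion classes, which this sketch omits.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negative: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """Pull same-emotion utterance embeddings together and push
    different-emotion embeddings apart by at least `margin`.
    anchor, positive, negative: (B, D) embeddings."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```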

18 pages, 6162 KB  
Article
KVGCN: A KNN Searching and VLAD Combined Graph Convolutional Network for Point Cloud Segmentation
by Nan Luo, Hongquan Yu, Zhenfeng Huo, Jinhui Liu, Quan Wang, Ying Xu and Yun Gao
Remote Sens. 2021, 13(5), 1003; https://doi.org/10.3390/rs13051003 - 6 Mar 2021
Cited by 14 | Viewed by 3824
Abstract
Semantic segmentation of sensed point cloud data plays a significant role in scene understanding and reconstruction, robot navigation, etc. This work presents a Graph Convolutional Network integrating K-Nearest Neighbor (KNN) searching and the Vector of Locally Aggregated Descriptors (VLAD). KNN searching is utilized to construct the topological graph of each point and its neighbors. Then, we perform convolution on the edges of the constructed graph to extract representative local features using multiple Multilayer Perceptrons (MLPs). Afterwards, a trainable VLAD layer, NetVLAD, is embedded in the feature encoder to aggregate the local and global contextual features. The designed feature encoder is repeated multiple times, and the extracted features are concatenated in a jump-connection style to strengthen the distinctiveness of features and thereby improve the segmentation. Experimental results on two datasets show that the proposed work addresses the shortcoming of insufficient local feature extraction and improves the accuracy of semantic segmentation (mIoU 60.9% and oAcc 87.4% on S3DIS) compared to existing models. Full article
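
The first two steps, KNN graph construction and building edge features for the shared MLPs, can be sketched as below. The edge features follow an EdgeConv-style formulation, which is an assumed reading of "convolution on the edges of the constructed graph".

```python
import torch

def knn_graph(points: torch.Tensor, k: int) -> torch.Tensor:
    """points: (N, 3). Returns (N, k) indices of each point's k nearest
    neighbors -- the topological graph over the point cloud."""
    dists = torch.cdist(points, points)                       # (N, N)
    return dists.topk(k + 1, largest=False).indices[:, 1:]    # drop self

def edge_features(feats: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """feats: (N, D) per-point features, idx: (N, k) neighbor indices.
    Concatenate each point's feature with the offset to each neighbor's
    feature, giving (N, k, 2*D) inputs for a shared MLP."""
    neighbors = feats[idx]                        # (N, k, D)
    center = feats.unsqueeze(1).expand_as(neighbors)
    return torch.cat([center, neighbors - center], dim=-1)
```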

24 pages, 3959 KB  
Article
A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance
by Yicheng Fang, Kailun Yang, Ruiqi Cheng, Lei Sun and Kaiwei Wang
Sensors 2020, 20(15), 4177; https://doi.org/10.3390/s20154177 - 27 Jul 2020
Cited by 8 | Viewed by 4112
Abstract
Visual Place Recognition (VPR) addresses visual instance retrieval tasks against discrepant scenes and gives precise localization. During a traverse, captured images (query images) are traced back to already-visited positions in the database images, enabling vehicles or pedestrian navigation devices to distinguish ambient environments. Unfortunately, diverse appearance variations pose huge challenges for VPR, such as illumination changes, viewpoint variation, seasonal cycles, and disparate traverses (forward and backward). In addition, the majority of current VPR algorithms are designed for forward-facing images, which provide only a narrow Field of View (FoV) and suffer severe viewpoint influences. In this paper, we propose a panoramic localizer based on coarse-to-fine descriptors, leveraging panoramas for omnidirectional perception and a sufficient FoV of up to 360°. We adopt NetVLAD descriptors for coarse matching in a panorama-to-panorama way, for their robust performance in distinguishing different appearances, while utilizing Geodesc keypoint descriptors in the fine stage, for their capacity to detect detailed information, together forming powerful coarse-to-fine descriptors. A comprehensive set of experiments is conducted on several datasets, including both public benchmarks and our real-world campus scenes. Our system achieves high recall and strong generalization capacity across various appearances. The proposed panoramic localizer can be integrated into mobile navigation devices and is applicable to a variety of localization scenarios. Full article
(This article belongs to the Section Intelligent Sensors)

17 pages, 2346 KB  
Article
Image Representation Method Based on Relative Layer Entropy for Insulator Recognition
by Zhenbing Zhao, Hongyu Qi, Xiaoqing Fan, Guozhi Xu, Yincheng Qi, Yongjie Zhai and Ke Zhang
Entropy 2020, 22(4), 419; https://doi.org/10.3390/e22040419 - 8 Apr 2020
Cited by 5 | Viewed by 3565
Abstract
Deep convolutional neural networks (DCNNs) with alternating convolutional, pooling and decimation layers are widely used in computer vision, yet current works tend to focus on deeper networks with many layers and neurons, resulting in high computational complexity. However, recognition remains challenging for objects with insufficient and unrepresentative appearance and training sample types, such as infrared insulators. In view of this, more attention has been focused on applying pretrained networks for image feature representation, but rules on how to select the feature representation layer are scarce. In this paper, we propose two new concepts, layer entropy and relative layer entropy, which form the basis of an image representation method based on relative layer entropy (IRM_RLE). The method is designed to identify the convolutional layer most suitable for image recognition. First, the image is fed into an ImageNet-pretrained DCNN model and deep convolutional activations are extracted. Then, the appropriate feature layer is selected by calculating the layer entropy and relative layer entropy of each convolutional layer. Finally, feature maps of the selected convolutional layer are chosen according to their importance, then vectorized and pooled by VLAD (vector of locally aggregated descriptors) coding and quantization for the final image representation. The experimental results show that the proposed approach performs competitively against previous methods across all datasets. Furthermore, on the indoor scenes and actions datasets, the proposed approach outperforms the state-of-the-art methods. Full article
(This article belongs to the Section Multidisciplinary Applications)
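
As a rough illustration of the selection criterion, the sketch below computes a Shannon entropy over one layer's activation histogram; a "relative layer entropy" could then compare this quantity across consecutive layers. This is one plausible reading of the concept, not the paper's exact definition.

```python
import torch

def activation_entropy(fmap: torch.Tensor, bins: int = 64) -> float:
    """Shannon entropy of a convolutional layer's activations for one image.
    fmap: (C, H, W). Histogram the activations, normalize to a probability
    distribution, and return -sum(p * log p)."""
    hist = torch.histc(fmap.flatten(), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins to avoid log(0)
    return float(-(p * p.log()).sum())
```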
