Search Results (63)

Search Parameters:
Keywords = Indoor Scene Recognition

23 pages, 4467 KiB  
Article
Research on Indoor Object Detection and Scene Recognition Algorithm Based on Apriori Algorithm and Mobile-EFSSD Model
by Wenda Zheng, Yibo Ai and Weidong Zhang
Mathematics 2025, 13(15), 2408; https://doi.org/10.3390/math13152408 - 26 Jul 2025
Viewed by 183
Abstract
With the advancement of computer vision and image processing technologies, scene recognition has gradually become a research hotspot. However, in practical applications, it is necessary to detect the categories and locations of objects in images while recognizing scenes. To address these issues, this paper proposes an indoor object detection and scene recognition algorithm based on the Apriori algorithm and the Mobile-EFSSD model, which can simultaneously obtain object category and location information while recognizing scenes. The specific research contents are as follows: (1) To address complex indoor scenes and occlusion, this paper proposes an improved Mobile-EFSSD object detection algorithm. An optimized MobileNetV3 with ECA attention is used as the backbone. Multi-scale feature maps are fused via FPN. The localization loss includes a hyperparameter, and focal loss replaces confidence loss. Experiments show that the method achieves stable performance, effectively detects occluded objects, and accurately extracts category and location information. (2) To improve classification stability in indoor scene recognition, this paper proposes a naive Bayes-based method. Object detection results are converted into text features, and the Apriori algorithm extracts object associations. Prior probabilities are calculated and fed into a naive Bayes classifier for scene recognition. Evaluated using the ADE20K dataset, the method outperforms existing approaches by achieving a better accuracy–speed trade-off and enhanced classification stability. The proposed algorithm is applied to indoor scene images, enabling the simultaneous acquisition of object categories and location information while recognizing scenes. Moreover, the algorithm has a simple structure, with an object detection average precision of 82.7% and a scene recognition average accuracy of 95.23%, making it suitable for practical detection requirements.
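The two-stage idea in this abstract (Apriori-mined object associations feeding a naive Bayes scene classifier) can be illustrated with a minimal Python sketch; the object lists, scene labels, and thresholds below are made-up placeholders, not the authors' implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log

# Hypothetical training data: detected object labels per image and the scene label.
train = [
    (["bed", "pillow", "lamp"], "bedroom"),
    (["sofa", "tv", "lamp"], "living_room"),
    (["stove", "sink", "fridge"], "kitchen"),
]

def frequent_itemsets(transactions, min_support=0.3, max_size=2):
    """Tiny Apriori-style pass: keep items and pairs whose support exceeds min_support.
    In the paper these mined associations inform the priors; here they are only mined
    for illustration."""
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        items = set(items)
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(items), k):
                counts[combo] += 1
    return {c: v / n for c, v in counts.items() if v / n >= min_support}

def train_naive_bayes(data):
    """Estimate P(scene) and P(object | scene) with Laplace smoothing."""
    scene_counts = Counter(scene for _, scene in data)
    obj_counts = defaultdict(Counter)
    vocab = set()
    for objects, scene in data:
        for o in objects:
            obj_counts[scene][o] += 1
            vocab.add(o)
    return scene_counts, obj_counts, vocab, len(data)

def classify(objects, scene_counts, obj_counts, vocab, n):
    best, best_lp = None, float("-inf")
    for scene, sc in scene_counts.items():
        lp = log(sc / n)  # log prior
        total = sum(obj_counts[scene].values()) + len(vocab)
        for o in objects:
            lp += log((obj_counts[scene][o] + 1) / total)  # smoothed likelihood
        if lp > best_lp:
            best, best_lp = scene, lp
    return best

itemsets = frequent_itemsets([objs for objs, _ in train])
model = train_naive_bayes(train)
print(classify(["bed", "lamp"], *model))  # -> 'bedroom'
```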

14 pages, 743 KiB  
Article
AD-VAE: Adversarial Disentangling Variational Autoencoder
by Adson Silva and Ricardo Farias
Sensors 2025, 25(5), 1574; https://doi.org/10.3390/s25051574 - 4 Mar 2025
Viewed by 988
Abstract
Face recognition (FR) is a less intrusive biometrics technology with various applications, such as security, surveillance, and access control systems. FR remains challenging, especially when there is only a single image per person as a gallery dataset and when dealing with variations like pose, illumination, and occlusion. Deep learning techniques have shown promising results in recent years using VAE and GAN, with approaches such as patch-VAE, VAE-GAN for 3D Indoor Scene Synthesis, and hybrid VAE-GAN models. However, in Single Sample Per Person Face Recognition (SSPP FR), the challenge of learning robust and discriminative features that preserve the subject’s identity persists. To address these issues, we propose a novel framework called AD-VAE, specifically for SSPP FR, using a combination of variational autoencoder (VAE) and Generative Adversarial Network (GAN) techniques. The proposed AD-VAE framework is designed to learn how to build representative identity-preserving prototypes from both controlled and wild datasets, effectively handling variations like pose, illumination, and occlusion. The method uses four networks: an encoder and decoder similar to VAE, a generator that receives the encoder output plus noise to generate an identity-preserving prototype, and a discriminator that operates as a multi-task network. AD-VAE outperforms all tested state-of-the-art face recognition techniques, demonstrating its robustness. The proposed framework achieves superior results on four controlled benchmark datasets—AR, E-YaleB, CAS-PEAL, and FERET—with recognition rates of 84.9%, 94.6%, 94.5%, and 96.0%, respectively, and achieves remarkable performance on the uncontrolled LFW dataset, with a recognition rate of 99.6%. The AD-VAE framework shows promising potential for future research and real-world applications.
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
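A rough PyTorch skeleton of the four-network layout named in the abstract (encoder, decoder, a generator fed with the encoder output plus noise, and a multi-task discriminator); all layer sizes, the latent dimension, and the heads are assumptions rather than the paper's configuration, and the adversarial/VAE losses are omitted.

```python
import torch
import torch.nn as nn

Z = 128  # assumed latent size

def conv_encoder(out_dim):
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
        nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, out_dim))

def deconv_decoder(in_dim):
    return nn.Sequential(
        nn.Linear(in_dim, 128 * 8 * 8), nn.Unflatten(1, (128, 8, 8)),
        nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
        nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())

class ADVAESketch(nn.Module):
    """Skeleton of the four networks named in the abstract; shapes are placeholders."""
    def __init__(self, n_ids=100):
        super().__init__()
        self.enc_mu, self.enc_logvar = conv_encoder(Z), conv_encoder(Z)
        self.dec = deconv_decoder(Z)         # VAE-style reconstruction branch
        self.gen = deconv_decoder(2 * Z)      # encoder output + noise -> prototype
        self.disc_feat = conv_encoder(256)    # discriminator trunk
        self.disc_real = nn.Linear(256, 1)    # real/fake head (multi-task)
        self.disc_id = nn.Linear(256, n_ids)  # identity head (multi-task)

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(z)
        proto = self.gen(torch.cat([mu, torch.randn_like(mu)], dim=1))
        f = self.disc_feat(proto)
        return recon, proto, self.disc_real(f), self.disc_id(f)

recon, proto, real_logit, id_logits = ADVAESketch()(torch.randn(4, 3, 32, 32))
```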

17 pages, 4703 KiB  
Article
Robotics Classification of Domain Knowledge Based on a Knowledge Graph for Home Service Robot Applications
by Yiqun Wang, Rihui Yao, Keqing Zhao, Peiliang Wu and Wenbai Chen
Appl. Sci. 2024, 14(24), 11553; https://doi.org/10.3390/app142411553 - 11 Dec 2024
Cited by 2 | Viewed by 1223
Abstract
The representation and utilization of environmental information by service robots have become increasingly challenging. To address the problems faced by service robot platforms, such as the strict timeliness requirements of indoor environment recognition tasks and the small scale of available indoor scene data, a method and model for rapid classification of household environment domain knowledge are proposed, which can achieve high recognition accuracy using a small-scale indoor scene and tool dataset. This paper uses a knowledge graph to associate data for home service robots. The application requirements of knowledge graphs for home service robots are analyzed to establish a rule base for the system. A domain ontology of the home environment is constructed for use in the knowledge graph system, and the interior functional areas and functional tools are classified. The designed knowledge graph contributes to the state of the art by improving the accuracy and efficiency of service decision making. The lightweight network MobileNetV3 is used to pre-train the model, and a lightweight convolution method with good feature extraction performance is selected. The proposal adopts a combination of MobileNetV3 and transfer learning, integrating large-scale pre-training with fine-tuning for the home environment to address the challenge of limited data for home robots. The results show that the proposed model achieves higher recognition accuracy and recognition speed than other common methods, meeting the work requirements of service robots. On the Scene15 dataset, the proposed scheme achieves the highest recognition accuracy of 0.8815 and the fastest recognition speed of 63.11 microseconds per image.
(This article belongs to the Special Issue Artificial Intelligence in Complex Networks (2nd Edition))
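The MobileNetV3 pre-training plus fine-tuning strategy described above can be sketched with torchvision (assuming torchvision ≥ 0.13 for the weights API); the 15-class head simply mirrors the Scene15 evaluation mentioned in the abstract and is not the authors' exact training setup.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 15  # e.g., the Scene15 categories used in the evaluation

# Start from an ImageNet-pretrained MobileNetV3 and fine-tune only the classifier head,
# mirroring the pre-train + fine-tune strategy described in the abstract.
model = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.DEFAULT)
for p in model.features.parameters():
    p.requires_grad = False  # freeze the pretrained backbone

in_features = model.classifier[3].in_features
model.classifier[3] = nn.Linear(in_features, NUM_CLASSES)  # new head for indoor areas/tools

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```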

21 pages, 7746 KiB  
Article
Multi-Robot Collaborative Mapping with Integrated Point-Line Features for Visual SLAM
by Yu Xia, Xiao Wu, Tao Ma, Liucun Zhu, Jingdi Cheng and Junwu Zhu
Sensors 2024, 24(17), 5743; https://doi.org/10.3390/s24175743 - 4 Sep 2024
Viewed by 2269
Abstract
Simultaneous Localization and Mapping (SLAM) enables mobile robots to autonomously perform localization and mapping tasks in unknown environments. Despite significant progress achieved by visual SLAM systems in ideal conditions, relying solely on a single robot and point features for mapping in large-scale indoor environments with weak-texture structures can affect mapping efficiency and accuracy. Therefore, this paper proposes a multi-robot collaborative mapping method based on point-line fusion to address this issue. This method is designed for indoor environments with weak-texture structures for localization and mapping. The feature-extraction algorithm, which combines point and line features, supplements the existing environment point feature-extraction method by introducing a line feature-extraction step. This integration ensures the accuracy of visual odometry estimation in scenes with pronounced weak-texture structure features. For relatively large indoor scenes, a scene-recognition-based map-fusion method is proposed in this paper to enhance mapping efficiency. This method relies on visual bag of words to determine overlapping areas in the scene, while also proposing a keyframe-extraction method based on photogrammetry to improve the algorithm’s robustness. By combining the Perspective-3-Point (P3P) algorithm and Bundle Adjustment (BA) algorithm, the relative pose-transformation relationships of multi-robots in overlapping scenes are resolved, and map fusion is performed based on these relative pose relationships. We evaluated our algorithm on public datasets and a mobile robot platform. The experimental results demonstrate that the proposed algorithm exhibits higher robustness and mapping accuracy. It shows significant effectiveness in handling mapping in scenarios with weak texture and structure, as well as in small-scale map fusion.
(This article belongs to the Section Navigation and Positioning)
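A hedged OpenCV sketch of the map-fusion step outlined above: once a bag-of-words lookup flags an overlapping keyframe, matched 3D landmarks from one robot's map and their 2D observations in the other robot's keyframe yield a relative pose via PnP with a P3P-style minimal solver inside RANSAC. The correspondences and intrinsics below are placeholders, and the subsequent bundle adjustment is only noted in a comment.

```python
import numpy as np
import cv2

# Placeholder correspondences produced by the (assumed) bag-of-words keyframe match:
# 3D landmarks in robot A's map frame and their 2D observations in robot B's keyframe.
pts3d_mapA = np.random.rand(30, 3).astype(np.float32)
pts2d_robotB = (np.random.rand(30, 2) * [640, 480]).astype(np.float32)
K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])  # assumed intrinsics

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts3d_mapA, pts2d_robotB, K, None, flags=cv2.SOLVEPNP_AP3P)

if ok:
    R, _ = cv2.Rodrigues(rvec)
    T_B_from_A = np.eye(4)
    T_B_from_A[:3, :3], T_B_from_A[:3, 3] = R, tvec.ravel()

    def fuse_landmark(p_world_A):
        """Re-express a landmark from robot A's map in robot B's frame for map merging."""
        p = np.append(p_world_A, 1.0)
        return (T_B_from_A @ p)[:3]

    # In a full system this relative pose would then be refined by bundle adjustment (BA)
    # before the two point-line maps are merged.
```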

20 pages, 5333 KiB  
Article
Indoor Scene Classification through Dual-Stream Deep Learning: A Framework for Improved Scene Understanding in Robotics
by Sultan Daud Khan and Kamal M. Othman
Computers 2024, 13(5), 121; https://doi.org/10.3390/computers13050121 - 14 May 2024
Cited by 10 | Viewed by 1948
Abstract
Indoor scene classification plays a pivotal role in enabling social robots to seamlessly adapt to their environments, facilitating effective navigation and interaction within diverse indoor scenes. By accurately characterizing indoor scenes, robots can autonomously tailor their behaviors, making informed decisions to accomplish specific tasks. Traditional methods relying on manually crafted features encounter difficulties when characterizing complex indoor scenes. On the other hand, deep learning models address the shortcomings of traditional methods by autonomously learning hierarchical features from raw images. Despite the success of deep learning models, existing models still struggle to effectively characterize complex indoor scenes. This is because there is a high degree of intra-class variability and inter-class similarity within indoor environments. To address this problem, we propose a dual-stream framework that harnesses both global contextual information and local features for enhanced recognition. The global stream captures high-level features and relationships across the scene. The local stream employs a fully convolutional network to extract fine-grained local information. The proposed dual-stream architecture effectively distinguishes scenes that share similar global contexts but contain different localized objects. We evaluate the performance of the proposed framework on a publicly available benchmark indoor scene dataset. From the experimental results, we demonstrate the effectiveness of the proposed framework.
(This article belongs to the Special Issue Recent Advances in Autonomous Vehicle Solutions)
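A minimal PyTorch sketch of the dual-stream idea (a global branch for scene-level context plus a fully convolutional local branch, fused before classification); the backbones, fusion, and the 67-class head are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class DualStreamSketch(nn.Module):
    """Global branch for scene-level context, fully convolutional local branch for
    fine-grained cues, concatenated before a linear classifier."""
    def __init__(self, num_classes=67):
        super().__init__()
        resnet = models.resnet18(weights=None)
        self.global_stream = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 512, 1, 1)
        self.local_stream = nn.Sequential(                                  # fully convolutional
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))                                         # -> (B, 128, 1, 1)
        self.classifier = nn.Linear(512 + 128, num_classes)

    def forward(self, x):
        g = self.global_stream(x).flatten(1)
        l = self.local_stream(x).flatten(1)
        return self.classifier(torch.cat([g, l], dim=1))

logits = DualStreamSketch()(torch.randn(2, 3, 224, 224))  # shape (2, 67)
```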

55 pages, 12486 KiB  
Review
Methods and Applications of Space Understanding in Indoor Environment—A Decade Survey
by Sebastian Pokuciński and Dariusz Mrozek
Appl. Sci. 2024, 14(10), 3974; https://doi.org/10.3390/app14103974 - 7 May 2024
Cited by 2 | Viewed by 2213
Abstract
The demand for digitizing manufacturing and controlling processes has been steadily increasing in recent years. Digitization relies on different techniques and equipment, which produces various data types and further influences the process of space understanding and area recognition. This paper provides an updated view of these data structures and high-level categories of techniques and methods leading to indoor environment segmentation and the discovery of its semantic meaning. To achieve this, we followed the Systematic Literature Review (SLR) methodology and covered a wide range of solutions, from floor plan understanding through 3D model reconstruction and scene recognition to indoor navigation. Based on the obtained SLR results, we identified three different taxonomies (the taxonomy of underlying data type, of performed analysis process, and of accomplished task), which constitute different perspectives we can adopt to study the existing works in the field of space understanding. Our investigations clearly show that the progress of works in this field is accelerating, leading to more sophisticated techniques that rely on multidimensional structures and complex representations, while the processing itself has become focused on artificial intelligence-based methods.
(This article belongs to the Special Issue IoT in Smart Cities and Homes, 2nd Edition)

15 pages, 3624 KiB  
Article
A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction
by Yu Hao, Fan Yang, Hao Huang, Shuaihang Yuan, Sundeep Rangan, John-Ross Rizzo, Yao Wang and Yi Fang
J. Imaging 2024, 10(5), 103; https://doi.org/10.3390/jimaging10050103 - 26 Apr 2024
Cited by 6 | Viewed by 4279
Abstract
People with blindness and low vision (pBLV) encounter substantial challenges when it comes to comprehensive scene recognition and precise object identification in unfamiliar environments. Additionally, due to their vision loss, pBLV have difficulty in accessing and identifying potential tripping hazards independently. Previous assistive technologies for the visually impaired often struggle in real-world scenarios due to the need for constant training and lack of robustness, which limits their effectiveness, especially in dynamic and unfamiliar environments, where accurate and efficient perception is crucial. Therefore, we frame our research question in this paper as: How can we assist pBLV in recognizing scenes, identifying objects, and detecting potential tripping hazards in unfamiliar environments, where existing assistive technologies often falter due to their lack of robustness? We hypothesize that by leveraging large pretrained foundation models and prompt engineering, we can create a system that effectively addresses the challenges faced by pBLV in unfamiliar environments. Motivated by the prevalence of large pretrained foundation models, particularly in assistive robotics applications, and by the accurate perception and robust contextual understanding that extensive pretraining affords in real-world scenarios, we present a pioneering approach that leverages foundation models to enhance visual perception for pBLV, offering detailed and comprehensive descriptions of the surrounding environment and providing warnings about potential risks. Specifically, our method begins by leveraging a large image-tagging model (i.e., the Recognize Anything Model (RAM)) to identify all common objects present in the captured images. The recognition results and the user query are then integrated into a prompt, tailored specifically for pBLV, using prompt engineering. By combining the prompt and the input image, a vision-language foundation model (i.e., InstructBLIP) generates detailed and comprehensive descriptions of the environment and identifies potential risks by analyzing environmental objects and scene landmarks relevant to the prompt. We evaluate our approach through experiments conducted on both indoor and outdoor datasets. Our results demonstrate that our method can recognize objects accurately and provide insightful descriptions and analysis of the environment for pBLV.
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
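The tagging-then-prompting pipeline can be sketched as below; `recognize_tags` and `vlm_answer` are hypothetical stand-ins for an image-tagging model such as RAM and a vision-language model such as InstructBLIP, not real library calls.

```python
def recognize_tags(image_path):
    # Placeholder for an image-tagging model (e.g., RAM); returns canned tags here.
    return ["staircase", "wet floor sign", "handrail"]

def vlm_answer(image_path, prompt):
    # Placeholder for a vision-language model (e.g., InstructBLIP).
    return f"[model response to prompt: {prompt[:60]}...]"

def build_pblv_prompt(tags, user_query):
    """Fold the detected objects and the user's question into one instruction that
    explicitly asks for hazards, in the spirit of the pipeline described above."""
    return (
        f"The image contains: {', '.join(tags)}. "
        f"A blind or low-vision user asks: '{user_query}'. "
        "Describe the surrounding environment in detail and point out any objects "
        "that could be tripping hazards or other risks, with their approximate locations."
    )

def assist(image_path, user_query):
    tags = recognize_tags(image_path)
    prompt = build_pblv_prompt(tags, user_query)
    return vlm_answer(image_path, prompt)

print(assist("hallway.jpg", "Is it safe to walk forward?"))
```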

16 pages, 5701 KiB  
Article
An Indoor 3D Positioning Method Using Terrain Feature Matching for PDR Error Calibration
by Xintong Chen, Yuxin Xie, Zihan Zhou, Yingying He, Qianli Wang and Zhuming Chen
Electronics 2024, 13(8), 1468; https://doi.org/10.3390/electronics13081468 - 12 Apr 2024
Cited by 6 | Viewed by 1465
Abstract
Pedestrian Dead Reckoning (PDR) is a promising algorithm for indoor positioning. However, the accuracy of PDR degrades due to the accumulated error, especially in multi-floor buildings. This paper introduces a three-dimensional (3D) positioning method based on terrain feature matching to reduce the influence of accumulated errors in multi-floor scenes. The proposed calibration method involves two steps: motion pattern recognition and position matching-based calibration. The motion pattern recognition aims to detect different motion patterns, i.e., taking the stairs or horizontal walking, from the streaming data. Then, stair entrances and corridor corners are matched with transition points of motion patterns and pedestrian turning points, respectively. After matching, calibration is performed to eliminate the accumulated errors. By carrying out experiments on a two-floor closed-loop path with a walking distance of about 145 m, it is shown that this method can effectively reduce the accumulated error of PDR, achieving accurate 3D positioning. The average error is reduced from 6.60 m to 1.37 m.
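The matching-based calibration step can be illustrated with a small sketch: when a motion-pattern transition or a turn is detected, the drifting PDR estimate is snapped to the nearest known stair entrance or corridor corner. The landmark coordinates are invented for illustration.

```python
import numpy as np

# Assumed floor-plan landmarks: stair entrances (matched to motion-pattern transitions)
# and corridor corners (matched to pedestrian turning points). Columns: x, y, height (m).
STAIR_ENTRANCES = np.array([[12.0, 3.5, 0.0], [12.0, 3.5, 3.2]])
CORRIDOR_CORNERS = np.array([[5.0, 3.5, 0.0], [5.0, 18.0, 0.0]])

def calibrate(position, event):
    """Snap the accumulated PDR estimate to the nearest matching terrain feature.

    `event` is 'pattern_transition' (walking <-> stairs) or 'turn'; the matched
    landmark replaces the drifting estimate, removing the error accumulated so far."""
    candidates = STAIR_ENTRANCES if event == "pattern_transition" else CORRIDOR_CORNERS
    nearest = candidates[np.argmin(np.linalg.norm(candidates - position, axis=1))]
    return nearest.copy()

# Example: PDR has drifted to (11.2, 4.4, 0.1) when a stair transition is detected.
estimate = np.array([11.2, 4.4, 0.1])
estimate = calibrate(estimate, "pattern_transition")
print(estimate)  # -> [12.   3.5  0. ]
```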

30 pages, 1424 KiB  
Review
A Review of Sensing Technologies for Indoor Autonomous Mobile Robots
by Yu Liu, Shuting Wang, Yuanlong Xie, Tifan Xiong and Mingyuan Wu
Sensors 2024, 24(4), 1222; https://doi.org/10.3390/s24041222 - 14 Feb 2024
Cited by 26 | Viewed by 12524
Abstract
As a fundamental issue in robotics academia and industry, indoor autonomous mobile robots (AMRs) have been extensively studied. For AMRs, it is crucial to obtain information about their working environment and themselves, which can be realized through sensors and the extraction of corresponding information from the measurements of these sensors. The application of sensing technologies can enable mobile robots to perform localization, mapping, target or obstacle recognition, and motion tasks, etc. This paper reviews sensing technologies for autonomous mobile robots in indoor scenes. The benefits and potential problems of using a single sensor in application are analyzed and compared, and the basic principles and popular algorithms used in processing these sensor data are introduced. In addition, some mainstream technologies of multi-sensor fusion are introduced. Finally, this paper discusses the future development trends in the sensing technology for autonomous mobile robots in indoor scenes, as well as the challenges in the practical application environments.
(This article belongs to the Special Issue Advanced Sensing and Control Technologies for Autonomous Robots)

22 pages, 7517 KiB  
Article
Hybrid 3D Reconstruction of Indoor Scenes Integrating Object Recognition
by Mingfan Li, Minglei Li, Li Xu and Mingqiang Wei
Remote Sens. 2024, 16(4), 638; https://doi.org/10.3390/rs16040638 - 8 Feb 2024
Cited by 1 | Viewed by 2726
Abstract
Indoor 3D reconstruction is particularly challenging due to complex scene structures involving object occlusion and overlap. This paper presents a hybrid indoor reconstruction method that segments the room point cloud into internal and external components, and then reconstructs the room shape and the indoor objects in different ways. We segment the room point cloud into internal and external points based on the assumption that the room shapes are composed of some large external planar structures. For the external points, we seek an appropriate combination of intersecting faces to obtain a lightweight polygonal surface model. For the internal points, we define a set of features extracted from the internal points and train a classification model based on random forests to recognize and separate indoor objects. Then, the corresponding computer aided design (CAD) models are placed in the target positions of the indoor objects, converting the reconstruction into a model fitting problem. Finally, the indoor objects and room shapes are combined to generate a complete 3D indoor model. The effectiveness of this method is evaluated on point clouds from different indoor scenes with an average fitting error of about 0.11 m, and the performance is validated by extensive comparisons with state-of-the-art methods.
(This article belongs to the Special Issue Point Cloud Processing with Machine Learning)
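The internal-point classification step maps naturally onto a random forest, sketched below with scikit-learn; the per-segment features and labels are invented placeholders, since the paper defines its own feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-segment features for internal points:
# height above floor, bounding-box volume, planarity, point density.
X_train = np.array([
    [0.45, 0.30, 0.10, 850.0],   # chair-like segment
    [0.75, 1.10, 0.60, 400.0],   # table-like segment
    [1.80, 0.90, 0.85, 300.0],   # cabinet-like segment
])
y_train = ["chair", "table", "cabinet"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# A new segmented cluster of internal points is classified; the predicted label would
# then select which CAD model to fit at the segment's position.
new_segment = np.array([[0.50, 0.28, 0.12, 900.0]])
print(clf.predict(new_segment))  # e.g., ['chair']
```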

18 pages, 5430 KiB  
Article
Three-Dimensional Indoor Positioning Scheme for Drone with Fingerprint-Based Deep-Learning Classifier
by Shuzhi Liu, Houjin Lu and Seung-Hoon Hwang
Drones 2024, 8(1), 15; https://doi.org/10.3390/drones8010015 - 9 Jan 2024
Cited by 4 | Viewed by 2900
Abstract
Unmanned aerial vehicles (UAVs) hold significant potential for various indoor applications, such as mapping, surveillance, navigation, and search and rescue operations. However, indoor positioning is a significant challenge for UAVs, owing to the lack of GPS signals and the complexity of indoor environments. Therefore, this study was aimed at developing a Wi-Fi-based three-dimensional (3D) indoor positioning scheme tailored to time-varying environments, involving human movement and uncertainties in the states of wireless devices. Specifically, we established an innovative 3D indoor positioning system to meet the localisation demands of UAVs in indoor environments. A 3D indoor positioning database was developed using a deep-learning classifier, enabling 3D indoor positioning through Wi-Fi technology. Additionally, through a pioneering integration of fingerprint recognition into wireless positioning technology, we enhanced the precision and reliability of indoor positioning through a detailed analysis and learning process of Wi-Fi signal features. Two test cases (Cases 1 and 2) were designed with positioning height intervals of 0.5 m and 0.8 m, respectively, corresponding to the height of the test scene for positioning simulation and testing. With an error margin of 4 m, the simulation accuracies for the (X, Y) dimension reached 94.08% (Case 1) and 94.95% (Case 2). When the error margin was 0 m, the highest simulation accuracies for the H dimension were 91.84% (Case 1) and 93.61% (Case 2). Moreover, 40 real-time positioning experiments were conducted in the (X, Y, H) dimension. In Case 1, the average positioning success rates were 50.8% (Margin-0), 72.9% (Margin-1), and 81.4% (Margin-2), and the corresponding values for Case 2 were 52.4%, 74.5%, and 82.8%, respectively. The results demonstrated that the proposed method can facilitate 3D indoor positioning based only on Wi-Fi technologies.
(This article belongs to the Special Issue Drones Navigation and Orientation)
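A minimal PyTorch sketch of a fingerprint classifier in the spirit of the abstract: an RSSI vector is mapped to a horizontal grid cell and a discrete height class, echoing the separate (X, Y) and H accuracies reported. The network size, number of access points, and grid are assumptions.

```python
import torch
import torch.nn as nn

N_APS = 20        # assumed number of Wi-Fi access points in the fingerprint vector
N_XY_CELLS = 50   # assumed horizontal grid cells
N_HEIGHTS = 5     # assumed discrete height levels (e.g., 0.5 m or 0.8 m spacing)

class FingerprintNet(nn.Module):
    """Maps an RSSI fingerprint to a horizontal cell and a height class."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(N_APS, 128), nn.ReLU(),
                                   nn.Linear(128, 128), nn.ReLU())
        self.head_xy = nn.Linear(128, N_XY_CELLS)
        self.head_h = nn.Linear(128, N_HEIGHTS)

    def forward(self, rssi):
        h = self.trunk(rssi)
        return self.head_xy(h), self.head_h(h)

net = FingerprintNet()
rssi = torch.randn(4, N_APS)                     # batch of scanned fingerprints
logits_xy, logits_h = net(rssi)
loss = nn.functional.cross_entropy(logits_xy, torch.randint(N_XY_CELLS, (4,))) \
     + nn.functional.cross_entropy(logits_h, torch.randint(N_HEIGHTS, (4,)))
```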

17 pages, 6799 KiB  
Article
Automatic Recognition of Indoor Fire and Combustible Material with Material-Auxiliary Fire Dataset
by Feifei Hou, Wenqing Zhao and Xinyu Fan
Mathematics 2024, 12(1), 54; https://doi.org/10.3390/math12010054 - 23 Dec 2023
Cited by 1 | Viewed by 1855
Abstract
Early and timely fire detection within enclosed spaces notably diminishes the response time for emergency aid. Previous methods have mostly focused on singularly detecting either fire or combustible materials, rarely integrating both aspects, leading to a lack of a comprehensive understanding of indoor fire scenarios. Moreover, traditional fire load assessment methods such as empirical formula-based assessment are time-consuming and face challenges in diverse scenarios. In this paper, we collected a novel dataset of fire and materials, the Material-Auxiliary Fire Dataset (MAFD), and combined this dataset with deep learning to achieve both fire and material recognition and segmentation in the indoor scene. A sophisticated deep learning model, Dual Attention Network (DANet), was specifically designed for image semantic segmentation to recognize fire and combustible material. The experimental analysis of our MAFD database demonstrated that our approach achieved an accuracy of 84.26% and outperformed the prevalent methods (e.g., PSPNet, CCNet, FCN, ISANet, OCRNet), making a significant contribution to fire safety technology and enhancing the capacity to identify potential hazards indoors.
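DANet-style segmentation builds on spatial and channel attention; the snippet below sketches a position (spatial) attention module of that family in PyTorch. It is a generic illustration, not the configuration trained on MAFD.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Spatial (position) attention as used in DANet-style segmentation: every pixel
    aggregates features from all other pixels, weighted by pairwise similarity."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).permute(0, 2, 1)   # (B, HW, C/8)
        k = self.key(x).flatten(2)                       # (B, C/8, HW)
        attn = torch.softmax(q @ k, dim=-1)              # (B, HW, HW)
        v = self.value(x).flatten(2)                     # (B, C, HW)
        out = (v @ attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x

feats = torch.randn(1, 64, 32, 32)          # backbone feature map
print(PositionAttention(64)(feats).shape)   # torch.Size([1, 64, 32, 32])
```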

19 pages, 9951 KiB  
Article
Dynamic and Real-Time Object Detection Based on Deep Learning for Home Service Robots
by Yangqing Ye, Xiaolon Ma, Xuanyi Zhou, Guanjun Bao, Weiwei Wan and Shibo Cai
Sensors 2023, 23(23), 9482; https://doi.org/10.3390/s23239482 - 28 Nov 2023
Cited by 14 | Viewed by 5019
Abstract
Home service robots operating indoors, such as inside houses and offices, require the real-time and accurate identification and location of target objects to perform service tasks efficiently. However, images captured by visual sensors while in motion states usually contain varying degrees of blurriness, presenting a significant challenge for object detection. In particular, daily life scenes contain small objects like fruits and tableware, which are often occluded, further complicating object recognition and positioning. A dynamic and real-time object detection algorithm is proposed for home service robots. This is composed of an image deblurring algorithm and an object detection algorithm. To improve the clarity of motion-blurred images, the DA-Multi-DCGAN algorithm is proposed. It comprises an embedded dynamic adjustment mechanism and a multimodal multiscale fusion structure based on robot motion and surrounding environmental information, enabling the deblurring processing of images that are captured under different motion states. Compared with DeblurGAN, DA-Multi-DCGAN had a 5.07 improvement in Peak Signal-to-Noise Ratio (PSNR) and a 0.022 improvement in Structural Similarity (SSIM). An AT-LI-YOLO method is proposed for small and occluded object detection. Based on depthwise separable convolution, this method highlights key areas and integrates salient features by embedding the attention module in the AT-Resblock to improve the sensitivity and detection precision of small objects and partially occluded objects. It also employs a lightweight network unit, Lightblock, to reduce the network’s parameters and computational complexity, which improves its computational efficiency. Compared with YOLOv3, the mean average precision (mAP) of AT-LI-YOLO increased by 3.19%, and the detection precision of small objects such as apples and oranges, and of partially occluded objects, increased by 19.12% and 29.52%, respectively. Moreover, model inference time was reduced by 7 ms. Based on the typical home activities of older people and children, the dataset Grasp-17 was established for the training and testing of the proposed method. Using the TensorRT neural network inference engine of the developed service robot prototype, the proposed dynamic and real-time object detection algorithm required 29 ms, which meets the real-time requirement of smooth vision.
(This article belongs to the Special Issue Deep Learning for Computer Vision and Image Processing Sensors)
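A hedged PyTorch sketch of the building blocks the abstract attributes to AT-LI-YOLO: a depthwise separable convolution and an attention-augmented residual block (squeeze-and-excitation-style channel attention is used here for illustration; the paper's AT-Resblock may differ in detail).

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise convolution, the lightweight unit used to cut parameters
    and computation."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class AttentionResBlockSketch(nn.Module):
    """Loose sketch of an attention-augmented residual block: two depthwise separable
    convolutions followed by channel re-weighting and a residual connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(DepthwiseSeparableConv(channels, channels),
                                  DepthwiseSeparableConv(channels, channels))
        self.se = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(channels, channels // 4, 1), nn.ReLU(),
                                nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.conv(x)
        return x + y * self.se(y)  # residual connection with channel attention

x = torch.randn(1, 64, 52, 52)
print(AttentionResBlockSketch(64)(x).shape)  # torch.Size([1, 64, 52, 52])
```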

33 pages, 82129 KiB  
Article
Implicit Shape Model Trees: Recognition of 3-D Indoor Scenes and Prediction of Object Poses for Mobile Robots
by Pascal Meißner and Rüdiger Dillmann
Robotics 2023, 12(6), 158; https://doi.org/10.3390/robotics12060158 - 23 Nov 2023
Viewed by 2547
Abstract
This article describes an approach for mobile robots to identify scenes in configurations of objects spread across dense environments. This identification is enabled by intertwining the robotic object search and the scene recognition on already detected objects. We proposed “Implicit Shape Model (ISM) trees” as a scene model to solve these two tasks together. This article presents novel algorithms for ISM trees to recognize scenes and predict object poses. For us, scenes are sets of objects, some of which are interrelated by 3D spatial relations. Yet, many false positives may occur when using single ISMs to recognize scenes. We developed ISM trees, a hierarchical model of multiple interconnected ISMs, to remedy this. In this article, we contribute a recognition algorithm that allows the use of these trees for recognizing scenes. ISM trees should be generated from human demonstrations of object configurations. Since a suitable algorithm was unavailable, we created an algorithm for generating ISM trees. In previous work, we integrated the object search and scene recognition into an active vision approach that we called “Active Scene Recognition”. However, an efficient algorithm to make this integration effective using predicted object poses was unavailable. Physical experiments in this article show that the new algorithm we have contributed overcomes this problem.
(This article belongs to the Special Issue Active Methods in Autonomous Navigation)

18 pages, 6202 KiB  
Article
YG-SLAM: GPU-Accelerated RGBD-SLAM Using YOLOv5 in a Dynamic Environment
by Yating Yu, Kai Zhu and Wangshui Yu
Electronics 2023, 12(20), 4377; https://doi.org/10.3390/electronics12204377 - 23 Oct 2023
Cited by 9 | Viewed by 2302
Abstract
Traditional simultaneous localization and mapping (SLAM) performs well in a static environment; however, with the abrupt increase of dynamic points in dynamic environments, the algorithm is influenced by a lot of meaningless information, leading to low precision and poor robustness in pose estimation. To tackle this problem, a new visual SLAM algorithm for dynamic scenes named YG-SLAM is proposed, which creates an independent dynamic-object-detection thread and adds a dynamic-feature-point elimination step in the tracking thread. The YOLOv5 algorithm is introduced in the dynamic-object-detection thread for target recognition and deployed on the GPU to speed up image frame identification. The optical-flow module tracks feature points and helps remove the dynamic points on different dynamic objects based on the varying speeds of pixel movement. When combined with the prior information from object detection, the system can eliminate dynamic feature points under various conditions. Validation is conducted on both the TUM and KITTI datasets, and the results illustrate that YG-SLAM can achieve higher accuracy in dynamic indoor environments, with the maximum accuracy improved from 0.277 m to 0.014 m. Meanwhile, YG-SLAM requires less processing time than other dynamic-scene SLAM algorithms, indicating its advantage for positioning in dynamic situations.
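The dynamic-feature-point elimination step can be sketched with OpenCV: feature points are tracked with Lucas-Kanade optical flow and discarded when they fall inside detected dynamic-object boxes and move faster than a threshold. The frames, corners, and box below are placeholders, and the YOLOv5 detections are assumed to arrive from the separate detection thread.

```python
import numpy as np
import cv2

def filter_dynamic_points(prev_gray, cur_gray, prev_pts, person_boxes, speed_thresh=2.0):
    """Track feature points with Lucas-Kanade optical flow and drop points that lie
    inside detected dynamic-object boxes (e.g., YOLOv5 'person' detections) and move
    faster than a pixel-speed threshold."""
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, prev_pts, None)
    keep = []
    for p0, p1, ok in zip(prev_pts.reshape(-1, 2), cur_pts.reshape(-1, 2), status.ravel()):
        if not ok:
            continue
        speed = np.linalg.norm(p1 - p0)
        in_box = any(x1 <= p1[0] <= x2 and y1 <= p1[1] <= y2
                     for (x1, y1, x2, y2) in person_boxes)
        if not (in_box and speed > speed_thresh):  # discard fast points inside dynamic boxes
            keep.append(p1)
    return np.array(keep, dtype=np.float32)

# Usage (placeholders): two consecutive grayscale frames, tracked corners, one YOLO box.
prev = (np.random.rand(480, 640) * 255).astype(np.uint8)
cur = (np.random.rand(480, 640) * 255).astype(np.uint8)
corners = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=7)
static_pts = filter_dynamic_points(prev, cur, corners, person_boxes=[(100, 50, 300, 400)])
```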
