Submit to Sensors Review for Sensors Propose a Special Issue

Journal Menu

Journal Browser

► Journal Browser

Computer Vision in AI for Robotics Development

Print Special Issue Flyer
Special Issue Editors
Special Issue Information
Benefits of Publishing in a Special Issue
Published Papers

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensors and Robotics".

Deadline for manuscript submissions: closed (31 December 2024) | Viewed by 19689

Share This Special Issue

Special Issue Editors

Dr. Chen-Chiung Hsieh

E-Mail Website
Guest Editor

Department of Computer Science and Engineering, Tatung University, Taipei City 104, Taiwan
Interests: image processing; computer vision; deep learning
Special Issues, Collections and Topics in MDPI journals

Dr. Hsiao-Ting Tseng

E-Mail Website
Guest Editor

Department of Computer Science and Information Engineering, National Central University, Chung-li, Taiwan
Interests: artificial intelligence; machine learning; human-robot interaction; big data analytics

Special Issue Information

Dear Colleagues,

The integration of computer vision and machine learning techniques in robotics has enabled significant advancements in various applications (such as perception, navigation, and control). These techniques allow robots to process and understand visual information from their environment, leading to improved decision making and performance.

We are inviting researchers, practitioners, and academics to submit their original and high-quality papers on the latest developments in the application of computer vision and machine learning in robotics. This Special Issue aims to foster discussions and collaboration within the community, and promote the advancements in this exciting field.

Topics of Interest:

Robotics perception using computer vision and machine learning;
Visual navigation for robots;
Robotic control using visual information;
Visual object recognition and tracking for robotics;
3D reconstruction and scene understanding for robotics;
Transfer learning for robotics applications;
Active perception for robots;
Other related topics in the intersection of computer vision, machine learning, and robotics.

Dr. Chen-Chiung Hsieh
Dr. Hsiao-Ting Tseng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)

Download All Papers

Order results

Result details

Show export options Show export options

Select all

Export citation of selected articles as:

Research

24 pages, 6827 KB

Open AccessArticle

Aerial Imaging-Based Soiling Detection System for Solar Photovoltaic Panel Cleanliness Inspection

by Umair Naeem, Ken Chadda, Sara Vahaji, Jawad Ahmad, Xiaodong Li and Ehsan Asadi

Sensors 2025, 25(3), 738; https://doi.org/10.3390/s25030738 - 25 Jan 2025

Cited by 10 | Viewed by 3714

Abstract

Unmanned Aerial Vehicles (UAVs) integrated with lightweight visual cameras hold significant promise in renewable energy asset inspection and monitoring. This study presents an AI-assisted soiling detection methodology for inspecting solar photovoltaic (PV) panels, using UAV-captured RGB images. The proposed scheme introduces an autonomous end-to-end soiling detection model for common types of soiling in solar panel installations, including bird droppings and dust. Detecting soiling, particularly bird droppings, is critical due to their pronounced negative impact on power generation, primarily through hotspot formation and their resistance to natural cleaning processes such as rain. A dataset containing aerial RGB images of PV panels with dust and bird droppings is collected as a prerequisite. This study addresses the unique challenges posed by the small size and indistinct features of bird droppings in aerial imagery in contrast to relatively large-sized dust regions. To overcome these challenges, we developed a custom model, named SDS-YOLO (Soiling Detection System YOLO), which features a Convolutional Block Attention Module (CBAM) and two dedicated detection heads optimized for dust and bird droppings. The SDS-YOLO model significantly improves detection accuracy for bird droppings while maintaining robust performance for the dust class, compared with YOLOv5, YOLOv8, and YOLOv11. With the integration of CBAM, we achieved a substantial 40.2% increase in mean Average Precision (mAP50) and a 26.6% improvement in F1 score for bird droppings. Dust detection metrics also benefited from this attention-based refinement. These results underscore the CBAM’s role in improving feature extraction and reducing false positives, particularly for challenging soiling types. Additionally, the SDS-YOLO parameter count is reduced by 24%, thus enhancing its suitability for edge computing applications. Full article

(This article belongs to the Special Issue Computer Vision in AI for Robotics Development)

► Show Figures

Figure 1

22 pages, 8135 KB

Open AccessArticle

A Lightweight Visual Simultaneous Localization and Mapping Method with a High Precision in Dynamic Scenes

by Qi Zhang, Wentao Yu, Weirong Liu, Hao Xu and Yuan He

Sensors 2023, 23(22), 9274; https://doi.org/10.3390/s23229274 - 19 Nov 2023

Cited by 6 | Viewed by 2780

Abstract

Currently, in most traditional VSLAM (visual SLAM) systems, static assumptions result in a low accuracy in dynamic environments, or result in a new and higher level of accuracy but at the cost of sacrificing the real–time property. In highly dynamic scenes, balancing a high accuracy and a low computational cost has become a pivotal requirement for VSLAM systems. This paper proposes a new VSLAM system, balancing the competitive demands between positioning accuracy and computational complexity and thereby further improving the overall system properties. From the perspective of accuracy, the system applies an improved lightweight target detection network to quickly detect dynamic feature points while extracting feature points at the front end of the system, and only feature points of static targets are applied for frame matching. Meanwhile, the attention mechanism is integrated into the target detection network to continuously and accurately capture dynamic factors to cope with more complex dynamic environments. From the perspective of computational expense, the lightweight network Ghostnet module is applied as the backbone network of the target detection network YOLOv5s, significantly reducing the number of model parameters and improving the overall inference speed of the algorithm. Experimental results on the TUM dynamic dataset indicate that in contrast with the ORB–SLAM3 system, the pose estimation accuracy of the system improved by 84.04%. In contrast with dynamic SLAM systems such as DS–SLAM and DVO SLAM, the system has a significantly improved positioning accuracy. In contrast with other VSLAM algorithms based on deep learning, the system has superior real–time properties while maintaining a similar accuracy index. Full article

(This article belongs to the Special Issue Computer Vision in AI for Robotics Development)

► Show Figures

Figure 1

17 pages, 12029 KB

Open AccessArticle

A Study on Generative Models for Visual Recognition of Unknown Scenes Using a Textual Description

by Jose Martinez-Carranza, Delia Irazú Hernández-Farías, Victoria Eugenia Vazquez-Meza, Leticia Oyuki Rojas-Perez and Aldrich Alfredo Cabrera-Ponce

Sensors 2023, 23(21), 8757; https://doi.org/10.3390/s23218757 - 27 Oct 2023

Cited by 2 | Viewed by 2451

Abstract

In this study, we investigate the application of generative models to assist artificial agents, such as delivery drones or service robots, in visualising unfamiliar destinations solely based on textual descriptions. We explore the use of generative models, such as Stable Diffusion, and embedding representations, such as CLIP and VisualBERT, to compare generated images obtained from textual descriptions of target scenes with images of those scenes. Our research encompasses three key strategies: image generation, text generation, and text enhancement, the latter involving tools such as ChatGPT to create concise textual descriptions for evaluation. The findings of this study contribute to an understanding of the impact of combining generative tools with multi-modal embedding representations to enhance the artificial agent’s ability to recognise unknown scenes. Consequently, we assert that this research holds broad applications, particularly in drone parcel delivery, where an aerial robot can employ text descriptions to identify a destination. Furthermore, this concept can also be applied to other service robots tasked with delivering to unfamiliar locations, relying exclusively on user-provided textual descriptions. Full article

(This article belongs to the Special Issue Computer Vision in AI for Robotics Development)

► Show Figures

Figure 1

29 pages, 15531 KB

Open AccessArticle

Adaptation of YOLOv7 and YOLOv7_tiny for Soccer-Ball Multi-Detection with DeepSORT for Tracking by Semi-Supervised System

by Jorge Armando Vicente-Martínez, Moisés Márquez-Olivera, Abraham García-Aliaga and Viridiana Hernández-Herrera

Sensors 2023, 23(21), 8693; https://doi.org/10.3390/s23218693 - 25 Oct 2023

Cited by 17 | Viewed by 6486

Abstract

Object recognition and tracking have long been a challenge, drawing considerable attention from analysts and researchers, particularly in the realm of sports, where it plays a pivotal role in refining trajectory analysis. This study introduces a different approach, advancing the detection and tracking of soccer balls through the implementation of a semi-supervised network. Leveraging the YOLOv7 convolutional neural network, and incorporating the focal loss function, the proposed framework achieves a remarkable 95% accuracy in ball detection. This strategy outperforms previous methodologies researched in the bibliography. The integration of focal loss brings a distinctive edge to the model, improving the detection challenge for soccer balls on different fields. This pivotal modification, in tandem with the utilization of the YOLOv7 architecture, results in a marked improvement in accuracy. Following the attainment of this result, the implementation of DeepSORT enriches the study by enabling precise trajectory tracking. In the comparative analysis between versions, the efficacy of this approach is underscored, demonstrating its superiority over conventional methods with default loss function. In the Materials and Methods section, a meticulously curated dataset of soccer balls is assembled. Combining images sourced from freely available digital media with additional images from training sessions and amateur matches taken by ourselves, the dataset contains a total of 6331 images. This diverse dataset enables comprehensive testing, providing a solid foundation for evaluating the model’s performance under varying conditions, which is divided by 5731 images for supervised system and the last 600 images for semi-supervised. The results are striking, with an accuracy increase to 95% with the focal loss function. The visual representations of real-world scenarios underscore the model’s proficiency in both detection and classification tasks, further affirming its effectiveness, the impact, and the innovative approach. In the discussion, the hardware specifications employed are also touched on, any encountered errors are highlighted, and promising avenues for future research are outlined. Full article

(This article belongs to the Special Issue Computer Vision in AI for Robotics Development)

► Show Figures

Figure 1

23 pages, 5218 KB

Open AccessArticle

Automatic Speaker Positioning in Meetings Based on YOLO and TDOA

by Chen-Chiung Hsieh, Men-Ru Lu and Hsiao-Ting Tseng

Sensors 2023, 23(14), 6250; https://doi.org/10.3390/s23146250 - 8 Jul 2023

Cited by 2 | Viewed by 2354

Abstract

In recent years, many things have been held via video conferences due to the impact of the COVID-19 epidemic around the world. A webcam will be used in conjunction with a computer and the Internet. However, the network camera cannot automatically turn and cannot lock the screen to the speaker. Therefore, this study uses the objection detector YOLO to capture the upper body of all people on the screen and judge whether each person opens or closes their mouth. At the same time, the Time Difference of Arrival (TDOA) is used to detect the angle of the sound source. Finally, the person’s position obtained by YOLO is reversed to the person’s position in the spatial coordinates through the distance between the person and the camera. Then, the spatial coordinates are used to calculate the angle between the person and the camera through inverse trigonometric functions. Finally, the angle obtained by the camera, and the angle of the sound source obtained by the microphone array, are matched for positioning. The experimental results show that the recall rate of positioning through YOLOX-Tiny reached 85.2%, and the recall rate of TDOA alone reached 88%. Integrating YOLOX-Tiny and TDOA for positioning, the recall rate reached 86.7%, the precision rate reached 100%, and the accuracy reached 94.5%. Therefore, the method proposed in this study can locate the speaker, and it has a better effect than using only one source. Full article

(This article belongs to the Special Issue Computer Vision in AI for Robotics Development)

► Show Figures

Journal Menu

Journal Browser

Computer Vision in AI for Robotics Development

Share This Special Issue

Special Issue Editors

Special Issue Information

Benefits of Publishing in a Special Issue

Published Papers (5 papers)

Research

Further Information

Guidelines

MDPI Initiatives

Follow MDPI