Electronics

Editorial

19 pages, 214 KB

Open Access Editorial

Advances in Computer Vision and Deep Learning and Its Applications

by Aili Wang, Haibin Wu and Yuji Iwahori

Electronics 2025, 14(8), 1551; https://doi.org/10.3390/electronics14081551 - 11 Apr 2025

Cited by 6 | Viewed by 4443

Abstract

(1) Computer Vision: The field of computer vision is making significant strides in dynamic reasoning capability through test-time scaling (TTS) [...]

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

Research

13 pages, 4428 KB

Open Access Article

YOLO-CBF: Optimized YOLOv7 Algorithm for Helmet Detection in Road Environments

by Zhiqiang Wu, Jiaohua Qin, Xuyu Xiang and Yun Tan

Electronics 2025, 14(7), 1413; https://doi.org/10.3390/electronics14071413 - 31 Mar 2025

Cited by 2 | Viewed by 2195

Abstract

Helmet-wearing detection for electric vehicle riders is essential for traffic safety, yet existing detection models often struggle with heavy target occlusion and achieve low detection accuracy in complex road environments. To address these issues, this paper proposes YOLO-CBF, an improved YOLOv7-based detection network. The proposed model integrates coordinate convolution to enhance spatial information perception, adopts an optimized Focal-EIoU loss function, and incorporates the BiFormer dynamic sparse attention mechanism to achieve more efficient computation and dynamic content perception. These enhancements enable the model to extract key features more effectively, improving detection precision. Experimental results show that YOLO-CBF achieves an average mAP of 95.6% for helmet-wearing detection across various scenarios, outperforming the original YOLOv7 by 4%. YOLO-CBF also outperforms other mainstream object detection models, providing accurate and reliable helmet detection for electric vehicle riders.
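The abstract does not say how the Focal-EIoU loss was optimized. As a point of reference, the sketch below implements the standard Focal-EIoU formulation: the EIoU loss adds center-distance, width, and height penalties (normalized by the smallest enclosing box) to 1 − IoU, and the focal variant multiplies it by IoU^γ. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and γ = 0.5 is an assumed default. The function names are hypothetical, and this is not the authors' modified loss.

```python
def eiou_terms(pred, target):
    """Return (IoU, EIoU loss) for axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection-over-union of the two boxes.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing box: width, height, and squared diagonal.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    # Squared distance between the two box centres.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # EIoU = 1 - IoU + centre term + width term + height term.
    loss = 1.0 - iou + rho2 / c2 + (pw - tw) ** 2 / cw ** 2 + (ph - th) ** 2 / ch ** 2
    return iou, loss


def focal_eiou_loss(pred, target, gamma=0.5):
    """Focal-EIoU: weight the EIoU loss by IoU**gamma (gamma=0.5 is an assumed default)."""
    iou, loss = eiou_terms(pred, target)
    return (iou ** gamma) * loss
```

The IoU^γ factor gives high-overlap, high-quality predictions more influence on the gradient than poorly overlapping ones.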

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

23 pages, 4473 KB

Open Access Article

A Study of Occluded Person Re-Identification for Shared Feature Fusion with Pose-Guided and Unsupervised Semantic Segmentation

by Junsuo Qu, Zhenguo Zhang, Yanghai Zhang and Chensong He

Electronics 2024, 13(22), 4523; https://doi.org/10.3390/electronics13224523 - 18 Nov 2024

Cited by 1 | Viewed by 2802

Abstract

The human body is often occluded by various obstacles in surveillance systems, so occluded person re-identification remains a long-standing challenge. Although recent methods based on pose guidance or external semantic cues have improved feature representation and performance, problems remain, such as weak model representation and unreliable semantic cues. To solve these problems, we propose a feature extraction network named shared feature fusion with pose-guided and unsupervised semantic segmentation (SFPUS), which extracts more discriminative features and reduces occlusion noise in pedestrian matching. First, the multibranch joint feature extraction module (MFE) extracts feature sets containing pose information and high-order semantic information. This module not only provides robust extraction capabilities but can also precisely separate occlusions from the body. Second, to obtain multiscale discriminative features, the multiscale correlation feature matching fusion module (MCF) matches the two feature sets, and a Pose–Semantic Fusion Loss is designed to calculate the similarity of the feature sets across modalities and fuse them into a single feature set. Third, to handle image occlusion, we use unsupervised cascade clustering to better prevent occlusion interference. Finally, the proposed method is compared with various existing methods on the Occluded-Duke, Occluded-ReID, Market-1501 and Duke-MTMC datasets, where it reaches Rank-1 accuracies of 65.7%, 80.8%, 94.8% and 89.6% and mAP values of 58.8%, 72.5%, 91.8% and 80.1%, respectively. The experimental results demonstrate that SFPUS performs favorably compared with state-of-the-art methods.
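Rank-1 and mAP are the two metrics reported above. As a minimal sketch of how they are usually computed, assume a query-by-gallery distance matrix and integer identity labels (both are illustrative inputs, not the paper's data). Rank-1 checks whether the nearest gallery image shares the query's identity. mAP averages, over queries, the precision at every rank that holds a true match. The function name is hypothetical.

```python
def rank1_and_map(distances, query_ids, gallery_ids):
    """Rank-1 accuracy and mAP from a query-by-gallery distance matrix.

    distances[q][g] is the distance between query q and gallery image g
    (smaller means more similar); identity labels decide which matches are correct.
    """
    rank1_hits = 0
    average_precisions = []
    for q, row in enumerate(distances):
        # Rank gallery images from most to least similar to this query.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # a query with no true match in the gallery cannot be scored
        rank1_hits += int(matches[0])
        # Average precision: mean of precision@k over the ranks k that hold a true match.
        hits, precisions = 0, []
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / k)
        average_precisions.append(sum(precisions) / len(precisions))
    n = len(average_precisions)
    return rank1_hits / n, sum(average_precisions) / n
```

The usual Market-1501 protocol also discards gallery images of the query's identity taken by the same camera. That filtering is omitted here for brevity.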

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

18 pages, 4997 KB

Open Access Article

Robotic Grasping Detection Algorithm Based on 3D Vision Dual-Stream Encoding Strategy

by Minglin Lei, Pandong Wang, Hua Lei, Jieyun Ma, Wei Wu and Yongtao Hao

Electronics 2024, 13(22), 4432; https://doi.org/10.3390/electronics13224432 - 12 Nov 2024

Cited by 1 | Viewed by 3103

Abstract

The automatic generation of stable robotic grasping postures is crucial for applying computer vision algorithms in real-world settings. This task becomes especially challenging in complex environments, where accurately identifying the geometric shapes of objects and the spatial relationships between them is essential. To better capture object pose information in 3D visual scenes, we propose SU-Grasp, a planar robotic grasping detection algorithm that attends to both local regions and long-range relationships. Built upon a U-shaped network, SU-Grasp introduces a novel dual-stream encoding strategy that combines the Swin Transformer with spatial semantic enhancement. Compared with existing baseline methods, our algorithm achieves superior performance on public datasets, in simulation tests, and in real-world scenarios, highlighting its robust understanding of complex spatial environments.
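The abstract does not describe SU-Grasp's output heads. Many pixel-wise planar grasp detectors, such as GG-CNN, predict a grasp-quality map, `cos 2θ` and `sin 2θ` maps, and a gripper-width map. The sketch below assumes that output format and shows how a single grasp is decoded from it: take the highest-quality pixel and recover the angle with `atan2`. The function name and the map layout are assumptions.

```python
import math


def decode_best_grasp(quality, cos2t, sin2t, width):
    """Decode (row, col, angle, width) of the best planar grasp from per-pixel maps.

    All inputs are equally sized H x W nested lists (an assumed output format).
    The angle is encoded as (cos 2θ, sin 2θ) because a parallel-jaw grasp
    is unchanged by a rotation of π.
    """
    r, c = max(
        ((r, c) for r in range(len(quality)) for c in range(len(quality[0]))),
        key=lambda rc: quality[rc[0]][rc[1]],
    )
    # atan2 returns a value in (-π, π], so halving it gives θ in (-π/2, π/2].
    angle = 0.5 * math.atan2(sin2t[r][c], cos2t[r][c])
    return r, c, angle, width[r][c]
```

Encoding 2θ instead of θ avoids the discontinuity between grasps at +π/2 and −π/2, which describe the same physical grasp.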

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

22 pages, 3646 KB

Open Access Article

A Novel Deep Learning Framework Enhanced by Hybrid Optimization Using Dung Beetle and Fick’s Law for Superior Pneumonia Detection

by Abdulazeez M. Sabaawi and Hakan Koyuncu

Electronics 2024, 13(20), 4042; https://doi.org/10.3390/electronics13204042 - 14 Oct 2024

Cited by 1 | Viewed by 2536

Abstract

Pneumonia is an inflammation of lung tissue caused by various infectious microorganisms and noninfectious factors. It affects people of all ages, but vulnerable age groups are more susceptible. Imaging techniques such as chest X-rays (CXRs) are crucial for early detection and prompt action. On CXRs, this condition is characterized by radiopaque areas or sometimes a consolidation in the affected part of the lung, caused by inflammatory secretions replacing the air in the infected alveoli. Accurate early detection of pneumonia is essential to avoid its potentially fatal consequences, particularly in children and the elderly. This paper proposes an enhanced framework based on a convolutional neural network (CNN), specifically a transfer-learning-based architecture (MobileNet V1) which has outperformed recent models. The framework is improved with a hybrid method combining two optimization algorithms: the dung beetle optimizer (DBO), which enhances exploration by mimicking dung beetles’ navigational strategies, and Fick’s law algorithm (FLA), which improves exploitation by guiding solutions toward optimal areas. This hybrid optimization balances exploration and exploitation, significantly enhancing model performance. The model was trained on 7750 chest X-ray images. The framework distinguishes between healthy and pneumonia cases, achieving an accuracy of 98.19 ± 0.94% and a sensitivity of 98 ± 0.99%. These promising results indicate that the framework could be used for low-cost, high-accuracy early detection of pneumonia, especially in remote areas that lack radiology expertise, thus reducing pneumonia-related mortality.
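The results are reported as accuracy and sensitivity with a ± spread. The abstract does not say what the spread is computed over. The sketch below assumes mean ± sample standard deviation across several runs or folds and assumes pneumonia is the positive class. Each run is given as hypothetical confusion-matrix counts `(TP, TN, FP, FN)`.

```python
from statistics import mean, stdev


def accuracy_and_sensitivity(tp, tn, fp, fn):
    """Accuracy = correct / total; sensitivity (recall) = TP / (TP + FN) for the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    return accuracy, sensitivity


def summarize(runs):
    """Mean and sample standard deviation, in percent, over runs of (tp, tn, fp, fn)."""
    accs, sens = zip(*(accuracy_and_sensitivity(*r) for r in runs))
    return (100 * mean(accs), 100 * stdev(accs)), (100 * mean(sens), 100 * stdev(sens))
```

In a screening setting, sensitivity is the figure to watch, because every false negative is a missed pneumonia case.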

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

22 pages, 9193 KB

Open Access Article

RS-Xception: A Lightweight Network for Facial Expression Recognition

by Liefa Liao, Shouluan Wu, Chao Song and Jianglong Fu

Electronics 2024, 13(16), 3217; https://doi.org/10.3390/electronics13163217 - 14 Aug 2024

Cited by 14 | Viewed by 4158

Abstract

Facial expression recognition (FER) uses artificial intelligence to detect and analyze human faces and has significant applications across various scenarios. Our objective is to deploy the facial emotion recognition network on mobile devices and extend its application to diverse areas, including classroom effect monitoring, human–computer interaction, specialized training for athletes (such as in figure skating and rhythmic gymnastics), and actor emotion training. Recent studies have employed advanced deep learning models for this task, but these models often suffer from subpar performance and an excessive number of parameters, which do not meet the requirements of FER on embedded devices. To tackle this issue, we devised RS-Xception, a lightweight network structure that is straightforward yet highly effective. Drawing on the strengths of ResNet and SENet, this network integrates elements of the Xception architecture. Our models were trained on the FER2013 dataset and demonstrate superior efficiency compared with conventional network models. We further assessed the model’s performance on the CK+, FER2013, and Bigfer2013 datasets, achieving accuracy rates of 97.13%, 69.02%, and 72.06%, respectively, and obtained an accuracy rate of 82.98% on the more complex RAF-DB dataset. Incorporating transfer learning notably enhanced the model’s accuracy, reaching 75.38% on the Bigfer2013 dataset, which underscores its significance in our research. In conclusion, our proposed model is a viable solution for precise emotion detection and estimation, and in the future it may be deployed on embedded devices for research purposes.
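RS-Xception draws on SENet. The squeeze-and-excitation idea can be sketched in a few lines. First, global average pooling "squeezes" each channel to one number. Next, two small fully connected layers with ReLU and then a sigmoid "excite" a per-channel weight in (0, 1). Finally, each channel is rescaled by its weight. The sketch below is a plain-Python illustration on nested lists with biases omitted. The weight shapes are assumptions, and it does not reproduce how RS-Xception places the block.

```python
import math


def squeeze_excitation(feature_map, w1, w2):
    """Squeeze-and-Excitation over a C x H x W feature map given as nested lists.

    w1: (C/r) x C weights of the reduction layer; w2: C x (C/r) weights of the
    expansion layer (r is the reduction ratio; biases omitted for brevity).
    Returns the reweighted feature map and the per-channel scales.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(wrow, z))) for wrow in w1]
    scales = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(wrow, hidden)))) for wrow in w2]
    # Scale: multiply every value in a channel by that channel's learned weight.
    out = [[[v * s for v in row] for row in ch] for ch, s in zip(feature_map, scales)]
    return out, scales
```

The block adds only about 2C²/r weights per insertion point, which is why it suits lightweight networks aimed at embedded deployment.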

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

17 pages, 9111 KB

Open Access Article

An Efficient Multi-Branch Attention Network for Person Re-Identification

by Ke Han, Mingming Zhu, Pengzhen Li, Jie Dong, Haoyang Xie and Xiyan Zhang

Electronics 2024, 13(16), 3183; https://doi.org/10.3390/electronics13163183 - 12 Aug 2024

Cited by 2 | Viewed by 3650

Abstract

Because current person re-identification techniques lack tailored designs for challenges such as scale variation, illumination disparities, and occlusion, their implementation in practical applications remains challenging. To address this, an Efficient Multi-Branch Attention Network over OSNet (EMANet) is proposed. The network is composed of three branches (a global branch, a relational branch, and a global contrastive pooling branch), each of which produces its own features. The DAS attention mechanism evaluates the significance of learned features, assigning higher scores to those deemed crucial and lower scores to those considered distracting. This enhances identification accuracy by emphasizing important features while discounting the influence of distracting ones. Identity loss and adaptive sparse pairwise loss are used to efficiently facilitate information interaction. In experiments on the mainstream Market-1501 dataset, EMANet achieved high identification accuracies of 96.1% Rank-1 and 89.8% mAP. These results indicate the superiority and effectiveness of the proposed model.

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

16 pages, 8230 KB

Open Access Article

StrawSnake: A Real-Time Strawberry Instance Segmentation Network Based on the Contour Learning Approach

by Zhiyang Guo, Xing Hu, Baigan Zhao, Huaiwei Wang and Xueying Ma

Electronics 2024, 13(16), 3103; https://doi.org/10.3390/electronics13163103 - 6 Aug 2024

Cited by 9 | Viewed by 2688

Abstract

Automated harvesting systems rely heavily on precise and real-time fruit recognition, which is essential for improving efficiency and reducing labor costs. Strawberries, due to their delicate structure and complex growing environments, present unique challenges for automated recognition systems. Current methods predominantly use pixel-level and box-based approaches, which are insufficient for real-time applications because they cannot accurately pinpoint strawberry locations. To address these limitations, this study proposes StrawSnake, a contour-based detection and segmentation network tailored for strawberries. By designing a strawberry-specific octagonal contour and employing deep snake convolution (DSConv) for boundary feature extraction, StrawSnake significantly enhances recognition accuracy and speed. The Multi-scale Feature Reinforcement Block (MFRB) further strengthens the model by focusing on crucial boundary features and aggregating multi-level contour information, improving global context comprehension. The newly developed TongStraw_DB database and the public StrawDI_Db1 database, consisting of 1080 and 3100 high-resolution strawberry images, respectively, with manually segmented ground-truth contours, serve as a robust foundation for training and validation. The results indicate that StrawSnake achieves real-time recognition with high accuracy, outperforming existing methods in various comparative tests. Ablation studies confirm the effectiveness of the DSConv and MFRB modules in boosting performance. Integrating StrawSnake into automated harvesting systems marks a substantial step forward in the field, promising enhanced precision and efficiency in strawberry recognition and underscoring the method’s potential to make automated harvesting technologies more reliable and effective in practical applications.
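Contour-based segmentation treats an object boundary as a closed ring of vertices. The original Deep Snake model therefore convolves vertex features with circular (wrap-around) padding, so that the first and last vertices are neighbours. The abstract does not detail DSConv. The sketch below assumes it follows this circular-convolution idea and shows a single-channel version. The function name is hypothetical, and, as in most deep learning code, "convolution" here means cross-correlation.

```python
def circular_conv1d(features, kernel):
    """Circular 1D convolution (cross-correlation) over a closed contour.

    features: list of N per-vertex values (one channel for brevity).
    kernel:   list of odd length k centred on the current vertex. Indices wrap
    around modulo N, so the first and last vertices are treated as neighbours.
    """
    n, k = len(features), len(kernel)
    half = k // 2
    return [
        sum(kernel[j] * features[(i + j - half) % n] for j in range(k))
        for i in range(n)
    ]
```

For example, with the kernel `[1, 0, 0]` each output vertex copies its predecessor's feature, so `[1, 2, 3, 4]` becomes `[4, 1, 2, 3]`. Without the wrap-around, the two vertices at the "seam" of the contour would lose half of their neighbourhood.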

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

19 pages, 3941 KB

Open Access Article

Improved 3D Object Detection Based on PointPillars

by Weiwei Kong, Yusheng Du, Leilei He and Zejiang Li

Electronics 2024, 13(15), 2915; https://doi.org/10.3390/electronics13152915 - 24 Jul 2024

Cited by 6 | Viewed by 5944

Abstract

Despite recent advances in 3D object detection, conventional point cloud object detection algorithms still achieve limited accuracy on small objects. To address this, this paper adopts the PointPillars algorithm as the baseline model and proposes a two-stage 3D object detection approach in which Transformer models are used for point cloud processing and a redefined attention mechanism further enhances detection. In the first stage, the algorithm builds on PointPillars, whose central idea is to divide the point cloud space into equal-sized vertical columns (pillars). During feature extraction, when the features of all pillars are transformed into a pseudo-image, the proposed algorithm applies an attention mechanism adapted from the Squeeze-and-Excitation (SE) method to emphasize informative features and suppress less useful ones. In addition, the 2D convolutions of the traditional backbone network are replaced with dynamic convolutions, and the added attention mechanism further improves the network’s feature representation. In the second stage, the candidate boxes generated in the first stage are refined with a Transformer-based approach: the encoder constructs initial point features from the candidate boxes, and the decoder applies channel weighting to enhance channel information, improving detection accuracy and reducing false detections. Experiments on the KITTI dataset show that the proposed method significantly improves small-object detection compared with the baseline PointPillars. In the moderate difficulty category, the average precision (AP) for cars, pedestrians, and cyclists increased by 5.30%, 8.1%, and 10.6%, respectively. Moreover, the proposed method surpasses existing mainstream approaches in the cyclist category.
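The pillar step that PointPillars builds on can be sketched as grid binning in the x–y plane. Each point falls into the vertical column whose cell contains its (x, y) coordinates, and each occupied cell index is also the pixel at which that pillar's encoded feature is placed in the 2D pseudo-image. The ranges, pillar size, and `max_points` cap below are illustrative assumptions, not the paper's configuration. The per-point feature decoration and the learned pillar encoder are omitted.

```python
from collections import defaultdict


def group_into_pillars(points, x_range, y_range, pillar_size, max_points=32):
    """Assign (x, y, z, ...) points to vertical pillars on a regular x-y grid.

    Returns {(ix, iy): [points]}. Each key is also the pseudo-image pixel where
    that pillar's encoded feature would be placed. Points outside the ranges are
    dropped, and each pillar keeps at most max_points points (an assumed cap).
    """
    nx = int(round((x_range[1] - x_range[0]) / pillar_size))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size))
    pillars = defaultdict(list)
    for p in points:
        ix = int((p[0] - x_range[0]) // pillar_size)
        iy = int((p[1] - y_range[0]) // pillar_size)
        if 0 <= ix < nx and 0 <= iy < ny and len(pillars[(ix, iy)]) < max_points:
            pillars[(ix, iy)].append(p)
    return dict(pillars)
```

Because no binning is done along z, the resulting pseudo-image can be processed by an ordinary 2D backbone. That is the point at which the paper inserts its SE-style attention and dynamic convolutions.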

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

15 pages, 2842 KB

Open Access Article

Incremental SFM 3D Reconstruction Based on Deep Learning

by Lei Liu, Congzheng Wang, Chuncheng Feng, Wanqi Gong, Lingyi Zhang, Libin Liao and Chang Feng

Electronics 2024, 13(14), 2850; https://doi.org/10.3390/electronics13142850 - 19 Jul 2024

Cited by 9 | Viewed by 4974

Abstract

In recent years, with the rapid development of unmanned aerial vehicle (UAV) technology, multi-view 3D reconstruction has again become a research hotspot in computer vision. Incremental Structure from Motion (SFM) is currently the most prevalent reconstruction pipeline, but it still faces challenges in reconstruction efficiency, accuracy, and feature matching. In this paper, we use deep learning algorithms for feature matching to obtain more accurate matching point pairs. Moreover, we adopt an improved Gauss–Newton (GN) method that not only avoids numerical divergence but also accelerates bundle adjustment (BA). The sparse point cloud reconstructed by SFM and the original images are then used as the input of a depth estimation network to predict the depth map of each image. Finally, the depth maps are fused to reconstruct the dense point cloud. Experiments show that the reconstructed dense point clouds have rich details and clear textures, and that the completeness, overall accuracy, and reconstruction efficiency of the point clouds are improved.
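Bundle adjustment is a nonlinear least-squares problem, and Gauss–Newton solves it by repeatedly solving the normal equations (JᵀJ)Δ = −Jᵀr. The abstract does not say how its improved GN avoids divergence. One common safeguard is to halve the step until the cost decreases. The sketch below shows that safeguard on a toy two-parameter curve fit, y = a·exp(b·x), rather than on full bundle adjustment. The model, data, and function name are illustrative assumptions.

```python
import math


def gauss_newton_exp_fit(xs, ys, a, b, iters=50, tol=1e-12):
    """Fit y = a * exp(b * x) by Gauss-Newton with a step-halving safeguard.

    Each iteration solves the 2x2 normal equations (J^T J) delta = -J^T r and
    halves the step until the sum of squared residuals no longer increases,
    which keeps the plain Gauss-Newton update from diverging on a poor start.
    """
    def cost(a, b):
        return sum((a * math.exp(b * x) - y) ** 2 for x, y in zip(xs, ys))

    for _ in range(iters):
        # Residuals r_i and Jacobian rows [dr/da, dr/db].
        r = [a * math.exp(b * x) - y for x, y in zip(xs, ys)]
        J = [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]
        # Normal equations H delta = g, with H = J^T J and g = -J^T r.
        h11 = sum(j0 * j0 for j0, _ in J)
        h12 = sum(j0 * j1 for j0, j1 in J)
        h22 = sum(j1 * j1 for _, j1 in J)
        g1 = -sum(j0 * ri for (j0, _), ri in zip(J, r))
        g2 = -sum(j1 * ri for (_, j1), ri in zip(J, r))
        det = h11 * h22 - h12 * h12
        da = (h22 * g1 - h12 * g2) / det  # Cramer's rule for the 2x2 system
        db = (h11 * g2 - h12 * g1) / det
        # Backtracking: shrink the step while it would increase the cost.
        step, current = 1.0, cost(a, b)
        while cost(a + step * da, b + step * db) > current and step > 1e-8:
            step *= 0.5
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < tol:
            break
    return a, b
```

Starting from a = 1, b = 0.1 on noise-free samples of 2·exp(0.5x), the first full GN step overshoots badly. One halving brings the cost down, and later iterations converge to a ≈ 2, b ≈ 0.5. In real BA, the 2×2 solve becomes a large sparse system, usually handled with the Schur complement.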

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

17 pages, 5588 KB

Open Access Article

Detection of Liquid Retention on Pipette Tips in High-Throughput Liquid Handling Workstations Based on Improved YOLOv8 Algorithm with Attention Mechanism

by Yanpu Yin, Jiahui Lei and Wei Tao

Electronics 2024, 13(14), 2836; https://doi.org/10.3390/electronics13142836 - 18 Jul 2024

Cited by 2 | Viewed by 2414

Abstract

High-throughput liquid handling workstations are required to process large numbers of test samples in the life sciences and medicine. Liquid retention and droplets hanging from pipette tips can lead to cross-contamination of samples and reagents and to inaccurate experimental results. Traditional methods for detecting liquid retention have low precision and poor real-time performance. This paper proposes an improved YOLOv8 (You Only Look Once version 8) object detection algorithm to address the challenges of pipette tip liquid retention detection: varying liquid sizes and colors, complex backgrounds containing test tube racks and multiple samples, and poor understanding of global image structure. A global context (GC) attention mechanism module is introduced into the backbone network and the cross-stage partial feature fusion (C2f) module to better focus on target features. To improve the network’s ability to combine and process different types of input data and background information, a Large Kernel Selection (LKS) module is also introduced into the backbone network. Additionally, the neck network is redesigned to incorporate the Simple Attention Module (SimAM), which generates attention weights and improves overall performance. We evaluated the algorithm on a self-built dataset of pipette tips. Compared with the original YOLOv8 model, the improved algorithm increased mAP@0.5 (mean average precision), F1 score, and precision by 1.7%, 2%, and 1.7%, respectively. The improved YOLOv8 algorithm enhances the detection of liquid-retaining pipette tips and helps prevent cross-contamination from affecting the results of sample solution experiments. It also provides a detection basis for the subsequent automatic handling of retained liquid.
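SimAM is parameter-free, which makes it easy to show directly. For every neuron, it computes the squared deviation from its channel's mean, normalizes that deviation by the channel variance (plus a small λ), adds 0.5, and passes the result through a sigmoid to get an attention weight. The sketch below follows the published SimAM reference formulation with λ = 1e-4, written for a single H × W channel as nested lists. Where and how often the paper inserts it in the neck is not reproduced.

```python
import math


def simam(channel, e_lambda=1e-4):
    """Parameter-free SimAM attention for one H x W channel (nested lists).

    A neuron's inverse energy grows with its squared deviation from the channel
    mean, normalised by the channel variance; a sigmoid turns it into a weight
    that multiplies the original activation.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1
    mu = sum(values) / len(values)
    sq = [[(v - mu) ** 2 for v in row] for row in channel]
    var = sum(sum(row) for row in sq) / n
    return [
        [v / (1.0 + math.exp(-(d / (4 * (var + e_lambda)) + 0.5))) for v, d in zip(row, drow)]
        for row, drow in zip(channel, sq)
    ]
```

Neurons that stand out from their channel (for example, a bright droplet against a uniform tip) receive larger weights, while a perfectly uniform channel is simply scaled by sigmoid(0.5).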

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

21 pages, 10205 KB

Open Access Article

Improved YOLOV5 Angle Embossed Character Recognition by Multiscale Residual Attention with Selectable Clustering

by Shenshun Ying, Jianhai Fang, Shaozhang Tang and Wenzhi Bao

Electronics 2024, 13(13), 2435; https://doi.org/10.3390/electronics13132435 - 21 Jun 2024

Cited by 4 | Viewed by 2191

Abstract

In the intelligent upgrading of power transmission towers, automated identification of stamped characters is crucial. Currently, manual methods predominate, which are time-consuming, labor-intensive, and error-prone. For small characters that are incomplete, connected, or irregular in shape, existing OCR technologies also struggle to achieve satisfactory recognition results. Thus, an approach that uses an improved deep neural network model to enhance the recognition of stamped characters is proposed. Based on the YOLOv5 backbone network, a multi-scale residual attention encoding mechanism is introduced during upsampling to increase the weights of small and incomplete character targets. Additionally, a selectable clustering minimum iteration center module is introduced to optimize the selection of clustering centers and integrate multi-scale information, thereby reducing random errors. Experiments show that the improved model significantly reduces the instability caused by random selection of clustering centers during clustering, accelerates the convergence of small-target recognition, achieves a recognition accuracy of 97.6% and a detection time of 43 milliseconds on the stamped character recognition task, and significantly outperforms the existing Fast-CNN, YOLOv5, and YOLOv6 models, effectively enhancing the precision and efficiency of automatic identification.
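The abstract does not define the "selectable clustering minimum iteration center module". For context, YOLO-family detectors commonly derive anchor sizes by k-means over box (width, height) pairs, using 1 − IoU as the distance, and the result depends on the random initial centers. The sketch below shows that standard clustering plus one generic way to reduce the randomness: run several seeds and keep the lowest-cost result. This is not the authors' module. The seed list and function names are hypothetical.

```python
import random


def wh_iou(box, anchor):
    """IoU of two (w, h) boxes aligned at a common corner, as used for anchor clustering."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    return inter / (box[0] * box[1] + anchor[0] * anchor[1] - inter)


def kmeans_anchors(boxes, k, seeds=(0, 1, 2, 3, 4), iters=100):
    """k-means over (w, h) pairs with distance 1 - IoU; keep the best of several seeds."""
    best, best_cost = None, float("inf")
    for seed in seeds:
        centers = random.Random(seed).sample(boxes, k)  # random initial centres
        for _ in range(iters):
            # Assign each box to the centre with the highest IoU (lowest 1 - IoU).
            clusters = [[] for _ in range(k)]
            for b in boxes:
                i = max(range(k), key=lambda c: wh_iou(b, centers[c]))
                clusters[i].append(b)
            # Move each centre to the mean (w, h) of its cluster; keep empty ones in place.
            new_centers = [
                (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl)) if cl else centers[i]
                for i, cl in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        cost = sum(1 - max(wh_iou(b, c) for c in centers) for b in boxes)
        if cost < best_cost:
            best, best_cost = sorted(centers), cost
    return best, best_cost
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering, which matters when most targets are small characters.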

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

18 pages, 16034 KB

Open Access Article

RVDR-YOLOv8: A Weed Target Detection Model Based on Improved YOLOv8

by Yuanming Ding, Chen Jiang, Lin Song, Fei Liu and Yunrui Tao

Electronics 2024, 13(11), 2182; https://doi.org/10.3390/electronics13112182 - 3 Jun 2024

Cited by 19 | Viewed by 3367

Abstract

Currently, weed control robots that can accurately identify and remove weeds are gradually replacing traditional chemical weed control techniques. However, the core processing equipment of weeding robots has limited computational and storage resources. To address the high computational cost and large number of parameters of current weeding-robot models, this paper proposes a lightweight weed target detection model based on an improved YOLOv8 (You Only Look Once Version 8), called RVDR-YOLOv8 (Reversible Column Dilation-wise Residual). First, the backbone network is reconstructed based on RevCol (Reversible Column Networks); the unique reversible columnar structure of the new backbone not only reduces computation but also improves the model’s generalisation ability. Second, the C2fDWR module is designed using Dilation-wise Residual and integrated with the reconstructed backbone, which improves the adaptability of the new RVDR backbone and enhances the model’s recognition accuracy for occluded targets. Third, GSConv replaces traditional convolution in the neck to reduce computational and structural complexity while maintaining recognition accuracy. Finally, InnerMPDIoU is designed by combining MPDIoU with InnerIoU to improve the prediction accuracy of the model. Compared with YOLOv8, the new model reduces computational complexity by 35.8%, the number of parameters by 35.4%, and model size by 30.2%, while improving mAP50 and mAP50-95 by 1.7% and 1.1%, respectively. Its overall performance also exceeds that of models such as Faster R-CNN, SSD, and RetinaNet. The proposed model can accurately identify weeds in farmland under limited hardware resources, providing theoretical and technical support for effective weed control in farmland.
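The abstract names the two ingredients of InnerMPDIoU but not how they are combined. MPDIoU subtracts the squared distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes from IoU, normalizing both by w² + h² of the input image. Inner-IoU computes IoU on auxiliary boxes rescaled about their centers by a ratio. The sketch below assumes the combination pattern used in the Inner-IoU formulation, L = L_MPDIoU + IoU − IoU_inner. The ratio of 0.8 and all function names are illustrative assumptions.

```python
def iou(a, b):
    """IoU of boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scaled(box, ratio):
    """Auxiliary 'inner' box: same centre, width and height multiplied by ratio."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    hw, hh = (box[2] - box[0]) * ratio / 2, (box[3] - box[1]) * ratio / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def inner_mpdiou_loss(pred, target, img_w, img_h, ratio=0.8):
    """Assumed combination: L = L_MPDIoU + IoU - IoU_inner (ratio is illustrative)."""
    plain = iou(pred, target)
    norm = img_w ** 2 + img_h ** 2
    d1 = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2  # top-left corners
    d2 = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2  # bottom-right corners
    l_mpdiou = 1.0 - (plain - d1 / norm - d2 / norm)
    inner = iou(scaled(pred, ratio), scaled(target, ratio))
    return l_mpdiou + plain - inner
```

With ratio < 1, the inner boxes shrink, so a given offset lowers their IoU more than it lowers the plain IoU. This sharpens the penalty for well-aligned but not yet exact predictions.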

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

22 pages, 5805 KB

Open Access Article

Research on 3D Visualization of Drone Scenes Based on Neural Radiance Fields

by Pengfei Jin and Zhuoyuan Yu

Electronics 2024, 13(9), 1682; https://doi.org/10.3390/electronics13091682 - 26 Apr 2024

Cited by 1 | Viewed by 2688

Abstract

Neural Radiance Fields (NeRFs), an innovative method that uses neural networks to implicitly represent 3D scenes, can synthesize images from arbitrary viewpoints and have been successfully applied to the visualization of objects and room-level scenes (<50 m²). However, owing to the capacity limitations of neural networks, renderings of drone-captured scenes (>10,000 m²) often appear blurry and lack detail. Merely increasing the model’s capacity or the number of sample points can significantly raise training costs. Existing space contraction methods, designed for forward-facing or 360° object-centric trajectories, are not suitable for the unique trajectories of drone footage. Furthermore, anomalies and cloud- or fog-like artifacts, resulting from complex lighting conditions and sparse data acquisition, can significantly degrade rendering quality. To address these challenges, we propose a framework specifically designed for drone-captured scenes. Within this framework, a feature grid and a multi-layer perceptron (MLP) jointly represent the 3D scene, and we introduce a Space Boundary Compression method and a Ground-Optimized Sampling strategy to streamline the spatial structure and enhance sampling performance. Moreover, we propose an anti-aliasing neural rendering model based on Cluster Sampling and Integrated Hash Encoding to optimize distant details, and we incorporate an L1 norm penalty for outliers as well as an entropy regularization loss to reduce fluffy artifacts. Experiments on four drone-captured scenes verify the effectiveness of the algorithm: with only a single GPU and less than two hours of training, photorealistic visualization can be achieved, significantly improving upon existing NeRF approaches.
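The abstract does not give the exact form of its entropy regularization. One common formulation penalizes the Shannon entropy of each ray's normalized compositing weights, which is low when density concentrates at a single surface and high when it is smeared along the ray (the "fluffy" case). The sketch below computes the standard NeRF compositing weights and that per-ray entropy. The choice of entropy form and the function names are assumptions.

```python
import math


def ray_weights(sigmas, deltas):
    """Standard NeRF compositing weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this sample
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # light remaining after this sample
    return weights


def ray_entropy(weights, eps=1e-10):
    """Shannon entropy of the normalised weights: near 0 for a single sharp surface."""
    total = sum(weights) + eps
    probs = [w / total for w in weights]
    return -sum(p * math.log(p + eps) for p in probs)
```

Adding a small multiple of this entropy to the photometric loss pushes density toward a single surface along each ray, which is the behaviour that suppresses floating, fog-like geometry.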

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

13 pages, 4454 KB

Open Access Article

A High-Precision Fall Detection Model Based on Dynamic Convolution in Complex Scenes

by Yong Qin, Wuqing Miao and Chen Qian

Electronics 2024, 13(6), 1141; https://doi.org/10.3390/electronics13061141 - 20 Mar 2024

Cited by 17 | Viewed by 3542

Abstract

Falls can cause significant harm, and even death, to elderly individuals. It is therefore crucial to have a highly accurate fall detection model that can promptly detect and respond to changes in posture. The YOLOv8 model may not effectively handle the deformation, varying target scales, and occlusion that arise in complex scenes during human falls. This paper presents ESD-YOLO, a new high-precision fall detection model based on dynamic convolution that improves upon YOLOv8. The C2f module in the backbone network is replaced with the C2Dv3 module to enhance the network’s ability to capture complex details and deformations. In the neck, the DyHead block unifies multiple attention operations, improving detection accuracy for targets at different scales and performance under occlusion. Additionally, the proposed algorithm uses the EASlideloss loss function to increase the model’s focus on hard samples and address sample imbalance. Experimental results show increases of 1.9% in precision, 4.1% in recall, 4.3% in mAP@0.5, and 2.8% in mAP@0.5:0.95 compared to YOLOv8. In particular, the model significantly improves the precision of human fall detection in complex scenes.
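EASlideloss appears to build on Slide Loss, which reweights samples by their IoU relative to a threshold μ so that samples near the easy/hard boundary receive more weight. The sketch below assumes the Slide Loss weighting function from YOLO-FaceV2 and, hypothetically, reads the "EA" prefix as an exponential moving average of μ; the abstract defines neither, so `slide_weight` and `EMAThreshold` are illustrative names, not the authors' code.

```python
import math

def slide_weight(iou: float, mu: float) -> float:
    """Slide-style sample weight as a function of a sample's IoU and threshold mu."""
    if iou <= mu - 0.1:
        return 1.0                   # clearly negative samples keep unit weight
    if iou < mu:
        return math.exp(1.0 - mu)    # boundary samples just below mu are up-weighted
    return math.exp(1.0 - iou)       # positives: weight decays as IoU rises

class EMAThreshold:
    """Hypothetical: track mu as an exponential moving average of batch-mean IoU."""

    def __init__(self, decay: float = 0.9, init: float = 0.5):
        self.decay = decay
        self.mu = init

    def update(self, batch_ious):
        batch_mean = sum(batch_ious) / len(batch_ious)
        self.mu = self.decay * self.mu + (1 - self.decay) * batch_mean
        return self.mu
```

The per-sample weight would multiply a classification loss such as binary cross-entropy; smoothing μ over batches avoids a threshold that jumps with each mini-batch.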

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

25 pages, 9298 KB

Open Access Article

Research on the Car Searching System in the Multi-Storey Garage with the RSSI Indoor Locating Based on Neural Network

by Jihui Ma, Lijie Wang, Xianwen Zhu, Ziyi Li and Xinyu Lu

Electronics 2024, 13(5), 907; https://doi.org/10.3390/electronics13050907 - 27 Feb 2024

Cited by 1 | Viewed by 2382

Abstract

To solve the problem of reverse car searching (locating a parked car) in intelligent multi-story garages or parking lots, a reverse car-searching method for intelligent garages based on a PC client and a mobile client app was studied, and the interfaces and functions of both clients were designed and developed. The YOLOv5 and LPRNet networks were used for license plate localization and recognition to detect vehicles parking and entering. An indoor pedestrian localization method that fuses RSSI fingerprint signals with a BPNet network and the KNN algorithm was studied, and the localization accuracy within 2.5 m was found to be 100%. An A* algorithm based on spatial accessibility was developed to realize the reverse car-searching function. The results indicate that vehicle-finding path guidance can be completed while the number of invalid search nodes on the example maps is reduced by more than 55.0%, and the operating efficiency of the algorithm increased to 28.5%.
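The KNN step of RSSI fingerprint localization matches a live signal-strength reading against reference points surveyed in advance. A minimal sketch, assuming plain (unweighted) K-nearest-neighbour matching in signal space over a hypothetical fingerprint database; the BPNet fusion described in the paper is not reproduced.

```python
import math

def knn_locate(rssi, fingerprints, k=3):
    """Estimate (x, y) from an RSSI vector by K-nearest-neighbour fingerprint matching.

    fingerprints: list of ((x, y), [rssi_ap1, rssi_ap2, ...]) reference points.
    """
    if not fingerprints:
        raise ValueError("fingerprint database is empty")
    # Euclidean distance in signal space between the live reading and each reference point
    scored = sorted((math.dist(rssi, ref_rssi), pos) for pos, ref_rssi in fingerprints)
    nearest = scored[:k]
    # Average the coordinates of the k closest reference points
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return x, y

# Hypothetical survey of four reference points with three access points (dBm)
DB = [
    ((0, 0), [-40, -70, -80]),
    ((5, 0), [-70, -40, -80]),
    ((0, 5), [-70, -80, -40]),
    ((5, 5), [-60, -60, -60]),
]
```

A weighted variant (inverse-distance weights) is a common refinement; the choice of k trades noise robustness against spatial blurring.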

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

16 pages, 62019 KB

Open Access Article

MM-NeRF: Large-Scale Scene Representation with Multi-Resolution Hash Grid and Multi-View Priors Features

by Bo Dong, Kaiqiang Chen, Zhirui Wang, Menglong Yan, Jiaojiao Gu and Xian Sun

Electronics 2024, 13(5), 844; https://doi.org/10.3390/electronics13050844 - 22 Feb 2024

Cited by 6 | Viewed by 4618

Abstract

Reconstructing large-scale scenes with Neural Radiance Fields (NeRFs) is a research hotspot in 3D computer vision. Existing MLP (multi-layer perceptron)-based methods often suffer from underfitting and a lack of fine detail when rendering large-scale scenes. Popular solutions divide the scene into small areas that are modeled separately or increase the size of the MLP network; however, both approaches increase training cost. Moreover, unlike object-scale reconstruction, reconstructing large scenes requires a geometrically growing quantity of view data if prior information about the scene is not effectively utilized. In this paper, we propose MM-NeRF, which integrates efficient hybrid features into the NeRF framework to enhance the reconstruction of large-scale scenes. We employ a dual-branch feature capture structure comprising a multi-resolution 3D hash grid feature branch and a multi-view 2D prior feature branch. The 3D hash grid feature models geometric details, while the 2D prior feature supplements local texture information. Our experimental results show that this integration is sufficient to render realistic novel views with fine details and yields a more accurate geometric representation. Compared with representative methods in the field, our method improves the PSNR (Peak Signal-to-Noise Ratio) by approximately 5%, underscoring its contribution to large-scene radiance field reconstruction.
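To make the multi-resolution 3D hash grid branch concrete, here is a minimal sketch in the style of Instant-NGP's spatial hash, assuming its per-dimension primes and a geometric progression of level resolutions. The table size, level count, and helper names are illustrative assumptions, and the learnable feature tables and trilinear interpolation are omitted.

```python
import math

# Per-dimension primes from the Instant-NGP spatial hash (first dimension uses 1)
PRIMES = (1, 2654435761, 805459861)

def level_resolutions(n_min=16, n_max=2048, levels=8):
    """Grid resolution per level, growing geometrically from n_min to about n_max."""
    b = math.exp((math.log(n_max) - math.log(n_min)) / (levels - 1))
    return [math.floor(n_min * b ** l) for l in range(levels)]

def hash_corner(ix, iy, iz, table_size):
    """XOR of coordinate * prime, reduced modulo the hash-table size."""
    h = 0
    for coord, prime in zip((ix, iy, iz), PRIMES):
        h ^= coord * prime
    return h % table_size

def corner_indices(point, resolution, table_size):
    """Hash-table indices of the 8 grid vertices around a point in [0, 1]^3."""
    base = [math.floor(c * resolution) for c in point]
    return [
        hash_corner(base[0] + dx, base[1] + dy, base[2] + dz, table_size)
        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
    ]
```

Each level would look up a feature vector at these 8 indices and interpolate; concatenating the per-level features gives the 3D branch's input to the MLP. Python integers do not overflow, but for a power-of-two table size the result matches a 32-bit implementation's low bits.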

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

21 pages, 11283 KB

Open Access Article

A Method for Unseen Object Six Degrees of Freedom Pose Estimation Based on Segment Anything Model and Hybrid Distance Optimization

by Li Xin, Hu Lin, Xinjun Liu and Shiyu Wang

Electronics 2024, 13(4), 774; https://doi.org/10.3390/electronics13040774 - 16 Feb 2024

Cited by 1 | Viewed by 3516

Abstract

Six-degree-of-freedom (6-DoF) pose estimation is a cornerstone of precise robotic control and similar tasks. To address the limitations of current 6-DoF pose estimation methods in handling object occlusions and unknown objects, we have developed a novel two-stage 6-DoF pose estimation method that integrates RGB-D data with CAD models. First, targeting high-quality zero-shot object instance segmentation, we developed the CAE-SAM model based on the Segment Anything Model (SAM) framework. To address SAM’s boundary blur, mask voids, and over-segmentation, this paper introduces strategies such as local spatial-feature-enhancement modules, global context markers, and a bounding box generator. Next, we propose a registration method optimized through a hybrid distance metric to reduce the dependence of point cloud registration algorithms on sensitive hyperparameters. Experimental results on the HQSeg-44K dataset substantiate the notable improvements in instance segmentation accuracy and robustness delivered by the CAE-SAM model. Moreover, the efficacy of the two-stage method is further corroborated on a 6-DoF pose dataset of workpieces constructed with CloudCompare and RealSense. For unseen targets, the method achieves an ADD of 2.973 mm and an ADD-S of 1.472 mm. This work significantly enhances pose estimation performance and streamlines the algorithm’s deployment and maintenance.
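The ADD and ADD-S figures are average point-distance errors between an object model's points transformed by the ground-truth pose and by the estimated pose; ADD-S uses the closest-point distance so that symmetric objects are not penalized for an indistinguishable rotation. A minimal pure-Python sketch of both standard metrics, assuming rotation matrices as nested lists and model points in millimetres:

```python
import math

def transform(R, t, p):
    """Apply a rigid transform: R @ p + t for a 3x3 rotation R and translation t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    total = sum(
        math.dist(transform(R_gt, t_gt, p), transform(R_pred, t_pred, p))
        for p in points
    )
    return total / len(points)

def adds_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its closest predicted point."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    pred = [transform(R_pred, t_pred, p) for p in points]
    return sum(min(math.dist(g, q) for q in pred) for g in gt) / len(gt)

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
```

For a symmetric two-point object, a 180° rotation about z gives a large ADD but zero ADD-S, which is why both are usually reported. The brute-force closest-point search is O(n²); a k-d tree is typical for real models.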

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

17 pages, 10797 KB

Open Access Article

Multi-Branch Spectral Channel Attention Network for Breast Cancer Histopathology Image Classification

by Lu Cao, Ke Pan, Yuan Ren, Ruidong Lu and Jianxin Zhang

Electronics 2024, 13(2), 459; https://doi.org/10.3390/electronics13020459 - 22 Jan 2024

Cited by 10 | Viewed by 3095

Abstract

Deep-learning-based breast cancer image diagnosis is currently a prominent and increasingly popular area of research. Existing convolutional-neural-network-based methods mainly capture breast cancer image features from spatial domain characteristics for classification. However, according to digital signal processing theory, texture images usually contain repeated patterns and structures, which appear as intense energy at specific frequencies in the frequency domain. Motivated by this, we explore breast cancer histopathology classification in the frequency domain and propose a novel multi-branch spectral channel attention network, MbsCANet. It extends frequency domain attention mechanisms to a multi-branch design by combining the lowest-frequency features with selected high-frequency information from the two-dimensional discrete cosine transform (2D DCT), thus preventing the loss of phase information and capturing richer contextual information for classification. We thoroughly evaluate and analyze MbsCANet on the publicly accessible BreakHis breast cancer histopathology dataset. It achieves the best image-level and patient-level classification results, 99.01% and 98.87%, respectively, outperforming spatial-domain-dominated models by a large margin on average, and visualization results further demonstrate its effectiveness for this medical imaging application.
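Frequency-domain channel attention of this kind typically uses a 2D DCT coefficient of each channel as that channel's descriptor. The sketch below assumes an unnormalized DCT-II basis, a hypothetical set of three frequency branches averaged together, and a plain sigmoid gate in place of the learned fully connected layers a real network would use; it is not the MbsCANet implementation.

```python
import math

def dct_coefficient(feature, u, v):
    """Unnormalised 2D DCT-II coefficient of one H x W channel at frequency (u, v)."""
    H, W = len(feature), len(feature[0])
    return sum(
        feature[h][w]
        * math.cos(math.pi * u * (h + 0.5) / H)
        * math.cos(math.pi * v * (w + 0.5) / W)
        for h in range(H) for w in range(W)
    )

def _sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def spectral_channel_weights(channels, freqs=((0, 0), (0, 1), (1, 0))):
    """Per-channel attention weights from several DCT frequency branches (hypothetical mix)."""
    weights = []
    for fmap in channels:
        # One descriptor per frequency branch, averaged across branches
        desc = sum(dct_coefficient(fmap, u, v) for u, v in freqs) / len(freqs)
        weights.append(_sigmoid(desc))
    return weights
```

The (0, 0) coefficient equals H·W times the channel mean, so the lowest-frequency branch recovers global average pooling up to scale; the higher-frequency branches add the texture energy that average pooling discards.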

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

23 pages, 8433 KB

Open Access Article

Application of Improved YOLOv5 Algorithm in Lightweight Transmission Line Small Target Defect Detection

by Zhilong Yu, Yanqiao Lei, Feng Shen and Shuai Zhou

Electronics 2024, 13(2), 305; https://doi.org/10.3390/electronics13020305 - 10 Jan 2024

Cited by 13 | Viewed by 4760

Abstract

With the development of automatic UAV cruising along power transmission lines, intelligent defect detection in aerial images has become increasingly important. In aerial target detection for transmission lines, insulator defects are difficult to detect because complex backgrounds produce noisy images, leading to slow detection, missed detections, and misidentification of small targets. To address these challenges, this paper proposes an insulator defect detection algorithm, DFCG_YOLOv5, which improves both accuracy and speed by enhancing the network structure and optimizing the loss function. First, the input stage is optimized: a High-Speed Adaptive Median Filtering (HSMF) algorithm is introduced to preprocess images captured by the UAV system, effectively reducing noise interference in target detection. Second, the original Ghost backbone structure is further optimized, and the DFC attention mechanism is incorporated to balance detection accuracy and speed. Additionally, the original CIOU loss function is replaced with Poly Loss, which addresses the imbalance of positive and negative samples for small targets. By adjusting its parameters for different datasets, this modification effectively suppresses background positive samples and enhances detection accuracy. To align with real-world engineering applications, the dataset used in this study consists of unmanned aircraft system (UAS) patrol images from the Yunnan Power Supply Bureau Company. The experimental results show a 9.2% improvement in accuracy and a 26.2% increase in inference speed compared to YOLOv5s. These findings have significant implications for the practical implementation of target detection in engineering scenarios.
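HSMF is described as a faster variant of adaptive median filtering. For reference, here is a minimal sketch of the classic (textbook) adaptive median filter, not the paper's HSMF: the window grows until its median is not itself an impulse, and a pixel is replaced only if it lies at the window's minimum or maximum. Edge clamping and `max_window=7` are assumptions.

```python
def adaptive_median_filter(img, max_window=7):
    """Classic adaptive median filter on a 2D list of grey levels (edges clamped)."""
    H, W = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(H):
        for x in range(W):
            size = 3
            while True:
                r = size // 2
                window = sorted(
                    img[min(max(y + dy, 0), H - 1)][min(max(x + dx, 0), W - 1)]
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)
                )
                zmin, zmed, zmax = window[0], window[len(window) // 2], window[-1]
                if zmin < zmed < zmax:
                    # Median is not an impulse: keep the pixel unless it is one
                    z = img[y][x]
                    out[y][x] = z if zmin < z < zmax else zmed
                    break
                size += 2  # median looked like an impulse: enlarge the window
                if size > max_window:
                    out[y][x] = zmed
                    break
    return out
```

Unlike a fixed median filter, this preserves pixels that are not impulses, so fine structures such as thin insulator edges are blurred less while salt-and-pepper noise is removed.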

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

21 pages, 2728 KB

Open Access Article

Depth-Quality Purification Feature Processing for Red Green Blue-Depth Salient Object Detection

by Shijie Feng, Li Zhao, Jie Hu, Xiaolong Zhou and Sixian Chan

Electronics 2024, 13(1), 93; https://doi.org/10.3390/electronics13010093 - 25 Dec 2023

Cited by 1 | Viewed by 1953

Abstract

With advances in deep learning technology, Red Green Blue-Depth (RGB-D) Salient Object Detection (SOD) based on convolutional neural networks (CNNs) is gaining increasing attention. However, the accuracy of current models remains limited. It has been found that the quality of depth features profoundly affects accuracy. Several current RGB-D SOD techniques do not consider depth feature quality and directly fuse the original depth features with Red Green Blue (RGB) features for training, which degrades the model’s precision. To address this issue, we propose a depth-quality purification feature processing network for RGB-D SOD, named DQPFPNet. First, we design a depth-quality purification feature processing (DQPFP) module that filters depth features and fuses them with RGB features at multiple scales. This module explicitly controls and enhances depth features during cross-modal fusion, avoiding the injection of noisy or misleading depth features. Second, to prevent overfitting and avoid neuron inactivation, we use the RReLU activation function during training. In addition, we introduce the pixel position adaptive importance (PPAI) loss, which integrates local structure information to assign different weights to each pixel, better guiding the network’s learning and producing clearer details. Finally, a dual-stage decoder is designed to exploit contextual information, improving the model’s modeling capability and the network’s efficiency. Extensive experiments on six RGB-D datasets demonstrate that DQPFPNet outperforms recent efficient models and delivers state-of-the-art accuracy.
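RReLU (randomized leaky ReLU) samples the negative slope uniformly from a range during training and uses the midpoint of that range at inference, which both injects regularizing noise and keeps gradients alive for negative inputs. A minimal scalar sketch, assuming the bounds 1/8 and 1/3 (the defaults of PyTorch's `torch.nn.RReLU`, not values stated in the paper):

```python
import random

def rrelu(x, lower=1 / 8, upper=1 / 3, training=True, rng=random):
    """Randomized leaky ReLU for a single value."""
    if x >= 0:
        return x
    # Random negative slope while training, fixed mean slope at inference
    slope = rng.uniform(lower, upper) if training else (lower + upper) / 2
    return slope * x

# Usage: apply element-wise with a seeded generator for reproducibility
gen = random.Random(0)
activations = [rrelu(v, rng=gen) for v in [-1.0, 0.5, -3.0]]
```

Because negative inputs never map to exactly zero, neurons cannot become permanently inactive, which is the "neuron inactivation" the abstract refers to.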

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

19 pages, 4395 KB

Open Access Article

LezioSeg: Multi-Scale Attention Affine-Based CNN for Segmenting Diabetic Retinopathy Lesions in Images

by Mohammed Yousef Salem Ali, Mohammed Jabreel, Aida Valls, Marc Baget and Mohamed Abdel-Nasser

Electronics 2023, 12(24), 4940; https://doi.org/10.3390/electronics12244940 - 8 Dec 2023

Cited by 8 | Viewed by 2737

Abstract

Diagnosing some eye pathologies, such as diabetic retinopathy (DR), depends on accurately detecting retinal lesions. Automatic lesion-segmentation methods based on deep learning rely on heavyweight models and have yet to reach the desired quality of results. This paper presents a new deep learning method for segmenting the four types of DR lesions found in eye fundus images. The method, called LezioSeg, is based on multi-scale modules and gated skip connections. It has three components: (1) two multi-scale modules, the first being atrous spatial pyramid pooling (ASPP), inserted at the neck of the network, and the second added at the end of the decoder to improve fundus image feature extraction; (2) an ImageNet-pretrained MobileNet encoder; and (3) a gated skip connection (GSC) mechanism that improves the ability to obtain information about retinal lesions. Experiments using affine-based transformation techniques showed that this architecture improved lesion-segmentation performance on the well-known IDRiD and E-ophtha datasets. Using the standard AUPR metric, on the IDRiD dataset we obtained 81% for soft exudates, 86% for hard exudates, 69% for hemorrhages, and 40% for microaneurysms. On the E-ophtha dataset, we achieved an AUPR of 63% for hard exudates and 37.5% for microaneurysms. These results show that our model with affine-based augmentation achieves competitive results compared to several cutting-edge techniques while using far fewer parameters.
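AUPR (area under the precision–recall curve) is well suited to lesions like microaneurysms, which occupy a tiny fraction of pixels. It is often computed as average precision: the sum of precision values weighted by the recall gained at each score threshold. A minimal sketch of that estimator over per-pixel scores, assuming binary lesion labels; the paper's exact AUPR computation (for example, trapezoidal interpolation) may differ.

```python
def average_precision(scores, labels):
    """Average precision: sum over thresholds of (recall gain) * precision."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score as one threshold step
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

Grouping tied scores matters for segmentation outputs, where many pixels often share the same predicted probability.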

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

14 pages, 15025 KB

Open Access Article

Consistent Weighted Correlation-Based Attention for Transformer Tracking

by Lei Liu, Genwen Fang, Jun Wang, Shuai Wang, Chun Wang, Longfeng Shen, Kongfen Zhu and Silas N. Melo

Electronics 2023, 12(22), 4648; https://doi.org/10.3390/electronics12224648 - 15 Nov 2023

Cited by 1 | Viewed by 2136

Abstract

The attention mechanism plays a crucial role among the key technologies of transformer-based visual tracking. However, current attention computation methods neglect the correlation between the query and the key, which results in erroneous correlations. To address this issue, this study proposes CWCTrack, a framework for transformer visual tracking. To balance the weights of the attention module and enhance feature extraction in the search and template regions, a consistent weighted correlation (CWC) module is introduced into the cross-attention block. The CWC module computes the correlation score between each query and all keys. This correlation is then multiplied by the consistency weights of the other query–key pairs to obtain the final attention weights. The consistency weights are computed from the relevance of the query–key pairs, so that correlation is enhanced for relevant query–key pairs and suppressed for irrelevant ones. Experimental results on four prevalent benchmarks demonstrate that CWCTrack achieves favorable performance.

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

19 pages, 33702 KB

Open Access Article

Detection of Fittings Based on the Dynamic Graph CNN and U-Net Embedded with Bi-Level Routing Attention

by Zhihui Xie, Min Fu and Xuefeng Liu

Electronics 2023, 12(22), 4611; https://doi.org/10.3390/electronics12224611 - 11 Nov 2023

Cited by 3 | Viewed by 2887

Abstract

Accurate detection of power fittings is crucial for identifying defects or faults in these components, which is essential for assessing the safety and stability of the power system. However, the accuracy of fittings detection is affected by complex backgrounds, small target sizes, and overlapping fittings in the images. To address these challenges, a fittings detection method based on the dynamic graph convolutional neural network (DGCNN) and U-shaped network (U-Net) is proposed, which combines three-dimensional detection with two-dimensional object detection. First, the bi-level routing attention mechanism is incorporated into a lightweight U-Net to enhance feature extraction for detecting fitting boundaries. Second, pseudo-point cloud data are synthesized by transforming the depth map generated by the Lite-Mono algorithm together with its corresponding RGB fittings image. The DGCNN algorithm is then employed to extract features of occluded fittings, contributing to the final refinement of the results. This process helps alleviate occlusion among targets and further enhances the precision of fittings detection. Finally, the proposed method is evaluated on a custom dataset of fittings, and comparative studies are conducted. The experimental results illustrate the promising potential of the proposed approach for enhancing features and extracting information from fittings images.
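Turning a monocular depth map into a pseudo-point cloud is usually done by back-projecting each pixel through a pinhole camera model. A minimal sketch, assuming known (here hypothetical) intrinsics `fx`, `fy`, `cx`, `cy` and appending each pixel's RGB values to its 3D point; the abstract does not specify the paper's exact transformation.

```python
def depth_to_pseudo_points(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map into (x, y, z, r, g, b) points with a pinhole model."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a valid depth estimate
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, *rgb[v][u]))
    return points
```

The resulting coloured point set is the kind of input a point-based network such as DGCNN consumes; occluded fittings that overlap in 2D can separate along the depth axis.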

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

15 pages, 2535 KB

Open Access Article

Toward Unified and Quantitative Cinematic Shot Attribute Analysis

by Yuzhi Li, Feng Tian, Haojun Xu and Tianfeng Lu

Electronics 2023, 12(19), 4174; https://doi.org/10.3390/electronics12194174 - 8 Oct 2023

Cited by 2 | Viewed by 2382

Abstract

Cinematic Shot Attribute Analysis aims to analyze the intrinsic attributes of movie shots, such as movement and scale. Previous methods designed specialized architectures for each specific task and relied on optical flow maps. In this paper, we treat shot attribute analysis as a unified task of motion–static weight allocation and propose a motion–static dual-path architecture for recognizing various shot attributes. In this architecture, we design a new action cue generation module that supports end-to-end training in place of a pre-trained optical flow network. To address the limited number of samples in movie shot datasets, we also design a fixed-size adjustment strategy that enables the network to directly use pre-trained vision transformer models while accepting shot data sampled at arbitrary rates. In addition, we quantitatively analyze, for the first time, the sensitivity of different shot attributes to motion and static features. Experimental results on two datasets, MovieShots and AVE, demonstrate that our method outperforms all previous approaches without increasing computational cost.

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

19 pages, 3408 KB

Open Access Article

Convolutional Neural Networks Adapted for Regression Tasks: Predicting the Orientation of Straight Arrows on Marked Road Pavement Using Deep Learning and Rectified Orthophotography

by Calimanut-Ionut Cira, Alberto Díaz-Álvarez, Francisco Serradilla and Miguel-Ángel Manso-Callejo

Electronics 2023, 12(18), 3980; https://doi.org/10.3390/electronics12183980 - 21 Sep 2023

Cited by 11 | Viewed by 6524

Abstract

Arrow signs on roadway pavement are an important component of modern transportation systems. Given the rise of autonomous vehicles, public agencies are increasingly interested in accurately identifying and analysing detailed road pavement information to generate comprehensive road maps and decision support systems that can optimise traffic flow, enhance road safety, and provide complete official road cartographic support (usable in autonomous driving tasks). As arrow signs are a fundamental component of traffic guidance, this paper presents a novel deep learning-based approach to identify the orientation and direction of arrow signs on marked roadway pavements using high-resolution aerial orthoimages. The approach is based on convolutional neural network architectures (VGGNet, ResNet, Xception, and DenseNet) modified and adapted for regression tasks with a proposed learning structure, together with an ad hoc model specially introduced for this task. Although the best-performing artificial neural network was based on VGGNet (the VGG-19 variant), it only slightly surpassed the proposed ad hoc model in the average R² score, mean squared error, and angular error, by 0.005, 0.001, and 0.036, respectively, on the training set (the ad hoc model delivered an average R² score, mean squared error, and angular error of 0.9874, 0.001, and 2.516, respectively). Furthermore, the ad hoc model’s predictions on the test set were the most consistent (a standard deviation of the R² score of 0.033, compared with 0.042 for VGG-19), while the model was almost eight times more computationally efficient than VGG-19 (2,673,729 parameters vs. VGG-19’s 20,321,985 parameters).
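Regressing an orientation directly in degrees suffers from the 0°/360° discontinuity: 359° and 1° are close in reality but far apart numerically. A common remedy, assumed here because the abstract does not state how the paper encodes its target, is to regress (sin θ, cos θ), decode with `atan2`, and measure angular error along the shorter arc.

```python
import math

def encode_angle(deg):
    """Map an angle in degrees to a continuous (sin, cos) regression target."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)

def decode_angle(sin_v, cos_v):
    """Recover an angle in [0, 360) from (possibly unnormalised) sin/cos outputs."""
    return math.degrees(math.atan2(sin_v, cos_v)) % 360.0

def angular_error(pred_deg, true_deg):
    """Absolute angular difference along the shorter arc, in degrees."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)
```

Because `atan2` only uses the ratio of its arguments, a network's two outputs need not lie exactly on the unit circle for decoding to work.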

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

17 pages, 15056 KB

Open Access Article

Improving Monocular Depth Estimation with Learned Perceptual Image Patch Similarity-Based Image Reconstruction and Left–Right Difference Image Constraints

by Hyeseung Park and Seungchul Park

Electronics 2023, 12(17), 3730; https://doi.org/10.3390/electronics12173730 - 4 Sep 2023

Cited by 9 | Viewed by 4435

Abstract

This paper introduces a novel approach for self-supervised monocular depth estimation. The model is trained on stereo-image (left–right pair) data and incorporates carefully designed loss functions, based on perceptual image quality assessment, for image reconstruction and left–right image difference. The fidelity of the reconstructed images, obtained by warping the input images using the predicted disparity maps, significantly influences the accuracy of depth estimation in self-supervised monocular depth networks. The proposed LPIPS (Learned Perceptual Image Patch Similarity)-based evaluation of image reconstruction closely emulates human perceptual mechanisms to quantify the quality of reconstructed images and serves as an image reconstruction loss. Consequently, it drives the reconstructed images to become gradually more similar to the target images during training. Stereo image pairs often exhibit slight discrepancies in brightness, contrast, color, and camera angle due to factors such as lighting conditions and camera calibration inaccuracies. These factors limit the achievable image reconstruction quality. To address this, a left–right difference image loss is introduced to align the differences between the actual left–right image pair and the reconstructed left–right image pair. Because distant pixel values tend toward zero in the difference images derived from the left and right source images of stereo pairs, this loss progressively steers the distant pixel values of the reconstructed difference images toward zero. This loss has therefore proven effective in mitigating distortions in distant regions while enhancing overall performance. The primary objective of this study is to introduce and validate the effectiveness of LPIPS-based image reconstruction and left–right difference image losses in the context of monocular depth estimation.
To this end, the proposed loss functions have been integrated into a straightforward single-task stereo-image learning framework with simple hyperparameters. Notably, our approach achieves superior results compared to other state-of-the-art methods, even those adopting more intricate hybrid data and multi-task learning strategies.
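Two ingredients described above, reconstructing one view by warping the other with a predicted disparity and comparing real and reconstructed left–right difference images, can be sketched on plain lists. This assumes positive disparities (a left pixel at x matches the right pixel at x − d), linear interpolation along each row, and an L1 penalty between absolute difference images; the paper's exact formulation may differ, and LPIPS itself (a learned network) is not reproduced.

```python
def warp_row(src_row, disparity_row):
    """Reconstruct one left-image row by sampling the right-image row at x - d(x)."""
    W = len(src_row)
    out = []
    for x, d in enumerate(disparity_row):
        xs = min(max(x - d, 0.0), W - 1.0)  # clamp the sample position to the row
        x0 = int(xs)                        # floor, since xs >= 0
        x1 = min(x0 + 1, W - 1)
        frac = xs - x0
        out.append((1 - frac) * src_row[x0] + frac * src_row[x1])
    return out

def lr_difference_loss(left, right, left_rec, right_rec):
    """Mean absolute gap between real and reconstructed |left - right| difference images."""
    total, n = 0.0, 0
    for rl, rr, pl, pr in zip(left, right, left_rec, right_rec):
        for a, b, c, d in zip(rl, rr, pl, pr):
            total += abs(abs(a - b) - abs(c - d))
            n += 1
    return total / n
```

Distant pixels have near-zero disparity, so their real difference values are small; penalising the gap pulls the reconstructed difference values there toward zero as well.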

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

21 pages, 7872 KB

Open Access Editor’s Choice Article

YOLO-Drone: An Optimized YOLOv8 Network for Tiny UAV Object Detection

by Xianxu Zhai, Zhihua Huang, Tao Li, Hanzheng Liu and Siyuan Wang

Electronics 2023, 12(17), 3664; https://doi.org/10.3390/electronics12173664 - 30 Aug 2023

Cited by 182 | Viewed by 35332

Abstract

With the widespread use of UAVs in commercial and industrial applications, UAV detection is receiving increasing attention in areas such as public safety. As a result, object detection techniques for UAVs are also developing rapidly. However, the small size of drones, complex airspace backgrounds, and changing light conditions still pose significant challenges for research in this area. To address these problems, this paper proposes a tiny UAV detection method based on an optimized YOLOv8. First, in the detection head, a high-resolution detection head is added to improve the model’s detection capability for small targets, while the large-target detection head and redundant network layers are removed to reduce the number of network parameters and increase UAV detection speed. Second, in the feature extraction stage, SPD-Conv replaces standard convolution (Conv) for extracting multi-scale features, reducing the loss of fine-grained information and enhancing the model’s feature extraction capability for small targets. Finally, the GAM attention mechanism is introduced in the neck to enhance the model’s fusion of target features and improve its overall performance in detecting UAVs. Relative to the baseline model, our method improves P (precision), R (recall), and mAP (mean average precision) by 11.9%, 15.2%, and 9%, respectively. Meanwhile, it reduces the number of parameters and model size by 59.9% and 57.9%, respectively. In addition, our method demonstrates clear advantages in comparison experiments and self-built dataset experiments and is better suited to engineering deployment and practical applications of UAV object detection systems.
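The space-to-depth step at the heart of SPD-Conv rearranges each scale×scale block of pixels into extra channels instead of skipping them as a strided convolution would, so fine-grained detail of tiny targets survives downsampling; a non-strided convolution then follows (omitted here). A minimal pure-Python sketch on a nested-list C×H×W map, assuming scale 2; channel ordering in real implementations may differ.

```python
def space_to_depth(fmap, scale=2):
    """Rearrange C x H x W into (C * scale^2) x (H / scale) x (W / scale), keeping every pixel."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    if H % scale or W % scale:
        raise ValueError("H and W must be divisible by scale")
    out = []
    for i in range(scale):          # row offset inside each block
        for j in range(scale):      # column offset inside each block
            for c in range(C):
                out.append([row[j::scale] for row in fmap[c][i::scale]])
    return out
```

Spatial resolution halves while the channel count quadruples, so the total number of values is unchanged; the following convolution learns how to compress them.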

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

20 pages, 4890 KB

Open Access Article

ESD-YOLOv5: A Full-Surface Defect Detection Network for Bearing Collars

by Jiale Li, Haipeng Pan and Junfeng Li

Electronics 2023, 12(16), 3446; https://doi.org/10.3390/electronics12163446 - 15 Aug 2023

Cited by 11 | Viewed by 2685

Abstract

To address the varied shapes and sizes of bearing collar surface defects, their uneven distribution, and complex backgrounds, we propose ESD-YOLOv5, an improved algorithm for bearing collar full-surface defect detection. First, a hybrid attention module, ECCA, was constructed by combining an efficient channel attention (ECA) mechanism and a coordinate attention (CA) mechanism and was introduced into the YOLOv5 backbone network to improve the network’s ability to localize object features. Second, the original neck was replaced by the constructed Slim-neck, which reduces the model’s parameters and computational complexity without sacrificing detection accuracy. Furthermore, the original head was replaced by the decoupled head from YOLOX, which separates the classification and regression tasks. Finally, we constructed a dataset of defective bearing collars using images collected from industrial sites and conducted extensive experiments. The results demonstrate that the proposed ESD-YOLOv5 model achieved an mAP of 98.6% on our self-built dataset, a 2.3% improvement over the baseline YOLOv5 model. Moreover, it outperformed mainstream one-stage object detection algorithms. Additionally, a bearing collar surface defect detection system based on the proposed method has been successfully deployed in industry for bearing collar inspection.
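
The ECA half of the ECCA module can be sketched as follows: global average pooling produces one descriptor per channel, a 1-D convolution across neighboring channels models local cross-channel interaction, and a sigmoid gate rescales each channel. This is a minimal NumPy sketch of the standard ECA formulation under stated assumptions (a single feature map, fixed kernel weights, and the gamma = 2, b = 1 kernel-size heuristic). How the authors combine it with coordinate attention is not described in the abstract and is not reproduced here.

```python
import math

import numpy as np


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive odd kernel size k = |(log2(C) + b) / gamma|, rounded to odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def eca(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Efficient channel attention on a (C, H, W) map with a 1-D kernel of odd length."""
    c = x.shape[0]
    k = kernel.shape[0]
    # Squeeze: global average pooling gives one descriptor per channel.
    s = x.mean(axis=(1, 2))
    # Local cross-channel interaction: 1-D conv, zero padding keeps length C.
    s_padded = np.pad(s, k // 2)
    y = np.array([np.dot(s_padded[i:i + k], kernel) for i in range(c)])
    # Excite: a sigmoid gate rescales every channel of the input.
    return x * sigmoid(y)[:, None, None]
```

Because the 1-D kernel is shared across all channels, the attention branch adds only k weights, which is the main difference from a fully connected squeeze-and-excitation block.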

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

17 pages, 8743 KB

Open Access Article

Digital Twin 3D System for Power Maintenance Vehicles Based on UWB and Deep Learning

by Mingju Chen, Tingting Liu, Jinsong Zhang, Xingzhong Xiong and Feng Liu

Electronics 2023, 12(14), 3151; https://doi.org/10.3390/electronics12143151 - 20 Jul 2023

Cited by 6 | Viewed by 2242

Abstract

To address the insufficient safety monitoring of power maintenance vehicles during power operations, this study proposes a vehicle monitoring scheme based on ultra-wideband (UWB) and deep learning. The UWB localization algorithm employs Chaotic Particle Swarm Optimization (CPSO) to optimize the Time Difference of Arrival (TDOA)/Angle of Arrival (AOA) locating scheme, overcoming the adverse effects of non-line-of-sight (NLOS) propagation and multipath effects in substations and significantly improving vehicle positioning accuracy. To handle the large aspect ratio and variable rotation angle of the maintenance vehicle’s mechanical arm during operational situational awareness, the arm recognition network is built on You Only Look Once version 5 (YOLOv5) and modified with the Convolutional Block Attention Module (CBAM). The long-edge angle definition with circular smooth label (CSL), the SIoU loss function, and the HardSwish activation function improve the precision and processing speed of arm state recognition. The experimental results show that the proposed CPSO-TDOA/AOA outperforms other algorithms in localization accuracy and effectively attenuates NLOS and multipath effects. The recognition accuracy of the YOLOv5-CSL-CBAM network is substantially improved; the mAP for the vehicle arm reaches 85.04%. The detection speed meets real-time requirements, and the digital twin of the maintenance vehicle is effectively realized in the 3D substation model.
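
The circular smooth label (CSL) used for the arm detector turns rotation-angle regression into classification over angle bins while keeping neighboring bins, including those that wrap around at 0° and 180°, close in label space. The sketch below assumes 180 one-degree bins and a Gaussian window truncated at a radius of 6 bins; these values and the function names are illustrative assumptions rather than the paper's settings.

```python
import numpy as np


def circular_smooth_label(angle_deg: float, num_bins: int = 180,
                          sigma: float = 2.0, radius: int = 6) -> np.ndarray:
    """Encode an angle in [0, 180) degrees as a circularly smoothed label vector.

    Assumed settings: one bin per degree, Gaussian window cut off at `radius` bins.
    """
    bin_width = 180.0 / num_bins
    center = int(round(angle_deg / bin_width)) % num_bins
    idx = np.arange(num_bins)
    # Circular distance, so bin 0 and bin num_bins - 1 are neighbors.
    dist = np.abs(idx - center)
    dist = np.minimum(dist, num_bins - dist)
    label = np.exp(-(dist.astype(float) ** 2) / (2.0 * sigma ** 2))
    # Bins outside the window radius receive zero.
    label[dist > radius] = 0.0
    return label


def decode_angle(scores: np.ndarray) -> float:
    """Recover the angle from predicted bin scores via the arg-max bin."""
    bin_width = 180.0 / scores.shape[0]
    return float(np.argmax(scores)) * bin_width
```

With this encoding, a prediction of 179° for a true angle of 0° is penalized as a one-bin error rather than a 179-bin error, which is the boundary problem CSL is meant to remove.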

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

13 pages, 8618 KB

Open Access Article

Lightweight Strawberry Instance Segmentation on Low-Power Devices for Picking Robots

by Leilei Cao, Yaoran Chen and Qiangguo Jin

Electronics 2023, 12(14), 3145; https://doi.org/10.3390/electronics12143145 - 20 Jul 2023

Cited by 15 | Viewed by 2889

Abstract

Machine vision plays an important role in localizing strawberries in complex orchards or greenhouses for picking robots. Because strawberries vary in shape, size, and color and are often occluded by leaves and stems, precisely locating each strawberry poses a great challenge to the vision system of picking robots. Several methods for localizing strawberries have been developed based on the well-known Mask R-CNN network; however, they do not run efficiently on picking robots. In this paper, we propose a simple and highly efficient framework, termed StrawSeg, for strawberry instance segmentation on low-power devices for picking robots. Instead of using the common “detection-then-segment” paradigm, we directly segment each strawberry in a single-shot manner without relying on object detection. In our model, we design a novel feature aggregation network to merge features of different scales, which employs a pixel shuffle operation to increase the resolution and reduce the number of feature channels. Experiments on the open-source dataset StrawDI_Db1 demonstrate that our model achieves a good trade-off between accuracy and inference speed on a low-power device.
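
The pixel shuffle operation in the feature aggregation network trades channels for spatial resolution: a (C·r², H, W) tensor becomes (C, H·r, W·r) with no learned weights. Below is a NumPy sketch that follows the element ordering used by PyTorch's `torch.nn.PixelShuffle`; the function name and the single-image (C, H, W) layout are assumptions for illustration, not StrawSeg's code.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r).

    Output pixel (c, h*r + i, w*r + j) comes from input channel c*r*r + i*r + j.
    """
    c_in, h, w = x.shape
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    c = c_in // (r * r)
    # Split channels into (C, r, r) sub-pixel offsets.
    x = x.reshape(c, r, r, h, w)
    # Interleave offsets with spatial positions: (C, H, r, W, r).
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)
```

For example, four channels holding the values 0, 1, 2, 3 at a single pixel become one 2x2 patch `[[0, 1], [2, 3]]`, doubling resolution while dividing the channel count by four.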

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

10 pages, 2419 KB

Open Access Article

Content-Aware Image Resizing Technology Based on Composition Detection and Composition Rules

by Bo Wang, Hongyang Si, Huiting Fu, Ruao Gao, Minjuan Zhan, Huili Jiang and Aili Wang

Electronics 2023, 12(14), 3096; https://doi.org/10.3390/electronics12143096 - 17 Jul 2023

Cited by 2 | Viewed by 3127

Abstract

A novel content-aware image resizing mechanism based on composition detection and composition rules is proposed to address the lack of esthetic perception in current content-aware resizing algorithms. In the proposed algorithm, a composition detection module first classifies the composition type of the input image. According to the classification results, the corresponding composition rules from computational esthetics are selected. Finally, the algorithm performs seam carving guided by the corresponding esthetic rules. The resized image not only preserves the important content of the image but also conforms to the composition rules, optimizing its overall visual effect. Simulation results show that the proposed algorithm achieves a better visual effect: compared with existing algorithms, it not only effectively protects important image content but also preserves important structures and improves the overall esthetic quality of the image.
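
Seam carving, the resizing operation the algorithm builds on, repeatedly removes the connected vertical path of pixels with the lowest cumulative energy, found by dynamic programming. The sketch below shows that core step in NumPy. The `protect` mask, which adds a large energy penalty to regions a composition rule says should be kept, is a hypothetical stand-in for how esthetic rules might steer the seams; the paper's actual rule-based weighting is not given in the abstract.

```python
from typing import Optional

import numpy as np


def energy_map(gray: np.ndarray, protect: Optional[np.ndarray] = None) -> np.ndarray:
    """L1 gradient-magnitude energy; `protect` is a hypothetical 0/1 keep-mask."""
    gy, gx = np.gradient(gray.astype(float))
    energy = np.abs(gx) + np.abs(gy)
    if protect is not None:
        # Large penalty so seams avoid regions a composition rule wants to keep.
        energy = energy + protect * 1e6
    return energy


def find_vertical_seam(energy: np.ndarray) -> np.ndarray:
    """Return, for each row, the column index of the minimum-energy seam."""
    h, w = energy.shape
    cost = energy.copy()
    back = np.zeros((h, w), dtype=int)
    for i in range(1, h):
        for j in range(w):
            # A seam may move at most one column between adjacent rows.
            lo, hi = max(j - 1, 0), min(j + 2, w)
            k = lo + int(np.argmin(cost[i - 1, lo:hi]))
            back[i, j] = k
            cost[i, j] += cost[i - 1, k]
    # Backtrack from the cheapest pixel in the bottom row.
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 1, 0, -1):
        seam[i - 1] = back[i, seam[i]]
    return seam


def remove_vertical_seam(img: np.ndarray, seam: np.ndarray) -> np.ndarray:
    """Delete one pixel per row; works for (H, W) and (H, W, C) images."""
    h, w = img.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1, *img.shape[2:])
```

Calling `find_vertical_seam` and `remove_vertical_seam` in a loop narrows the image one column at a time; transposing the image gives horizontal seams for height reduction.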

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

15 pages, 2518 KB

Open Access Article

Automatic Fabric Defect Detection Method Using AC-YOLOv5

by Yongbin Guo, Xinjian Kang, Junfeng Li and Yuanxun Yang

Electronics 2023, 12(13), 2950; https://doi.org/10.3390/electronics12132950 - 5 Jul 2023

Cited by 43 | Viewed by 7945

Abstract

Faced with the detection problems posed by complex textile texture backgrounds and defects of different sizes and types, commonly used object detection networks are limited in handling varying target sizes. Furthermore, their stability and anti-interference capabilities are relatively weak. Therefore, when target types are more diverse, false or missed detections are likely to occur. To meet the stringent requirements of textile defect detection, we propose a novel AC-YOLOv5-based textile defect detection method. This method fully considers the optical properties, texture distribution, imaging properties, and detection requirements specific to textiles. First, the Atrous Spatial Pyramid Pooling (ASPP) module is introduced into the YOLOv5 backbone network, and the feature map is pooled using convolution kernels with different dilation rates. Multiscale feature information is obtained from feature maps with different receptive fields, which improves the detection of defects of different sizes without changing the resolution of the input image. Second, a convolution squeeze-and-excitation (CSE) channel attention module is proposed and introduced into the YOLOv5 backbone network. The weight of each feature channel is learned to further improve defect detection and anti-interference capability. Finally, a large number of fabric images were collected using an inspection system built on a circular knitting machine at an industrial site, and extensive experiments were conducted on the resulting self-built fabric defect dataset. The experimental results show that AC-YOLOv5 achieves an overall detection accuracy of 99.1% on the fabric defect dataset, satisfying the requirements of industrial applications.
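
The ASPP module samples the same feature map with atrous (dilated) convolutions at several rates. A k×k kernel with dilation d covers an effective window of k + (k − 1)(d − 1) pixels, enlarging the receptive field while, with matching zero padding, leaving the feature map resolution unchanged. The single-channel NumPy sketch below illustrates that mechanism; the rates and function names are assumptions, and a practical ASPP block typically also uses multi-channel kernels, normalization, an image-pooling branch, and a 1×1 fusion convolution.

```python
from typing import List

import numpy as np


def effective_kernel_size(k: int, dilation: int) -> int:
    """Width of the window a dilated k x k kernel actually spans."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv2d_same(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """Single-channel atrous cross-correlation with zero 'same' padding (odd k)."""
    k = kernel.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros((h, w))
    # Accumulate one shifted copy of the input per kernel tap.
    for u in range(k):
        for v in range(k):
            out += kernel[u, v] * xp[u * dilation:u * dilation + h,
                                     v * dilation:v * dilation + w]
    return out


def aspp(x: np.ndarray, kernels: List[np.ndarray], rates: List[int]) -> np.ndarray:
    """Stack parallel atrous branches; output has one map per rate."""
    return np.stack([dilated_conv2d_same(x, k, r) for k, r in zip(kernels, rates)])
```

With a 3×3 kernel, a dilation rate of 6 already spans a 13-pixel window at the cost of only nine weights, which is why ASPP can gather multiscale context without downsampling the input.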

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

18 pages, 1895 KB

Open Access Article

An Improved YOLOv5 Underwater Detector Based on an Attention Mechanism and Multi-Branch Reparameterization Module

by Jian Zhang, Hongda Chen, Xinyue Yan, Kexin Zhou, Jinshuai Zhang, Yonghui Zhang, Hong Jiang and Bingqian Shao

Electronics 2023, 12(12), 2597; https://doi.org/10.3390/electronics12122597 - 8 Jun 2023

Cited by 20 | Viewed by 5360

Abstract

Underwater target detection is a critical task in various applications, including environmental monitoring, underwater exploration, and marine resource management. As the demand for underwater observation and exploitation continues to grow, reliable and efficient methods of detecting underwater targets are increasingly needed. However, the underwater environment often severely degrades image quality, which reduces detection accuracy. This paper proposes an improved YOLOv5 underwater target detection network to enhance accuracy and reduce missed detections. First, we added the global attention mechanism (GAM) to the backbone network, which retains more channel and spatial information and strengthens cross-dimensional interaction, improving the backbone’s feature extraction ability. Then, we introduced the fusion block based on DAMO-YOLO into the neck, which enhanced the network’s ability to extract features at different scales. Finally, we used the SIoU loss to measure how well the predicted box matches the ground-truth box, which accelerated convergence and improved accuracy. Experiments on the URPC2019 dataset showed that our model achieved an mAP@0.5 of 80.2%, 1.8% and 2.3% higher than YOLOv7 and YOLOv8, respectively, indicating state-of-the-art (SOTA) performance. Moreover, additional evaluations on the MS COCO dataset showed that our model’s mAP@0.5:0.95 reached 51.0%, surpassing advanced methods such as ViDT and RF-Next and demonstrating the versatility of the enhanced architecture.
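
Both the SIoU loss and the mAP@0.5 and mAP@0.5:0.95 metrics start from the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below computes plain IoU for axis-aligned boxes and lists the ten COCO-style thresholds (0.50 to 0.95 in steps of 0.05) over which mAP@0.5:0.95 is averaged. It does not implement SIoU itself, which adds angle, distance, and shape cost terms on top of IoU; the function names and the (x1, y1, x2, y2) box format are assumptions.

```python
from typing import Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Thresholds averaged by mAP@0.5:0.95 (mAP@0.5 uses only the first one).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(iou: float) -> int:
    """Count the thresholds at which a detection with this IoU is a true positive."""
    return sum(iou >= t for t in IOU_THRESHOLDS)
```

For example, a detection with IoU 0.72 is counted as correct at 0.50 through 0.70 but not at 0.75 and above, so `thresholds_matched(0.72)` returns 5.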

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

25 pages, 1204 KB

Open Access Article

Research on Railway Dispatcher Fatigue Detection Method Based on Deep Learning with Multi-Feature Fusion

by Liang Chen and Wei Zheng

Electronics 2023, 12(10), 2303; https://doi.org/10.3390/electronics12102303 - 19 May 2023

Cited by 9 | Viewed by 3118

Abstract

Traffic command and scheduling are the core monitoring aspects of railway transportation. Detecting the fatigue state of dispatchers is, therefore, of great significance for ensuring the safety of railway operations. In this paper, we present a multi-feature fatigue detection method based on key points of the human face and body posture. Because unfavorable factors such as facial occlusion and angle changes limit single-feature fatigue detection methods, we developed our model based on the fusion of body postures and facial features for better accuracy. Using facial key points and eye features, we calculate PERCLOS (the percentage of time during which the eyes are more than 80% closed), as well as blinking and yawning frequency, and we analyze fatigue behaviors, such as yawning, a bowed head (which could indicate sleep), and lying down on a table, using a behavior recognition algorithm. We fuse five facial features and behavioral postures to comprehensively determine the fatigue state of dispatchers. The results show that on the 300W dataset, as well as a hand-crafted dataset, the improved facial key point detection algorithm based on the RetinaFace model had an inference time of 100 ms and a normalized mean error (NME) of 3.58. On our own dataset, the classification accuracy of the Bi-LSTM-SVM adaptive boosting model reached 97%. Video data of volunteers who carried out scheduling operations in the simulation laboratory were used for our experiments, and our multi-feature fusion fatigue detection algorithm achieved an accuracy of 96.30% and a recall of 96.30% in fatigue classification, both higher than those of existing single-feature detection methods. Our multi-feature fatigue detection method offers a potential solution for fatigue level classification in safety-critical industries such as railway transportation.
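
PERCLOS, one of the eye features fused above, is the fraction of frames in a time window during which the eyes are more than 80% closed. A common way to estimate eye closure from facial key points is the eye aspect ratio (EAR) of six eye landmarks. The sketch below combines the two. Using EAR, normalizing it by a per-person open-eye baseline `ear_open`, and the function names are all assumptions for illustration, since the abstract does not state how the authors measure eye closure.

```python
from typing import Sequence

import numpy as np


def eye_aspect_ratio(pts: Sequence[Sequence[float]]) -> float:
    """EAR from six eye landmarks p1..p6 (corners p1, p4; upper p2, p3; lower p6, p5)."""
    p = np.asarray(pts, dtype=float)
    vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    horizontal = np.linalg.norm(p[0] - p[3])
    return float(vertical / (2.0 * horizontal))


def perclos(ear_series: Sequence[float], ear_open: float,
            closure_threshold: float = 0.8) -> float:
    """Fraction of frames whose estimated eye closure exceeds the threshold.

    Closure is assumed to be 1 - EAR / ear_open, clipped to [0, 1], where
    ear_open is a hypothetical per-person open-eye EAR baseline.
    """
    ear = np.asarray(ear_series, dtype=float)
    closure = np.clip(1.0 - ear / ear_open, 0.0, 1.0)
    return float(np.mean(closure > closure_threshold))
```

For instance, with an open-eye EAR of 0.3, per-frame EAR values of 0.3, 0.3, 0.05, 0.02, 0.3 give two frames above 80% closure, so PERCLOS over that window is 0.4.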

(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)

Published Papers (35 papers)
