MDPI - Publisher of Open Access Journals

17 pages, 904 KiB

Open Access Article

Apple Detection via Near-Field MIMO-SAR Imaging: A Multi-Scale and Context-Aware Approach

by Yuanping Shi, Yanheng Ma and Liang Geng

Sensors 2025, 25(5), 1536; https://doi.org/10.3390/s25051536 - 1 Mar 2025

Viewed by 1028


Accurate fruit detection is of great importance for yield assessment, timely harvesting, and orchard management strategy optimization in precision agriculture. Traditional optical imaging methods are limited by lighting and meteorological conditions, making it difficult to obtain stable, high-quality data. Therefore, this study utilizes near-field millimeter-wave MIMO-SAR (Multiple Input Multiple Output Synthetic Aperture Radar) technology, which is capable of all-day and all-weather imaging, to perform high-precision detection of apple targets in orchards. This paper first constructs a near-field millimeter-wave MIMO-SAR imaging system and performs multi-angle imaging on real fruit tree samples, obtaining about 150 sets of SAR-optical paired data, covering approximately 2000 accurately annotated apple targets. Addressing challenges such as weak scattering, low texture contrast, and complex backgrounds in SAR images, we propose an innovative detection framework integrating Dynamic Spatial Pyramid Pooling (DSPP), Recursive Feature Fusion Network (RFN), and Context-Aware Feature Enhancement (CAFE) modules. DSPP employs a learnable adaptive mechanism to dynamically adjust multi-scale feature representations, enhancing sensitivity to apple targets of varying sizes and distributions; RFN uses a multi-round iterative feature fusion strategy to gradually refine semantic consistency and stability, improving the robustness of feature representation under weak texture and high noise scenarios; and the CAFE module, based on attention mechanisms, explicitly models global and local associations, fully utilizing the scene context in texture-poor SAR conditions to enhance the discriminability of apple targets. Experimental results show that the proposed method achieves significant improvements in average precision (AP), recall rate, and F1 score on the constructed near-field millimeter-wave SAR apple dataset compared to various classic and mainstream detectors. Ablation studies confirm the synergistic effect of DSPP, RFN, and CAFE. Qualitative analysis demonstrates that the detection framework proposed in this paper can still stably locate apple targets even under conditions of leaf occlusion, complex backgrounds, and weak scattering. This research provides a beneficial reference and technical basis for using SAR data in fruit detection and yield estimation in precision agriculture.
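The multi-scale pooling idea that DSPP builds on can be illustrated with a plain, non-learnable spatial pyramid pooling sketch. This is a toy under stated assumptions: the function name `spatial_pyramid_pool`, the grid sizes, and the use of max pooling are illustrative choices and are not taken from the paper, whose DSPP module additionally learns how to adjust the scales.

```python
def spatial_pyramid_pool(feature_map, grid_sizes=(1, 2)):
    """Max-pool a 2D feature map over n x n grids and concatenate the results.

    feature_map: list of equal-length rows (H x W), with H and W >= max(grid_sizes).
    Returns a flat list of length sum(n * n for n in grid_sizes),
    which does not depend on H or W.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in grid_sizes:
        for i in range(n):
            # Row range of grid cell i; integer arithmetic covers every row once.
            r0, r1 = i * height // n, (i + 1) * height // n
            for j in range(n):
                c0, c1 = j * width // n, (j + 1) * width // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Hypothetical 4 x 4 feature map, for illustration only.
fmap = [
    [1, 2, 0, 0],
    [3, 4, 0, 5],
    [0, 0, 9, 1],
    [6, 0, 2, 2],
]
print(spatial_pyramid_pool(fmap))  # [9, 4, 5, 6, 9]: global max, then four quadrants
```

Because the output length depends only on the grid sizes, the same descriptor shape is produced for targets and images of different sizes, which is the property multi-scale pooling modules rely on.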

(This article belongs to the Section Smart Agriculture)


16 pages, 1633 KiB

Open Access Article

Advancing Rice Grain Impurity Segmentation with an Enhanced SegFormer and Multi-Scale Feature Integration

by Xiulin Qiu, Hongzhi Yao, Qinghua Liu, Hongrui Liu, Haozhi Zhang and Mengdi Zhao

Entropy 2025, 27(1), 70; https://doi.org/10.3390/e27010070 - 15 Jan 2025

Viewed by 1093


During the rice harvesting process, severe occlusion and adhesion exist among multiple targets, such as rice, straw, and leaves, making it difficult to accurately distinguish between rice grains and impurities. To address the current challenges, a lightweight semantic segmentation algorithm for impurities based on an improved SegFormer network is proposed. To make full use of the extracted features, the decoder was redesigned. First, the Feature Pyramid Network (FPN) was introduced to optimize the structure, selectively fusing the high-level semantic features and low-level texture features generated by the encoder. Secondly, a Part Large Kernel Attention (Part-LKA) module was designed and introduced after feature fusion to help the model focus on key regions, simplifying the model and accelerating computation. Finally, to compensate for the lack of spatial interaction capabilities, Bottleneck Recursive Gated Convolution (B-$g^{n}$Conv) was introduced to achieve effective segmentation of rice grains and impurities. Compared with the original model, the improved model’s pixel accuracy (PA) and F1 score increased by 1.6% and 3.1%, respectively. This provides a valuable algorithmic reference for designing a real-time impurity rate monitoring system for rice combine harvesters.
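The fusion of high-level semantic features with low-level texture features in an FPN-style decoder can be sketched in one dimension. This is a simplified illustration, not the paper's decoder: the helper names `upsample_nearest` and `top_down_fuse` are hypothetical, features are plain lists of numbers, and the lateral convolutions of a real FPN are omitted.

```python
def upsample_nearest(values, factor=2):
    """Repeat each element `factor` times (nearest-neighbour upsampling in 1D)."""
    return [v for v in values for _ in range(factor)]


def top_down_fuse(levels):
    """Fuse a pyramid listed from finest to coarsest.

    Each level is assumed to be half as long as the previous one.
    Returns the fused levels in the same order.
    """
    fused = [levels[-1]]  # the coarsest level passes through unchanged
    for level in reversed(levels[:-1]):
        # Bring the already-fused coarser level up to this resolution, then add.
        upsampled = upsample_nearest(fused[0])
        fused.insert(0, [a + b for a, b in zip(level, upsampled)])
    return fused


# Hypothetical three-level pyramid: fine texture, mid-level, coarse semantics.
pyramid = [[1, 1, 1, 1], [10, 20], [100]]
print(top_down_fuse(pyramid))
# [[111, 111, 121, 121], [110, 120], [100]]
```

Every fine-resolution position ends up carrying a contribution from each coarser level, which is the sense in which semantic context is propagated down to texture-level features.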

(This article belongs to the Special Issue Application of Information Theory to Computer Vision and Image Processing II)


20 pages, 7008 KiB

Open Access Article

A New Deep Neural Network Based on SwinT-FRM-ShipNet for SAR Ship Detection in Complex Near-Shore and Offshore Environments

by Zhuhao Lu, Pengfei Wang, Yajun Li and Baogang Ding

Remote Sens. 2023, 15(24), 5780; https://doi.org/10.3390/rs15245780 - 18 Dec 2023

Cited by 10 | Viewed by 1971


The advent of deep learning has significantly propelled the utilization of neural networks for Synthetic Aperture Radar (SAR) ship detection in recent years. However, there are two main obstacles in SAR detection. Challenge 1: The multiscale nature of SAR ships. Challenge 2: The influence of intricate near-shore environments and the interference of clutter noise in offshore areas, especially affecting small-ship detection. Existing neural network-based approaches attempt to tackle these challenges, yet they often fall short in effectively addressing small-ship detection across multiple scales and complex backgrounds simultaneously. To overcome these challenges, we propose a novel network called SwinT-FRM-ShipNet. Our method introduces an integrated feature extractor, Swin-T-YOLOv5l, which combines Swin Transformer and YOLOv5l. The extractor is designed to highlight the differences between the complex background and the target by encoding both local and global information. Additionally, a feature pyramid IEFR-FPN, consisting of the Information Enhancement Module (IEM) and the Feature Refinement Module (FRM), is proposed to enrich the flow of spatial contextual information, fuse multiresolution features, and refine representations of small and multiscale ships. Furthermore, we introduce recursive gated convolutional prediction heads (GCPH) to explore the potential of high-order spatial interactions and add a larger-sized prediction head to focus on small ships. Experimental results demonstrate the superior performance of our method compared to mainstream approaches on the SSDD and SAR-Ship-Dataset. Our method achieves an F1 score, mAP_0.5, and mAP_0.5:0.95 of 96.5% (+0.9%), 98.2% (+1.0%), and 75.4% (+3.3%), respectively, surpassing the most competitive algorithms.
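The two mAP figures differ in how strictly a predicted box must overlap the ground truth to count as correct. The sketch below shows the underlying intersection-over-union (IoU) test. It assumes `(x1, y1, x2, y2)` boxes and the COCO-style threshold set 0.50, 0.55, ..., 0.95 for mAP_0.5:0.95; the abstract does not spell out its exact protocol, and the names `iou` and `thresholds_passed` are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def thresholds_passed(pred, truth):
    """Thresholds at which `pred` would count as a correct detection of `truth`."""
    overlap = iou(pred, truth)
    return [t for t in THRESHOLDS if overlap >= t]


# Hypothetical boxes, for illustration only.
truth = (0, 0, 10, 10)
pred = (1, 0, 10, 8)
print(iou(pred, truth))                # 0.72
print(thresholds_passed(pred, truth))  # [0.5, 0.55, 0.6, 0.65, 0.7]
```

A detection like this one is a hit for mAP_0.5 but a miss at the five strictest thresholds, which is why mAP_0.5:0.95 is typically much lower than mAP_0.5 for the same detector.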

(This article belongs to the Special Issue Advanced Machine Learning and Deep Learning Approaches for Remote Sensing III)


17 pages, 1601 KiB

Open Access Article

Counting Crowded Soybean Pods Based on Deformable Attention Recursive Feature Pyramid

by Can Xu, Yinhao Lu, Haiyan Jiang, Sheng Liu, Yushi Ma and Tuanjie Zhao

Agronomy 2023, 13(6), 1507; https://doi.org/10.3390/agronomy13061507 - 30 May 2023

Cited by 10 | Viewed by 2141


Counting soybean pods automatically is one of the key ways to realize intelligent soybean breeding in modern smart agriculture. However, the pod counting accuracy for whole soybean plants is still limited due to the crowding and uneven distribution of pods. In this paper, based on the VFNet detector, we propose a deformable attention recursive feature pyramid network for soybean pod counting (DARFP-SD), which aims to identify the number of soybean pods accurately. Specifically, to improve the feature quality, DARFP-SD first introduces deformable convolutional networks (DCN) and an attention recursive feature pyramid (ARFP) to reduce noise interference during feature learning. DARFP-SD further incorporates the Repulsion Loss to correct errors in predicted bounding boxes caused by mutual interference between dense pods. DARFP-SD also adds a density prediction branch in the post-processing stage, which learns an adaptive soft distance IoU to assign a suitable NMS threshold to different counting scenes with uneven soybean pod distributions. The model is trained on a dense soybean dataset with more than 5300 pods from three different shapes and two classes, which consists of a training set of 138 images, a validation set of 46 images, and a test set of 46 images. Extensive experiments verify the performance of the proposed DARFP-SD. The final training loss is 1.281, and an average accuracy of 90.35%, an average recall of 85.59%, and an F1 score of 87.90% are achieved, outperforming the baseline method VFNet by 8.36%, 4.55%, and 7.81%, respectively. We also validate the application effect for different numbers of soybean pods and different shapes of soybean. All the results show the effectiveness of DARFP-SD, which can provide new insight into the soybean pod counting task.
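The reported F1 score can be sanity-checked against the other two figures. The sketch assumes that the abstract's "average accuracy" plays the role of precision, which the abstract does not state explicitly, and uses the standard harmonic-mean definition of F1. The function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 90.35  # percent; "average accuracy" in the abstract (assumed to be precision)
reported_recall = 85.59     # percent

# Prints a value close to the reported 87.90; small differences are expected
# because the inputs are themselves rounded to two decimals.
print(round(f1_score(reported_precision, reported_recall), 2))
```

Under that assumption the three reported numbers are mutually consistent to within rounding.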

(This article belongs to the Special Issue Precision Operation Technology and Intelligent Equipment in Farmland)


23 pages, 35922 KiB

Open Access Article

MUREN: MUltistage Recursive Enhanced Network for Coal-Fired Power Plant Detection

by Shuai Yuan, Juepeng Zheng, Lixian Zhang, Runmin Dong, Ray C. C. Cheung and Haohuan Fu

Remote Sens. 2023, 15(8), 2200; https://doi.org/10.3390/rs15082200 - 21 Apr 2023

Cited by 6 | Viewed by 1933


The accurate detection of coal-fired power plants (CFPPs) is important for environmental protection but challenging. A CFPP is a complex combination of multiple components with varying layouts, unlike clearly defined single objects, such as vehicles. CFPPs are typically located in industrial districts with similar backgrounds, further complicating the detection task. To address this issue, we propose a MUltistage Recursive Enhanced Detection Network (MUREN) for accurate and efficient CFPP detection. The effectiveness of MUREN lies in the following: First, we design a symmetrically enhanced module, including a spatial-enhanced subnetwork (SEN) and a channel-enhanced subnetwork (CEN). SEN learns the spatial relationships to obtain spatial context information. CEN provides adaptive channel recalibration, restraining noise disturbance and highlighting CFPP features. Second, we use a recursive construction set on top of feature pyramid networks to receive features more than once, strengthening feature learning for relatively small CFPPs. We conduct comparative and ablation experiments on two datasets and apply MUREN to the Pearl River Delta region in Guangdong province for CFPP detection. The comparative experiment results show that MUREN improves the mAP by 5.98% compared with the baseline method and outperforms existing cutting-edge detection methods by 4.57–21.38%, which indicates the promising potential of MUREN in large-scale CFPP detection scenarios.

(This article belongs to the Special Issue Artificial Intelligence-Driven Methods for Remote Sensing Target and Object Detection)


12 pages, 3005 KiB

Open Access Article

Novel Recursive BiFPN Combining with Swin Transformer for Wildland Fire Smoke Detection

by Ao Li, Yaqin Zhao and Zhaoxiang Zheng

Forests 2022, 13(12), 2032; https://doi.org/10.3390/f13122032 - 30 Nov 2022

Cited by 23 | Viewed by 3736


Technologies and models based on machine vision are widely used for early wildfire detection. Due to the breadth of wildland scenes and occlusion by vegetation, smoke is more easily detected than flame. However, the shape of wind-blown smoke changes constantly, and smoke colors vary greatly across different combustibles. Therefore, existing target detection networks have limitations in detecting wildland fire smoke, such as low detection accuracy and a high false alarm rate. This paper designs the attention model Recursive Bidirectional Feature Pyramid Network (RBiFPN) for the fusion and enhancement of smoke features. We introduce RBiFPN into the backbone network of the YOLOv5 framework to better distinguish the subtle differences between clouds and smoke. In addition, we replace the classification head of YOLOv5 with a Swin Transformer, which helps adapt the receptive fields of the network to the size of smoke regions and enhances the capability of modeling local and global features. We tested the proposed model on a dataset containing a large number of interference objects such as clouds and fog. The experimental results show that our model detects wildfire smoke with higher performance than state-of-the-art methods.

(This article belongs to the Special Issue Forest Fires Prediction and Detection)


25 pages, 5239 KiB

Open Access Article

Novel Asymmetric Pyramid Aggregation Network for Infrared Dim and Small Target Detection

by Guangrui Lv, Lili Dong, Junke Liang and Wenhai Xu

Remote Sens. 2022, 14(22), 5643; https://doi.org/10.3390/rs14225643 - 8 Nov 2022

Cited by 7 | Viewed by 2449


Robust and efficient detection of small infrared targets is a critical and challenging task in infrared search and tracking applications. Small infrared targets are tiny compared to ordinary targets, and their sizes and appearances differ considerably across scenarios. In addition, these targets are easily submerged in various kinds of background noise. To tackle these challenges, a novel asymmetric pyramid aggregation network (APANet) is proposed. Specifically, a pyramid structure integrating dual attention and dense connection is first constructed, which can not only generate attention-refined multi-scale features in different layers, but also preserve the primitive features of infrared small targets among multi-scale features. Then, the adjacent cross-scale features in this multi-scale information are sequentially modulated through pair-wise asymmetric combination. This mutual dynamic modulation can continuously exchange heterogeneous cross-scale information along the layer-wise aggregation path until an inverted pyramid is generated. In this way, the semantic features of the lower-level network are enriched by incorporating local focus from the higher-level network, while the detail features of the higher-level network are refined by embedding point-wise focus from the lower-level network, which can highlight small target features and suppress background interference. Subsequently, recursive asymmetric fusion is designed to further dynamically modulate and aggregate high-resolution features of different layers in the inverted pyramid, which can also enhance the local high response of small targets. Finally, a series of comparative experiments are conducted on two public datasets, and the experimental results show that APANet detects small targets more accurately than some state-of-the-art methods.

(This article belongs to the Special Issue Deep Learning Based Target Detection and Recognition in Remote Sensing Images)


16 pages, 8509 KiB

Open Access Article

Netting Damage Detection for Marine Aquaculture Facilities Based on Improved Mask R-CNN

by Ziliang Zhang, Fukun Gui, Xiaoyu Qu and Dejun Feng

J. Mar. Sci. Eng. 2022, 10(7), 996; https://doi.org/10.3390/jmse10070996 - 21 Jul 2022

Cited by 16 | Viewed by 3550


Netting damage limits the safe development of marine aquaculture. In order to identify and locate damaged netting accurately, we propose a detection method using an improved Mask R-CNN. We create an image dataset of different kinds of damage from a mix of conditions and enhance it by data augmentation. We then introduce the Recursive Feature Pyramid (RFP) and Deformable Convolution Network (DCN) structures into the learning framework to optimize the basic backbone for a marine environment and build a feature map with both high-level semantic and low-level localization information of the network. This modification solves the problem of poor detection performance on nets with small and irregular damage. Experimental results show that these changes improve the average precision of the model significantly, to 94.48%, which is 7.86% higher than the original method. The enhanced model performs rapidly, with a miss rate of about 7.12% and a detection speed of 4.74 frames per second. Compared with traditional image processing methods, the proposed netting damage detection model is robust and better balances detection precision and speed. Our method provides an effective solution for detecting netting damage in marine aquaculture environments.
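The throughput figure can be converted into a per-frame processing time, which is often the more useful number when sizing an inspection system. The helper name `seconds_per_frame` is illustrative, and the calculation assumes frames are processed one at a time with no batching or pipelining.

```python
def seconds_per_frame(frames_per_second):
    """Average time spent on one frame, given a sustained throughput."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1.0 / frames_per_second


latency_ms = 1000 * seconds_per_frame(4.74)  # 4.74 frames per second, as reported
print(round(latency_ms))  # 211
```

At the reported rate, each frame therefore takes roughly 0.21 seconds to process.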

(This article belongs to the Section Marine Aquaculture)


18 pages, 5770 KiB

Open Access Article

Enhancing Precision with an Ensemble Generative Adversarial Network for Steel Surface Defect Detectors (EnsGAN-SDD)

by Fityanul Akhyar, Elvin Nur Furqon and Chih-Yang Lin

Sensors 2022, 22(11), 4257; https://doi.org/10.3390/s22114257 - 2 Jun 2022

Cited by 9 | Viewed by 3078


Defects are the primary problem affecting steel product quality in the steel industry. The specific challenges in developing defect detectors involve the vagueness and tiny size of defects. To solve these problems, we propose incorporating a super-resolution technique, a sequential feature pyramid network, and boundary localization. Initially, an ensemble of enhanced super-resolution generative adversarial networks (ESRGAN) was proposed for the preprocessing stage to generate a more detailed contour of the original steel image. Next, in the detector section, the latest state-of-the-art feature pyramid network, known as DetectoRS, utilized the recursive feature pyramid network technique to extract deeper multi-scale steel features by learning the feedback from the sequential feature pyramid network. Finally, Side-Aware Boundary Localization was used to precisely generate the output prediction of the defect detectors. We named our approach EnsGAN-SDD. Extensive experimental studies showed that the proposed methods improved the defect detector’s performance, which also surpassed the accuracy of state-of-the-art methods. Moreover, the proposed EnsGAN achieved better performance and effectiveness in processing time compared with the original ESRGAN. We believe our innovation could significantly contribute to improved production quality in the steel industry.

(This article belongs to the Special Issue Vision Sensors for Object Detection and Recognition)


11 pages, 5468 KiB

Open Access Article

A YOLO-Based Target Detection Model for Offshore Unmanned Aerial Vehicle Data

by Zhenhua Wang, Xinyue Zhang, Jing Li and Kuifeng Luan

Sustainability 2021, 13(23), 12980; https://doi.org/10.3390/su132312980 - 24 Nov 2021

Cited by 22 | Viewed by 3387


Target detection in offshore unmanned aerial vehicle data is still a challenge due to the complex characteristics of targets, such as multiple sizes, variable orientation, and complex backgrounds. Herein, a YOLO-based detection model (YOLO-D) was proposed for target detection in offshore unmanned aerial vehicle data. Based on the YOLOv3 network, the residual module was improved by establishing dense connections and adding a dual-attention mechanism (CBAM) to enhance the use of features and global information. Then, weight coefficients were added to the loss function of the YOLO-D model to increase detection accuracy for small-size targets. Finally, the feature pyramid network (FPN) was replaced by the secondary recursive feature pyramid network to reduce the impacts of a complicated environment. Taking the car, boat, and deposit near the coastline as the targets, the proposed YOLO-D model was compared against other models, including Faster R-CNN, SSD, YOLOv3, and YOLOv5, to evaluate its detection performance. The results showed that the evaluation metrics of the YOLO-D model, including precision (Pr), recall (Re), average precision (AP), and the mean of average precision (mAP), had the highest values. The mAP of the YOLO-D model increased by 37.95%, 39.44%, 28.46%, and 5.08% compared to Faster R-CNN, SSD, YOLOv3, and YOLOv5, respectively. The AP of the car, boat, and deposit reached 96.24%, 93.70%, and 96.79%, respectively. Moreover, the YOLO-D model had a higher detection accuracy than other models, especially in the detection of small-size targets. Collectively, the proposed YOLO-D model is a suitable model for target detection in offshore unmanned aerial vehicle data.


23 pages, 3571 KiB

Open Access Article

CasTabDetectoRS: Cascade Network for Table Detection in Document Images with Recursive Feature Pyramid and Switchable Atrous Convolution

by Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker and Muhammad Zeshan Afzal

J. Imaging 2021, 7(10), 214; https://doi.org/10.3390/jimaging7100214 - 16 Oct 2021

Cited by 26 | Viewed by 3786


Table detection is a preliminary step in extracting reliable information from tables in scanned document images. We present CasTabDetectoRS, a novel end-to-end trainable table detection framework that operates on Cascade Mask R-CNN, including Recursive Feature Pyramid network and Switchable Atrous Convolution in the existing backbone architecture. By utilizing a comparatively lightweight backbone of ResNet-50, this paper demonstrates that superior results are attainable without relying on pre- and post-processing methods, heavier backbone networks (ResNet-101, ResNeXt-152), and memory-intensive deformable convolutions. We evaluate the proposed approach on five different publicly available table detection datasets. Our CasTabDetectoRS outperforms the previous state-of-the-art results on four datasets (ICDAR-19, TableBank, UNLV, and Marmot) and accomplishes comparable results on ICDAR-17 POD. Upon comparing with previous state-of-the-art results, we obtain a significant relative error reduction of 56.36%, 20%, 4.5%, and 3.5% on the datasets of ICDAR-19, TableBank, UNLV, and Marmot, respectively. Furthermore, this paper sets a new benchmark by performing exhaustive cross-datasets evaluations to exhibit the generalization capabilities of the proposed method.
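A relative error reduction expresses an improvement as a share of the error that remained before, rather than as a raw score difference. The sketch below uses a common definition, with error taken as the gap between a score and a perfect score of 100; the abstract does not give the exact formula, so this definition, the function name `relative_error_reduction`, and the example scores are assumptions for illustration.

```python
def relative_error_reduction(previous_score, new_score, perfect=100.0):
    """Percentage of the previous error that the new score removes.

    Scores are on a 0..perfect scale; error is the distance to `perfect`.
    """
    previous_error = perfect - previous_score
    new_error = perfect - new_score
    if previous_error <= 0:
        raise ValueError("previous score leaves no error to reduce")
    return 100.0 * (previous_error - new_error) / previous_error


# Hypothetical scores (not from the paper): 90.0 before, 95.0 after.
print(relative_error_reduction(90.0, 95.0))  # 50.0
```

In this hypothetical case a 5-point absolute gain halves the remaining error, which shows why relative error reductions can look large on datasets where prior results were already close to the ceiling.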

(This article belongs to the Section Document Analysis and Processing)


14 pages, 4311 KiB

Open Access Article

Reinforced Neighbour Feature Fusion Object Detection with Deep Learning

by Ningwei Wang, Yaze Li and Hongzhe Liu

Symmetry 2021, 13(9), 1623; https://doi.org/10.3390/sym13091623 - 3 Sep 2021

Cited by 6 | Viewed by 2289


Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, previous works have tried to improve the performance in various object detection necks but have failed to extract features efficiently. To address insufficient object features, this work introduces some of the most advanced and representative network models based on the Faster R-CNN architecture, such as Libra R-CNN, Grid R-CNN, guided anchoring, and GRoIE. We observed the performance of Neighbour Feature Pyramid Network (NFPN) fusion, ResNet Region of Interest Feature Extraction (ResRoIE), and the Recursive Feature Pyramid (RFP) architecture at different scales of precision when these components replaced the corresponding original components in various networks on the MS COCO dataset. After replacing the neck and RoIE parts of these models with our Reinforced Neighbour Feature Fusion (RNFF) model, the average precision (AP) increases by 3.2 percentage points relative to the baseline network.

(This article belongs to the Special Issue Symmetry in Computer Vision and Its Applications)


17 pages, 3430 KiB

Open Access Article

RCBi-CenterNet: An Absolute Pose Policy for 3D Object Detection in Autonomous Driving

by Kang An, Yixin Chen, Suhong Wang and Zhifeng Xiao

Appl. Sci. 2021, 11(12), 5621; https://doi.org/10.3390/app11125621 - 18 Jun 2021

Cited by 4 | Viewed by 2928


3D object detection is a critical task of the perception system of a self-driving vehicle. Existing bounding box-based methods are hard to train due to the need to remove duplicated detections in the post-processing stage. In this paper, we propose a center point-based deep neural network (DNN) architecture named RCBi-CenterNet that predicts the absolute pose for each detected object in the 3D world space. RCBi-CenterNet is composed of a recursive composite network with a dual-backbone feature extractor and a bi-directional feature pyramid network (BiFPN) for cross-scale feature fusion. In the detection head, we predict a confidence heatmap that is used to determine the position of detected objects. The other pose information, including depth and orientation, is regressed. We conducted extensive experiments on the Peking University/Baidu-Autonomous Driving dataset, which contains more than 60,000 labeled 3D vehicle instances from 5277 real-world images, and each vehicle object is annotated with the absolute pose described by the six degrees of freedom (6DOF). We validated the design choices of various data augmentation methods and the backbone options. Through an ablation study and an overall comparison with the state-of-the-art (SOTA), namely CenterNet, we showed that the proposed RCBi-CenterNet presents performance gains of 2.16%, 2.76%, and 5.24% in Top 1, Top 3, and Top 10 mean average precision (mAP). The model and the result could serve as a credible benchmark for future research in center point-based object detection.
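Reading object positions from a confidence heatmap can be sketched as a search for local maxima, which is how center point-based detectors generally avoid box-level duplicate removal. This is an illustrative pure-Python version, not the paper's implementation: the function name `heatmap_peaks`, the 3 x 3 neighbourhood, the 0.5 threshold, and the example heatmap are all assumptions.

```python
def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col, score) for cells that reach `threshold` and are
    at least as large as every neighbour in their 3 x 3 window."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the up-to-8 neighbours, clipping at the borders.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


# Hypothetical 4 x 4 confidence heatmap, for illustration only.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.7],
]
print(heatmap_peaks(heatmap))  # [(1, 1, 0.9), (3, 3, 0.7)]
```

The cell with score 0.6 passes the threshold but is suppressed because its neighbour scores 0.7, so each object yields a single position. Note that exact ties between neighbouring cells would produce more than one peak in this simple version.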

(This article belongs to the Special Issue New Trends on Pattern Recognition and Computer Vision, Applications and Systems)


Search Results (13)
