Search Results (21)

Search Parameters:
Keywords = bounding box re-prediction

19 pages, 5844 KB  
Article
Cloud Particle Detection in 2D-S Imaging Data via an Adaptive Anchor SSD Model
by Shuo Liu, Dingkun Yang and Luhong Fan
Atmosphere 2025, 16(8), 985; https://doi.org/10.3390/atmos16080985 - 19 Aug 2025
Viewed by 263
Abstract
The airborne 2D-S optical array probe has worked for more than ten years and has collected a large number of cloud particle images. However, existing detection methods cannot detect cloud particles with high precision due to the size differences of cloud particles and the occurrence of particle fragmentation during imaging. So, this paper proposes a novel cloud particle detection method. The key innovation is an adaptive anchor SSD module, which overcomes existing limitations by generating anchor points that adaptively align with cloud particle size distributions. Firstly, morphological transformations generate multi-scale image information through repeated dilation and erosion operations, while removing irrelevant artifacts and fragmented particles for data cleaning. After that, the method generates geometric and mass centers across multiple scales and dynamically merges these centers to form adaptive anchor points. Finally, a detection module integrates a modified SSD with a ResNet-50 backbone for accurate bounding box predictions. Experimental results show that the proposed method achieves an mAP of 0.934 and a recall of 0.905 on the test set, demonstrating its effectiveness and reliability for cloud particle detection using the 2D-S probe. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Atmospheric Sciences)

14 pages, 2136 KB  
Article
YOLO-TARC: YOLOv10 with Token Attention and Residual Convolution for Small Void Detection in Root Canal X-Ray Images
by Yin Pan, Zhenpeng Zhang, Xueyang Zhang, Zhi Zeng and Yibin Tian
Sensors 2025, 25(10), 3036; https://doi.org/10.3390/s25103036 - 12 May 2025
Viewed by 712
Abstract
The detection of small voids or defects in X-ray images of tooth root canals still faces challenges. To address the issue, this paper proposes an improved YOLOv10 that combines Token Attention with Residual Convolution (ResConv), termed YOLO-TARC. To overcome the limitations of existing deep learning models in effectively retaining key features of small objects and their insufficient focusing capabilities, we introduce three improvements. First, ResConv is designed to ensure the transmission of discriminative features of small objects during feature propagation, leveraging the ability of residual connections to transmit information from one layer to the next. Second, to tackle the issue of weak focusing capabilities on small targets, a Token Attention module is introduced before the third small object detection head. By tokenizing feature maps and enhancing local focusing, it enables the model to pay closer attention to small targets. Additionally, to optimize the training process, a bounding box loss function is adopted to achieve faster and more accurate bounding box predictions. YOLO-TARC simultaneously enhances the ability to retain detailed information of small targets and improves their focusing capabilities, thereby increasing detection accuracy. Experimental results on a private root canal X-ray image dataset demonstrate that YOLO-TARC outperforms other state-of-the-art object detection models, achieving a 7.5% improvement to 80.8% in mAP50 and a 6.2% increase to 80.0% in Recall. YOLO-TARC can contribute to more accurate and efficient objective postoperative evaluation of root canal treatments. Full article
(This article belongs to the Special Issue Biomedical Sensing System Based on Image Analysis)

17 pages, 3869 KB  
Article
Prediction of Extensibility and Toughness of Wheat-Flour Dough Using Bubble Inflation–Structured Light Scanning 3D Imaging Technology and the Enhanced 3D Vgg11 Model
by Xiuzhi Luo, Changhe Niu, Zhaoshuai Zhu, Yuxin Hou, Hong Jiang and Xiuying Tang
Foods 2025, 14(8), 1295; https://doi.org/10.3390/foods14081295 - 8 Apr 2025
Viewed by 567
Abstract
The extensibility of dough and its resistance to extension (toughness) are important indicators, since they are directly linked to dough quality. Therefore, this paper used an independently developed device to blow sheeted dough, and then a three-dimensional (3D) camera was used to continuously collect point cloud images of sheeted dough forming bubbles. After data collection, the rotation algorithm, region of interest (ROI) extraction algorithm, and statistical filtering algorithm were used to process the original point cloud images. Lastly, the oriented bounding box (OBB) algorithm was proposed to calculate the deformation height of each data point. And the point cloud image with the largest deformation depth was selected as the data to input into the 3D convolutional neural network (CNN) models. The Convolutional Block Attention Module (CBAM) was introduced into the 3D Visual Geometry Group 11 (Vgg11) model to build the enhanced Vgg11. And we compared it with the other classical 3D CNN models (MobileNet, ResNet18, and Vgg11) by inputting the voxel-point-based data and the voxel-based data separately into these models. The results showed that the enhanced 3D Vgg11 model using voxel-point-based data was superior to the other models. For prediction of dough extensibility and toughness, the Rp was 0.893 and 0.878, respectively. Full article
(This article belongs to the Section Food Engineering and Technology)

22 pages, 9649 KB  
Article
YOLO-OHFD: A YOLO-Based Oriented Hair Follicle Detection Method for Robotic Hair Transplantation
by Hui Wang and Xin Liu
Appl. Sci. 2025, 15(6), 3208; https://doi.org/10.3390/app15063208 - 15 Mar 2025
Cited by 1 | Viewed by 1412
Abstract
Hair loss affects over 30% of the global population, impacting psychological well-being and social interactions. Robotic hair transplantation has emerged as a pivotal solution, requiring precise hair follicle detection for effective treatment. Traditional methods utilizing horizontal bounding boxes (HBBs) often misclassify due to the follicles’ elongated shapes and varied orientations. This study introduces YOLO-OHFD, a novel YOLO-based method using oriented bounding boxes (OBBs) for improved hair follicle detection in dermoscopic images, addressing the limitations of traditional HBB approaches by enhancing detection accuracy and computational efficiency. YOLO-OHFD incorporates the ECA-Res2Block in its feature extraction network to manage occlusions and hair follicle orientation variations effectively. A Feature Alignment Module (FAM) is embedded within the feature fusion network to ensure precise multi-scale feature integration. We utilize angle classification over regression for robust angle prediction. The method was validated using a custom dataset comprising 500 dermoscopic images with detailed annotations of hair follicle orientations and classifications. The proposed YOLO-OHFD method outperformed existing techniques, achieving a mean average precision (mAP) of 87.01% and operating at 43.67 frames per second (FPS). These metrics attest to its efficacy and real-time application potential. The angle classification component particularly enhanced the stability and precision of orientation predictions, critical for the accurate positioning required in robotic procedures. YOLO-OHFD represents a significant advancement in robotic hair transplantation, providing a robust framework for precise, efficient, and real-time hair follicle detection. Future work will focus on refining computational efficiency and testing in dynamic surgical environments to broaden the clinical applicability of this technology. Full article

17 pages, 8138 KB  
Article
Deep Learning Models Based on Pretreatment MRI and Clinicopathological Data to Predict Responses to Neoadjuvant Systemic Therapy in Triple-Negative Breast Cancer
by Zhan Xu, Zijian Zhou, Jong Bum Son, Haonan Feng, Beatriz E. Adrada, Tanya W. Moseley, Rosalind P. Candelaria, Mary S. Guirguis, Miral M. Patel, Gary J. Whitman, Jessica W. T. Leung, Huong T. C. Le-Petross, Rania M. Mohamed, Bikash Panthi, Deanna L. Lane, Huiqin Chen, Peng Wei, Debu Tripathy, Jennifer K. Litton, Vicente Valero, Lei Huo, Kelly K. Hunt, Anil Korkut, Alastair Thompson, Wei Yang, Clinton Yam, Gaiane M. Rauch and Jingfei Ma
Cancers 2025, 17(6), 966; https://doi.org/10.3390/cancers17060966 - 13 Mar 2025
Cited by 5 | Viewed by 1735
Abstract
Purpose: To develop deep learning models for predicting the pathologic complete response (pCR) to neoadjuvant systemic therapy (NAST) in patients with triple-negative breast cancer (TNBC) based on pretreatment multiparametric breast MRI and clinicopathological data. Methods: The prospective institutional review board-approved study [NCT02276443] included 282 patients with stage I–III TNBC who had multiparametric breast MRI at baseline and underwent NAST and surgery during 2016–2021. Dynamic contrast-enhanced MRI (DCE), diffusion-weighted imaging (DWI), and clinicopathological data were used for the model development and internal testing. Data from the I-SPY 2 trial (2010–2016) were used for external testing. Four variables with a potential impact on model performance were systematically investigated: 3D model frameworks, tumor volume preprocessing, tumor ROI selection, and data inputs. Results: Forty-eight models with different variable combinations were investigated. The best-performing model in the internal testing dataset used DCE, DWI, and clinicopathological data with the originally contoured tumor volume, the tight bounding box of the tumor mask, and ResNeXt50, and achieved an area under the receiver operating characteristic curve (AUC) of 0.76 (95% CI: 0.60–0.88). The best-performing models in the external testing dataset achieved an AUC of 0.72 (95% CI: 0.57–0.84) using only DCE images (originally contoured tumor volume, enlarged bounding box of tumor mask, and ResNeXt50) and an AUC of 0.72 (95% CI: 0.56–0.86) using only DWI images (originally contoured tumor volume, enlarged bounding box of tumor mask, and ResNet18). Conclusions: We developed 3D deep learning models based on pretreatment data that could predict pCR to NAST in TNBC patients. Full article
(This article belongs to the Special Issue Advances in Triple-Negative Breast Cancer)

22 pages, 6129 KB  
Article
A Novel Machine Vision-Based Collision Risk Warning Method for Unsignalized Intersections on Arterial Roads
by Zhongbin Luo, Yanqiu Bi, Qing Ye, Yong Li and Shaofei Wang
Electronics 2025, 14(6), 1098; https://doi.org/10.3390/electronics14061098 - 11 Mar 2025
Cited by 1 | Viewed by 963
Abstract
To address the critical need for collision risk warning at unsignalized intersections, this study proposes an advanced predictive system combining YOLOv8 for object detection, Deep SORT for tracking, and Bi-LSTM networks for trajectory prediction. To adapt YOLOv8 for complex intersection scenarios, several architectural enhancements were incorporated. The RepLayer module replaced the original C2f module in the backbone, integrating large-kernel depthwise separable convolution to better capture contextual information in cluttered environments. The GIoU loss function was introduced to improve bounding box regression accuracy, mitigating the issues related to missed or incorrect detections due to occlusion and overlapping objects. Furthermore, a Global Attention Mechanism (GAM) was implemented in the neck network to better learn both location and semantic information, while the ReContext gradient composition feature pyramid replaced the traditional FPN, enabling more effective multi-scale object detection. Additionally, the CSPNet structure in the neck was substituted with Res-CSP, enhancing feature fusion flexibility and improving detection performance in complex traffic conditions. For tracking, the Deep SORT algorithm was optimized with enhanced appearance feature extraction, reducing the identity switches caused by occlusions and ensuring the stable tracking of vehicles, pedestrians, and non-motorized vehicles. The Bi-LSTM model was employed for trajectory prediction, capturing long-range dependencies to provide accurate forecasting of future positions. The collision risk was quantified using the predictive collision risk area (PCRA) method, categorizing risks into three levels (danger, warning, and caution) based on the predicted overlaps in trajectories. In the experimental setup, the dataset used for training the model consisted of 30,000 images annotated with bounding boxes around vehicles, pedestrians, and non-motorized vehicles. 
Data augmentation techniques such as Mosaic, Random_perspective, Mixup, HSV adjustments, Flipud, and Fliplr were applied to enrich the dataset and improve model robustness. In real-world testing, the system was deployed as part of the G310 highway safety project, where it achieved a mean Average Precision (mAP) of over 90% for object detection. Over a one-month period, 120 warning events involving vehicles, pedestrians, and non-motorized vehicles were recorded. Manual verification of the warnings indicated a prediction accuracy of 97%, demonstrating the system’s reliability in identifying potential collisions and issuing timely warnings. This approach represents a significant advancement in enhancing safety at unsignalized intersections in urban traffic environments. Full article
(This article belongs to the Special Issue Computer Vision and Image Processing in Machine Learning)
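
The GIoU loss named in the abstract above has a standard closed form for axis-aligned boxes. A minimal Python sketch follows, assuming boxes are (x1, y1, x2, y2) tuples with positive area. The function names are illustrative and are not taken from the paper, and how the loss is wired into YOLOv8 is not described in this listing.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union  # assumes both boxes have positive area

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss in [0, 2]: 0 for identical boxes, growing as boxes move apart."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, the value keeps changing when the boxes do not overlap at all, which is why it still provides a training signal for missed or badly placed predictions.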

26 pages, 1503 KB  
Article
Elevating Detection Performance in Optical Remote Sensing Image Object Detection: A Dual Strategy with Spatially Adaptive Angle-Aware Networks and Edge-Aware Skewed Bounding Box Loss Function
by Zexin Yan, Jie Fan, Zhongbo Li and Yongqiang Xie
Sensors 2024, 24(16), 5342; https://doi.org/10.3390/s24165342 - 18 Aug 2024
Cited by 1 | Viewed by 1710
Abstract
In optical remote sensing image object detection, discontinuous boundaries often limit detection accuracy, particularly at high Intersection over Union (IoU) thresholds. This paper addresses this issue by proposing the Spatial Adaptive Angle-Aware (SA3) Network. The SA3 Network employs a hierarchical refinement approach, consisting of coarse regression, fine regression, and precise tuning, to optimize the angle parameters of rotated bounding boxes. It adapts to specific task scenarios using either class-aware or class-agnostic strategies. Experimental results demonstrate its effectiveness in significantly improving detection accuracy at high IoU thresholds. Additionally, we introduce a Gaussian transform-based IoU factor during angle regression loss calculation, leading to the development of Edge-aware Skewed Bounding Box Loss (EAS Loss). The EAS loss enhances the loss gradient at the final stage of angle regression for bounding boxes, addressing the challenge of further learning when the predicted box angle closely aligns with the real target box angle. This results in increased training efficiency and better alignment between training and evaluation metrics. Experimental results show that the proposed method substantially enhances the detection accuracy of ReDet and ReBiDet models. The SA3 Network and EAS loss not only elevate the mAP of the ReBiDet model on DOTA-v1.5 to 78.85% but also effectively improve the model’s mAP under high IoU threshold conditions. Full article
(This article belongs to the Special Issue Object Detection Based on Vision Sensors and Neural Network)

19 pages, 3524 KB  
Article
Ensemble of Convolutional Neural Networks for COVID-19 Localization on Chest X-ray Images
by Karem D. Marcomini
Big Data Cogn. Comput. 2024, 8(8), 84; https://doi.org/10.3390/bdcc8080084 - 1 Aug 2024
Viewed by 1649
Abstract
Coronavirus disease (COVID-19) is caused by the SARS-CoV-2 virus and has been declared as a pandemic. The early detection of COVID-19 is necessary to interrupt the spread of the virus and prevent its transmission. X-rays and CT scans can assist radiologists in disease detection. However, detecting COVID-19 on chest radiographs is challenging due to similarities with other bacterial and viral pneumonias. Therefore, it is essential to develop a fast and accurate algorithm for detecting COVID-19. In this work, we applied pre-processing in order to increase the contrast in X-rays. We then use the ResNet-50 model to differentiate between normal and COVID-19 images. Images classified as COVID-19 were investigated with an ensemble detection model (deep learning models—You Only Look Once version 5 and X). The classification model achieved an accuracy of 0.864 and an AUC of 0.904 in 5-fold cross-validation. The overlap between the predicted bounding boxes and the ground truth reached, in the ensemble model, a mAP of 59.63% in 5-fold cross-validation. Thus, we consider that the result was significant in terms of the global classification of the images, as well as in the location of suspicious regions that require greater attention from the specialist, which makes the developed model a fast and promising way to aid the specialist in decision making. Full article

22 pages, 7578 KB  
Article
EC-YOLO: Improved YOLOv7 Model for PCB Electronic Component Detection
by Shiyi Luo, Fang Wan, Guangbo Lei, Li Xu, Zhiwei Ye, Wei Liu, Wen Zhou and Chengzhi Xu
Sensors 2024, 24(13), 4363; https://doi.org/10.3390/s24134363 - 5 Jul 2024
Cited by 8 | Viewed by 3656
Abstract
Electronic components are the main components of PCBs (printed circuit boards), so the detection and classification of ECs (electronic components) is an important aspect of recycling used PCBs. However, due to the variety and quantity of ECs, traditional target detection methods for EC classification still have problems such as slow detection speed and low performance, and the accuracy of the detection needs to be improved. To overcome these limitations, this study proposes an enhanced YOLO (you only look once) network (EC-YOLOv7) for detecting EC targets. The network uses ACmix (a mixed model that enjoys the benefits of both self-attention and convolution) as a substitute for the 3 × 3 convolutional modules in the E-ELAN (Extended ELAN) architecture and implements branch links and 1 × 1 convolutional arrays between the ACmix modules to improve the speed of feature retrieval and network inference. Furthermore, the ResNet-ACmix module is engineered to prevent the leakage of function data and to minimise calculation time. Subsequently, the SPPCSPS (spatial pyramid pooling connected spatial pyramid convolution) block has been improved by replacing the serial channels with concurrent channels, which improves the fusion speed of the image features. To effectively capture spatial information and improve detection accuracy, the DyHead (the dynamic head) is utilised to enhance the model’s size, mission, and sense of space, which effectively captures spatial information and improves the detection accuracy. A new bounding-box loss regression method, the WIoU-Soft-NMS method, is finally suggested to facilitate prediction regression and improve the localisation accuracy. The experimental results demonstrate that the enhanced YOLOv7 net surpasses the initial YOLOv7 model and other common EC detection methods. The proposed EC-YOLOv7 network reaches a mean accuracy (mAP@0.5) of 94.4% on the PCB dataset and exhibits higher FPS compared to the original YOLOv7 model. 
In conclusion, it can significantly enhance high-density EC target recognition. Full article
(This article belongs to the Section Electronic Sensors)
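
The abstract above names a WIoU-Soft-NMS method without giving its details. As a rough illustration of the Soft-NMS half only, here is the generic Gaussian Soft-NMS procedure in Python. The box format, the `sigma` and `score_threshold` defaults, and the function names are assumptions, and the WIoU loss is not shown.

```python
import math


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of deleting boxes."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Take the highest-scoring remaining detection.
        best_index = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_index)
        kept.append((best_box, best_score))

        # Down-weight the rest according to their overlap with it.
        rescored = []
        for box, score in candidates:
            decayed = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if decayed >= score_threshold:
                rescored.append((box, decayed))
        candidates = rescored
    return kept
```

Because scores are decayed rather than zeroed, a detection that heavily overlaps a stronger one survives with low confidence, which can help when components sit close together on a board.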

21 pages, 6284 KB  
Article
Weakly Supervised Object Detection for Remote Sensing Images via Progressive Image-Level and Instance-Level Feature Refinement
by Shangdong Zheng, Zebin Wu, Yang Xu and Zhihui Wei
Remote Sens. 2024, 16(7), 1203; https://doi.org/10.3390/rs16071203 - 29 Mar 2024
Cited by 6 | Viewed by 2200
Abstract
Weakly supervised object detection (WSOD) aims to predict a set of bounding boxes and corresponding category labels for instances with only image-level supervisions. Compared with fully supervised object detection, WSOD in remote sensing images (RSIs) is much more challenging due to the vast foreground-related context regions. In this paper, we propose a progressive image-level and instance-level feature refinement network to address the problems of missing detection and part domination for WSOD in RSIs. Firstly, we propose a multi-label attention mining loss (MAML)-guided image-level feature refinement branch to effectively allocate the computational resources towards the most informative part of images. With the supervision of MAML, all latent instances in images are emphasized. However, image-level feature refinement further expands responsive gaps between the informative part and other sub-optimal informative ones, which results in exacerbating the problem of part domination. In order to alleviate the above-mentioned limitation, we further construct an instance-level feature refinement branch to re-balance the contributions of different adjacent candidate bounding boxes according to the detection task. An instance selection loss (ISL) is proposed to progressively boost the representation of salient regions by exploring supervision from the network itself. Finally, we integrate the image-level and instance-level feature refinement branches into a complete network and the proposed MAML and ISL functions are merged with class classification and box regression to optimize the whole WSOD network in an end-to-end training fashion. We conduct experiments on two popular WSOD datasets, NWPU VHR-10.v2 and DIOR. All the experimental results demonstrate that our method achieves a competitive performance compared with other state-of-the-art approaches. Full article

17 pages, 2272 KB  
Article
Semi-Supervised Object Detection with Multi-Scale Regularization and Bounding Box Re-Prediction
by Yeqin Shao, Chang Lv, Ruowei Zhang, He Yin, Meiqin Che, Guoqing Yang and Quan Jiang
Electronics 2024, 13(1), 221; https://doi.org/10.3390/electronics13010221 - 3 Jan 2024
Cited by 2 | Viewed by 2904
Abstract
Semi-supervised object detection has become a hot topic in recent years, but there are still some challenges regarding false detection, duplicate detection, and inaccurate localization. This paper presents a semi-supervised object detection method with multi-scale regularization and bounding box re-prediction. Specifically, to improve the generalization of the two-stage object detector and to make consistent predictions related to the image and its down-sampled counterpart, a novel multi-scale regularization loss is proposed for the region proposal network and the region-of-interest head. Then, in addition to using the classification probabilities of the pseudo-labels to exploit the unlabeled data, this paper proposes a novel bounding box re-prediction strategy to re-predict the bounding boxes of the pseudo-labels in the unlabeled images and select the pseudo-labels with reliable bounding boxes (location coordinates) to improve the model’s localization accuracy based on its unsupervised localization loss. Experiments on the public MS COCO and Pascal VOC show that our proposed method achieves a competitive detection performance compared to other state-of-the-art methods. Furthermore, our method offers a multi-scale regularization strategy and a reliably located pseudo-label screening strategy, both of which facilitate the development of semi-supervised object detection techniques and boost the object detection performance in autonomous driving, industrial inspection, and agriculture automation. Full article
(This article belongs to the Section Artificial Intelligence)
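
The abstract above does not state how a re-predicted box is judged reliable. One plausible reading, sketched here purely as an assumption, is to keep a pseudo-label only when its original box and its re-predicted box agree closely in IoU. The function names and the 0.8 threshold are hypothetical and do not come from the paper.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def select_reliable_pseudo_labels(pseudo_boxes, repredicted_boxes, iou_threshold=0.8):
    """Keep pseudo-label boxes whose re-prediction lands in nearly the same place.

    pseudo_boxes[i] and repredicted_boxes[i] are assumed to refer to the same
    object: the first from the initial pass, the second from re-predicting
    with the first as the starting proposal.
    """
    reliable = []
    for original, repredicted in zip(pseudo_boxes, repredicted_boxes):
        # Agreement criterion is an assumption, not the paper's stated rule.
        if iou(original, repredicted) >= iou_threshold:
            reliable.append(original)
    return reliable
```

Only the boxes that pass this filter would then contribute to an unsupervised localization loss; the rest could still be used for classification.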

20 pages, 6428 KB  
Article
Automatic Detection of Feral Pigeons in Urban Environments Using Deep Learning
by Zhaojin Guo, Zheng He, Li Lyu, Axiu Mao, Endai Huang and Kai Liu
Animals 2024, 14(1), 159; https://doi.org/10.3390/ani14010159 - 3 Jan 2024
Cited by 2 | Viewed by 2859
Abstract
The overpopulation of feral pigeons in Hong Kong has significantly disrupted the urban ecosystem, highlighting the urgent need for effective strategies to control their population. In general, control measures should be implemented and re-evaluated periodically following accurate estimations of the feral pigeon population in the concerned regions, which, however, is very difficult in urban environments due to the concealment and mobility of pigeons within complex building structures. With the advances in deep learning, computer vision can be a promising tool for pigeon monitoring and population estimation but has not been well investigated so far. Therefore, we propose an improved deep learning model (Swin-Mask R-CNN with SAHI) for feral pigeon detection. Our model consists of three parts. Firstly, the Swin Transformer network (STN) extracts deep feature information. Secondly, the Feature Pyramid Network (FPN) fuses multi-scale features to learn at different scales. Lastly, the model’s three head branches are responsible for classification, best bounding box prediction, and segmentation. During the prediction phase, we utilize a Slicing-Aided Hyper Inference (SAHI) tool to focus on the feature information of small feral pigeon targets. Experiments were conducted on a feral pigeon dataset to evaluate model performance. The results reveal that our model achieves excellent recognition performance for feral pigeons. Full article
(This article belongs to the Special Issue Sensors-Assisted Observation of Wildlife)

21 pages, 13266 KB  
Article
A Study on Tomato Disease and Pest Detection Method
by Wenyi Hu, Wei Hong, Hongkun Wang, Mingzhe Liu and Shan Liu
Appl. Sci. 2023, 13(18), 10063; https://doi.org/10.3390/app131810063 - 6 Sep 2023
Cited by 16 | Viewed by 4343
Abstract
In recent years, with the rapid development of artificial intelligence technology, computer vision-based pest detection technology has been widely used in agricultural production. Tomato diseases and pests are serious problems affecting tomato yield and quality, so it is important to detect them quickly and accurately. In this paper, we propose a tomato disease and pest detection model based on an improved YOLOv5n to overcome the problems of low accuracy and large model size in traditional pest detection methods. Firstly, we use the Efficient Vision Transformer as the feature extraction backbone network to reduce model parameters and computational complexity while improving detection accuracy, thus solving the problems of poor real-time performance and model deployment. Second, we replace the original nearest neighbor interpolation upsampling module with the lightweight general-purpose upsampling operator Content-Aware ReAssembly of FEatures to reduce feature information loss during upsampling. Finally, we use Wise-IoU instead of the original CIoU as the regression loss function of the target bounding box to improve the regression prediction accuracy of the predicted bounding box while accelerating the convergence speed of the regression loss function. We perform statistical analysis on the experimental results of tomato diseases and pests under data augmentation conditions. The results show that the improved algorithm improves mAP50 and mAP50:95 by 2.3% and 1.7%, respectively, while reducing the number of model parameters by 0.4 M and the computational complexity by 0.9 GFLOPs. The improved model has a parameter count of only 1.6 M and a computational complexity of only 3.3 GFLOPs, demonstrating a certain advantage over other mainstream object detection algorithms in terms of detection accuracy, model parameter count, and computational complexity. The experimental results show that this method is suitable for the early detection of tomato diseases and pests. 
(This article belongs to the Section Computing and Artificial Intelligence)
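The tomato pest abstract above replaces CIoU with Wise-IoU as the bounding box regression loss. As a point of reference, the following is a minimal pure-Python sketch of the baseline CIoU loss only; it does not reproduce Wise-IoU or the authors' implementation. The function names `iou` and `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss: 1 - IoU + center-distance term + aspect-ratio term."""
    i = iou(pred, gt)
    # Squared distance between the two box centers.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan((gt[2] - gt[0]) / (gt[3] - gt[1] + eps))
        - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1] + eps))
    ) ** 2
    alpha = v / (1 - i + v + eps)  # eps avoids 0/0 for identical boxes
    return 1 - i + rho2 / c2 + alpha * v
```

Unlike plain IoU, the center-distance term keeps the loss informative for boxes that do not overlap, which is the property that later IoU-based losses build on.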

23 pages, 4946 KB  
Article
Online Multiple Object Tracking Using Min-Cost Flow on Temporal Window for Autonomous Driving
by Hongjian Wei, Yingping Huang, Qian Zhang and Zhiyang Guo
World Electr. Veh. J. 2023, 14(9), 243; https://doi.org/10.3390/wevj14090243 - 2 Sep 2023
Viewed by 2056
Abstract
Multiple object tracking (MOT), as a core technology for environment perception in autonomous driving, has attracted attention from researchers. Combining the advantages of batch global optimization, we present a novel online MOT framework for autonomous driving, consisting of feature extraction and data association on a temporal window. In the feature extraction stage, we design a three-channel appearance feature extraction network based on metric learning, using ResNet50 as the backbone network and the triplet loss function, and employ a Kalman Filter with a constant acceleration motion model to optimize and predict the object bounding box information, so as to obtain reliable and discriminative object representation features. For data association, to reduce ID switches, the min-cost flow of global association is introduced within the temporal window composed of consecutive multi-frame images. The trajectories within the temporal window are divided into two categories, active trajectories and inactive trajectories, and the appearance and motion affinities between each category of trajectories and the detections are calculated separately. Based on this, a sparse affinity network is constructed, and data association is achieved by solving the min-cost flow problem on this network. Qualitative experimental results on the public KITTI MOT benchmark dataset and real-world campus scenario sequences validate the effectiveness and robustness of our method. Compared with the homogeneous, vision-based MOT methods, quantitative experimental results demonstrate that our method has competitive advantages in terms of higher-order tracking accuracy, association accuracy, and ID switches.
(This article belongs to the Special Issue Recent Advance in Intelligent Vehicle)
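The data association step in the abstract above is posed as a min-cost flow problem on an affinity network. The sketch below illustrates the general idea on a single track-to-detection matching, not the authors' temporal-window formulation. It assumes a hypothetical affinity matrix with values in [0, 1], a hypothetical `threshold` below which a match is not worth making, and integer costs obtained by scaling; the solver is a plain successive-shortest-path routine written for this example.

```python
def min_cost_flow(n, edges, s, t):
    """Successive shortest paths with Bellman-Ford on unit-capacity edges.

    edges: list of (u, v, capacity, integer_cost).
    Flow is pushed only while the cheapest s-t path has negative cost,
    so the amount of flow (number of matches) is decided by the costs.
    Returns (total_cost, flow_per_input_edge).
    """
    graph = [[] for _ in range(n)]
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    total = 0
    while True:
        dist = [float("inf")] * n
        prev = [-1] * n
        dist[s] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev[to[e]] = e
                        changed = True
            if not changed:
                break
        if prev[t] == -1 or dist[t] >= 0:
            break
        v = t
        while v != s:  # push one unit along the path found
            e = prev[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]
        total += dist[t]
    return total, [edges[k][2] - cap[2 * k] for k in range(len(edges))]


def associate(affinity, threshold=0.5):
    """Match tracks (rows) to detections (columns); returns {track: detection}."""
    n_t = len(affinity)
    n_d = len(affinity[0]) if n_t else 0
    s, t = 0, n_t + n_d + 1
    pairs = [(i, j) for i in range(n_t) for j in range(n_d)]
    # Pair edges come first so their flows are the first len(pairs) entries.
    edges = [(1 + i, 1 + n_t + j, 1, round(1000 * (threshold - affinity[i][j])))
             for i, j in pairs]
    edges += [(s, 1 + i, 1, 0) for i in range(n_t)]
    edges += [(1 + n_t + j, t, 1, 0) for j in range(n_d)]
    _, flow = min_cost_flow(n_t + n_d + 2, edges, s, t)
    return {i: j for (i, j), f in zip(pairs, flow) if f == 1}
```

With the affinities `[[0.9, 0.7], [0.8, 0.1]]`, a greedy matcher would pair track 0 with detection 0 and leave track 1 with a poor candidate, whereas the flow solution reroutes track 0 to detection 1 so that both tracks receive an acceptable match. This global view is the reason flow-based association is used to reduce ID switches.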

13 pages, 5327 KB  
Article
ADFireNet: An Anchor-Free Smoke and Fire Detection Network Based on Deformable Convolution
by Bin Li and Peng Liu
Sensors 2023, 23(16), 7086; https://doi.org/10.3390/s23167086 - 10 Aug 2023
Cited by 1 | Viewed by 1396
Abstract
In this paper, we propose an anchor-free smoke and fire detection network, ADFireNet, based on deformable convolution. The proposed ADFireNet network is composed of three parts. The backbone network, which is responsible for feature extraction from input images, is composed of ResNet with added deformable convolution. The neck network, which is responsible for multi-scale detection, is composed of the feature pyramid network. The head network outputs results and adopts pseudo intersection over union combined with an anchor-free network structure. The head network consists of two fully convolutional sub-networks: the first is the classification sub-network, which outputs a classification confidence score, and the second is the regression sub-network, which predicts the parameters of bounding boxes. The deformable convolution (DCN) added to the backbone network enhances the shape feature extraction capability for fire and smoke, and the pseudo intersection over union (pseudo-IoU) added to the head network solves the label assignment problem that exists in anchor-free object detection networks. The proposed ADFireNet is evaluated using the fire smoke dataset. The experimental results show that ADFireNet has higher accuracy and faster detection speeds compared with other methods. Ablation studies have demonstrated the effectiveness of DCN and pseudo-IoU.
(This article belongs to the Section Sensing and Imaging)
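The ADFireNet abstract above states that the regression sub-network predicts bounding box parameters without anchors, but it does not say how those parameters are encoded. The sketch below assumes an FCOS-style encoding, in which each feature-map location predicts its distances to the left, top, right, and bottom box edges. That parameterization and the function name `decode_ltrb` are assumptions for illustration and are not confirmed by the abstract.

```python
def decode_ltrb(points, distances):
    """Turn per-location edge distances into (x1, y1, x2, y2) boxes.

    points:    list of (px, py) image coordinates of feature-map locations.
    distances: list of (left, top, right, bottom) predicted distances,
               one tuple per point (assumed FCOS-style encoding).
    """
    boxes = []
    for (px, py), (left, top, right, bottom) in zip(points, distances):
        boxes.append((px - left, py - top, px + right, py + bottom))
    return boxes
```

Because every location regresses its own box directly, no anchor sizes or aspect ratios need to be tuned, which leaves label assignment (which locations count as positives) as the main design question that an assignment scheme such as pseudo-IoU addresses.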