Deep Learning in Object Detection and Tracking

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 June 2023) | Viewed by 79603

Special Issue Editors


Dr. Junchi Yan
Guest Editor
Ministry of Education Key Lab of Artificial Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Interests: graph matching; point process; object detection

Dr. Minghao Guo
Co-Guest Editor
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Interests: medical imaging; motion correction; computer vision

Special Issue Information

Dear Colleagues,

We are pleased to invite you to submit your work to this Special Issue. Object detection, one of the most fundamental and challenging problems in computer vision, has received great attention in recent years; its development over the past two decades can be regarded as an epitome of the history of computer vision. As a fundamental problem, object detection forms the basis of many other computer vision tasks, such as instance segmentation, image captioning, and object tracking. In recent years, the rapid development of deep learning techniques has reinvigorated object detection, leading to remarkable breakthroughs and making it a research hotspot receiving unprecedented attention.

This Special Issue aims to discuss and address current key issues and problems in object detection and tracking. We accept only submissions related to object detection or tracking.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Object detection and tracking;
  • Few-shot/zero-shot object detection and tracking;
  • Weak/semi/unsupervised object detection and tracking;
  • Long-tailed object detection and tracking;
  • Small object detection;
  • Rotated object detection.

We look forward to receiving your contributions.

Dr. Junchi Yan
Dr. Minghao Guo
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection
  • object tracking
  • computer vision
  • deep learning


Published Papers (29 papers)


Research

20 pages, 8322 KiB  
Article
Target-Aware Feature Bottleneck for Real-Time Visual Tracking
by Janghoon Choi
Appl. Sci. 2023, 13(18), 10198; https://doi.org/10.3390/app131810198 - 11 Sep 2023
Viewed by 701
Abstract
Recent Siamese network-based visual tracking approaches have achieved high performance on numerous visual tracking benchmarks, where most of these trackers employ a backbone feature extractor network together with a prediction head network for classification and regression. However, there has been a constant trend toward larger and more complex backbone and prediction head networks for improved performance, and the increased computational load can slow down the overall tracking algorithm. To address these issues, we propose a novel target-aware feature bottleneck module for trackers, which elicits a target-aware feature in order to obtain a compact feature representation from the backbone network for improved speed and robustness. Our lightweight target-aware bottleneck module attends to the feature representation of the target region to elicit scene-specific information and generate feature-wise modulation weights that adaptively change the importance of each feature. The proposed tracker is evaluated on the large-scale visual tracking datasets GOT-10k and LaSOT, where it achieves real-time speed and improved accuracy over the baseline tracker.
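The feature-wise modulation the abstract describes is in the family of channel attention. A minimal sketch of the idea follows; the function name, the global-average-pooling squeeze, and the sigmoid gate are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def channel_modulation(feature_map, target_feature):
    """Reweight backbone features channel-wise using a target descriptor.

    feature_map:    (C, H, W) features of the search region
    target_feature: (C, Ht, Wt) features cropped around the target
    """
    # Squeeze the target region to one value per channel (global average pooling).
    descriptor = target_feature.mean(axis=(1, 2))        # (C,)
    # Gate each channel with a sigmoid so weights lie in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-descriptor))          # (C,)
    # Scale every channel of the search-region features by its weight.
    return feature_map * weights[:, None, None]

rng = np.random.default_rng(0)
search = rng.normal(size=(8, 16, 16))   # toy backbone output
target = rng.normal(size=(8, 4, 4))     # toy target crop
out = channel_modulation(search, target)
```

Because the weights stay below one, the output is a damped copy of the input in which channels correlated with the target descriptor are suppressed least.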
(This article belongs to the Special Issue Deep Learning in Object Detection and Tracking)

13 pages, 2795 KiB  
Article
Resizer Swin Transformer-Based Classification Using sMRI for Alzheimer’s Disease
by Yihang Huang and Wan Li
Appl. Sci. 2023, 13(16), 9310; https://doi.org/10.3390/app13169310 - 16 Aug 2023
Cited by 2 | Viewed by 1225
Abstract
Structural magnetic resonance imaging (sMRI) is widely used in clinical diagnosis due to its advantages of high-resolution, noninvasive visualization; computer-aided diagnosis based on sMRI images is therefore broadly applied to classifying Alzheimer's disease (AD). Owing to the excellent performance of the Transformer in computer vision, the Vision Transformer (ViT) has been employed for AD classification in recent years. However, the ViT relies on access to large datasets, while the sample sizes of brain imaging datasets are relatively small. Moreover, the preprocessing procedures for brain sMRI images are complex and labor-intensive. To overcome these limitations, we propose the Resizer Swin Transformer (RST), a deep-learning model that can extract multi-scale and cross-channel features from brain sMRI images that have undergone only brief preprocessing. In addition, we pre-trained the RST on a natural image dataset and obtained better performance. We achieved 99.59% and 94.01% average accuracy on the ADNI and AIBL datasets, respectively. Importantly, the RST has a sensitivity of 99.59%, a specificity of 99.58%, and a precision of 99.83% on the ADNI dataset, which are better than or comparable to state-of-the-art approaches. The experimental results show that the RST achieves better classification performance in AD prediction than CNN-based and Transformer models.
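The reported sensitivity, specificity, and precision follow from the standard confusion-matrix definitions for a binary classifier; a small sketch (the counts below are made-up illustration values, not the paper's):

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary (AD vs. control) classifier."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate (recall on AD cases)
    specificity = tn / (tn + fp)   # true-negative rate (recall on controls)
    precision   = tp / (tp + fp)   # fraction of AD predictions that are correct
    return accuracy, sensitivity, specificity, precision

# Illustrative counts only:
acc, sens, spec, prec = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

Reporting all three alongside accuracy matters here because clinical datasets are often class-imbalanced, where accuracy alone can mislead.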

24 pages, 5937 KiB  
Article
Recognition of Student Engagement State in a Classroom Environment Using Deep and Efficient Transfer Learning Algorithm
by Sana Ikram, Haseeb Ahmad, Nasir Mahmood, C. M. Nadeem Faisal, Qaisar Abbas, Imran Qureshi and Ayyaz Hussain
Appl. Sci. 2023, 13(15), 8637; https://doi.org/10.3390/app13158637 - 26 Jul 2023
Cited by 1 | Viewed by 1251
Abstract
A student's engagement in a real classroom environment usually varies over time, and the two genders may also engage differently during a lecture. Previous research measures students' engagement either from assessment outcomes or by observing gestures in online or real but controlled classroom environments with limited numbers of students. Most works either manually assess the engagement level in online class environments or use limited features for automatic computation, and research on the demographic impact on students' engagement in real classroom environments is limited and needs further exploration. This work computes student engagement in a real, minimally controlled classroom environment with 45 students. The main contributions are twofold. First, we propose an efficient transfer-learning-based VGG16 model with an extended layer and fine-tuned hyperparameters to compute students' engagement levels in a real classroom environment; overall, 90% accuracy and 0.5 N seconds computational time were achieved for classifying engaged and non-engaged students. Subsequently, we incorporated inferential statistics to measure the impact of time across 14 experiments, and performed six experiments on the impact of gender on students' engagement. Overall, the inferential analysis reveals a positive impact of time and gender on students' engagement levels in a real classroom environment. Comparisons were also performed with various transfer learning algorithms. The proposed work may help to improve the quality of educational content delivery and decision making in educational institutions.

17 pages, 38646 KiB  
Article
Lightweight Human Ear Recognition Based on Attention Mechanism and Feature Fusion
by Yanmin Lei, Dong Pan, Zhibin Feng and Junru Qian
Appl. Sci. 2023, 13(14), 8441; https://doi.org/10.3390/app13148441 - 21 Jul 2023
Cited by 1 | Viewed by 730
Abstract
With the development of deep learning technology, more and more researchers are interested in ear recognition. Human ear recognition is a biometric identification technology based on ear feature information, often used in authentication and intelligent monitoring. For ear recognition to be useful in practice, real-time performance and accuracy have always been important and challenging goals. Focusing on the problem that the mAP@0.5 value of the YOLOv5s-MG method is lower than that of the YOLOv5s method on the EarVN1.0 ear dataset, which features low resolution, small targets, rotation, brightness changes, and occlusions such as earrings and glasses, a lightweight ear recognition method is proposed based on an attention mechanism and feature fusion. The method comprises the following steps: First, the CBAM attention mechanism is added to the connection between the backbone and neck networks of the lightweight ear recognition method YOLOv5s-MG, constructing the YOLOv5s-MG-CBAM network, which improves accuracy. Second, an SPPF layer and cross-regional feature fusion are added to construct the YOLOv5s-MG-CBAM-F method, which further improves accuracy. Three distinctive ear datasets, CCU-DE, USTB, and EarVN1.0, are used to evaluate the proposed method. An experimental comparison of seven methods (YOLOv5s-MG-CBAM-F, YOLOv5s-MG-SE-F, YOLOv5s-MG-CA-F, YOLOv5s-MG-ECA-F, YOLOv5s, YOLOv7, and YOLOv5s-MG) on the EarVN1.0 dataset shows that YOLOv5s-MG-CBAM-F achieves the highest recognition rate. Its mAP@0.5 value on EarVN1.0 is 91.9%, which is 6.4% higher than that of YOLOv5s-MG and 3.7% higher than that of YOLOv5s. The params, GFLOPs, model size, and per-image inference time of YOLOv5s-MG-CBAM-F on EarVN1.0 are 5.2 M, 8.3 G, 10.9 MB, and 16.4 ms, respectively, which are higher than those of YOLOv5s-MG but lower than those of YOLOv5s. The quantitative results show that the proposed method improves the ear recognition rate while satisfying real-time requirements, making it especially suitable for applications demanding high recognition rates.
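Average precision at an IoU threshold of 0.5 rests on the intersection-over-union between predicted and ground-truth boxes: a detection counts as a match only when its IoU reaches at least 0.5. A minimal IoU sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (empty if boxes are disjoint).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7 ≈ 0.143, well below the 0.5 threshold, so those two boxes would not be matched.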

20 pages, 9027 KiB  
Article
Multi-Attribute NMS: An Enhanced Non-Maximum Suppression Algorithm for Pedestrian Detection in Crowded Scenes
by Wei Wang, Xin Li, Xin Lyu, Tao Zeng, Jiale Chen and Shangjing Chen
Appl. Sci. 2023, 13(14), 8073; https://doi.org/10.3390/app13148073 - 11 Jul 2023
Cited by 1 | Viewed by 1002
Abstract
Removing duplicate proposals is a critical step in pedestrian detection and is usually performed via Non-Maximum Suppression (NMS); however, in crowded scenes, the proposals of occluded pedestrians are hard to distinguish from duplicate proposals, making the detection results inaccurate. To address this problem, we propose a Multi-Attribute NMS (MA-NMS) algorithm, which combines density and count attributes to adaptively adjust suppression, effectively preserving the proposals of occluded pedestrians while removing duplicates. To obtain the density and count attributes, we also propose an attribute branch (ATTB) that uses a context extraction module (CEM) to extract the context of pedestrians and then concatenates this context with pedestrian features to predict both attributes simultaneously. With the proposed ATTB, a pedestrian detector based on MA-NMS is constructed for crowded scenes. Extensive experiments on the CrowdHuman and CityPersons datasets show that the proposed method outperforms mainstream methods on AP (average precision), Recall, and MR−2 (log-average miss rate), validating the effectiveness of the MA-NMS algorithm.
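For context, the standard greedy NMS that MA-NMS builds on can be sketched as follows. MA-NMS differs by adapting the suppression per box using the predicted density and count attributes; the fixed `iou_thresh` here is the baseline behavior, not the paper's adaptive rule:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring boxes, drop overlaps above iou_thresh.

    boxes:  (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
    Returns the indices of the kept boxes.
    """
    order = np.argsort(scores)[::-1]          # candidates, best first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        # IoU of the kept box against all remaining candidates.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        ious = inter / (area_i + areas - inter)
        order = order[1:][ious <= iou_thresh]  # suppress heavy overlaps
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
keep = nms(boxes, scores)   # the near-duplicate second box is suppressed
```

The crowded-scene failure mode is visible here: a genuinely occluded pedestrian whose box overlaps a kept box by more than the threshold is discarded exactly like a duplicate, which is what the adaptive attributes aim to prevent.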

18 pages, 13679 KiB  
Article
Automated Wildlife Bird Detection from Drone Footage Using Computer Vision Techniques
by Dimitrios Mpouziotas, Petros Karvelis, Ioannis Tsoulos and Chrysostomos Stylios
Appl. Sci. 2023, 13(13), 7787; https://doi.org/10.3390/app13137787 - 30 Jun 2023
Cited by 1 | Viewed by 2284
Abstract
Wildlife conservationists have traditionally relied on manual identification and tracking of bird species to monitor populations and identify potential threats, but these techniques can be time-consuming. With the advancement of computer vision, automated bird detection and recognition have become possible. In this manuscript, we present an object-detection model for identifying and tracking wild bird species in natural environments. We used a dataset of bird images captured in the wild and trained the YOLOv4 model to detect bird species with high accuracy, achieving an average precision of 91.28% on a separate set of test images. Our method avoids the time-consuming nature of manual identification and tracking, allowing for efficient and precise monitoring of bird populations. These results demonstrate the potential of YOLOv4 for automated bird detection and monitoring in the wild, which could help conservationists better understand bird populations and identify potential threats.

16 pages, 6648 KiB  
Article
Efficient Roundabout Supervision: Real-Time Vehicle Detection and Tracking on Nvidia Jetson Nano
by Imane Elmanaa, My Abdelouahed Sabri, Yassine Abouch and Abdellah Aarab
Appl. Sci. 2023, 13(13), 7416; https://doi.org/10.3390/app13137416 - 22 Jun 2023
Cited by 3 | Viewed by 2596
Abstract
In recent years, a significant number of people in Morocco have been commuting daily to Casablanca, the country's economic capital. This heavy traffic flow has led to congestion and accidents at certain times of the day, as the city's roads cannot handle the high volume of vehicles passing through. To address this issue, it is essential to expand the infrastructure based on accurate traffic-flow data. In collaboration with the municipality of Bouskoura, a neighboring city of Casablanca, we proposed installing a smart camera on the primary route connecting the two cities. This camera enables us to gather accurate statistics on the number and types of vehicles crossing the road, which can be used to adapt and redesign the existing infrastructure. We implemented our system using the YOLOv7-tiny object detection model to detect and classify the various types of vehicles (such as trucks, cars, motorcycles, and buses) crossing the main road. Additionally, we used the Deep SORT tracking method to track each vehicle appearing on the camera, providing the total count of each class for each lane as well as the number of vehicles passing from one lane to another. Furthermore, we deployed our solution on an embedded system, the Nvidia Jetson Nano, creating a compact and efficient system capable of real-time processing of camera images and suitable for deployment in scenarios with limited resources. Deployment on the Nvidia Jetson Nano showed promising results, and we believe this approach could be applied in similar traffic-surveillance projects to provide accurate and reliable data for better decision-making.

14 pages, 11528 KiB  
Article
Food Classification and Meal Intake Amount Estimation through Deep Learning
by Ji-hwan Kim, Dong-seok Lee and Soon-kak Kwon
Appl. Sci. 2023, 13(9), 5742; https://doi.org/10.3390/app13095742 - 06 May 2023
Cited by 2 | Viewed by 2369
Abstract
This paper proposes a method to classify food types and estimate meal intake amounts from pre- and post-meal images using a deep learning object detection network. The food types and food regions are detected with Mask R-CNN. To bring the pre- and post-meal images into the same capturing conditions, the post-meal image is corrected through a homography transformation based on the meal plate regions in both images. The 3D shape of each food is modeled as a spherical cap, a cone, or a cuboid depending on the food type, and the meal intake amount is estimated as the food volume difference between the pre- and post-meal images. In simulations, the food classification accuracy and the food region detection accuracy reach up to 97.57% and 93.6%, respectively.
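The homography correction step can be illustrated with the standard direct linear transform (DLT): given four or more point correspondences (e.g. plate corners in the two images), a 3×3 homography is recovered as the null vector of a linear system. This is a generic sketch, not the paper's implementation; in practice a library routine such as OpenCV's findHomography would be used, and the point values below are made up:

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 homography H mapping src points to dst points (DLT).

    src, dst: sequences of (x, y) correspondences, at least 4 of each.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H's entries.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)      # null vector = smallest singular vector
    return H / H[2, 2]            # fix the scale ambiguity

def warp_point(H, p):
    """Apply homography H to a 2-D point (homogeneous coordinates)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Hypothetical plate corners in the pre-meal vs. post-meal image:
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 3), (4, 3), (4, 5), (2, 5)]
H = fit_homography(src, dst)
```

Once H is known, every pixel of the post-meal image can be mapped into the pre-meal image's frame, so the two food regions can be compared directly.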

15 pages, 2546 KiB  
Article
Influence of Training Parameters on Real-Time Similar Object Detection Using YOLOv5s
by Tautvydas Kvietkauskas and Pavel Stefanovič
Appl. Sci. 2023, 13(6), 3761; https://doi.org/10.3390/app13063761 - 15 Mar 2023
Cited by 1 | Viewed by 2006
Abstract
Object detection is one of the most active areas in computer vision today: new detection models are created continuously and applied in various fields to modernize existing solutions. This manuscript investigates the influence of training parameters on similar-object detection: image resolution, batch size, iteration number, and image color. The trained models were applied to real-time object detection on mobile devices. A new construction-detail dataset was collected and used in the experimental investigation. The models were evaluated with two measures: the accuracy of each trained model, and the recognition ratio achieved in real-time object detection on the testing data. The iteration number chosen for training and the image resolution had the greatest influence on model accuracy: the higher the selected image resolution, the lower the resulting accuracy, and a small iteration number leaves the model undertrained, with very low accuracy. Slightly better results were obtained with color images.

13 pages, 576 KiB  
Article
Open-Set Signal Recognition Based on Transformer and Wasserstein Distance
by Wei Zhang, Da Huang, Minghui Zhou, Jingran Lin and Xiangfeng Wang
Appl. Sci. 2023, 13(4), 2151; https://doi.org/10.3390/app13042151 - 07 Feb 2023
Cited by 3 | Viewed by 1476
Abstract
Open-set signal recognition provides a new way to verify the robustness of models by introducing novel, unknown signal classes into model testing, breaking the conventional closed-set assumption and making the setting highly relevant to real-world scenarios. In the present work, we propose an efficient open-set signal recognition algorithm containing three key sub-modules: a signal representation sub-module based on a vision transformer (ViT) structure, a set distance metric sub-module based on the Wasserstein distance, and a class space compression sub-module based on reciprocal-point separation and central loss. The representing features of signals are established with transformer-based neural networks (ViT) in order to extract global information from time-series data. Reciprocal points are used to model the potential unknown space without using corresponding samples, while the distance between different class spaces is modeled by the Wasserstein distance instead of the classical Euclidean distance. Numerical experiments on different open-set signal recognition tasks show that the proposed algorithm significantly improves recognition performance for both known and unknown categories.
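The Wasserstein distance named above has a particularly simple closed form in one dimension, which makes the contrast with a plain Euclidean distance between class centers easy to see. A minimal sketch on raw 1-D samples (the paper applies the distance to learned class spaces, not raw samples as here):

```python
import numpy as np

def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1-D samples.

    With equal weights it reduces to the mean absolute difference
    of the sorted samples (optimal transport pairs order statistics).
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    assert a.shape == b.shape
    return float(np.mean(np.abs(a - b)))
```

Unlike the distance between class means, this compares entire distributions: two samples with the same mean but different spreads still have a nonzero Wasserstein distance, while permuting a sample leaves the distance at zero.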

10 pages, 706 KiB  
Article
A Lightweight Neural Network-Based Method for Identifying Early-Blight and Late-Blight Leaves of Potato
by Feilong Kang, Jia Li, Chunguang Wang and Fuxiang Wang
Appl. Sci. 2023, 13(3), 1487; https://doi.org/10.3390/app13031487 - 23 Jan 2023
Cited by 9 | Viewed by 3572
Abstract
Crop pests and diseases are among the most critical threats to agricultural production. In this paper, we train a lightweight convolutional neural network model and build a Django framework-based potato disease leaf recognition system that recognizes three types of potato leaf images: early blight, late blight, and healthy. The lightweight model significantly reduces the number of model parameters while achieving Top-1 identification accuracy above 93%. We imported the trained model into the Django framework to build a website for the potato leaf disease identification system, providing technical support for a mobile-based potato leaf disease identification and early-warning system.

10 pages, 1728 KiB  
Article
Research on Tracking and Identification of Typical Protective Behavior of Cows Based on DeepLabCut
by Jia Li, Feilong Kang, Yongan Zhang, Yanqiu Liu and Xia Yu
Appl. Sci. 2023, 13(2), 1141; https://doi.org/10.3390/app13021141 - 14 Jan 2023
Cited by 2 | Viewed by 1688
Abstract
In recent years, traditional farming methods have increasingly been replaced by more modern, intelligent techniques; this shift toward information-driven, intelligent farming is becoming a trend. When bitten by flies and other insects, cows display protective stress behaviors, including tail wagging, head tossing, leg kicking, ear flapping, and skin twitching. Studying this protective behavior can indirectly reveal the health status of cows and their living patterns under different environmental conditions, allowing the breeding environment and animal welfare status to be evaluated. In this study, we generated key-point feature marker information using the DeepLabCut detection algorithm and constructed the spatial relationships of the cow's feature marker points to detect protective behavior from changes in the cow's head swinging and walking performance. The algorithm detects the protective behavior of cows with an accuracy reaching the level of manual detection. The next step of the research will analyze differences in the protective behaviors of cows in different environments, which can help in cow breed selection and serve as an important guide for diagnosing the health status of cows and improving milk production in practice.
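DeepLabCut outputs 2-D keypoint coordinates per frame, and behaviors such as head swinging can be quantified from the geometry of those points. One simple measure, sketched here with hypothetical marker names (not the paper's actual feature set), is the angle at a pivot keypoint:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at keypoint b formed by the triple a-b-c,
    e.g. a hypothetical shoulder-neck-nose triple for head posture."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against tiny numerical overshoots outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

Tracking this angle across frames and flagging large frame-to-frame changes is one plausible way to turn key-point trajectories into a head-tossing detector.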

14 pages, 4003 KiB  
Article
Deep CNN-Based Materials Location and Recognition for Industrial Multi-Crane Visual Sorting System in 5G Network
by Meixia Fu, Qu Wang, Jianquan Wang, Lei Sun, Zhangchao Ma, Chaoyi Zhang, Wanqing Guan, Qiang Liu, Danshi Wang and Wei Li
Appl. Sci. 2023, 13(2), 1066; https://doi.org/10.3390/app13021066 - 12 Jan 2023
Cited by 4 | Viewed by 2855
Abstract
Intelligent manufacturing is a challenging and compelling topic in Industry 4.0, and many computer vision (CV)-based applications have attracted widespread interest from researchers and industries around the world. However, it is difficult to integrate visual recognition algorithms with industrial control systems: the low-level devices are controlled by traditional programmable logic controllers (PLCs) that cannot exchange data across different industrial control protocols. In this article, we develop a multi-crane visual sorting system with cloud PLCs in a 5G environment, in which deep convolutional neural network (CNN)-based character recognition and dynamic scheduling are designed for materials in intelligent manufacturing. First, a YOLOv5-based algorithm locates the position of objects on the conveyor belt. Then, we propose a Chinese character recognition network (CCRNet) to recognize each object from the original image. The position, type, and timestamp of each object are sent to cloud PLCs, which are virtualized in the cloud to replace the function of traditional terminal PLCs. We then propose a dynamic scheduling method to sort the materials in minimum time. Finally, we build a real experimental platform of the multi-crane visual sorting system to verify the performance of the proposed methods.

13 pages, 4582 KiB  
Article
Tree Seedlings Detection and Counting Using a Deep Learning Algorithm
by Deema Moharram, Xuguang Yuan and Dan Li
Appl. Sci. 2023, 13(2), 895; https://doi.org/10.3390/app13020895 - 09 Jan 2023
Cited by 4 | Viewed by 3410
Abstract
Tree-counting methods based on computer vision technologies are low-cost and efficient, in contrast to traditional tree-counting methods, which are time-consuming, laborious, and infeasible to carry out manually at scale. This study presents a deep learning method for detecting and counting tree seedlings in images, with high economic value and broad application prospects for determining the type and quantity of tree seedlings. The dataset was built with three types of tree seedlings: dragon spruce, black chokeberries, and Scots pine. The data were augmented via several data augmentation methods to improve the accuracy of the detection model and prevent overfitting. A YOLOv5 object detection network was then built and trained on the three types of tree seedlings to obtain the training weights. The experimental results showed that the proposed method can effectively identify and count tree seedlings in an image. Specifically, the mAP values for the dragon spruce, black chokeberry, and Scots pine seedlings were 89.8%, 89.1%, and 95.6%, respectively. The accuracy of the detection model reached 95.10% on average (98.58% for dragon spruce, 91.62% for black chokeberries, and 95.11% for Scots pine). The proposed method can provide technical support for tree-counting statistics.
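Once a detector such as YOLOv5 returns per-image detections, the counting step reduces to filtering by confidence and tallying per class. A minimal sketch (class names and the confidence threshold are illustrative assumptions):

```python
from collections import Counter

def count_seedlings(detections, conf_thresh=0.5):
    """Count detected seedlings per species from a list of
    (class_name, confidence) detections, keeping only confident ones."""
    kept = [cls for cls, conf in detections if conf >= conf_thresh]
    return Counter(kept)
```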

27 pages, 5421 KiB  
Article
KCFS-YOLOv5: A High-Precision Detection Method for Object Detection in Aerial Remote Sensing Images
by Ziwei Tian, Jie Huang, Yang Yang and Weiying Nie
Appl. Sci. 2023, 13(1), 649; https://doi.org/10.3390/app13010649 - 03 Jan 2023
Cited by 12 | Viewed by 4228
Abstract
Aerial remote sensing image object detection based on deep learning is of great significance in geological resource exploration, urban traffic management, and military strategic information. To address several intractable problems in aerial remote sensing images, we propose a high-precision object detection method based on YOLOv5, called KCFS-YOLOv5. To obtain appropriate anchor boxes, we used the K-means++ algorithm to optimize the initial clustering points. To further enhance the feature extraction and fusion ability of the backbone network, we embedded Coordinate Attention (CA) in the backbone network of YOLOv5 and introduced the Bidirectional Feature Pyramid Network (BiFPN) in the neck network of conventional YOLOv5. To improve the detection precision of tiny objects, we added a new tiny-object detection head to the conventional YOLOv5. To reduce the deviation between the predicted box and the ground truth box, we used the SIoU loss function. Finally, we fused and adjusted the above improvements to obtain the high-precision detection method KCFS-YOLOv5. This method was evaluated on three datasets (NWPU VHR-10, RSOD, and UCAS-AOD-CAR). The comparative experimental results demonstrate that KCFS-YOLOv5 achieves the highest accuracy for object detection in aerial remote sensing images.
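K-means++ seeding, which the paper uses to pick better initial cluster centres for anchor estimation, can be sketched as follows. This version clusters (width, height) pairs with squared Euclidean distance; anchor clustering in detectors often substitutes an IoU-based distance, so treat the metric as an assumption:

```python
import random

def kmeanspp_init(boxes, k, seed=0):
    """K-means++ seeding: pick the first centre uniformly at random, then
    pick each further centre with probability proportional to its squared
    distance from the nearest centre chosen so far.
    boxes: list of (w, h) pairs; returns k initial centres."""
    rng = random.Random(seed)
    centres = [rng.choice(boxes)]
    while len(centres) < k:
        # Squared distance of every box to its nearest existing centre.
        d2 = [min((w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centres)
              for w, h in boxes]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for box, dist in zip(boxes, d2):
            acc += dist
            if acc >= r:
                centres.append(box)
                break
    return centres
```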

16 pages, 45361 KiB  
Article
A Lidar-Inertial Navigation System for UAVs in GNSS-Denied Environment with Spatial Grid Structures
by Ziyi Qiu, Junning Lv, Defu Lin, Yinan Yu, Zhiwen Sun and Zhangxiong Zheng
Appl. Sci. 2023, 13(1), 414; https://doi.org/10.3390/app13010414 - 28 Dec 2022
Cited by 2 | Viewed by 1594
Abstract
With its fast and accurate position and attitude estimation, the feature-based lidar-inertial odometer is widely used for UAV navigation in GNSS-denied environments. However, existing algorithms cannot accurately extract the required feature points from spatial grid structures, resulting in reduced positioning accuracy. To solve this problem, we propose a lidar-inertial navigation system based on grid and shell features in the environment. In this paper, an algorithm for extracting grid and shell features is proposed. The extracted features are used to compute the pose (position and orientation) under the assumption of local collinearity and coplanarity. Compared with existing lidar navigation systems in practical application scenarios, the proposed system achieves fast and accurate pose estimation of a UAV in a GNSS-denied environment full of spatial grid structures.
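The collinearity/coplanarity assumption behind such feature extraction is commonly tested with an eigenvalue analysis of the local covariance of a lidar patch. The sketch below is a generic version of that test, not the paper's specific grid-and-shell extractor, and the thresholds are illustrative:

```python
import numpy as np

def classify_patch(points, planar_thresh=0.01, linear_thresh=0.01):
    """Classify a local lidar patch as 'planar', 'linear' (edge-like), or
    'scattered' from the eigenvalues of its covariance matrix, mirroring
    the coplanarity/collinearity assumptions of feature-based lidar
    odometry. points: (N, 3) array-like."""
    pts = np.asarray(points, dtype=float)
    cov = np.cov(pts.T)
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))  # ascending
    # One near-zero eigenvalue with two comparable ones: a plane.
    if l1 / max(l3, 1e-12) < planar_thresh and l2 / max(l3, 1e-12) > linear_thresh:
        return "planar"
    # Two near-zero eigenvalues: points concentrate along a line.
    if l2 / max(l3, 1e-12) < linear_thresh:
        return "linear"
    return "scattered"
```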

13 pages, 2437 KiB  
Article
Weld Defect Segmentation in X-ray Image with Boundary Label Smoothing
by Junhua Zhang, Minghao Guo, Pengzhi Chu, Yang Liu, Jun Chen and Huanxi Liu
Appl. Sci. 2022, 12(24), 12818; https://doi.org/10.3390/app122412818 - 14 Dec 2022
Viewed by 2506
Abstract
Weld defect segmentation (WDS) is widely used to detect defects in X-ray images of welds, which is of practical importance for manufacturing across industries. The key challenge of WDS is that the labeled ground truth of defects is usually inaccurate because of the similarity between candidate defects and the noisy background, making it difficult to distinguish some critical defects, such as cracks, from the weld line during the inference stage. In this paper, we propose boundary label smoothing (BLS), which uses Gaussian blur to soften the labels near object boundaries and thereby provides an appropriate representation of the inaccuracy and uncertainty in ground truth labels. We incorporate BLS into dice loss, in combination with focal loss and weighted cross-entropy loss as a hybrid loss, to achieve improved performance on different types of segmentation datasets.
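The BLS idea of blurring labels near boundaries can be illustrated with a tiny separable blur over a binary mask; the 3-tap kernel here is a stand-in for the Gaussian blur used by the paper, whose exact kernel and width are not given in the abstract:

```python
import numpy as np

def boundary_label_smoothing(mask, kernel=(0.25, 0.5, 0.25)):
    """Soften a binary segmentation mask near object boundaries by
    separable blurring: pixels well inside the object stay ~1, pixels far
    outside stay ~0, and boundary pixels get soft labels in between."""
    soft = np.asarray(mask, dtype=float)
    k = np.asarray(kernel)
    # Blur rows then columns with the 1-D kernel (edges replicated).
    for axis in (0, 1):
        padded = np.pad(soft, [(1, 1) if ax == axis else (0, 0)
                               for ax in (0, 1)], mode="edge")
        soft = (k[0] * np.take(padded, range(0, soft.shape[axis]), axis=axis)
                + k[1] * np.take(padded, range(1, soft.shape[axis] + 1), axis=axis)
                + k[2] * np.take(padded, range(2, soft.shape[axis] + 2), axis=axis))
    return soft
```

The softened mask would then feed the dice/focal/weighted cross-entropy hybrid loss in place of hard 0/1 targets.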

11 pages, 3395 KiB  
Article
Temporal-Guided Label Assignment for Video Object Detection
by Shu Tian, Meng Xia and Chun Yang
Appl. Sci. 2022, 12(23), 12314; https://doi.org/10.3390/app122312314 - 01 Dec 2022
Viewed by 888
Abstract
In video object detection, the deterioration of an object’s appearance in a single frame brings challenges for recognition; therefore, it is natural to exploit temporal information to boost the robustness of video object detection. Existing methods usually utilize temporal information to enhance features but often ignore the information available for label assignment. Label assignment, which assigns labels to anchors for training, is an essential part of object detection. It is likewise challenging in video object detection and can be improved with temporal information. In this work, a temporal-guided label assignment framework is proposed for the learning task of a region proposal network (RPN). Specifically, we propose a feature instructing module (FIM) to establish the relation model among labels through feature similarity in the temporal dimension. The proposed video object detection framework was evaluated on the ImageNet VID benchmark. Without any additional inference cost, our method obtained a 0.8% mean average precision (mAP) improvement over the baseline, reaching 82.0% mAP. This result is on par with state-of-the-art accuracy without using any post-processing methods.
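For reference, the classic single-frame RPN label assignment that this work builds on simply thresholds each anchor's best IoU with the ground truth; the thresholds below follow the usual Faster R-CNN convention and are not specific to this paper:

```python
import numpy as np

def assign_labels(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """RPN-style label assignment: an anchor is positive (1) if its best
    IoU with any ground-truth box reaches pos_thresh, negative (0) below
    neg_thresh, and ignored (-1) in between. Boxes are (x1, y1, x2, y2)."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)
    labels = np.full(len(anchors), -1, dtype=int)
    for i, anchor in enumerate(anchors):
        best = max(iou(anchor, gt) for gt in gt_boxes)
        if best >= pos_thresh:
            labels[i] = 1
        elif best < neg_thresh:
            labels[i] = 0
    return labels
```

The paper's contribution is to guide these per-frame decisions with feature similarity across frames rather than IoU alone.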

14 pages, 14396 KiB  
Article
Small Object Detection in Infrared Images: Learning from Imbalanced Cross-Domain Data via Domain Adaptation
by Jaekyung Kim, Jungwoo Huh, Ingu Park, Junhyeong Bak, Donggeon Kim and Sanghoon Lee
Appl. Sci. 2022, 12(21), 11201; https://doi.org/10.3390/app122111201 - 04 Nov 2022
Cited by 5 | Viewed by 3484
Abstract
Deep learning-based object detection is one of the most popular research topics. However, in cases where large-scale datasets are unavailable, the training of detection models remains challenging due to the data-driven characteristics of deep learning. Small object detection in infrared images is such a case. To solve this problem, we propose a YOLOv5-based framework with a novel training strategy based on the domain adaptation method. First, an auxiliary domain classifier is combined with the YOLOv5 architecture to compose a detection framework that is trainable using datasets from multiple domains while maintaining calculation costs in the inference stage. Secondly, a new loss function based on the Wasserstein distance is proposed to handle small objects by overcoming the intersection-over-union sensitivity problem at small scales. Then, a model training strategy inspired by domain adaptation and knowledge distillation is presented. Using the domain confidence output of the domain classifier as a soft label, a domain confusion loss is backpropagated to force the model to extract domain-invariant features while training on datasets with imbalanced distributions. Additionally, we generate a synthetic dataset in both the visible-light and infrared spectra to overcome the data shortage. The proposed framework is trained on the MS COCO, VEDAI, DOTA, and ADAS Thermal datasets, along with the constructed synthetic dataset, for human detection and vehicle detection tasks. The experimental results show that the proposed framework achieved the best mean average precision (mAP) of 64.7 and 57.5 in the human and vehicle detection tasks, respectively. Additionally, the ablation experiment shows that the proposed training strategy can improve performance by training the model to extract domain-invariant features.
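Wasserstein-based box similarity has a simple closed form when each box (cx, cy, w, h) is modelled as an axis-aligned Gaussian, which is the usual construction in Wasserstein-style small-object losses; whether this paper uses exactly this parameterization and normalizing constant is an assumption here:

```python
import math

def wasserstein_box_distance(box1, box2):
    """2-Wasserstein distance between the Gaussians fitted to two boxes
    (cx, cy, w, h), each modelled as N((cx, cy), diag((w/2)^2, (h/2)^2)).
    For such diagonal Gaussians the distance reduces to the form below."""
    (cx1, cy1, w1, h1), (cx2, cy2, w2, h2) = box1, box2
    return math.sqrt((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
                     + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)

def normalized_wasserstein(box1, box2, c=12.8):
    """Map the distance to a (0, 1] similarity via exp(-W/c). Unlike IoU,
    it stays informative even when tiny boxes do not overlap at all. The
    constant c is dataset-dependent; the value here is illustrative."""
    return math.exp(-wasserstein_box_distance(box1, box2) / c)
```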

17 pages, 5992 KiB  
Article
Aquila Optimization with Transfer Learning Based Crowd Density Analysis for Sustainable Smart Cities
by Mesfer Al Duhayyim, Eatedal Alabdulkreem, Khaled Tarmissi, Mohammed Aljebreen, Bothaina Samih Ismail Abou El Khier, Abu Sarwar Zamani, Ishfaq Yaseen and Mohamed I. Eldesouki
Appl. Sci. 2022, 12(21), 11187; https://doi.org/10.3390/app122111187 - 04 Nov 2022
Cited by 1 | Viewed by 1469
Abstract
Video surveillance in smart cities provides efficient city operations, safer communities, and improved municipal services. Object detection is a computer vision-based technology utilized for detecting instances of semantic objects of a specific class in digital videos and images. Crowd density analysis is a widely used application of object detection, but crowd density classification techniques face complications such as inter-scene deviations, non-uniform density, intra-scene deviations, and occlusion, for which convolutional neural network (CNN) models are advantageous. This study presents Aquila Optimization with Transfer Learning based Crowd Density Analysis for Sustainable Smart Cities (AOTL-CDA3S). The presented AOTL-CDA3S technique aims to identify different kinds of crowd densities in smart cities. To accomplish this, the proposed AOTL-CDA3S model initially applies a weighted average filter (WAF) technique to improve the quality of the input frames. Next, the AOTL-CDA3S technique employs the AO algorithm with the SqueezeNet model for feature extraction. Finally, to classify crowd densities, an extreme gradient boosting (XGBoost) classification model is used. The experimental validation of the AOTL-CDA3S approach was carried out on benchmark crowd datasets, and the results were examined under distinct metrics. This study reports improvements of the AOTL-CDA3S model over recent state-of-the-art methods.
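A weighted average filter of the kind used in the preprocessing stage is just a normalized weighted sum over a pixel neighbourhood; the 3x3 kernel below is a common choice, not necessarily the one used in the paper:

```python
import numpy as np

def weighted_average_filter(frame, weights=None):
    """Smooth a grayscale frame with a 3x3 weighted average, a simple
    denoising step of the kind WAF preprocessing refers to. Border
    pixels are handled by replicating the edge."""
    if weights is None:
        weights = np.array([[1, 2, 1],
                            [2, 4, 2],
                            [1, 2, 1]], dtype=float)
    weights = weights / weights.sum()          # normalize to preserve brightness
    padded = np.pad(np.asarray(frame, dtype=float), 1, mode="edge")
    out = np.zeros_like(padded[1:-1, 1:-1])
    for dy in range(3):
        for dx in range(3):
            out += weights[dy, dx] * padded[dy:dy + out.shape[0],
                                            dx:dx + out.shape[1]]
    return out
```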

15 pages, 3787 KiB  
Article
FastDARTSDet: Fast Differentiable Architecture Joint Search on Backbone and FPN for Object Detection
by Chunxian Wang, Xiaoxing Wang, Yiwen Wang, Shengchao Hu, Hongyang Chen, Xuehai Gu, Junchi Yan and Tao He
Appl. Sci. 2022, 12(20), 10530; https://doi.org/10.3390/app122010530 - 19 Oct 2022
Cited by 3 | Viewed by 1372
Abstract
Neural architecture search (NAS) is a popular branch of automatic machine learning (AutoML), which aims to search for efficient network structures. Many prior works have explored a wide range of search algorithms for classification tasks and have achieved better performance than manually designed network architectures. However, few works have explored NAS for object detection tasks due to the difficulty of training convolutional neural networks from scratch. In this paper, we propose a framework, named FastDARTSDet, to search directly on a larger-scale object detection dataset (MS-COCO). Specifically, we apply the differentiable architecture search method (DARTS) to jointly search backbone and feature pyramid network (FPN) architectures for the object detection task. Extensive experimental results on MS-COCO show the efficiency and efficacy of our method. Specifically, our method achieves 40.0% mean average precision (mAP) on the test set, outperforming many recent NAS methods.
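The differentiable relaxation at the core of DARTS replaces the discrete choice of one operation per edge with a softmax-weighted mixture over all candidates, sketched here in NumPy with toy operations (the real search optimizes the architecture weights by gradient descent alongside the network weights):

```python
import numpy as np

def mixed_op(x, alphas, ops):
    """DARTS continuous relaxation: output a softmax(alpha)-weighted sum
    of all candidate operations on an edge, making the architecture
    choice differentiable with respect to alpha."""
    a = np.exp(alphas - np.max(alphas))        # numerically stable softmax
    weights = a / a.sum()
    return sum(w * op(x) for w, op in zip(weights, ops))

def derive_op(alphas, ops):
    """After search, the discrete architecture keeps the operation with
    the largest architecture weight on each edge."""
    return ops[int(np.argmax(alphas))]
```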

22 pages, 8011 KiB  
Article
RS-YOLOX: A High-Precision Detector for Object Detection in Satellite Remote Sensing Images
by Lei Yang, Guowu Yuan, Hao Zhou, Hongyu Liu, Jian Chen and Hao Wu
Appl. Sci. 2022, 12(17), 8707; https://doi.org/10.3390/app12178707 - 30 Aug 2022
Cited by 19 | Viewed by 3299
Abstract
Automatic object detection in satellite remote sensing images is of great significance for resource exploration and natural disaster assessment. To solve existing problems in remote sensing image detection, this article proposes an improved YOLOX model, named RS-YOLOX, for automatic detection in satellite remote sensing images. To strengthen the feature learning ability of the network, we used Efficient Channel Attention (ECA) in the backbone network of YOLOX and combined Adaptively Spatial Feature Fusion (ASFF) with the neck network of YOLOX. To balance the numbers of positive and negative samples during training, we used the Varifocal Loss function. Finally, to obtain a high-performance remote sensing object detector, we combined the trained model with an open-source framework called Slicing Aided Hyper Inference (SAHI). We evaluated models on three aerial remote sensing datasets (DOTA-v1.5, TGRS-HRRSD, and RSOD). Our comparative experiments demonstrate that our model achieves the highest accuracy in detecting objects in remote sensing image datasets.
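Slicing-aided inference of the SAHI kind tiles a large image into overlapping windows, runs the detector per tile, and maps boxes back to full-image coordinates before merging. A sketch of the tiling and box-shifting logic only (tile size and overlap are illustrative defaults, not SAHI's API):

```python
def slice_image(width, height, tile=640, overlap=0.2):
    """Generate overlapping tile windows (x1, y1, x2, y2) covering a
    large image, adding a final shifted tile per axis so the far edge
    is always covered."""
    stride = int(tile * (1 - overlap))
    xs = list(range(0, max(width - tile, 0) + 1, stride)) or [0]
    if xs[-1] + tile < width:
        xs.append(width - tile)
    ys = list(range(0, max(height - tile, 0) + 1, stride)) or [0]
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in ys for x in xs]

def shift_box(box, tile_origin):
    """Map a box detected inside a tile back to full-image coordinates."""
    x1, y1, x2, y2 = box
    ox, oy = tile_origin
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)
```

After shifting, overlapping detections from adjacent tiles are typically merged with non-maximum suppression.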

16 pages, 8491 KiB  
Article
Improved YOLOv5: Efficient Object Detection Using Drone Images under Various Conditions
by Hyun-Ki Jung and Gi-Sang Choi
Appl. Sci. 2022, 12(14), 7255; https://doi.org/10.3390/app12147255 - 19 Jul 2022
Cited by 67 | Viewed by 11806
Abstract
With the recent development of drone technology, object detection from drones is an emerging capability that can be applied to searching for missing people and objects and to responding to industrial and natural disasters and illegal border crossings. In this paper, we explore ways to increase object detection performance in such situations. Photographs were taken in environments where object detection is difficult: the experimental data covered various conditions, such as changes in the drone's altitude and shooting in the absence of light. All the data used in the experiments were captured with an F11 4K PRO drone or taken from the VisDrone dataset. In this study, we propose improvements to the original YOLOv5 model. We applied the collected data to both models, the original YOLOv5 and the improved YOLOv5_Ours, to calculate the key indicators: precision, recall, F1 score, and mAP (0.5). The mAP (0.5) and loss values of YOLOv5_Ours improved over those of the original YOLOv5 model. Finally, conclusions were drawn from the comparison between the original YOLOv5 model and the improved YOLOv5_Ours model, allowing us to identify the best object detection model under various conditions.
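The key indicators named in the abstract are computed from true positive, false positive, and false negative counts; a minimal helper (mAP additionally averages precision over recall levels and classes, which is omitted here):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 score from detection outcome counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```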

13 pages, 2728 KiB  
Article
Surface Defect Detection Model for Aero-Engine Components Based on Improved YOLOv5
by Xin Li, Cheng Wang, Haijuan Ju and Zhuoyue Li
Appl. Sci. 2022, 12(14), 7235; https://doi.org/10.3390/app12147235 - 18 Jul 2022
Cited by 22 | Viewed by 2699
Abstract
Aiming at the low efficiency and poor accuracy of conventional surface defect detection methods for aero-engine components, a surface defect detection model based on an improved YOLOv5 object detection algorithm is proposed in this paper. First, a k-means clustering algorithm was used to recalculate the parameters of the preset anchors to better match the samples. Then, an ECA-Net attention mechanism was added at the end of the backbone network to make the model pay more attention to feature extraction in defect areas. Finally, the PANet structure of the neck network was improved by replacing it with BiFPN modules to fully integrate features at all scales. The results showed that the mAP of the YOLOv5s-KEB model was 98.3%, which was 1.0% higher than that of the original YOLOv5s model, and the average inference time for a single image was 2.6 ms, 10.3% lower than that of the original model. Moreover, compared with the Faster R-CNN, YOLOv3, YOLOv4 and YOLOv4-tiny object detection algorithms, the YOLOv5s-KEB model has the highest accuracy and the smallest size, making it efficient and convenient for practical applications.

20 pages, 5406 KiB  
Article
Deep Transfer Learning Enabled Intelligent Object Detection for Crowd Density Analysis on Video Surveillance Systems
by Fadwa Alrowais, Saud S. Alotaibi, Fahd N. Al-Wesabi, Noha Negm, Rana Alabdan, Radwa Marzouk, Amal S. Mehanna and Mesfer Al Duhayyim
Appl. Sci. 2022, 12(13), 6665; https://doi.org/10.3390/app12136665 - 30 Jun 2022
Cited by 12 | Viewed by 2179
Abstract
Object detection is a computer vision-based technique used to detect instances of semantic objects of a particular class in digital images and videos. Crowd density analysis is one of the most common applications of object detection. Since crowd density classification techniques face challenges such as non-uniform density, occlusion, and inter-scene and intra-scene deviations, convolutional neural network (CNN) models are useful. This paper presents a Metaheuristics with Deep Transfer Learning Enabled Intelligent Crowd Density Detection and Classification (MDTL-ICDDC) model for video surveillance systems. The proposed MDTL-ICDDC technique concentrates on the effective identification and classification of crowd density in video surveillance systems. To achieve this, the MDTL-ICDDC model primarily leverages a Salp Swarm Algorithm (SSA) with the NASNetLarge model for feature extraction, in which the hyperparameter tuning process is performed by the SSA. Furthermore, a weighted extreme learning machine (WELM) method is utilized for the crowd density classification process. Finally, the krill swarm algorithm (KSA) is applied for effective parameter optimization, thereby improving the classification results. The experimental validation of the MDTL-ICDDC approach was carried out on a benchmark dataset, and the outcomes were examined under several aspects. The experimental values indicate that the MDTL-ICDDC system accomplishes enhanced performance over other models such as Gabor, BoW-SRP, BoW-LBP, GLCM-SVM, GoogleNet, and VGGNet.
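An extreme learning machine keeps a random hidden layer fixed and solves only the output weights in closed form; the "weighted" variant scales each sample's contribution, for example to rebalance classes. A minimal NumPy sketch (the layer size, weighting scheme, and regularization are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def train_welm(X, y, n_hidden=20, sample_weights=None, reg=1e-3, seed=0):
    """Minimal weighted extreme learning machine: a random, untrained
    tanh hidden layer followed by a weighted ridge-regression solve for
    the output weights. sample_weights lets minority-class samples count
    more, which is the 'weighted' part of WELM."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                                # hidden activations
    s = np.ones(len(X)) if sample_weights is None else np.asarray(sample_weights)
    Hw = H * s[:, None]                                   # per-sample weighting
    beta = np.linalg.solve(H.T @ Hw + reg * np.eye(n_hidden), H.T @ (s * y))
    return lambda Xn: np.tanh(Xn @ W + b) @ beta
```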

21 pages, 6188 KiB  
Article
Development and Optimization of Deep Learning Models for Weapon Detection in Surveillance Videos
by Soban Ahmed, Muhammad Tahir Bhatti, Muhammad Gufran Khan, Benny Lövström and Muhammad Shahid
Appl. Sci. 2022, 12(12), 5772; https://doi.org/10.3390/app12125772 - 07 Jun 2022
Cited by 14 | Viewed by 3779
Abstract
Weapon detection in CCTV camera surveillance videos is a challenging task, and its importance is increasing because of the availability and easy accessibility of weapons in the market. This becomes a serious problem when weapons go into the wrong hands and are misused. Advances in computer vision and object detection enable us to detect weapons in live videos without human intervention so that, in turn, intelligent decisions can be made to protect people from dangerous situations. In this article, we develop and present an improved real-time weapon detection system that shows a higher mean average precision (mAP) score and better inference-time performance than previously proposed approaches in the literature. Using a custom weapons dataset, we implemented a state-of-the-art Scaled-YOLOv4 model that achieved a 92.1 mAP score at 85.7 frames per second (FPS) on a high-performance GPU (RTX 2080TI). Furthermore, to achieve the benefits of lower latency, higher throughput, and improved privacy, we optimized our model for implementation on a popular edge-computing device (Jetson Nano GPU) with the TensorRT network optimizer. We also performed a comparative analysis of the previous weapon detector and our presented model on different CPU and GPU machines, making the selection of a model and computing device for real-time deployment easier for users. The analysis shows that our models achieve improved mAP scores for weapon detection in live CCTV camera surveillance videos on high-performance GPUs (such as the RTX 2080TI) as well as on low-cost edge-computing GPUs (such as the Jetson Nano).

16 pages, 1844 KiB  
Article
SiamCAM: A Real-Time Siamese Network for Object Tracking with Compensating Attention Mechanism
by Kai Huang, Peixuan Qin, Xuji Tu, Lu Leng and Jun Chu
Appl. Sci. 2022, 12(8), 3931; https://doi.org/10.3390/app12083931 - 13 Apr 2022
Cited by 4 | Viewed by 2009
Abstract
The Siamese-based object tracking algorithm treats tracking as a similarity matching problem: it determines the object location from the response of the object template to the search region. When similar objects interfere in complex scenes, tracking drift easily occurs. We propose a real-time Siamese network object tracking algorithm combined with a compensating attention mechanism to solve this problem. Firstly, an attention mechanism is introduced into the feature extraction modules of the template branch and search branch of the Siamese network to improve the network's feature representation of the object. The attention mechanism of the search branch enhances the feature representation of both the target and similar backgrounds simultaneously. Therefore, based on the above two-branch attention, we propose a compensating attention model, which introduces the attention selected by the template branch into the search branch and improves the search branch's ability to discriminate the object by weighting its features with the template branch's attention to the object. Experimental results on three popular benchmarks, OTB2015, VOT2018, and LaSOT, show that the accuracy and robustness of the proposed algorithm are adequate, with improvements in cases of occlusion, similar-object interference, and high-speed motion. The processing speed on a GPU reaches 47 fps, achieving real-time object tracking.
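The similarity-matching step of a Siamese tracker is a cross-correlation of the template over the search region, with the peak of the response map giving the predicted location. A plain NumPy sketch on raw 2-D patches (real trackers correlate learned deep features, not pixels):

```python
import numpy as np

def response_map(template, search):
    """Slide the template over the search region and record the inner
    product at each offset: the cross-correlation at the heart of
    Siamese trackers."""
    th, tw = template.shape
    sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(template * search[y:y + th, x:x + tw])
    return out

def locate(template, search):
    """Return the (y, x) offset with the highest response."""
    r = response_map(template, search)
    return np.unravel_index(np.argmax(r), r.shape)
```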

18 pages, 6552 KiB  
Article
Deep Learning-Based Small Object Detection and Classification Model for Garbage Waste Management in Smart Cities and IoT Environment
by Faisal S. Alsubaei, Fahd N. Al-Wesabi and Anwer Mustafa Hilal
Appl. Sci. 2022, 12(5), 2281; https://doi.org/10.3390/app12052281 - 22 Feb 2022
Cited by 27 | Viewed by 5224
Abstract
In recent years, object detection has gained significant interest and is considered a challenging problem in computer vision. Object detection is employed in several applications, such as instance segmentation, object tracking, image captioning, healthcare, etc. Recent studies have reported that deep learning (DL) models can be employed for more effective object detection than traditional methods. The rapid urbanization of smart cities necessitates the design of intelligent and automated waste management techniques for effective recycling of waste. In this view, this study develops a novel deep learning-based small object detection and classification model for garbage waste management (DLSODC-GWM) technique. The proposed DLSODC-GWM technique mainly focuses on detecting and classifying small garbage waste objects to assist intelligent waste management systems. The DLSODC-GWM technique follows two major processes, namely object detection and classification. For object detection, an arithmetic optimization algorithm (AOA) with an improved RefineDet (IRD) model is applied, where the hyperparameters of the IRD model are optimally chosen by the AOA. Secondly, the functional link neural network (FLNN) technique is applied to classify waste objects into multiple classes. The design of IRD for waste classification and AOA-based hyperparameter tuning demonstrates the novelty of the work. The performance validation of the DLSODC-GWM technique is performed using benchmark datasets, and the experimental results show the promising performance of the DLSODC-GWM method over existing approaches, with a maximum accuracy of 98.61%.
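A functional link neural network avoids hidden layers by expanding each input with a fixed (often trigonometric) basis and training a single linear layer on the expanded features; a minimal ridge-regression sketch, in which the basis choice and regularization are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np

def flnn_expand(X):
    """Functional expansion used by functional link networks: each input
    feature is paired with fixed trigonometric basis functions."""
    return np.hstack([X, np.sin(np.pi * X), np.cos(np.pi * X)])

def train_flnn(X, y, reg=1e-6):
    """Fit the linear output weights on the expanded features in closed
    form (ridge regression) and return a predictor."""
    Phi = flnn_expand(X)
    w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(Phi.shape[1]), Phi.T @ y)
    return lambda Xn: flnn_expand(Xn) @ w
```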

15 pages, 18951 KiB  
Article
Modeling Pedestrian Motion in Crowded Scenes Based on the Shortest Path Principle
by Yi Zou and Yuncai Liu
Appl. Sci. 2022, 12(1), 381; https://doi.org/10.3390/app12010381 - 31 Dec 2021
Viewed by 1382
Abstract
In the computer vision field, understanding human dynamics is not only a great challenge but also very meaningful work, which plays an indispensable role in public safety. Despite the complexity of human dynamics, physicists have found that pedestrian motion in a crowd is governed by internal rules that can be formulated as a motion model, and an effective model is of great importance for understanding and reconstructing human dynamics in various scenes. In this paper, we revisit the related research in social psychology and propose a two-part motion model based on the shortest path principle. One part of the model seeks the origin and destination of a pedestrian, and the other part generates the pedestrian's movement path. With the proposed motion model, we simulated the movement behavior of pedestrians and classified them into various patterns, and we then reconstructed the crowd motions in a real-world scene. In addition, to evaluate the effectiveness of the model in crowd motion simulations, we created a new indicator to quantitatively measure the correlation between two groups of crowd motion trajectories. The experimental results show that our motion model outperformed the state-of-the-art model in the above applications.
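The shortest path principle in the path-generation part can be illustrated with Dijkstra's algorithm on a grid of traversal costs; this generic sketch is not the paper's specific model, and the grid/cost representation is an assumption:

```python
import heapq

def shortest_path(grid, start, goal):
    """Dijkstra's algorithm on a 4-connected grid of per-cell traversal
    costs (None = obstacle). Returns the minimum-cost path from start to
    goal as a list of (row, col) cells, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue                            # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                nd = d + grid[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return None
    path, cell = [goal], goal
    while cell != start:
        cell = prev[cell]
        path.append(cell)
    return path[::-1]
```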
