
Image Processing and Analysis for Object Detection

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (30 March 2023) | Viewed by 55560

Special Issue Editors


Guest Editor
School of Information and Control, Nanjing University of Information Science and Technology, Nanjing, China
Interests: computer vision; pattern recognition

Guest Editor
School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
Interests: visual tracking; sign language recognition

Guest Editor
JD Finance America Corporation, Mountain View, CA 94089, USA
Interests: multimedia analysis; sign language recognition

Guest Editor
Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing, China
Interests: face recognition; image super-resolution

Special Issue Information

Dear Colleagues,

Recent years have witnessed an explosion of interest in the research and development of deep learning techniques for computer vision. Although deep learning now reaches almost all fields of science and engineering, computer vision remains one of its primary application areas. In particular, deep learning has delivered unprecedented performance on computer vision tasks such as high-accuracy object detection, visual tracking, image segmentation, image/video super-resolution, satellite image processing, and salient object detection, results that conventional methods cannot match.

This Special Issue aims to cover recent advancements in computer vision that involve the use of deep learning methods, with a particular interest in low-level and high-level computer vision tasks. Both original research and review articles are welcome. Topics include, but are not limited to, the following:

  • Image/video super-resolution with deep learning approaches;
  • Object detection, visual tracking, and image/video segmentation with deep learning approaches;
  • Supervised and unsupervised learning for image/video processing;
  • Satellite image processing with deep learning techniques;
  • Low-light image enhancement using deep learning approaches.

Dr. Kaihua Zhang
Prof. Dr. Wanli Xue
Dr. Bo Liu
Dr. Guangwei Gao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • computer vision
  • object detection
  • visual tracking
  • image super-resolution
  • salient object detection

Published Papers (21 papers)


Research

31 pages, 9481 KiB  
Article
Extreme Early Image Recognition Using Event-Based Vision
by Abubakar Abubakar, AlKhzami AlHarami, Yin Yang and Amine Bermak
Sensors 2023, 23(13), 6195; https://doi.org/10.3390/s23136195 - 6 Jul 2023
Cited by 1 | Viewed by 1432
Abstract
While deep learning algorithms have advanced to a great extent, they are all designed for frame-based imagers that capture images at a high frame rate, which leads to a high storage requirement, heavy computations, and very high power consumption. Unlike frame-based imagers, event-based imagers output asynchronous pixel events without the need for a global exposure time, thereby lowering both power consumption and latency. In this paper, we propose an innovative image recognition technique that operates on image events rather than frame-based data, paving the way for a new paradigm of recognizing objects prior to image acquisition. To the best of our knowledge, this is the first time such a concept has been introduced, featuring not only extreme early image recognition but also reduced computational overhead, storage requirements, and power consumption. Our own event-based dataset, collected using a CeleX imager, and five public event-based datasets are used to prove this concept, and the testing metrics reflect how early the neural network (NN) detects an image before the full-frame image is captured. It is demonstrated that, on average across all the datasets, the proposed technique recognizes an image 38.7 ms before the first perfect event and 603.4 ms before the last event is received, a reduction of 34% and 69% of the time needed, respectively. Furthermore, less processing is required, as the image is recognized 9460 events earlier, which is 37% fewer than waiting for the first perfectly recognized image. An enhanced NN method is also introduced to reduce this time further. Full article
(This article belongs to the Special Issue Image Processing and Analysis for Object Detection)
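As a rough illustration of the frame-free paradigm this abstract describes, the sketch below accumulates asynchronous (x, y, polarity) events into a count surface and queries a classifier as soon as a fixed event budget is reached, instead of waiting for a full frame. The event format, the threshold, and the placeholder classifier are illustrative assumptions, not the paper's NN.

```python
import numpy as np

def accumulate_events(events, shape):
    """Integrate asynchronous (x, y, polarity) events into a 2D count surface."""
    surface = np.zeros(shape, dtype=np.int32)
    for x, y, p in events:
        surface[y, x] += 1 if p > 0 else -1
    return surface

def recognize_early(events, shape, classify, min_events=100):
    """Run the classifier as soon as `min_events` events have arrived,
    rather than waiting for the frame to complete (the early-recognition idea)."""
    surface = np.zeros(shape, dtype=np.int32)
    for i, (x, y, p) in enumerate(events, start=1):
        surface[y, x] += 1 if p > 0 else -1
        if i >= min_events:
            return classify(surface), i  # label and number of events consumed
    return classify(surface), len(events)
```

Because recognition fires after a fixed event budget, latency scales with scene activity rather than with a global exposure time.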

22 pages, 24238 KiB  
Article
Object Detection of Flexible Objects with Arbitrary Orientation Based on Rotation-Adaptive YOLOv5
by Jiajun Wu, Lumei Su, Zhiwei Lin, Yuhan Chen, Jiaming Ji and Tianyou Li
Sensors 2023, 23(10), 4925; https://doi.org/10.3390/s23104925 - 20 May 2023
Cited by 1 | Viewed by 1991
Abstract
It is challenging to accurately detect flexible objects with arbitrary orientation in monitoring images from power grid maintenance and inspection sites. These images exhibit a significant imbalance between foreground and background, which can lead to low detection accuracy when a horizontal bounding box (HBB) is used as the detector in general object detection algorithms. Existing multi-oriented detection algorithms that use irregular polygons as the detector can improve accuracy to some extent, but their accuracy is limited by boundary problems during training. This paper proposes a rotation-adaptive YOLOv5 (R_YOLOv5) with a rotated bounding box (RBB) to detect flexible objects with arbitrary orientation, effectively addressing the above issues and achieving high accuracy. First, a long-side representation method adds a degree of freedom (DOF) to the bounding box, enabling accurate detection of flexible objects with large spans, deformable shapes, and small foreground-to-background ratios. The boundary problem induced by this bounding box strategy is then overcome using classification discretization and symmetric function mapping methods. Finally, the loss function is optimized to ensure training convergence for the new bounding box. To meet various practical requirements, we propose four models of different scales based on YOLOv5: R_YOLOv5s, R_YOLOv5m, R_YOLOv5l, and R_YOLOv5x. Experimental results demonstrate that these four models achieve mean average precision (mAP) values of 0.712, 0.731, 0.736, and 0.745 on the DOTA-v1.5 dataset and 0.579, 0.629, 0.689, and 0.713 on our self-built FO dataset, exhibiting higher recognition accuracy and stronger generalization ability. Among them, R_YOLOv5x achieves an mAP about 6.84% higher than ReDet on DOTA-v1.5 and at least 2% higher than the original YOLOv5 on the FO dataset. Full article
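The long-side representation and classification discretization mentioned in this abstract can be sketched roughly as follows. Normalizing so the long side defines the angle keeps the angle in a single [0, 180) range, and binning that angle turns the boundary-discontinuous regression into a classification. The 180-bin setting is an assumed configuration, not necessarily the paper's.

```python
def to_long_side(cx, cy, w, h, theta_deg):
    """Normalize a rotated box (cx, cy, w, h, angle) so that w is the
    long side and the angle lies in [0, 180)."""
    if h > w:
        w, h = h, w
        theta_deg += 90.0  # swapping sides rotates the reference edge by 90 degrees
    theta_deg %= 180.0
    return cx, cy, w, h, theta_deg

def discretize_angle(theta_deg, num_bins=180):
    """Map the continuous angle to a class index; classifying bins
    avoids the regression discontinuity at the angular boundary."""
    return int(theta_deg * num_bins / 180.0) % num_bins
```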

26 pages, 12885 KiB  
Article
A Systematic Solution for Moving-Target Detection and Tracking While Only Using a Monocular Camera
by Shun Wang, Sheng Xu, Zhihao Ma, Dashuai Wang and Weimin Li
Sensors 2023, 23(10), 4862; https://doi.org/10.3390/s23104862 - 18 May 2023
Cited by 1 | Viewed by 2050
Abstract
This paper focuses on moving-target detection and tracking in three-dimensional (3D) space and proposes a visual target tracking system that uses only a two-dimensional (2D) camera. To quickly detect moving targets, an improved optical flow method with detailed modifications to the pyramid, warping, and cost volume network (PWC-Net) is applied. Meanwhile, a clustering algorithm is used to accurately extract the moving target from a noisy background. The target position is then estimated using a proposed geometrical pinhole imaging algorithm and a cubature Kalman filter (CKF). Specifically, the camera's installation position and intrinsic parameters are used to calculate the azimuth angle, elevation angle, and depth of the target from 2D measurements alone. The proposed geometrical solution has a simple structure and fast computational speed. Various simulations and experiments verify the effectiveness of the proposed method. Full article
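The pinhole-geometry step described in this abstract, recovering azimuth and elevation angles from a single 2D pixel measurement, can be sketched with standard camera intrinsics. The sign conventions below (azimuth positive to the right, elevation positive upward) are illustrative assumptions, not the paper's exact formulation.

```python
import math

def pixel_to_angles(u, v, fx, fy, cx, cy):
    """Convert a pixel (u, v) to azimuth/elevation angles in radians
    under the pinhole model with intrinsics (fx, fy) and principal
    point (cx, cy)."""
    azimuth = math.atan2(u - cx, fx)    # positive to the right of the optical axis
    elevation = math.atan2(cy - v, fy)  # positive above it (image y grows downward)
    return azimuth, elevation
```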

18 pages, 16184 KiB  
Article
Waste Detection System Based on Data Augmentation and YOLO_EC
by Jinhao Fan, Lizhi Cui and Shumin Fei
Sensors 2023, 23(7), 3646; https://doi.org/10.3390/s23073646 - 31 Mar 2023
Cited by 6 | Viewed by 2093
Abstract
The problem of waste classification has been a major concern for both government and society, and whether waste can be effectively classified will affect the sustainable development of human society. To perform fast and efficient detection of waste targets in the sorting process, this paper proposes a waste detection system combining data augmentation with YOLO_EC. First, because of the current shortage of multi-objective waste classification datasets, the heavy workload of manual data collection, and the limited improvement of data features offered by traditional data augmentation methods, DCGAN (deep convolutional generative adversarial network) was optimized by improving the loss function, and an image-generation model was established to generate multi-objective waste images. Second, with YOLOv4 (You Only Look Once version 4) as the base model, EfficientNet is used as the backbone feature extraction network to make the algorithm lightweight; at the same time, the coordinate attention (CA) mechanism is introduced to reconstruct the MBConv module, filtering out high-quality information and enhancing the feature extraction ability of the model. Experimental results show that, on the HPU_WASTE dataset, the proposed model outperforms other models in both data augmentation and waste detection. Full article

17 pages, 21333 KiB  
Article
ARTD-Net: Anchor-Free Based Recyclable Trash Detection Net Using Edgeless Module
by BoSeon Kang and Chang-Sung Jeong
Sensors 2023, 23(6), 2907; https://doi.org/10.3390/s23062907 - 7 Mar 2023
Cited by 2 | Viewed by 1391
Abstract
Due to the sharp increase in household waste, separate collection is essential to reduce its huge volume, since trash is difficult to recycle without it. However, since manual separation is costly and time-consuming, it is crucial to develop an automatic system for separate collection using deep learning and computer vision. In this paper, we propose two anchor-free recyclable trash detection networks (ARTD-Net), which can efficiently recognize multiple overlapping wastes of different types by using edgeless modules: ARTD-Net1 and ARTD-Net2. The former is an anchor-free one-stage deep learning model consisting of three modules: centralized feature extraction, multiscale feature extraction, and prediction. The centralized feature extraction module in the backbone architecture focuses on extracting features around the center of the input image to improve detection accuracy. The multiscale feature extraction module provides feature maps of different scales through bottom-up and top-down pathways. The prediction module improves the classification accuracy of multiple objects based on edge weight adjustments for each instance. The latter is an anchor-free multi-stage deep learning model which efficiently finds each waste region by additionally exploiting a region proposal network and RoIAlign; it sequentially performs classification and regression to improve accuracy. Therefore, ARTD-Net2 is more accurate than ARTD-Net1, while ARTD-Net1 is faster than ARTD-Net2. We show that our proposed ARTD-Net1 and ARTD-Net2 achieve competitive performance in mean average precision and F1 score compared with other deep learning models. Existing datasets do not cover important classes of waste commonly produced in the real world, nor do they consider complex arrangements of multiple wastes of different types; moreover, most contain an insufficient number of images at low resolution. We therefore present a new recyclables dataset composed of a large number of high-resolution waste images with additional essential classes, and show that waste detection performance is improved by providing varied images with complex arrangements of overlapping wastes of different types. Full article

15 pages, 15849 KiB  
Article
Underwater Object Detection Using TC-YOLO with Attention Mechanisms
by Kun Liu, Lei Peng and Shanran Tang
Sensors 2023, 23(5), 2567; https://doi.org/10.3390/s23052567 - 25 Feb 2023
Cited by 18 | Viewed by 5138
Abstract
Underwater object detection is a key technology in the development of intelligent underwater vehicles. Object detection faces unique challenges in underwater applications: blurry underwater images; small and dense targets; and limited computational capacity available on the deployed platforms. To improve the performance of underwater object detection, we proposed a new object detection approach that combines a new detection neural network called TC-YOLO, an image enhancement technique using an adaptive histogram equalization algorithm, and the optimal transport scheme for label assignment. The proposed TC-YOLO network was developed based on YOLOv5s. Transformer self-attention and coordinate attention were adopted in the backbone and neck of the new network, respectively, to enhance feature extraction for underwater objects. The application of optimal transport label assignment enables a significant reduction in the number of fuzzy boxes and improves the utilization of training data. Our tests using the RUIE2020 dataset and ablation experiments demonstrate that the proposed approach performs better than the original YOLOv5s and other similar networks for underwater object detection tasks; moreover, the size and computational cost of the proposed model remain small for underwater mobile applications. Full article
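The enhancement step in this abstract is an adaptive histogram equalization algorithm. As a simplified stand-in (the adaptive variant applies the same idea per local tile), the sketch below implements global histogram equalization for an 8-bit image: intensities are remapped through the cumulative histogram so the output distribution is approximately uniform.

```python
import numpy as np

def equalize_histogram(img):
    """Global histogram equalization for an 8-bit grayscale image.
    The paper uses an adaptive variant; this global version shows the
    core intensity-remapping idea."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Build a lookup table so the cumulative distribution becomes ~linear.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```

Note the sketch assumes the image is not constant (otherwise the normalization denominator is zero); a production version would guard that case.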

24 pages, 6833 KiB  
Article
Detection of Occluded Small Commodities Based on Feature Enhancement under Super-Resolution
by Haonan Dong, Kai Xie, An Xie, Chang Wen, Jianbiao He, Wei Zhang, Dajiang Yi and Sheng Yang
Sensors 2023, 23(5), 2439; https://doi.org/10.3390/s23052439 - 22 Feb 2023
Cited by 2 | Viewed by 1254
Abstract
As small commodity features are often few in number and easily occluded by hands, the overall detection accuracy is low, and small commodity detection is still a great challenge. Therefore, in this study, a new algorithm for occlusion detection is proposed. Firstly, a super-resolution algorithm with an outline feature extraction module is used to process the input video frames to restore high-frequency details, such as the contours and textures of the commodities. Next, residual dense networks are used for feature extraction, and the network is guided to extract commodity feature information under the effects of an attention mechanism. As small commodity features are easily ignored by the network, a new local adaptive feature enhancement module is designed to enhance the regional commodity features in the shallow feature map to enhance the expression of the small commodity feature information. Finally, a small commodity detection box is generated through the regional regression network to complete the small commodity detection task. Compared to RetinaNet, the F1-score improved by 2.6%, and the mean average precision improved by 2.45%. The experimental results reveal that the proposed method can effectively enhance the expressions of the salient features of small commodities and further improve the detection accuracy for small commodities. Full article

18 pages, 37330 KiB  
Article
Lightweight Helmet Detection Algorithm Using an Improved YOLOv4
by Junhua Chen, Sihao Deng, Ping Wang, Xueda Huang and Yanfei Liu
Sensors 2023, 23(3), 1256; https://doi.org/10.3390/s23031256 - 21 Jan 2023
Cited by 18 | Viewed by 3051
Abstract
Safety helmet wearing plays a major role in protecting the safety of workers in industry and construction, so real-time helmet wearing detection technology is very necessary. This paper proposes an improved YOLOv4 algorithm to achieve real-time and efficient safety helmet wearing detection. The improved algorithm adopts the lightweight network PP-LCNet as the backbone and uses depthwise separable convolution to decrease the model parameters. In addition, a coordinate attention module is embedded in the three output feature layers of the backbone network to enhance the feature information, and an improved feature fusion structure is designed to fuse the target information. For the loss function, we use the SIoU loss, which incorporates directional information, to increase detection precision. The experimental findings demonstrate that the improved YOLOv4 algorithm achieves an accuracy of 92.98%, a model size of 41.88 M, and a detection speed of 43.23 pictures/s. Compared with the original YOLOv4, the accuracy increases by 0.52%, the model size decreases by about 83%, and the detection speed increases by 88%. Compared with other existing methods, it performs better in terms of precision and speed. Full article
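For context on the SIoU loss mentioned in this abstract: SIoU extends plain intersection-over-union with angle, distance, and shape cost terms. The minimal baseline it builds on, axis-aligned IoU, looks like this; the SIoU extensions themselves are not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)  # zero if boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

An IoU-based loss is then typically `1 - iou(pred, target)` plus whatever penalty terms the variant (SIoU here) adds.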

33 pages, 33738 KiB  
Article
Real-Time Human Motion Tracking by Tello EDU Drone
by Anuparp Boonsongsrikul and Jirapon Eamsaard
Sensors 2023, 23(2), 897; https://doi.org/10.3390/s23020897 - 12 Jan 2023
Cited by 6 | Viewed by 6312
Abstract
Human movement tracking is useful in a variety of areas, such as search-and-rescue activities. CCTV and IP cameras are popular as front-end sensors for tracking human motion; however, they are stationary and have limited applicability in hard-to-reach places, such as those where disasters have occurred. Using a drone to discover a person is challenging and requires an innovative approach. In this paper, we present the design and implementation of a human motion tracking method using a Tello EDU drone. The design methodology is carried out in four steps: (1) control panel design; (2) human motion tracking algorithm; (3) notification systems; and (4) communication and distance extension. Intensive experimental results show that the drone implementing the proposed algorithm performs well in tracking a human at a distance of 2–10 m moving at a speed of 2 m/s. In an experimental field of size 95 × 35 m², the drone tracked human motion throughout a whole day, with the best tracking results observed in the morning. The drone was controlled from a laptop using a Wi-Fi router, with a maximum horizontal tracking distance of 84.30 m and a maximum vertical distance of 13.40 m. The experiment showed an accuracy rate for human movement detection of between 96.67% and 100%. Full article
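A common way to turn a detected person into drone commands, and a plausible core of a tracking loop like the one this abstract describes, is a proportional controller: yaw toward the bounding-box center and move forward or back to hold its apparent size. The function below is a hypothetical illustration; the gains, the target-area setpoint, and the command interface are assumptions, not the paper's implementation.

```python
def track_command(bbox_center_x, bbox_area, frame_width,
                  target_area=0.15, k_yaw=0.5, k_fwd=2.0):
    """Map a person's bounding box (center x in pixels, area as a
    fraction of the frame) to yaw-rate and forward-speed commands.
    Gains and target size are illustrative, not from the paper."""
    # Horizontal offset in [-1, 1]; negative means the person is left of center.
    offset = (bbox_center_x - frame_width / 2) / (frame_width / 2)
    yaw_rate = k_yaw * offset
    # Positive speed approaches a person who appears too small (too far away).
    forward_speed = k_fwd * (target_area - bbox_area)
    return yaw_rate, forward_speed
```

In practice the outputs would be clamped to the drone SDK's command range and smoothed across frames.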

21 pages, 12314 KiB  
Article
High Quality Coal Foreign Object Image Generation Method Based on StyleGAN-DSAD
by Xiangang Cao, Hengyang Wei, Peng Wang, Chiyu Zhang, Shikai Huang and Hu Li
Sensors 2023, 23(1), 374; https://doi.org/10.3390/s23010374 - 29 Dec 2022
Cited by 3 | Viewed by 2485
Abstract
Research on coal foreign object detection based on deep learning is of great significance for the safe, efficient, and green production of coal mines. However, foreign object image datasets are scarce due to collection conditions, which poses an enormous challenge for coal foreign object detection. To augment foreign object datasets, a high-quality coal foreign object image generation method based on an improved StyleGAN is proposed. First, a dual self-attention module is introduced into the generator to capture long-range feature dependencies across the spatial and channel dimensions, refine the details of the generated images, accurately separate foreground from background, and improve image quality. Second, depthwise separable convolution is introduced into the discriminator to address the low efficiency caused by the large number of parameters in multi-stage convolutional networks, making the model lightweight and accelerating training. Experimental results show that the improved model has significant advantages over several classical GANs and the original StyleGAN in terms of the quality and diversity of the generated images, with an average improvement of 2.52 in IS and a decrease of 5.80 in FID for each category. As for model complexity, the parameters and training time of the improved model are reduced to 44.6% and 58.8% of those of the original model without affecting the quality of the generated images. Finally, applying different data augmentation methods to the foreign object detection task shows that our image generation method is more effective than traditional methods and, under optimal conditions, improves APbox by 5.8% and APmask by 4.5%. Full article

15 pages, 5320 KiB  
Article
A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information
by Zhenchao Wang, Wushour Silamu, Yuze Li and Miaomiao Xu
Sensors 2022, 22(24), 9982; https://doi.org/10.3390/s22249982 - 18 Dec 2022
Cited by 2 | Viewed by 1601
Abstract
There is a growing interest in scene text detection for arbitrary shapes. The capability of text detection has evolved from horizontal text to text in multiple directions and, now, arbitrary shapes. However, scene text detection remains challenging due to significant differences in size and aspect ratio, diversity in shape and orientation, coarse annotations, and other factors. Regression-based methods, inspired by object detection, are inherently limited in fitting the edges of arbitrarily shaped text. Segmentation-based methods, on the other hand, predict at the pixel level and can therefore fit arbitrarily shaped text better. However, inaccurate text annotations and the distribution characteristics of text pixels, which include many background and misclassified pixels, degrade the performance of segmentation-based text detection methods to some extent. Whether a pixel belongs to a text region typically depends on the strength of its semantic information and its position within the text area. Based on these two points, we propose an innovative and robust method for scene text detection combining position and semantic information. First, we add position information to the images using a position encoding module (PosEM) to help the model learn the implicit feature relationships associated with position. Second, we use a semantic enhancement module (SEM) to enhance the model's focus on semantic information during feature extraction. Then, to minimize the effect of noise due to inaccurate annotations and the distribution characteristics of text pixels, we convert the detection results into a probability map that more reasonably represents the text distribution. Finally, we reconstruct and filter the text instances using a post-processing algorithm to reduce false positives. The experimental results show that our model improves significantly on the Total-Text, MSRA-TD500, and CTW1500 datasets, outperforming most previous advanced algorithms. Full article
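The PosEM in this abstract injects explicit position information into the features. As a rough stand-in for whatever encoding the paper learns, a fixed 2D sinusoidal position encoding (half the channels encode the row index, half the column index) conveys the idea; the channel split and frequency schedule below are assumptions.

```python
import numpy as np

def positional_encoding_2d(h, w, dim):
    """Fixed sinusoidal 2D position encoding of shape (h, w, dim):
    channels [:dim//2] encode the row, channels [dim//2:] the column."""
    assert dim % 4 == 0
    d = dim // 2

    def encode(positions):
        # Geometric frequency schedule, as in 1D sinusoidal encodings.
        freqs = 1.0 / (10000 ** (np.arange(0, d, 2) / d))
        angles = positions[:, None] * freqs[None, :]
        return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # (n, d)

    rows = encode(np.arange(h))  # (h, d)
    cols = encode(np.arange(w))  # (w, d)
    pe = np.zeros((h, w, dim))
    pe[:, :, :d] = rows[:, None, :]  # row channels constant along columns
    pe[:, :, d:] = cols[None, :, :]  # column channels constant along rows
    return pe
```

The encoding would typically be added to (or concatenated with) the feature maps before further processing.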

14 pages, 1706 KiB  
Article
Attention-Based Scene Text Detection on Dual Feature Fusion
by Yuze Li, Wushour Silamu, Zhenchao Wang and Miaomiao Xu
Sensors 2022, 22(23), 9072; https://doi.org/10.3390/s22239072 - 23 Nov 2022
Cited by 2 | Viewed by 1582
Abstract
The segmentation-based scene text detection algorithm has advantages in scenarios with arbitrarily shaped text and extreme aspect ratios, owing to its pixel-level prediction and fine post-processing. However, the insufficient use of semantic and spatial information in the network limits its classification and positioning capabilities. Existing scene text detection methods lose important feature information while extracting features from each network layer. To solve this problem, the Attention-based Dual Feature Fusion Model (ADFM) is proposed. The Bi-directional Feature Fusion Pyramid Module (BFM) first adds stronger semantic information to the higher-resolution feature maps through a top-down process and then reduces the aliasing effects generated by that process through a bottom-up process, enhancing the representation of multi-scale text semantic information. Meanwhile, a position-sensitive Spatial Attention Module (SAM) is introduced in the intermediate stage of the two-stage feature fusion. It focuses on the feature map with the highest resolution and strongest semantics generated in the top-down process and weights spatial positions according to the relevance of text features, thus improving the sensitivity of the text detection network to text regions. The effectiveness of each module of ADFM was verified by ablation experiments, and the model was compared with recent scene text detection methods on several publicly available datasets. Full article
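The BFM's two-pass fusion described in this abstract can be caricatured with plain addition standing in for the learned fusion convolutions: a top-down pass spreads coarse semantics into the finer maps, then a bottom-up pass re-aggregates fine detail. The high-to-low ordering, fixed 2x scale steps, and addition-based fusion are simplifying assumptions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a 2D map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    """2x downsampling by striding."""
    return x[::2, ::2]

def bidirectional_fuse(features):
    """Two-pass pyramid fusion over maps ordered high-res -> low-res,
    each level 2x smaller than the previous one."""
    feats = list(features)
    # Top-down: propagate coarse (semantic) maps into finer ones.
    for i in range(len(feats) - 2, -1, -1):
        feats[i] = feats[i] + upsample2x(feats[i + 1])
    # Bottom-up: re-aggregate fine detail into coarser maps.
    for i in range(1, len(feats)):
        feats[i] = feats[i] + downsample2x(feats[i - 1])
    return feats
```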

12 pages, 1906 KiB  
Article
A Study on the Effectiveness of Spatial Filters on Thermal Image Pre-Processing and Correlation Technique for Quantifying Defect Size
by Ho Jong Kim, Anuja Shrestha, Eliza Sapkota, Anwit Pokharel, Sarvesh Pandey, Cheol Sang Kim and Ranjit Shrestha
Sensors 2022, 22(22), 8965; https://doi.org/10.3390/s22228965 - 19 Nov 2022
Cited by 5 | Viewed by 2320
Abstract
Thermal imaging plays a vital role in structural health monitoring and provides insight into defects caused by aging, deterioration, and faults during construction. This study investigated the effectiveness of spatial filters for pre-processing thermal images and of a correlation technique in post-processing, and applied them to the non-destructive testing and evaluation of defects in steel structures. Two linear filters (Gaussian and window averaging) and a non-linear filter (median) were applied to a pulsed thermography image sequence during pre-processing, and their effectiveness was assessed using the signal-to-noise ratio (SNR) as a quality metric. The pre-processing results revealed that each filter can reduce impulse noise and produce high-quality images; in terms of SNR, the Gaussian filter outperformed both the window-averaging and median filters. Defect size was then determined by applying a correlation technique to the Gaussian-filtered pulsed thermography sequence. It is concluded that the correlation technique can provide fast measurement of defect size, although its accuracy may depend on the detection limit of thermography and the defect size-to-depth ratio.
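The three spatial filters and the SNR metric can be sketched generically: a 3×3 mean, median, or Gaussian window is slid over the image, and the SNR compares a defect region against background noise. A minimal numpy illustration, not the paper's implementation; the kernel size and the exact SNR definition are assumptions.

```python
import numpy as np

def filter3x3(img, mode="mean"):
    """Apply a 3x3 spatial filter (mean, median, or gaussian) to a 2-D image."""
    gk = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], float) / 16.0  # Gaussian kernel
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # stack the nine shifted views of the padded image, one per kernel tap
    win = np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    if mode == "mean":
        return win.mean(axis=0)
    if mode == "median":
        return np.median(win, axis=0)
    return (win * gk.ravel()[:, None, None]).sum(axis=0)  # gaussian

def snr_db(signal_region, noise_region):
    """SNR in dB: defect-to-background contrast over background noise."""
    contrast = abs(signal_region.mean() - noise_region.mean())
    return 20.0 * np.log10(contrast / noise_region.std())
```

The median filter removes an isolated impulse entirely, while the mean and Gaussian filters only attenuate it, which is why the filters behave differently on impulse noise.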

15 pages, 6050 KiB  
Article
Adaptive CFAR Method for SAR Ship Detection Using Intensity and Texture Feature Fusion Attention Contrast Mechanism
by Nana Li, Xueli Pan, Lixia Yang, Zhixiang Huang, Zhenhua Wu and Guoqing Zheng
Sensors 2022, 22(21), 8116; https://doi.org/10.3390/s22218116 - 23 Oct 2022
Cited by 7 | Viewed by 2109
Abstract
Due to the complexity of sea-surface environments, such as speckle, ship side lobes, and ship wakes, detecting ship targets in synthetic aperture radar (SAR) images remains challenging, especially for small ships. Addressing this problem, the article proposes a constant false alarm rate (CFAR) algorithm for SAR ship detection based on an attention contrast mechanism that fuses intensity and texture features. First, local feature attention contrast enhancement is performed based on the intensity dissimilarity and the texture difference, described by the local binary pattern (LBP), between ship targets and sea clutter, enhancing targets while suppressing the background. Then, adaptive CFAR detection is carried out using the generalized Gamma distribution (GΓD), which goodness-of-fit analyses show fits the clutter well. Finally, the public datasets HRSID and LS-SSDD-v1.0 are used to verify the effectiveness of the proposed method. Extensive experimental results show that the proposed method suppresses clutter background and speckle noise, significantly improves the target-to-clutter ratio (TCR), and achieves a relatively high detection rate with a low false alarm rate in complex-background and multi-target marine environments.
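The paper fits a generalized Gamma clutter model; the basic CFAR mechanism it builds on can be sketched with a simple cell-averaging CFAR on a 1-D intensity profile. The threshold scaling below assumes exponentially distributed clutter, and all parameter values are illustrative, not the paper's.

```python
import numpy as np

def ca_cfar(x, guard=2, train=8, pfa=1e-3):
    """Cell-averaging CFAR: flag a cell if it exceeds a threshold scaled
    from the mean of surrounding training cells (guard cells excluded)."""
    n = 2 * train                              # total training cells
    alpha = n * (pfa ** (-1.0 / n) - 1.0)      # scaling for exponential clutter
    det = np.zeros_like(x, dtype=bool)
    for i in range(guard + train, len(x) - guard - train):
        lead = x[i - guard - train : i - guard]
        lag = x[i + guard + 1 : i + guard + train + 1]
        noise = np.concatenate([lead, lag]).mean()
        det[i] = x[i] > alpha * noise
    return det
```

Because the threshold adapts to the local clutter estimate, a bright target stands out even when the absolute background level varies along the profile; the GΓD-based method in the paper replaces this exponential assumption with a better-fitting clutter model.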

20 pages, 3347 KiB  
Article
A Residual-Inception U-Net (RIU-Net) Approach and Comparisons with U-Shaped CNN and Transformer Models for Building Segmentation from High-Resolution Satellite Images
by Batuhan Sariturk and Dursun Zafer Seker
Sensors 2022, 22(19), 7624; https://doi.org/10.3390/s22197624 - 8 Oct 2022
Cited by 12 | Viewed by 2404
Abstract
Building segmentation is crucial for applications ranging from map production to urban planning, yet it remains a challenge due to CNNs' limited ability to model global context and Transformers' high memory requirements. In this study, 10 CNN and Transformer models were generated and compared. Alongside the proposed Residual-Inception U-Net (RIU-Net), U-Net, Residual U-Net, and Attention Residual U-Net, four CNN architectures (Inception, Inception-ResNet, Xception, and MobileNet) were implemented as encoders in U-Net-based models, and two Transformer-based approaches (Trans U-Net and Swin U-Net) were also used. The Massachusetts Buildings Dataset and the Inria Aerial Image Labeling Dataset were used for training and evaluation. On the Inria dataset, RIU-Net achieved the highest IoU score, F1 score, and test accuracy, with 0.6736, 0.7868, and 92.23%, respectively. On the Massachusetts Small dataset, Attention Residual U-Net achieved the highest IoU and F1 scores, with 0.6218 and 0.7606, and Trans U-Net reached the highest test accuracy, 94.26%. On the Massachusetts Large dataset, Residual U-Net achieved the highest IoU and F1 scores, with 0.6165 and 0.7565, and Attention Residual U-Net attained the highest test accuracy, 93.81%. The results show that RIU-Net was particularly successful on the Inria dataset, while Residual U-Net, Attention Residual U-Net, and Trans U-Net performed well on the Massachusetts datasets.
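The reported IoU and F1 scores are the standard overlap metrics on binary masks; for reference, a minimal sketch of how they are computed:

```python
import numpy as np

def iou_f1(pred, gt):
    """IoU (Jaccard) and F1 (Dice) for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 1.0
    denom = pred.sum() + gt.sum()
    f1 = 2.0 * inter / denom if denom else 1.0
    return iou, f1
```

Note that F1 (Dice) is always at least as large as IoU for the same masks, which matches the score pairs quoted above (e.g., 0.6736 IoU vs. 0.7868 F1).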

17 pages, 388 KiB  
Article
Deep Metric Learning Using Negative Sampling Probability Annealing
by Gábor Kertész
Sensors 2022, 22(19), 7579; https://doi.org/10.3390/s22197579 - 6 Oct 2022
Cited by 1 | Viewed by 1340
Abstract
Multiple studies have concluded that the selection of input samples is key for deep metric learning. For triplet networks, the selection of the anchor, positive, and negative samples is referred to as triplet mining. Selecting the negative is considered to be the most complicated task, due to the large number of possibilities. The goal is to select a negative that results in a positive triplet loss; besides random selection, semi-hard negative mining and hardest-negative mining are well-known approaches. Since its introduction, semi-hard mining has been shown to outperform other negative mining techniques; in recent years, however, selecting the so-called hardest negative has also shown promising results in various experiments. This paper introduces a novel negative sampling solution based on dynamic policy switching, referred to as negative sampling probability annealing, which aims to combine the strengths of all these approaches. Results are validated on a synthetic experimental dataset using cluster-analysis methods, and the discriminative abilities of the trained models are measured on real-life data.
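The underlying triplet loss and the idea of annealing between sampling policies can be sketched as follows. The quadratic schedule below is purely illustrative, not the paper's annealing curve; only the policy names (random, semi-hard, hardest) come from the text.

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """Triplet loss: pull anchor-positive together, push anchor-negative apart."""
    return max(0.0, np.linalg.norm(a - p) - np.linalg.norm(a - n) + margin)

def sampling_probs(epoch, total_epochs):
    """Illustrative annealed mix of negative-mining policies: start mostly
    random, end mostly hardest, with semi-hard dominating in between."""
    t = epoch / max(1, total_epochs - 1)
    p_random, p_hardest = (1 - t) ** 2, t ** 2
    return {"random": p_random,
            "semi-hard": 1.0 - p_random - p_hardest,
            "hardest": p_hardest}
```

A negative far from the anchor produces zero loss (an "easy" triplet that contributes no gradient), which is why the choice of negatives, and how it shifts over training, matters.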

20 pages, 8436 KiB  
Article
A Domestic Trash Detection Model Based on Improved YOLOX
by Changhong Liu, Ning Xie, Xingxin Yang, Rongdong Chen, Xiangyang Chang, Ray Y. Zhong, Shaohu Peng and Xiaochu Liu
Sensors 2022, 22(18), 6974; https://doi.org/10.3390/s22186974 - 15 Sep 2022
Cited by 15 | Viewed by 3305
Abstract
Domestic trash detection is an essential technology for achieving a smart city. Due to the complexity and variability of urban trash scenarios, existing trash detection algorithms suffer from low detection rates and high false positives, as well as the slow speed that is a general problem in industrial applications. This paper proposes an i-YOLOX model for domestic trash detection based on deep learning. First, a large number of real-life trash images are collected into a new trash image dataset. Second, the lightweight involution operator is incorporated into the feature extraction structure, allowing the feature extraction layers to establish long-distance feature relationships and adaptively extract channel features. In addition, the model's ability to distinguish similar trash features is strengthened by adding the convolutional block attention module (CBAM) to the enhanced feature extraction network. Finally, an involution residual head structure in the detection head reduces gradient vanishing and accelerates the convergence of the loss, allowing the model to better classify and regress the acquired feature layers. YOLOX-S is chosen as the baseline for each enhancement experiment. The experimental results show that, compared with the baseline, i-YOLOX improves mean average precision (mAP) by 1.47%, reduces the number of parameters by 23.3%, and improves FPS by 40.4%. In practical applications, the improved model accurately recognizes trash in natural scenes, which further validates the generalization performance of i-YOLOX and provides a reference for future domestic trash detection research.

16 pages, 5078 KiB  
Article
Multi-Scale Safety Helmet Detection Based on RSSE-YOLOv3
by Hongru Song
Sensors 2022, 22(16), 6061; https://doi.org/10.3390/s22166061 - 13 Aug 2022
Cited by 6 | Viewed by 1543
Abstract
Due to the occlusion of dense and small targets, real-time detection of whether construction workers are wearing safety helmets suffers from low accuracy and missed detections. In this paper, a new detection algorithm based on YOLOv3 is proposed. First, the parallel RepVGG Skip Squeeze-Excitation (RSSE) module replaces the Res8 module in the original YOLOv3 network; the RSSE module fuses 3 × 3 convolutional channels with SSE branches. Introducing the RSSE module increases network width, reduces network depth, and improves detection speed and accuracy. Second, to avoid gradient vanishing and improve feature reuse, the residual module Res2 replaces the CBL×5 modules. Finally, the resolution of the input image is increased, and four-scale feature prediction is used instead of three-scale prediction to further improve the detection of small targets. The Complete IoU (CIoU) loss is also introduced to improve the loss function and localization accuracy. The experimental results show that, compared with the original YOLOv3 algorithm, the improved algorithm raises precision (P) by 3.9%, recall (R) by 5.2%, and mean average precision (mAP) by 4.7%, significantly improving detection performance.
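The CIoU loss augments the IoU term with a normalized center-distance penalty and an aspect-ratio consistency term. A minimal implementation of the standard formulation for corner-format boxes, shown for reference rather than as the paper's exact code:

```python
import math

def ciou_loss(box1, box2):
    """Complete IoU (CIoU) loss for two boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2
    # intersection and union
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / union
    # squared center distance over squared enclosing-box diagonal
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4.0
    c2 = (max(x2, X2) - min(x1, X1)) ** 2 + (max(y2, Y2) - min(y1, Y1)) ** 2
    # aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan((X2 - X1) / (Y2 - Y1))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / (1.0 - iou + v) if v else 0.0
    return 1.0 - iou + rho2 / c2 + alpha * v
```

Unlike plain IoU loss, CIoU still gives a useful gradient when boxes do not overlap, since the center-distance term keeps pulling the prediction toward the target.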

17 pages, 1123 KiB  
Article
Design of Convolutional Neural Network Processor Based on FPGA Resource Multiplexing Architecture
by Fei Yan, Zhuangzhuang Zhang, Yinping Liu and Jia Liu
Sensors 2022, 22(16), 5967; https://doi.org/10.3390/s22165967 - 10 Aug 2022
Cited by 6 | Viewed by 2639
Abstract
As CNNs are widely used in fields such as image classification and target detection, the number of parameters and amount of computation in these models is gradually increasing. In addition, the hardware resource and power consumption requirements for deploying CNNs are becoming ever higher, restricting CNN models to certain specific platforms and limiting miniaturization and practicality. This paper therefore proposes a convolutional neural network processor design with an FPGA-based resource-multiplexing architecture, aiming to reduce the hardware resource and power consumption of CNNs. First, a handwritten-digit-recognition CNN is designed as an example of the resource-multiplexing architecture; trained and tested on the MNIST dataset, it reaches a prediction accuracy of 97.3%. Then, the CNN is deployed on an FPGA using the hardware description language Verilog, and the design is optimized through resource multiplexing and parallel processing. With prediction accuracy preserved, the total system power consumption is 1.03 W, the CNN module consumes 0.03 W, and predicting one image takes about 68,139 clock cycles, i.e., 340.7 µs at a 200 MHz clock. Compared with related work reported in recent years, the proposed design shows clear advantages in resource usage and power consumption.
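The quoted per-image latency follows directly from the cycle count and clock frequency:

```python
def latency_us(cycles, clock_hz):
    """Latency in microseconds for a given cycle count and clock frequency."""
    return cycles / clock_hz * 1e6

# 68,139 cycles at 200 MHz -> ~340.7 us, matching the reported figure
frame_latency = latency_us(68_139, 200e6)
```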

19 pages, 26848 KiB  
Article
LEOD-Net: Learning Line-Encoded Bounding Boxes for Real-Time Object Detection
by Hatem Ibrahem, Ahmed Salem and Hyun-Soo Kang
Sensors 2022, 22(10), 3699; https://doi.org/10.3390/s22103699 - 12 May 2022
Cited by 3 | Viewed by 1938
Abstract
This paper proposes a learnable line-encoding technique for the bounding boxes commonly used in object detection. A bounding box is encoded by two main points, its top-left and bottom-right corners; a lightweight convolutional neural network (CNN) then learns the lines and proposes high-resolution line masks for each class category using a pixel-shuffle operation. Post-processing is applied to the predicted line masks to filter them and estimate clean lines based on a progressive probabilistic Hough transform. The proposed method was trained and evaluated on two common object detection benchmarks, Pascal VOC2007 and MS-COCO2017. It attains high mean average precision (mAP) values (78.8% on VOC2007 and 48.1% on COCO2017) while processing each frame in a few milliseconds (37 ms for PASCAL VOC and 47 ms for COCO). The strength of the proposed method lies in its simplicity and ease of implementation, unlike recent state-of-the-art object detection methods, which involve complex processing pipelines.

23 pages, 24651 KiB  
Article
SCD: A Stacked Carton Dataset for Detection and Segmentation
by Jinrong Yang, Shengkai Wu, Lijun Gou, Hangcheng Yu, Chenxi Lin, Jiazhuo Wang, Pan Wang, Minxuan Li and Xiaoping Li
Sensors 2022, 22(10), 3617; https://doi.org/10.3390/s22103617 - 10 May 2022
Cited by 8 | Viewed by 4778
Abstract
Carton detection is an important technique in automatic logistics systems and can be applied to tasks such as stacking and unstacking cartons and unloading cartons from containers. However, until now there has been no public large-scale carton dataset for the research community to train and evaluate carton detection models, which hinders the development of carton detection. In this article, we present a large-scale carton dataset named Stacked Carton Dataset (SCD) with the goal of advancing the state of the art in carton detection. Images were collected from the Internet and several warehouses, and objects were labeled for precise localization using instance mask annotation, yielding a total of 250,000 instance masks from 16,136 images. A suite of benchmarks was established with several popular detectors and instance segmentation models. In addition, we designed a carton detector based on RetinaNet by embedding our proposed Offset Prediction between Classification and Localization module (OPCL) and Boundary Guided Supervision module (BGS). OPCL alleviates the imbalance between classification and localization quality, boosting AP by 3.1–4.7% on SCD at the model level, while BGS guides the detector to pay more attention to the boundary information of cartons and decouples repeated carton textures at the task level. To demonstrate the generalization of OPCL to other datasets, we conducted extensive experiments on MS COCO and PASCAL VOC, where the improvements in AP were 1.8–2.2% and 3.4–4.3%, respectively.