Object Detection, Segmentation and Categorization in Artificial Intelligence

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (15 April 2024) | Viewed by 10647

Special Issue Editors

School of Electronic Engineering, Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an 710071, China
Interests: artificial intelligence (in particular, machine learning, multiagent systems and their applications) and formal methods (in particular, machine learning-based model checking)
Academy of Advanced Interdisciplinary Research, Xidian University, Xi’an 710071, China
Interests: image processing; pattern recognition; machine learning; change detection; few-shot knowledge graph

Shaanxi Key Laboratory of Underwater Information Technology, School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China
Interests: underwater object detection; machine learning; neural networks; few-shot knowledge graph
School of Electronic Engineering, Xidian University, Xi’an 710071, China
Interests: visualization system simulation and modeling; intelligent algorithm research on body posture and expression; image processing

Special Issue Information

Dear Colleagues,

Object detection, segmentation and categorization are core tasks of artificial intelligence in applications such as image understanding, intelligent interpretation of remote sensing images, medical image analysis, augmented reality, object recognition and tracking, object retrieval, video surveillance and autonomous vehicles. Owing to their wide practical applicability, these tasks have attracted considerable attention from researchers around the world. Object detection involves extracting both the location and class of specific objects, or of all instances, in an image. Segmentation determines the boundaries of same-class objects across an entire scene. Categorization assigns class labels to individual pixels or whole images. As essential steps in image processing and further analysis, object detection, segmentation and categorization techniques urgently need improvement to achieve higher performance. Although deep learning has achieved unprecedented success in this field, open application issues remain that must be comprehensively addressed.

This Special Issue aims to gather papers presenting recent advances in object detection, segmentation and categorization with novel and impactful applications. Topics of interest include, but are not limited to:

  • Machine learning for object detection, segmentation and categorization;
  • Multiobjective or multitask optimization for object detection, segmentation and categorization;
  • Object detection, segmentation and categorization based on evolutionary computation;
  • Remote sensing/teaching image object detection, segmentation and categorization;
  • Medical image segmentation and categorization;
  • Underwater target detection, identification and tracking;
  • Ocean acoustic remote sensing;
  • Sensor signal detection, identification and categorization.

Dr. Hao Li
Dr. Fei Xie
Dr. Jianbo Zhou
Dr. Jieyi Liu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • object detection
  • image segmentation
  • image classification
  • deep learning
  • neural networks
  • computational intelligence

Published Papers (10 papers)


Research

18 pages, 3073 KiB  
Article
A Causality-Aware Perspective on Domain Generalization via Domain Intervention
by Youjia Shao, Shaohui Wang and Wencang Zhao
Electronics 2024, 13(10), 1891; https://doi.org/10.3390/electronics13101891 - 11 May 2024
Viewed by 129
Abstract
Most mainstream statistical models achieve poor Out-Of-Distribution (OOD) generalization because they tend to learn spurious correlations in the data and collapse when domain shift exists. If we want artificial intelligence (AI) to make great strides in real life, the current focus needs to shift to the OOD problem of deep learning models and to their generalization ability in unknown environments. Domain generalization (DG), which focuses on OOD generalization, transfers knowledge extracted from multiple source domains to an unseen target domain. We are inspired by the intuition that human intelligence relies on causality. Rather than relying on plain probabilistic correlations, we apply a novel causal perspective to DG, which can improve the OOD generalization ability of the trained model by mining the invariant causal mechanism. Firstly, we construct an inclusive causal graph for most DG tasks through stepwise causal analysis based on the data generation process in natural environments and introduce a reasonable Structural Causal Model (SCM). Secondly, based on counterfactual inference, causal semantic representation learning with domain intervention (CSRDN) is proposed to train a robust model. We generate counterfactual representations for different domain interventions, which help the model learn causal semantics and develop generalization capacity. At the same time, we seek the Pareto optimal solution of the loss function during optimization to obtain a better-trained model. Extensive experiments on the Rotated MNIST, PACS and VLCS datasets verify the effectiveness of the proposed CSRDN. The proposed method integrates causal inference into domain generalization with improved interpretability and applicability, and brings a boost to challenging OOD generalization problems. Full article

17 pages, 998 KiB  
Article
Evolutionary Competition Multitasking Optimization with Online Resource Allocation for Endmember Extraction of Hyperspectral Images
by Yiming Shang, Qian Wang, Wenbo Zhu, Fei Xie, Hexu Wang and Lei Li
Electronics 2024, 13(8), 1424; https://doi.org/10.3390/electronics13081424 - 10 Apr 2024
Viewed by 380
Abstract
Hyperspectral remote sensing images typically contain mixed rather than pure pixels. Endmember extraction aims to find a group of endmembers to represent the original image. In practice, the number of endmembers is not easily determined in existing endmember extraction studies. Producing results for varying numbers of endmembers requires several separate and laborious runs. There is also a correlation between the individual runs, which should be taken into account to accelerate algorithm convergence and improve accuracy. In this paper, an evolutionary competition multitasking optimization method (CMTEE) is proposed for endmember extraction. In the proposed method, endmember extraction problems with different numbers of endmembers are treated as a group of optimization tasks. Specifically, these tasks are assumed to be competitive. Online resource allocation is then employed to assign suitable computational resources to the considered tasks. Experiments on simulated and real hyperspectral datasets demonstrate the effectiveness of the proposed evolutionary competition multitasking optimization method for endmember extraction. Full article
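The online resource allocation idea in this abstract can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's algorithm: it splits a per-generation evaluation budget across competing tasks in proportion to each task's recent fitness improvement (the softmax weighting and the per-task floor are assumptions).

```python
import numpy as np

def allocate_budget(improvements, total_budget, floor=1):
    """Split an evaluation budget across competing tasks in proportion to
    each task's recent fitness improvement (softmax-normalized), keeping a
    small floor per task so no task is starved completely."""
    improvements = np.asarray(improvements, dtype=float)
    weights = np.exp(improvements - improvements.max())   # stable softmax
    shares = weights / weights.sum()
    budget = np.round(shares * total_budget).astype(int)
    return np.maximum(budget, floor)
```

Tasks that stagnate gradually receive fewer evaluations, which is the intuition behind treating the endmember-count hypotheses as competitors for a shared computational budget.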

23 pages, 4882 KiB  
Article
USES-Net: An Infrared Dim and Small Target Detection Network with Embedded Knowledge Priors
by Lingxiao Li, Linlin Liu, Yunan He and Zhuqiang Zhong
Electronics 2024, 13(7), 1400; https://doi.org/10.3390/electronics13071400 - 8 Apr 2024
Viewed by 477
Abstract
Detecting and identifying small infrared targets has always been a crucial technology for many applications. To address the low accuracy, high false-alarm rate, and poor environmental adaptability that commonly exist in infrared target detection methods, this paper proposes a composite infrared dim and small target detection model called USES-Net, which combines the target prior knowledge and conventional data-driven deep learning networks to make use of both labeled data and the domain knowledge. Based on the typical encoder–decoder structure, USES-Net firstly introduces the self-attention mechanism of Swin Transformer to replace the universal convolution kernel at the encoder end. This helps to extract potential features related to dim, small targets in a larger receptive field. In addition, USES-Net includes an embedded patch-based contrast learning module (EPCLM) to integrate the spatial distribution of the target as a knowledge prior in the training network model. This guides the training process of the constrained network model with clear physical interpretability. Finally, USES-Net also designs a bottom-up cross-layer feature fusion module (AFM) as the decoder of the network, and a data-slicing-aided enhancement and inference method based on Slicing Aided Hyper Inference (SAHI) is utilized to further improve the model’s detection accuracy. An experimental comparative analysis shows that USES-Net achieves the best results on three typical infrared weak-target datasets: NUAA-SIRST, NUDT-SIRST, and IRSTD-1K. The results of the target segmentation are complete and sufficient, which demonstrates the validity and practicality of the proposed method in comparison to others. Full article
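The patch-contrast prior that EPCLM embeds can be illustrated with a classic local contrast measure. The snippet below is a rough stand-in, not the paper's module (window sizes and the ratio form are assumptions): a dim point target yields a small bright patch surrounded by a darker ring, so the ratio of inner-patch mean to surrounding-ring mean spikes at the target.

```python
import numpy as np

def patch_contrast(img, k=1):
    """Local contrast map: mean of the (2k+1)x(2k+1) inner patch divided by
    the mean of the surrounding ring. High values flag small bright targets."""
    h, w = img.shape
    p = 2 * k + 1                                    # padding margin
    pad = np.pad(img.astype(float), p, mode="edge")
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            ci, cj = i + p, j + p                    # center in padded coords
            inner = pad[ci - k:ci + k + 1, cj - k:cj + k + 1]
            outer = pad[ci - 2 * k:ci + 2 * k + 1, cj - 2 * k:cj + 2 * k + 1]
            ring_mean = (outer.sum() - inner.sum()) / (outer.size - inner.size)
            out[i, j] = inner.mean() / (ring_mean + 1e-6)
    return out
```

Embedding such a physically interpretable map into the loss or features is what gives the training process the "knowledge prior" the abstract describes.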

12 pages, 7760 KiB  
Article
Rotated Object Detection with Circular Gaussian Distribution
by Hang Xu, Xinyuan Liu, Yike Ma, Zunjie Zhu, Shuai Wang, Chenggang Yan and Feng Dai
Electronics 2023, 12(15), 3265; https://doi.org/10.3390/electronics12153265 - 29 Jul 2023
Viewed by 1134
Abstract
Rotated object detection is a challenging task due to the difficulties of locating the rotated objects and separating them effectively from the background. For rotated object prediction, researchers have explored numerous regression-based and classification-based approaches to predict a rotation angle. However, both paradigms are constrained by flaws that make it difficult to accurately predict angles, such as multi-solution and boundary issues, which limit the performance upper bound of detectors. To address these issues, we propose a circular Gaussian distribution (CGD)-based method for angular prediction. We convert the labeled angle into a discrete circular Gaussian distribution spanning a single minimal positive period, and let the model predict the distribution parameters instead of directly regressing or classifying the angle. To improve the overall efficiency of the detection model, we also design a rotated object detector based on CenterNet. Experimental results on various public datasets demonstrated the effectiveness and superior performance of our method. In particular, our approach achieves better results than state-of-the-art competitors, with improvements of 1.92% and 1.04% in terms of AP points on the HRSC2016 and DOTA datasets, respectively. Full article
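The label-encoding step this abstract describes can be sketched directly. The snippet below is a minimal illustration with assumed settings (bin count and sigma are not the paper's): the angle becomes a discrete Gaussian whose distance metric wraps around the period, so the first and last bins are neighbors and the boundary discontinuity that plagues direct angle regression disappears.

```python
import numpy as np

def circular_gaussian_label(theta, n_bins=180, sigma=6.0):
    """Encode an angle (in bin units, one period of width n_bins) as a
    discrete circular Gaussian over n_bins bins. Distances are measured
    around the circle, so bin 0 and bin n_bins-1 are adjacent."""
    bins = np.arange(n_bins)
    d = np.abs(bins - theta)
    d = np.minimum(d, n_bins - d)          # wrap-around (circular) distance
    g = np.exp(-0.5 * (d / sigma) ** 2)
    return g / g.sum()                     # normalize to a distribution
```

A model trained against such soft labels is penalized smoothly near the period boundary, instead of seeing an artificial jump between 0 and the maximum angle.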

16 pages, 5810 KiB  
Article
An Underwater Dense Small Object Detection Model Based on YOLOv5-CFDSDSE
by Jingyang Wang, Yujia Li, Junkai Wang and Ying Li
Electronics 2023, 12(15), 3231; https://doi.org/10.3390/electronics12153231 - 26 Jul 2023
Viewed by 1488
Abstract
Underwater target detection is a key technology in the process of exploring and developing the ocean. Because underwater targets are often very dense, mutually occluded, and affected by light, the detection objects are often unclear, and so underwater target detection faces unique challenges. To improve the performance of underwater target detection, this paper proposes a new target detection model, YOLOv5-FCDSDSE, based on YOLOv5s. In this model, the CFnet structure (an efficient fusion of C3 and FasterNet) was used to optimize the network structure of YOLOv5, improving the model's accuracy while reducing the number of parameters. Then, Dyhead technology was adopted to achieve better scale, space, and task perception. In addition, a small object detection (SD) layer was added to combine feature information from different scales effectively, retain more detailed information, and improve the detection of small objects. Finally, the squeeze-and-excitation (SE) attention mechanism was introduced to enhance the feature extraction ability of the model. This paper used the self-made underwater small object dataset URPC_UODD for comparison and ablation experiments. The experimental results showed that the accuracy of the proposed model was better than the original YOLOv5s and other baseline models on the underwater dense small object detection task, while the number of parameters was also reduced compared to YOLOv5s. YOLOv5-FCDSDSE is therefore an innovative solution for underwater target detection tasks. Full article
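Of the modules listed in this abstract, the squeeze-and-excitation (SE) block is simple enough to sketch in full. The snippet below is an illustrative numpy version of the standard SE operation (the weight shapes and reduction ratio are assumptions, not taken from the paper): global-average "squeeze" per channel, a two-layer bottleneck with ReLU then sigmoid "excitation", and channel-wise rescaling of the input.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map.
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights."""
    z = x.mean(axis=(1, 2))                  # squeeze: per-channel statistic, (C,)
    h = np.maximum(w1 @ z, 0.0)              # bottleneck + ReLU, (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # sigmoid gate in (0, 1), (C,)
    return x * s[:, None, None]              # reweight channels
```

Because the gate lies in (0, 1), the block can only attenuate channels, letting the network emphasize informative feature maps at negligible parameter cost.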

22 pages, 9681 KiB  
Article
A Specialized Database for Autonomous Vehicles Based on the KITTI Vision Benchmark
by Juan I. Ortega-Gomez, Luis A. Morales-Hernandez and Irving A. Cruz-Albarran
Electronics 2023, 12(14), 3165; https://doi.org/10.3390/electronics12143165 - 21 Jul 2023
Cited by 3 | Viewed by 1286
Abstract
Autonomous driving systems have emerged with the promise of preventing accidents. The first critical aspect of these systems is perception, where the regular practice is the use of top-view point clouds as the input; however, the existing databases in this area only present scenes with 3D point clouds and their respective labels. This generates an opportunity, and the objective of this work is to present a database with scenes directly in the top-view and their labels in the respective plane, as well as adding a segmentation map for each scene as a label for segmentation work. The method used during the creation of the proposed database is presented; this covers how to transform 3D to 2D top-view image point clouds, how the detection labels in the plane are generated, and how to implement a neural network for the generated segmentation maps of each scene. Using this method, a database was developed with 7481 scenes, each with its corresponding top-view image, label file, and segmentation map, where the road segmentation metrics are as follows: F1, 95.77; AP, 92.54; ACC, 97.53; PRE, 94.34; and REC, 97.25. This article presents the development of a database for segmentation and detection assignments, highlighting its particular use for environmental perception works. Full article
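The 3D-to-top-view transformation at the heart of this database can be sketched as a simple grid projection. The snippet below is a minimal, assumed version (ranges, resolution, and max-height encoding are illustrative choices, not the paper's exact pipeline): each point falls into a bird's-eye-view cell, and each cell keeps the maximum height seen.

```python
import numpy as np

def points_to_topview(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), res=0.1):
    """Project an (N, 3) point cloud (x forward, y left, z up) onto a
    top-view image whose cells store the max z of the points they contain."""
    h = int(round((x_range[1] - x_range[0]) / res))
    w = int(round((y_range[1] - y_range[0]) / res))
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]
    rows = h - 1 - ((x - x_range[0]) / res).astype(int)   # far points at the top
    cols = ((y - y_range[0]) / res).astype(int)
    img = np.full((h, w), -np.inf)
    np.maximum.at(img, (rows, cols), z)                   # keep max height per cell
    img[np.isinf(img)] = 0.0                              # empty cells -> 0
    return img
```

The same cell indexing can then be reused to rasterize the 3D box labels and segmentation maps into the plane, which is what makes detection and segmentation labels consistent with the top-view input.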

18 pages, 7502 KiB  
Article
Aircraft Detection and Fine-Grained Recognition Based on High-Resolution Remote Sensing Images
by Qinghe Guan, Ying Liu, Lei Chen, Shuang Zhao and Guandian Li
Electronics 2023, 12(14), 3146; https://doi.org/10.3390/electronics12143146 - 20 Jul 2023
Viewed by 945
Abstract
In order to realize the detection and recognition of specific aircraft types in remote sensing images, this paper proposes an algorithm called Fine-grained S2ANet (FS2ANet), based on the improved Single-shot Alignment Network (S2ANet), for remote sensing aircraft object detection and fine-grained recognition. Firstly, to address the imbalanced number of instances of the various aircraft in the dataset, we perform data augmentation on some remote sensing images using flip and color space transformation methods. Secondly, this paper selects ResNet101 as the backbone, combines space-to-depth (SPD) to improve the FPN structure, constructs the FPN-SPD module, and builds the aircraft fine feature focusing module (AF3M) in the detection head of the network, which reduces the loss of fine-grained information during feature extraction, enhances the network's ability to extract fine aircraft features, and improves the detection accuracy of small remote sensing aircraft objects. Finally, we use SkewIoU based on Kalman filtering (KFIoU) as the algorithm's regression loss function, improving the algorithm's convergence speed and the regression accuracy of the object boxes. Experimental results on the detection and fine-grained recognition of 11 types of remote sensing aircraft objects, such as the Boeing 737, A321, and C919, show that the mAP0.5 of FS2ANet is 46.82%, which is 3.87% higher than S2ANet, and that it can be applied to remote sensing aircraft object detection and fine-grained recognition. Full article
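The space-to-depth (SPD) operation mentioned in this abstract has a compact definition worth showing. The snippet below is an illustrative implementation of the generic rearrangement, not the paper's FPN-SPD module: spatial detail is folded into channels instead of being discarded by a strided convolution, which helps preserve fine-grained cues for small objects.

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) feature map into (C*block^2, H/block, W/block)
    by moving each block x block spatial neighborhood into the channel axis."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)          # (c, bi, bj, H', W')
    return x.reshape(c * block * block, h // block, w // block)
```

No values are lost in the rearrangement; downsampling by reshaping rather than striding is exactly why SPD-style layers suit fine-grained, small-object settings.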

17 pages, 8109 KiB  
Article
Eye-Blink Event Detection Using a Neural-Network-Trained Frame Segment for Woman Drivers in Saudi Arabia
by Muna S. Al-Razgan, Issema Alruwaly and Yasser A. Ali
Electronics 2023, 12(12), 2699; https://doi.org/10.3390/electronics12122699 - 16 Jun 2023
Viewed by 1157
Abstract
Women have been allowed to drive in Saudi Arabia since 2018, when a 30-year ban was revoked under the country's traffic rules. Drivers are commonly monitored for safe driving through their facial reactions, eye blinks, and expressions. Because novice women drivers in Saudi Arabia have had less exposure to driving experience and vehicle-handling features, technical assistance and physical observation are essential. Such observations are sensed as images/video frames for computer-based analyses, and precise computer vision processes are employed to detect and classify events using image processing. The identified events are specific to novice women drivers in Saudi Arabia and assist with their vehicle usage. This article introduces the Event Detection using Segmented Frame (ED-SF) method to improve abnormal Eye-Blink Detection (EBD) for women drivers. The eye region is segmented using variation pixel extraction, which requires textural variation identified across different frames; the frames must be continuous for event detection. The method employs a convolutional neural network with two hidden-layer processes. The first layer identifies continuous and discrete frame differentiations; the second segments the eye region, drawing on the textural variation. The variations and discrete frames are used to train the neural network to prevent segmentation errors in the extraction process. The frame segment changes are therefore used to identify expressions from different inputs across different texture luminosities. The method applies to less-experienced women drivers lacking road-safety knowledge who have begun driving in countries similar to Saudi Arabia. The proposed method improves EBD accuracy by 9.5% compared to Hybrid Convolutional Neural Networks (HCNN), HCNN with Long Short-Term Memory networks (HCNN + LSTM), Two-Stream Spatial-Temporal Graph Convolutional Networks (2S-STGCN), and the Customized Driving Fatigue Detection Method (CDFDM). Full article

17 pages, 4345 KiB  
Article
Underwater Noise Modeling and Its Application in Noise Classification with Small-Sized Samples
by Guoli Song, Xinyi Guo, Qianchu Zhang, Jun Li and Li Ma
Electronics 2023, 12(12), 2669; https://doi.org/10.3390/electronics12122669 - 14 Jun 2023
Viewed by 1236
Abstract
Underwater noise classification is of great significance for identifying ships as well as other vehicles, and it is helpful in ensuring a marine-habitat-friendly, low-noise ocean environment. A persistent challenge, however, is the small size of underwater noise sample sets. Because noise is influenced by multiple sources, it is often difficult to determine and label which source, or which two sources, are dominant. At present, research on this problem focuses on noise image processing or advanced computing techniques rather than starting from the noise generation mechanism and its modeling. Here, a typical underwater noise generation model (UNGM) is established to augment noise samples. It is built by generating noise with a given kurtosis according to the spectral and statistical characteristics of the measured noise, together with filter design. In addition, an underwater noise classification model is developed based on the UNGM and convolutional neural networks (CNN). The UNGM-CNN-based model is then used to classify nine types of typical underwater noise, with either the 1/3-octave noise spectrum level (NSL) or power spectral density (PSD) as the input features. The results show that it is effective in improving classification accuracy. Specifically, it increases the classification accuracy by 1.59%, from 98.27% to 99.86%, and by 2.44%, from 97.45% to 99.89%, when the NSL and PSD are used as the input features, respectively. Additionally, the UNGM-CNN-based method appreciably improves macro-precision and macro-recall by approximately 0.87% and 0.83%, respectively, compared to the CNN-based method. These results demonstrate the effectiveness of the UNGM for noise classification with small-sized samples. Full article
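The generation idea — noise with a prescribed kurtosis, spectrally shaped by a designed filter — can be sketched in a few lines. The snippet below is a hypothetical illustration, not the UNGM itself (the power nonlinearity and FIR taps are assumptions; the paper fits its parameters to measured noise statistics and spectra).

```python
import numpy as np

def heavy_tailed_noise(n, shape_p=3.0, fir=(0.2, 0.5, 0.2), seed=0):
    """Draw Gaussian samples, raise kurtosis with an odd power nonlinearity
    sign(x)*|x|**p (p > 1 gives a more impulsive, heavier-tailed series),
    then shape the spectrum with an FIR filter. Output is unit-variance."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)
    x = np.sign(g) * np.abs(g) ** shape_p             # heavier tails
    x = np.convolve(x, np.asarray(fir), mode="same")  # spectral shaping
    return x / x.std()

def kurtosis(x):
    """Fourth standardized moment (3.0 for a Gaussian)."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2
```

Synthetic series of this kind can augment scarce labeled recordings, which is the role the UNGM plays ahead of the CNN classifier.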

18 pages, 3959 KiB  
Article
Multi-Mode Channel Position Attention Fusion Side-Scan Sonar Transfer Recognition
by Jian Wang, Haisen Li, Guanying Huo, Chao Li and Yuhang Wei
Electronics 2023, 12(4), 791; https://doi.org/10.3390/electronics12040791 - 4 Feb 2023
Cited by 1 | Viewed by 1264
Abstract
Side-scan sonar (SSS) target recognition is an important part of building an underwater detection system and ensuring a high-precision perception of underwater information. In this paper, a novel multi-channel multi-location attention mechanism is proposed for a multi-modal phased transfer side-scan sonar target recognition model. Optical images from the ImageNet database, synthetic aperture radar (SAR) images and SSS images are used as the training datasets. The backbone network for feature extraction is transferred and learned by a staged transfer learning method. The head network used to predict the type of target extracts the attention features of SSS through a multi-channel and multi-position attention mechanism, and subsequently performs target recognition. The proposed model is tested on the SSS test dataset and evaluated using several metrics, and compared with different recognition algorithms as well. The results show that the model has better recognition accuracy and robustness for SSS targets. Full article
