
Computer Vision and Machine Learning for Intelligent Sensing Systems—2nd Edition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (25 March 2024) | Viewed by 17455

Special Issue Editor

Dr. Jing Tian
Institute of Systems Science, National University of Singapore, Singapore 119620, Singapore
Interests: computer vision; machine learning; video analytics; multimedia applications

Special Issue Information

Dear Colleagues,

With the rapid development of computer vision and machine learning technology, intelligent sensing systems are increasingly expected to make sense of visual sensory data in order to address complex and challenging real-world problems. This creates tremendous opportunities, as well as challenges, in managing and understanding visual sensory data for intelligent sensing systems. Recent advances in machine learning techniques now allow the analysis of visual sensory data to be made significantly more intelligent, and they have attracted massive research efforts devoted to challenges in areas such as visual surveillance, smart cities, and healthcare. This Special Issue aims to provide a collection of high-quality research articles that address the broad challenges in both the theoretical and application aspects of computer vision and machine learning for intelligent sensing systems.

The topics of interest include, but are not limited to:

  • Computer vision for intelligent sensing systems
    • Sensing, representation, modeling
    • Restoration, enhancement, and super-resolution
    • Color, multispectral, and hyperspectral imaging
    • Stereoscopic, multiview, and 3D processing
  • Machine learning for intelligent sensing systems
    • Classification, detection, segmentation
    • Action and event recognition, behavior understanding
    • Multimodal machine learning
  • Computer vision applications for healthcare, manufacturing, security and safety, biomedical sciences, and other emerging applications

Dr. Jing Tian
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep learning
  • computer vision
  • image classification
  • image analysis
  • object detection
  • image segmentation
  • action recognition

Published Papers (7 papers)


Research


24 pages, 2079 KiB  
Article
GM-DETR: Research on a Defect Detection Method Based on Improved DETR
by Xin Liu, Xudong Yang, Lianhe Shao, Xihan Wang, Quanli Gao and Hongbo Shi
Sensors 2024, 24(11), 3610; https://doi.org/10.3390/s24113610 - 3 Jun 2024
Abstract
Defect detection is an indispensable part of the industrial intelligence process. The introduction of the DETR model marked the successful application of a transformer for defect detection, achieving true end-to-end detection. However, due to the complexity of defective backgrounds, low resolutions can lead to a lack of image detail control and slow convergence of the DETR model. To address these issues, we proposed a defect detection method based on an improved DETR model, called the GM-DETR. We optimized the DETR model by integrating GAM global attention with CNN feature extraction and matching features. This optimization process reduces the defect information diffusion and enhances the global feature interaction, improving the neural network’s performance and ability to recognize target defects in complex backgrounds. Next, to filter out unnecessary model parameters, we proposed a layer pruning strategy to optimize the decoding layer, thereby reducing the model’s parameter count. In addition, to address the issue of poor sensitivity of the original loss function to small differences in defect targets, we replaced the L1 loss in the original loss function with MSE loss to accelerate the network’s convergence speed and improve the model’s recognition accuracy. We conducted experiments on a dataset of road pothole defects to further validate the effectiveness of the GM-DETR model. The results demonstrate that the improved model exhibits better performance, with an increase in average precision of 4.9% (mAP@0.5), while reducing the parameter count by 12.9%.
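To make the loss-function change concrete, the sketch below contrasts the L1 and MSE box-regression terms in a DETR-style criterion. This is a minimal PyTorch illustration under assumed tensor shapes, not the GM-DETR implementation; the function and variable names are hypothetical.

import torch
import torch.nn.functional as F

def box_regression_loss(pred_boxes, target_boxes, use_mse=True):
    # pred_boxes, target_boxes: (num_matched, 4) tensors in normalized
    # (cx, cy, w, h) format, already paired by the Hungarian matching.
    # use_mse=True mirrors the idea described in the abstract of swapping
    # the original L1 term for MSE to speed up convergence.
    if use_mse:
        return F.mse_loss(pred_boxes, target_boxes, reduction="mean")
    return F.l1_loss(pred_boxes, target_boxes, reduction="mean")

# Hypothetical usage on randomly generated matched boxes:
pred = torch.rand(8, 4)
target = torch.rand(8, 4)
print(box_regression_loss(pred, target).item())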
18 pages, 1567 KiB  
Article
Image Classifier for an Online Footwear Marketplace to Distinguish between Counterfeit and Real Sneakers for Resale
by Joshua Onalaja, Essa Q. Shahra, Shadi Basurra and Waheb A. Jabbar
Sensors 2024, 24(10), 3030; https://doi.org/10.3390/s24103030 - 10 May 2024
Viewed by 500
Abstract
The sneaker industry is continuing to expand at a fast rate and will be worth over USD 120 billion in the next few years. This is, in part, due to social media and online retailers building hype around releases of limited-edition sneakers, which are usually collaborations between well-known global icons and footwear companies. These limited-edition sneakers are typically released in low quantities using an online raffle system, meaning only a few people can get their hands on them. As expected, this causes their value to skyrocket and has created an extremely lucrative resale market for sneakers. This has given rise to numerous counterfeit sneakers flooding the resale market, resulting in online platforms having to hand-verify a sneaker’s authenticity, which is an important but time-consuming procedure that slows the selling and buying process. To speed up the authentication process, Support Vector Machines and a convolutional neural network were used to classify images of fake and real sneakers, and their accuracies were then compared to see which performed better. The results showed that the CNN performed much better at this task than the SVMs, with some accuracies over 95%. Therefore, a CNN is well equipped to be a sneaker authenticator and will be of great benefit to the reselling industry.
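As a rough illustration of the comparison the abstract describes, the sketch below pits an SVM on flattened pixels against a small convolutional network. It is a hedged example with stand-in random data; the image size, architecture, and label convention are assumptions, not details from the paper.

import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

# Stand-in data: replace with real 64x64 RGB sneaker crops and labels
# (0 = counterfeit, 1 = authentic); shapes here are assumptions.
images = np.random.rand(200, 64, 64, 3).astype(np.float32)
labels = np.random.randint(0, 2, size=200)

# SVM baseline: each image flattened into one feature vector.
svm = SVC(kernel="rbf").fit(images.reshape(len(images), -1), labels)

# Small CNN baseline: two conv blocks followed by a linear classifier.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 16 * 16, 2),
)
logits = cnn(torch.from_numpy(images).permute(0, 3, 1, 2))
loss = nn.CrossEntropyLoss()(logits, torch.from_numpy(labels).long())
loss.backward()  # backpropagate once for illustration; a full training loop is omitted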

20 pages, 32970 KiB  
Article
Faces in Event Streams (FES): An Annotated Face Dataset for Event Cameras
by Ulzhan Bissarinova, Tomiris Rakhimzhanova, Daulet Kenzhebalin and Huseyin Atakan Varol
Sensors 2024, 24(5), 1409; https://doi.org/10.3390/s24051409 - 22 Feb 2024
Viewed by 898
Abstract
The use of event-based cameras in computer vision is a growing research direction. However, despite the existing research on face detection using the event camera, a substantial gap persists in the availability of a large dataset featuring annotations for faces and facial landmarks on event streams, thus hampering the development of applications in this direction. In this work, we address this issue by publishing the first large and varied dataset (Faces in Event Streams), with a duration of 689 min, for face and facial landmark detection in direct event-based camera outputs. In addition, this article presents 12 models trained on our dataset to predict bounding box and facial landmark coordinates with an mAP50 score of more than 90%. We also demonstrated real-time detection with an event-based camera using our models.
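Event cameras output sparse streams of (x, y, t, polarity) events rather than frames; one common way to feed them to a frame-based detector is to accumulate each time window into a count image. The NumPy sketch below shows only that generic representation; it is not the FES authors' pipeline, and the resolution and window size are assumptions.

import numpy as np

def events_to_frame(x, y, polarity, height=480, width=640):
    # Accumulate one window of events into a 2-channel count image:
    # channel 0 counts positive-polarity events, channel 1 negative ones.
    frame = np.zeros((2, height, width), dtype=np.float32)
    np.add.at(frame[0], (y[polarity > 0], x[polarity > 0]), 1.0)
    np.add.at(frame[1], (y[polarity <= 0], x[polarity <= 0]), 1.0)
    return frame

# Hypothetical window of 10,000 events:
rng = np.random.default_rng(0)
frame = events_to_frame(rng.integers(0, 640, 10_000),
                        rng.integers(0, 480, 10_000),
                        rng.choice([-1, 1], 10_000))
print(frame.shape)  # (2, 480, 640)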

15 pages, 1792 KiB  
Article
Rethinking Attention Mechanisms in Vision Transformers with Graph Structures
by Hyeongjin Kim and Byoung Chul Ko
Sensors 2024, 24(4), 1111; https://doi.org/10.3390/s24041111 - 8 Feb 2024
Viewed by 946
Abstract
In this paper, we propose a new type of vision transformer (ViT) based on graph head attention (GHA). Because the multi-head attention (MHA) of a pure ViT requires a large number of parameters and tends to lose the locality of an image, we replaced MHA with GHA by applying a graph to the attention head of the transformer. Consequently, the proposed GHA maintains both the locality and globality of the input patches and guarantees the diversity of the attention. The proposed GHA-ViT commonly outperforms pure ViT-based models on the small-sized CIFAR-10/100, MNIST, and MNIST-F datasets and the medium-sized ImageNet-1K dataset in scratch training. A Top-1 accuracy of 81.7% was achieved for ImageNet-1K using GHA-B, a base model with approximately 29 M parameters. In addition, on CIFAR-10/100, the number of parameters is reduced 17-fold compared with the existing ViT, while performance increases by 0.4% and 4.3%, respectively. The proposed GHA-ViT shows promising results in terms of the number of parameters and operations and the level of accuracy in comparison with other state-of-the-art lightweight ViT models.
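One simple way to see how a graph can constrain an attention head is to mask the attention scores with a patch adjacency matrix, as sketched below in PyTorch. This is an illustrative reading of "applying a graph to the attention head", not the authors' GHA module; the dimensions and the dense adjacency used for the smoke test are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphMaskedAttentionHead(nn.Module):
    # A single attention head whose scores are restricted to graph edges.
    def __init__(self, dim, head_dim=64):
        super().__init__()
        self.q = nn.Linear(dim, head_dim)
        self.k = nn.Linear(dim, head_dim)
        self.v = nn.Linear(dim, head_dim)
        self.scale = head_dim ** -0.5

    def forward(self, tokens, adjacency):
        # tokens: (batch, num_patches, dim); adjacency: (num_patches, num_patches)
        scores = self.q(tokens) @ self.k(tokens).transpose(-2, -1) * self.scale
        scores = scores.masked_fill(adjacency == 0, float("-inf"))
        return F.softmax(scores, dim=-1) @ self.v(tokens)

# A k-nearest-neighbour patch graph would keep attention local; an all-ones
# adjacency (used here only as a smoke test) recovers ordinary attention.
x = torch.randn(2, 49, 192)
out = GraphMaskedAttentionHead(192)(x, torch.ones(49, 49))
print(out.shape)  # torch.Size([2, 49, 64])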

21 pages, 2798 KiB  
Article
An Improved YOLOv5-Based Underwater Object-Detection Framework
by Jian Zhang, Jinshuai Zhang, Kexin Zhou, Yonghui Zhang, Hongda Chen and Xinyue Yan
Sensors 2023, 23(7), 3693; https://doi.org/10.3390/s23073693 - 3 Apr 2023
Cited by 16 | Viewed by 6878
Abstract
To date, general-purpose object-detection methods have achieved a great deal. However, challenges such as degraded image quality, complex backgrounds, and the detection of marine organisms at different scales arise when identifying underwater organisms. To solve such problems and further improve the accuracy of relevant models, this study proposes a marine biological object-detection architecture based on an improved YOLOv5 framework. First, the backbone framework of Real-Time Models for object Detection (RTMDet) is introduced. The core module, Cross-Stage Partial Layer (CSPLayer), includes a large convolution kernel, which allows the detection network to precisely capture contextual information more comprehensively. Furthermore, a common convolution layer is added to the stem layer, to extract more valuable information from the images efficiently. Then, the BoT3 module with the multi-head self-attention (MHSA) mechanism is added into the neck module of YOLOv5, such that the detection network has a better effect in scenes with dense targets and the detection accuracy is further improved. The introduction of the BoT3 module represents a key innovation of this paper. Finally, union dataset augmentation (UDA) is performed on the training set using the Minimal Color Loss and Locally Adaptive Contrast Enhancement (MLLE) image augmentation method, and the result is used as the input to the improved YOLOv5 framework. Experiments on the underwater datasets URPC2019 and URPC2020 show that the proposed framework not only alleviates the interference of underwater image degradation, but also makes the mAP@0.5 reach 79.8% and 79.4% and improves the mAP@0.5 by 3.8% and 1.1%, respectively, when compared with the original YOLOv8 on URPC2019 and URPC2020, demonstrating that the proposed framework presents superior performance for the high-precision detection of marine organisms.
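The union dataset augmentation step can be pictured as training on the original images together with enhancement-processed copies. The sketch below illustrates only that union construction; the paper uses MLLE enhancement, which is not bundled with OpenCV, so a CLAHE contrast step stands in here, and the directory path is hypothetical.

import glob
import cv2

def build_union_training_set(image_dir):
    # Returns the original images plus contrast-enhanced copies that share
    # the same labels. CLAHE on the L channel is only a stand-in for MLLE.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    samples = []
    for path in glob.glob(f"{image_dir}/*.jpg"):
        bgr = cv2.imread(path)
        lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
        lab[:, :, 0] = clahe.apply(lab[:, :, 0])
        samples.append(bgr)                                   # original image
        samples.append(cv2.cvtColor(lab, cv2.COLOR_LAB2BGR))  # enhanced copy
    return samples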

15 pages, 5805 KiB  
Article
Real-Time Forest Fire Detection by Ensemble Lightweight YOLOX-L and Defogging Method
by Jiarun Huang, Zhili He, Yuwei Guan and Hongguo Zhang
Sensors 2023, 23(4), 1894; https://doi.org/10.3390/s23041894 - 8 Feb 2023
Cited by 16 | Viewed by 2645
Abstract
Forest fires can destroy forests and inflict great damage on ecosystems. Fortunately, forest fire detection with video has achieved remarkable results in enabling timely and accurate fire warnings. However, traditional forest fire detection methods rely heavily on artificially designed features, and CNN-based methods require a large number of parameters. In addition, forest fire detection is easily disturbed by fog. To solve these issues, a lightweight YOLOX-L and defogging algorithm-based forest fire detection method, GXLD, is proposed. GXLD uses the dark channel prior to defog the image and obtain a fog-free image. After the lightweight improvement of YOLOX-L by GhostNet, depthwise separable convolution, and SENet, we obtain YOLOX-L-Light and use it to detect forest fires in the fog-free image. To evaluate the performance of YOLOX-L-Light and GXLD, mean average precision (mAP) was used to evaluate the detection accuracy, and the number of network parameters was used to evaluate the lightweight effect. Experiments on our forest fire dataset show that the number of parameters of YOLOX-L-Light decreased by 92.6%, while the mAP increased by 1.96%. The mAP of GXLD is 87.47%, which is 2.46% higher than that of YOLOX-L, and the average fps of GXLD is 26.33 when the input image size is 1280 × 720. Even in a foggy environment, GXLD can detect a forest fire in real time with high accuracy, target confidence, and target integrity. The proposed GXLD thus combines the advantages of defogging, a high target confidence, and a high target integrity, which make it well suited to the development of modern forest fire video detection systems.
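The defogging stage GXLD relies on is the classic dark channel prior. As a rough NumPy/OpenCV sketch of that prior, with common default parameters and without the usual guided-filter refinement of the transmission map, it looks like this:

import cv2
import numpy as np

def dark_channel_defog(image, patch=15, omega=0.95, t_min=0.1):
    # image: H x W x 3 float array in [0, 1]; returns a dehazed image.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    dark = cv2.erode(image.min(axis=2), kernel)  # dark channel

    # Atmospheric light: mean colour of the brightest dark-channel pixels.
    idx = np.argsort(dark.ravel())[-max(1, dark.size // 1000):]
    A = image.reshape(-1, 3)[idx].mean(axis=0)

    # Transmission estimate and scene radiance recovery.
    t = 1.0 - omega * cv2.erode((image / A).min(axis=2), kernel)
    t = np.maximum(t, t_min)[..., None]
    return np.clip((image - A) / t + A, 0.0, 1.0)

# Hypothetical usage before running the detector:
# frame = cv2.imread("foggy_frame.jpg").astype(np.float32) / 255.0
# clear = dark_channel_defog(frame)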

Review


33 pages, 18843 KiB  
Review
Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study
by Hung-Cuong Nguyen, Thi-Hao Nguyen, Rafał Scherer and Van-Hung Le
Sensors 2023, 23(11), 5121; https://doi.org/10.3390/s23115121 - 27 May 2023
Cited by 8 | Viewed by 4362
Abstract
Human activity recognition (HAR) is an important research problem in computer vision. It is widely applied in building applications for human–machine interaction, monitoring, etc. In particular, HAR based on the human skeleton enables intuitive applications. Therefore, determining the current results of these studies is very important for selecting solutions and developing commercial products. In this paper, we perform a full survey of the use of deep learning to recognize human activity based on three-dimensional (3D) human skeleton data as input. Our research covers four types of deep learning networks for activity recognition based on extracted feature vectors: Recurrent Neural Networks (RNNs), which use extracted activity sequence features; Convolutional Neural Networks (CNNs), which use feature vectors extracted by projecting the skeleton into image space; Graph Convolutional Networks (GCNs), which use features extracted from the skeleton graph and the temporal–spatial structure of the skeleton; and Hybrid Deep Neural Networks (Hybrid-DNNs), which combine many other types of features. Our survey covers models, databases, metrics, and results from 2019 to March 2023, presented in chronological order. In particular, we also carried out a comparative study of HAR based on a 3D human skeleton on the KLHA3D 102 and KLYOGA3D datasets. At the same time, we analyze and discuss the results obtained when applying CNN-based, GCN-based, and Hybrid-DNN-based deep learning networks.
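To make "features extracted from the skeleton graph" concrete, the sketch below implements one plain graph-convolution layer over a skeleton adjacency matrix in PyTorch. It is a generic illustration, not any specific surveyed model; the toy five-joint skeleton and feature sizes are assumptions.

import torch
import torch.nn as nn

class SkeletonGCNLayer(nn.Module):
    # One graph-convolution layer: X' = ReLU(D^-1/2 (A + I) D^-1/2 X W).
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        a_hat = adjacency + torch.eye(adjacency.size(0))
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        self.register_buffer("norm_adj", d_inv_sqrt @ a_hat @ d_inv_sqrt)
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (batch, num_joints, in_dim) per-joint 3D features for one frame
        return torch.relu(self.norm_adj @ self.linear(x))

# Toy five-joint chain standing in for a real 17- or 25-joint skeleton graph:
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
adj = torch.zeros(5, 5)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
layer = SkeletonGCNLayer(3, 16, adj)
print(layer(torch.randn(8, 5, 3)).shape)  # torch.Size([8, 5, 16])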
