
Image Processing and Pattern Recognition Based on Deep Learning—2nd Edition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 25 November 2024 | Viewed by 10161

Special Issue Editors


Prof. Dr. Dan Popescu
Guest Editor
Department of Automation and Industrial Informatics, Faculty of Automatic Control and Computer Science, University POLITEHNICA of Bucharest, 060042 Bucharest, Romania
Interests: image acquisition; image processing; feature extraction; image classification; image segmentation; artificial neural networks; deep learning; wireless sensor networks; unmanned aerial vehicles; data fusion; data processing in medicine; data processing in agriculture

Prof. Dr. Loretta Ichim
Guest Editor
Department of Automation and Industrial Informatics, Faculty of Automatic Control and Computer Science, University POLITEHNICA of Bucharest, 060042 Bucharest, Romania
Interests: convolutional neural networks; artificial intelligence; medical image processing; biomedical optical imaging; computer vision; computerised monitoring; data acquisition; image colour analysis; texture analysis; cloud computing

Special Issue Information

Dear Colleagues,

Pattern recognition, as applied to the analysis and interpretation of regions of interest in images, is today closely tied to artificial intelligence and, in particular, to neural networks based on deep learning. Current trends in the use of neural networks include modifying networks from established families to increase statistical or time performance, transfer learning, the use of multiple networks in more complex systems, fusing the decisions of individual networks, and combining efficient features with neural networks for higher-performance detection or classification. In addition, combination with other classifiers based on artificial intelligence is a possible avenue.

The aim of this Special Issue is to publish original research contributions concerning new neural-network-based approaches to image processing and pattern recognition, with direct applications in different domains such as remote sensing, crop monitoring, border monitoring, support systems for medical diagnosis, and emotion detection.

The scope of the Special Issue includes (but is not limited to) the following research areas concerning image processing and pattern recognition with the aid of new artificial intelligence techniques:

  • Image processing;
  • Pattern recognition;
  • Image segmentation;
  • Object classification;
  • Neural networks;
  • Deep learning;
  • Decision fusion;
  • Systems based on multiple neural networks;
  • The detection of regions of interest from remote images;
  • Industry applications;
  • Precision agriculture applications;
  • Medical applications;
  • The monitoring of protected areas;
  • Disaster monitoring and assessment.

Prof. Dr. Dan Popescu
Prof. Dr. Loretta Ichim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.


Published Papers (8 papers)


Research

19 pages, 5950 KiB  
Article
Ancient Chinese Character Recognition with Improved Swin-Transformer and Flexible Data Enhancement Strategies
by Yi Zheng, Yi Chen, Xianbo Wang, Donglian Qi and Yunfeng Yan
Sensors 2024, 24(7), 2182; https://doi.org/10.3390/s24072182 - 28 Mar 2024
Viewed by 607
Abstract
The decipherment of ancient Chinese scripts, such as oracle bone and bronze inscriptions, holds immense significance for understanding ancient Chinese history, culture, and civilization. Despite substantial progress in recognizing oracle bone script, research on the overall recognition of ancient Chinese characters remains somewhat lacking. To tackle this issue, we pioneered the construction of a large-scale image dataset comprising 9233 distinct ancient Chinese characters sourced from images obtained through archaeological excavations. We propose the first model for recognizing common ancient Chinese characters. The model consists of four stages with Linear Embedding and Swin-Transformer blocks, each supplemented by a CoT block to enhance local feature extraction. We also advocate a two-step data enhancement strategy: first, adaptive enhancement of the original data, and second, random resampling of the data. The experimental results, with a top-1 accuracy of 87.25% and a top-5 accuracy of 95.81%, demonstrate that the proposed method achieves remarkable performance. Furthermore, visualization of model attention shows that the proposed model, trained on a large number of images, captures the morphological characteristics of ancient Chinese characters to a certain extent.
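
As a side note on the reported metrics, top-1 and top-5 accuracies of the kind quoted above can be computed from raw classifier scores as in the short sketch below (generic evaluation code under our own naming, not the authors' implementation):

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    logits: (n_samples, n_classes) raw scores; labels: (n_samples,) integer class ids.
    """
    # Indices of the k largest scores per row (order within the top k is irrelevant).
    top_k = np.argpartition(logits, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# With the paper's 9233 character classes, k=1 and k=5 correspond to the two
# reported figures (87.25% and 95.81%) on the authors' test split.
```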

17 pages, 6501 KiB  
Article
Unsupervised Conditional Diffusion Models in Video Anomaly Detection for Monitoring Dust Pollution
by Limin Cai, Mofei Li and Dianpeng Wang
Sensors 2024, 24(5), 1464; https://doi.org/10.3390/s24051464 - 23 Feb 2024
Viewed by 646
Abstract
Video surveillance is widely used in monitoring environmental pollution, particularly harmful dust. Currently, manual video monitoring remains the predominant method for analyzing potential pollution, which is inefficient and prone to errors. In this paper, we introduce a new unsupervised method based on latent diffusion models. Specifically, we propose a spatio-temporal network structure that better integrates the spatial and temporal features of videos. Our conditional guidance mechanism samples frames of the input videos to guide high-quality generation and obtains frame-level anomaly scores by comparing the generated videos with the original ones. We also propose an efficient compression strategy that reduces computational costs by allowing the model to operate in a latent space. The superiority of our method over previous SOTA methods was demonstrated by numerical experiments on three public benchmarks and a practical application analysis in coal mining, with AUC improvements of up to 3%. Our method accurately detects abnormal patterns in multiple challenging environmental monitoring scenarios, illustrating its application potential in the environmental protection domain and beyond.
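
The frame-level scoring the abstract describes (comparing generated videos with the originals) is commonly realized as a per-frame reconstruction error; below is a minimal sketch under that assumption, not the paper's exact scoring function:

```python
import numpy as np

def frame_anomaly_scores(original: np.ndarray, generated: np.ndarray) -> np.ndarray:
    """Per-frame anomaly scores from reconstruction error.

    original, generated: (n_frames, H, W, C) arrays with values in [0, 1].
    Frames the generative model reconstructs poorly score close to 1.
    """
    err = ((original - generated) ** 2).mean(axis=(1, 2, 3))  # per-frame MSE
    # Min-max normalization over the clip makes scores comparable across videos.
    return (err - err.min()) / (err.max() - err.min() + 1e-8)
```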

35 pages, 29002 KiB  
Article
Characterization of Partial Discharges in Dielectric Oils Using High-Resolution CMOS Image Sensor and Convolutional Neural Networks
by José Miguel Monzón-Verona, Pablo González-Domínguez and Santiago García-Alonso
Sensors 2024, 24(4), 1317; https://doi.org/10.3390/s24041317 - 18 Feb 2024
Viewed by 707
Abstract
In this work, an exhaustive analysis of the partial discharges that originate in the bubbles present in dielectric mineral oils is carried out. To achieve this, a low-cost, high-resolution CMOS image sensor is used. Partial discharge measurements using this image sensor are validated by a standard electrical detection system that uses a discharge capacitor. In order to accurately identify the images corresponding to partial discharges, a convolutional neural network is trained using a large set of images captured by the image sensor. An image classification model is also developed using deep learning, with a convolutional network built on TensorFlow and Keras. The classification results show that the accuracy achieved by our model is around 95% on the validation set and 82% on the test set. As a result of this work, a non-destructive diagnosis method has been developed, based on the use of an image sensor and the design of a convolutional neural network. This approach provides information about the state of mineral oils before breakdown occurs, offering a valuable tool for the evaluation and maintenance of these dielectric oils.
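
Since the abstract names TensorFlow and Keras, a binary discharge/no-discharge image classifier of the kind described might look like the sketch below (layer sizes and input shape are our placeholders, not the paper's architecture):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_pd_classifier(input_shape=(128, 128, 1)) -> tf.keras.Model:
    """Small binary CNN: 'partial discharge' vs. 'no discharge' frames."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),                    # regularization for a small dataset
        layers.Dense(1, activation="sigmoid"),  # probability of a discharge event
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```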

21 pages, 18333 KiB  
Article
Parsing Netlists of Integrated Circuits from Images via Graph Attention Network
by Wenxing Hu, Xianke Zhan and Minglei Tong
Sensors 2024, 24(1), 227; https://doi.org/10.3390/s24010227 - 30 Dec 2023
Viewed by 1157
Abstract
A massive number of paper documents that include important information such as circuit schematics can be converted into digital documents by optical sensors like scanners or digital cameras. However, extracting the netlists of analog circuits from digital documents is an exceptionally challenging task. Automating this process helps enterprises digitize paper-based circuit diagrams, enabling the reuse of analog circuit designs and the automatic generation of the datasets required for intelligent design models in this domain. This paper introduces a bottom-up graph encoding model aimed at automatically parsing the circuit topology of analog integrated circuits from images. The model comprises an improved electronic component detection network based on the Swin Transformer, an algorithm for component port localization, and a graph encoding model. The detection network accurately identifies component positions and types, followed by automatic dataset generation through port localization and, finally, prediction of potential connections between circuit components using the graph encoding model. To validate the model's performance, we annotated an electronic component detection dataset and a circuit diagram dataset, comprising 1200 and 3552 training samples, respectively. Detailed experimental results demonstrate the superiority of our enhanced algorithm over comparative algorithms across custom and public datasets. Furthermore, our port localization algorithm significantly accelerates the annotation of circuit diagram datasets.
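
The final step the abstract describes — predicting potential connections between detected components — amounts to scoring node pairs in a graph. Below is a toy sketch with a stand-in similarity scorer; the paper's learned graph encoder and edge scorer are not reproduced here, so everything in it is illustrative:

```python
import numpy as np

def predict_connections(node_emb: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Score every pair of component/port nodes and keep likely wires.

    node_emb: (n_nodes, d) embeddings, assumed to come from a graph encoder.
    Returns a boolean adjacency matrix of predicted connections.
    """
    # Cosine similarity passed through a sigmoid, standing in for the
    # paper's learned edge scorer.
    norm = node_emb / (np.linalg.norm(node_emb, axis=1, keepdims=True) + 1e-8)
    scores = 1.0 / (1.0 + np.exp(-(norm @ norm.T)))
    adj = scores > threshold
    np.fill_diagonal(adj, False)  # a node is never wired to itself
    return adj
```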

21 pages, 8353 KiB  
Article
Velocity and Color Estimation Using Event-Based Clustering
by Xavier Lesage, Rosalie Tran, Stéphane Mancini and Laurent Fesquet
Sensors 2023, 23(24), 9768; https://doi.org/10.3390/s23249768 - 11 Dec 2023
Viewed by 817
Abstract
Event-based clustering provides a low-power embedded solution for low-level feature extraction in a scene. The algorithm utilizes the non-uniform sampling capability of event-based image sensors to measure local intensity variations within a scene, forming groups of similar events while simultaneously estimating their attributes. This work proposes taking advantage of additional event information to provide new attributes for further processing. We elaborate on the estimation of object velocity using the mean motion of the cluster. Next, we examine a novel form of events that includes an intensity measurement of the color at the pixel concerned; these events may be processed to estimate the rough color of a cluster, or the color distribution within a cluster. Lastly, this paper presents some applications that utilize these features. The resulting algorithms are applied and exercised using a custom event-based simulator, which generates videos of outdoor scenes. The velocity estimation methods provide satisfactory results, with a trade-off between accuracy and convergence speed. Regarding color estimation, luminance proves challenging to estimate in the test cases, while chrominance is estimated precisely. The estimated quantities are adequate for accurately classifying objects into predefined categories.
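
"Mean motion of the cluster" suggests fitting the cluster's position against event timestamps; one simple reading of that estimator (ours, not necessarily the paper's) is sketched below:

```python
import numpy as np

def cluster_velocity(xs, ys, ts):
    """Estimate a cluster's mean velocity from its member events.

    xs, ys: pixel coordinates of the cluster's events; ts: timestamps in seconds.
    A least-squares line fit of position against time gives the mean motion.
    """
    t = np.asarray(ts, dtype=float)
    vx = np.polyfit(t, np.asarray(xs, dtype=float), 1)[0]  # slope: pixels/s in x
    vy = np.polyfit(t, np.asarray(ys, dtype=float), 1)[0]  # slope: pixels/s in y
    return vx, vy
```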

15 pages, 7876 KiB  
Article
Detection of AI-Created Images Using Pixel-Wise Feature Extraction and Convolutional Neural Networks
by Fernando Martin-Rodriguez, Rocio Garcia-Mojon and Monica Fernandez-Barciela
Sensors 2023, 23(22), 9037; https://doi.org/10.3390/s23229037 - 8 Nov 2023
Viewed by 2896
Abstract
Generative AI has gained enormous interest nowadays due to new applications like ChatGPT, DALL·E, Stable Diffusion, and Deepfake. In particular, DALL·E, Stable Diffusion, and others (Adobe Firefly, ImagineArt, etc.) can create images from a text prompt and are even able to create photorealistic images. Due to this fact, intense research has been performed to create new image forensics applications able to distinguish between real captured images and videos and artificial ones. Detecting forgeries made with Deepfake is one of the most researched issues. This paper addresses another kind of forgery detection: distinguishing photorealistic AI-created images from real photos coming from a physical camera. That is, making a binary decision over an image: was it artificially or naturally created? Artificial images do not need to represent any real object, person, or place. For this purpose, techniques that perform pixel-level feature extraction are used. The first is Photo Response Non-Uniformity (PRNU), a characteristic noise pattern caused by imperfections in the camera sensor and traditionally used for source camera identification; the underlying idea is that AI images will exhibit a different PRNU pattern. The second is error level analysis (ELA), another type of feature extraction traditionally used for detecting image editing; ELA is nowadays also used by photographers to manually detect AI-created images. Both kinds of features are used to train convolutional neural networks to differentiate between AI images and real photographs. Good results are obtained, achieving accuracy rates of over 95%. Both extraction methods are carefully assessed by computing precision/recall and F1-score measurements.
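
Of the two feature extractors, ELA is the easier to reproduce: recompress the image as JPEG and amplify the difference. A standard sketch follows (the 90% quality setting is a common convention, not necessarily the paper's parameter):

```python
from io import BytesIO
from PIL import Image, ImageChops

def error_level_analysis(path: str, quality: int = 90) -> Image.Image:
    """Classic ELA map: recompress as JPEG and diff against the original.

    Regions that recompress differently (e.g., synthesized content) stand out.
    """
    original = Image.open(path).convert("RGB")
    buf = BytesIO()
    original.save(buf, format="JPEG", quality=quality)
    recompressed = Image.open(buf)
    ela = ImageChops.difference(original, recompressed)
    # The raw differences are faint; stretch them to the full 0-255 range.
    max_diff = max(hi for _, hi in ela.getextrema()) or 1
    return ela.point(lambda px: min(255, px * 255 // max_diff))
```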

16 pages, 5659 KiB  
Article
Research on Fine-Grained Image Recognition of Birds Based on Improved YOLOv5
by Xiaomei Yi, Cheng Qian, Peng Wu, Brian Tapiwanashe Maponde, Tengteng Jiang and Wenying Ge
Sensors 2023, 23(19), 8204; https://doi.org/10.3390/s23198204 - 30 Sep 2023
Cited by 3 | Viewed by 1350
Abstract
Birds play a vital role in maintaining biodiversity, and accurate identification of bird species is essential for conducting biodiversity surveys. However, fine-grained image recognition of birds is challenging due to large within-class differences and small inter-class differences. To solve this problem, our study took a part-based approach, dividing the identification task into two stages: part detection and classification. We propose an improved bird part detection algorithm based on YOLOv5 that can handle partial overlap between part objects and complex environmental conditions. The backbone network incorporates the Res2Net-CBAM module to enlarge the receptive field of each network layer, strengthen the channel features, and improve the model's sensitivity to important information. Additionally, to improve feature extraction and channel self-regulation, we integrated CBAM attention mechanisms into the neck. According to the experimental findings, the accuracy of the proposed model is 86.6%, 1.2% higher than that of the original model, and it also shows noticeable improvement over other algorithms. These results demonstrate the usefulness of the proposed method for quickly and precisely recognizing different bird species.
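
CBAM itself is a well-documented module (channel attention followed by spatial attention); below is a minimal Keras sketch of the mechanism, using standard CBAM defaults rather than the paper's exact settings:

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam_block(x, reduction: int = 16):
    """Minimal CBAM: channel attention, then spatial attention.

    x: a (batch, H, W, C) feature map, e.g. tf.random.normal((2, 32, 32, 64)).
    """
    ch = x.shape[-1]
    # Channel attention: a shared MLP over average- and max-pooled descriptors.
    mlp = tf.keras.Sequential([layers.Dense(ch // reduction, activation="relu"),
                               layers.Dense(ch)])
    avg = mlp(layers.GlobalAveragePooling2D()(x))
    mx = mlp(layers.GlobalMaxPooling2D()(x))
    x = x * tf.reshape(tf.sigmoid(avg + mx), (-1, 1, 1, ch))
    # Spatial attention: a 7x7 conv over channel-wise mean and max maps.
    avg_map = tf.reduce_mean(x, axis=-1, keepdims=True)
    max_map = tf.reduce_max(x, axis=-1, keepdims=True)
    attn = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(
        tf.concat([avg_map, max_map], axis=-1))
    return x * attn
```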

13 pages, 300 KiB  
Article
Display-Semantic Transformer for Scene Text Recognition
by Xinqi Yang, Wushour Silamu, Miaomiao Xu and Yanbing Li
Sensors 2023, 23(19), 8159; https://doi.org/10.3390/s23198159 - 28 Sep 2023
Viewed by 1307
Abstract
Linguistic knowledge helps greatly in scene text recognition by providing semantic information that refines the character sequence. A purely visual model focuses only on the visual texture of characters, without actively learning linguistic information, which leads to poor recognition rates on noisy (e.g., distorted or blurry) images. To address these issues, this study builds upon recent findings on the Vision Transformer: our approach (called the Display-Semantic Transformer, or DST for short) constructs a masked language model and a semantic-visual interaction module. The model can mine deep semantic information from images to assist scene text recognition and improve robustness. The semantic-visual interaction module better realizes the interaction between semantic information and visual features; in this way, the visual features are enhanced by the semantic information, and the model achieves a better recognition effect. The experimental results show that our model improves the average recognition accuracy on six benchmark test sets by nearly 2% compared to the baseline, while retaining a small number of parameters and fast inference speed, attaining a better balance between accuracy and speed.
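
The semantic-visual interaction module is described only at a high level; cross-attention from visual features to language-model embeddings is one plausible realization, sketched below purely as an assumption about how such a module could work, not as the DST internals:

```python
import tensorflow as tf
from tensorflow.keras import layers

def semantic_visual_interaction(visual, semantic, dim: int = 256):
    """Hypothetical semantic-visual interaction via cross-attention.

    visual: (batch, n_vis, dim) visual features; semantic: (batch, n_sem, dim)
    embeddings from a masked language model. Shapes and the use of
    MultiHeadAttention are our assumptions.
    """
    attn = layers.MultiHeadAttention(num_heads=4, key_dim=dim // 4)
    # Visual tokens query the semantic sequence for linguistic context.
    enhanced = attn(query=visual, value=semantic, key=semantic)
    # A residual connection preserves the original visual texture information.
    return layers.LayerNormalization()(visual + enhanced)
```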
