Electronics

Editorial

8 pages, 175 KiB

Open Access Editorial

Deep Learning in Image Processing and Pattern Recognition

by Aili Wang, Haibin Wu and Yuji Iwahori

Electronics 2025, 14(10), 1942; https://doi.org/10.3390/electronics14101942 - 9 May 2025

Viewed by 617

Abstract

The current field shows a trend of multi-dimensional fusion [...] Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

Research

18 pages, 3564 KiB

Open Access Article

Offline Mongolian Handwriting Recognition Based on Data Augmentation and Improved ECA-Net

by Qing-Dao-Er-Ji Ren, Lele Wang, Zerui Ma and Saheya Barintag

Electronics 2024, 13(5), 835; https://doi.org/10.3390/electronics13050835 - 21 Feb 2024

Cited by 3 | Viewed by 1631

Abstract

Writing is an important carrier of cultural inheritance, and the digitization of handwritten texts is an effective means of protecting national culture. Compared to Chinese and English handwriting recognition, research on Mongolian handwriting recognition started relatively late and has achieved few results, owing to the characteristics of the script itself and the lack of corpora. First, according to the characteristics of Mongolian handwritten characters, the random erasing data augmentation algorithm was modified, and a dual data augmentation (DDA) algorithm was proposed by combining the improved algorithm with horizontal wave transformation (HWT) to augment the dataset for training the Mongolian handwriting recognition model. Second, the classical CRNN handwriting recognition model was improved. The structure of the encoder and decoder was adjusted according to the characteristics of the Mongolian script, and the attention mechanism was introduced in the feature extraction and decoding stages of the model. The result is an improved handwriting recognition model, named the EGA model, suited to the features of Mongolian handwriting. Finally, the effectiveness of the EGA model was verified by a large number of data tests. Experimental results demonstrated that the proposed EGA model improves the recognition accuracy of Mongolian handwriting, and that the structural modification of the encoder and decoder effectively balances the recognition accuracy and complexity of the model. Full article

16 pages, 5028 KiB

Open Access Article

YOLO-Rlepose: Improved YOLO Based on Swin Transformer and Rle-Oks Loss for Multi-Person Pose Estimation

by Yi Jiang, Kexin Yang, Jinlin Zhu and Li Qin

Electronics 2024, 13(3), 563; https://doi.org/10.3390/electronics13030563 - 30 Jan 2024

Cited by 9 | Viewed by 3859

Abstract

In recent years, there has been significant progress in human pose estimation, fueled by the widespread adoption of deep convolutional neural networks. However, despite these advancements, multi-person 2D pose estimation still remains highly challenging due to factors such as occlusion, noise, and non-rigid body movements. Currently, most multi-person pose estimation approaches handle joint localization and association separately. This study proposes a direct regression-based method to estimate the 2D human pose from a single image. The authors name this network YOLO-Rlepose. Compared to traditional methods, YOLO-Rlepose leverages Transformer models to better capture global dependencies between image feature blocks and preserves sufficient spatial information for keypoint detection through a multi-head self-attention mechanism. To further improve the accuracy of the YOLO-Rlepose model, this paper proposes the following enhancements. Firstly, this study introduces the C3 Module with Swin Transformer (C3STR). This module builds upon the C3 module in You Only Look Once (YOLO) by incorporating a Swin Transformer branch, enhancing the YOLO-Rlepose model’s ability to capture global information and rich contextual information. Next, a novel loss function named Rle-Oks loss is proposed. The loss function facilitates the training process by learning the distributional changes through Residual Log-likelihood Estimation. To assign different weights based on the importance of different keypoints in the human body, this study introduces a weight coefficient into the loss function. The experiments proved the efficiency of the proposed YOLO-Rlepose model. On the COCO dataset, the model outperforms the previous SOTA method by 2.11% in AP. Full article
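
The abstract does not give the exact form of the Rle-Oks loss, so the following is only a minimal sketch of the standard COCO Object Keypoint Similarity (OKS) that the loss name refers to, not the authors' loss. The function name `oks` and its argument layout are hypothetical; `kappas` stands for the per-keypoint constants, which play a role similar to the per-keypoint weighting the abstract mentions.

```python
import math


def oks(pred, truth, visibility, area, kappas):
    """Standard COCO-style Object Keypoint Similarity.

    pred, truth: lists of (x, y) keypoints.
    visibility:  list of ints, where 0 means the keypoint is unlabeled.
    area:        object area, used as the squared object scale s^2.
    kappas:      per-keypoint constants controlling the falloff.
    """
    total, labeled = 0.0, 0
    for (px, py), (tx, ty), v, k in zip(pred, truth, visibility, kappas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - tx) ** 2 + (py - ty) ** 2  # squared pixel distance
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        labeled += 1
    return total / labeled if labeled else 0.0
```

A perfect prediction scores 1.0, and the score decays smoothly as keypoints drift, more slowly for large objects and for keypoints with a large constant.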

22 pages, 7098 KiB

Open Access Article

Detection of Small Lesions on Grape Leaves Based on Improved YOLOv7

by Mingji Yang, Xinbo Tong and Haisong Chen

Electronics 2024, 13(2), 464; https://doi.org/10.3390/electronics13020464 - 22 Jan 2024

Cited by 6 | Viewed by 3336

Abstract

The precise detection of small lesions on grape leaves is beneficial for early detection of diseases. In response to the high missed detection rate of small target diseases on grape leaves, this paper adds a new prediction branch and combines an improved channel attention mechanism and an improved E-ELAN (Extended Efficient Layer Aggregation Network) to propose an improved algorithm for the YOLOv7 (You Only Look Once version 7) model. Firstly, to address the issue of low resolution for small targets, a new detection head is added to detect smaller targets. Secondly, to increase the feature extraction ability of the E-ELAN components in YOLOv7 for small targets, asymmetric convolution is introduced into E-ELAN to replace the original 3 × 3 convolution and achieve multi-scale feature extraction. Then, to address the issue of insufficient extraction of information from small targets in YOLOv7, a channel attention mechanism is introduced and improved to enhance the network’s sensitivity to small-scale targets. Finally, the CIoU (Complete Intersection over Union) in the original YOLOv7 network model is replaced with SIoU (SCYLLA-IoU) to optimize the loss function and enhance the network’s localization ability. To verify the effectiveness of the improved YOLOv7 algorithm, three common grape leaf diseases were selected as detection objects to create a dataset for experiments. The results show that the average accuracy of the algorithm proposed in this paper is 2.7% higher than that of the original YOLOv7 algorithm, reaching 93.5%. Full article

17 pages, 7421 KiB

Open Access Article

A Residual Network with Efficient Transformer for Lightweight Image Super-Resolution

by Fengqi Yan, Shaokun Li, Zhiguo Zhou and Yonggang Shi

Electronics 2024, 13(1), 194; https://doi.org/10.3390/electronics13010194 - 2 Jan 2024

Cited by 1 | Viewed by 2626

Abstract

In recent years, deep learning approaches have achieved remarkable results in the field of Single-Image Super-Resolution (SISR). To attain improved performance, most existing methods focus on constructing more-complex networks that demand extensive computational resources, thereby significantly impeding the advancement and real-world application of super-resolution techniques. Furthermore, many lightweight super-resolution networks employ knowledge distillation strategies to reduce network parameters, which can considerably slow down inference speeds. In response to these challenges, we propose a Residual Network with an Efficient Transformer (RNET). RNET incorporates three effective design elements. First, we utilize Blueprint-Separable Convolution (BSConv) instead of traditional convolution, effectively reducing the computational workload. Second, we propose a residual connection structure for local feature extraction, streamlining feature aggregation and accelerating inference. Third, we introduce an efficient transformer module to enhance the network’s ability to aggregate contextual features, resulting in recovered images with richer texture details. Additionally, spatial attention and channel attention mechanisms are integrated into our model, further augmenting its capabilities. We evaluate the proposed method on five general benchmark test sets. With these innovations, our network outperforms existing efficient SR methods on all test sets, achieving the best performance with the fewest parameters, particularly in the area of texture detail enhancement in images. Full article

16 pages, 4180 KiB

Open Access Article

Pixel-Level Degradation for Text Image Super-Resolution and Recognition

by Xiaohong Qian, Lifeng Xie, Ning Ye, Renlong Le and Shengying Yang

Electronics 2023, 12(21), 4546; https://doi.org/10.3390/electronics12214546 - 5 Nov 2023

Cited by 1 | Viewed by 2379

Abstract

In the realm of image reconstruction, deep learning-based super-resolution (SR) has established itself as a prevalent technique, particularly in the domain of text image restoration. This study aims to address notable deficiencies in existing research, including constraints imposed by restricted datasets and challenges related to model generalization. Specifically, the goal is to improve the super-resolution network’s reconstruction of scene text images and to use the generated degraded dataset to alleviate the poor generalization caused by the scarcity of scene text image super-resolution data. The methodology begins with the degradation of images from the MJSynth dataset, using a stochastic degradation process to create eight distinct degraded versions. Subsequently, a blank image is constructed with the same dimensions as the low-resolution image, and each of its pixels is sourced randomly from the corresponding points across the eight degraded images. Following several iterations of fine-tuning, the LR-HR method is applied to the TextZoom dataset. The pivotal assessment metric is optical character recognition (OCR) accuracy, which gauges the practical effectiveness of the approach. The experimental findings reveal a notable enhancement in OCR accuracy compared to the TBSRN model, with improvements of 2.4%, 2.3%, and 4.8% on the TextZoom dataset. This approach, founded on pixel-level degradation, not only exhibits commendable generalization capabilities but also demonstrates resilience in confronting the intricate challenges inherent to text image super-resolution. Full article
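
The pixel-level mixing step described above (a blank image whose pixels are each drawn from the same position in one of the degraded versions) can be sketched roughly as follows. This is an illustration under assumptions: images are plain nested lists of pixel values, the source image is chosen uniformly at random per pixel, and the name `mix_pixels` is hypothetical. The abstract does not specify the degradation process itself, so it is left out.

```python
import random


def mix_pixels(degraded, rng=None):
    """Build one image whose every pixel is copied from a randomly chosen
    degraded image at the same (row, col) position.

    degraded: list of images, each a list of rows of pixel values,
              all with identical dimensions.
    rng:      optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    height, width = len(degraded[0]), len(degraded[0][0])
    for img in degraded:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all degraded images must share the same dimensions")
    # Choose a source image independently for every pixel position.
    return [
        [rng.choice(degraded)[r][c] for c in range(width)]
        for r in range(height)
    ]
```

Passing a seeded `random.Random` makes the mixed image reproducible, which helps when regenerating a training set.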

12 pages, 3351 KiB

Open Access Article

Defect Detection Method of Phosphor in Glass Based on Improved YOLO5 Algorithm

by Yong Qin, Zhenye Pan and Chenhao Shao

Electronics 2023, 12(18), 3917; https://doi.org/10.3390/electronics12183917 - 17 Sep 2023

Cited by 1 | Viewed by 1524

Abstract

Phosphor in Glass (PiG) is prone to uneven stirring during production and processing, and improper use of instruments and other factors lead to defective products. In this paper, we propose an improved YOLOv5 target detection algorithm. Firstly, Coordinate Attention (CA) is introduced into the backbone network so that the network can attend to detection targets over a larger range. Secondly, the Bidirectional Feature Pyramid Network (BiFPN) is used to fuse information at different scales in the neck to obtain an output feature map with rich semantic information. At the same time, the weighted bidirectional feature fusion pyramid structure adjusts the contribution of input feature maps at different scales to the output by introducing weights. This optimization enhances the feature fusion effect, reduces the loss of feature information in the convolution process, and improves detection accuracy. Then, the GIOU_Loss function is replaced with the EIOU_Loss function to speed up convergence. Finally, a comparative experiment is carried out on a self-made PiG dataset. The experimental results show that the mean average precision (mAP) of this method is 12.35% higher than that of the original method (YOLOv5s), with a detection speed of 53.92 FPS, which meets the actual needs of industrial detection. Full article
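
The weighted fusion described above can be illustrated with the fast normalized fusion rule from the original BiFPN design, in which each input gets a learnable non-negative weight and the weights are normalized by their sum. Whether the paper uses exactly this variant is an assumption; the function name `fuse` and the flat-list feature representation are hypothetical simplifications.

```python
def fuse(features, weights, eps=1e-4):
    """Fast normalized fusion: out = sum(w_i * x_i) / (eps + sum(w_i)).

    features: list of equally sized feature vectors (flat lists of floats),
              assumed to be already resized to a common resolution.
    weights:  one scalar weight per input feature (learned in practice).
    eps:      small constant that avoids division by zero.
    """
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps every weight non-negative
    norm = eps + sum(w)
    length = len(features[0])
    return [
        sum(wi * f[j] for wi, f in zip(w, features)) / norm
        for j in range(length)
    ]
```

With equal weights the result is close to a plain average. As training raises one weight, that scale contributes proportionally more to the fused map.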

17 pages, 17336 KiB

Open Access Feature Paper Article

Fully Automatic Approach for Smoke Tracking Based on Deep Image Quality Enhancement and Adaptive Level Set Model

by Rimeh Daoudi, Aymen Mouelhi, Moez Bouchouicha, Eric Moreau and Mounir Sayadi

Electronics 2023, 12(18), 3888; https://doi.org/10.3390/electronics12183888 - 14 Sep 2023

Cited by 1 | Viewed by 1693

Abstract

In recent decades, advanced systems with good precision, low cost, and fast response for wildfire and smoke detection and monitoring have become an absolute necessity. In this paper, we propose a novel, fast, and autonomous approach for denoising and tracking smoke in video sequences captured from a camera in motion. The proposed method is based mainly on two stages. The first is a reconstruction and denoising path with a novel lightweight convolutional autoencoder architecture. The second is a specific scheme designed for smoke tracking, which consists of the following: first, the foreground frames are extracted with the HSV color model and textural features of smoke; second, possible false detections of smoke regions are eliminated with image processing techniques; and last, smoke contour detection is performed with an adaptive nonlinear level set. The experimental results presented in this paper show the potential of the proposed approach and demonstrate its efficiency in smoke video denoising and tracking, with a minimized number of false negative regions and good detection rates. Full article

17 pages, 5144 KiB

Open Access Article

A Multi-Stage Adaptive Copy-Paste Data Augmentation Algorithm Based on Model Training Preferences

by Xiaoyu Yu, Fuchao Li, Yan Liu and Aili Wang

Electronics 2023, 12(17), 3695; https://doi.org/10.3390/electronics12173695 - 31 Aug 2023

Viewed by 1910

Abstract

Datasets play an important role in the field of object detection. However, the production of a dataset is influenced by the objective environment and human subjectivity, resulting in class-imbalanced or even long-tailed distributions. At present, optimization methods based on data augmentation still rely on subjective parameter adjustments, which is tedious. In this paper, we propose a multi-stage adaptive Copy-Paste augmentation (MSACP) algorithm. This algorithm divides model training into multiple training stages, each of which forms its own training preferences. Based on these training preferences, the class information of the training set is adaptively adjusted, which not only alleviates the problem of class imbalance in training, but also expands different sample sizes for categories with insufficient information at different training stages. Experimental verification on the traffic sign dataset Tsinghua–Tencent 100K (TT100K) showed that the proposed method can not only reduce the class imbalance in the dataset, but also improve the detection performance of models. By transplanting the optimal weights trained with MSACP to an embedded platform, in combination with YOLOv3-tiny, the model’s accuracy in detecting traffic signs in autonomous driving scenarios was improved, verifying the effectiveness of the MSACP algorithm in practical applications. Full article

22 pages, 24276 KiB

Open Access Article

Multi-Scale Spatial–Spectral Attention-Based Neural Architecture Search for Hyperspectral Image Classification

by Yingluo Song, Aili Wang, Yan Zhao, Haibin Wu and Yuji Iwahori

Electronics 2023, 12(17), 3641; https://doi.org/10.3390/electronics12173641 - 29 Aug 2023

Cited by 5 | Viewed by 1886

Abstract

Convolutional neural networks (CNNs) are commonly employed for hyperspectral image (HSI) classification. However, the architecture of convolutional neural networks typically requires manual design and fine-tuning, which can be quite laborious. Fortunately, recent advancements in Neural Architecture Search (NAS) enable the automatic design of networks. These NAS techniques have significantly improved the accuracy of HSI classification. This article proposes a Multi-Scale Spatial–Spectral Attention-based NAS (MS³ANAS) framework for HSI classification to automatically design a neural network structure for HSI classifiers. First, this paper constructs a multi-scale attention mechanism extended search space, which considers multi-scale filters to reduce parameters while maintaining a large-scale receptive field and enhanced multi-scale spectral–spatial feature extraction to increase network sensitivity towards hyperspectral information. Then, we combined the slow–fast learning architecture update paradigm to optimize and iteratively update the architecture vector and effectively improve the model’s generalization ability. Finally, we introduced the Lion optimizer, which tracks only momentum and uses sign operations to calculate updates, thereby reducing memory overhead and training time. The proposed NAS method demonstrates impressive classification performance and effectively improves accuracy across three HSI datasets (University of Pavia, Xuzhou, and WHU-Hi-Hanchuan). Full article
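
The Lion update mentioned above (a single momentum buffer and a sign operation) can be sketched as below. This follows the published Lion rule as generally described. The function name `lion_step`, the list-based parameter layout, and the default hyperparameters are illustrative assumptions, not values taken from the paper.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion step over flat lists of floats.

    Returns the updated parameters and the updated momentum buffer,
    which is the only optimizer state that has to be stored.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient, then keep only the direction.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay is added to the signed update.
        new_params.append(p - lr * (update + weight_decay * p))
        # The momentum buffer uses its own, slower decay rate.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum
```

Because only `momentum` is stored per parameter, the optimizer state is about half that of Adam-style methods, which keep two buffers.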

20 pages, 4612 KiB

Open Access Article

Gaze Estimation Method Combining Facial Feature Extractor with Pyramid Squeeze Attention Mechanism

by Jingfang Wei, Haibin Wu, Qing Wu, Yuji Iwahori, Xiaoyu Yu and Aili Wang

Electronics 2023, 12(14), 3104; https://doi.org/10.3390/electronics12143104 - 17 Jul 2023

Cited by 2 | Viewed by 2646

Abstract

To address the issue of reduced gaze estimation accuracy caused by individual differences in different environments, this study proposes a novel gaze estimation algorithm based on attention mechanisms. Firstly, by constructing a facial feature extractor (FFE), the method obtains facial feature information about the eyes and locates the feature areas of the left and right eyes. Then, the L2CSNet ($l_{2}$ loss + cross-entropy loss + softmax layer network), which integrates the PSA (pyramid squeeze attention), is designed to increase the correlation weights related to gaze estimation in the feature areas, suppress other irrelevant weights, and extract more fine-grained feature information to obtain gaze direction features. Finally, by integrating L2CSNet with FFE and PSA, FPSA_L2CSNet was proposed, which is fully tested on four representative publicly available datasets and a real-world dataset comprising individuals of different backgrounds, lighting conditions, nationalities, skin tones, ages, genders, and partial occlusions. The experimental results indicate that the accuracy of the gaze estimation model proposed in this paper has been improved by 13.88%, 11.43%, and 7.34%, compared with L2CSNet, FSE_L2CSNet, and FCBA_L2CSNet, respectively. This model not only improves the robustness of gaze estimation but also provides more accurate estimation results than the original model. Full article

25 pages, 26557 KiB

Open Access Article

Quality of Life Prediction on Walking Scenes Using Deep Neural Networks and Performance Improvement Using Knowledge Distillation

by Thanasit Rithanasophon, Kitsaphon Thitisiriwech, Pittipol Kantavat, Boonserm Kijsirikul, Yuji Iwahori, Shinji Fukui, Kazuki Nakamura and Yoshitsugu Hayashi

Electronics 2023, 12(13), 2907; https://doi.org/10.3390/electronics12132907 - 2 Jul 2023

Cited by 2 | Viewed by 2093

Abstract

The well-being of residents is a top priority for megacities, which is why urban design and sustainable development are crucial topics. Quality of Life (QoL) is used as an effective key performance index (KPI) to measure the efficiency of a city plan’s quantity and quality factors. For city dwellers, QoL for pedestrians is also significant. The walkability concept evaluates and analyzes the QoL in a walking scene. However, the traditional questionnaire survey approach is costly, time-consuming, and limited in its evaluation area. To overcome these limitations, the paper proposes using artificial intelligence (AI) technology to evaluate walkability data collected through a questionnaire survey using virtual reality (VR) tools. The proposed method involves knowledge extraction using deep convolutional neural networks (DCNNs) for information extraction and deep learning (DL) models to infer QoL scores. Knowledge distillation (KD) is also applied to reduce the model size and improve real-time performance. The experiment results demonstrate that the proposed approach is practical and can be considered an alternative method for acquiring QoL. Full article

15 pages, 3510 KiB

Open Access Article

Text Emotion Recognition Based on XLNet-BiGRU-Att

by Tian Han, Zhu Zhang, Mingyuan Ren, Changchun Dong, Xiaolin Jiang and Quansheng Zhuang

Electronics 2023, 12(12), 2704; https://doi.org/10.3390/electronics12122704 - 16 Jun 2023

Cited by 16 | Viewed by 3149

Abstract

Text emotion recognition (TER) is an important natural language processing (NLP) task which is widely used in human–computer interaction, public opinion analysis, mental health analysis, and social network analysis. In this paper, a deep learning model based on XLNet with a bidirectional gated recurrent unit and an attention mechanism (XLNet-BiGRU-Att) is proposed in order to improve TER performance. XLNet is used to build bidirectional language models which can learn contextual information simultaneously; the bidirectional gated recurrent unit (BiGRU) helps to extract more effective features by attending to current and previous states through its hidden layers; and the attention mechanism (Att) assigns different weights to enhance the ‘attention’ paid to important information, thereby improving the quality of word vectors and the accuracy of the sentiment analysis model’s judgments. The proposed model composed of XLNet, BiGRU, and Att improves performance on the whole TER task. Experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database and the Chinese Academy of Sciences Institute of Automation (CASIA) dataset were carried out to compare XLNet-BiGRU-Att, XLNet, BERT, and BERT-BiLSTM, and the results show that the model proposed in this paper has superior performance compared to the others. Full article

22 pages, 4087 KiB

Open Access Article

An Improved Unscented Kalman Filtering Combined with Feature Triangle for Head Position Tracking

by Xiaoyu Yu, Yan Zhang, Haibin Wu and Aili Wang

Electronics 2023, 12(12), 2665; https://doi.org/10.3390/electronics12122665 - 14 Jun 2023

Cited by 2 | Viewed by 2189

Abstract

To address the loss of feature point tracking caused by large head rotation and facial occlusion in doctors, this paper designs a head-position-tracking system based on geometric triangles and unscented Kalman filtering. The three feature points, namely the left and right pupil centers and the tip of the nose, are interconnected to form a coplanar triangle. When the posture of the doctor’s head changes due to rotation, the shape of the corresponding geometric triangle also deforms. Using this relationship, the head posture can be estimated from changes in the geometric model. Because feature points are positioned inaccurately when a head wearing a mask is deflected, traditional linear Kalman filtering algorithms struggle to track feature points accurately. This paper combines geometric triangles with an unscented Kalman filter (UKF) to obtain head posture, which has been fully tested in different environments, such as different faces, wearing/not wearing masks, and dark/bright light, via public and measured datasets. The final experimental results show that, compared to the linear Kalman filtering algorithm with a single feature point, the unscented Kalman filtering algorithm combined with geometric triangles in this paper not only improves the robustness of nonlinear angle of view tracking but also provides more accurate estimates than traditional Kalman filters. Full article

17 pages, 4182 KiB

Open Access Article

X-ray Security Inspection Image Dangerous Goods Detection Algorithm Based on Improved YOLOv4

by Xiaoyu Yu, Wenjun Yuan and Aili Wang

Electronics 2023, 12(12), 2644; https://doi.org/10.3390/electronics12122644 - 12 Jun 2023

Cited by 10 | Viewed by 3635

Abstract

To address the problems of multi-scale and serious overlap of dangerous goods in X-ray security-inspection-image samples, an X-ray dangerous-goods-detection algorithm with high detection accuracy is designed based on an improved YOLOv4. First, YOLOv4’s path-aggregation-network (PANet) module is redesigned using deformable convolution, which can flexibly change its receptive field based on the shape of the detected object. When the high-level information and low-level information are fused in the PANet module, deformable convolution is used to align features, which can effectively improve the detection accuracy. Then, the Focal-EIOU loss function is introduced, which addresses the tendency of the CIOU loss function to cause severe loss-value oscillation when dealing with low-quality samples. During training, the network converges more quickly and the detection accuracy is slightly improved. Finally, Soft-NMS is used to improve the non-maximum suppression of YOLOv4, effectively addressing the high overlap rate of hazardous materials in the X-ray security-inspection dataset and improving accuracy. On the SIXRay dataset, this model achieved APs of 95.73%, 83.00%, 82.95%, 85.13%, and 80.74% for guns, knives, wrenches, pliers, and scissors, respectively, and an mAP of 85.51%. The proposed model can effectively reduce the false-detection rate of dangerous goods in X-ray security images and improve the detection ability of small targets. Full article
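
The Soft-NMS step above is relevant to heavily overlapping objects because it decays the scores of overlapping boxes instead of discarding them. The following is a minimal sketch of the Gaussian variant of Soft-NMS. Which variant and which parameters the paper uses is not stated in the abstract, so `sigma`, `score_threshold`, and the function names are assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS.

    Returns a list of (box_index, decayed_score) in selection order.
    """
    remaining = [[i, s] for i, s in enumerate(scores)]
    kept = []
    while remaining:
        best = max(remaining, key=lambda item: item[1])
        remaining.remove(best)
        if best[1] < score_threshold:
            break  # every remaining score is at most this low
        kept.append((best[0], best[1]))
        for item in remaining:
            overlap = iou(boxes[best[0]], boxes[item[0]])
            # Decay the score smoothly instead of deleting the box outright.
            item[1] *= math.exp(-(overlap ** 2) / sigma)
    return kept
```

A box that fully overlaps a stronger detection is kept with a much lower score rather than removed, so a genuinely separate object hidden behind another still has a chance of surviving the final score cut.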

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

16 pages, 10422 KiB

Open AccessArticle

Crowd Counting by Multi-Scale Dilated Convolution Networks

by Jingwei Dong, Ziqi Zhao and Tongxin Wang

Electronics 2023, 12(12), 2624; https://doi.org/10.3390/electronics12122624 - 10 Jun 2023

Cited by 2 | Viewed by 2541

Abstract

The number of people in a crowd is crucial information in public safety, intelligent monitoring, traffic management, architectural design, and other fields. At present, counting accuracy in public spaces remains compromised by some unavoidable situations, such as the uneven distribution of a crowd and the difference in head scale caused by people’s differing distances from the camera. To solve these problems, we propose a deep learning crowd counting model, multi-scale dilated convolution networks (MSDCNet), based on crowd density map estimation. MSDCNet consists of three parts. The front-end network uses a truncated VGG16 to obtain preliminary features of the input image, with a proposed spatial pyramid pooling (SPP) module replacing the max-pooling layer to extract scale-invariant features. The core network is our proposed multi-scale feature extraction network (MFENet), which extracts features at three different scales. The back-end network consists of consecutive dilated convolution layers, instead of the traditional alternating convolution and pooling layers, to expand the receptive field, extract high-level semantic information, and avoid the loss of spatial features of small-scale heads. The experimental results on three public datasets show that the proposed model addresses the above problems satisfactorily and obtains better counting accuracy than representative models in terms of mean absolute error (MAE) and mean square error (MSE). Full article
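To make the receptive-field argument concrete, the following sketch computes the receptive field of a stack of stride-1 convolutions, where each layer adds (kernel_size - 1) × dilation pixels. The layer configurations are hypothetical and are not taken from MSDCNet.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs, one per layer.
    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


# Hypothetical back-ends: three 3x3 layers, without and with dilation 2.
plain = receptive_field([(3, 1)] * 3)    # 7
dilated = receptive_field([(3, 2)] * 3)  # 13
```

The dilated stack covers a wider area with the same number of weights and without pooling, so the feature map keeps its spatial resolution.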

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

17 pages, 6571 KiB

Open AccessArticle

A Multi-View Face Expression Recognition Method Based on DenseNet and GAN

by Jingwei Dong, Yushun Zhang and Lingye Fan

Electronics 2023, 12(11), 2527; https://doi.org/10.3390/electronics12112527 - 3 Jun 2023

Cited by 6 | Viewed by 2730

Abstract

Facial expression recognition (FER) techniques can be widely used in human-computer interaction, intelligent robots, intelligent monitoring, and other domains. Currently, FER methods based on deep learning have become the mainstream schemes. However, these methods have some problems, such as a large number of parameters, difficulty in being deployed on embedded processors, and recognition accuracy that is affected by facial deflection. To reduce the number of parameters, we propose a DSC-DenseNet model, which replaces the standard convolution in DenseNet with depthwise separable convolution (DSC). To reduce the effect of face deflection on recognition, we propose a posture normalization model based on GAN: a GAN with two local discriminators (LD-GAN) that strengthen the discriminatory abilities of the expression-related local parts, such as the parts related to the eyes, eyebrows, mouth, and nose. These discriminators improve the model’s ability to retain facial expressions and evidently benefit FER. Quantitative and qualitative experimental results on the Fer2013 and KDEF datasets consistently show the superiority of our FER method when working with multi-pose face images. Full article
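The parameter saving from depthwise separable convolution can be checked with simple arithmetic. The sketch below ignores biases, and the channel counts are illustrative rather than taken from DSC-DenseNet.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 128, 3)         # 73728
separable = depthwise_separable_params(64, 128, 3)  # 8768
ratio = separable / standard                        # equals 1/c_out + 1/k**2
```

For this layer the separable form needs roughly 12% of the weights, which is the kind of reduction that makes deployment on embedded processors more practical.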

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

22 pages, 3066 KiB

Open AccessArticle

Speech Emotion Recognition Based on Deep Residual Shrinkage Network

by Tian Han, Zhu Zhang, Mingyuan Ren, Changchun Dong, Xiaolin Jiang and Quansheng Zhuang

Electronics 2023, 12(11), 2512; https://doi.org/10.3390/electronics12112512 - 2 Jun 2023

Cited by 17 | Viewed by 2933

Abstract

Speech emotion recognition (SER) technology is significant for human–computer interaction, and this paper studies the features and modeling of SER. The mel-spectrogram is introduced and used as the speech feature, and its theory and extraction process are presented in detail. A deep residual shrinkage network with bi-directional gated recurrent unit (DRSN-BiGRU) is proposed, which is composed of a convolutional network, a residual shrinkage network, a bi-directional gated recurrent unit, and a fully connected network. Through the self-attention mechanism, DRSN-BiGRU can automatically ignore noisy information and improve its ability to learn effective features. Network optimization and verification experiments are carried out on three emotion datasets (CASIA, IEMOCAP, and MELD), on which the accuracy of DRSN-BiGRU is 86.03%, 86.07%, and 70.57%, respectively. The results are also analyzed and compared with DCNN-LSTM, CNN-BiLSTM, and DRN-BiGRU, which verifies the superior performance of DRSN-BiGRU. Full article
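Residual shrinkage networks are generally described as suppressing noise through soft thresholding with learned thresholds. The abstract does not spell out the shrinkage operation, so the sketch below is an assumption about the general technique rather than the authors' implementation, and the threshold value is fixed by hand for illustration.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become exactly 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def shrink_features(features, tau):
    """Apply soft thresholding element-wise to a list of feature values."""
    return [soft_threshold(value, tau) for value in features]


# Small activations (treated as noise) are zeroed; large ones are kept but shrunk.
shrunk = shrink_features([2.0, 0.3, -0.1, -2.0], tau=0.5)  # [1.5, 0.0, 0.0, -1.5]
```

In a trained network the threshold would be produced per channel by a small attention-style sub-network instead of being a constant.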

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

19 pages, 4531 KiB

Open AccessArticle

Anomalous Behavior Detection with Spatiotemporal Interaction and Autoencoder Enhancement

by Bohao Li, Kai Xie, Xuepeng Zeng, Mingxuan Cao, Chang Wen, Jianbiao He and Wei Zhang

Electronics 2023, 12(11), 2438; https://doi.org/10.3390/electronics12112438 - 27 May 2023

Cited by 1 | Viewed by 2023

Abstract

To reduce the cargo loss rate caused by abnormal consumption behavior in smart retail cabinets, two problems need to be solved. First, the diversity of consumers leads to a diversity of actions within the same behavior, which lowers the accuracy of consumer behavior identification. Second, the difference between normal and abnormal interaction behavior is small, and anomalous features are difficult to define. Therefore, we propose an anomalous behavior detection algorithm with human–object interaction graph convolution and confidence-guided difference enhancement. To address the low accuracy of consumer behavior recognition, including interactive behavior, a human–object interaction graph convolutional network is used to recognize actions and extract video frames of abnormal human behavior. To define anomalies, we delineate the anomalous areas of the abnormal video frames. A confidence-guided anomaly enhancement module performs confidence detection on the encoder-extracted features using a confidence fully connected layer. The experimental results show that the action recognition algorithm has good generalization ability and accuracy, that the screened video frames have obvious destruction characteristics, and that the area under the receiver operating characteristic curve (AUROC) reaches 82.8% in the detection of abnormal areas. Our research provides a new solution for detecting abnormal behavior that destroys commodity packaging, which has considerable application value. Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

13 pages, 2500 KiB

Open AccessArticle

E-ReInForMIF Routing Algorithm Based on Energy Selection and Erasure Code Tolerance Machine

by Qiong Wu, Hai Huang, Xinmiao Lu, Jiaxing Qu, Juntao Gu and Cunfang Yang

Electronics 2023, 12(11), 2408; https://doi.org/10.3390/electronics12112408 - 25 May 2023

Viewed by 1184

Abstract

To address data loss and uneven energy consumption during data transmission in wireless sensor networks, this paper proposes a ReInForM transmission fault-tolerant routing algorithm based on energy selection and erasure code fault-tolerant machines (E-ReInForMIF). The E-ReInForMIF algorithm improves the multi-path routing algorithm by combining an erasure coding fault-tolerant machine with node selection based on sorted residual energy. First, the erasure coding fault-tolerant machine encodes the signal. Next, the number of transmission paths is determined through multi-path routing, and the specific next-hop node is selected by sorting the residual energy of the nodes. Finally, the signal is decoded. In this way, the E-ReInForMIF routing algorithm addresses uneven energy consumption and data loss in data transmission, improving network lifespan and transmission reliability. The simulation results show that the E-ReInForMIF routing algorithm is superior to the ReInForM routing algorithm in improving the reliability of data transmission. Full article
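The two ingredients named in this abstract, erasure coding and next-hop selection by residual energy, can be sketched in a few lines. The abstract does not say which erasure code is used, so the example swaps in the simplest one, a single XOR parity block that tolerates one lost block; the function names, node names, and energy values are all hypothetical.

```python
def xor_parity(blocks):
    """XOR equal-length byte blocks together to form one parity block."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return parity


def recover_missing(received, parity):
    """Rebuild the single block marked None from the others plus the parity."""
    missing = received.index(None)
    present = [block for block in received if block is not None]
    restored = xor_parity(present + [parity])
    return received[:missing] + [restored] + received[missing + 1:]


def select_next_hops(residual_energy, num_paths):
    """Pick the num_paths neighbours with the most residual energy."""
    ranked = sorted(residual_energy, key=residual_energy.get, reverse=True)
    return ranked[:num_paths]


# Encode, lose one block in transit, then decode at the sink.
data = [b"ab", b"cd", b"ef"]
parity = xor_parity(data)
decoded = recover_missing([b"ab", None, b"ef"], parity)

# Route over the two neighbours with the highest remaining energy.
hops = select_next_hops({"n1": 0.4, "n2": 0.9, "n3": 0.7}, num_paths=2)
```

Preferring high-energy neighbours spreads the forwarding load, while the redundancy from the code lets the sink tolerate a failed path without retransmission.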

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

16 pages, 54182 KiB

Open AccessArticle

Domain-Aware Adaptive Logarithmic Transformation

by Xuelai Fang and Xiangchu Feng

Electronics 2023, 12(6), 1318; https://doi.org/10.3390/electronics12061318 - 9 Mar 2023

Cited by 3 | Viewed by 2053

Abstract

Tone mapping (TM) aims to display high dynamic range scenes on media with limited visual information reproduction. Logarithmic transformation is a widely used preprocessing method in TM algorithms. However, the conventional logarithmic transformation does not take differences in image properties into account, nor does it consider whether the TM algorithm is designed around luminance-domain or gradient-domain features. This leads to problems such as oversaturation and loss of detail. Based on an analysis of existing preprocessing methods, this paper proposes a domain-aware adaptive logarithmic transformation, AdaLogT, as a preprocessing method for TM algorithms. We introduce the parameter p and construct different objective functions for TM algorithms in different domains to determine the optimal parameter values adaptively. Specifically, for luminance-domain algorithms, we use image exposure and histogram features to construct the objective function, while for gradient-domain algorithms, we introduce the texture-aware exponential mean local variance (EMLV) to build the objective function. Finally, we propose a joint domain-aware logarithmic preprocessing method for deep-neural-network-based TM algorithms. The experimental results show that AdaLogT endows each domain algorithm with wider scene adaptability and improves performance in terms of visual effects and objective evaluations: the subjective and objective index scores of the tone mapping quality index improve by 6.04% and 5.90% on average across the algorithms. Full article
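The abstract does not give the formula of AdaLogT, so the following sketch only illustrates the general idea of a logarithmic transformation controlled by one parameter. The specific form log(1 + p·L) / log(1 + p·L_max) is an assumption chosen for illustration, and the values of p are arbitrary.

```python
import math


def log_transform(luminance, p, l_max):
    """Hypothetical parameterized log mapping of [0, l_max] onto [0, 1].

    Larger p compresses highlights more strongly and lifts dark regions.
    """
    return math.log1p(p * luminance) / math.log1p(p * l_max)


# A pixel at 10% of the maximum luminance under two settings of p.
mild = log_transform(0.1, p=1.0, l_max=1.0)      # about 0.14
strong = log_transform(0.1, p=100.0, l_max=1.0)  # about 0.52
```

An adaptive scheme would choose p per image by optimizing an objective function, as the abstract describes, rather than fixing it by hand.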

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

15 pages, 1960 KiB

Open AccessArticle

Improved Reconstruction Algorithm of Wireless Sensor Network Based on BFGS Quasi-Newton Method

by Xinmiao Lu, Cunfang Yang, Qiong Wu, Jiaxu Wang, Yuhan Wei, Liyu Zhang, Dongyuan Li and Lanfei Zhao

Electronics 2023, 12(6), 1267; https://doi.org/10.3390/electronics12061267 - 7 Mar 2023

Cited by 6 | Viewed by 1570

Abstract

To address the low reconstruction rate and poor reconstruction precision of sparse signal reconstruction in wireless sensor networks, a sparse signal reconstruction algorithm based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton method is proposed. The L-BFGS quasi-Newton method uses a two-loop recursion to find the descent direction d_k directly from the step differences between m adjacent iteration points, which implicitly constructs a matrix H_k approximating the inverse of the Hessian matrix. This removes the need in BFGS to calculate and store H_k explicitly, reduces the algorithm complexity, and improves the reconstruction rate. The experimental results show that the L-BFGS quasi-Newton method performs well on sparse signal reconstruction in wireless sensor networks. Full article
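The two-loop recursion can be written compactly. The sketch below is the textbook form of the recursion, not the authors' code; it assumes the history is stored oldest first, that every pair satisfies y·s > 0, and that the initial scaling is the common choice s·y / y·y.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def lbfgs_direction(grad, s_list, y_list):
    """Return d_k = -H_k * grad without ever forming H_k.

    s_list[i] = x_{i+1} - x_i and y_list[i] = g_{i+1} - g_i, oldest first,
    holding at most the m most recent pairs.
    """
    q = list(grad)
    history = []
    # First loop: walk from the newest pair back to the oldest.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        history.append((alpha, rho))
    # Initial inverse-Hessian guess: a scaled identity.
    gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1]) if s_list else 1.0
    r = [gamma * qi for qi in q]
    # Second loop: walk from the oldest pair forward to the newest.
    for (s, y), (alpha, rho) in zip(zip(s_list, y_list), reversed(history)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]
```

Only the m stored vector pairs are touched, so the cost per iteration is linear in the signal dimension, whereas BFGS would update and store a dense matrix.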

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

12 pages, 5299 KiB

Open AccessArticle

Detecting Human Falls in Poor Lighting: Object Detection and Tracking Approach for Indoor Safety

by Xing Zi, Kunal Chaturvedi, Ali Braytee, Jun Li and Mukesh Prasad

Electronics 2023, 12(5), 1259; https://doi.org/10.3390/electronics12051259 - 6 Mar 2023

Cited by 21 | Viewed by 5445

Abstract

Falls are one of the leading causes of accidental death for all people, but the elderly are at particularly high risk. Falls are a severe issue in the care of elderly people who live alone and have limited access to health aides and skilled nursing care. Conventional vision-based systems for fall detection are prone to failure in conditions with low illumination. Therefore, an automated system that detects falls in low-light conditions has become an urgent need for protecting vulnerable people. This paper proposes a novel vision-based fall detection system that uses object tracking and image enhancement techniques. The proposed approach is divided into two parts. First, the captured frames are optimized using a dual illumination estimation algorithm. Next, a deep-learning-based tracking framework that includes detection by YOLOv7 and tracking by the Deep SORT algorithm is proposed to perform fall detection. We evaluate the proposed method on the Le2i fall and UR fall detection (URFD) datasets and demonstrate the effectiveness of fall detection in dark night environments with obstacles. Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

17 pages, 6796 KiB

Open AccessArticle

Deep Signal-Dependent Denoising Noise Algorithm

by Lanfei Zhao, Shijun Li and Jun Wang

Electronics 2023, 12(5), 1201; https://doi.org/10.3390/electronics12051201 - 2 Mar 2023

Cited by 2 | Viewed by 2167

Abstract

Although many existing methods for estimating the noise parameters of signal-dependent image noise achieve a certain denoising effect, most are not ideal. They suffer from problems such as poor noise suppression, over-smoothed details, and a lack of flexible denoising ability. To solve these problems, we propose a deep signal-dependent denoising noise algorithm that combines the model-based method with a convolutional neural network. We feed the noise level of the noisy image together with the noisy image itself into the convolutional neural network, which lets it handle a wider range of noise levels than a network that takes only the noisy image as input. Within the convolutional neural network, the deep features of the image are extracted by multi-layer residuals, which eases the difficulty of training. Extensive experiments demonstrate that our noise parameter estimation has good denoising performance. Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

19 pages, 5042 KiB

Open AccessArticle

Research on Spectrum Prediction Technology Based on B-LTF

by Xue Wang, Qian Chen and Xiaoyang Yu

Electronics 2023, 12(1), 247; https://doi.org/10.3390/electronics12010247 - 3 Jan 2023

Cited by 8 | Viewed by 2419

Abstract

With the rapid development of global communication technology, the scarcity of spectrum resources has become increasingly prominent. Spectrum prediction technology has emerged to alleviate this problem, make rational use of limited spectrum resources, and improve frequency utilization. Effective prediction of spectrum usage reduces the number of subsequent spectrum sensing processes and increases the accuracy of spectrum decisions, which improves the response speed of cognitive radio technology as a whole. The rise of deep learning has brought changes to traditional spectrum prediction algorithms. This paper proposes a spectrum prediction method called Back Propagation-Long short-term memory Time Forecasting (B-LTF), which uses a Back Propagation-Long Short-term Memory (BP-LSTM) network model. The future spectrum trend and the channel state at future time nodes are predicted from historical spectrum data. The purpose of our research is to achieve dynamic spectrum access by improving the accuracy of spectrum prediction and better assisting cognitive radio technology. By comparing the model with BP, LSTM, and Gated Recurrent Unit (GRU) network models, we show that the improved recurrent network model handles time series more effectively. The simulation results show that the proposed model has better prediction performance and that the change in time series length has a significant impact on the prediction accuracy of the deep learning model. Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

21 pages, 5585 KiB

Open AccessArticle

Research on Analog Circuit Soft Fault Diagnosis Method Based on Mathematical Morphology Fractal Dimension

by Xinmiao Lu, Cunfang Yang, Qiong Wu, Jiaxu Wang, Zihan Lu, Shuai Sun, Kaiyi Liu and Dan Shao

Electronics 2023, 12(1), 184; https://doi.org/10.3390/electronics12010184 - 30 Dec 2022

Cited by 7 | Viewed by 1738

Abstract

It is difficult for traditional circuit-fault feature-extraction methods to accurately distinguish between nonlinear analog-circuit faults and analog-circuit faults with high fault rates and high diagnostic costs. To solve this problem, this paper proposes a mathematical morphology fractal dimension method based on variational mode decomposition (VMD-MMFD) for soft-fault feature extraction in analog circuits. First, the signal is decomposed into variational modes to suppress the influence of environmental noise, yielding multiple high-dimensional intrinsic mode functions (IMFs) with different center frequencies. The fractal dimension of each IMF component carrying signal feature information is then calculated, and KPCA (Kernel Principal Component Analysis) is used to remove the overlapping and redundant parts of the data. The resulting fault set is used as the basis for judging the working state and the fault type of the circuit. The experimental results on simulation circuits show that this method can be effectively used for circuit-fault diagnosis. Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

16 pages, 5959 KiB

Open AccessArticle

Soft Fault Diagnosis of Analog Circuit Based on EEMD and Improved MF-DFA

by Xinmiao Lu, Zihan Lu, Qiong Wu, Jiaxu Wang, Cunfang Yang, Shuai Sun, Dan Shao and Kaiyi Liu

Electronics 2023, 12(1), 114; https://doi.org/10.3390/electronics12010114 - 27 Dec 2022

Cited by 6 | Viewed by 2038

Abstract

To address the nonlinearity and serious confusion of fault characteristics in analog circuits, this paper proposes a fault diagnosis method for analog circuits based on ensemble empirical mode decomposition (EEMD) and improved multifractal detrended fluctuation analysis (MF-DFA). The method consists of three steps: preprocessing, feature extraction, and fault classification. First, EEMD preprocesses (denoises) the original signal, and the appropriate IMF components are selected by correlation analysis. Next, the IMF components are processed by the improved MF-DFA, and the fault feature values are extracted by calculating the multifractal spectrum parameters. Finally, the feature values are input to a support vector machine (SVM) for classification, which enables the diagnosis of soft faults in analog circuits. The experimental results show that the proposed EEMD-improved MF-DFA method effectively extracts the features of soft faults in nonlinear analog circuits and obtains a high diagnosis rate. Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

14 pages, 4862 KiB

Open AccessArticle

Tone Mapping Method Based on the Least Squares Method

by Lanfei Zhao, Guoqing Li and Jun Wang

Electronics 2023, 12(1), 31; https://doi.org/10.3390/electronics12010031 - 22 Dec 2022

Cited by 5 | Viewed by 2371

Abstract

Tone mapping is used to compress the dynamic range of image data without distortion. To compress the dynamic range of HDR images and prevent halo artifacts, a tone mapping method based on the least squares method is proposed. Our method first uses weights to estimate the illumination, and the image detail layer is obtained by the Retinex model. Then, a parameterized global tone mapping function is used to compress the dynamic range, and its parameter is obtained by fitting the function to the histogram equalization. Finally, the detail layer and the illumination layer are fused to obtain the LDR image. The experimental results show that the proposed method can efficiently restore real-world scene information while preventing halo artifacts. Specifically, the tone mapping quality index and mean Weber contrast of the tone-mapped images are 8% and 12% higher, respectively, than those of the closest competing tone mapping method. Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

13 pages, 5298 KiB

Open AccessArticle

Three-Stage Tone Mapping Algorithm

by Lanfei Zhao, Ruiyang Sun and Jun Wang

Electronics 2022, 11(24), 4072; https://doi.org/10.3390/electronics11244072 - 7 Dec 2022

Cited by 3 | Viewed by 2410

Abstract

In this paper, a tone mapping algorithm is presented to map real-world luminance into displayed luminance. Our purpose is to reveal the local contrast of real-world scenes on a conventional monitor. To this end, we propose a three-stage algorithm to visualize high dynamic range images. All pixels of a high dynamic range image are classified into three groups. In the first stage, we introduce piecewise linear mapping as the global tone mapping operator to map the luminance of the first group, which provides an overall impression of luminance. In the second stage, the luminance of each pixel in the second group is determined by the weighted average of its neighborhood pixels, which are drawn from the first group’s pixels. In the third stage, the luminance of each pixel in the third group is determined by the weighted average of its neighborhood pixels, which are drawn from the second group’s pixels. Experimental results on several real-world images and the TMQI database show that our algorithm improves the visibility of real-world scenes, with mean opinion scores and tone-mapped image quality index scores about 12% and 9% higher, respectively, than those of the closest competing tone mapping methods. Compared to existing tone mapping methods, our algorithm produces visually compelling results without halo artifacts or loss of detail. Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

12 pages, 7200 KiB

Open AccessArticle

Furniture Image Classification Based on Depthwise Group Over-Parameterized Convolution

by Han Ye, Xiaodong Zhu, Chengyang Liu, Linlin Yang and Aili Wang

Electronics 2022, 11(23), 3889; https://doi.org/10.3390/electronics11233889 - 24 Nov 2022

Cited by 5 | Viewed by 2788

Abstract

In this paper, an improved VGG16 model combined with depthwise group over-parameterized convolution (DGOVGG16) is proposed to realize automatic furniture image classification. First, depthwise over-parameterized convolution is combined with group convolution to construct depthwise group over-parameterized convolution, which is introduced into the VGG16 model to reduce the number of parameters of the overall model while extracting richer semantic features from furniture images. Then, the ReLU activation function is used in the former part of the neural network to reduce the correlation between parameters and accelerate the weight updates in that part of the model. Meanwhile, the proposed model applies the Leaky-ReLU activation function in the last layer to avoid the problem of some neurons not updating. Experimental comparisons with six furniture image classification methods based on MobileNetV2, AlexNet, ShuffleNetv2, GoogleNet, VGG16, and GVGG16 show that the proposed DGOVGG16, with an average accuracy (AA) of 95.51%, has better classification performance. Full article
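The reason Leaky-ReLU avoids neurons that stop updating can be seen from the gradients of the two activations. This is a minimal sketch of the standard definitions; the negative slope of 0.01 is a common default and not a value reported in the abstract.

```python
def relu(x):
    """Standard ReLU: negative inputs are clipped to zero."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: zero for negative inputs, so no weight update flows back."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope instead of clipped."""
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x, negative_slope=0.01):
    """Gradient of Leaky ReLU: never exactly zero, so the neuron can still learn."""
    return 1.0 if x > 0 else negative_slope
```

A neuron whose pre-activation stays negative receives `relu_grad(x) == 0.0` on every sample and never recovers, whereas `leaky_relu_grad(x)` still passes a small gradient.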

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

18 pages, 1030 KiB

Open AccessArticle

Multi-Task Learning for Scene Text Image Super-Resolution with Multiple Transformers

by Kosuke Honda, Masaki Kurematsu, Hamido Fujita and Ali Selamat

Electronics 2022, 11(22), 3813; https://doi.org/10.3390/electronics11223813 - 20 Nov 2022

Cited by 3 | Viewed by 2620

Abstract

Scene text image super-resolution aims to improve readability by recovering text shapes from low-resolution degraded text images. Although recent developments in deep learning have greatly improved super-resolution (SR) techniques, recovering text images with irregular shapes, heavy noise, and blurriness is still challenging. This is because networks with Convolutional Neural Network (CNN)-based backbones cannot sufficiently capture the global long-range correlations of text images or detailed sequential information about the text structure. To address this issue, this paper proposes a Multi-task learning-based Text Super-resolution (MTSR) Network. MTSR is a multi-task architecture for image reconstruction and SR. It uses transformer-based modules to transfer complementary features of the reconstruction model, such as noise removal capability and text structure information, to the SR model. In addition, another transformer-based module using 2D positional encoding is used to handle irregular deformations of the text. The feature maps generated by these two transformer-based modules are fused to improve the visual quality of images with heavy noise, blurriness, and irregular deformations. Experimental results on the TextZoom dataset and several scene text recognition benchmarks show that our MTSR significantly improves the accuracy of existing text recognizers. Full article

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

13 pages, 2630 KiB

Open AccessArticle

Research on Retinex Algorithm Combining with Attention Mechanism for Image Enhancement

by Mingzhu Liu, Junyu Chen and Xiaofei Han

Electronics 2022, 11(22), 3695; https://doi.org/10.3390/electronics11223695 - 11 Nov 2022

Cited by 6 | Viewed by 2599

Abstract

Considering the high noise and chromatic aberration in Retinex-Net image enhancement results, this paper puts forward a modified Retinex-Net algorithm for weak-illumination image enhancement based on the Decom-Net and Enhance-Net structures of Retinex-Net. The improved structure adds the attention mechanism ECA-Net to the convolution layers of Decom-Net and Enhance-Net in the original Retinex-Net structure, which can effectively mitigate the problems of irrelevant background and local brightness imbalance, activate sensitive features, and improve the handling of image details and brightness. Additionally, deep connected attention networks are embedded between the introduced attention modules, so that all of the attention modules can be trained jointly to improve the learning ability. Furthermore, the improved method introduces a noise reduction loss function and a color loss function to suppress noise and reduce image color distortion. The test results of the proposed method indicate that the image's overall brightness is balanced, local areas are not overexposed, and more image details and color information are retained than with other enhancement algorithms.

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)
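
The abstract above adds ECA-Net channel attention to the Retinex-Net convolution layers. As a rough illustration of the general ECA idea (global average pooling per channel, a small 1D convolution across neighboring channels, a sigmoid gate, then channel-wise rescaling), here is a dependency-free sketch. The function name `eca_attention` and the fixed kernel weights are hypothetical; in a real network the kernel is learned, and this sketch does not reproduce the authors' modified structure.

```python
import math


def eca_attention(feature_maps, kernel):
    """Apply ECA-style channel attention.

    feature_maps: list of C channels, each an H x W list of lists.
    kernel: odd-length list of 1D convolution weights (assumed given here;
            it would be a learned parameter in practice).
    Returns (rescaled feature maps, per-channel attention weights).
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2

    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]

    # 2) 1D convolution over the channel axis with zero padding, so each
    #    channel's weight depends only on its k nearest neighbors.
    padded = [0.0] * pad + pooled + [0.0] * pad
    weights = []
    for i in range(len(pooled)):
        z = sum(kernel[j] * padded[i + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-z)))  # 3) sigmoid gate

    # 4) Rescale every value in a channel by that channel's weight.
    scaled = [
        [[v * w for v in row] for row in ch]
        for ch, w in zip(feature_maps, weights)
    ]
    return scaled, weights


maps = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
scaled, weights = eca_attention(maps, kernel=[0.0, 1.0, 0.0])
```

The local 1D convolution is what keeps the module lightweight: it avoids the fully connected bottleneck used by squeeze-and-excitation style attention.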

9 pages, 2234 KiB

Open Access Article

A Coverage Hole Patching Algorithm for Heterogeneous Wireless Sensor Networks

by Xinmiao Lu, Yuhan Wei, Qiong Wu, Cunfang Yang, Dongyuan Li, Liyu Zhang and Ying Zhou

Electronics 2022, 11(21), 3563; https://doi.org/10.3390/electronics11213563 - 1 Nov 2022

Cited by 5 | Viewed by 1832

Abstract

Improving coverage is a critical issue in the coverage hole patching of sensor networks. Traditionally, the VOPR and VORCP algorithms improve the coverage of the detection area by improving the original VOR algorithm, but these coverage hole patching algorithms only target homogeneous networks. In the real world, however, the nodes in a wireless sensor network (WSN) are often heterogeneous, i.e., the sensors have different sensing radii. The VORPH algorithm applies VOR to a hybrid heterogeneous network and improves the original algorithm: the patched nodes are better utilized, and the detection range is enlarged. However, the utilization rate of the patched nodes is not optimized, making it impossible to patch the coverage holes to the maximum degree. For hybrid heterogeneous WSNs, we propose a coverage hole patching algorithm with a priority mechanism. The algorithm determines the patching priority based on the size of the coverage holes, thereby improving network coverage, reducing node redundancy, and balancing resource allocation. The proposed algorithm was compared with traditional coverage hole patching algorithms under the same environment through simulation and analysis. The results show that our algorithm achieves a higher coverage rate than the traditional algorithms and reduces node redundancy.

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)
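
The abstract above describes a priority mechanism that orders patching by coverage hole size. The sketch below shows only that ordering idea with a max-heap: the largest remaining hole is always served next, and the largest spare node is used first. The function name `patch_by_priority`, the greedy node assignment, and the crude assumption that a node of radius r removes πr² from a hole are all hypothetical simplifications, not the authors' algorithm.

```python
import heapq
import math


def patch_by_priority(hole_areas, spare_radii):
    """Greedy, size-prioritized patching plan.

    hole_areas: dict mapping hole id -> uncovered area.
    spare_radii: sensing radii of the nodes available for patching.
    Returns a list of (hole_id, radius) assignments in patching order.
    """
    # heapq is a min-heap, so store negated areas to pop the largest hole.
    heap = [(-area, hole_id) for hole_id, area in hole_areas.items()]
    heapq.heapify(heap)

    plan = []
    for radius in sorted(spare_radii, reverse=True):
        if not heap:
            break  # every hole is patched; remaining nodes stay idle
        neg_area, hole_id = heapq.heappop(heap)
        plan.append((hole_id, radius))
        # Idealized assumption: the node covers a full disc inside the hole.
        remaining = -neg_area - math.pi * radius ** 2
        if remaining > 0:
            heapq.heappush(heap, (-remaining, hole_id))
    return plan


plan = patch_by_priority({"a": 10.0, "b": 12.0}, [1.0, 2.0])
```

Stopping as soon as the heap is empty is the part that relates to node redundancy: nodes are not deployed once no hole remains.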

15 pages, 4324 KiB

Open Access Article

Research on Small Acceptance Domain Text Detection Algorithm Based on Attention Mechanism and Hybrid Feature Pyramid

by Mingzhu Liu, Ben Li and Wei Zhang

Electronics 2022, 11(21), 3559; https://doi.org/10.3390/electronics11213559 - 31 Oct 2022

Cited by 3 | Viewed by 1803

Abstract

In the traditional text detection process, text areas with a small receptive field in video images are easily ignored, few features can be extracted, and the computational cost is high. These problems hinder the recognition of text information. This paper proposes a lightweight network structure based on the EAST algorithm that incorporates the Convolution Block Attention Module (CBAM), a hybrid spatial and channel attention module suited to extracting text features from natural scene video images. The improved structure can obtain deep network features of text and reduce the computation of text feature extraction. Additionally, a hybrid feature pyramid + BLSTM network is designed to improve the attention paid to small acceptance domain text regions and to the text sequence features of those regions. The test results on ICDAR2015 demonstrate that the improved structure can effectively boost the attention paid to small acceptance domain text regions and improve the sequence feature detection accuracy for small acceptance domain long text regions without significantly increasing computation. At the same time, the proposed network structures are superior to the traditional EAST algorithm and other improved algorithms in precision P, recall R, and F-value.

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)
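
The abstract above reports results as P, R, and F-value. For readers unfamiliar with these metrics, the sketch below computes precision, recall, and their harmonic mean from detection counts. The function name and the example counts are hypothetical, and the step of matching detections to ground-truth boxes (which a benchmark such as ICDAR2015 defines) is assumed to have been done already.

```python
def precision_recall_f(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-value) from detection counts."""
    detected = true_positives + false_positives   # boxes the detector output
    actual = true_positives + false_negatives     # ground-truth text regions
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical counts: 80 correct detections, 20 spurious, 40 missed.
p, r, f = precision_recall_f(80, 20, 40)
```

Because the F-value is a harmonic mean, it stays low unless both precision and recall are high, which is why it is commonly reported alongside the two.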

13 pages, 2045 KiB

Open Access Article

Modulation Recognition of Digital Signal Using Graph Feature and Improved K-Means

by Guodong Li, Xvan Qin, He Liu, Kaiyuan Jiang and Aili Wang

Electronics 2022, 11(20), 3298; https://doi.org/10.3390/electronics11203298 - 13 Oct 2022

Cited by 3 | Viewed by 2031

Abstract

Automatic modulation recognition (AMR) has been widely used in both military and civilian fields. Since recognizing digital signals at a low signal-to-noise ratio (SNR) is difficult and complex, this paper proposes a clustering analysis algorithm for the task. First, the digital signal constellation is extracted from the received waveform (digital signal + noise) by orthogonal decomposition, and it is then denoised with an algorithm referred to as auto density-based spatial clustering technique in noise (ADBSCAN). A combination of the density peak clustering (DPC) algorithm and improved K-means clustering is used to extract the constellation's graph features, the eigenvalues are input into cascaded support vector machine (SVM) multi-classifiers, and the signal modulation mode is obtained. Five kinds of digital signals (BPSK, QPSK, 8PSK, 16QAM, and 32QAM) are trained and classified by the proposed method. Compared with classical machine learning algorithms, the proposed algorithm has higher recognition accuracy at low SNR (less than 4 dB), which confirms that the proposed modulation recognition method is effective in non-cooperative communication systems.

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)
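
The abstract above clusters constellation points to extract graph features. The sketch below runs plain Lloyd-style K-means on complex-valued constellation samples to recover cluster centers. It is the textbook algorithm, not the improved K-means or the DPC initialization from the paper, and the function name `kmeans_complex`, the initial centers, and the sample points are hypothetical.

```python
def kmeans_complex(points, initial_centers, iterations=20):
    """Plain K-means on complex samples (I + jQ constellation points)."""
    centers = list(initial_centers)
    for _ in range(iterations):
        # Assignment step: attach each sample to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers


# Hypothetical QPSK samples: four ideal symbols with small offsets.
symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
offsets = [0.1, -0.1, 0.1j, -0.1j]
samples = [s + o for s in symbols for o in offsets]
centers = kmeans_complex(samples, [0.5 + 0.5j, -0.5 + 0.5j, -0.5 - 0.5j, 0.5 - 0.5j])
```

The number and geometry of the recovered centers are the kind of information a downstream classifier could use to tell modulation schemes apart.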

Review

Jump to: Editorial, Research

24 pages, 11183 KiB

Open Access Review

Deep Learning in the Phase Extraction of Electronic Speckle Pattern Interferometry

by Wenbo Jiang, Tong Ren and Qianhua Fu

Electronics 2024, 13(2), 418; https://doi.org/10.3390/electronics13020418 - 19 Jan 2024

Cited by 18 | Viewed by 3194

Abstract

Electronic speckle pattern interferometry (ESPI) is widely used in fields such as materials science, biomedical research, surface morphology analysis, and optical component inspection because of its high measurement accuracy, broad frequency range, and ease of measurement. Phase extraction is a critical stage in ESPI. However, conventional phase extraction methods exhibit problems such as low accuracy, slow processing speed, and poor generalization. With the continuous development of deep learning in image processing, the application of deep learning to phase extraction from electronic speckle interferometry images has become a critical research topic. This paper reviews the principles and characteristics of ESPI and comprehensively analyzes the phase extraction processes for fringe patterns and wrapped phase maps. The application, advantages, and limitations of deep learning techniques in filtering, fringe skeleton line extraction, and phase unwrapping algorithms are discussed based on the representation of measurement results. Finally, this paper provides a perspective on future trends in this field, such as the construction of physical models for electronic speckle interferometry, the improvement and optimization of deep learning models, and the quantitative evaluation of phase extraction quality.

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)
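
The abstract above lists phase unwrapping among the steps where deep learning is applied. To make the task concrete, the sketch below implements the classical one-dimensional unwrapping rule: wrap each successive phase difference into [-π, π) and accumulate. This is a conventional baseline shown for illustration, not a method from the review; the function name `unwrap_1d` and the test ramp are hypothetical, and the rule assumes the true phase changes by less than π between samples.

```python
import math


def unwrap_1d(wrapped):
    """Unwrap a 1D sequence of phases given in [-pi, pi)."""
    if not wrapped:
        return []
    unwrapped = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        diff = cur - prev
        # Map the difference into [-pi, pi) to undo any 2*pi jump.
        diff = (diff + math.pi) % (2 * math.pi) - math.pi
        unwrapped.append(unwrapped[-1] + diff)
    return unwrapped


# Hypothetical smooth phase ramp, wrapped and then recovered.
true_phase = [0.5 * i for i in range(20)]
wrapped = [(t + math.pi) % (2 * math.pi) - math.pi for t in true_phase]
recovered = unwrap_1d(wrapped)
```

The rule fails when noise or steep fringes push a true difference beyond π, which is one reason robust unwrapping of speckle data is an active topic.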

Published Papers (36 papers)
