Deep Learning-Based Image Restoration and Object Identification

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (31 January 2025) | Viewed by 24068

Special Issue Editors

Guest Editor
Key Laboratory of Manufacturing Industrial Integrated, Shenyang University, Shenyang 110044, China
Interests: image restoration; object tracking and re-identification

Guest Editor
School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen 518055, China
Interests: object tracking; action recognition; image restoration

Guest Editor
State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
Interests: image restoration; object detection and tracking

Special Issue Information

Dear Colleagues,

Image restoration and object identification are among the most challenging tasks in computer vision. Image restoration is essential to the success of subsequent stages of a vision pipeline, such as detection and segmentation, since it recovers useful textural and structural information and suppresses the effect of irrelevant information. Object identification is a computer vision technology that deals with recognizing instances of semantic objects (such as humans, buildings, or cars) in images and videos. It has attracted increasing attention in recent years due to its wide range of applications, such as security monitoring, autonomous driving, transportation surveillance, and robotic vision. This Special Issue aims to explore recent advances and trends in the use of deep learning and computer vision methods for image restoration and object identification, and seeks original contributions that point out possible ways to deal with image data recovery and identification. Topics include, but are not limited to, deep learning techniques, low-level image processing, image restoration, object recognition/detection, and person/car re-identification.

Dr. Qiang Wang
Dr. Weihong Ren
Dr. Huijie Fan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image restoration
  • object identification
  • person re-identification
  • object detection and tracking
  • autonomous driving
  • scene understanding
  • transfer learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (16 papers)


Research


21 pages, 1127 KiB  
Article
Efficient Compression of Red Blood Cell Image Dataset Using Joint Deep Learning-Based Pattern Classification and Data Compression
by Zerin Nusrat, Md Firoz Mahmud and W. David Pan
Electronics 2025, 14(8), 1556; https://doi.org/10.3390/electronics14081556 - 11 Apr 2025
Viewed by 236
Abstract
Millions of people across the globe are affected by the life-threatening disease of Malaria. To achieve the remote screening and diagnosis of the disease, the rapid transmission of large-size microscopic images is necessary, thereby demanding efficient data compression techniques. In this paper, we argued that well-classified images might lead to higher overall compression of the images in the datasets. To this end, we investigated the novel approach of joint pattern classification and compression of microscopic red blood cell images. Specifically, we used deep learning models, including a vision transformer and convolutional autoencoders, to classify red blood cell images into normal and Malaria-infected patterns, prior to applying compression on the images classified into different patterns separately. We evaluated the impacts of varying classification accuracy on overall image compression efficiency. The results highlight the importance of the accurate classification of images in improving overall compression performance. We demonstrated that the proposed deep learning-based joint classification/compression method offered superior performance compared with traditional lossy compression approaches such as JPEG and JPEG 2000. Our study provides useful insights into how deep learning-based pattern classification could benefit data compression, which would be advantageous in telemedicine, where large-image-size reduction and high decoded image quality are desired. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
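
The core idea, classifying images into homogeneous patterns and then compressing each group separately, can be illustrated in a few lines. The sketch below is a toy stand-in (not the authors' pipeline): random arrays replace the red blood cell images, and a generic byte compressor replaces the learned codecs.

```python
# Illustrative sketch only: class-wise compression of an image set,
# assuming a classifier has already labeled each image (e.g., normal vs.
# Malaria-infected). Grouping similar patterns tends to improve ratios.
import zlib
import numpy as np

def compress_by_class(images, labels):
    """Concatenate images per predicted class and compress each group separately."""
    compressed = {}
    for cls in np.unique(labels):
        group = np.stack([img for img, y in zip(images, labels) if y == cls])
        compressed[cls] = zlib.compress(group.tobytes(), level=9)
    return compressed

# Toy usage with random 32x32 grayscale "cells" and binary labels.
rng = np.random.default_rng(0)
imgs = [rng.integers(0, 256, (32, 32), dtype=np.uint8) for _ in range(8)]
labs = np.array([0, 1, 0, 1, 0, 1, 0, 1])
blobs = compress_by_class(imgs, labs)
print({cls: len(blob) for cls, blob in blobs.items()})  # compressed bytes per class
```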

29 pages, 14024 KiB  
Article
Side-Scan Sonar Image Classification Based on Joint Image Deblurring–Denoising and Pre-Trained Feature Fusion Attention Network
by Baolin Xie, Hongmei Zhang and Weihan Wang
Electronics 2025, 14(7), 1287; https://doi.org/10.3390/electronics14071287 - 25 Mar 2025
Viewed by 241
Abstract
Side-Scan Sonar (SSS) is widely used in underwater rescue operations and the detection of seabed targets, such as shipwrecks, drowning victims, and aircraft. However, the quality of sonar images is often degraded by noise sources like reverberation and speckle noise, which complicate the extraction of effective features. Additionally, challenges such as limited sample sizes and class imbalances are prevalent in side-scan sonar image data. These issues directly impact the accuracy of deep learning-based target classification models for SSS images. To address these challenges, we propose a side-scan sonar image classification model based on joint image deblurring–denoising and a pre-trained feature fusion attention network. Firstly, by employing transform domain filtering in conjunction with upsampling and downsampling techniques, the joint image deblurring–denoising approach effectively reduces image noise while preserving and enhancing edge and texture features. Secondly, a feature fusion attention network based on transfer learning is employed for image classification. Through the transfer learning approach, a feature extractor based on depthwise separable convolutions and densely connected networks is trained to effectively address the challenge of limited training samples. Subsequently, a dual-path feature fusion strategy is utilized to leverage the complementary strengths of different feature extraction networks. Furthermore, by incorporating channel attention and spatial attention mechanisms, key feature channels and regions are adaptively emphasized, thereby enhancing the accuracy and robustness of image classification. Finally, the Gradient-weighted Class Activation Mapping (Grad-CAM) technique is integrated into the proposed model to ensure interpretability and transparency. Experimental results show that our model achieves a classification accuracy of 96.80% on a side-scan sonar image dataset, confirming the effectiveness of this method for SSS image classification. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
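
A minimal sketch of the dual-path fusion idea, assuming two off-the-shelf torchvision backbones (one built on depthwise separable convolutions, one densely connected) as stand-ins for the paper's transfer-learned extractors:

```python
# Hedged sketch, not the authors' network: fuse features from two backbones
# with complementary inductive biases, then classify from the joint vector.
import torch
import torch.nn as nn
import torchvision.models as models

class DualPathFusion(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        # weights=None keeps the sketch offline; in practice pretrained
        # weights ("DEFAULT") would realize the transfer-learning step.
        self.path_a = models.mobilenet_v2(weights=None).features   # depthwise separable convs
        self.path_b = models.densenet121(weights=None).features    # dense connections
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(1280 + 1024, num_classes)            # concatenated widths

    def forward(self, x):
        fa = self.pool(self.path_a(x)).flatten(1)
        fb = self.pool(self.path_b(x)).flatten(1)
        return self.head(torch.cat([fa, fb], dim=1))

logits = DualPathFusion()(torch.randn(2, 3, 224, 224))  # (2, 4)
```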

16 pages, 7636 KiB  
Article
YOLOv5s-Based Lightweight Object Recognition with Deep and Shallow Feature Fusion
by Guili Wang, Chang Liu, Lin Xu, Liguo Qu, Hangyu Zhang, Longlong Tian, Chenhao Li, Liangwang Sun and Minyu Zhou
Electronics 2025, 14(5), 971; https://doi.org/10.3390/electronics14050971 - 28 Feb 2025
Viewed by 427
Abstract
In object detection, targets in adverse and complex scenes often have limited information and pose challenges for feature extraction. To address this, we designed a lightweight feature extraction network based on the Convolutional Block Attention Module (CBAM) and multi-scale information fusion. Within the YOLOv5s backbone, we construct deep feature maps, integrate CBAM, and fuse high-resolution shallow features with deep features. We also add new output heads with distinct feature extraction structures for classification and localization, significantly enhancing detection performance, especially under strong light, nighttime, and rainy conditions. Experimental results show superior detection performance in complex scenes, particularly for pedestrian crossing detection in adverse weather and low-light conditions. Using an open-source dataset from Shanghai Jiao Tong University, our algorithm improves pedestrian crossing-detection precision (AP0.5:0.95) by 5.9%, reaching 82.3%, while maintaining a detection speed of 44.8 FPS, meeting real-time detection requirements. The source code is available at GitHub. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
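
For reference, a compact PyTorch rendering of CBAM itself (channel attention followed by spatial attention), the generic building block the paper integrates; the YOLOv5s wiring and the extra output heads are the authors' own design:

```python
# Generic CBAM sketch: channel attention from pooled descriptors, then
# spatial attention from channel-wise statistics.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: shared MLP over average- and max-pooled vectors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: 7x7 conv over stacked channel-wise avg/max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

y = CBAM(64)(torch.randn(1, 64, 32, 32))  # same shape as the input
```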

15 pages, 4959 KiB  
Article
Image–Text Person Re-Identification with Transformer-Based Modal Fusion
by Xin Li, Hubo Guo, Meiling Zhang and Bo Fu
Electronics 2025, 14(3), 525; https://doi.org/10.3390/electronics14030525 - 28 Jan 2025
Viewed by 828
Abstract
Existing person re-identification methods utilizing CLIP (Contrastive Language-Image Pre-training) mostly suffer from coarse-grained alignment issues. This is primarily due to the original design intention of the CLIP model, which aims at broad and global alignment between images and texts to support a wide range of image–text matching tasks. However, in the specific domain of person re-identification, local features and fine-grained information are equally important in addition to global features. This paper proposes an innovative modal fusion approach, aiming to precisely locate the most prominent pedestrian information in images by combining visual features extracted by the ResNet-50 model with text representations generated by a text encoder. This method leverages the cross-attention mechanism of the Transformer Decoder to enable text features to dynamically guide visual features, enhancing the ability to identify and locate the target pedestrian. Experiments conducted on four public datasets, namely MSMT17, Market1501, DukeMTMC, and Occluded-Duke, demonstrate that our method outperforms the baseline network by 5.4%, 2.7%, 2.6%, and 9.2% in mAP, and by 4.3%, 1.7%, 2.7%, and 11.8% in Rank-1, respectively. This method exhibits excellent performance and provides new research insights for the task of person re-identification. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
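
The text-guided cross-attention step can be sketched with a standard Transformer decoder in which text tokens act as queries over visual tokens; the dimensions and token counts below are assumptions, not the paper's configuration:

```python
# Hedged sketch: text features dynamically attend to visual features.
import torch
import torch.nn as nn

d_model = 512
layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=2)

visual_tokens = torch.randn(4, 49, d_model)  # e.g., a 7x7 ResNet-50 map, projected
text_tokens = torch.randn(4, 16, d_model)    # text-encoder outputs

# Queries come from the text, keys/values from the image, so the caption
# selects the most relevant pedestrian regions.
fused = decoder(tgt=text_tokens, memory=visual_tokens)
print(fused.shape)  # torch.Size([4, 16, 512])
```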

15 pages, 6308 KiB  
Article
Physics-Driven Image Dehazing from the Perspective of Unmanned Aerial Vehicles
by Tong Cui, Qingyue Dai, Meng Zhang, Kairu Li, Xiaofei Ji, Jiawei Hao and Jie Yang
Electronics 2024, 13(21), 4186; https://doi.org/10.3390/electronics13214186 - 25 Oct 2024
Viewed by 1017
Abstract
Drone vision is widely used in change detection, disaster response, and military reconnaissance due to its wide field of view and flexibility. However, under haze and thin cloud conditions, image quality is usually degraded due to atmospheric scattering. This results in issues like color distortion, reduced contrast, and lower clarity, which negatively impact the performance of subsequent advanced visual tasks. To improve the quality of unmanned aerial vehicle (UAV) images, we propose a dehazing method based on calibration of the atmospheric scattering model. We designed two specialized neural network structures to estimate the two unknown parameters in the atmospheric scattering model: the atmospheric light intensity A and the medium transmission t. However, estimation errors inevitably occur in both processes, and the accumulated errors in atmospheric light and medium transmission cause deviations in color fidelity and brightness. Therefore, we designed an encoder-decoder structure for irradiance guidance, which not only eliminates error accumulation but also enhances the detail in the restored image, achieving higher-quality dehazing results. Quantitative and qualitative evaluations indicate that our dehazing method outperforms existing techniques, effectively eliminating haze from drone images and significantly enhancing image clarity and quality in hazy conditions. Specifically, comparison experiments on the R100 dataset demonstrate that the proposed method improved the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) metrics by 6.9 dB and 0.08 over the second-best method, respectively. On the N100 dataset, the method improved the PSNR and SSIM metrics by 8.7 dB and 0.05 over the second-best method, respectively. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
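
The atmospheric scattering model behind the method is I(x) = J(x)t(x) + A(1 - t(x)); once A and t are estimated (by the paper's two networks; plain arrays stand in for them here), the clear scene J follows by inversion. A minimal sketch:

```python
# Invert the atmospheric scattering model: J = (I - A(1 - t)) / t.
# A and t below are assumed placeholder values, not network outputs.
import numpy as np

def recover_scene(hazy, A, t, t_min=0.1):
    t = np.clip(t, t_min, 1.0)[..., None]        # floor t to avoid blow-up
    return np.clip((hazy - A * (1.0 - t)) / t, 0.0, 1.0)

hazy = np.random.rand(240, 320, 3)               # stand-in for a UAV frame
A = np.array([0.9, 0.9, 0.9])                    # assumed atmospheric light
t = np.full((240, 320), 0.6)                     # assumed transmission map
clear = recover_scene(hazy, A, t)
```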

16 pages, 8351 KiB  
Article
SCL-Dehaze: Toward Real-World Image Dehazing via Semi-Supervised Codebook Learning
by Tong Cui, Qingyue Dai, Meng Zhang, Kairu Li and Xiaofei Ji
Electronics 2024, 13(19), 3826; https://doi.org/10.3390/electronics13193826 - 27 Sep 2024
Cited by 1 | Viewed by 1148
Abstract
Existing dehazing methods deal with real-world haze images with difficulty, especially scenes with thick haze. One of the main reasons is lacking real-world pair data and robust priors. To improve dehazing ability in real-world scenes, we propose a semi-supervised codebook learning dehazing method. The codebook is used as a strong prior to guide the hazy image recovery process. However, the following two issues arise when the codebook is applied to the image dehazing task: (1) Latent space features obtained from the coding of degraded hazy images suffer from matching errors when nearest-neighbour matching is performed. (2) Maintaining a good balance of image recovery quality and fidelity for heavily degraded dense hazy images is difficult. To reduce the nearest-neighbor matching error rate in the vector quantization stage of VQGAN, we designed the unit dual-attention residual transformer module (UDART) to correct the latent space features. The UDART can make the latent features obtained from the encoding stage closer to those of the corresponding clear image. To balance the quality and fidelity of the dehazing result, we design a haze density guided weight adaptive module (HDGWA), which can adaptively adjust the multi-scale skip connection weights according to haze density. In addition, we use mean teacher, a semi-supervised learning strategy, to bridge the domain gap between synthetic and real-world data and enhance the model generalization in real-world scenes. Comparative experiments show that our method achieves improvements of 0.003, 2.646, and 0.019 over the second-best method for the no-reference metrics FADE, MUSIQ, and DBCNN, respectively, on the real-world dataset URHI. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
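
The nearest-neighbour matching step in vector quantization, whose error rate the UDART module is designed to reduce, is a codebook lookup; the sizes below are assumptions:

```python
# Sketch of VQ nearest-neighbour matching: each latent vector is replaced
# by its closest codebook entry. Mismatches at this step are exactly the
# errors that the paper's latent-feature correction targets.
import torch

def quantize(z, codebook):
    """z: (N, D) encoder latents; codebook: (K, D) learned prototypes."""
    dists = torch.cdist(z, codebook)   # (N, K) pairwise L2 distances
    idx = dists.argmin(dim=1)          # nearest-neighbour indices
    return codebook[idx], idx

codebook = torch.randn(1024, 256)      # assumed codebook size and dim
z = torch.randn(49, 256)               # latents from a hazy image
z_q, idx = quantize(z, codebook)
```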

19 pages, 8955 KiB  
Article
Underwater Robot Target Detection Algorithm Based on YOLOv8
by Guangwu Song, Wei Chen, Qilong Zhou and Chenkai Guo
Electronics 2024, 13(17), 3374; https://doi.org/10.3390/electronics13173374 - 25 Aug 2024
Cited by 7 | Viewed by 1791
Abstract
Although the ocean is rich in energy and covers a vast portion of the planet, the present results of underwater target identification are not sufficient because of the complexity of the underwater environment. An enhanced technique based on YOLOv8 is proposed to solve the problems of low identification accuracy and low picture quality in the target detection of current underwater robots. Firstly, considering the issue of model parameters, only the convolution of the ninth layer is modified, and the deformable convolution is designed to be adaptive. Certain parts of the original convolution are replaced with DCN v3, in order to address the deformation of underwater photos with fewer parameters and more effectively capture the deformation and fine details of underwater objects. Secondly, the ability to recognize multi-scale targets is improved by employing SPPFCSPC, and the ability to express features is improved by combining high-level semantic features with low-level shallow features. Lastly, using WIoU loss v3 instead of the CIoU loss function improves the overall performance of the model. In underwater robot grasping tests, the enhanced algorithm achieves an mAP of 86.5%, an increase of 2.1% over the YOLOv8s model. This meets the real-time detection needs of underwater robots and significantly enhances the performance of the object detection model. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
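
As background, an SPPF-style block (the serial-pooling core that SPPFCSPC extends with CSP-style convolutions) can be written compactly; this is a generic sketch, not the paper's exact module:

```python
# Generic SPPF sketch: repeated 5x5 max-pooling emulates parallel pooling
# at growing receptive fields (~5x5, ~9x9, ~13x13) for multi-scale targets.
import torch
import torch.nn as nn

class SPPF(nn.Module):
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))

out = SPPF(256, 256)(torch.randn(1, 256, 20, 20))  # (1, 256, 20, 20)
```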

15 pages, 17295 KiB  
Article
Progressive Discriminative Feature Learning for Visible-Infrared Person Re-Identification
by Feng Zhou, Zhuxuan Cheng, Haitao Yang, Yifeng Song and Shengpeng Fu
Electronics 2024, 13(14), 2825; https://doi.org/10.3390/electronics13142825 - 18 Jul 2024
Cited by 1 | Viewed by 1060
Abstract
The visible-infrared person re-identification (VI-ReID) task aims to retrieve the same pedestrian between visible and infrared images. VI-ReID is a challenging task due to the huge modality discrepancy and complex intra-modality variations. Existing works mainly complete the modality alignment at one stage. However, aligning modalities at different stages has positive effects on the intra-class and inter-class distances of cross-modality features, which are often ignored. Moreover, discriminative features with identity information may be corrupted in the processing of modality alignment, further degrading the performance of person re-identification. In this paper, we propose a progressive discriminative feature learning (PDFL) network that adopts different alignment strategies at different stages to alleviate the discrepancy and learn discriminative features progressively. Specifically, we first design an adaptive cross fusion module (ACFM) to learn the identity-relevant features via modality alignment with channel-level attention. For well preserving identity information, we propose a dual-attention-guided instance normalization module (DINM), which can well guide instance normalization to align two modalities into a unified feature space through channel and spatial information embedding. Finally, we generate multiple part features of a person to mine subtle differences. Multi-loss optimization is imposed during the training process for more effective learning supervision. Extensive experiments on the public datasets of SYSU-MM01 and RegDB validate that our proposed method performs favorably against most state-of-the-art methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
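
The role of instance normalization in modality alignment can be shown in isolation: IN removes per-sample style statistics, which carry much of the visible/infrared discrepancy. A hedged toy illustration (not the DINM itself):

```python
# Toy illustration: one shared InstanceNorm layer applied to both branches
# strips per-sample channel statistics, shrinking the modality gap before
# identity matching. The paper's DINM adds attention-guided embedding.
import torch
import torch.nn as nn

feat_vis = torch.randn(8, 256, 24, 12)   # visible-branch feature maps
feat_ir = torch.randn(8, 256, 24, 12)    # infrared-branch feature maps

inorm = nn.InstanceNorm2d(256, affine=True)
aligned_vis, aligned_ir = inorm(feat_vis), inorm(feat_ir)
```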

28 pages, 67696 KiB  
Article
PerNet: Progressive and Efficient All-in-One Image-Restoration Lightweight Network
by Wentao Li, Guang Zhou, Sen Lin and Yandong Tang
Electronics 2024, 13(14), 2817; https://doi.org/10.3390/electronics13142817 - 17 Jul 2024
Cited by 1 | Viewed by 1493
Abstract
The existing image-restoration methods are only effective for specific degradation tasks, but the type of image degradation in practical applications is unknown, and mismatch between the model and the actual degradation will lead to performance decline. Attention mechanisms play an important role in image-restoration tasks; however, it is difficult for existing attention mechanisms to effectively utilize the continuous correlation information of image noise. In order to solve these problems, we propose a Progressive and Efficient All-in-one Image Restoration Lightweight Network (PerNet). The network consists of a Plug-and-Play Efficient Local Attention Module (PPELAM). The PPELAM is composed of multiple Efficient Local Attention Units (ELAUs) and PPELAM can effectively use the global information and horizontal and vertical correlation of image degradation features in space, so as to reduce information loss and have a small number of parameters. PerNet is able to learn the degradation properties of images very well, which allows us to reach an advanced level in image-restoration tasks. Experiments show that PerNet has excellent results for typical restoration tasks (image deraining, image dehazing, image desnowing and underwater image enhancement), and the excellent performance of ELAU combined with Transformer in the ablation experiment chapter further proves the high efficiency of ELAU. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
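
A toy rendering of attention over horizontal and vertical correlations, in the spirit of the ELAU described above (the actual unit is the paper's own design):

```python
# Hedged sketch: pool each spatial axis separately, gate with 1x1 convs,
# and recombine, so row-wise and column-wise structure both shape the mask.
import torch
import torch.nn as nn

class AxisAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, 1)
        self.conv_w = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        h = x.mean(dim=3, keepdim=True)   # (B, C, H, 1) vertical profile
        w = x.mean(dim=2, keepdim=True)   # (B, C, 1, W) horizontal profile
        attn = torch.sigmoid(self.conv_h(h)) * torch.sigmoid(self.conv_w(w))
        return x * attn                   # broadcasts to (B, C, H, W)

y = AxisAttention(64)(torch.randn(2, 64, 32, 32))
```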

15 pages, 5604 KiB  
Article
Real-Time Deep Learning Framework for Accurate Speed Estimation of Surrounding Vehicles in Autonomous Driving
by Iván García-Aguilar, Jorge García-González, Enrique Domínguez, Ezequiel López-Rubio and Rafael M. Luque-Baena
Electronics 2024, 13(14), 2790; https://doi.org/10.3390/electronics13142790 - 16 Jul 2024
Viewed by 1739
Abstract
Accurate speed estimation of surrounding vehicles is of paramount importance for autonomous driving to prevent potential hazards. This paper emphasizes the critical role of precise speed estimation and presents a novel real-time framework based on deep learning to achieve this from images captured by an onboard camera. The system detects and tracks vehicles using convolutional neural networks and analyzes their trajectories with a tracking algorithm. Vehicle speeds are then accurately estimated using a regression model based on random sample consensus. A synthetic dataset using the CARLA simulator has been generated to validate the presented methodology. The system can simultaneously estimate the speed of multiple vehicles and can be easily integrated into onboard computer systems, providing a cost-effective solution for real-time speed estimation. This technology holds significant potential for enhancing vehicle safety systems, driver assistance, and autonomous driving. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
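
The robust speed-regression step can be illustrated with scikit-learn's RANSAC estimator: fit a line to a tracked vehicle's positions over time and read the speed off the slope, with spurious detections rejected as outliers. The values below are synthetic:

```python
# Illustrative sketch, not the authors' implementation.
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

t = np.linspace(0, 2, 30).reshape(-1, 1)                 # timestamps (s)
pos = 15.0 * t.ravel() + np.random.normal(0, 0.2, 30)    # positions (m), ~15 m/s
pos[5] += 8.0                                            # one spurious detection

model = RANSACRegressor(LinearRegression()).fit(t, pos)
speed = model.estimator_.coef_[0]                        # slope = speed estimate
print(f"estimated speed: {speed:.1f} m/s")               # ~15 despite the outlier
```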

21 pages, 11115 KiB  
Article
HA-Net: A Hybrid Algorithm Model for Underwater Image Color Restoration and Texture Enhancement
by Jin Qian, Hui Li and Bin Zhang
Electronics 2024, 13(13), 2623; https://doi.org/10.3390/electronics13132623 - 4 Jul 2024
Cited by 2 | Viewed by 1067
Abstract
Due to the extremely irregular nonlinear degradation of images obtained in real underwater environments, it is difficult for existing underwater image enhancement methods to stably restore degraded underwater images, thus making it challenging to improve the efficiency of marine work. We propose a hybrid algorithm model for underwater image color restoration and texture enhancement, termed HA-Net. First, we introduce a dynamic color correction algorithm based on depth estimation to restore degraded images and mitigate color attenuation in underwater images by calculating the depth of targets and backgrounds. Then, we propose a multi-scale U-Net to enhance the network’s feature extraction capability and introduce a parallel attention module to capture image spatial information, thereby improving the model’s accuracy in recognizing deep semantics such as fine texture. Finally, we propose a global information compensation algorithm to enhance the output image’s integrity and boost the network’s learning ability. Experimental results on synthetic standard data sets and real data demonstrate that our method produces images with clear texture and bright colors, outperforming other algorithms in both subjective and objective evaluations, making it more suitable for real marine environments. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
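
To illustrate depth-guided color correction, the sketch below inverts the simplified underwater image-formation model I_c = J_c * exp(-beta_c * d) with assumed per-channel attenuation coefficients; the paper's dynamic correction algorithm is considerably more elaborate:

```python
# Hedged sketch: undo wavelength-dependent attenuation using a depth map.
# The beta values are assumptions (red attenuates fastest underwater).
import numpy as np

def compensate(image, depth, beta=(0.40, 0.12, 0.08)):
    """image: (H, W, 3) RGB in [0, 1]; depth: (H, W) in metres."""
    gain = np.exp(np.asarray(beta) * depth[..., None])   # inverse attenuation
    return np.clip(image * gain, 0.0, 1.0)

img = 0.5 * np.random.rand(48, 64, 3)                    # dim underwater frame
depth = np.linspace(1.0, 5.0, 48 * 64).reshape(48, 64)   # toy depth map
restored = compensate(img, depth)
```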

15 pages, 8754 KiB  
Article
EFE-CNA Net: An Approach for Effective Image Deblurring Using an Edge-Sensitive Focusing Encoder
by Fengbo Zheng, Xiu Zhang, Lifen Jiang and Gongbo Liang
Electronics 2024, 13(13), 2493; https://doi.org/10.3390/electronics13132493 - 26 Jun 2024
Viewed by 1677
Abstract
Deep learning-based image deblurring techniques have made great advancements, improving both processing speed and deblurring efficacy. However, existing methods still face challenges when dealing with complex blur types and the semantic understanding of images. The segment anything model (SAM), a versatile deep learning model that accurately and efficiently segments objects in images, facilitates various tasks in computer vision. This article leverages SAM’s proficiency in capturing object edges and enhancing image content comprehension to improve image deblurring. We introduce the edge-sensitive focusing encoder (EFE) module, which utilizes masks generated by the SAM framework and re-weights the masked portion following SAM segmentation by detecting its features and high-frequency information. The EFE module uses the masks to locate the position of the blur in an image while identifying the intensity of the blur, allowing the model to focus more accurately on specific features. Masks with greater high-frequency information are assigned higher weights, prompting the network to prioritize them during processing. Based on the EFE module, we develop a deblurring network called the edge-sensitive focusing encoder-based convolution–normalization and attention network (EFE-CNA Net), which utilizes the EFE module to enhance the deblurring process, employs an image-mask decoder to merge features from both the image and the mask from the EFE module, and incorporates the CNA Net as its base network. This design enables the model to focus on distinct features at various locations, enhancing its learning process through the guidance provided by the EFE module and the blurred images. Testing results on the RealBlur and REDS datasets demonstrate the effectiveness of the EFE-CNA Net, achieving peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics of 28.77, 0.902 (RealBlur-J), 36.40, 0.956 (RealBlur-R), 31.45, and 0.919 (REDS). Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
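
The mask re-weighting idea, assigning masks with more high-frequency content a higher weight, can be sketched with a Laplacian as the high-frequency detector (names are illustrative, not the EFE implementation):

```python
# Hedged sketch: score each segmentation mask by the mean high-frequency
# energy inside it, then normalize the scores into weights.
import numpy as np
from scipy.ndimage import laplace

def mask_weights(image, masks):
    """image: (H, W) float array; masks: list of (H, W) boolean arrays."""
    hf = np.abs(laplace(image))                    # high-frequency response
    energy = np.array([hf[m].mean() if m.any() else 0.0 for m in masks])
    return energy / (energy.sum() + 1e-8)

img = np.random.rand(64, 64)
masks = [np.zeros((64, 64), bool), np.ones((64, 64), bool)]
masks[0][:32] = True                               # top half vs. whole frame
print(mask_weights(img, masks))                    # weights sum to ~1
```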

15 pages, 32016 KiB  
Article
A Multiscale Parallel Pedestrian Recognition Algorithm Based on YOLOv5
by Qi Song, ZongHe Zhou, ShuDe Ji, Tong Cui, BuDan Yao and ZeQi Liu
Electronics 2024, 13(10), 1989; https://doi.org/10.3390/electronics13101989 - 20 May 2024
Cited by 1 | Viewed by 1160
Abstract
Mainstream pedestrian recognition algorithms have problems such as low accuracy and insufficient real-time performance. In this study, we developed an improved pedestrian recognition algorithm named YOLO-MSP (multiscale parallel) based on residual network ideas, and we improved the network architecture based on YOLOv5s. Three pooling layers were used in parallel in the MSP module to output multiscale features and improve the accuracy of the model while ensuring real-time performance. The Swin Transformer module was also introduced into the network, which improved the efficiency of the model in image processing by avoiding global calculations. The CBAM (Convolutional Block Attention Module) attention mechanism was added to the C3 module, and this new module was named the CBAMC3 module, which improved model efficiency while ensuring the model was lightweight. The WMD-IOU (weighted multidimensional IOU) loss function proposed in this study used the shape change between the recognition frame and the real frame as a parameter to calculate the loss of the recognition frame shape, which could guide the model to better learn the shape and size of the target and optimize recognition performance. Comparative experiments using the INRIA public data set showed that the proposed YOLO-MSP algorithm outperformed state-of-the-art pedestrian recognition methods in accuracy and speed. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
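
A minimal sketch of parallel multi-scale pooling in the spirit of the MSP module (three pool sizes run side by side and fused); the actual module and the WMD-IOU loss are the paper's own designs:

```python
# Hedged sketch: three parallel max-pool branches plus the identity path,
# concatenated and fused back to the input width with a 1x1 conv.
import torch
import torch.nn as nn

class ParallelPooling(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (3, 5, 7))
        self.fuse = nn.Conv2d(channels * 4, channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))

y = ParallelPooling(128)(torch.randn(1, 128, 40, 40))  # (1, 128, 40, 40)
```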

15 pages, 606 KiB  
Article
Towards Super Compressed Neural Networks for Object Identification: Quantized Low-Rank Tensor Decomposition with Self-Attention
by Baichen Liu, Dongwei Wang, Qi Lv, Zhi Han and Yandong Tang
Electronics 2024, 13(7), 1330; https://doi.org/10.3390/electronics13071330 - 2 Apr 2024
Cited by 2 | Viewed by 1506
Abstract
Deep convolutional neural networks have a large number of parameters and require a significant number of floating-point operations during computation, which limits their deployment in situations where the storage space is limited and computational resources are insufficient, such as in mobile phones and small robots. Many network compression methods have been proposed to address the aforementioned issues, including pruning, low-rank decomposition, quantization, etc. However, these methods typically fail to achieve a significant compression ratio in terms of the parameter count. Even when high compression rates are achieved, the network’s performance is often significantly deteriorated, making it difficult to perform tasks effectively. In this study, we propose a more compact representation for neural networks, named Quantized Low-Rank Tensor Decomposition (QLTD), to super compress deep convolutional neural networks. Firstly, we employed low-rank Tucker decomposition to compress the pre-trained weights. Subsequently, to further exploit redundancies within the core tensor and factor matrices obtained through Tucker decomposition, we employed vector quantization to partition and cluster the weights. Simultaneously, we introduced a self-attention module for each core tensor and factor matrix to enhance the training responsiveness in critical regions. The object identification results in the CIFAR10 experiment showed that QLTD achieved a compression ratio of 35.43×, with less than 1% loss in accuracy and a compression ratio of 90.61×, with less than a 2% loss in accuracy. QLTD was able to achieve a significant compression ratio in terms of the parameter count and realize a good balance between compressing parameters and maintaining identification accuracy. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
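
The two compression stages can be sketched with tensorly and scikit-learn standing in for the paper's implementation: Tucker-decompose a convolution weight, then vector-quantize a factor matrix with k-means:

```python
# Hedged sketch of QLTD's two stages (the self-attention training is omitted).
import numpy as np
import tensorly as tl
from sklearn.cluster import KMeans
from tensorly.decomposition import tucker

tl.set_backend("numpy")
weight = np.random.randn(64, 32, 3, 3)                  # (out, in, kH, kW) kernel
core, factors = tucker(tl.tensor(weight), rank=[16, 8, 3, 3])

def vector_quantize(mat, n_codes=16, code_dim=4):
    """Cluster sub-vectors of a factor matrix; keep only centroids + indices."""
    vecs = np.asarray(mat).reshape(-1, code_dim)
    km = KMeans(n_clusters=n_codes, n_init=10).fit(vecs)
    return km.cluster_centers_, km.labels_

codebook, idx = vector_quantize(factors[0])             # quantize one factor
print(core.shape, codebook.shape, idx.shape)
```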

17 pages, 6098 KiB  
Article
MIX-Net: Hybrid Attention/Diversity Network for Person Re-Identification
by Minglang Li, Zhiyong Tao, Sen Lin and Kaihao Feng
Electronics 2024, 13(5), 1001; https://doi.org/10.3390/electronics13051001 - 6 Mar 2024
Viewed by 1581
Abstract
Person re-identification (Re-ID) networks are often affected by factors such as pose variations, changes in viewpoint, and occlusion, leading to the extraction of features that encompass a considerable amount of irrelevant information. However, most research has struggled to address the challenge of simultaneously endowing features with both attentive and diversified information. To concurrently extract attentive yet diverse pedestrian features, we amalgamated the strengths of convolutional neural network (CNN) attention and self-attention. By integrating the extracted latent features, we introduced a Hybrid Attention/Diversity Network (MIX-Net), which adeptly captures attentive but diverse information from personal images via a fusion of attention branches and attention suppression branches. Additionally, to extract latent information from secondary important regions to enrich the diversity of features, we designed a novel Discriminative Part Mask (DPM). Experimental results establish the robust competitiveness of our approach, particularly in effectively distinguishing individuals with similar attributes. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
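
The attention-suppression idea, masking the most-attended region so a second branch is forced to mine secondary cues, can be sketched as follows (structure illustrative, not MIX-Net's):

```python
# Hedged sketch: an attention branch and a suppression branch share features;
# the latter zeroes the top-attended positions to encourage diversity.
import torch

def suppress_top_regions(features, attn, drop_ratio=0.3):
    """Zero out spatial positions whose attention exceeds a per-sample quantile."""
    b = features.size(0)
    thresh = attn.view(b, -1).quantile(1 - drop_ratio, dim=1).view(b, 1, 1, 1)
    return features * (attn < thresh).float()

feats = torch.randn(2, 256, 16, 8)             # backbone feature maps
attn = feats.abs().mean(1, keepdim=True)       # a simple attention proxy
attentive = feats * attn                       # attention branch input
diverse = suppress_top_regions(feats, attn)    # suppression branch input
```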

Review


23 pages, 3374 KiB  
Review
A Review: Remote Sensing Image Object Detection Algorithm Based on Deep Learning
by Chenshuai Bai, Xiaofeng Bai and Kaijun Wu
Electronics 2023, 12(24), 4902; https://doi.org/10.3390/electronics12244902 - 6 Dec 2023
Cited by 12 | Viewed by 5084
Abstract
Target detection in optical remote sensing images using deep-learning technologies has a wide range of applications in urban building detection, road extraction, crop monitoring, and forest fire monitoring, which provides strong support for environmental monitoring, urban planning, and agricultural management. This paper reviews the research progress of the YOLO series, SSD series, candidate region series, and Transformer algorithm. It summarizes the object detection algorithms based on standard improvement methods such as supervision, attention mechanism, and multi-scale. The performance of different algorithms is also compared and analyzed with the common remote sensing image data sets. Finally, future research challenges, improvement directions, and issues of concern are prospected, which provides valuable ideas for subsequent related research. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)
