Application of Information Theory to Computer Vision and Image Processing

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (3 July 2023) | Viewed by 23689

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Guest Editor
Facultad de Ingeniería, Universidad Autónoma de Baja California, Mexicali 21376, Mexico
Interests: fourth industrial revolution; artificial intelligence; cybersystems

Guest Editor
Department of Applied Physics, Autonomous University of Baja California, Mexicali 21100, Mexico
Interests: automated metrology; 3D coordinates measurement; robotic navigation; machine vision; simulation of the robotic swarms behaviour

Guest Editor
Engineering Faculty, Universidad Autónoma de Baja California, Mexicali 21100, Mexico
Interests: machine vision; stereo vision; systems laser; scanner control; digital image processing

Guest Editor
Department of Computer Systems, Tecnológico Nacional de México, IT de Mexicali, Mexicali 21376, Mexico
Interests: machine vision; stereo vision; systems laser; scanner control; analogic and digital processing

Special Issue Information

Dear Colleagues,

Our perception of the world is the product of complex optical and physical processes in the human visual system: light enters through the pupils and reaches the retina, whose photoreceptors transform it into electrochemical energy that is transmitted to the brain, which organizes, interprets, and analyzes the information received to create a perceived reality. Through a similar optical and physical process, machine vision serves as the eyes of cybernetic systems, joining the virtual and real worlds so that they coexist in human lives, with the aim of integrating this technology into our daily lives creatively and with a global view through interconnectivity. This is possible thanks to advanced sensor and system technologies for acquiring and computing information. Such tasks are based on the integration of optoelectronic devices for sensors and cameras. Sensors, artificial intelligence algorithms, embedded systems, robust control, inertial navigation systems, robotics, interconnectivity, Big Data, information interchange within robotic swarms, and cloud computing form the basis of machine vision developments that allow cyber-physical systems to collaborate with humans and their real and virtual environments and activities.

This Special Issue aims to publish work on information theory, measurement methods, data processing, tools, and techniques for the design and instrumentation of machine vision systems through the application of computer vision and image processing.

Dr. Wendy Flores-Fuentes
Dr. Oleg Sergiyenko
Dr. Julio Cesar Rodriguez-Quinonez
Dr. Jesús Elías Miranda-Vega
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine vision
  • cyber-physical systems
  • navigation
  • 3D spatial coordinates
  • information theory applications
  • data interchange
  • instrumentation
  • measurements
  • artificial intelligence
  • signal and image processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (13 papers)

Editorial

9 pages, 225 KiB  
Editorial
Application of Information Theory to Computer Vision and Image Processing
by Wendy Flores-Fuentes, Oleg Sergiyenko, Julio C. Rodríguez-Quiñonez and Jesús E. Miranda-Vega
Entropy 2024, 26(2), 114; https://doi.org/10.3390/e26020114 - 26 Jan 2024
Cited by 1 | Viewed by 1613
Abstract
Our perception of the world is the product of the human visual system’s complex optical and physical process [...] Full article

Research

21 pages, 15179 KiB  
Article
Shannon Entropy Used for Feature Extractions of Optical Patterns in the Context of Structural Health Monitoring
by Wendy Garcia-González, Wendy Flores-Fuentes, Oleg Sergiyenko, Julio C. Rodríguez-Quiñonez, Jesús E. Miranda-Vega and Daniel Hernández-Balbuena
Entropy 2023, 25(8), 1207; https://doi.org/10.3390/e25081207 - 14 Aug 2023
Cited by 1 | Viewed by 1668
Abstract
A novel signal processing method is proposed for a technical vision system (TVS). During data acquisition of an optoelectrical signal, part of the signal consists of random electrical fluctuations of voltage. Information theory (IT) is a well-known field that deals with random processes. A method based on the use of Shannon entropy for feature extraction of optical patterns is presented. IT is implemented in structural health monitoring (SHM) to improve the accuracy of optoelectronic signal classifiers for a metrology subsystem of the TVS. The aim is to enhance the TVS spatial coordinate measurement performance under real operating conditions, in electrically and optically noisy environments, for a better estimation of structural displacement and evaluation of structural health. Five different machine learning (ML) techniques are used in this work to classify optical patterns captured with the TVS. Linear predictive coding (LPC) and the autocorrelation function (ACC) are used for the extraction of optical patterns. The Shannon entropy segmentation (SH) method extracts relevant information from optical patterns, which can improve the model's performance. The results reveal that segmentation with Shannon's entropy can achieve an accuracy of over 95.33%. Without Shannon's entropy, the worst accuracy was 33.33%. Full article
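As a rough illustration of the idea in the abstract above, the following sketch computes the Shannon entropy of fixed-length segments of a sampled signal, which could then serve as features for a classifier. It is not the authors' implementation: the histogram binning, the window length, and the function names `shannon_entropy` and `segment_entropies` are assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(samples, bins=8):
    """Shannon entropy in bits of a histogram of the samples.

    The number of bins is an assumption of this sketch.
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # A constant segment carries no information.
        return 0.0
    width = (hi - lo) / bins
    # Assign each sample to a bin; the maximum value goes into the last bin.
    counts = Counter(min(int((s - lo) / width), bins - 1) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def segment_entropies(signal, window, bins=8):
    """Entropy of each non-overlapping window of the signal."""
    return [
        shannon_entropy(signal[i:i + window], bins)
        for i in range(0, len(signal) - window + 1, window)
    ]


if __name__ == "__main__":
    # Hypothetical signal: a flat segment followed by a fluctuating one.
    signal = [0, 0, 0, 0, 0, 1, 0, 1]
    print(segment_entropies(signal, window=4))
```

Segments dominated by a single level give entropy near zero, while segments whose samples spread across many levels give higher values, so the per-segment entropies summarize how irregular each part of the signal is.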

14 pages, 6217 KiB  
Article
A Low-Illumination Enhancement Method Based on Structural Layer and Detail Layer
by Wei Ge, Le Zhang, Weida Zhan, Jiale Wang, Depeng Zhu and Yang Hong
Entropy 2023, 25(8), 1201; https://doi.org/10.3390/e25081201 - 12 Aug 2023
Viewed by 985
Abstract
Low-illumination image enhancement technology is a topic of interest in the field of image processing. However, while improving image brightness, it is difficult to effectively maintain the texture and details of the image, and the quality of the image cannot be guaranteed. To solve this problem, this paper proposes a low-illumination enhancement method based on structural and detail layers. First, we design an SRetinex-Net model. The network is mainly divided into two parts: a decomposition module and an enhancement module. Second, the decomposition module mainly adopts the SU-Net structure, an unsupervised network that decomposes the input image into a structural layer image and a detail layer image. The enhancement module mainly adopts the SDE-Net structure, which is divided into two branches: the SDE-S branch and the SDE-D branch. The SDE-S branch mainly enhances and adjusts the brightness of the structural layer image through Ehnet and Adnet to prevent insufficient or excessive brightness enhancement in the image. The SDE-D branch mainly performs denoising and enhances textural details through a denoising module. This network structure can greatly reduce computational costs. Moreover, we also improve the total variation optimization model as a mixed loss function and add structural metrics and textural metrics as variables on the basis of the original loss function, which separates structure edges from texture edges well. Numerous experiments show that our structure has a more significant effect on brightness and detail preservation in image restoration. Full article

30 pages, 8299 KiB  
Article
Deep Learning in Precision Agriculture: Artificially Generated VNIR Images Segmentation for Early Postharvest Decay Prediction in Apples
by Nikita Stasenko, Islomjon Shukhratov, Maxim Savinov, Dmitrii Shadrin and Andrey Somov
Entropy 2023, 25(7), 987; https://doi.org/10.3390/e25070987 - 28 Jun 2023
Cited by 5 | Viewed by 2772
Abstract
Food quality control is an important task in the agricultural domain at the postharvest stage for avoiding food losses. The latest achievements in image processing with deep learning (DL) and computer vision (CV) approaches provide a number of effective tools based on image colorization and image-to-image translation for plant quality control at the postharvest stage. In this article, we propose an approach based on Generative Adversarial Network (GAN) and Convolutional Neural Network (CNN) techniques that uses synthesized and segmented VNIR imaging data for early postharvest decay and fungal zone prediction as well as the quality assessment of stored apples. The Pix2PixHD model achieved higher results in translating RGB images to VNIR images (SSIM = 0.972). The Mask R-CNN model was selected as the CNN technique for VNIR image segmentation and achieved F1-scores of 58.861 for postharvest decay zones, 40.968 for fungal zones, and 94.800 for the detection and prediction of both decayed and fungal zones in stored apples. To verify the effectiveness of this approach, a unique paired dataset containing 1305 RGB and VNIR images of apples of four varieties was obtained. It was further used for GAN model selection. Additionally, we acquired 1029 VNIR images of apples for training and testing a CNN model. We conducted validation on an embedded system equipped with a graphical processing unit. Using Pix2PixHD, 100 VNIR images were generated from RGB images at a rate of 17 frames per second (FPS). Subsequently, these images were segmented using Mask R-CNN at a rate of 0.42 FPS. The achieved results are promising for enhancing food study and control during the postharvest stage. Full article

18 pages, 5167 KiB  
Article
SCFusion: Infrared and Visible Fusion Based on Salient Compensation
by Haipeng Liu, Meiyan Ma, Meng Wang, Zhaoyu Chen and Yibo Zhao
Entropy 2023, 25(7), 985; https://doi.org/10.3390/e25070985 - 27 Jun 2023
Cited by 4 | Viewed by 1239
Abstract
The aim of infrared and visible image fusion is to integrate the complementary information of the two modalities for high-quality fused images. However, many deep learning fusion algorithms have not considered the characteristics of infrared images in low-light scenes, leading to the problems of weak texture details, low contrast of infrared targets and poor visual perception in the existing methods. Therefore, in this paper, we propose a salient compensation-based fusion method that makes sufficient use of the characteristics of infrared and visible images to generate high-quality fused images under low-light conditions. First, we design a multi-scale edge gradient module (MEGB) in the texture mainstream to adequately extract the texture information of the dual infrared and visible inputs. Second, the salient tributary is pre-trained with a salient loss to obtain a saliency map, based on the salient dense residual module (SRDB), to extract salient features, which are supplemented during the overall network training. We propose the spatial bias module (SBM) to fuse global information with local information. Finally, extensive comparison experiments with existing methods show that our method has significant advantages in describing target features and global scenes, and ablation experiments demonstrate the effectiveness of the proposed modules. In addition, we verify that the proposed method benefits high-level vision on a semantic segmentation task. Full article

20 pages, 14588 KiB  
Article
Improved Thermal Infrared Image Super-Resolution Reconstruction Method Base on Multimodal Sensor Fusion
by Yichun Jiang, Yunqing Liu, Weida Zhan and Depeng Zhu
Entropy 2023, 25(6), 914; https://doi.org/10.3390/e25060914 - 9 Jun 2023
Cited by 6 | Viewed by 1853
Abstract
When traditional super-resolution reconstruction methods are applied to infrared thermal images, they often ignore the problem of poor image quality caused by the imaging mechanism, which makes it difficult to obtain high-quality reconstruction results even with the training of simulated degraded inverse processes. To address these issues, we propose a thermal infrared image super-resolution reconstruction method based on multimodal sensor fusion, which aims to enhance the resolution of thermal infrared images and relies on multimodal sensor information to reconstruct high-frequency details in the images, thereby overcoming the limitations of the imaging mechanism. First, we design a novel super-resolution reconstruction network consisting of primary feature encoding, super-resolution reconstruction, and high-frequency detail fusion subnetworks. We design hierarchical dilated distillation modules and a cross-attention transformation module to extract and transmit image features, enhancing the network’s ability to express complex patterns. Then, we propose a hybrid loss function to guide the network in extracting salient features from thermal infrared images and reference images while maintaining accurate thermal information. Finally, we propose a learning strategy to ensure the high-quality super-resolution reconstruction performance of the network, even in the absence of reference images. Extensive experimental results show that the proposed method achieves superior reconstructed image quality compared to the other methods tested, demonstrating its effectiveness. Full article

17 pages, 3911 KiB  
Article
Structured Cluster Detection from Local Feature Learning for Text Region Extraction
by Huei-Yung Lin and Chin-Yu Hsu
Entropy 2023, 25(4), 658; https://doi.org/10.3390/e25040658 - 14 Apr 2023
Viewed by 1259
Abstract
The detection of regions of interest is commonly considered an early stage of information extraction from images. It is used to provide content meaningful to human perception for machine vision applications. In this work, a new technique for structured region detection based on the distillation of local image features with clustering analysis is proposed. Unlike existing methods, our approach uses application-specific reference images for feature learning and extraction. It is able to identify text clusters even when the feature points derived from the characters are sparse. For the localization of structured regions, the cluster with high feature density is calculated and serves as a candidate for region expansion. An iterative adjustment is then performed to enlarge the ROI for complete text coverage. Experiments on text region detection in invoices and banknotes demonstrate the effectiveness of the proposed technique. Full article

24 pages, 12398 KiB  
Article
Hybrid Multi-Dimensional Attention U-Net for Hyperspectral Snapshot Compressive Imaging Reconstruction
by Siming Zheng, Mingyu Zhu and Mingliang Chen
Entropy 2023, 25(4), 649; https://doi.org/10.3390/e25040649 - 12 Apr 2023
Viewed by 1634
Abstract
In order to capture the spatial-spectral (x,y,λ) information of the scene, various techniques have been proposed. Different from the widely used scanning-based methods, spectral snapshot compressive imaging (SCI) utilizes the idea of compressive sensing to compressively capture the 3D spatial-spectral data-cube in a single-shot 2D measurement and thus it is efficient, enjoying the advantages of high-speed and low bandwidth. However, the reconstruction process, i.e., to retrieve the 3D cube from the 2D measurement, is an ill-posed problem and it is challenging to reconstruct high quality images. Previous works usually use 2D convolutions and preliminary attention to address this challenge. However, these networks and attention do not exactly extract spectral features. On the other hand, 3D convolutions can extract more features in a 3D cube, but increase computational cost significantly. To balance this trade-off, in this paper, we propose a hybrid multi-dimensional attention U-Net (HMDAU-Net) to reconstruct hyperspectral images from the 2D measurement in an end-to-end manner. HMDAU-Net integrates 3D and 2D convolutions in an encoder–decoder structure to fully utilize the abundant spectral information of hyperspectral images with a trade-off between performance and computational cost. Furthermore, attention gates are employed to highlight salient features and suppress the noise carried by the skip connections. Our proposed HMDAU-Net achieves superior performance over previous state-of-the-art reconstruction algorithms. Full article

15 pages, 2591 KiB  
Article
Multi-Receptive Field Soft Attention Part Learning for Vehicle Re-Identification
by Xiyu Pang, Yilong Yin and Yanli Zheng
Entropy 2023, 25(4), 594; https://doi.org/10.3390/e25040594 - 31 Mar 2023
Cited by 1 | Viewed by 1340
Abstract
Vehicle re-identification across multiple cameras is one of the main problems of intelligent transportation systems (ITSs). Since the differences in the appearance between different vehicles of the same model are small and the appearance of the same vehicle changes drastically from different viewpoints, vehicle re-identification is a challenging task. In this paper, we propose a model called multi-receptive field soft attention part learning (MRF-SAPL). The MRF-SAPL model learns semantically diverse vehicle part-level features under different receptive fields through multiple local branches, alleviating the problem of small differences in vehicle appearance. To align vehicle parts from different images, this study uses soft attention to adaptively locate the positions of the parts on the final feature map generated by a local branch and maintain the continuity of the internal semantics of the parts. In addition, to obtain parts with different semantic patterns, we propose a new loss function that punishes overlapping regions, forcing the positions of different parts on the same feature map to not overlap each other as much as possible. Extensive ablation experiments demonstrate the effectiveness of our part-level feature learning method MRF-SAPL, and our model achieves state-of-the-art performance on two benchmark datasets. Full article

15 pages, 41031 KiB  
Article
Coupling Quantum Random Walks with Long- and Short-Term Memory for High Pixel Image Encryption Schemes
by Junqing Liang, Zhaoyang Song, Zhongwei Sun, Mou Lv and Hongyang Ma
Entropy 2023, 25(2), 353; https://doi.org/10.3390/e25020353 - 14 Feb 2023
Cited by 6 | Viewed by 1839
Abstract
This paper proposes an encryption scheme for high pixel density images. Building on the quantum random walk algorithm, long short-term memory (LSTM) can effectively solve the problem of the low efficiency of the quantum random walk algorithm in generating large-scale pseudorandom matrices, and further improve the statistical properties of the pseudorandom matrices required for encryption. The pseudorandom matrix is divided into columns, which are fed into the LSTM in order for training. Due to the randomness of the input matrix, the LSTM cannot be trained effectively, so the output matrix is predicted to be highly random. An LSTM prediction matrix of the same size as the key matrix is generated based on the pixel density of the image to be encrypted, which can effectively complete the encryption of the image. In the statistical performance test, the proposed encryption scheme achieves an average information entropy of 7.9992, an average number of pixels change rate (NPCR) of 99.6231%, an average unified average changing intensity (UACI) of 33.6029%, and an average correlation of 0.0032. Finally, various noise simulation tests are also conducted to verify its robustness in real-world applications where common noise and attack interference are encountered. Full article
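The statistical measures quoted in the abstract above (information entropy, NPCR, UACI) have standard definitions for 8-bit images, which the following sketch implements on flat lists of pixel values. It only illustrates how such figures are usually computed; it is not the authors' evaluation code, and the function names and the flat-list image representation are assumptions of this example.

```python
import math
from collections import Counter


def npcr(cipher_a, cipher_b):
    """Number of pixels change rate, in percent.

    Fraction of positions at which two equally sized cipher images differ.
    """
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    return 100.0 * changed / len(cipher_a)


def uaci(cipher_a, cipher_b):
    """Unified average changing intensity, in percent, for 8-bit pixels."""
    total = sum(abs(a - b) for a, b in zip(cipher_a, cipher_b))
    return 100.0 * total / (255 * len(cipher_a))


def information_entropy(pixels):
    """Shannon entropy in bits per pixel of the grey-level histogram."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    a = [0, 10, 200, 255]
    b = [0, 12, 100, 255]
    print(npcr(a, b), uaci(a, b), information_entropy(a))
```

For an 8-bit image with uniformly distributed grey levels the entropy approaches its maximum of 8 bits, and for two independent, uniformly random 8-bit images the expected NPCR and UACI are roughly 99.61% and 33.46%, which is why reported values close to these are read as a sign of good diffusion.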

14 pages, 3667 KiB  
Article
An Infusion Containers Detection Method Based on YOLOv4 with Enhanced Image Feature Fusion
by Lei Ju, Xueyu Zou, Xinjun Zhang, Xifa Xiong, Xuxun Liu and Luoyu Zhou
Entropy 2023, 25(2), 275; https://doi.org/10.3390/e25020275 - 2 Feb 2023
Cited by 1 | Viewed by 1509
Abstract
The detection of infusion containers is highly conducive to reducing the workload of medical staff. However, when applied in complex environments, the current detection solutions cannot satisfy the high demands for clinical requirements. In this paper, we address this problem by proposing a novel method for the detection of infusion containers that is based on the conventional method, You Only Look Once version 4 (YOLOv4). First, the coordinate attention module is added after the backbone to improve the perception of direction and location information by the network. Then, we build the cross stage partial–spatial pyramid pooling (CSP-SPP) module to replace the spatial pyramid pooling (SPP) module, which allows the input information features to be reused. In addition, the adaptively spatial feature fusion (ASFF) module is added after the original feature fusion module, path aggregation network (PANet), to facilitate the fusion of feature maps at different scales for more complete feature information. Finally, EIoU is used as a loss function to solve the anchor frame aspect ratio problem, and this improvement allows for more stable and accurate information of the anchor aspect when calculating losses. The experimental results demonstrate the advantages of our method in terms of recall, timeliness, and mean average precision (mAP). Full article
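The abstract above names EIoU as the regression loss. As a minimal sketch of how that loss is commonly defined in the literature (one minus IoU, plus normalized penalties on the centre distance and on the width and height differences), the function below evaluates it for a single pair of axis-aligned boxes. The `(x1, y1, x2, y2)` box format and the function name `eiou_loss` are assumptions of this example, and the code is not taken from the paper.

```python
def eiou_loss(box, gt):
    """EIoU loss for two non-degenerate boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt

    # Widths and heights of the predicted and ground-truth boxes.
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)

    # Squared distance between the two box centres.
    dx = (x1 + x2) / 2 - (gx1 + gx2) / 2
    dy = (y1 + y2) / 2 - (gy1 + gy2) / 2

    centre_term = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    width_term = (w - gw) ** 2 / (cw * cw)
    height_term = (h - gh) ** 2 / (ch * ch)
    return 1.0 - iou + centre_term + width_term + height_term


if __name__ == "__main__":
    # Hypothetical boxes: a prediction offset diagonally from the target.
    print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Penalizing width and height differences directly, rather than through a single aspect-ratio term, is what lets this loss keep pushing the prediction when the aspect ratio already matches but the size does not.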

15 pages, 2103 KiB  
Article
Image Registration for Visualizing Magnetic Flux Leakage Testing under Different Orientations of Magnetization
by Shengping Li, Jie Zhang, Gaofei Liu, Nanhui Chen, Lulu Tian, Libing Bai and Cong Chen
Entropy 2023, 25(1), 167; https://doi.org/10.3390/e25010167 - 13 Jan 2023
Viewed by 1555
Abstract
The Magnetic Flux Leakage (MFL) visualization technique is widely used in the surface defect inspection of ferromagnetic materials. However, the information of the images detected through the MFL method is incomplete when the defect (especially for the cracks) is complex, and some information would be lost when magnetized unidirectionally. Then, the multidirectional magnetization method is proposed to fuse the images detected under different magnetization orientations. It causes a critical problem: the existing image registration methods cannot be applied to align the images because the images are different when detected under different magnetization orientations. This study presents a novel image registration method for MFL visualization to solve this problem. In order to evaluate the registration, and to fuse the information detected in different directions, the mutual information between the reference image and the MFL image calculated by the forward model is designed as a measure. Furthermore, Particle Swarm Optimization (PSO) is used to optimize the registration process. The comparative experimental results demonstrate that this method has a higher registration accuracy for the MFL images of complex cracks than the existing methods. Full article
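The abstract above uses mutual information as the registration measure. The sketch below shows the general idea on one-dimensional intensity lists: mutual information is estimated from the joint histogram, and the shift that maximizes it is taken as the alignment. It makes several simplifying assumptions that are not from the paper: an exhaustive search over circular shifts stands in for Particle Swarm Optimization, lists stand in for images, the forward model is not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b):
    """Mutual information in bits between two equally long intensity lists."""
    n = len(img_a)
    joint = Counter(zip(img_a, img_b))
    count_a = Counter(img_a)
    count_b = Counter(img_b)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) / (p(a) * p(b)) simplifies to c * n / (count_a * count_b).
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def best_shift(reference, moving, max_shift):
    """Circular shift of `moving` that maximizes mutual information.

    Exhaustive search is used here purely for illustration.
    """
    def shifted(seq, k):
        k %= len(seq)
        return list(seq[-k:]) + list(seq[:-k]) if k else list(seq)

    return max(
        range(-max_shift, max_shift + 1),
        key=lambda k: mutual_information(reference, shifted(moving, k)),
    )


if __name__ == "__main__":
    # Hypothetical profiles: `moving` is `reference` displaced by two samples.
    reference = [0, 0, 1, 2, 3, 0, 0, 0]
    moving = [1, 2, 3, 0, 0, 0, 0, 0]
    print(best_shift(reference, moving, max_shift=3))
```

Mutual information is attractive for this kind of problem because it rewards any consistent statistical relationship between the two signals, not just equal intensities, which matters when the images being aligned were acquired under different conditions.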

18 pages, 4528 KiB  
Article
Scale Enhancement Pyramid Network for Small Object Detection from UAV Images
by Jian Sun, Hongwei Gao, Xuna Wang and Jiahui Yu
Entropy 2022, 24(11), 1699; https://doi.org/10.3390/e24111699 - 21 Nov 2022
Cited by 6 | Viewed by 2240
Abstract
Object detection is challenging in large-scale images captured by unmanned aerial vehicles (UAVs), especially when detecting small objects with significant scale variation. Most solutions employ the fusion of different scale features by building multi-scale feature pyramids to ensure that the detail and semantic information are abundant. Although feature fusion benefits object detection, it still requires the long-range dependency information necessary for detecting small objects with significant scale variation. We propose a simple yet effective scale enhancement pyramid network (SEPNet) to address these problems. A SEPNet consists of a context enhancement module (CEM) and feature alignment module (FAM). Technically, the CEM combines multi-scale atrous convolution and multi-branch grouped convolution to model global relationships. Additionally, it enhances object feature representation, preventing features with lost spatial information from flowing into the feature pyramid network (FPN). The FAM adaptively learns offsets of pixels to preserve feature consistency. The FAM aims to adjust the location of sampling points in the convolutional kernel, effectively alleviating information conflict caused by the fusion of adjacent features. Results indicate that the SEPNet achieves an AP score of 18.9% on VisDrone, which is 7.1% higher than the AP score of the state-of-the-art detector RetinaNet, and an AP score of 81.5% on PASCAL VOC. Full article
