Search Results (18)

Search Parameters:
Keywords = multispectral pedestrian detection

28 pages, 10549 KiB  
Article
Multispectral Target Detection Based on Deep Feature Fusion of Visible and Infrared Modalities
by Yongsheng Zhao, Yuxing Gao, Xu Yang and Luyang Yang
Appl. Sci. 2025, 15(11), 5857; https://doi.org/10.3390/app15115857 - 23 May 2025
Viewed by 531
Abstract
Multispectral detection leverages visible and infrared imaging to improve detection performance in complex environments. However, conventional convolution-based fusion methods predominantly rely on local feature interactions, limiting their capacity to fully exploit cross-modal information and making them more susceptible to interference from complex backgrounds. To overcome these challenges, the YOLO-MEDet multispectral target detection model is proposed. Firstly, the YOLOv5 architecture is redesigned into a two-stream backbone network, incorporating a midway fusion strategy to integrate multimodal features from the C3 to C5 layers, thereby enhancing detection accuracy and robustness. Secondly, the Attention-Enhanced Feature Fusion Framework (AEFF) is introduced to optimize both cross-modal and intra-modal feature representations by employing an attention mechanism, effectively boosting model performance. Finally, the C3-PSA (C3 Pyramid Compressed Attention) module is integrated to reinforce multiscale spatial feature extraction and refine feature representation, ultimately improving detection accuracy while reducing false alarms and missed detections in complex scenarios. Extensive experiments on the FLIR, KAIST, and M3FD datasets, along with additional validation using SimuNPS simulations, confirm the superiority of YOLO-MEDet. The results indicate that the proposed model outperforms existing approaches across multiple evaluation metrics, providing an innovative solution for multispectral target detection. Full article
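The paper's code is not reproduced here, but the core idea of attention-enhanced fusion (blending visible and infrared feature vectors with weights derived from their pooled responses rather than a fixed linear mix) can be sketched in plain Python. All names and the pooling choice below are illustrative assumptions, not the AEFF implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def attentive_fuse(vis_feat, ir_feat):
    """Blend two same-length feature vectors using attention weights
    derived from their global average responses (a toy stand-in for
    a learned cross-modal attention block)."""
    pooled = [sum(vis_feat) / len(vis_feat), sum(ir_feat) / len(ir_feat)]
    w_vis, w_ir = softmax(pooled)
    return [w_vis * v + w_ir * r for v, r in zip(vis_feat, ir_feat)]
```

When both modalities respond equally, the blend reduces to an even average; a stronger modality is weighted up smoothly rather than selected outright.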

33 pages, 14577 KiB  
Article
A Color-Based Multispectral Imaging Approach for a Human Detection Camera
by Shuji Ono
J. Imaging 2025, 11(4), 93; https://doi.org/10.3390/jimaging11040093 - 21 Mar 2025
Viewed by 1809
Abstract
In this study, we propose a color-based multispectral approach using four selected wavelengths (453, 556, 668, and 708 nm) from the visible to near-infrared range to separate clothing from the background. Our goal is to develop a human detection camera that supports real-time processing, particularly under daytime conditions and for common fabrics. While conventional deep learning methods can detect humans accurately, they often require large computational resources and struggle with partially occluded objects. In contrast, we treat clothing detection as a proxy for human detection and construct a lightweight machine learning model (multi-layer perceptron) based on these four wavelengths. Without relying on full spectral data, this method achieves an accuracy of 0.95, precision of 0.97, recall of 0.93, and an F1-score of 0.95. Because our color-driven detection relies on pixel-wise spectral reflectance rather than spatial patterns, it remains computationally efficient. A simple four-band camera configuration could thus facilitate real-time human detection. Potential applications include pedestrian detection in autonomous driving, security surveillance, and disaster victim searches. Full article
(This article belongs to the Special Issue Color in Image Processing and Computer Vision)
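The pixel-wise approach described above can be illustrated with a tiny multi-layer perceptron scoring a single pixel's four-band reflectance. The weights, biases, and threshold below are made up for illustration; they are not the paper's trained values:

```python
import math

# Hypothetical parameters for a 4-input, 2-hidden-unit, 1-output MLP.
# Inputs are reflectances at the four bands (453, 556, 668, 708 nm).
W1 = [[ 0.8, -0.5,  0.3, -0.2],
      [-0.4,  0.9, -0.6,  0.7]]
B1 = [0.1, -0.1]
W2 = [1.2, -1.0]
B2 = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def is_clothing(bands):
    """Forward pass: 4 bands -> 2 ReLU hidden units -> sigmoid score.
    Returns (decision, score) for one pixel."""
    hidden = [max(0.0, sum(w * b for w, b in zip(row, bands)) + bias)
              for row, bias in zip(W1, B1)]
    score = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + B2)
    return score >= 0.5, score
```

Because each pixel is scored independently from its spectral signature, the classifier needs no spatial context, which is what keeps the method cheap enough for real-time use.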

15 pages, 2391 KiB  
Article
Multispectral Pedestrian Detection Based on Prior-Saliency Attention and Image Fusion
by Jiaren Guo, Zihao Huang and Yanyun Tao
Electronics 2024, 13(9), 1770; https://doi.org/10.3390/electronics13091770 - 3 May 2024
Cited by 2 | Viewed by 1592
Abstract
Detecting pedestrians in varying illumination conditions poses a significant challenge, necessitating the development of innovative solutions. In response to this, we introduce Prior-AttentionNet, a pedestrian detection model featuring a Prior-Attention mechanism. This model leverages the stark contrast between thermal objects and their backgrounds in far-infrared (FIR) images by employing saliency attention derived from FIR images via UNet. However, extracting salient regions of diverse scales from FIR images poses a challenge for saliency attention. To address this, we integrate Simple Linear Iterative Clustering (SLIC) superpixel segmentation, embedding the segmentation feature map as prior knowledge into UNet’s decoding stage for comprehensive end-to-end training and detection. This integration enhances the extraction of focused attention regions, with the synergy of segmentation prior and saliency attention forming the core of Prior-AttentionNet. Moreover, to enrich pedestrian details and contour visibility in low-light conditions, we implement multispectral image fusion. Experimental evaluations were conducted on the KAIST and OTCBVS datasets. Applying the Prior-Attention mode to FIR-RGB images significantly improves the delineation of and focus on multi-scale pedestrians. Prior-AttentionNet’s general detector demonstrates the capability of detecting pedestrians with minimal computational resources. The ablation studies indicate that the FIR-RGB + Prior-Attention mode markedly enhances detection robustness over other modes. When compared to conventional multispectral pedestrian detection models, Prior-AttentionNet consistently surpasses them by achieving higher mean average precision and lower miss rates in diverse scenarios, during both day and night. Full article
(This article belongs to the Section Computer Science & Engineering)

24 pages, 10144 KiB  
Article
CMCA-YOLO: A Study on a Real-Time Object Detection Model for Parking Lot Surveillance Imagery
by Ning Zhao, Ke Wang, Jiaxing Yang, Fengkai Luan, Liping Yuan and Hu Zhang
Electronics 2024, 13(8), 1557; https://doi.org/10.3390/electronics13081557 - 19 Apr 2024
Cited by 6 | Viewed by 3888
Abstract
In the accelerated phase of urbanization, intelligent surveillance systems play an increasingly pivotal role in enhancing urban management efficiency, particularly in the realm of parking lot administration. The precise identification of small and overlapping targets within parking areas is of paramount importance for augmenting parking efficiency and ensuring the safety of vehicles and pedestrians. To address this challenge, this paper delves into and amalgamates cross-attention and multi-spectral channel attention mechanisms, innovatively designing the Criss-cross and Multi-spectral Channel Attention (CMCA) module and subsequently refining the CMCA-YOLO model, specifically optimized for parking lot surveillance scenarios. Through meticulous analysis of pixel-level contextual information and frequency characteristics, the CMCA-YOLO model achieves significant advancements in accuracy and speed for detecting small and overlapping targets, exhibiting exceptional performance in complex environments. Furthermore, the study validates the research on a proprietary dataset of parking lot scenes comprising 4502 images, where the CMCA-YOLO model achieves an mAP@0.5 score of 0.895, with a pedestrian detection accuracy that surpasses the baseline model by 5%. Comparative experiments and ablation studies with existing technologies thoroughly demonstrate the CMCA-YOLO model’s superiority and advantages in handling complex surveillance scenarios. Full article

17 pages, 6317 KiB  
Article
INSANet: INtra-INter Spectral Attention Network for Effective Feature Fusion of Multispectral Pedestrian Detection
by Sangin Lee, Taejoo Kim, Jeongmin Shin, Namil Kim and Yukyung Choi
Sensors 2024, 24(4), 1168; https://doi.org/10.3390/s24041168 - 10 Feb 2024
Cited by 13 | Viewed by 2551
Abstract
Pedestrian detection is a critical task for safety-critical systems, but detecting pedestrians is challenging in low-light and adverse weather conditions. Thermal images can be used to improve robustness by providing complementary information to RGB images. Previous studies have shown that multi-modal feature fusion using convolution operation can be effective, but such methods rely solely on local feature correlations, which can degrade the performance capabilities. To address this issue, we propose an attention-based novel fusion network, referred to as INSANet (INtra-INter Spectral Attention Network), that captures global intra- and inter-information. It consists of intra- and inter-spectral attention blocks that allow the model to learn mutual spectral relationships. Additionally, we identified an imbalance in the multispectral dataset caused by several factors and designed an augmentation strategy that mitigates concentrated distributions and enables the model to learn the diverse locations of pedestrians. Extensive experiments demonstrate the effectiveness of the proposed methods, which achieve state-of-the-art performance on the KAIST dataset and LLVIP dataset. Finally, we conduct a regional performance evaluation to demonstrate the effectiveness of our proposed network in various regions. Full article
(This article belongs to the Section Optical Sensors)

15 pages, 15774 KiB  
Article
Illumination-Aware Cross-Modality Differential Fusion Multispectral Pedestrian Detection
by Chishe Wang, Jinjin Qian, Jie Wang and Yuting Chen
Electronics 2023, 12(17), 3576; https://doi.org/10.3390/electronics12173576 - 24 Aug 2023
Cited by 3 | Viewed by 1735
Abstract
Multispectral information fusion technology is a practical approach to enhance pedestrian detection performance in low-light conditions. However, current methods often overlook the impact of illumination on modal weights and the significance of inter-modal differential information. Therefore, this paper proposes a novel illumination-aware cross-modality differential fusion (IACMDF) model. The weights of the different modalities in the fusion stage are adaptively adjusted according to the illumination intensity of the current scene. In addition, the advantages of the respective modalities are fully enhanced by amplifying the differential information and suppressing the commonality of the twin modalities. Finally, to mitigate the information loss caused by the unequal importance of the feature map’s channels during convolutional pooling, this work adds a squeeze-and-excitation attention mechanism after the fusion stage. Experiments on the public multispectral dataset KAIST have shown that the average miss rate of our method is substantially reduced compared to the baseline model. Full article
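One plausible way to combine the two ideas in this abstract, illumination-dependent modality weights plus an amplified differential term, is sketched below. The weighting function and the `gamma` factor are assumptions for illustration, not the paper's IACMDF:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def illumination_aware_fuse(vis, ir, illumination, gamma=0.5):
    """Fuse per-element visible and infrared features.
    - illumination: scalar scene-brightness estimate; brighter scenes
      shift weight toward the visible stream.
    - gamma: scales the inter-modal difference, amplifying the
      differential information on top of the common blend."""
    w = sigmoid(illumination)
    fused = []
    for v, r in zip(vis, ir):
        common = w * v + (1.0 - w) * r   # illumination-weighted blend
        diff = gamma * (v - r)           # amplified differential term
        fused.append(common + diff)
    return fused
```

At neutral illumination the weights are even, so identical inputs pass through unchanged while any modality gap is boosted by `gamma`.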

13 pages, 2343 KiB  
Article
All-Weather Pedestrian Detection Based on Double-Stream Multispectral Network
by Chih-Hsien Hsia, Hsiao-Chu Peng and Hung-Tse Chan
Electronics 2023, 12(10), 2312; https://doi.org/10.3390/electronics12102312 - 20 May 2023
Cited by 6 | Viewed by 2202
Abstract
Recently, advanced driver assistance systems (ADAS) that exploit the multiple spectra captured by multi-sensor setups have attracted wide attention in pedestrian detection. However, it is quite challenging for image-based sensors to perform their tasks due to instabilities such as light changes, object shading, or weather conditions. Considering all of the above, and drawing on the different spectral information in RGB and thermal images, this study proposed a deep learning (DL) framework that mitigates the problem of confusing light sources and extracts highly differentiated multimodal features through multispectral fusion. The proposed double-stream multispectral network (DSMN) comprises a multispectral-fusion, double-stream Yolo-based detector (MFDs-Yolo) and an improved illumination-aware network (i-IAN) that self-adaptively adjusts the multispectral weights for the late-fusion strategy, making the different modalities complementary. The experimental results demonstrated the strong performance of this detection method on the public KAIST dataset and the multispectral pedestrian detection dataset FLIR, where it even outperformed the most advanced methods under the miss rate (MR, IoU@0.75) evaluation. Full article
(This article belongs to the Special Issue New Trends in Deep Learning for Computer Vision)

18 pages, 12133 KiB  
Article
HAFNet: Hierarchical Attentive Fusion Network for Multispectral Pedestrian Detection
by Peiran Peng, Tingfa Xu, Bo Huang and Jianan Li
Remote Sens. 2023, 15(8), 2041; https://doi.org/10.3390/rs15082041 - 12 Apr 2023
Cited by 10 | Viewed by 3112
Abstract
Multispectral pedestrian detection via visible and thermal image pairs has received widespread attention in recent years. It provides a promising multi-modality solution to address the challenges of pedestrian detection in low-light environments and occlusion situations. Most existing methods directly blend the results of the two modalities or combine the visible and thermal features via a linear interpolation. However, such fusion strategies tend to extract coarser features corresponding to the positions of different modalities, which may lead to degraded detection performance. To mitigate this, this paper proposes a novel and adaptive cross-modality fusion framework, named Hierarchical Attentive Fusion Network (HAFNet), which fully exploits the multispectral attention knowledge to inspire pedestrian detection in the decision-making process. Concretely, we introduce a Hierarchical Content-dependent Attentive Fusion (HCAF) module, which uses top-level features to guide the pixel-wise blending of the two modalities’ features and thereby enhance the quality of the feature representation, along with a plug-in multi-modality feature alignment (MFA) block to fine-tune the feature alignment of the two modalities. Experiments on the challenging KAIST and CVC-14 datasets demonstrate the superior performance of our method with satisfactory speed. Full article
(This article belongs to the Special Issue Data Fusion for Urban Applications)

23 pages, 6312 KiB  
Review
Deep Learning-Based Pedestrian Detection in Autonomous Vehicles: Substantial Issues and Challenges
by Sundas Iftikhar, Zuping Zhang, Muhammad Asim, Ammar Muthanna, Andrey Koucheryavy and Ahmed A. Abd El-Latif
Electronics 2022, 11(21), 3551; https://doi.org/10.3390/electronics11213551 - 31 Oct 2022
Cited by 67 | Viewed by 14843
Abstract
In recent years, autonomous vehicles have become more and more popular due to their broad influence over society, as they increase passenger safety and convenience, lower fuel consumption, reduce traffic blockage and accidents, save costs, and enhance reliability. However, autonomous vehicles suffer from some functionality errors, which need to be minimized before they can be fully deployed on public roads. Pedestrian detection is among the most critical of these tasks, as it is essential for preventing accidents. However, accurate pedestrian detection is a very challenging task due to the following issues: (i) occlusion and deformation and (ii) low-quality and multi-spectral images. Recently, deep learning (DL) technologies have exhibited great potential for addressing the aforementioned pedestrian detection issues in autonomous vehicles. This survey paper provides an overview of pedestrian detection issues and the recent advances made in addressing them with the help of DL techniques. Informative discussions and future research works are also presented, with the aim of offering insights to the readers and motivating new research directions. Full article
(This article belongs to the Special Issue V2X Communications and Applications for NET-2030)

16 pages, 11978 KiB  
Article
Multispectral Benchmark Dataset and Baseline for Forklift Collision Avoidance
by Hyeongjun Kim, Taejoo Kim, Won Jo, Jiwon Kim, Jeongmin Shin, Daechan Han, Yujin Hwang and Yukyung Choi
Sensors 2022, 22(20), 7953; https://doi.org/10.3390/s22207953 - 19 Oct 2022
Cited by 1 | Viewed by 2944
Abstract
In this paper, multispectral pedestrian detection is mainly discussed, which can contribute to assigning human-aware properties to automated forklifts to prevent accidents, such as collisions, at an early stage. Since there was no multispectral pedestrian detection dataset in an intralogistics domain, we collected a dataset; the dataset employs a method that aligns image pairs with different domains, i.e., RGB and thermal, without the use of a cumbersome device such as a beam splitter, but rather by exploiting the disparity between RGB sensors and camera geometry. In addition, we propose a multispectral pedestrian detector called SSD 2.5D that can not only detect pedestrians but also estimate the distance between an automated forklift and workers. In extensive experiments, the performance of detection and centroid localization is validated with respect to evaluation metrics used in the autonomous driving domain, but with distinct categories, such as hazardous zone and warning zone, to make it more applicable to the intralogistics domain. Full article
(This article belongs to the Section Intelligent Sensors)
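The hazardous-zone/warning-zone categories described above amount to thresholding the estimated forklift-to-worker distance. A trivial sketch; the radii are hypothetical thresholds, not values from the paper:

```python
def classify_zone(distance_m, hazard_radius=3.0, warning_radius=7.0):
    """Map an estimated forklift-to-pedestrian distance (metres) to a
    safety zone. Thresholds are illustrative, not the paper's values."""
    if distance_m <= hazard_radius:
        return "hazardous"
    if distance_m <= warning_radius:
        return "warning"
    return "safe"
```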

20 pages, 1001 KiB  
Article
An Unsupervised Transfer Learning Framework for Visible-Thermal Pedestrian Detection
by Chengjin Lyu, Patrick Heyer, Bart Goossens and Wilfried Philips
Sensors 2022, 22(12), 4416; https://doi.org/10.3390/s22124416 - 10 Jun 2022
Cited by 7 | Viewed by 2986
Abstract
Dual cameras with visible-thermal multispectral pairs provide both visual and thermal appearance, thereby enabling detecting pedestrians around the clock in various conditions and applications, including autonomous driving and intelligent transportation systems. However, due to the greatly varying real-world scenarios, the performance of a detector trained on a source dataset might change dramatically when evaluated on another dataset. A large amount of training data is often necessary to guarantee the detection performance in a new scenario. Typically, human annotators need to conduct the data labeling work, which is time-consuming, labor-intensive and unscalable. To overcome the problem, we propose a novel unsupervised transfer learning framework for multispectral pedestrian detection, which adapts a multispectral pedestrian detector to the target domain based on pseudo training labels. In particular, auxiliary detectors are utilized and different label fusion strategies are introduced according to the estimated environmental illumination level. Intermediate domain images are generated by translating the source images to mimic the target ones, acting as a better starting point for the parameter update of the pedestrian detector. The experimental results on the KAIST and FLIR ADAS datasets demonstrate that the proposed method achieves new state-of-the-art performance without any manual training annotations on the target data. Full article
(This article belongs to the Topic Methods for Data Labelling for Intelligent Systems)

21 pages, 1978 KiB  
Article
Adopting the YOLOv4 Architecture for Low-Latency Multispectral Pedestrian Detection in Autonomous Driving
by Kamil Roszyk, Michał R. Nowicki and Piotr Skrzypczyński
Sensors 2022, 22(3), 1082; https://doi.org/10.3390/s22031082 - 30 Jan 2022
Cited by 60 | Viewed by 7755
Abstract
Detecting pedestrians in autonomous driving is a safety-critical task, and the decision to avoid a person has to be made with minimal latency. Multispectral approaches that combine RGB and thermal images are researched extensively, as they make it possible to gain robustness under varying illumination and weather conditions. State-of-the-art solutions employing deep neural networks offer high accuracy of pedestrian detection. However, the literature is short of works that evaluate multispectral pedestrian detection with respect to its feasibility in obstacle avoidance scenarios, taking into account the motion of the vehicle. Therefore, we investigated the real-time neural network detector architecture You Only Look Once, the latest version (YOLOv4), and demonstrated that this detector can be adapted to multispectral pedestrian detection. It can achieve accuracy on par with the state-of-the-art while being highly computationally efficient, thereby supporting low-latency decision making. The results achieved on the KAIST dataset were evaluated from the perspective of automotive applications, where low latency and a low number of false negatives are critical parameters. The middle fusion approach to YOLOv4 in its Tiny variant achieved the best accuracy to computational efficiency trade-off among the evaluated architectures. Full article

15 pages, 4542 KiB  
Article
Robust Pedestrian Detection Based on Multi-Spectral Image Fusion and Convolutional Neural Networks
by Xu Chen, Lei Liu and Xin Tan
Electronics 2022, 11(1), 1; https://doi.org/10.3390/electronics11010001 - 21 Dec 2021
Cited by 21 | Viewed by 3438
Abstract
Nowadays, pedestrian detection is widely used in fields such as driving assistance and video surveillance with the progression of technology. However, although the research of single-modal visible pedestrian detection has been very mature, it is still not enough to meet the demand of pedestrian detection at all times. Thus, a multi-spectral pedestrian detection method via image fusion and convolutional neural networks is proposed in this paper. The infrared intensity distribution and visible appearance features are retained with a total variation model based on local structure transfer, and pedestrian detection is realized with the multi-spectral fusion results and the target detection network YOLOv3. The detection performance of the proposed method is evaluated and compared with the detection methods based on the other four pixel-level fusion algorithms and two fusion network architectures. The results attest that our method has superior detection performance, which can detect pedestrian targets robustly even in the case of harsh illumination conditions and cluttered backgrounds. Full article
(This article belongs to the Topic Machine and Deep Learning)

16 pages, 27436 KiB  
Article
Detection of Archaeological Surface Ceramics Using Deep Learning Image-Based Methods and Very High-Resolution UAV Imageries
by Athos Agapiou, Athanasios Vionis and Giorgos Papantoniou
Land 2021, 10(12), 1365; https://doi.org/10.3390/land10121365 - 10 Dec 2021
Cited by 18 | Viewed by 5863
Abstract
Mapping surface ceramics through systematic pedestrian archaeological survey is considered a consistent method to recover the cultural biography of sites within a micro-region. Archaeologists nowadays conduct surface survey equipped with navigation devices counting, documenting, and collecting surface archaeological potsherds within a set of plotted grids. Recent advancements in unmanned aerial vehicles (UAVs) and image processing analysis can be utilised to support such surface archaeological investigations. In this study, we have implemented two different artificial intelligence image processing methods over two areas of interest near the present-day village of Kophinou in Cyprus, in the Xeros River valley. We have applied a random forest classifier through the Google Earth Engine big data cloud platform and a Single Shot Detector neural network in the ArcGIS Pro environment. For the first case study, the detection was based on red–green–blue (RGB) high-resolution orthophotos. In contrast, a multispectral camera covering both the visible and the near-infrared parts of the spectrum was used in the second area of investigation. The overall results indicate that such an approach can be used in the future as part of ongoing archaeological pedestrian surveys to detect scattered potsherds in areas of archaeological interest, even if pottery shares a very high spectral similarity with the surface. Full article
(This article belongs to the Special Issue Land: 10th Anniversary)

17 pages, 17044 KiB  
Article
Attention Fusion for One-Stage Multispectral Pedestrian Detection
by Zhiwei Cao, Huihua Yang, Juan Zhao, Shuhong Guo and Lingqiao Li
Sensors 2021, 21(12), 4184; https://doi.org/10.3390/s21124184 - 18 Jun 2021
Cited by 56 | Viewed by 5529
Abstract
Multispectral pedestrian detection, which consists of a color stream and thermal stream, is essential under conditions of insufficient illumination because the fusion of the two streams can provide complementary information for detecting pedestrians based on deep convolutional neural networks (CNNs). In this paper, we introduced and adapted a simple and efficient one-stage YOLOv4 to replace the current state-of-the-art two-stage fast-RCNN for multispectral pedestrian detection and to directly predict bounding boxes with confidence scores. To further improve the detection performance, we analyzed the existing multispectral fusion methods and proposed a novel multispectral channel feature fusion (MCFF) module for integrating the features from the color and thermal streams according to the illumination conditions. Moreover, several fusion architectures, such as Early Fusion, Halfway Fusion, Late Fusion, and Direct Fusion, were carefully designed based on the MCFF to transfer the feature information from the bottom to the top at different stages. Finally, the experimental results on the KAIST and Utokyo pedestrian benchmarks showed that Halfway Fusion achieved the best performance of all architectures and that the MCFF could adapt the fused features of the two modalities. The log-average miss rates (MR) on the two benchmarks under the reasonable setting were 4.91% and 23.14%, respectively. Full article
(This article belongs to the Topic Intelligent Transportation Systems)
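The log-average miss rate quoted in several of these abstracts is the standard Caltech-style metric: the geometric mean of the miss rates sampled at nine false-positives-per-image (FPPI) reference points spaced evenly in log space over [10⁻², 10⁰]. A minimal sketch of that computation:

```python
import math

def log_average_miss_rate(fppi, miss_rate):
    """Caltech-style log-average miss rate from a miss-rate-vs-FPPI
    curve given as parallel lists. For each of nine reference FPPI
    values, take the best (lowest) miss rate achieved at or below that
    FPPI, then return the geometric mean of the nine samples."""
    refs = [10 ** (-2.0 + 0.25 * i) for i in range(9)]  # 1e-2 .. 1e0
    samples = []
    for ref in refs:
        eligible = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        # If the curve never reaches this FPPI, fall back to the worst
        # available miss rate (a common convention).
        samples.append(min(eligible) if eligible else max(miss_rate))
    logs = [math.log(max(m, 1e-10)) for m in samples]  # guard log(0)
    return math.exp(sum(logs) / len(logs))
```

On a flat curve the metric reduces to the constant miss rate itself, which makes the implementation easy to sanity-check.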
