1. Introduction
With the development of millimeter wave imaging technology, millimeter wave security inspection is becoming increasingly sophisticated [1,2,3]. Current millimeter wave imaging systems can be divided into two categories: passive imaging and active imaging. Passive imaging systems use the object’s own radiation for imaging. Because this radiated power is very low, the image quality is poor. Active imaging systems use the reflection characteristics of objects for imaging. Since the power of the emission source is high, the reflected radiation yields a clear image of the human body. With the introduction of the Multiple-Input Multiple-Output (MIMO) concept, security inspection systems based on synthetic aperture radar (SAR) imaging are also under development, which greatly improves imaging accuracy and speed [4,5,6,7,8,9]. Unlike X-ray equipment, active millimeter-wave security equipment does not harm the human body [10].
At present, there are two scanning methods for SAR imaging security inspection systems. One is circular scanning, in which the linear array antenna is placed on a vertical plane. Although the circular scanning system images the target better, it has the disadvantages of slow imaging speed and a complex mechanical and electrical structure. The other is flat scanning, in which the antenna is placed on a horizontal plane and scans vertically, giving a fast scanning speed and a simple mechanical and electrical structure. However, the distance between the antenna and the human body is the same at any given moment, so the imaging clarity of the flat scanning system differs between positions on the human body.
As more sophisticated millimeter wave imaging systems become available, target detection in millimeter wave images becomes more practical. Yeom et al. [11] extract geometric feature vectors from target shapes transformed by Principal Component Analysis (PCA) and use these vectors to analyze the segmented binary images for target recognition. This method relies on traditional image processing, and the range of targets it can detect is limited.
In recent years, deep learning has been widely used in target detection from SAR images. Chen et al. [12,13] use multi-scale fusion to enhance feature extraction and achieve high-accuracy detection of bridges and aircraft. Meng et al. [14] propose a human pose segmentation algorithm based on deep Convolutional Neural Network (CNN) detection, which divides human images into several parts for recognition. Lopez-Tapia et al. [15] enhance passive millimeter wave images to improve the target detection rate. Guo et al. [16] combine human contour segmentation with deep learning to detect targets in passive millimeter wave images. Liu et al. [17] analyze millimeter wave imaging data, then extract a familiar millimeter wave image and a new spatial depth map for further detection. Each of these studies has only one detection object.
Del Prete et al. [18] used a region-based convolutional neural network to detect ship wakes. Because some ship wakes are similar to water surface membranes, the mean Average Precision (mAP) is only 67.63%. Ghaderpour et al. [19] studied various frequency and time-frequency decomposition methods based on Fourier and least-squares analysis. Arivazhagan et al. [20] used the wavelet transform and the Fourier transform for target/change detection. The wavelet-based algorithms proved successful in detecting changes/targets within images.
Most existing multi-object recognition work recognizes a human body together with a hidden object. Pang et al. [21] use the YOLO v3 algorithm for the real-time detection of concealed metal weapons on the human body from passive millimeter wave images. They mainly identify pistols and people. The shapes of the two targets are different, so the deep learning network can readily identify them. Zhang et al. [22] propose an improved Faster Region-Based Convolutional Neural Network (R-CNN) for target detection from the millimeter wave images of a circular scanning system. The mean Average Precision (mAP) of the improved network reaches 69.7%. The human body in their images occupies more than 50% of the pixel area, and hidden objects are placed within the human body area. This recognition has a distinctive feature: the recognition rate of the human body is very high, reaching 98.75%, but the recognition rate of hidden objects is very low; for example, the recognition rate of mobile phones is only 47.18%. This mAP therefore cannot objectively express the recognition rate of hidden objects. This article does not treat the human body as a detection target; it only recognizes targets with inconsistent clarity and similar targets, such as a pistol, hammer, and wrench.
In this article, a suspicious multi-object detection and recognition method for millimeter wave SAR security inspection images based on a multi-path extraction network is proposed. It addresses the problem that targets with inconsistent clarity and similar targets are difficult to identify. We use a flat scanning imaging system to obtain a SAR image of the human body. The imaging focal length of the flat scanning system is a fixed value, but the body surface is not flat. As the placement position of the target changes, different parts of the target lie at different imaging distances, so the scanned image shows inconsistent clarity within the same target. A target with inconsistent clarity is one in which at least 1/4 of the target is blurred, and the blurred area is not limited to the boundary. At the same time, a SAR image is usually a single-color grayscale image, so the identification of dangerous articles depends mainly on the shape of the object. To the best of our knowledge, most current research focuses on detecting hidden objects on the human body, but few studies identify the type of hidden object. This article uses the imaging characteristics to solve the interference caused by inconsistent object clarity and similar objects, and proposes a multi-path extraction network for multi-target recognition in SAR imaging security inspection images.
The main contributions of this article are as follows:
- (1) A suspicious multi-object detection and recognition method based on a multi-path extraction network (MPEN) is proposed for millimeter wave SAR security inspection images. By combining the output characteristics of the deep and shallow networks, it recognizes targets with inconsistent sharpness and similar targets.
- (2) A Multi-Path Feature Pyramid (MPFP) model and an improved residual block distribution are proposed. We use MPFP to output the semantic information of the deep network independently. The feature map thus contains the semantics of an independent deep network, which enhances feature extraction and better distinguishes different targets. At the same time, we adjust the number of repetitions of the residual blocks and focus on using the receptive field of the shallow network. The modified residual block distribution helps distinguish the contour differences between targets and improves the recognition accuracy.
- (3) This article provides a new idea for automatically identifying multiple objects in the SAR images of security inspection systems.
The structure of this article is as follows. In Section 2, the main problems of multi-target recognition are introduced. In Section 3, the MPEN proposed in this article for millimeter wave SAR security inspection images is described in detail. In Section 4, the experimental results and corresponding image analysis under different conditions are described. In Section 5, the specific differences between our method and existing research are discussed. Finally, the research conclusions are given in Section 6.
2. SAR Imaging System of Security Inspection
At present, different security inspection systems with SAR imaging produce different imaging results, and there is no relevant dataset. Therefore, this study is based on a security inspection system with SAR imaging developed in our laboratory. The system structure is shown in Figure 1. The system places all sources and detectors in parallel along the X axis and scans up and down along the Z axis. The operating frequency of the system is 35 GHz. The system has a simple mechanical structure, and a scan is completed in one cycle of the Z-axis movement. The system can run continuously; each scanned image is saved in jpg format, and the size of each image is fixed at 200 × 400 pixels.
In principle, the imaging system is a near-field imaging radar. The intensity of the millimeter waves reflected by metal contraband is higher than that reflected by the human body, so the contraband appears as a brighter area in the image. In the dataset, targets are placed at the center and at the edge of the human body. The recognition targets are common contraband, such as pistols, hammers, and wrenches.
When the flat scanning system detects a target close to the body surface, the target image is deformed at a certain angle. At the same time, the distance between the target and the scanning antenna is not a fixed value. Because the imaging focal length is fixed, a target close to the human body surface is blurred in some areas. Take the pistol imaging result shown in Figure 2 as an example. When the pistol is placed on the edge of the body, the pistol image is clear, as shown in Figure 2a. When the pistol is placed on the surface of the human body, the outline of the pistol merges with the outline of the human body, and only the area of the pistol grip is clear, as shown in Figure 2b.
In addition, this article mainly uses hammers and wrenches as similar targets. As shown in Figure 3, both the hammer and the wrench have a long handle. The handles of the two targets resemble each other, and they also resemble human limbs. The difference is that the hammer has a head and the wrench has an opening. Therefore, targeting the characteristics of the flat scanning millimeter wave imaging system, this article proposes a suspicious multi-object detection and recognition method for millimeter wave SAR security inspection images based on a multi-path extraction network. Our method improves the recognition rate of targets with inconsistent clarity and similar targets in SAR images.
The data used in this experiment are 1000 SAR images acquired with the laboratory-developed security inspection system, 90% of which are used for training and 10% for testing. We use most of the dataset for training, while the test dataset covers all actual recognition situations. We have three recognition targets: a pistol, a wrench, and a hammer. We produced three types of datasets: objects with inconsistent clarity, similar objects, and mixed targets (objects with inconsistent clarity and similar objects at the same time). Each category is divided into two situations: the target is placed on the trunk of the body, or the target is placed on the limbs. We include multiple recognition targets in one image to compensate for the small dataset. Although the number of test images is small, many targets need to be identified, which supports the reliability of the test results. The images in the dataset are labeled using LabelImg, with one to three targets in each image. The target is placed at a random position on the human body; the labeled training image shown in Figure 4 has the target placed at the thigh. The targets in all test images are randomly placed, and the number of targets in each test image ranges from one to three. The objective of this research is to identify objects with inconsistent clarity and similar targets, so the targets in the dataset must not block each other. Therefore, when targets are placed on the human torso, the number of objects placed generally does not exceed three. When the target is placed on the limbs, generally only one object is placed. All test images are used as independent tests to verify the network performance, and the test images are all 200 × 400 pixels.
4. Results
As shown in Figure 10, F1 combines the results of Precision and Recall. As shown in Figure 4, we first compare the relationship between the network’s Score Threshold and F1. In MPEN, when the Score Threshold value is 0.5, all three recognition targets maintain a high F1 value. The F1 value of the hammer begins to decline rapidly when the Score Threshold value is 0.4. When the Score Threshold value of the SSD network is 0.5, only the F1 value of the wrench exceeds 0.5. In other words, at a Score Threshold value of 0.5, MPEN keeps both Precision and Recall at a high level. Therefore, this study mainly discusses the network performance when the Score Threshold value is 0.5.
The overall comparison of mAP and recognition time between the YOLO v3 network and MPEN can be seen in Table 1. Compared with the YOLO v3 network, the mAP of MPEN is increased by 7.57% at the same recognition time. Although the recognition time of the SSD network is short, its mAP is too low, and its F1 performs poorly when the Score Threshold value is 0.5. Therefore, the following analysis does not include the SSD network. In Table 2, we compare the recognition accuracy of the pistol, wrench, and hammer. Even though the similarity of the wrench and hammer is high, the accuracy of MPEN is still higher than that of the YOLO v3 network. The recognition accuracy of a hammer is lower than that of the YOLO v3 network. Among the three targets, the largest improvement in recognition accuracy is 11.73%, and the smallest, for the pistol, is 4.98%. Comparing the AP of each target, the AP increase of the wrench is the highest among the three, at 9.72%. The lowest AP increase among the three is 5.32%.
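For reference, the relation between per-class AP and mAP can be sketched as below. The text does not state which AP convention was used, so the all-point interpolation with a monotone precision envelope is an assumption, as are the function names and the toy inputs. mAP is taken as the plain mean of the per-class AP values.

```python
def average_precision(detections, n_ground_truth):
    """Area under the precision-recall curve for one class.

    detections: list of (score, is_correct) pairs for that class.
    Uses a monotone precision envelope (an assumed convention).
    """
    if n_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_correct in ranked:
        if is_correct:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP as the unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Toy example with made-up detections, not data from Table 1 or Table 2.
toy_ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
print(toy_ap, mean_average_precision({"pistol": 1.0, "wrench": 0.5, "hammer": 0.0}))
```

Because mAP is an unweighted mean, a single class with a very high AP can mask a weak class, which is why the per-target AP values are reported alongside the overall figure.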
Next, we choose four SAR images scanned by the flat scanning system that contain typical targets with inconsistent clarity and similar targets, and use YOLO v3 and MPEN to detect the targets. Figure 11 shows the test results. In Figure 11a, MPEN effectively detects the hammer and wrench with a confidence level greater than 0.7, whereas the YOLO v3 network misses the hammer and detects the wrench with a confidence level of only 0.52. In Figure 11b, MPEN detects the target wrench with a confidence level greater than 0.6, while the YOLO v3 network misidentifies the wrench as a pistol. In Figure 11c, both MPEN and the YOLO v3 network detect the wrench, but the confidence of MPEN is 0.96, while the confidence of the YOLO v3 network is only 0.81. In Figure 11d, MPEN detects the pistol with a confidence level of 0.58, while the YOLO v3 network misses it. The YOLO v3 network thus fails to detect or misclassifies targets in these cases, whereas MPEN is better able to distinguish blurred targets with few pixels and similar targets.
We list the false alarm rate and miss alarm rate of the networks for each recognition target in Table 3. The false alarm rate is the ratio of the number of misclassified targets to the total number of samples. The main reason for false recognition is that human limbs are similar to the target handles, and the YOLO network has a low ability to distinguish similar targets. The miss alarm rate is the ratio of the number of missed targets to the total number of samples. Missed targets are mainly due to inconsistent target clarity, which the network cannot capture adequately. The false alarm rates of MPEN for pistols, hammers, and wrenches are lower than those of the YOLO v3 network, while its miss alarm rate for pistols is slightly higher. The false alarm rate and miss alarm rate reflect the performance of the network from another angle. According to the data in Table 3, MPEN reduces the error rate when detecting similar targets. The missed detection rate of pistols increases slightly, but the missed detection rates of hammers and wrenches decrease. Based on these changes in error rate and missed detection rate, MPEN can be considered to offer a certain improvement in recognizing targets with inconsistent clarity. At the same time, it offers a certain improvement in recognizing targets in the presence of similar interference (such as human limbs).
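The two rates defined above can be computed as in the following sketch. The record format (one pair of true label and predicted label per target sample, with `None` for a missed target) and the function name `alarm_rates` are assumptions for illustration; the sample values are invented and do not come from Table 3.

```python
def alarm_rates(results):
    """Return (false_alarm_rate, miss_alarm_rate) over all target samples.

    results: list of (true_label, predicted_label) pairs; predicted_label is
    None when the target was not detected at all (assumed record format).
    """
    n_samples = len(results)
    if n_samples == 0:
        raise ValueError("at least one sample is required")
    # Misclassified: detected, but labeled as a different class.
    n_misclassified = sum(
        1 for truth, pred in results if pred is not None and pred != truth
    )
    # Missed: no detection was produced for the target.
    n_missed = sum(1 for _, pred in results if pred is None)
    return n_misclassified / n_samples, n_missed / n_samples


# Invented example: one wrench mistaken for a pistol, one hammer missed.
samples = [
    ("wrench", "wrench"),
    ("wrench", "pistol"),
    ("hammer", None),
    ("pistol", "pistol"),
]
print(alarm_rates(samples))  # (0.25, 0.25)
```

Both rates share the same denominator, the total number of samples, so they can be compared directly between networks evaluated on the same test set.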
In this study, contraband is placed on different parts of the human body, and some contraband is deliberately imaged blurred. The purpose is to simulate, as far as possible, extreme usage scenarios of security inspection equipment in practical applications. The deep learning network places no requirements on human posture. The recognition results are shown in Figure 12. When the contraband is on the human torso, its appearance is more obvious. In Figure 12a,b,e,j, the blue boxes mark wrenches placed on the outer thigh, back, waist, and leg of the human body. Compared with the other three images, the wrench in Figure 12a has a different appearance, clarity, and edge. In Figure 12e, the left side of the middle wrench is more blurred than the right side. In Figure 12d,f,g, the pistol is on the legs, buttocks, and chest of the human body, and the muzzle in Figure 12f is more blurred than the grip. MPEN can still identify these targets. Where the target may be confused with the human body, the tester points by hand to provide a mark for verification. In Figure 12c, the hammer is on the inner thigh; the hammerhead confirms the position, but the handle position is indicated by hand. When the hammer is placed under the armpit, the shape of the hammer itself may be confused with the human body, and the handle position is again indicated by hand. In another example, a hammer is held in a person’s hand so that the handle of the hammer almost merges with the hand, and the handle is indicated by hand. To show that pointing by hand does not affect recognition, we also place the wrench on the shank, where it can still be detected.
5. Discussion
There is much research on target detection using deep learning. Chen et al. [12,13] detected aircraft. Their detection network has a strong ability to extract targets in order to detect aircraft in the image. In contrast, our method improves the detection capability mainly to distinguish similar targets. Del Prete et al. [18] detected ship wakes. Their method suffers interference from other similar elements, resulting in a low mAP. Our method can distinguish similar targets even when the target resembles a body part. Ghaderpour et al. [19] and Arivazhagan et al. [20] used the wavelet transform and the Fourier transform for target/change detection. This detection approach can detect multiple targets, but, due to the limitation of the algorithm, only one after another in sequence. Our network uses a deep learning solution to identify and label multiple targets at the same time.
At present, multiple objects often need to be identified in a security inspection system. Most current research focuses on detecting a single target, and research on multi-object recognition is relatively immature. Since the SAR imaging systems used by researchers vary, the datasets they use may differ. Due to the performance limitations of the flat scanning imaging system, foreign objects on the surface of the human body are imaged inconsistently.
Current related studies mainly focus on detecting hidden objects on the human body, and there are relatively few studies on identifying the types of hidden objects. Comparing with the networks in [14,16,17], we found that these three studies mainly detect the presence of hidden objects without determining their shape and size, so the detection accuracy of this type of research is relatively high. MPEN can detect multiple objects at the same time, and similar objects (the hammer and wrench in this article) can be well distinguished.
There are fewer research results on multiple target recognition on the human body surface. In [21], the authors focus on the recognition of pistols and human bodies. The two targets differ significantly in shape, and the human body occupies the majority of the image area. When the pistol is placed on the human body surface, the pixel share of the two targets is approximately constant, so it is easier to recognize the two targets. In [22], the purpose of the study is similar to ours. The researchers test on human bodies, knives, pistols, bottles, and mobile phones. The recognition rate of the human body is as high as 98.75%, whereas the recognition rate of the mobile phone is only 47.18%. The detection performance of the network is therefore strongly affected by the recognition rate of the human body. Although the mAP of this target detection is 69.7%, it does not reflect the recognition rate of dangerous targets. In security inspection equipment, the recognition rate of dangerous targets should be the first consideration. We only consider dangerous targets, such as pistols, hammers, and wrenches, so our mAP objectively reflects the identification of dangerous targets.
Since the imaging system we use operates at 35 GHz, performance will differ if a system with higher imaging resolution is used to obtain the SAR images. Security inspection covers many prohibited items, so datasets covering more types of contraband need to be established for training. Moreover, the existing recognition accuracy still cannot meet actual detection requirements, and there is still room for further improvement in recognition accuracy and false alarm rate.
6. Conclusions
A suspicious multi-object detection and recognition method for millimeter wave SAR security inspection images based on a multi-path extraction network has been proposed. The method comprises a multi-path feature pyramid (MPFP) module and a modified distribution of residual blocks. The flat scanning system has a fixed focal length, which leads to blurred imaging and difficult target recognition. We modified the number of repetitions of the residual blocks in Darknet-53 and added paths for the three feature scales. The deep scale is output separately, and the shallow network is combined with the upsampled deep-scale output. Our method enhances the extraction of target boundaries and improves the ability to recognize targets with inconsistent clarity. Compared with the YOLO v3 network, the mAP increased by 7.57%, and the recognition accuracy increased by up to 11.73%.
In general, our method performs better on SAR images when recognizing targets with poor sharpness and similar targets. However, for a target with specific unique characteristics (such as a pistol), the contour is relatively regular and easily coincides with the contour of the human body when the target is placed on the edge of the human body. In this case, there is still room for improvement in recognition accuracy. Millimeter wave imaging accuracy is related to the operating frequency of the system. The multi-object recognition method proposed in this article can be applied to devices with lower operating frequencies. It retains high recognition accuracy under low imaging accuracy, which reduces the dependence of the millimeter wave recognition system on the working frequency and improves equipment economy. Our method has a better ability to distinguish similar targets. It is used in the field of security inspection, and it can also refine recognition targets, such as identifying the models of aircraft, ships, and vehicles. The core of the research has shifted from identifying a certain type of target to identifying a specific target. We hope to utilize as much information as possible in SAR images to promote research on SAR image target recognition.