Concealed Object Detection and Recognition System Based on Millimeter Wave FMCW Radar

Abstract: At present, millimeter wave radar imaging has become a recognized solution in the field of human security screening. The millimeter wave radar imaging system can be used to detect concealed objects; multiple-input multiple-output (MIMO) radar antennas and synthetic aperture radar (SAR) techniques are used to obtain the raw data, and the analytic Fourier transform algorithm is used for image reconstruction. When imaging a target 90 mm from the radar, a near-field imaging scenario, the image resolution reaches 1.90 mm in the X-direction and 1.73 mm in the Y-direction. Since error in the distance between radar and target introduces noise, the original reconstructed image is processed by a gamma transform, which suppresses the noise, and is then enhanced by a linear stretch transform to improve visual recognition, laying a good foundation for supervised learning. To deploy the machine learning algorithm flexibly in various application scenarios, ShuffleNetV2, MobileNetV3 and GhostNet, representative lightweight convolutional neural networks with redefined convolutions, branch structures and optimized layer structures, are used to distinguish multi-category SAR images. Through the fusion of squeeze-and-excitation and selective kernel attention mechanisms, more precise features are extracted for classification; the proposed GhostNet_SEResNet56 realizes the best classification accuracy of SAR images within limited resources, with a prediction accuracy of 98.18% and 0.45 M parameters.


Introduction
In recent years, terrorist activities have occurred frequently, mostly in crowded public places such as airports, railway stations and subways [1]. At present, publicity and security measures prohibit the carrying of dangerous goods in the relevant areas, but the existing security mode cannot meet the demand for real-time screening at peak passenger flow [2]. Therefore, it is necessary to carry out non-contact safety inspection of people who may be carrying dangerous items. Current security imaging technology mainly includes X-ray imaging, infrared imaging and millimeter wave imaging.
Currently, millimeter wave radar is widely used in human vital sign measurement, aerial imaging and non-destructive detection by analyzing the amplitude and phase information of the received signal [3]. For near-field imaging systems, millimeter waves can penetrate many optically opaque and dielectric materials, such as composites, ceramics and clothing, and can therefore image targets hidden beneath a surface. Millimeter wave radar detection and imaging technology has great potential in various application markets, such as ground-penetrating radar, non-destructive testing and medical imaging, and has become one of the most important imaging technologies of the past decade. Millimeter wave radar has the advantages of high resolution and no harm to the human body [4]. However, many millimeter wave imaging studies involve highly complex and expensive customized systems. In 2020, MIMO-ISAR technology was used to reduce scanning time in a near-field millimeter wave imaging system [5]. In 2021, dual-polarization antennas were employed to improve the millimeter wave imaging system [6]. The latest developments in frequency modulated continuous wave (FMCW) millimeter wave radar with synthetic aperture radar (SAR) [7] and multiple-input multiple-output (MIMO) [8] antenna technology make it possible to design low-cost, low-power millimeter wave imagers. In this paper, the MIMO-SAR radar moves along a zigzag route: with three transmit antennas and four receive antennas enabled, the radar transmits an FMCW signal at each position and receives and stores the echo signal at the corresponding position. This synthesizes an equivalent long antenna aperture, so both the longitudinal and horizontal resolution of the image are guaranteed.
However, in previous near-field millimeter wave imaging systems, human intervention is needed to check whether the inspected person is carrying dangerous goods, which greatly reduces detection efficiency. In recent years, convolutional neural networks have been used for SAR image classification [9]. There is a lot of redundancy in mainstream convolutional neural networks, so training such models takes up a lot of time and memory. Lightweight convolutional neural networks, such as ShuffleNetV2 for facial expression prediction [10], MobileNetV3 for autonomous vehicle target detection [11] and GhostNet for remote sensing image classification [12], reduce the number of network parameters and the amount of computation by redefining convolutions, adopting branch structures and optimizing the network layer structure. Compared with traditional neural networks, lightweight CNNs reduce model size and increase speed while maintaining the same level of accuracy. On the basis of existing lightweight neural networks, this paper introduces the SE (squeeze-and-excitation) and SK (selective kernel) attention mechanism modules: the importance of each feature channel is acquired automatically through learning, then useful features are promoted according to this importance and features that are not useful for the current task are suppressed. The system performance is improved, and better classification results are obtained. Therefore, this paper implements a two-dimensional millimeter wave imaging system combining a low-cost millimeter wave radar with MIMO-SAR technology. The IWR1443 mm wave radar board, mmWave-Devpack, mechanical slide rail and TSW1400 mm wave development board are selected to build the hardware environment.
Through HSDC Pro, Uniflash, MATLAB, Python and other software environments, three processes can be implemented: (1) radar Z scanning along X and Y axes and acquiring original data; (2) image reconstruction and preprocessing; (3) image recognition. Finally, the target can be detected and recognized. The process is shown in Figure 1.

Test Object Distance
IWR1443 mm wave radar is used in this system to judge whether there is an object in the detection direction. The millimeter wave radar emits frequency modulated continuous waves (FMCW); the resulting intermediate frequency (IF) signal is transformed by a fast Fourier transform (FFT) and analyzed in the frequency domain, and the frequency at the spectral peak is obtained [13], as shown in Figure 2. For a chirp of slope $S$, a target at distance $R$ produces an IF beat frequency

$$f_{IF} = \frac{2SR}{c} \quad (1)$$

where $c$ is the speed of light, so the target distance follows from the measured peak frequency as

$$R = \frac{c\,f_{IF}}{2S} \quad (2)$$

According to Formulas (1) and (2), the distance results for metal objects with high reflectivity are shown in Table 1. If the reflectivity of the object is high, the intensity of the IF signal obtained by the radar is correspondingly large. The signal is transformed from the time domain to the frequency domain, where frequency corresponds to the distance of the object, and a peak in the frequency domain indicates that an object exists at that distance.
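As a minimal sketch of this range-FFT principle, the distance of a simulated point target can be recovered from the spectral peak of an ideal IF signal; the chirp slope and sampling rate below are illustrative assumptions, not the exact IWR1443 configuration:

```python
import numpy as np

# FMCW range estimation via the range FFT (Formulas (1)-(2)).
# Chirp parameters below are illustrative, not the IWR1443 defaults.
c = 3e8               # speed of light, m/s
S = 70e12             # chirp slope, Hz/s
fs = 10e6             # ADC sampling rate, Hz
N = 1024              # IF samples per chirp

R_true = 0.50                       # simulated target distance, m
f_if = 2 * S * R_true / c           # Formula (1): IF beat frequency
t = np.arange(N) / fs
sig = np.cos(2 * np.pi * f_if * t)  # ideal noiseless IF signal

spec = np.abs(np.fft.rfft(sig))
f_peak = np.argmax(spec) * fs / N   # frequency at the spectral peak
R_est = c * f_peak / (2 * S)        # Formula (2): distance from peak frequency
print(f"estimated distance: {R_est:.3f} m")
```

With the true distance at 0.50 m, the estimate lands within the sub-5% relative error reported for Table 1.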
The two rounds of measurements were taken at different time points. According to the analysis of the experimental results, the target distances calculated by the algorithm are consistent with the true values of 0.35 m, 0.50 m and 0.75 m, with a relative error of less than 5%. This experiment shows that the existence of an object at a certain distance can be observed statically through the IF signal generated by the radar. This idea is extended to a two-dimensional imaging process: the reflectivity of each point of the target can be obtained from the IF signal.

Synthetic Aperture Radar (SAR) and Multiple-Input Multiple-Output (MIMO) Radar Antennas Technique
Using a single radiation unit, the radar moves continuously along a straight line. After receiving the echo signal of the target at different positions, the intermediate frequency (IF) signal is obtained by correlative demodulation and stored; the raw data are then uploaded to the host. In this way, the aperture of the antenna is increased, and the positions can be regarded as a row of a horizontal antenna array [14]. In the course of a radar Z scan, the MIMO-SAR radar improves image resolution and reduces imaging cost compared to a multi-radar imaging system. In this paper, a MATLAB GUI is used to synchronize the radar transceiver signal with the mechanical slide motion; the linked X- and Y-axis Z scan is shown in Figure 3.

Radar Enabled Three Transmitting Antennas and Four Receiving Antennas
In the first version, the radar uses a single transmitter and single receiver mode, and the sampling interval needs to be controlled at 0.9495 mm in the Y direction, requiring multiple scans, which will increase the error of longitudinal movement of the mechanical slide rail, and it is very time-consuming.
In the second version, in order to ensure the sampling interval and improve the resolution of the image, this paper started with three transmitting antennas and four receiving antennas enabled [15]. Therefore, the concept of the virtual channel can be constructed. A total of 12 virtual channels are arranged linearly in the Y-direction.
In the actual test, it was found that using all 12 virtual channels for data analysis at the same time produces image blur, which leads to a decline in resolution. To improve the quality of the information carried by pixels in the longitudinal direction of the image, this paper removes the virtual channels with higher interference on the upper and lower edges, selecting 8 virtual channels to construct 3D data blocks; the scan length on the Y-axis is estimated as $D_y \approx N_y(M-1)\frac{\lambda}{4}$, where $M = 8$ is the number of virtual channels and $N_y$ is the number of scans in the Y direction. With the MIMO-SAR antenna technology, the mechanical slide moves $2\lambda = 7.590$ mm each time in the longitudinal direction, and the sampling interval between adjacent virtual channels is $\lambda/4$, as shown in Figure 4a. A comparison of the scanning time and equivalent antenna aperture of a single Y-direction scan between the second and first versions is shown in Figure 4b.
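The Y-direction geometry above can be checked numerically; the value of $N_y = 13$ below is an assumption inferred from the 104 longitudinal sampling points divided among 8 virtual channels:

```python
# Y-direction scan geometry (values from the text; Ny is an assumption
# inferred from 104 longitudinal samples / 8 virtual channels).
lam = 3.798               # wavelength, mm
M = 8                     # virtual channels retained
channel_pitch = lam / 4   # sampling interval between adjacent channels, mm
step = 7.590              # slide movement per Y scan (about 2 * lam), mm
Ny = 13                   # scans in the Y direction
Dy = Ny * step            # synthesized Y-direction scan length, mm
print(channel_pitch, Dy)
```

The resulting channel pitch of 0.9495 mm matches the single-antenna sampling interval, and $D_y$ comes out at 98.67 mm, the value used later for the resolution estimate.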

Actual Measurement Parameter Setting
By using the MIMO-SAR radar, the horizontal equivalent antenna aperture is extended while the mechanical slide rail moves at a uniform speed of 20 mm/s. The radar runs with three transmitting antennas and four receiving antennas; 8 linearly arranged virtual channels are used in this paper, each longitudinal step is 7.590 mm, and the longitudinal equivalent antenna aperture is extended accordingly. The parameters set in this paper allow measurement of a target 90 mm from the radar. The number of sampling points is 180 in the horizontal direction and 104 in the longitudinal direction, so scanning time and image resolution are both well controlled. Detailed parameters are shown in Tables 2 and 3.

Image Resolution
The resolution of the reconstructed image depends on the wavelength, the scan length and the target distance. For two-dimensional imaging, the horizontal (X-axis) and longitudinal (Y-axis) resolutions are estimated as [2,16]

$$\delta_x \approx \frac{\lambda Z_0}{2 D_x}, \qquad \delta_y \approx \frac{\lambda Z_0}{2 D_y} \quad (3)$$

where $D_x$ and $D_y$ are the physical lengths of the two-dimensional scan and $Z_0$ is the target distance. With $Z_0 = 90$ mm, $D_x = 90$ mm, $D_y = 98.67$ mm and $\lambda = 3.798$ mm, the image resolutions in the X and Y directions are $\delta_x = 1.90$ mm and $\delta_y = 1.73$ mm.
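Formula (3) can be verified directly with the stated parameters:

```python
# Resolution estimate from Formula (3) with the stated parameters.
lam, Z0 = 3.798, 90.0        # wavelength and target distance, mm
Dx, Dy = 90.0, 98.67         # X and Y scan lengths, mm
dx = lam * Z0 / (2 * Dx)     # horizontal resolution, mm
dy = lam * Z0 / (2 * Dy)     # longitudinal resolution, mm
print(f"dx = {dx:.2f} mm, dy = {dy:.2f} mm")
```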

Building 3D Data Block
After parsing the bin data returned by the radar, a one-dimensional array is obtained. It is converted into two-dimensional data blocks according to the number of IF signal sampling points, and then into a three-dimensional data block according to the numbers of sampling points in the horizontal and longitudinal directions. Each virtual channel is phase compensated [17], and the IF signals of the 8 virtual channels are obtained simultaneously; each virtual channel corresponds to a definite longitudinal scale at a definite X-coordinate. A 2D data block at a fixed Y-axis value in the 3D data block is shown in Figure 5.
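A minimal sketch of this reshaping follows, assuming an illustrative count of 256 IF samples per position (the text fixes only the 180 × 104 spatial grid):

```python
import numpy as np

# Building the 3D data block: the raw 1D stream is reshaped by IF samples
# per chirp, then by the longitudinal and horizontal sample counts.
# 180 x 104 positions come from the text; 256 IF samples is an assumption.
n_if, n_x, n_y = 256, 180, 104
raw = np.arange(n_if * n_x * n_y, dtype=np.float32)  # stand-in for the radar bin file

block2d = raw.reshape(-1, n_if)            # one row per spatial position
block3d = block2d.reshape(n_y, n_x, n_if)  # (Y, X, IF-sample) data block
print(block3d.shape)
```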

Reconstruction Image
In the millimeter wave radar imaging process, the radar transmits an FMCW signal and illuminates the target through a synthetic aperture. The received signal at each spatial point is interferometrically demodulated and recorded, and the IF signal of the target is uploaded to the host after scanning. Since the purpose of this paper is to generate SAR images, we chose the analytic Fourier transform, an existing image reconstruction algorithm [18]. According to the dispersion relation of a plane wave in free space, the wave number $k$ is divided into three components in a Cartesian coordinate system:

$$k_z = \sqrt{4k^2 - k_x^2 - k_y^2} \quad (4)$$

The Fourier transform variables $k_x$ and $k_y$ range from $-2k$ to $2k$ and satisfy the visible region

$$k_x^2 + k_y^2 \le 4k^2 \quad (5)$$

The two-dimensional plane reflectance of a target at a distance $z_0$ from the radar can then be expressed as

$$p(x, y) = \sum_n \mathrm{FT}^{-1}_{2D}\left\{ \mathrm{FT}_{2D}\left[u(x, y, n)\right] e^{j k_z z_0} \right\} \quad (6)$$

where $u(x, y, n)$ is the three-dimensional data block. $\mathrm{FT}_{2D}$ and $\mathrm{FT}^{-1}_{2D}$ in Formula (6) denote the 2D Fourier and inverse Fourier transform operations, respectively.
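The reconstruction in Formulas (4)-(6) can be sketched as follows. This is a simplified illustration assuming uniform sampling and zeroing of evanescent components outside the visible region, not the exact published implementation, and the grid sizes and frequencies are illustrative:

```python
import numpy as np

# Analytic Fourier reconstruction sketch (Formulas (4)-(6)): per frequency,
# propagate the 2D spectrum to the target plane z0 and accumulate.
def reconstruct(u, dx, dy, freqs, z0, c=3e8):
    ny, nx, nf = u.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dy)
    KX, KY = np.meshgrid(kx, ky)
    img = np.zeros((ny, nx), dtype=complex)
    for n in range(nf):
        k = 2 * np.pi * freqs[n] / c
        kz2 = 4 * k**2 - KX**2 - KY**2       # Formula (4), squared
        visible = kz2 > 0                    # visible region, Formula (5)
        kz = np.sqrt(np.maximum(kz2, 0.0))
        U = np.fft.fft2(u[:, :, n])
        U = np.where(visible, U * np.exp(1j * kz * z0), 0.0)
        img += np.fft.ifft2(U)               # Formula (6) accumulation
    return np.abs(img)

# Tiny smoke run on random data (not real radar echoes).
u = np.random.randn(32, 32, 4) + 1j * np.random.randn(32, 32, 4)
img = reconstruct(u, dx=2e-3, dy=2e-3, freqs=np.linspace(77e9, 81e9, 4), z0=0.09)
print(img.shape)
```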
The following is the image reconstruction of an actual object, as shown in Figure 6. In this paper, the millimeter wave radar is used to detect hidden objects, so the target is placed in a cardboard box at a distance of 90 mm from the radar. The simultaneous activation of the MIMO-SAR radar and the mechanical slide ensures the resolution of the image. The target used in the test is a pair of scissors, opened and placed in the paper box. After image reconstruction, the details of the scissors are clearly visible, giving high object identifiability. The result of the image reconstruction is shown in Figure 7. The scissors placed in the paper box can be detected by the millimeter wave radar, and the SAR image is clearly visible, which verifies the effectiveness and reliability of the analytic Fourier imaging algorithm.

Image Preprocessing
The data set consists of 250 SAR images, which contains 10 categories such as wrench, wire stripper, hammer, rasp, ax, scissors, key, disc, pliers and gun, and each category contains 25 SAR images. The photo of the test object and the corresponding SAR radar image are shown in Figure 8. The experimental setting is to place the item in the carton, and the effect is the same as that when the clothing covers the object.

In the reconstruction algorithm, the distance parameter $Z_0$ is given in advance, so that the target can be imaged near this range. The original radar reconstruction image may contain noise caused by the error in the distance between the target and the radar. In the actual security check process, the relative distance between the object and the radar cannot be guaranteed to be very accurate, so image preprocessing is important: it eliminates noise, enhances the image features and improves visual recognition. The original reconstruction image is first processed with a gamma transform and then linearly stretched, as shown in Figure 9.
The gamma transform is $s = c\,r^{\gamma}$ with $c = 1$ and $\gamma = 2.4$; it operates on the normalized brightness and then reverse-transforms to the real pixel gray value. The linear stretch is a four-segment piecewise linear function: input gray values below about 30 are set to zero, values in the range $30 \le x < 60$ are preserved, values above 60 are mapped linearly into a brighter region starting near 120, and the output is capped below the maximum gray level.
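A sketch of this preprocessing chain follows; $c$ and $\gamma$ are as stated, while the exact slopes and targets of the four-segment stretch are assumptions consistent with the description (only the breakpoints 30, 60 and 120 survive in the source):

```python
import numpy as np

# Preprocessing sketch: gamma transform on normalized brightness, then a
# four-segment linear stretch. c and gamma are as stated; the stretch
# breakpoints (30, 60, 120) follow the text, slopes/targets are assumptions.
def gamma_transform(img, c=1.0, gamma=2.4):
    norm = img.astype(np.float64) / 255.0  # normalize brightness
    return c * norm**gamma * 255.0         # back to real gray values

def linear_stretch(x):
    out = np.where(x < 30, 0.0, x)                             # (1) suppress noise
    out = np.where((x >= 60) & (x < 200),
                   120 + (x - 60) * (250 - 120) / (200 - 60),  # (3) brighten mid grays
                   out)
    out = np.where(x >= 200, 250.0, out)                       # (4) cap bright pixels
    return out                                                 # (2) 30 <= x < 60 kept as-is

img = np.array([[10.0, 45.0, 100.0, 230.0]])
out = linear_stretch(gamma_transform(img))
print(out)
```

On this toy row, the gamma step darkens low and mid grays so the stretch zeroes them as noise, while the bright target pixel is pushed into the high-brightness band, mirroring the behavior described above.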
The gray value of each pixel in image represents the energy of a certain point of the target at a certain distance, so the radar original reconstruction image indicates that the SAR image has the characteristic that the reflected energy of the target is higher than the noise energy. Since the energy value containing the object information is concentrated in the bright region, the gamma transformation algorithm is used, the parameter is adjusted to 2.4 to increase the contrast in the bright areas and decrease the contrast in the dark areas [19]. Then the pixel gray value is handled by a linear stretching algorithm, which contains four piecewise functions: (1) eliminate the noise; (2) preserve the information of the low gray pixel area; (3) map the original pixel value to a higher and wider brightness region, which increases the contrast and brightness of the image; (4) make the image not appear in extremely bright pixels, which ensures the integrity of the image.
By using the gamma transform, the effective information of the original radar reconstruction image is retained while the noise is reduced. After the linear stretch, the image is enhanced and visual recognition is improved. Image preprocessing lays a foundation for the subsequent supervised learning. The results are shown in Figure 10.

Lightweight Convolutional Neural Networks
While training a traditional convolutional neural network occupies a lot of time and memory, a lightweight convolutional neural network, with the advantages of small model size, high accuracy and low computation, can be used to construct the object recognition algorithm. The software can then be integrated into resource-limited embedded and mobile devices, which meets the actual needs of the security scenario.
Lightweight convolutional neural networks include MobileNet, ShuffleNet, GhostNet and other lightweight models. MobileNet and ShuffleNet use point-wise convolution and channel shuffle, respectively, to achieve feature communication, realizing the fusion of features between different groups. GhostNet adopts a different approach: based on a group of intrinsic feature maps, it uses linear transformations to obtain more feature maps that mine useful information from the intrinsic features. The intrinsic features and the linearly transformed features are concatenated to enlarge the feature map. By redefining the convolution rules, the lightweight model can extract image features efficiently with a shallow network structure and few parameters.
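A shape-level sketch of this Ghost-module idea (intrinsic feature maps plus cheap linear transforms, then concatenation) can be written as follows; the box blur is a simple stand-in for the learned depthwise transform, and all sizes are illustrative:

```python
import numpy as np

# Ghost module sketch: m intrinsic maps + (s-1) cheap "ghost" maps each,
# concatenated. A fixed 3x3 box blur stands in for the learned transform.
def cheap_op(fmap):
    padded = np.pad(fmap, 1, mode="edge")
    return sum(padded[i:i + fmap.shape[0], j:j + fmap.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def ghost_module(x, m=4, s=2):
    # x: (C, H, W); pretend the primary conv just keeps the first m channels
    intrinsic = x[:m]
    ghosts = [cheap_op(f) for f in intrinsic for _ in range(s - 1)]
    return np.concatenate([intrinsic, np.stack(ghosts)], axis=0)

x = np.random.randn(8, 16, 16)
y = ghost_module(x)
print(y.shape)
```

Only the m intrinsic maps require a full convolution; the remaining channels come from cheap per-map transforms, which is where the parameter savings arise.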
The active millimeter wave imaging system obtains single-channel images, which contain less information than optical images, and the contrast between the target contour and the background is not obvious. More importantly, active millimeter wave images have varying degrees of ghost artifacts due to the imaging principle, which greatly affects classification. Based on these characteristics, this paper experimentally combines convolutional neural network modules with attention mechanisms. On the one hand, the convolutional neural network has strong feature extraction ability; on the other hand, the attention mechanism obtains more details of the target of interest, suppressing interference information in millimeter wave images and improving the efficiency and accuracy of feature extraction.
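The channel reweighting performed by the squeeze-and-excitation attention used in this paper can be sketched in NumPy; the two weight matrices below are random stand-ins for the learned fully connected layers (reduction ratio 4 assumed):

```python
import numpy as np

# Squeeze-and-excitation sketch on a (C, H, W) feature map; w1/w2 are random
# stand-ins for the two learned fully connected layers.
def se_block(x, w1, w2):
    z = x.mean(axis=(1, 2))              # squeeze: global average pooling -> (C,)
    s = np.maximum(w1 @ z, 0.0)          # excitation: FC + ReLU -> (C/4,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))  # FC + sigmoid -> per-channel weights
    return x * s[:, None, None]          # reweight channels by learned importance

C, H, W = 8, 4, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 4, C))
w2 = rng.standard_normal((C, C // 4))
y = se_block(x, w1, w2)
print(y.shape)
```

Each channel is scaled by a weight in (0, 1), so useful channels are promoted and uninformative ones suppressed, exactly the mechanism described above.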
The data set consists of 250 SAR images in 10 categories; the training set and validation set are divided in a ratio of 3 to 2, and all images are scanned at a distance of 90 mm from the radar. Three representative lightweight networks are evaluated in this paper: (1) ShuffleNetV2; (2) MobileNetV3; (3) GhostNet_ResNet56 based on GhostNet. Each experiment is repeated for five rounds of verification, and the reported prediction accuracy is the average of the five rounds. The images are first normalized, and the number of input channels of each neural network is adjusted to suit the single-channel grayscale images. During training, the learning rate of all networks is set to 0.01, the batch size is 16, and the number of epochs is 30.
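The bookkeeping of this setup can be made explicit; the batch count per epoch below is implied by the stated numbers rather than given in the text:

```python
import math

# Training setup arithmetic: 250 SAR images, 10 classes, 3:2 split,
# batch size 16, 30 epochs.
total, classes, batch_size = 250, 10, 16
train = total * 3 // 5                       # 3:2 split -> training images
val = total - train                          # validation images
per_class_train = train // classes           # training images per category
batches_per_epoch = math.ceil(train / batch_size)
print(train, val, per_class_train, batches_per_epoch)
```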

• ShuffleNetV2
The ShuffleNetV2 network improves on ShuffleNetV1, and its building blocks are distinguished by convolution stride. For the bottleneck block with a stride of 1, the input features are first split into two parts along the channel dimension and fed into two branches: one branch is an identity mapping, which reduces the number of parameters and the computational complexity, while the other branch abandons group convolution to reduce the memory access cost. For the downsampling building block with a stride of 2, the number of feature channels is doubled. In ShuffleNetV2, a 1 × 1 convolution layer is added before the average pooling layer to further mix features. The Concat module replaces the original element-wise addition to reduce computational complexity, and the Channel Shuffle module is added to increase information exchange between channels [20]. The ShuffleNetV2 convolutional neural network flowchart is shown in Figure 11.
The SAR images are recognized by the ShuffleNetV2 network, and the accuracy on the validation set is 84.55%. This low accuracy may be due to the slow convergence of the network within the limited number of epochs; as the number of epochs increases, the accuracy would improve to a certain extent.
Figure 11. ShuffleNetV2 network structure.
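The channel split, concatenation and channel shuffle operations described above can be sketched in PyTorch (a minimal sketch; the convolution branch of the stride-1 unit is omitted, and `shuffle_unit_sketch` is an illustrative name):

```python
import torch

def channel_shuffle(x, groups):
    """Interleave channels across groups so that information can flow
    between branches after a concat (the Channel Shuffle module)."""
    n, c, h, w = x.shape
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2)
             .reshape(n, c, h, w))

def shuffle_unit_sketch(x):
    """Stride-1 unit: split channels in half; the identity branch costs
    nothing, while the other half would normally pass through
    convolutions (omitted here). Concat replaces element-wise addition."""
    left, right = x.chunk(2, dim=1)
    out = torch.cat((left, right), dim=1)
    return channel_shuffle(out, groups=2)
```

On an input whose 8 channels hold the values 0 to 7, `channel_shuffle(x, 2)` reorders them to 0, 4, 1, 5, 2, 6, 3, 7, interleaving the two groups.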

• MobileNetV3
MobileNetV3 combines the advantages of MobileNetV1 and MobileNetV2. At the convolution level, MobileNetV1 introduces the depthwise separable convolution, which decomposes a standard convolution into a depthwise convolution and a pointwise convolution, and MobileNetV2 introduces the linear bottleneck and inverted residual structure into the network. On this basis, MobileNetV3 introduces a squeeze-and-excitation (SE) attention mechanism into the bottleneck structure. The SE module automatically learns the importance of each feature channel, enhancing the useful features according to their importance and suppressing features that are less useful for the current task [21].
The SAR images are recognized by the MobileNetV3 (SE) network, and the accuracy on the validation set is 98.18%.
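The depthwise separable convolution that MobileNetV1 contributes to this design can be sketched as follows (an illustrative PyTorch module; the channel counts and kernel size are assumptions):

```python
import torch
from torch import nn

class DepthwiseSeparableConv(nn.Module):
    """A standard convolution factored into a per-channel (depthwise)
    convolution followed by a 1x1 (pointwise) convolution, as in
    MobileNetV1. groups=in_ch makes the first conv act per channel."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```

For 8 input and 16 output channels, this factorization needs 224 parameters versus 1152 weights for a standard 3 × 3 convolution, which is the source of the parameter savings.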

• GhostNet
GhostNet proposes a novel Ghost module that replaces ordinary convolution and can generate more feature images with fewer parameters. Unlike ordinary convolution, the Ghost module contains two steps. In the first step, the input feature image is convolved to obtain a feature image with half the number of channels of an ordinary convolution operation. In the second step, linear transformations are applied to the feature image produced in the first step to obtain the other part. Finally, the two groups of feature images are stitched together to generate the final feature image. The Ghost module can replace ordinary convolution to reduce the computational cost of the convolution layer [22]. The GhostNet convolutional neural network flowchart is shown in Figure 12. The SAR images are recognized by the GhostNet_ResNet56 network, and the accuracy on the validation set is 95.45%. This accuracy is significantly higher than that of ShuffleNetV2 but slightly lower than that of MobileNetV3. However, in terms of model parameters and memory usage, GhostNet_ResNet56 outperforms MobileNetV3. Thus, GhostNet_ResNet56 is well suited to the classification of millimeter wave images.
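The two-step Ghost module can be sketched in PyTorch as follows (a minimal sketch: the cheap linear transformation is implemented here as a depthwise convolution, a common choice, and the half-channel split follows the description above):

```python
import torch
from torch import nn

class GhostModule(nn.Module):
    """Step 1: a primary convolution produces half the output channels.
    Step 2: a cheap depthwise convolution generates the 'ghost' half
    by a linear transformation of the primary features.
    The two halves are concatenated into the final feature image."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        init_ch = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size=1),
            nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, init_ch, kernel_size=3, padding=1,
                      groups=init_ch),  # depthwise: one filter per channel
            nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```

Because the ghost half is generated per channel, its cost is far lower than producing all output channels with an ordinary convolution.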
In order to further improve the accuracy of the networks, a confusion matrix is used to reflect the accuracy of image classification more clearly, as shown in Figure 13. It can be seen from the confusion matrix that the GhostNet_ResNet56 network is not good at distinguishing between key, pliers, knife, ax, etc., which leads to lower prediction accuracy. Among the three basic network models, the MobileNetV3 convolutional neural network, which introduces the SE attention mechanism, has the highest prediction accuracy. Therefore, the squeeze-and-excitation (SE) and selective-kernel (SK) attention mechanism modules are used to improve the existing classification networks.


Two Optimization Algorithms of Attention Mechanism
The squeeze-and-excitation (SE) attention mechanism mainly uses squeeze, excitation and scale operations to recalibrate the preceding features. The squeeze operation compresses features along the spatial dimension, turning each two-dimensional feature channel into a real number that has a global receptive field and represents the global distribution of responses over that channel; the output dimension matches the number of input feature channels. Next is the excitation operation, a gating mechanism similar to that in recurrent neural networks, where the parameter w is used to generate a weight for each feature channel. Finally, in the scale operation, the output weight is treated as the importance of each feature channel after feature selection and is applied to the previous features, completing the recalibration of the original features in the channel dimension [23].
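A minimal PyTorch sketch of the squeeze, excitation and scale steps (the reduction ratio of 4 is an assumption):

```python
import torch
from torch import nn

class SEBlock(nn.Module):
    """Squeeze: global average pooling compresses each channel to one
    number. Excitation: two fully connected layers (the gate) learn a
    weight per channel. Scale: the weights reweight the input features."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        n, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: (n, c)
        w = self.fc(w).view(n, c, 1, 1)  # excitation: per-channel weights
        return x * w                     # scale: recalibrate the features
```

The block is shape-preserving, so it can be dropped into an existing bottleneck without changing the surrounding layers.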
The selective-kernel (SK) attention mechanism uses a non-linear approach that fuses features from different kernels to adjust the size of the receptive field, and it consists of split, fuse and select operations. The split operation generates multiple branches with different kernel sizes, corresponding to different receptive field sizes of neurons. The fuse operation combines information from the branches to obtain a global representation for weight selection. The select operation fuses the feature images of the different kernel sizes according to the selected weights [24]. In this paper, the SE and SK attention mechanisms are used to optimize the neural network algorithms of the ShuffleNet, MobileNet and GhostNet series.
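The split, fuse and select steps can be sketched as follows (an illustrative two-branch PyTorch module with 3 × 3 and 5 × 5 kernels; the reduction ratio is an assumption):

```python
import torch
from torch import nn

class SKBlock(nn.Module):
    """Split: two branches with different kernel sizes.
    Fuse: sum the branches, pool globally, compress with a shared layer.
    Select: softmax over branch weights recombines the branch features."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        hidden = max(channels // reduction, 4)
        self.fc = nn.Linear(channels, hidden)
        self.fcs = nn.ModuleList(
            [nn.Linear(hidden, channels) for _ in range(2)])

    def forward(self, x):
        # split: (n, 2, c, h, w) holding both branch outputs
        feats = torch.stack([self.branch3(x), self.branch5(x)], dim=1)
        u = feats.sum(dim=1)                 # fuse across branches
        z = self.fc(u.mean(dim=(2, 3)))      # global descriptor
        attn = torch.stack([fc(z) for fc in self.fcs], dim=1)  # (n, 2, c)
        attn = attn.softmax(dim=1).unsqueeze(-1).unsqueeze(-1)
        return (feats * attn).sum(dim=1)     # select: weighted combination
```

The softmax across the branch dimension guarantees that the per-channel weights of the two kernels sum to one, so the block smoothly interpolates between the two receptive field sizes.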
• GhostNet_SEResNet56
The squeeze-and-excitation (SE) attention mechanism is introduced into the GhostNet_SEResNet56 lightweight convolutional neural network to optimize its network structure [25]. The process is shown in Figure 14.

Figure 14. SE attention mechanism optimizes GhostNet_SEResNet56 network.

Results and Discussion
According to the results of the confusion matrix, this paper uses the SE and SK attention mechanisms to optimize the MobileNetV3, ShuffleNetV2 and GhostNet lightweight convolutional neural networks. The results are shown in Table 4, where Madd represents the number of multiply-then-add operations, FLOPs represents the number of floating-point operations and MemR + W represents the total memory space occupied by the model. Table 4 shows that the prediction accuracy of all three network series is significantly improved after optimization by the SE and SK attention mechanisms. Although introducing the attention module into the SAR image recognition algorithm slightly increases the network load, the increase remains within a tolerable range.
The Madd, parameters, FLOPs and MemR + W of the ShuffleNet series are all higher than those of the other two models, indicating that it has the largest computational cost and occupies the most memory, yet its prediction performance on the SAR image dataset is worse than that of the MobileNetV3 and GhostNet series.
Comparing GhostNet_SEResNet56 and MobileNetV3_SK, the two achieve the same prediction accuracy; the Madd and FLOPs of GhostNet_SEResNet56 are slightly higher than those of MobileNetV3_SK, but its parameters and MemR + W are significantly lower, indicating that GhostNet_SEResNet56 optimized by the SE attention mechanism delivers the greatest advantage within the most limited resources. The confusion matrix of the GhostNet_SEResNet56 algorithm is shown in Figure 15a.
GhostNet_ResNet56 is optimized by the SE attention mechanism. Compared with the network without the attention mechanism, the network with it significantly improves accuracy within a few epochs; its convergence is markedly accelerated, and the oscillation at the tail of the training curve is effectively weakened, as shown in Figure 15b. Considering both the classification accuracy of the neural network and its memory occupation, GhostNet_SEResNet56 is adopted as the object recognition network in this paper.
In this paper, the millimeter wave imaging system obtains the target SAR image at 90 mm. The number of virtual channels can be increased by enlarging the antenna array, which in turn increases the longitudinal antenna aperture. The horizontal synthetic aperture can be widened by increasing the horizontal slide travel. The improved hardware can extend the measurement distance while maintaining image resolution.
In a realistic scenario, target containers and the humans carrying targets may sway and move by more than a wavelength, which blurs the image. To solve this problem, the speed of a moving object can be measured first, and its influence can then be compensated in the imaging algorithm.
A lightweight deep learning neural network is used for target recognition. Unlike the previous manual inspection mode, dangerous objects are identified by machine learning, which can greatly improve the efficiency of security inspection and reduce the uncertainty of manual identification. The limitation of the system at this stage is that only ten categories of objects can be identified, which does not cover all dangerous goods. In addition, the prediction accuracy after optimization by the SE and SK attention mechanisms has not improved greatly, and the lightweight convolutional neural network easily overfits and falls into local optima, so the data set needs to be expanded.

Conclusions
In this paper, a detection and recognition system for concealed objects based on the MIMO-SAR radar is proposed. The contributions made in this paper are as follows:

1.
By using the MIMO-SAR radar, the aperture of the radar antenna is expanded to 90 mm in the X-axis direction. Eight virtual channels are established in the Y-direction, so that the longitudinal aperture in each transverse scan is equivalent to 4λ. The image resolution can reach 1.90 mm in the X-direction and 1.73 mm in the Y-direction when the object is 90 mm away from the radar. The MIMO-SAR imaging system can effectively reduce scanning time and system cost while improving image resolution.

2.
Gamma transform with a coefficient of 2.4 and linear stretch processing are innovatively applied to the SAR images to remove the noise caused by distance error and to improve visual recognition, which lays a good foundation for the subsequent supervised learning network.

3.
The lightweight convolutional neural network is small and occupies few resources, but its prediction accuracy is not high. After optimization by the SE and SK attention mechanisms, the prediction accuracy improves at the cost of a small increase in resource occupancy. Considering prediction accuracy, computational complexity (Madd, FLOPs) and memory occupation (MemR + W, parameters) together, GhostNet_SEResNet56 is the optimal prediction algorithm for the SAR data set, with a validation-set prediction accuracy of 98.18%.
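The gamma transform with coefficient 2.4 and the linear stretch applied to the reconstructed images can be sketched as follows (a minimal NumPy sketch; the normalization to [0, 1] before applying the power is an assumed convention):

```python
import numpy as np

def gamma_transform(img, gamma=2.4):
    """Gamma transform with the coefficient reported in the paper:
    intensities are normalized to [0, 1] and raised to the power gamma,
    which suppresses low-intensity noise in the reconstructed image."""
    x = img.astype(np.float64) / img.max()
    return x ** gamma

def linear_stretch(img):
    """Linearly stretch intensities to the full [0, 1] range to
    improve visual contrast before classification."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo)
```

With gamma greater than 1, values well below the maximum are pushed toward zero, which is why the transform suppresses the weak noise caused by distance error while keeping the strong target response.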