Multi-Input Deep Learning Based FMCW Radar Signal Classification

Abstract: In autonomous driving vehicles, the emergency braking system uses lidar or radar sensors to recognize the surrounding environment and prevent accidents. Conventional deep learning classifiers based on radar data are single-input structures using range-Doppler maps or micro-Doppler signatures. Deep learning with a single-input structure has limitations in improving classification performance. In this paper, we propose a multi-input classifier based on a convolutional neural network (CNN) to reduce the amount of computation and improve the classification performance using a frequency modulated continuous wave (FMCW) radar. The proposed multi-input deep learning structure is CNN-based and uses a range-Doppler map and a point cloud map as multiple inputs. The classification accuracy with the range-Doppler map or the point cloud map alone is 85% and 92%, respectively. With both maps, it improves to 96%.


Introduction
Recently, autonomous driving technologies such as the advanced driver assistance system (ADAS) have been actively developed. ADAS is commercially available to reduce drivers' fatigue and help with safe driving. ADAS includes adaptive cruise control, the intelligent parking assist system, the lane departure warning system and the autonomous emergency braking system [1][2][3]. Among them, the autonomous emergency braking system prevents accidents by directly operating the brakes when an accident such as a collision is expected. To prevent accidents, it is important to be aware of the surrounding environment.
Sensors such as cameras, lidar sensors and radars are used to recognize the surrounding environment. Cameras are inexpensive and have the advantage of being able to recognize objects on the road, but distance information cannot be obtained. Lidar sensors are expensive and have the disadvantage of degraded performance in adverse weather conditions such as snow or rain. Radars can measure distance and speed [4], but have the disadvantage of being less capable of distinguishing objects than lidar sensors [5]. However, compared to cameras and lidar sensors, they have more robust detection performance in dark and bad weather conditions [5]. Therefore, radars have emerged as one of the core sensors for ADAS to replace cameras and expensive lidar sensors that are highly affected by the surrounding environment, and research on implementing target classifiers using deep learning with radars is actively underway [6][7][8].
Many studies [9][10][11][12] use a single input such as the spectrogram or range-Doppler map for classification. However, these images change greatly depending on the angle at which the object faces the radar. Additionally, there are many similar images between the objects to be identified. These similar images degrade the identification accuracy for cars, people and motorcycles. The point cloud map makes it easier to identify objects because the image differences between objects are clearer than in the range-Doppler map.
In this paper, we use two features, the range-Doppler and point cloud maps, to improve identification performance with deep learning systems. There are no common features between the range-Doppler and point cloud maps except the intensity of the reflected signal. Therefore, when the two features are directly put into a deep learning system as in Figure 1, optimal performance cannot be obtained. The contributions of the paper are summarized as follows:


• We propose a radar-based classification system with data collected using a frequency modulated continuous wave (FMCW) radar.
• The range-Doppler map changes greatly depending on the angle at which the object faces the radar. Therefore, we propose a convolutional neural network (CNN)-based multi-input deep learning model, which uses both the range-Doppler map and the point cloud map as inputs to enhance the classification accuracy.
The rest of the paper is organized as follows. Section 2 describes related work on object classification algorithms using radars. Section 3 presents the conventional single-input CNN structure and proposes a multi-input CNN structure. Section 4 discusses the data collection method and the data analysis, and Section 5 explains the experimental results using the collected data. Finally, we conclude in Section 6.


Related Work
In general, the radar cross section [13], phase information [14] and micro-Doppler [15,16] are used to classify objects using radar. However, conventional object detection and classification methods require a large computational complexity [17]. To resolve this problem, classification techniques applying deep learning to FMCW radar data have been proposed for the recognition of human behavior and hand gestures. Vaishnav et al. [18] proposed a human behavior recognition technique using an FMCW radar. Additionally, Anishchenko et al. [19] proposed a non-contact automatic fall detection system with an AlexNet using radar data. Skaria et al. [20] used three-channel time-Doppler map data to improve the performance of the conventional classifier using data from one channel and showed an improvement of about 10% compared to the conventional single-channel range-Doppler map. Kim et al. [21] created a time-Doppler spectrogram using a 7.25 GHz Doppler radar. Then, deep convolutional neural networks (DCNNs) were used to detect the presence and behavior of humans.
Identification research using radar has been applied not only to human behavior recognition but also to various fields such as autonomous vehicles and active sonar. Angelov et al. [22] used automotive radar data and presented the results of classifying cars, people and bicycles using three artificial neural network structures: a convolutional network, a residual network and a combination of convolutional and recurrent networks. Lee et al. [23] used a CNN as an identifier with power-normalized cepstral coefficients (PNCC) as a feature for the identification of underwater objects in active sonar. Daher et al. [24] identified various classes based on Rulex [25], a high-performance machine learning package, using 24 GHz radar data. Forecasts with a varying number of classes were performed with one, two or three classes of vehicles and one for humans. Furthermore, they applied a single forecast for all four classes and cascading forecasts in a tree-like structure while varying algorithms, the cascading block order, class weights and the data splitting ratio for each forecast to improve prediction accuracy. Kim et al. [26] used CNN [27], VGG16 [28] and VGG19 [28] as feature extractors on a spectrogram, and used a support vector machine (SVM) for classification. Additionally, Kim et al. [17] analyzed the classification results according to deep learning techniques of various structures using the range-Doppler map.
Until now, most studies have changed the deep learning structure while using one feature such as the spectrogram or range-Doppler map. However, since these studies make the deep learning structure deeper, the amount of computation increases. Additionally, if only the range-Doppler map is used for identification, similar images between objects, and image variations with the azimuth angle between the object and the radar, degrade the identification result.
In this paper, a basic CNN structure composed of convolution layers and pooling layers is used in consideration of computational complexity to resolve these problems. To solve the problems that arise when only the range-Doppler map is used, two images are used: the range-Doppler map and the point cloud map. In addition, we propose a multi-input CNN identification technique to achieve optimal identification performance.

Proposed Multi-Input Based CNN Classifier
This section describes the structure of the single-input CNN used for image identification and proposes a multi-input CNN structure. In all experiments, the most basic CNN structure was used considering the amount of computation. Figure 1 shows the conventional data input method, the deep learning structure and the output used in the experiment. Figure 2 shows the proposed multi-input-based deep learning structure. The conventional CNN structure used for image identification uses single-channel or two-channel data as a single input, as shown in Figure 1. Additionally, based on a single input, CNN and long short-term memory (LSTM) networks have been combined into deep learning models of several structures [29].
We used the deep learning structure in Figure 1 for identification using 1-channel and 2-channel images. The point cloud map has a more uniform image than the range-Doppler map. Therefore, better classification performance can be obtained when the range-Doppler map and the point cloud map are used together. However, because the two maps were imaged to capture only the shape of the object and therefore share no common factor, optimal performance cannot be obtained when a two-channel image is used as an input. As shown in Figure 2, features are instead extracted from the range-Doppler map and the point cloud map separately by using CNNs. Figure 3 shows the block diagram of the proposed algorithm.
The first step of the proposed algorithm is image preprocessing.
Figure 4a shows the preprocessing of the point cloud map with the x and y labels removed, and Figure 4b shows the preprocessing of the range-Doppler map. The preprocessing applied to both images is the same, and both images are reduced to a resolution of 50 × 50 to shorten the learning time. Both the range-Doppler map and the point cloud map are represented in color according to the intensity of the signal. In the case of a range-Doppler map, a signal due to noise exists even in parts where no object exists. Additionally, in the case of the point cloud map, even if the signal reflected by the same object has a similar shape, the signal intensity decreases as the distance increases. Therefore, to reduce the influence of the noise signal and use only the shape of the object generated by the reflected signal for learning, the RGB channels are converted to a gray channel. Afterwards, to completely remove the grids and noise, a threshold value is set to 253, noise signals lower than the intensity of the reflected signal are removed, and a median filter is applied.
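The preprocessing chain above (RGB-to-gray conversion, thresholding at 253, median filtering, resizing to 50 × 50) can be sketched as follows. This is a minimal NumPy rendering under stated assumptions: the luminance weights, the direction of the threshold comparison and the nearest-neighbour resize are not specified in the paper.

```python
import numpy as np

def rgb_to_gray(rgb):
    # ITU-R BT.601 luminance weights (an assumption; the paper does
    # not state which RGB-to-gray conversion is used)
    return rgb @ np.array([0.299, 0.587, 0.114])

def median3x3(img):
    # Minimal 3x3 median filter with edge padding, standing in for a
    # library routine such as scipy.ndimage.median_filter
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    windows = np.stack([p[r:r + h, c:c + w]
                        for r in range(3) for c in range(3)])
    return np.median(windows, axis=0)

def preprocess(rgb_map, threshold=253, out_size=50):
    """Clean a range-Doppler or point cloud image for the classifier."""
    gray = rgb_to_gray(rgb_map.astype(float))
    # Keep only pixels at or above the threshold; weaker pixels are
    # treated as noise (our reading of the paper's description)
    gray[gray < threshold] = 0.0
    # Median filter to suppress isolated noise pixels
    gray = median3x3(gray)
    # Nearest-neighbour resize to 50x50 to shorten training time
    rows = np.linspace(0, gray.shape[0] - 1, out_size).astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, out_size).astype(int)
    return gray[np.ix_(rows, cols)]
```

A call such as `preprocess(image)` then yields the 50 × 50 single-channel array fed to the network.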
The next step is feature extraction. Features are extracted by CNN. The features extracted from each of the range-Doppler and point cloud maps are concatenated into one. Afterward, features are extracted once more from the concatenated data. Finally, the object is classified using a fully connected layer.
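The multi-input idea of separate feature extraction followed by concatenation can be sketched as a forward pass in plain NumPy rather than a deep learning framework. All weights are random (untrained) and the layer sizes and kernel counts are illustrative, not the paper's configuration; the concatenated features here go straight to a dense softmax layer, whereas the paper applies further convolutions first.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    # 'Valid' single-channel 2-D cross-correlation followed by ReLU
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return np.maximum(out, 0.0)

def maxpool2(x):
    # 2x2 max pooling
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def branch(x, k):
    # One conv + pool block per input map, flattened to a feature vector
    return maxpool2(conv2d(x, k)).ravel()

# Hypothetical preprocessed 50x50 inputs
rd_map = rng.random((50, 50))   # range-Doppler map
pc_map = rng.random((50, 50))   # point cloud map

# Separate feature extraction per input, then concatenation: the key
# difference from feeding both maps as one two-channel image
k_rd, k_pc = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
features = np.concatenate([branch(rd_map, k_rd), branch(pc_map, k_pc)])

# Fully connected layer + softmax over the 3 classes
W = rng.standard_normal((features.size, 3)) * 0.01
logits = features @ W
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```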

Experiment Setup and Data Analysis
This section describes the experimental scenario for acquiring data directly from the radar and the image preprocessing for deep learning, and analyzes the characteristics of each object image. A 79 GHz-band FMCW radar was used, and the specifications of the radar are shown in Table 1. The radar was installed at a height of 1.5 m, and the experiment was conducted in a space with no objects other than the targets to be tested, as shown in Figure 5a. The cars used in the experiment were a Hyundai Santa Fe and a Chrysler 300C, and the motorcycle was a SYM GTS125i. Figure 5b shows the paths traveled by the targets. The radar symbol on the x-axis marks the location where the radar was installed, and the square is the radar detection range. Cars have stronger reflected signals than humans, so they can be detected even at the maximum radar detection distance of 12 m. However, since it is difficult to detect a person with weak reflected signal strength, the maximum distance was limited to 11 m, and the experiment was not conducted at 0-2 m due to the strong reflected wave. Thus, all targets moved within 2-11 m, and the targets moved vertically, horizontally and diagonally, as shown by the arrows in Figure 5b, to collect data from various angles. For people, 1746 and 416 samples were used for training and validation, respectively; for cars, 1648 and 411; and for the motorcycle, 1604 and 441.
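For reference, a range-Doppler map of the kind analyzed in this section is obtained from one FMCW frame by a range FFT over fast time followed by a Doppler FFT over slow time. The sketch below uses a synthetic single-target beat signal with illustrative dimensions, not the parameters of the 79 GHz radar in Table 1.

```python
import numpy as np

# Synthetic beat signal for one frame: a single target yields a complex
# sinusoid whose fast-time frequency encodes range and whose
# chirp-to-chirp phase ramp encodes Doppler (illustrative sizes only)
n_samples, n_chirps = 128, 64      # samples per chirp, chirps per frame
range_bin, doppler_bin = 20, 10    # hypothetical target location
fast = np.arange(n_samples) / n_samples
slow = np.arange(n_chirps) / n_chirps
frame = (np.exp(2j * np.pi * range_bin * fast)[None, :]
         * np.exp(2j * np.pi * doppler_bin * slow)[:, None])

# Range FFT along fast time, then Doppler FFT along slow time,
# shifted so zero Doppler sits in the middle of the map
rd = np.fft.fft(frame, axis=1)
rd = np.fft.fftshift(np.fft.fft(rd, axis=0), axes=0)
rd_map = 20 * np.log10(np.abs(rd) + 1e-12)

# The target appears as a single peak at (Doppler, range);
# the Doppler index is offset by n_chirps // 2 due to the fftshift
peak = np.unravel_index(np.argmax(rd_map), rd_map.shape)
```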

Figure 6a,c,e represent the range-Doppler map when a person moves in the diagonal, vertical and horizontal directions, respectively; the x-axis represents the Doppler, and the y-axis represents the range. Regardless of the direction of movement, the Doppler values appear widely spread across positive and negative values because of the reflected signals caused by the swinging of the arms and legs when walking. The point cloud maps of Figure 6b,d,f show similar images regardless of direction, except for the surrounding noise signals.
As shown in Figure 6, the range-Doppler map of a person has various types of images depending on the walking speed, that is, the speed at which the arms and legs swing.
Figure 7 shows images of a car moving diagonally, vertically and transversely, from the top. Figure 7a,c,e represent the range-Doppler map when the car moves in the diagonal, vertical and horizontal directions, respectively. Compared with the range-Doppler map of a moving person, the intensity of the reflected signal was stronger. In the cases of Figure 7a,e, the range-Doppler map had more diverse Doppler values than when moving in the vertical direction, due to the wheel movement. Additionally, the reason Figure 7e had a wider Doppler spread than Figure 7a is that the area of the wheel viewed by the radar is wider. However, as shown in Figure 7c, when the car approached or moved away from the radar, the Doppler value was fixed and spread up and down in the graph. In the case of the point cloud map, it appeared elongated in the direction the car was moving. Figure 8 shows the data when the motorcycle is moving. Figure 8a,b are the range-Doppler map and point cloud map when moving in the diagonal direction, respectively; in the case of the range-Doppler map, it can be seen that the Doppler values generated by the wheels appeared weakly around a strong signal. The point cloud map spread diagonally, like the motorcycle's path. Figure 8c,d are graphs of the motorcycle moving in the radar direction, and show shapes similar to those of the car in Figure 7c when moving in the vertical direction. However, the spread is shorter along the y-axis compared to the car. The point cloud map spreads widely in the direction of movement.
When moving in the transverse direction as shown in Figure 8e,f, the area where the wheel faces the radar is larger than in the diagonal direction, so it can be confirmed that more Doppler occurs than when moving in the diagonal direction. Likewise, it can be seen that the point cloud map spreads widely in the horizontal direction, which is the moving direction.

Experimental Result
In this section, we compare the experimental results of the proposed CNN-based multi-input structure with the results obtained using single-channel and two-channel input data. The computer used in the experiment had an Intel Core i9-9900 3.10 GHz CPU, 32 GB RAM and a GeForce RTX 2080 SUPER GPU. In all experiments, the categorical cross-entropy function was used for the loss function, Adam [30] for the optimizer and the rectified linear unit (ReLU) [31] for the activation function, for the three classes of car, motorcycle and person. The learning rate was 0.0001, the number of epochs was 100 and the batch size was 64. The TensorFlow framework was used to implement the proposed model [32]. Table 2 shows the number of parameters used for learning and the classification performance according to the input data and input structure. When the range-Doppler map was used as a single input, the classification accuracy was 82.26%. When a point cloud map was used as a single input, the performance was better, with an accuracy of 91.32%. In the case of the range-Doppler map, there were many data of similar shape among cars, people and motorcycles. However, in the case of the point cloud map, the number of similar data was smaller than that of the range-Doppler map. Therefore, using the range-Doppler map and the point cloud map together as two channels, as in Figure 1, gave better identification performance than using only the range-Doppler map. However, there was no correlation between the range-Doppler map image and the point cloud map image. Therefore, when the proposed multi-input structure was used, the best performance was obtained at 95.98%. Figure 9 is the confusion matrix of the proposed system. The x-axis is the actual value and the y-axis is the predicted value.
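The loss and optimizer named above can be stated concretely. The following is a minimal NumPy rendering of categorical cross-entropy and a single Adam update using the paper's learning rate of 0.0001; the remaining Adam constants are the usual defaults, which is an assumption since the paper does not list them.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # Mean over the batch of -sum(y * log(p)) for one-hot labels
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

def adam_step(w, grad, state, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update with bias correction; state = (m, v, t)
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)
```

With perfect one-hot predictions the loss is essentially zero, and the very first Adam step moves each weight by approximately the learning rate against the gradient sign.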
In the case of the motorcycle, misclassification as a person was the most common, and conversely, in the case of people, there were many cases of misclassification as a motorcycle. The reason is that for people and motorcycles there were many similar data in the form of widely spread Doppler values in the range-Doppler map. In many cases, cars were misclassified as motorcycles, because for motorcycles and cars both the range-Doppler map and the point cloud map have data of similar size and shape. In addition, to reduce the number of trainable parameters, the experiment was conducted by varying the number of fully connected layers. From the results shown in Table 2, although the number of trainable parameters was decreased when some fully connected layers were removed, the model still achieved an accuracy of 96.21%.
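A confusion matrix such as Figure 9 can be tabulated as below. The labels and counts here are hypothetical; also note that rows hold the true class in this sketch (the common library convention), whereas the paper's figure plots actual values on the x-axis.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    # cm[i, j] = number of samples of true class i predicted as class j
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical labels: 0 = car, 1 = person, 2 = motorcycle
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 2, 1, 2, 1, 2]
cm = confusion_matrix(y_true, y_pred)

# Overall accuracy is the trace over the total count
accuracy = np.trace(cm) / cm.sum()
```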

Conclusions
In this paper, we proposed a multi-input algorithm for a deep learning-based classification model by collecting radar signals for cars, people and motorcycles. In the experiment, only the basic CNN structure was used in consideration of real-time operation. The range-Doppler map has the problem that the image changes significantly depending on the angle at which the object faces the radar. Therefore, a point cloud map, whose image changes little with angle, was added as a feature. There are no common features between the range-Doppler map and the point cloud map, so when the two images were input through two channels, the identification performance was not optimal. Therefore, we proposed a CNN-based multi-input deep learning model and obtained the best identification accuracy.