People Walking Classification using Automotive Radar

Abstract: Automotive radars offer high performance at a relatively low cost, and recently their application has been extended to several fields beyond the original one. In this paper we consider the use of this kind of radar to discriminate different types of people's movements in a real context. To this end, we exploit two different maps obtained from the radar, that is, a spectrogram and a range-Doppler map. Through the application of dimensionality reduction methods, such as principal component analysis (PCA) and the t-distributed stochastic neighbor embedding (t-SNE) algorithm, and the use of machine learning techniques, we show that it is possible to classify people's way of walking with very good precision, even when employing commercial devices specifically designed for other purposes.


Introduction
Recognition of a person's type of movement has implications for many aspects of daily life, from security applications to monitoring for assisted living. Discriminating whether a person is running or walking normally in airports or shopping centers, for example, may help video surveillance to detect possible dangerous situations [1][2][3]. Tools designed for this purpose involve the use of contactless devices, and radar technology is particularly suitable for the mentioned scenario.
Besides its ability to detect the presence of targets of all kinds, sometimes even at considerable distances, radar technology has attracted considerable attention thanks to its versatility and usefulness in several fields, from medical applications [4] to traffic surveillance monitoring [5].
In this paper we consider the use of an automotive radar to classify different types of monitored actions. With respect to the work described in Reference [6], the examined activities present less evident differences, since our goal is to distinguish people's way of walking on the basis of their speed. Moreover, the radar considered here works in a higher frequency range and therefore at a smaller wavelength, thus allowing a better interaction with objects and improved performance in micro-Doppler extraction. In addition, the millimeter-wave technology exploited allows us to discriminate with good accuracy the position of the hands during the walk, that is, whether they are moving freely or held in pockets. Speed and hand movement classification is performed by using Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) for feature extraction and supervised learning for the classification task. Different algorithms have been tested, and the best accuracy has been obtained with the Nearest Neighbor (NN) and the Support Vector Machine (SVM) classifiers.

Used Devices
The radar used in this paper is the Texas Instruments AWR 1642 [35], originally developed for the automotive market [36]. Being designed for industrial applications, it costs less than other types of radar, and its use in the context of this work allows us to verify whether people's movements can be classified with good accuracy even with commercial devices. It operates in two frequency bands, 76-77 GHz and 77-81 GHz, with bandwidths of 1 GHz and 4 GHz, respectively. The former is used for long range applications (up to 150 metres) and the latter for short range applications (up to 30 metres). The radar considered in Reference [6] is an Ankortek SDR-Kit 2400AD, a Software Defined Radio (SDR) working in the frequency range 24-26 GHz, with a maximum bandwidth of 2 GHz and a chirp time and maximum ramp slope of 15.625 MHz/µs [37]. With respect to the radar used in Reference [6], the higher frequency, the larger bandwidth and the steeper ramp yield a twentyfold improvement in range performance and a threefold improvement in speed performance.
An additional characteristic of the AWR 1642 is the presence of multiple-input multiple-output (MIMO) technology in the sensor [38], which, in the case of radar systems, has the goal of improving performance in angular detection.

Transmitted Signals
Signals are transmitted using FMCW modulation, which requires the transmission frequency to vary linearly from a minimum value to a maximum value in a time interval called chirp time. The transmitted chirps are grouped into frames. Inside each frame, which has a time duration called periodicity, the radar transmits a certain number of chirps, as schematized in Figure 1. Each chirp is built as shown in Figure 2. It starts with an Idle Time, needed because the ramp generator requires some time to restart the ramp and generate a new chirp. Then a guard time, or analog-to-digital converter (ADC) Valid Start Time, covers the first part of the ramp, which is not linear and may lead to a performance reduction, as described in Reference [39].
Then we have the effective ADC Sampling Time, which represents the time duration of the ramp acquired by the radar. Within this interval, the ADC samples of the IF signals are collected. As can easily be observed in Figure 2, a time shorter than the total ramp time $t_{ramp}$ is used, so the used radar bandwidth $B$ is smaller than the maximum possible TB, and is calculated as

$$B = S \cdot \mathrm{ADC\,Sampling\,Time} = S \cdot n_{samples} \cdot t_{sampling},$$

where $S$ represents the slope of the ramp and the ADC Sampling Time is given by the product of the number of samples $n_{samples}$ acquired for each chirp and the sampling period $t_{sampling}$. The device configuration must take these parameters into account in order to avoid the nonlinear effects of the sensor as much as possible. The importance of avoiding the first part of the ramp is evident from the analysis of the intermediate frequency (IF) signal over time and on the complex plane. As briefly described in Reference [40], this effect can be seen on the IQ plot. Using two different calibrations, the first with the ADC Valid Start Time equal to zero and the second with it equal to 6 µs, it is possible to observe how this imperfection can be avoided without the need for a correction algorithm. Figure 3(a) shows 500 samples of the IF signal across two different chirps. If the guard time is not used, there is a spike at the beginning of the first chirp. Figure 3(b) depicts the same segment of the IF signal with an ADC Valid Start Time of 6 µs, where the spike is no longer present. The spike disappears with a minimum value of 3 µs.
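As a rough illustration of the relation between ramp slope, number of samples and sampling period described above, the following minimal Python sketch computes the bandwidth actually swept during the ADC sampling window. The function name and all numerical values are hypothetical and are not the ones reported in Table 1.

```python
def used_bandwidth_hz(slope_hz_per_s, n_samples, t_sampling_s):
    """Bandwidth swept while the ADC is sampling: B = S * n_samples * t_sampling."""
    adc_sampling_time_s = n_samples * t_sampling_s
    return slope_hz_per_s * adc_sampling_time_s

# Hypothetical configuration (assumed for illustration only)
slope = 50e12           # 50 MHz/us expressed in Hz/s
n_samples = 256         # samples acquired per chirp
t_sampling = 1 / 5e6    # sampling period of a 5 Msps ADC

B = used_bandwidth_hz(slope, n_samples, t_sampling)
print(f"ADC sampling time: {n_samples * t_sampling * 1e6:.1f} us")
print(f"Used bandwidth:    {B / 1e9:.2f} GHz")
```

Increasing the ADC Valid Start Time does not enter this formula directly; it only moves the sampling window away from the nonlinear start of the ramp, which is why the ramp must be long enough to contain both intervals.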

Radar Signal Processing
With the configuration used, four receiving lines are available. To perform our analysis we need just one of them, so we can sum the complex samples coming from the analog-to-digital converters in order to improve the signal-to-noise ratio. We thus obtain a vector of complex samples which can be reorganized in the form of a matrix, as shown in Figure 4. Along the rows of the matrix, also called fast time, we have samples from single chirps, while along the columns, or slow time, we have samples from different chirps. From this raw matrix we can extract two types of maps to classify different types of movement. The first one contains information about the distance and speed of the subject, and it is obtained by applying a Fourier transform to the columns and then to the rows; this map is defined as the Range-Doppler map. Each element of the map is a complex number, and the map is built considering the total acquisition, as shown in Figure 5. Besides distance and speed, this allows us to extract the micro-Doppler components characterizing the movement. Since our subjects are moving during the acquisitions, we can extract the spectrogram from each range bin and hence characterize their micro-Doppler signatures along the entire activity.
During each acquisition all the objects are stationary with the exception of the subject under examination, therefore the only significant micro-Doppler components are related to her/him. The presence of stationary objects does not influence the Range-Doppler map either, except for the zero-Doppler component. As described in Reference [19], from this map it is possible to identify the kind of movement carried out by the subject.
The second type of map can be extracted through spectrograms and is denoted as the Doppler-Time map. A spectrogram is the most common time-frequency representation [41], and it is derived from the Short Time Fourier Transform (STFT) according to the following equation

$$X[n, k] = \sum_{m=-\infty}^{\infty} x[m]\, w[m - n]\, e^{-j 2\pi k m / N_{FFT}},$$

where $x[\cdot]$ is the input signal, $n$ represents a discrete index of time, $k$ is a discrete index of frequency, $w[\cdot]$ is a window function and $N_{FFT}$ is the number of frequency bins. The STFT can in fact be considered as the Fourier transform of a signal multiplied by a window sliding over time. A trade-off between resolution in time and in frequency must be found, and overlapping windows can help in this sense [42]. Starting from the range matrix, the second matrix in Figure 5, and applying the STFT along the rows, we obtain the Doppler-Time map. This function uses windows of 512 samples with an overlap of 98%, and a Hann window is applied. Figure 6 depicts this process. By using both the mentioned maps it is possible to classify different kinds of movements, as will be explained in the next section.
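The construction of the Range-Doppler map can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the array layout (receive channel, chirp, sample), the function name and the synthetic single-target signal are all hypothetical and do not reproduce the authors' processing chain.

```python
import numpy as np

def range_doppler_map(rx):
    """rx: complex array of shape (n_rx, n_chirps, n_samples) (assumed layout)."""
    raw = rx.sum(axis=0)                      # sum the receive channels
    range_fft = np.fft.fft(raw, axis=1)       # fast time -> range bins
    rd = np.fft.fft(range_fft, axis=0)        # slow time -> Doppler bins
    return np.fft.fftshift(rd, axes=0)        # zero Doppler in the middle row

# Synthetic single target (hypothetical bins, for illustration only)
n_rx, n_chirps, n_samples = 4, 64, 128
range_bin, doppler_bin = 20, 5
chirp_idx = np.arange(n_chirps)[:, None]
sample_idx = np.arange(n_samples)[None, :]
one_channel = np.exp(2j * np.pi * (range_bin * sample_idx / n_samples
                                   + doppler_bin * chirp_idx / n_chirps))
rx = np.stack([one_channel] * n_rx)

rd = range_doppler_map(rx)
peak = np.unravel_index(np.argmax(np.abs(rd)), rd.shape)
print("peak (Doppler row, range column):", peak)
```

In this sketch a stationary object would produce energy only in the central (zero-Doppler) row, which matches the observation that the static background affects only the zero-Doppler component.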
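A spectrogram with the window length, overlap and window type quoted above can be sketched as follows. The implementation is a plain NumPy approximation, not the function used by the authors; the helper name, the hop-size rounding and the synthetic test tone are assumptions.

```python
import numpy as np

def spectrogram(x, win_len=512, overlap=0.98):
    """Squared magnitude of a Hann-windowed STFT; returns (n_frames, win_len)."""
    hop = max(1, win_len - int(overlap * win_len))   # 11 samples for 512 and 98%
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    stft = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    return np.abs(stft) ** 2

# Synthetic slow-time signal: a single complex tone (hypothetical Doppler bin 40)
n = np.arange(2048)
x = np.exp(2j * np.pi * 40 * n / 512)
S = spectrogram(x)
print(S.shape, int(S[0].argmax()))
```

Each row of `S` is one time frame; transposing it gives a map with Doppler on the vertical axis and time on the horizontal axis. The very high overlap yields a smooth time axis at the price of many strongly correlated frames.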

Movements Classification
In this section we briefly describe the dimensionality reduction techniques and the classification algorithms used in the following section to discriminate the kinds of activities under consideration. As regards feature extraction, we resort to two different methods to reduce data dimensionality, PCA and t-SNE.
Both maps obtained through the radar signal processing, that is, the Range-Doppler map and the Doppler-Time map, are considered as images. The vectors resulting from the application of dimensionality reduction techniques to these images, that is, the principal components extracted by PCA and the main dimensions given by t-SNE, serve as feature vectors. We have a set of $N$ images $I_n$ of dimension $[l \times m]$, with $n = 1, \dots, N$. Images are initially vectorized row-wise and grouped in order to form a training set $X = [x^{(1)}, \dots, x^{(N)}]^T$, where $T$ denotes the transpose operator; rows of $X$ correspond to observations and columns correspond to variables.

Principal Component Analysis
Principal Component Analysis (PCA) [43] is an unsupervised transform also known as the Karhunen-Loeve transform (KLT). It aims at finding suitable linear transformations y of the observed variables that are easily interpreted and capable of highlighting and summarizing the information inherent in the initial data matrix. This tool is especially useful when dealing with a considerable number of variables from which one wants to extract the greatest possible information while working with a smaller set of variables.
PCA can hence be described as a transformation of the observed vectors, all of the same length and collected in the matrix X, into a new vector y built in such a way that the first element of y captures the greatest possible variability (and therefore the most information) of the original variables, the second captures the greatest remaining variability after the first component, and so on up to the last element, which accounts for the smallest fraction of the original variance. The principal components are therefore the unit-norm linear combinations of the original variables that maximize the variance and are mutually uncorrelated.
The resulting vector y forms the feature vector which can be used for classification. Moreover, the PCA transform is invertible: the original data can be recovered exactly when all components are retained, and approximately otherwise.
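The following NumPy sketch illustrates PCA-based feature extraction on vectorized images, using the singular value decomposition of the centered data matrix. It is a generic textbook formulation under assumed sizes (30 random images of 8 × 6 pixels); the function names are hypothetical and this is not the authors' code.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean and the first principal directions (rows) of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T          # feature vectors y

def pca_inverse(Y, mean, components):
    return Y @ components + mean              # back to the image space

rng = np.random.default_rng(0)
images = rng.normal(size=(30, 8, 6))          # N = 30 images, l x m = 8 x 6 (assumed)
X = images.reshape(len(images), -1)           # row-wise vectorization, shape (30, 48)

mean, comps = pca_fit(X, n_components=5)
Y = pca_transform(X, mean, comps)
print(Y.shape, np.round(Y.var(axis=0), 2))
```

The printed variances are non-increasing, reflecting the ordering of the components, and the columns of `Y` are uncorrelated. Reconstruction through `pca_inverse` is exact only when all components are kept.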

t-distributed Stochastic Neighbor Embedding
t-SNE [44] is a nonlinear, unsupervised transform, specifically designed to reduce dimensionality to 2 or 3 dimensions in order to display multidimensional data.
The t-SNE algorithm consists of two main steps. Given our set of $N$ vectorized images $x^{(1)}, \dots, x^{(N)}$ of length $l \cdot m$, t-SNE first computes the conditional probability $p_{j|i}$, which represents the similarity of datapoint $x_j$ to datapoint $x_i$. In other words, it evaluates the probability that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at $x_i$. In formulas,

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},$$

where $\sigma_i$ is the standard deviation of the Gaussian centered at $x_i$. t-SNE then defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback-Leibler (KL) divergence between the two distributions with respect to the locations of the points in the map.
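The first step of t-SNE, the Gaussian-based conditional similarities, can be sketched as below. For simplicity the sketch assumes a single fixed Gaussian width `sigma` shared by all points, whereas the actual algorithm tunes one width per point from a user-chosen perplexity; the function name and the toy data are hypothetical.

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """Row i of the result holds p_{j|i}; a common sigma is a simplifying assumption."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)                 # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

# Three toy points on a line: the second is much closer to the first than the third
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
P = conditional_probabilities(X)
print(np.round(P, 4))
```

Each row sums to one, and closer points receive a larger share of the probability mass. The second step, not shown here, builds an analogous distribution in the low-dimensional map and moves the map points by gradient descent on the KL divergence.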

Classification Algorithms
As regards the classification task, we consider the use of k-NN and SVMs. They are both supervised and non-parametric algorithms.
k-NN is an instance-based algorithm, meaning that it does not explicitly learn a model. Instead, it memorizes the training instances, which are subsequently used as "knowledge" for the prediction phase. In concrete terms, this means that only when a query is made (i.e., when the algorithm is asked to provide a label for an input) are the training instances used to produce a response. As a drawback, this algorithm has both a storage cost, since it is necessary to store a potentially huge training dataset, and a computational cost during the prediction phase, since the classification of a given observation requires scanning the entire dataset. In the context of classification, the k-NN algorithm essentially boils down to determining a majority vote among the k closest neighbors of a given unknown instance. The proximity is defined by a distance metric between two data points, usually the Euclidean distance.
The SVM algorithm classifies data by creating a linear or non-linear decision boundary to separate different classes. It projects the data through a non-linear function to a space with a higher dimension, lifting them from their original space to a feature space, which can even be infinite-dimensional. To perform this operation, SVM makes use of kernels, among which one of the most used is the Gaussian kernel.
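The majority vote among the k nearest neighbors described above can be written compactly in NumPy. This is a minimal sketch assuming integer class labels and the Euclidean distance; the function name and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=1):
    """Label each query with the majority class of its k nearest training points."""
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    nearest = np.argsort(dist, axis=1)[:, :k]     # indices of the k closest points
    votes = y_train[nearest]                      # labels of those neighbors
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy feature vectors: class 0 near the origin, class 1 near (5, 5)
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
X_query = np.array([[0.2, 0.1], [4.8, 5.2]])

pred = knn_predict(X_train, y_train, X_query, k=3)
print(pred)
```

No training step appears in the sketch: all the work happens at query time, which is exactly the storage and prediction cost mentioned in the text. Using an odd k avoids ties in a two-class problem.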

Experimental Setup
In our experiments we consider a data set composed of nineteen subjects who repeated each activity three times, for a total of 168 different acquisitions. Three different activities were examined:
• Slow walk;
• Fast walk;
• Slow walk with hands in pockets.
Note that we do not consider a data set built ad hoc: each subject was simply asked to walk in a "slow" or in a "fast" way, without specifying the number of steps or the time required to complete the activity, in order to generate data as realistic as possible. In addition, acquisitions from subjects of different height and weight were collected to provide a set covering a large variety of characteristics. The difference in walking speed is subtle and depends on the person examined, who interpreted it subjectively. In general, the average speed measured for the fast walk is around 2 m/s, while for the slow walk, with either free hands or hands in pockets, it is about 1.2 m/s. Differences in subjects' speed, including the "holding the arm" case (which is similar to our "hands in pockets"), have been considered in References [45,46], although their datasets were composed of 8 and 3 subjects, respectively. The radar configuration parameters are chosen according to the selected measurement area and to the kind of activity required of the subjects. Some parameters are chosen according to the following range equation

$$R = \frac{c \, f_{beat}}{2 S},$$

where $f_{beat}$ is the beat frequency, $c$ is the speed of light and $S$ is the slope of the ramp. We can evaluate the maximum speed of the target as

$$v_{max} = \frac{\lambda}{4 \, t_{chirp}},$$

where $\lambda$ represents the wavelength of the transmitted signal and $t_{chirp}$ is the time duration of the chirp. The measurement area is a hallway, about 12 meters long, free of furniture. During each activity the subject goes from the starting point in front of the radar to a distance of about 9 meters, and then comes back. Since the measurement time is 16 seconds, it is possible that the acquisition ends before the subject returns to the initial position. The parameters used for the measurements are reported in Table 1.
A first analysis has been made on the background without any subject, which is depicted in Figure 7. Only one measurement has been performed, since the test area is the same for all the subjects. From this analysis it is possible to see that the background does not affect the measurements, thus we can neglect its effect in the classification of movements.
In Figure 8 we show an example of a subject walking in different ways, displaying both Range-Doppler maps (on the left) and Doppler-Time maps (on the right). It is possible to observe that slow and fast walk are easily recognizable in the maps. As expected, maps related to slow walk with hands in pockets present a slightly less evident Doppler signature with respect to free hands, but this effect is scarcely noticeable.
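The range and maximum-speed relations used to choose the radar parameters can be checked numerically with a short sketch. It uses the standard FMCW expressions (range proportional to beat frequency over twice the slope, maximum unambiguous speed equal to the wavelength over four chirp times); all numerical values are hypothetical and are not those of Table 1.

```python
C = 299_792_458.0  # speed of light in m/s

def target_range_m(f_beat_hz, slope_hz_per_s):
    """Range from beat frequency: R = c * f_beat / (2 * S)."""
    return C * f_beat_hz / (2.0 * slope_hz_per_s)

def max_velocity_mps(carrier_hz, t_chirp_s):
    """Maximum unambiguous speed: v_max = lambda / (4 * t_chirp)."""
    wavelength_m = C / carrier_hz
    return wavelength_m / (4.0 * t_chirp_s)

# Hypothetical configuration (assumed for illustration only)
R = target_range_m(f_beat_hz=3e6, slope_hz_per_s=50e12)    # 50 MHz/us slope
v = max_velocity_mps(carrier_hz=77e9, t_chirp_s=100e-6)
print(f"range: {R:.2f} m, maximum speed: {v:.2f} m/s")
```

With numbers of this order, a 9 m walk and a fast-walk speed of about 2 m/s both fall comfortably inside the measurable interval, which is the kind of check the parameter selection has to satisfy.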

Classification
Data obtained after the processing of the radar signal are treated as images. Since their original size cannot be easily handled, all matrices have been reshaped to the same dimension [195 × 119]. In order to further reduce dimensionality and to extract features from the images, the PCA and t-SNE algorithms have then been applied separately to the data.
In Figures 9(a) and 9(b) we show the classification accuracy obtained with different numbers of principal components, using a NN classifier and an SVM algorithm. We choose to use a Gaussian kernel for the SVM. The value of k for the k-NN and the kernel used for the SVM have been chosen by using a leave-one-out cross-validation algorithm, which aims at minimizing the validation error. Each sample of the dataset is in turn selected as the validation set, whilst the remaining part represents the training set. In this way every sample is used exactly once for validation and in all the other rounds for training. Results obtained by the algorithm for odd values of k between 1 and 49 are shown in Figure 10, where k equal to 1 leads to an error of about 2.4%. The validation error obtained with different kernels, in percentage, is reported in Table 2, directing the choice toward the linear kernel in our scenario. Sixty percent of the acquisitions are used for training, while the remainder is used for testing. Results have been averaged over 100 classification runs obtained by choosing training and test sets at random. We consider here only two classes, corresponding to the slow and fast walk. Interestingly, the number of principal components (or the number of dimensions in the case of t-SNE), which here corresponds to the number of features, has a small impact on the classification performance. The application of the PCA or t-SNE algorithm to extract features from images leads to very similar results, although t-SNE was originally designed to reduce data to two or three dimensions and becomes very slow for higher values. In addition, we obtain the same results using both Range-Doppler and Doppler-Time maps.
In Tables 3 and 4 we show the confusion matrices obtained by applying classification to two and three different classes. In the first table, measurements of slow walk and slow walk with hands in pockets have been incorporated into a single class, while in the second table they have been split into two separate classes. As expected, distinguishing free hands from hands in pockets is a much more complicated task than identifying different ways of walking. In the three-class case, in fact, the best accuracy obtained is about 72% and red boxes highlight the presence of a number of misclassified examples, although the fast walk is distinguished from the other activities with high precision (87.5%); SVM methods seem to achieve better performance than k-NN algorithms. In the two-class case we have instead an excellent accuracy of more than 93%. In both Tables 3 and 4, entries with a high number of correct detections are highlighted in green, while entries with a high number of misclassified samples are marked in red.
In Table 5, we give an overview of the results obtained by other works focused on the classification of walking activities through radar measurements, showing the best accuracies achieved. [*] denotes the present work. In Reference [45] 7 types of activities are considered, that is, walking backwards, limping, depressed, elderly, excited, holding the arm and walking in a zigzag, and the radar used is an Ultra-Wide Band one; Reference [46] considers an FMCW radar, and the examined activities are crawl, creep on hands and knees, walk, jog and run. Although the difference between walking slowly and walking quickly is less evident than the differences among those activities, we show that our system is able to achieve a better accuracy. Moreover, we consider a larger number of subjects that move differently from each other, thus confirming the validity of our method in a realistic context. The activity of holding the arm while walking [45], which is in some way comparable to our case of walking slowly with hands in pockets, could not be differentiated from the others at all, with a specific accuracy of 42.42% (see Reference [45], Table 2). The subjectivity and the personal interpretation of speed in the conducted tests represent the major error source for our classification model. A standardized time or number of steps during the experiment would probably improve the system performance, but this would not represent a realistic scenario and it is out of the scope of this work.
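The leave-one-out selection of k can be sketched as follows. The data are synthetic two-class Gaussian clusters standing in for the extracted features, the helper names are hypothetical, and the small k-NN routine is a generic implementation rather than the one used for the reported results.

```python
import numpy as np

def knn_predict_one(X_train, y_train, x, k):
    """Majority vote among the k training points closest to x."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]
    return np.bincount(y_train[nearest]).argmax()

def loo_error(X, y, k):
    """Leave-one-out validation error: each sample is validated exactly once."""
    indices = np.arange(len(X))
    errors = 0
    for i in indices:
        mask = indices != i                       # train on everything except sample i
        errors += int(knn_predict_one(X[mask], y[mask], X[i], k) != y[i])
    return errors / len(X)

# Synthetic stand-in for the feature vectors (assumed, not the radar data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 3)), rng.normal(3.0, 1.0, (40, 3))])
y = np.repeat([0, 1], 40)

errors = {k: loo_error(X, y, k) for k in range(1, 50, 2)}   # odd k from 1 to 49
best_k = min(errors, key=errors.get)
print("best k:", best_k, "validation error:", errors[best_k])
```

The same loop can be reused to compare SVM kernels by swapping the predictor. After the hyperparameters are fixed, the repeated random 60/40 train/test splits described in the text give the averaged test accuracy.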
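The relation between the three-class and the two-class confusion matrices can be illustrated with a small sketch. The labels below are toy values invented for the example (0 = slow walk, 1 = fast walk, 2 = slow walk with hands in pockets) and do not reproduce the entries of Tables 3 and 4.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Entry (i, j) counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def accuracy(cm):
    return np.trace(cm) / cm.sum()

# Toy labels (assumed): 0 = slow, 1 = fast, 2 = slow with hands in pockets
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 2, 1, 1, 0, 2])
cm3 = confusion_matrix(y_true, y_pred, 3)

# Two-class case: merge "hands in pockets" into the slow-walk class
merge = np.array([0, 1, 0])
cm2 = confusion_matrix(merge[y_true], merge[y_pred], 2)

print(cm3, accuracy(cm3))
print(cm2, accuracy(cm2))
```

In this toy case all the three-class errors are confusions between the two slow-walk variants, so they disappear once the classes are merged; this is the mechanism behind the gap between the two-class and three-class accuracies.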

Conclusion and Future Works
We have assessed the performance of an automotive radar in classifying different types of movements, focusing our attention on distinguishing people's way of walking. The dataset was not built ad hoc: we have collected acquisitions of subjects with different characteristics who were free to walk in a given indoor environment. We have considered the use of PCA and t-SNE techniques to reduce data dimensionality and to extract features, and then we have applied different classification algorithms. From the obtained results it is possible to state that movement classification of human targets is a much more complex task than the discrimination of people from other objects. However, we have shown that, by exploiting the micro-Doppler components of the radar signal, we are able to identify slow and fast walking with high accuracy. We have also characterized the presence or absence of arm movement with more than 72% precision, which represents a good starting point for future work. A possible future direction may also include the investigation of deep learning methods in our scenario in order to better distinguish small movements.
Author Contributions: G.C. designed the system, G.C. and L.S. performed the experimental tests and data processing, writing also the main part of the paper. A.D.S. and E.G. participated in data collection and processing. E.G. coordinated the project, the discussion of results, and the manuscript writing.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: