An Accurate Method to Distinguish Between Stationary Human and Dog Targets Under Through-Wall Condition Using UWB Radar

Abstract: Research on distinguishing humans from animals can help set rescue priorities and optimize the distribution of resources in earthquake- or mining-related rescue missions. However, existing solutions are few, and their classification stability and accuracy are limited. This study proposes an accurate method for distinguishing stationary human targets from dog targets under through-wall conditions based on ultra-wideband (UWB) radar. Eight humans and five beagles were used to collect 130 samples of through-wall signals using the UWB radar. Twelve features belonging to four categories were combined using the support vector machine (SVM) method. A recursive feature elimination (RFE) method determined an optimal feature subset from the twelve features to overcome overfitting and poor generalization. The results after ten-fold cross-validation showed that the area under the receiver operating characteristic (ROC) curve can reach 0.9993, which indicates that the two kinds of subjects can be distinguished under through-wall conditions. The study also compared the performance of the four feature categories when each was used independently in a classifier. The comparison indicated that the wavelet entropy-corresponding features perform best among them. The method and results are envisioned to be applied in various practical situations, such as post-disaster searching, hostage rescues, and intelligent homecare.


Introduction
The rapid developments in ultra-wideband (UWB) radar technology [1][2][3][4][5][6][7] have attracted increasing interest in civilian and military applications, such as earthquake and hostage rescue operations and gesture recognition. The technologies of radar imaging, localization, and action recognition have been significantly improved over the past decade, and several research groups are focusing on the distinction between humans and animals based on these advancements. The work in [8] developed a feature-based method for classifying the walking movements of animals versus humans for border security applications. The research in [9] proposed a classification algorithm using micro-Doppler signals obtained from humans and dogs moving in four different directions, and [10] combined the Gaussian mixture model and the hidden Markov model to distinguish between slow-moving animal and human targets in order to detect potential livestock thieves and poachers within reserves and farmland.
However, these studies have concentrated on the moving states of humans and animals, and there have been no reports focusing on the distinction in stationary […]

The block diagram of the utilized UWB radar system is illustrated in Figure 1. First, the pulse generator generated trigger pulses with a center frequency of 500 MHz and a pulse repetition frequency of 128 kHz. Then, the pulses were sent to the transmitter to excite the transmitter antenna (TA), where they were shaped into bipolar pulses. Next, the bow-tie dipole antenna transmitted the vertically polarized pulses with a peak power of approximately 5 W. Meanwhile, the trigger pulses were sent to the delay unit to produce a series of software-controlled range gates, 300 ps in width, to turn on the receiver, which was identical to the transmitting one. Finally, through the receiving antenna (RA), only the return pulses within the range gates were sampled, integrated, amplified, and then transferred to the computer for further analysis [16]. The detailed parameters of the UWB radar system are listed in Table 1.

After signal acquisition, the raw echo data D(m, n) are stored in the form of waveforms, where m = 1, 2, …, M denotes the sampling point in propagation time and n = 1, 2, …, N denotes the sampling point in observation time. The two-dimensional (2-D) pseudo-color image of the raw echo data is illustrated in Figure 2 for a human target 2.5 m behind the wall. The time axis associated with range along each received waveform is termed "fast-time" and denoted by τ, which is on the order of nanoseconds. The time axis corresponding to the acquisition time of waveforms along the measurement duration is termed "slow-time" and denoted by t, which is on the order of seconds. Each waveform contains M = 2048 sample points, and the recorded profile is τ_max = 20 ns long, corresponding to a detection range of 3 m. The amplitudes along slow-time at each sample point, which stands for a specific range, are defined as a point signal. The scanning speed is 64 waveforms per second, which is higher than the Nyquist sampling rate for respiration; thus, the number of recorded waveforms in slow-time can be written as N = 64t.

Remote Sens. 2019, 11, x FOR PEER REVIEW 4 of 21
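The relation between the fast-time axis, range, and the slow-time sample count can be illustrated with a short sketch. It assumes free-space propagation (the wall's permittivity is ignored) and uniform sampling of the 20 ns profile; the function names are illustrative and do not come from the radar software.

```python
# Speed of light in vacuum (m/s); propagation through the wall is ignored here.
C = 299_792_458.0

def fast_time_to_range(tau_s):
    """Convert a two-way travel time (seconds) to a one-way range (metres)."""
    return C * tau_s / 2.0

def sample_ranges(num_points=2048, tau_max_s=20e-9):
    """Range assigned to each fast-time sample, assuming uniform sampling."""
    step = tau_max_s / num_points
    return [fast_time_to_range((m + 1) * step) for m in range(num_points)]

def slow_time_samples(duration_s, scan_rate_hz=64):
    """Number of recorded waveforms, N = 64 t."""
    return int(round(scan_rate_hz * duration_s))

ranges = sample_ranges()
print(round(ranges[-1], 2))    # range of the last fast-time sample, in metres
print(slow_time_samples(20))   # waveforms recorded during 20 s
```

Under these assumptions the last fast-time sample maps to roughly 3 m, which agrees with the detection range quoted above.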

Signal Preprocessing
Signal preprocessing steps, including distance accumulation, normalization along the fast-time dimension, direct current (DC) removal, 2 Hz low-pass (LP) filtering, and adaptive filtering based on least mean square (LMS), are performed first, as illustrated in Figure 3. All subsequent feature extraction is based on the signals preprocessed with the steps shown in Figure 3.

Distance accumulation reduces the computational complexity while preserving detailed information. Its output is DataA(x, n), the echo matrix data after distance accumulation, where Q is the distance window width along the fast-time dimension, x = 1, 2, …, X, and X denotes the number of distance points along the fast-time dimension after accumulation. Next, normalization along the fast-time dimension is performed so that the waveforms have comparable amplitudes. DC removal then removes the DC component and baseline drift.

The cut-off frequency of the low-pass filter is chosen to be 2 Hz to filter out high-frequency noise and retain respiratory signals. The filter is a finite impulse response (FIR) filter with impulse response function H(t) [13]. Adaptive filtering by means of the LMS algorithm is used to suppress strong clutter; it is illustrated and verified in [17].
Matrix data DataAF(x, n) is obtained after the execution of all preprocessing steps. The 2-D pseudo-color image of the preprocessed signal is shown in Figure 4, when the human target is 2.5 m behind the wall. The subsequent feature calculation will proceed with DataAF(x, n).
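As a rough illustration of the first preprocessing steps, the sketch below performs distance accumulation and a simple DC removal on a small echo matrix stored as data[m][n] (fast-time rows, slow-time columns). Two points are assumptions rather than the paper's definitions: accumulation is implemented as an average over each window of Q fast-time points, and DC removal subtracts the slow-time mean of each point signal. With M = 2048 and Q = 10 this floor-division windowing gives 204 points rather than the 200 points quoted later in the text, so the exact windowing used in the source may differ.

```python
def distance_accumulate(data, q):
    """Average every q consecutive fast-time rows (averaging is an assumption)."""
    x_count = len(data) // q            # an incomplete trailing window is dropped
    n_count = len(data[0])
    out = []
    for x in range(x_count):
        window = data[x * q:(x + 1) * q]
        out.append([sum(row[n] for row in window) / q for n in range(n_count)])
    return out

def remove_dc(data):
    """Subtract the slow-time mean from each point signal (simple DC estimate)."""
    out = []
    for row in data:
        mean = sum(row) / len(row)
        out.append([v - mean for v in row])
    return out

raw = [[1.0, 2.0, 3.0],
       [3.0, 4.0, 5.0],
       [10.0, 10.0, 13.0],
       [12.0, 12.0, 15.0]]
accumulated = distance_accumulate(raw, q=2)
centered = remove_dc(accumulated)
print(accumulated)
print(centered)
```

Normalization, the 2 Hz FIR low-pass filter, and the LMS adaptive filter are omitted here because their parameters are not recoverable from the text.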

Feature Extraction
Based on research results accumulated in recent years, 12 features belonging to four categories were utilized after signal preprocessing: two energy-corresponding features, two correlation coefficient-corresponding features, four wavelet entropy-corresponding features, and four frequency-corresponding features. The features are described in detail below.

Energy-Corresponding Features
• Standard deviation change rate of micro vibration (StdCRMV)

When calculating StdCRMV, Q = 1. Preliminary research has shown that the deviations of the amplitude values in DataAF(x, n) are greatest at the target position, i.e., the standard deviation (Std) of the target signal is the greatest. The target signal is defined as the specific point signal located exactly at the target position. Std is calculated for each point signal, whose amplitudes are denoted by the scalar y. Then, an optimal window (OW) of fixed width, with the target signal in the middle position, is chosen along the fast-time dimension, and the Std of every point signal in the OW is calculated. The StdCRMV is then computed from Std_max and Std_min, the maximum and minimum Std values within the OW. According to the work in [18], the width of the OW is best chosen as ow_Std = 15, so that the difference between human targets and dog targets is largest. Figure 5 illustrates the Std values for 15 points closer to and 15 points farther from the target's position for human and dog targets, respectively. Generally, the Std changes much more gently within the OW for human targets than for dog targets. Therefore, the StdCRMV of human targets is much lower than that of dog targets.
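The quantities that enter StdCRMV can be sketched as follows. The code computes the Std of every point signal in a window around the target and extracts Std_max and Std_min; it stops there, because the expression that combines them into StdCRMV is not recoverable from the text. The population standard deviation and a window of 15 points on each side of the target are assumptions, and `std_profile` is an illustrative name.

```python
from statistics import pstdev

def std_profile(data, target_idx, half_width=15):
    """Std of each point signal within +/- half_width points of the target.

    data[x] is the slow-time point signal at range index x.
    """
    lo = max(0, target_idx - half_width)
    hi = min(len(data), target_idx + half_width + 1)
    return [pstdev(data[x]) for x in range(lo, hi)]

# Synthetic example: vibration amplitude decays with distance from index 20.
data = [[(-1) ** n / (1 + abs(x - 20)) for n in range(8)] for x in range(41)]
stds = std_profile(data, target_idx=20)
std_max, std_min = max(stds), min(stds)
print(len(stds), std_max, std_min)
```

In this synthetic example the largest Std occurs at the target position, mirroring the observation above that the target signal has the greatest standard deviation.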
• Energy ratio of the reference frequency band (ERRFB)

In the calculation of ERRFB, the preprocessing steps are similar to those of StdCRMV. The difference is that Q = 10 and the normalization is along the slow-time dimension, as illustrated in Equation (7). Thus, the 2048 points along the fast-time index associated with the range are compressed into 200 points [18].
Then, ensemble empirical mode decomposition (EEMD) is performed on the target signal. EEMD is a noise-eliminating algorithm that decomposes the original target signal into a series of intrinsic mode function (IMF) components with different characteristic scales by adding multiple sets of different white noise [19][20][21]. Its procedure is as follows:

1. Add a random white noise signal wn_j(t) to the original target signal Signal_o(t):

$$\mathrm{Signal}_j(t) = \mathrm{Signal}_o(t) + wn_j(t), \tag{8}$$

where Signal_j(t) is the noise-added signal, j = 1, 2, …, TN, and TN is the number of trials, which is chosen to be 50 here.

2. Decompose Signal_j(t) into a series of IMF components IMF_{i,j} as follows:

$$\mathrm{Signal}_j(t) = \sum_{i=1}^{V_j} \mathrm{IMF}_{i,j}(t) + \mathrm{Residue}_j(t), \tag{9}$$

where IMF_{i,j} is the i-order IMF component of the j-th trial, V_j is the number of IMF components of the j-th trial, and Residue_j is the residue of the j-th trial.

3. If j < TN, repeat steps 1 and 2, adding a different random white noise signal each time.

4. Obtain V = min(V_1, V_2, …, V_TN) and calculate the ensemble average of the corresponding IMF components of the decompositions as the EEMD result:

$$\mathrm{IMF}_i(t) = \frac{1}{TN}\sum_{j=1}^{TN} \mathrm{IMF}_{i,j}(t), \tag{10}$$

where i = 1, 2, …, V.
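The ensemble-averaging step (step 4) can be sketched in isolation. The code assumes that some EMD routine, which is not shown and is not specified by the text, has already produced the IMF components of every noise-added trial; `ensemble_average` and the nested-list layout are illustrative choices.

```python
def ensemble_average(trial_imfs):
    """Average corresponding IMFs over all trials.

    trial_imfs[j][i][t] is sample t of the i-th IMF of the j-th trial.
    Only the first V = min_j(V_j) IMF orders are kept.
    """
    tn = len(trial_imfs)                          # number of trials, TN
    v = min(len(imfs) for imfs in trial_imfs)     # common number of IMF orders
    length = len(trial_imfs[0][0])
    return [
        [sum(trial_imfs[j][i][t] for j in range(tn)) / tn for t in range(length)]
        for i in range(v)
    ]

# Two hypothetical trials: the second one produced an extra IMF that is discarded.
trials = [
    [[1.0, 2.0], [0.0, 4.0]],
    [[3.0, 0.0], [2.0, 0.0], [9.0, 9.0]],
]
averaged = ensemble_average(trials)
print(averaged)
```

Because the added white noise differs between trials, averaging corresponding IMFs tends to cancel the noise while retaining the components shared by all trials.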
After dividing the original target signal into V-order IMF components by EEMD, noise is eliminated using the discriminant method expressed in Equations (11) and (12). There, η denotes the energy concentration ratio, β is its decline rate, and R_IMF_i is the discrete autocorrelation sequence of the i-order (i = 2, 3, …, V) IMF component of the target signal, which is defined in Equation (10). [n_1, n_2] denotes the three-point interval centered on the symmetric point of R_IMF_i, and η is the energy concentration ratio within this interval. When the decline rate of the energy concentration ratio satisfies β(v) > 2, the v-order IMF component is considered a denoised signal. The reconstructed signal s_re is expressed in Equation (13).

ERRFB is the energy proportion of the reconstructed signal in the human respiratory frequency band (0.2-0.4 Hz), expressed in Equation (14):

$$\mathrm{ERRFB} = \frac{E_{re\_hb}}{E_{re}}, \tag{14}$$

where E_re is the total energy of the reconstructed signal in the frequency domain, and E_re_hb is the energy of the reconstructed signal in the reference frequency band, i.e., the human respiratory frequency band. Generally, the ERRFB of humans is approximately 40% and that of dogs is approximately 18% [15,19].

Correlation Coefficient-Corresponding Features

Because the details are particularly important in the calculation of OCCMV, the distance window width here is Q = 1, and the width of the OW is ow_OCCMV = 15. OCCMV is then expressed by Equations (15) and (16):

$$CCMV(y) = \frac{\sum_{n=1}^{N}\left(TS(n)-\overline{TS}\right)\left(Y(n)-\overline{Y}\right)}{\sqrt{\sum_{n=1}^{N}\left(TS(n)-\overline{TS}\right)^{2}\,\sum_{n=1}^{N}\left(Y(n)-\overline{Y}\right)^{2}}}, \tag{15}$$

$$OCCMV = \frac{1}{|OW|}\sum_{y \in OW} CCMV(y), \tag{16}$$
where TS represents the amplitudes of the target signal and Y the amplitudes of another point signal within the OW. The horizontal bar above TS and Y denotes the average value. CCMV(y) is the correlation coefficient between the target signal and another point signal in the OW, and OCCMV is the mean value of the CCMV of each point signal in the OW. Generally, the CCMV of human targets is very stable in the OW and close to one, whereas the CCMV of dog targets varies more. Therefore, the OCCMV of human targets is larger than that of dog targets. The CCMV of human targets and dog targets is compared in Figure 6.

• Change rate of correlation coefficient of micro vibration (CRCCMV)

As illustrated in Figure 6, the CCMV of a human target changes steadily, whereas that of a dog target changes more sharply. Therefore, the CRCCMV of human targets is much smaller than that of dog targets. Equation (17) defines the calculation of CRCCMV.

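The correlation coefficient between the target signal and the other point signals in the OW, and its mean over the window, can be sketched as follows. The code assumes a window of 15 points on each side of the target and excludes the target signal itself from the mean; both choices, like the function names, are illustrative. CRCCMV is not computed because Equation (17) is not recoverable from the text.

```python
from math import sqrt

def ccmv(ts, y):
    """Pearson correlation coefficient between two equally long point signals."""
    mean_ts = sum(ts) / len(ts)
    mean_y = sum(y) / len(y)
    num = sum((a - mean_ts) * (b - mean_y) for a, b in zip(ts, y))
    den = sqrt(sum((a - mean_ts) ** 2 for a in ts) * sum((b - mean_y) ** 2 for b in y))
    return num / den        # undefined (division by zero) for a constant signal

def occmv(data, target_idx, half_width=15):
    """Mean CCMV over the optimal window, target signal excluded (assumption)."""
    lo = max(0, target_idx - half_width)
    hi = min(len(data), target_idx + half_width + 1)
    values = [ccmv(data[target_idx], data[x]) for x in range(lo, hi) if x != target_idx]
    return sum(values) / len(values)

target = [0.0, 1.0, 0.0, -1.0] * 4
print(ccmv(target, [2.0 * v for v in target]))    # in-phase, scaled copy
print(ccmv(target, [-v for v in target]))         # anti-phase copy
```

A window whose point signals are all scaled copies of the target signal yields an OCCMV close to one, which corresponds to the stable behavior described for human targets.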
Wavelet Entropy-Corresponding Features
In the calculation of wavelet entropy-corresponding features, the parameter of Q in the preprocessing steps is chosen to be Q = 10 and the normalization is along slow-time dimension. Thus, the raw echo data are compressed into 200 points along the fast-time index, similar to the preprocessing of ERRFB.

• Mean of wavelet entropy of target signal (MWE)

Wavelet analysis can provide optimal time-frequency resolution of the signal, and entropy can quantify the signal's frequency patterns as a measure of order or disorder in a dynamic system. Wavelet entropy (WE) combines the advantages of both. It can accurately characterize how the time-frequency complexity of a non-stationary signal changes over time. More signal components indicate a more complex and irregular signal and a larger value of wavelet entropy. Generally, dog signals are more complex than human signals [12,22]. Therefore, the MWE of human targets is much smaller than that of dog targets.

In the calculation of MWE, the wavelet transform is performed first, followed by the calculation of relative wavelet energy. In Matsui et al. [23], Morlet first considered wavelets as a family of functions generated by translations and dilations of a unique function called the "mother wavelet" ψ(t). The wavelet family is expressed as:

$$\psi_{a,b}(t) = |a|^{-1/2}\,\psi\!\left(\frac{t-b}{a}\right),$$

where a is the scaling parameter, which measures the degree of compression, b is the translation parameter, which determines the time location of the wavelet, and t represents time. Let L²(ℝ) be the space of real square-integrable functions. The discrete wavelet transform of a signal S(t) ∈ L²(ℝ) yields the wavelet coefficients C_j(k) of the wavelet sequence [12,13]. For practical signal processing, the signal is assumed to be given by the sampled values S = {s_0(n), n = 1, …, N}. Then, the signal reconstructed by the wavelet transform is expressed as:

$$S(t) = \sum_{j=-J}^{-1}\sum_{k} C_j(k)\,\psi_{j,k}(t) = \sum_{j=-J}^{-1} r_j(t),$$

where ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k) with j, k ∈ Z, and ψ(t) is the mother wavelet. j = −1, −2, …, −J indexes the resolution levels, and the maximum number of levels is J = log_2 N if the decomposition is performed over all resolution levels. k is the time index and r_j(t) is the residual at scale j.

In radar signal processing, the signal is divided into non-overlapping temporal windows of length L, which is chosen empirically to be L = 128. Then, appropriate signal values are assigned to the central point of the time window for each interval i = 1, 2, …, N_T, with N_T = N/L. Next, by considering the mean wavelet energy instead of the total wavelet energy, the mean energy at each resolution level j for the time window i is obtained from the wavelet coefficients:

$$E_j^{(i)} = \frac{1}{N_j}\sum_{k=(i-1)L+1}^{iL} C_j^{2}(k), \tag{21}$$

where N_j is the number of wavelet coefficients at resolution j involved in the time window i. Then, the total energy of the wavelet coefficients at interval i is expressed as:

$$E_{total}^{(i)} = \sum_{j<0} E_j^{(i)}.$$

Next, the relative wavelet energy, which represents the probability distribution of the energy across scales, is obtained by:

$$p_j^{(i)} = \frac{E_j^{(i)}}{E_{total}^{(i)}}.$$

Finally, with the wavelet entropy of window i written as WE^{(i)} = −Σ_{j<0} p_j^{(i)} ln p_j^{(i)}, the average wavelet entropy of the whole time period is given by:

$$MWE = \frac{1}{N_T}\sum_{i=1}^{N_T} WE^{(i)}.$$

• Standard deviation of wavelet entropy of target signal (StdWE)

StdWE quantitatively describes the degree of fluctuation of the wavelet entropy. Generally, the StdWE of human targets is much smaller than that of dog targets, since human target signals are more regular [12,13].

• Mean of MWE in the OW window (MMWEOW)

In the calculation of MMWEOW, an OW with the target point signal exactly in the middle is also needed, and its width is chosen to be ow_MMWE = 20. MMWEOW is the mean value of the MWE of each point signal within ow_MMWE points closer to and farther from the target position P, i.e., the mean value of MWE in the OW.

• Ratio of wavelet entropy (RWE)

RWE is the ratio of the mean value of MWE inside the OW to that outside the OW.
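The wavelet entropy computation described in this section can be sketched with a plain Haar decomposition. The Haar wavelet is an assumption made only to keep the example self-contained, since the text does not name the mother wavelet; the natural logarithm and the function names are likewise illustrative. Each window of length L = 128 is decomposed over all levels, the mean energy per level is turned into a relative energy, and the Shannon entropy of that distribution gives the window's WE.

```python
from math import log, sqrt
from statistics import mean, pstdev

def haar_details(window):
    """Detail coefficients of a full Haar decomposition, finest level first."""
    approx = list(window)               # length must be a power of two
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx) - 1, 2)
        details.append([(approx[k] - approx[k + 1]) / sqrt(2) for k in pairs])
        approx = [(approx[k] + approx[k + 1]) / sqrt(2) for k in pairs]
    return details

def wavelet_entropy(window):
    """Shannon entropy of the relative mean wavelet energy across levels."""
    energies = [sum(c * c for c in d) / len(d) for d in haar_details(window)]
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * log(p) for p in probs)

def mwe_and_stdwe(signal, length=128):
    """Mean and standard deviation of WE over non-overlapping windows."""
    values = [wavelet_entropy(signal[i:i + length])
              for i in range(0, len(signal) - length + 1, length)]
    return mean(values), pstdev(values)

regular = [1.0, -1.0] * 128             # energy confined to a single level
mixed = [3.0, 1.0, -1.0, -3.0] * 64     # energy spread over two levels
print(mwe_and_stdwe(regular))
print(mwe_and_stdwe(mixed))
```

The signal whose energy sits in a single resolution level has zero entropy, while the signal with energy in two levels has a positive entropy, consistent with the statement that more signal components give a larger WE.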

Frequency-Corresponding Features
In this section, based on the accumulated research of our group, it was found that the Hilbert marginal spectrum of human targets is distributed differently from that of dog targets. The curve of the Hilbert marginal spectrum of human targets is somewhat steep, and the area under the curve is narrow and high. In contrast, the curve of dog targets is somewhat gentle, and the area under the curve is wide and low. Therefore, the frequency values corresponding to 1/4 and 3/4 of the total frequency band area under the Hilbert marginal spectrum curve, and the width between the two, are extracted.

• f_1/4, f_3/4, and width between f_1/4 and f_3/4 (WOHMS)

In this step, EEMD is performed first to process the target signal, as Equations (8)-(13) illustrate. Then, the Hilbert transform is applied to each IMF component of the reconstructed signal s_re, giving ÎMF_i(t). Next, the analytical signals are constructed from each IMF component and its Hilbert transform. The amplitude of the analytical signals is α_i(t) = sqrt(IMF_i(t)² + ÎMF_i(t)²), and the instantaneous phase is the argument of the analytical signal. Then, the Hilbert marginal spectrum is obtained, where T is the total signal duration and V_useful is the total number of IMF components of the reconstructed signal. ω is the instantaneous frequency, which is defined from the time derivative of the instantaneous phase [24]. The Hilbert marginal spectra of human targets and dog targets are compared in Figure 7.

f_1/4 and f_3/4 are the frequency values of the points corresponding to 1/4 and 3/4 of the total frequency band area, respectively, in Figure 7. Accordingly, WOHMS is given by:

$$WOHMS = f_{3/4} - f_{1/4}.$$

• Respiratory frequency (RF)

After performing the Fourier transform of the target signal, the RF of human targets and dog targets can be obtained.
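Given a marginal spectrum sampled on a frequency grid, the two quartile frequencies and the width between them can be found from the cumulative area. The sketch below assumes a uniform frequency grid, so that the area is proportional to the running sum of the spectrum values, and it takes the first grid point at which the cumulative area reaches each fraction; `quartile_frequencies` is an illustrative name, and the computation of the Hilbert marginal spectrum itself is not shown.

```python
def quartile_frequencies(freqs, spectrum):
    """Return (f_1/4, f_3/4, width) for a spectrum on a uniform frequency grid."""
    total = sum(spectrum)
    cumulative = 0.0
    f_quarter = f_three_quarter = None
    for f, s in zip(freqs, spectrum):
        cumulative += s
        if f_quarter is None and cumulative >= 0.25 * total:
            f_quarter = f
        if f_three_quarter is None and cumulative >= 0.75 * total:
            f_three_quarter = f
    return f_quarter, f_three_quarter, f_three_quarter - f_quarter

freqs = [k / 10 for k in range(1, 11)]                 # 0.1 ... 1.0 Hz
flat = [1.0] * 10                                      # wide, low spectrum
peaked = [0.0, 0.0, 1.0, 8.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # narrow, high spectrum
print(quartile_frequencies(freqs, flat))
print(quartile_frequencies(freqs, peaked))
```

The narrow, high spectrum produces a much smaller width than the wide, low one, which is the contrast between human and dog targets that WOHMS is intended to capture.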

SVM-Based Classification Procedure and Feature Selection Strategy
A support vector machine (SVM) is a type of generalized linear classifier which classifies data by supervised learning. Its decision boundary is the maximum margin hyperplane for learning samples. SVM classifier is chosen here because it has good performance for small samples when dividing data into two categories, which fits well with our purpose of distinguishing human beings from dogs.
Although the 12 features are calculated for each sample, they may not contribute equally to distinguishing human targets from dog targets. Overfitting and poor generalization may occur when the classifier uses too many features that have little distinguishing power or are highly correlated with each other. To find a feature subset with optimal distinguishing power, the recursive feature elimination method based on SVM (SVM-RFE) is applied to the overall feature set. In the selection process of SVM-RFE, all features are sorted by backward elimination: in each iteration, the feature that contributes least to the distinction, i.e., the one with the lowest weight, is removed [25,26]. The weights are calculated by the widely used LIBSVM package [27,28]. The higher the weight, the stronger the distinguishing capability of the feature. After sorting, different feature subsets are obtained by choosing the Top-f (1 ≤ f ≤ F) features from the ranked features, where F is the total number of features (F = 12). Accordingly, there are F classifiers. Next, the classifiers built on the different feature subsets are evaluated by calculating the receiver operating characteristic (ROC) curve and the area under the curve (AUC). Finally, to validate the stability and generalization of the optimal classifier, 100 rounds of ten-fold cross-validation are performed to calculate the average parameters of the classifier. The classification and selection procedure is as follows.

1. Sort the overall feature set:

    while (the number of features is not zero) {
        calculate the weight of each feature using the LIBSVM package;
        remove the feature with the lowest weight;
    }

2. Model classifiers with the Top-f (1 ≤ f ≤ F) features from the sorted sequence.

3. Choose the optimal feature subset with the highest AUC among the classifiers in step 2.

4. Calculate the key parameters of the classifier in step 3 after 100 rounds of ten-fold cross-validation.
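The backward-elimination loop of step 1 can be sketched with a pluggable weight function. In the procedure above the weights come from an SVM trained with the LIBSVM package; to keep the example self-contained, the sketch substitutes a much simpler stand-in (the absolute difference of the class means per feature), which is not an SVM weight and is used here only to exercise the ranking loop. All names are illustrative.

```python
def rfe_rank(samples, labels, weight_fn):
    """Rank feature indices by backward elimination, most important first."""
    remaining = list(range(len(samples[0])))
    eliminated = []
    while remaining:
        subset = [[row[k] for k in remaining] for row in samples]
        weights = weight_fn(subset, labels)       # one weight per remaining feature
        worst = min(range(len(remaining)), key=lambda i: weights[i])
        eliminated.append(remaining.pop(worst))   # drop the lowest-weight feature
    return eliminated[::-1]

def mean_gap_weights(samples, labels):
    """Stand-in for SVM weights: |mean(+1 class) - mean(-1 class)| per feature."""
    pos = [row for row, lab in zip(samples, labels) if lab == 1]
    neg = [row for row, lab in zip(samples, labels) if lab == -1]
    n_features = len(samples[0])
    return [
        abs(sum(r[k] for r in pos) / len(pos) - sum(r[k] for r in neg) / len(neg))
        for k in range(n_features)
    ]

samples = [[5.0, 1.0, 0.0], [5.0, 1.2, 0.0], [0.0, 0.9, 0.0], [0.0, 0.7, 0.0]]
labels = [1, 1, -1, -1]
ranking = rfe_rank(samples, labels, mean_gap_weights)
print(ranking)
```

The Top-f subsets of step 2 are then the first f entries of the returned ranking, for f = 1, …, F.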
In ten-fold cross-validation, the sample data are first divided randomly into ten subsets of equal size. Then, in each round, nine of them are used as training samples and the remaining subset is used as test data, until every subset has been used as test data once.
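A random ten-fold split of the 130 samples can be sketched as follows. The fixed seed and the generator interface are illustrative choices; the text does not say whether the split is stratified by class, and this sketch does not stratify.

```python
import random

def ten_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random division of the samples
    folds = [indices[i::k] for i in range(k)]     # k subsets of (near-)equal size
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

splits = list(ten_fold_indices(130))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

With 130 samples each test subset contains 13 samples and each training set 117, and every sample is used as test data exactly once. Repeating the split with 100 different seeds corresponds to the 100 rounds of cross-validation.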
The ordinate of the ROC curve is the true positive rate, which is defined as "Sensitivity", and the abscissa is the false positive rate, which is defined as "1 − Specificity". The calculations of Sensitivity and Specificity are given in Equations (33) and (34), respectively. One advantage of the ROC curve is that its shape remains essentially unchanged when the distribution of positive and negative samples changes. Therefore, the interference caused by different test sets is reduced and the evaluation of the model's performance is more objective. The key parameter extracted from the ROC curve is the AUC value; a larger value denotes a classifier with better performance.
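The AUC can be computed without drawing the ROC curve by using its rank interpretation: it equals the fraction of (positive, negative) sample pairs in which the positive sample receives the higher classifier score, with ties counted as one half. The sketch below assumes decision scores in which larger values indicate the "+1" (human) class; the function name is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve from decision scores and +1 / -1 labels."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == -1]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, -1, -1]))   # perfectly separated scores
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, -1, -1]))   # one misordered pair
```

Because the statistic depends only on the ordering of scores within positive-negative pairs, it does not change when the ratio of positive to negative samples changes, which matches the property of the ROC curve noted above.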
It is specified that in the calculation of the AUC of each classifier, the Grid-Search method in the LIBSVM package is applied to determine the optimal parameter combinations of (c, g) which are closely related to the distinguishing performance of the classifier. The c is the penalty coefficient corresponding to the generalization ability of the model. g, one of the parameters of the kernel function in the classifier, is related to the number of support vectors and further affects the speed of model training and prediction. Finally, an optimal feature subset for distinguishing task is selected by choosing the subsets with the highest AUC values. The overall steps in the selection of the optimal classifier and the detailed processes of the SVM-RFE are illustrated in Figures 8 and 9, respectively. LIBSVM package is applied to determine the optimal parameter combinations of (c, g) which are closely related to the distinguishing performance of the classifier. The c is the penalty coefficient corresponding to the generalization ability of the model. g, one of the parameters of the kernel function in the classifier, is related to the number of support vectors and further affects the speed of model training and prediction. Finally, an optimal feature subset for distinguishing task is selected by choosing the Figure 8. Overall steps of the selection of the optimal classifier. In "Step 1: Radar data acquisition", the raw echo data are collected. In "Step 2: Feature extraction", twelve feature species belonging to four categories are extracted. In "Step 3: Optimal classifier selection by SVM-RFE", an optimal classifier with the highest AUC value is selected. Figure 9. Detailed processes of the SVM-RFE. In "Step 1: Feature sorting", the color of a feature represents its contribution in the distinguishing task. A warmer color denotes more contributions. 
Abbreviations: AUC, area under the curve; In "Step 2: AUC values calculation", the optimal parameter combinations of (c, g) are determined and AUC values with Top-f (1 f F) features from the sorted sequence are calculated. Finally, an optimal classifier with optimal Top-f features will be obtained.

function in the classifier, is related to the number of support vectors and further affects the speed of model training and prediction. Finally, an optimal feature subset for distinguishing task is selected by choosing the subsets with the highest AUC values. The overall steps in the selection of the optimal classifier and the detailed processes of the SVM-RFE are illustrated in Figures 8 and 9, respectively.

Figure 8. Overall steps of the selection of the optimal classifier. In "Step 1: Radar data acquisition", the raw echo data are collected. In "Step 2: Feature extraction", twelve feature species belonging to four categories are extracted. In "Step 3: Optimal classifier selection by SVM-RFE", an optimal classifier with the highest AUC value is selected.

Figure 9. Detailed processes of the SVM-RFE. In "Step 1: Feature sorting", the color of a feature represents its contribution in the distinguishing task. A warmer color denotes more contributions. In "Step 2: AUC values calculation", the optimal parameter combinations of (c, g) are determined and AUC values with Top-f (1 ≤ f ≤ F) features from the sorted sequence are calculated. Finally, an optimal classifier with optimal Top-f features will be obtained. Abbreviations: AUC, area under the curve.
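The feature-sorting step of SVM-RFE can be sketched in a few lines. The sketch below is illustrative only and rests on assumptions that are not stated in the text: it ranks features with a bias-free linear SVM trained by Pegasos-style subgradient descent and eliminates, in each round, the feature with the smallest squared weight (a common criterion in the SVM-RFE literature), whereas the classifier in this study is tuned over (c, g). The function names, hyperparameters, and toy data are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Train a bias-free linear SVM with Pegasos-style subgradient steps.

    X is a list of feature rows, y a list of labels in {+1, -1}.
    Returns the weight vector w.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink w (regularization), then correct if the margin is violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe_ranking(X, y):
    """Return feature indices sorted from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Drop the feature with the smallest squared weight.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


def make_toy_data(n=130, n_features=5, seed=1):
    """Hypothetical data: only feature 0 carries class information."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [3.0 * label + rng.gauss(0.0, 0.5)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_features - 1)]
        X.append(row)
        y.append(label)
    return X, y


X_toy, y_toy = make_toy_data()
ranking = svm_rfe_ranking(X_toy, y_toy)
print(ranking)  # feature 0 is expected to be ranked first
```

In "Step 2", the Top-f prefixes of such a ranking would each be evaluated by cross-validated AUC, and the prefix with the highest AUC would be kept.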
In ten-fold cross-validation, the sample data are divided into ten subsets, and the training-test procedure is run ten times so that each subset is used as test data exactly once. The results of the ten runs are accumulated, and the performance of the classifiers is evaluated using the AUC value, Sensitivity, Specificity, and Accuracy (ACC). The human targets are labeled "+1" as the positive samples, and the dog targets are labeled "−1" as the negative samples. Sensitivity is the accuracy of judging actual human targets as human targets. Specificity is the accuracy of judging actual dog targets as dog targets. ACC is the overall accuracy of judging the targets correctly, whether human or dog targets. AUC, the area under the ROC curve, is a parameter for evaluating the overall performance of the classifier. Sensitivity, Specificity, and Accuracy are calculated as:

$$\text{Sensitivity} = \frac{\text{number of ``}+1\text{'' in both the predictive and test labels}}{\text{total number of ``}+1\text{'' in the test labels}}$$

$$\text{Specificity} = \frac{\text{number of ``}-1\text{'' in both the predictive and test labels}}{\text{total number of ``}-1\text{'' in the test labels}}$$

$$\text{Accuracy} = \frac{\text{number of predictive labels that match the test labels}}{\text{total number of test labels}}$$

The test labels are the labels ("+1" or "−1") of the target samples when the samples are treated as test data. The predictive labels are the corresponding classification results produced by the optimal classifier for the test data.
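As a minimal illustration of this evaluation, the following sketch deals sample indices into ten folds and computes Sensitivity, Specificity, ACC, and AUC from accumulated test labels, predictive labels, and decision scores. The function names and toy numbers are hypothetical, and the pairwise AUC formula (equivalent to the Mann-Whitney U statistic) is one standard way to obtain the area under the ROC curve, not necessarily the routine used in this study.

```python
import random


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def confusion_metrics(test_labels, predictive_labels):
    """Sensitivity, Specificity, and Accuracy for labels in {+1, -1}."""
    pairs = list(zip(test_labels, predictive_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    n_pos = sum(1 for t in test_labels if t == 1)
    n_neg = sum(1 for t in test_labels if t == -1)
    return tp / n_pos, tn / n_neg, (tp + tn) / len(pairs)


def auc_from_scores(test_labels, scores):
    """AUC as the probability that a "+1" sample outscores a "-1" sample."""
    pos = [s for t, s in zip(test_labels, scores) if t == 1]
    neg = [s for t, s in zip(test_labels, scores) if t == -1]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Ten disjoint test sets covering the 130 human and dog samples.
folds = ten_fold_indices(130)

# Hypothetical accumulated results for six test samples.
test_labels = [1, 1, 1, -1, -1, -1]
scores = [0.9, 0.4, -0.2, 0.1, -0.5, -0.8]
predictive_labels = [1 if s > 0 else -1 for s in scores]

sensitivity, specificity, accuracy = confusion_metrics(test_labels, predictive_labels)
auc = auc_from_scores(test_labels, scores)
```

In the toy example, two of the three "+1" samples and two of the three "−1" samples are classified correctly, so all three rates equal 2/3, while eight of the nine positive-negative score pairs are ordered correctly, giving an AUC of 8/9.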
In addition, the method is also tested on two further tasks: distinguishing human targets from no targets, and dog targets from no targets. If both classifications achieve high accuracy, this suggests that any two of the three classes (human, dog, and no target) can be effectively classified using the method. It also provides further evidence that the vital information of the human or dog is effectively captured. Fifty sets of environmental interference signals without any targets are collected and classified against the signals of human targets and dog targets, respectively. The classification method used here is SVM-RFE, identical to that used in the classification between human and dog targets. No-target samples are labeled "−1" in both classifications. The key result parameters of the optimal classifiers with optimal subsets are described in Section 5.4.

Experimental Setup and Data Acquisition
A total of eight healthy human targets aged between 23 and 42 and five adult beagle dog targets aged approximately 1 year were involved. All dogs were provided by the Experimental Animal Center of Fourth Military Medical University. A 28-cm-thick brick wall was present between the targets and the antennas. The photographs and geometries of the experiments are shown in Figure 10. Each target was detected ten times at about 2.5 m away from the wall. Because the time interval between acquisitions, the distance from the wall, the posture facing the radar, and the environmental interference differed each time, the raw radar signals can be treated as different samples, given that the number of acquisitions per target is small (10 per target).

Each dog target lay prone quietly on the experimental table, whose middle line was 2.5 m away from the wall. Because the table was 0.8 m wide, the dog targets could choose a comfortable posture on the table and were measured once they had remained still for a long time. Thus, the dogs could lie facing different directions with random postures in each sample collection. Similarly, the human targets stood facing different directions during signal collection. Moreover, for each individual human target, the interval between two acquisitions ranged from 1 minute to 20 minutes, and for each individual dog target it ranged from 1 minute to 12 hours. Therefore, the states of the targets and the environmental interference were not identical each time, and the signal features may differ considerably between acquisitions, even for the same target. The detailed information for the experimental subjects is presented in Table 2.

Table 3 summarizes the feature information to show the extracted features more clearly. The full feature set includes 12 feature species in four categories, and the detailed computation and acquisition methods are described in Section 3.
Note: "OW_w" represents the width of the OW.

Classification Between Human Targets and Dog Targets
The results of the feature sorting and the optimal feature subset are illustrated in Table 4. The order of the features in the list of sorting results represents their distinguishing performance. The smaller the sequence number, the higher the feature weights. Among the features, the CRCCMV has the highest weight and contributes the most to the distinguishing task.

The classification performance using different subsets of ranked features is illustrated in Figure 11. The optimal feature subset consists of the Top-11 ranked features. The feature that contributes the least is the respiratory frequency (RF). This was expected, since the RFs of human and dog targets are about the same, approximately 0.2-0.4 Hz, so the difference in the RF feature between human and dog targets is the smallest. The feature that contributes the most is CRCCMV. As shown in Figure 6, the CCMV curve of human targets is much smoother than that of dog targets, which makes the difference in CRCCMV between human targets and dog targets much larger. The key parameters of the classifier with the optimal feature subset after 100 rounds of ten-fold cross-validation are presented in Table 5.

Table 5. Key parameters of the classifier with the optimal feature subset after 100 rounds of ten-fold cross-validation.

Analysis of Contribution of Different Categories
To compare the ability of the different feature categories to distinguish human targets from dog targets, the features are divided into five groups, and the performance of each group is evaluated using the ROC curve and the key parameters of the classifier: (1) correlation coefficient-corresponding features; (2) wavelet entropy-corresponding features; (3) energy-corresponding features; (4) frequency-corresponding features; and (5) the optimal feature subset. The comparison results are shown in Figure 12 and Table 6. Judging from the ROC curves, the correlation coefficient-corresponding features and the wavelet entropy-corresponding features perform similarly when each is used alone, and both are clearly superior to the energy-corresponding and frequency-corresponding features used alone. The optimal feature subset has the best performance on every parameter in Table 6. In terms of identifying human targets, the correlation coefficient-corresponding features used independently perform as well as the optimal feature subset, but their ability to identify dog targets is much poorer by comparison.

Classification Between No Target Signals and Target Signals
The photograph and geometry of the no-target signal acquisition are shown in Figure 13. The setup is the same as that used for the dog targets, except that no target is present.

The 50 collected no-target samples were classified against the human target samples and the dog target samples, respectively. The key parameters of the corresponding optimal classifiers after SVM-RFE and 100 rounds of ten-fold cross-validation are listed in Table 7.

Table 7. Key parameters of the corresponding optimal classifiers of no target classification with human targets and dog targets, respectively.

Classification                              Sensitivity   Specificity   Accuracy   AUC
No target with human target classification  1.0000        0.9890        0.9958     1.0000
No target with dog target classification    0.9530        0.9936        0.9733     0.9976

The AUC of classifier between no targets and human targets can reach 1.0. Considered in conjunction with the classification between human and dog targets, the AUC value substantiates that the collected signals of human targets are veritable and useful, i.e., the 80 samples of humans truly contain the information of human targets. Similarly, the best AUC value between no targets and dog targets is 0.9976, which implies that the dog signals are also true target signals. Meanwhile, the results in Table 7 verify the validity of the method for the classification between target situations and no target situations.
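As a quick consistency check on Table 7: because Accuracy counts correct decisions over both classes, it equals the sample-weighted mean of Sensitivity and Specificity. The sketch below assumes the sample counts given in the paper (80 human, 50 dog, and 50 no-target samples) and that the target class is the one labeled "+1"; the helper name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy implied by the per-class rates and the class sizes."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


# Sensitivity and Specificity taken from Table 7.
acc_human = accuracy_from_rates(1.0000, 0.9890, n_pos=80, n_neg=50)
acc_dog = accuracy_from_rates(0.9530, 0.9936, n_pos=50, n_neg=50)

print(round(acc_human, 4), round(acc_dog, 4))  # 0.9958 0.9733
```

Both values agree with the Accuracy column of Table 7, so the reported parameters are mutually consistent under these assumptions.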

Discussion
The proposed method performed very well in distinguishing stationary human and dog targets under through-wall condition. Twelve features belonging to four categories were combined based on the SVM-RFE method, and the distinguishing accuracy reached 0.9924. The method is significant for practical post-disaster rescue situations. In particular, it could help optimize the distribution of rescue resources and increase rescuers' confidence in post-earthquake searches and mining accident rescues, where the trapped subjects are buried under obstacles and unable to move. Furthermore, to the best of our knowledge, almost no other research groups have reported on the classification of stationary humans and animals, since existing methods concentrate mostly on distinguishing moving targets. However, a few issues remain to be considered for practical application. Besides dogs, other common family pets, such as cats and rabbits, that are likely to cause false alarms in post-disaster rescue applications could be included in further research.

Conclusions
An accurate algorithm to distinguish stationary humans from dogs under through-wall conditions is proposed, based on a 500 MHz center-frequency UWB radar. The algorithm combines 11 feature species from four categories to form an optimal feature subset based on the SVM-RFE method. The classifier using the optimal feature subset was found to have excellent performance in the distinguishing task. The average AUC value and the ACC value are 0.9993 and 0.9924, respectively, after 100 rounds of ten-fold cross-validation, which confirms that the algorithm is efficient and suitable for the classification of stationary human and dog targets under through-wall conditions. In addition, the correlation coefficient-corresponding features contribute the most compared to the other three groups, which are the wavelet entropy-corresponding features, energy-corresponding features, and frequency-corresponding features. To be more rigorous, the classification between no-target situations and target situations was also performed. The results confirm that the collected human and dog target signals truly contain the information of the respective targets. The AUC values also verify that the method proposed in this paper is valid for classification between no-target situations and target situations. We envision that this algorithm can be applied to various practical situations such as earthquake and hostage rescue missions and intelligent homes.