# Highly-Optimized Radar-Based Gesture Recognition System with Depthwise Expansion Module


## Abstract


## 1. Introduction

**Depthwise Expansion Module**. The proposed solution uses depthwise convolutions followed by a standard 2D convolution (CNN2D) that performs feature embedding. The depth of the topology is regulated by the parameter $\alpha$, where $\alpha \in \{0.25, 0.50, 0.75, 1.00\}$. The system classifies FMCW radar signals representing eight gestures. The main objective is to achieve higher recognition accuracy than state-of-the-art radar frameworks while simultaneously reducing the number of parameters, the model size, and the inference time. Compared with the original MobileNetV1 implementation, the main modifications are the replacement of convolutional layers by a linear increase in the number of feature maps through the **Depthwise Expansion Module** and the use of fully connected layers in place of the global average pooling layer. Moreover, we adapted the size of the input tensor to our data and obtained higher recognition accuracy than the state-of-the-art frameworks. In the proposed framework, the FMCW radar signal is transformed into a compressed representation to avoid inefficient neural network operators. The gesture vocabulary comprises eight gestures. The data collection setup consists of a Raspberry Pi 4, a tripod, and an Infineon BGT60TR13C radar sensor, while the inference setup consists of a Coral Edge TPU, a tripod, and an Infineon BGT60TR13C radar board. The acquired samples of each gesture were preprocessed and transformed into a 3D tensor comprising range-time, velocity-time, and azimuth-time maps. After data preprocessing, the model was trained, optimized, and deployed on the Coral Edge TPU board.
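To illustrate why depthwise separable convolutions and the width multiplier $\alpha$ shrink the model, the parameter counts can be compared directly. The sketch below is illustrative only; the layer sizes are hypothetical and not the paper's actual configuration:

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise
    convolution that mixes channels (the factorization used by MobileNetV1)."""
    return k * k * c_in + c_in * c_out

def scaled_channels(c, alpha):
    """Width multiplier alpha thins every layer, as in MobileNetV1."""
    return max(1, int(c * alpha))

# Hypothetical layer: 3x3 kernel, 32 -> 64 feature maps
standard = conv_params(3, 32, 64)                   # 18,432 weights
separable = depthwise_separable_params(3, 32, 64)   # 288 + 2048 = 2336 weights

# With alpha = 0.25 the same layer shrinks further
thin = depthwise_separable_params(
    3, scaled_channels(32, 0.25), scaled_channels(64, 0.25))
```

Even in this toy setting, the separable factorization needs roughly 8x fewer weights than the standard convolution, which is the effect the proposed module exploits.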

- We present a novel building block, the Depthwise Expansion Module. To the best of our knowledge, this type of building block has never been proposed in the field of radar-based gesture recognition.
- We deploy and test our algorithm on the Edge TPU, proposing an 8-bit implementation. As far as we are aware, this is the first radar-based gesture recognition classifier optimized for and deployed on the Coral Edge TPU board.
- We propose a signal processing pipeline that produces a compressed representation of the FMCW radar signal.
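The compressed representation in the last contribution rests on standard FMCW processing: a fast-time FFT per chirp yields range bins, and a slow-time FFT across chirps yields Doppler (velocity) bins. A minimal NumPy sketch, with frame dimensions chosen purely for illustration:

```python
import numpy as np

def range_doppler_map(frame):
    """frame: (n_chirps, n_samples) raw beat signal from one RX channel.
    Fast-time FFT -> range bins; slow-time FFT -> Doppler bins."""
    window = np.hanning(frame.shape[1])          # taper to reduce sidelobes
    rng = np.fft.fft(frame * window, axis=1)
    rng = rng[:, : frame.shape[1] // 2]          # keep positive range bins
    dop = np.fft.fftshift(np.fft.fft(rng, axis=0), axes=0)
    return np.abs(dop)

# Hypothetical frame: 64 chirps of 128 ADC samples each
frame = np.random.default_rng(0).standard_normal((64, 128))
rdm = range_doppler_map(frame)                   # shape (64, 64)
```

Stacking the per-frame maps over time gives the range-time and velocity-time maps that, together with the azimuth-time map, form the 3D input tensor described above.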

## 2. Related Works

## 3. System Description and Implementation

#### 3.1. The General Overview of the Proposed Framework

#### 3.2. Radar

#### 3.3. Radar Signal Model

#### 3.4. Radar Signal Processing

#### 3.4.1. Range Doppler Image Generation

#### 3.4.2. Angle Estimation

#### 3.4.3. Dataset Generation

#### 3.5. Gesture Vocabulary

- **(a) down -> up** (swiping the hand from bottom to top),
- **(b) up -> down** (swiping the hand from top to bottom),
- **(c) left -> right** (swiping the hand from left to right),
- **(d) rubbing** (rubbing with the fingers),
- **(e) right -> left** (swiping the hand from right to left),
- **(f) diagonal southeast -> northwest** (swiping the hand from the bottom left corner to the top right corner),
- **(g) diagonal southwest -> northeast** (swiping the hand from the bottom right corner to the top left corner), and
- **(h) clapping** (clapping the hands).

Figure 7 presents the t-SNE representation of the collected data: subfigures (a)–(d) show the t-SNE representation of the combined data, the range-time maps, the velocity-time maps, and the azimuth-time maps, respectively. It can be clearly seen that concatenating the collected data, i.e., composing the range-time, velocity-time, and azimuth-time maps, yields the best separation of clusters; the remaining representations separate the data noticeably less well.
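The kind of t-SNE projection shown in Figure 7 can be reproduced by embedding the concatenated maps with scikit-learn. The sketch below uses synthetic stand-in data; the sample count and map sizes are assumptions, not the paper's actual dataset dimensions:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_samples = 48                                       # hypothetical: 6 samples x 8 gestures
range_t = rng.standard_normal((n_samples, 32 * 32))  # flattened range-time maps
velocity_t = rng.standard_normal((n_samples, 32 * 32))
azimuth_t = rng.standard_normal((n_samples, 32 * 32))

# Concatenate the three map types per sample before embedding,
# mirroring the "combined data" panel of Figure 7a.
combined = np.concatenate([range_t, velocity_t, azimuth_t], axis=1)
embedding = TSNE(n_components=2, perplexity=10,
                 init="random", random_state=0).fit_transform(combined)
# embedding is an (n_samples, 2) array, one 2D point per gesture sample
```

Coloring the resulting 2D points by gesture label then makes the cluster separation visible, as in the figure.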

#### 3.6. Experimental Setup

## 4. Deep Learning Classifier

#### 4.1. CNN Architecture

- Input Layer: represents the input data in the form of a 3D tensor.
- Convolutional Layer: extracts features by convolving the input data with 2D filter kernels. The kernels are slid across the input data, generating the layer's output feature maps. The principle of operation of the convolutional layer is depicted in Figure 14.
- Batch Normalization Layer: used after a convolution to speed up the training process.
- Activation Function: e.g., ReLU, LeakyReLU, ReLU6, SiLU, SELU, or GELU; introduces nonlinearity so that the network can learn more sophisticated data patterns.
- MaxPooling2D: reduces dimensionality and retains the most relevant features.
- Regularization Layers: e.g., Dropout, AlphaDropout, and GaussianDropout; employed to make the classifier robust to noise.
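The convolution, activation, and pooling steps listed above can be sketched as a plain NumPy forward pass. This is a didactic single-channel toy, not the actual classifier; the input values and the averaging kernel are chosen only so the arithmetic is easy to follow:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2D cross-correlation of a single-channel map x with kernel k."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    """Elementwise rectified linear activation."""
    return np.maximum(x, 0.0)

def maxpool2d(x, s=2):
    """Non-overlapping s x s max pooling."""
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h * s, :w * s].reshape(h, s, w, s).max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 input map
k = np.ones((3, 3)) / 9.0                      # 3x3 averaging kernel
y = maxpool2d(relu(conv2d(x, k)))              # (4, 4) feature map -> (2, 2)
# y == [[14, 16], [26, 28]]
```

Each stage shrinks the spatial dimensions exactly as in the real network: the valid convolution trims the border, and the pooling halves each axis while keeping the strongest response per window.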

#### 4.2. Radar Edge Network

#### 4.2.1. Depthwise Separable Convolutions

#### 4.2.2. Depthwise Expansion Module

#### 4.2.3. Proposed Classifier

#### 4.3. Edge TPU Deployment

## 5. Performance Evaluation

#### 5.1. Classification Accuracy

#### 5.2. Comparison with Existing Techniques

## 6. Conclusions

## Author Contributions

## Funding

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| HCI | Human–Computer Interaction |
| FMCW | Frequency-Modulated Continuous Wave |
| RGB | Red Green Blue |
| ToF | Time of Flight |
| 3DCNN | 3D Convolutional Neural Network |
| LSTM | Long Short-Term Memory |
| RNN | Recurrent Neural Network |
| IoT | Internet of Things |
| LRACNN | Long Recurrent All Convolutional Neural Network |
| NCS 2 | Neural Compute Stick 2 |
| RDM | Range-Doppler Map |
| k-NN | k-Nearest Neighbour |
| RFC | Random Forest Classifier |
| CWT | Continuous Wavelet Transform |
| R3DCNN | Recurrent 3D Convolutional Neural Network |
| I3D | Inflated 3D ConvNets |
| 2D-FFT | 2-Dimensional Fast Fourier Transform |
| MUSIC | Multiple Signal Classification |
| FDTW | Fusion Dynamic Time Warping |
| LDA | Linear Discriminant Analysis |
| QDA | Quadratic Discriminant Analysis |
| SVM | Support Vector Machine |
| ADC | Analog-to-Digital Converter |
| VGA | Voltage Gain Amplifier |
| RDI | Range-Doppler Image |
| FFT | Fast Fourier Transform |
| RAI | Range-Angle Image |
| SVD | Singular Value Decomposition |
| MVDR | Minimum Variance Distortionless Response |
| DOA | Direction of Arrival |

## References

1. Shehab, A.H.; Al-Janabi, S. Edge Computing: Review and Future Directions. REVISTA AUS J. 2019, 368–380.
2. Yasen, M.; Jusoh, S. A systematic review on hand gesture recognition techniques, challenges and applications. PeerJ Comput. Sci. 2019, 5, e218.
3. Park, H.; McKilligan, S. A systematic literature review for human-computer interaction and design thinking process integration. In Proceedings of the International Conference of Design, User Experience, and Usability, Las Vegas, NV, USA, 15–20 July 2018; Springer: Cham, Switzerland, 2018; pp. 725–740.
4. Mirsu, R.; Simion, G.; Caleanu, C.D.; Pop-Calimanu, I.M. A PointNet-based solution for 3D hand gesture recognition. Sensors 2020, 20, 3226.
5. Nebiker, S.; Meyer, J.; Blaser, S.; Ammann, M.; Rhyner, S. Outdoor Mobile Mapping and AI-Based 3D Object Detection with Low-Cost RGB-D Cameras: The Use Case of On-Street Parking Statistics. Remote Sens. 2021, 13, 3099.
6. Kumar, P.; Jaiswal, A.; Deepak, B.; Reddy, G.R.M. Hand gesture-based stable PowerPoint presentation using Kinect. In Progress in Intelligent Computing Techniques: Theory, Practice, and Applications; Springer: Singapore, 2018; pp. 81–94.
7. Khari, M.; Garg, A.K.; Crespo, R.G.; Verdú, E. Gesture Recognition of RGB and RGB-D Static Images Using Convolutional Neural Networks. Int. J. Interact. Multim. Artif. Intell. 2019, 5, 22–27.
8. Nguyen, N.-H.; Phan, T.; Lee, G.; Kim, S.; Yang, H. Gesture Recognition Based on 3D Human Pose Estimation and Body Part Segmentation for RGB Data Input. Appl. Sci. 2020, 10, 6188.
9. Hakim, N.L.; Shih, T.K.; Arachchi, S.P.K.; Aditya, W.; Chen, Y.; Lin, C. Dynamic hand gesture recognition using 3DCNN and LSTM with FSM context-aware model. Sensors 2019, 19, 5429.
10. Kumar, P.; Gauba, H.; Roy, P.P.; Dogra, D.P. Coupled HMM-based multi-sensor data fusion for sign language recognition. Pattern Recognit. Lett. 2017, 86, 1–8.
11. Abeßer, J. A review of deep learning based methods for acoustic scene classification. Appl. Sci. 2020, 10, 2020.
12. Alexakis, G.; Panagiotakis, S.; Fragkakis, A.; Markakis, E.; Vassilakis, K. Control of smart home operations using natural language processing, voice recognition and IoT technologies in a multi-tier architecture. Designs 2019, 3, 32.
13. Agathya, M.; Brilliant, S.M.; Akbar, N.R.; Supadmini, S. Review of a framework for audiovisual dialog-based in human computer interaction. In Proceedings of the 2015 IEEE International Conference on Information & Communication Technology and Systems (ICTS), Surabaya, Indonesia, 16 September 2015; pp. 137–140.
14. Palacios, J.M.; Sagüés, C.; Montijano, E.; Llorente, S. Human-computer interaction based on hand gestures using RGB-D sensors. Sensors 2013, 13, 11842–11860.
15. Paravati, G.; Gatteschi, V. Human-computer interaction in smart environments. Sensors 2015, 15, 19487–19494.
16. Singh, S.; Nasoz, F. Facial expression recognition with convolutional neural networks. In Proceedings of the 2020 IEEE 10th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 6–8 January 2020; pp. 0324–0328.
17. Manaris, B. Natural language processing: A human-computer interaction perspective. In Advances in Computers; Elsevier: Amsterdam, The Netherlands, 1998; Volume 47, pp. 1–66.
18. Katona, J. A Review of Human–Computer Interaction and Virtual Reality Research Fields in Cognitive InfoCommunications. Appl. Sci. 2021, 11, 2646.
19. Aditya, K.; Chacko, P.; Kumari, D.; Kumari, D.; Bilgaiyan, S. Recent trends in HCI: A survey on data glove, LEAP motion and Microsoft Kinect. In Proceedings of the 2018 IEEE International Conference on System, Computation, Automation and Networking (ICSCA), Pondicherry, India, 6–7 July 2018; pp. 1–5.
20. Ahmed, S.; Kallu, K.D.; Ahmed, S.; Cho, S.H. Hand gestures recognition using radar sensors for human-computer-interaction: A review. Remote Sens. 2021, 13, 527.
21. Yu, M.; Kim, N.; Jung, Y.; Lee, S. A frame detection method for real-time hand gesture recognition systems using CW-radar. Sensors 2020, 20, 2321.
22. Kabanda, G. Review of Human Computer Interaction and Computer Vision; GRIN Verlag: Munich, Germany, 2019.
23. D'Eusanio, A.; Simoni, A.; Pini, S.; Borghi, G.; Vezzani, R.; Cucchiara, R. A Transformer-Based Network for Dynamic Hand Gesture Recognition. In Proceedings of the IEEE 2020 International Conference on 3D Vision (3DV), Fukuoka, Japan, 25–28 November 2020; pp. 623–632.
24. Molchanov, P.; Yang, X.; Gupta, S.; Kim, K.; Tyree, S.; Kautz, J. Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4207–4215.
25. Carreira, J.; Zisserman, A. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6299–6308.
26. D'Eusanio, A.; Simoni, A.; Pini, S.; Borghi, G.; Vezzani, R.; Cucchiara, R. Multimodal hand gesture classification for the human–car interaction. In Informatics; Multidisciplinary Digital Publishing Institute: Basel, Switzerland, 2020; Volume 7, p. 31.
27. Hazra, S.; Santra, A. Robust gesture recognition using millimetric-wave radar system. IEEE Sens. Lett. 2018, 2, 1–4.
28. Hazra, S.; Santra, A. Short-range radar-based gesture recognition system using 3D CNN with triplet loss. IEEE Access 2019, 7, 125623–125633.
29. Hazra, S.; Santra, A. Radar gesture recognition system in presence of interference using self-attention neural network. In Proceedings of the 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 1409–1414.
30. Santra, A.; Hazra, S. Deep Learning Applications of Short-Range Radars; Artech House: Norwood, MA, USA, 2020.
31. Sun, Y.; Zhang, B.; Luo, M. Survey of Edge Computing Based on a Generalized Framework and Some Recommendation. In Proceedings of the International Conference on Edge Computing, Honolulu, HI, USA, 18–20 September 2020; Springer: Cham, Switzerland, 2020; pp. 111–126.
32. Liu, F.; Tang, G.; Li, Y.; Cai, Z.; Zhang, X.; Zhou, T. A survey on edge computing systems and tools. Proc. IEEE 2019, 107, 1537–1562.
33. Yang, Z.; Zhang, S.; Li, R.; Li, C.; Wang, M.; Wang, D.; Zhang, M. Efficient Resource-Aware Convolutional Neural Architecture Search for Edge Computing with Pareto-Bayesian Optimization. Sensors 2021, 21, 444.
34. Hamdan, S.; Ayyash, M.; Almajali, S. Edge-computing architectures for internet of things applications: A survey. Sensors 2020, 20, 6441.
35. Koubâa, A.; Ammar, A.; Alahdab, M.; Kanhouch, A.; Azar, A.T. DeepBrain: Experimental Evaluation of Cloud-Based Computation Offloading and Edge Computing in the Internet-of-Drones for Deep Learning Applications. Sensors 2020, 20, 5240.
36. McClellan, M.; Cervelló-Pastor, C.; Sallent, S. Deep learning at the mobile edge: Opportunities for 5G networks. Appl. Sci. 2020, 10, 4735.
37. TensorFlow Models on the Edge TPU. Coral. Available online: https://coral.ai/docs/edgetpu/models-intro/#supported-operations (accessed on 18 August 2021).
38. Capra, M.; Maurizio, B.; Marchisio, A.; Shafique, M.; Masera, G.; Martina, M. An updated survey of efficient hardware architectures for accelerating deep convolutional neural networks. Future Internet 2020, 12, 113.
39. Véstias, M.P. A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing. Algorithms 2019, 12, 154.
40. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
41. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
42. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
43. Fan, F.-L.; Xiong, J.; Li, M.; Wang, G. On interpretability of artificial neural networks: A survey. IEEE Trans. Radiat. Plasma Med. Sci. 2021.
44. Shahroudnejad, A. A survey on understanding, visualizations, and explanation of deep neural networks. arXiv 2021, arXiv:2102.01792.
45. Véstias, M.P. Deep learning on edge: Challenges and trends. Smart Syst. Des. Appl. Chall. 2020, 23–42.
46. Deng, S.; Zhao, H.; Fang, W.; Yin, J.; Dustdar, S.; Zomaya, A.Y. Edge intelligence: The confluence of edge computing and artificial intelligence. IEEE Internet Things J. 2020, 7, 7457–7469.
47. Chen, J.; Ran, X. Deep Learning with Edge Computing: A Review. Proc. IEEE 2019, 107, 1655–1674.
48. Wang, X.; Han, Y.; Leung, V.C.M.; Niyato, D.; Yan, X.; Chen, X. Convergence of edge computing and deep learning: A comprehensive survey. IEEE Commun. Surv. Tutor. 2020, 22, 869–904.
49. Sun, S.; Cao, Z.; Zhu, H.; Zhao, J. A survey of optimization methods from a machine learning perspective. IEEE Trans. Cybern. 2019, 50, 3668–3681.
50. Kastratia, M.; Bibaa, M. A State-of-the-Art Survey of Advanced Optimization Methods in Machine Learning. In Proceedings of the 4th International Conference on Recent Trends and Applications in Computer Science and Information Technology (RTA-CSIT), Tirana, Albania, 21–22 May 2021.
51. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
52. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
53. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
54. Tan, M.; Le, Q.V. MixConv: Mixed depthwise convolutional kernels. arXiv 2019, arXiv:1907.09595.
55. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
56. Yu, T.; Zhu, H. Hyper-parameter optimization: A review of algorithms and applications. arXiv 2020, arXiv:2003.05689.
57. Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res. 2019, 20, 1997–2017.
58. Siems, J.N.; Klein, A.; Archambeau, C.; Mahsereci, M. Dynamic Pruning of a Neural Network via Gradient Signal-to-Noise Ratio. In Proceedings of the 8th ICML Workshop on Automated Machine Learning (AutoML), Virtual, 23–24 July 2021.
59. Meng, F.; Cheng, H.; Li, K.; Luo, H.; Guo, X.; Lu, G.; Sun, X. Pruning filter in filter. arXiv 2020, arXiv:2009.14410.
60. Liebenwein, L.; Baykal, C.; Carter, B.; Gifford, D.; Rus, D. Lost in pruning: The effects of pruning neural networks beyond test accuracy. In Proceedings of Machine Learning and Systems 3, 2021. Available online: https://proceedings.mlsys.org/paper/2021 (accessed on 20 October 2021).
61. Nagel, M.; Fournarakis, M.; Amjad, R.A.; Bondarenko, Y.; van Baalen, M.; Blankevoort, T. A White Paper on Neural Network Quantization. arXiv 2021, arXiv:2106.08295.
62. Zhao, R.; Hu, Y.; Dotzel, J.; de Sa, C.; Zhang, Z. Improving neural network quantization without retraining using outlier channel splitting. In Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 9–15 June 2019; pp. 7543–7552.
63. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2704–2713.
64. Neill, J.O. An overview of neural network compression. arXiv 2020, arXiv:2006.03669.
65. Cheng, Y.; Wang, D.; Zhou, P.; Zhang, T. A survey of model compression and acceleration for deep neural networks. arXiv 2017, arXiv:1710.09282.
66. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
67. Weiß, J.; Pérez, R.; Biebl, E. Improved people counting algorithm for indoor environments using 60 GHz FMCW radar. In Proceedings of the 2020 IEEE Radar Conference (RadarConf20), Florence, Italy, 21–25 September 2020; pp. 1–6.
68. Aydogdu, C.Y.; Hazra, S.; Santra, A.; Weigel, R. Multi-modal cross learning for improved people counting using short-range FMCW radar. In Proceedings of the 2020 IEEE International Radar Conference (RADAR), Washington, DC, USA, 27 April–1 May 2020; pp. 250–255.
69. Thi Phuoc Van, N.; Tang, L.; Demir, V.; Hasan, S.F.; Minh, N.D.; Mukhopadhyay, S. Microwave radar sensing systems for search and rescue purposes. Sensors 2019, 19, 2879.
70. Turppa, E.; Kortelainen, J.M.; Antropov, O.; Kiuru, T. Vital sign monitoring using FMCW radar in various sleeping scenarios. Sensors 2020, 20, 6505.
71. Wu, Q.; Zhao, D. Dynamic hand gesture recognition using FMCW radar sensor for driving assistance. In Proceedings of the 2018 IEEE 10th International Conference on Wireless Communications and Signal Processing (WCSP), Hangzhou, China, 18–20 October 2018; pp. 1–6.
72. Son, Y.-S.; Sung, H.; Heo, S.W. Automotive frequency modulated continuous wave radar interference reduction using per-vehicle chirp sequences. Sensors 2018, 18, 2831.
73. Lin, J., Jr.; Li, Y.; Hsu, W.; Lee, T. Design of an FMCW radar baseband signal processing system for automotive application. SpringerPlus 2016, 5, 1–16.
74. Zhang, Z.; Tian, Z.; Zhou, M. Latern: Dynamic continuous hand gesture recognition using FMCW radar sensor. IEEE Sens. J. 2018, 18, 3278–3289.
75. Ahmed, S.; Cho, S.H. Hand gesture recognition using an IR-UWB radar with an inception module-based classifier. Sensors 2020, 20, 564.
76. Molchanov, P.; Gupta, S.; Kim, K.; Pulli, K. Multi-sensor system for driver's hand-gesture recognition. In Proceedings of the 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Ljubljana, Slovenia, 4–8 May 2015; Volume 1, pp. 1–8.
77. Lien, J.; Gillian, N.; Karagozler, M.E.; Amihood, P.; Schwesig, C.; Olson, E.; Raja, H.; Poupyrev, I. Soli: Ubiquitous gesture sensing with millimeter wave radar. ACM Trans. Graph. 2016, 35, 1–19.
78. Chmurski, M.; Zubert, M. Novel Radar-based Gesture Recognition System using Optimized CNN-LSTM Deep Neural Network for Low-power Microcomputer Platform. In Proceedings of the ICAART, Online, 4–6 February 2021; pp. 882–890.
79. Chmurski, M.; Zubert, M.; Bierzynski, K.; Santra, A. Analysis of Edge-Optimized Deep Learning Classifiers for Radar-Based Gesture Recognition. IEEE Access 2021, 9, 74406–74421.
80. Manganaro, F.; Pini, S.; Borghi, G.; Vezzani, R.; Cucchiara, R. Hand gestures for the human-car interaction: The Briareo dataset. In Proceedings of the International Conference on Image Analysis and Processing, Trento, Italy, 9–13 September 2019; Springer: Cham, Switzerland, 2019; pp. 560–571.
81. Liu, L.; Shao, L. Learning discriminative representations from RGB-D video data. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China, 3–9 August 2013.
82. Escalera, S.; Baró, X.; Gonzalez, J.; Bautista, M.A.; Madadi, M.; Reyes, M.; Ponce-López, V.; Escalante, H.J.; Shotton, J.; Guyon, I. ChaLearn looking at people challenge 2014: Dataset and results. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 459–473.
83. Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv 2012, arXiv:1212.0402.
84. Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A large video database for human motion recognition. In Proceedings of the 2011 IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2556–2563.
85. Wang, Y.; Ren, A.; Zhou, M.; Wang, W.; Yang, X. A novel detection and recognition method for continuous hand gesture using FMCW radar. IEEE Access 2020, 8, 167264–167275.
86. Wang, S.; Song, J.; Lien, J.; Poupyrev, I.; Hilliges, O. Interacting with Soli: Exploring fine-grained dynamic gesture recognition in the radio-frequency spectrum. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, Tokyo, Japan, 16–19 October 2016; pp. 851–860.
87. Ritchie, M.; Capraru, R.; Fioranelli, F. Dop-NET: A micro-Doppler radar data challenge. Electron. Lett. 2020, 56, 568–570.
88. Ritchie, M.; Jones, A.M. Micro-Doppler gesture recognition using Doppler, time and range based features. In Proceedings of the 2019 IEEE Radar Conference (RadarConf), Boston, MA, USA, 22–26 April 2019; pp. 1–6.
89. Trotta, S.; Weber, D.; Jungmaier, R.W.; Baheti, A.; Lien, J.; Noppeney, D.; Tabesh, M.; Rumpler, C.; Aichner, M.; Albel, S.; et al. SOLI: A Tiny Device for a New Human Machine Interface. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; Volume 64, pp. 42–44.
90. Chudnikov, V.V.; Shakhtarin, B.I.; Bychkov, A.V.; Kazaryan, S.M. DOA Estimation in Radar Sensors with Colocated Antennas. In Proceedings of the IEEE 2020 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO), Svetlogorsk, Russia, 1–3 July 2020; pp. 1–6.

**Figure 1.** Data collection, preprocessing, training, and evaluation process of the proposed hand gesture recognition framework for FMCW radar.

**Figure 2.** BGT60TR13C radar board [89].

**Figure 3.** BGT60TR13C radar sensor block diagram [89]. The signal sensed by the three receiver channels (RX1, RX2, and RX3) is mixed with the transmitted signal from TX1, processed, and then digitized by the ADC.

**Figure 7.** (**a**) t-SNE representation of all information, including range-time, velocity-time, and azimuth-time maps; it is clearly visible that composing this information allows for the separation of clusters; (**b**) t-SNE representation of range-time maps; (**c**) t-SNE representation of velocity-time maps; and (**d**) t-SNE representation of azimuth-time maps.

**Figure 12.** Data collection setup: (**a**) Raspberry Pi 4; (**b**) 3D-printed case and radar board; and (**c**) tripod with 3D-printed case and radar board.

**Figure 13.** Inference setup: (**a**) Coral Edge TPU; (**b**) 3D-printed case and radar board; and (**c**) tripod with 3D-printed case and radar board.

**Table 1.**Comparative characteristics of accuracies for the non-optimized and the optimized versions.

| Topology | x86 Accuracy [%] | Edge TPU Accuracy [%] |
|---|---|---|
| CNN3D | 99.63 | N/A |
| CNN2D | 86.25 | 85.88 |
| MobileNetV2—1 bottleneck | 98.88 | 98.88 |
| MobileNetV2—2 bottleneck | 99.00 | 98.75 |
| MobileNetV2—3 bottleneck | 97.13 | 97.25 |
| MobileNetV2—4 bottleneck | 98.50 | 98.50 |
| MobileNetV2—5 bottleneck | 97.75 | 97.75 |
| MobileNetV2—6 bottleneck | 98.00 | 97.88 |
| Proposed 1 | 98.00 | 98.13 |
| Proposed 2 | 97.50 | 97.38 |
| Proposed 3 | 98.13 | 98.00 |
| Proposed 4 | 97.63 | 97.63 |

**Table 2.** Comparison with other approaches. DL: deep learning, RF: random forest, DT: decision tree, FDTW: fusion dynamic time warping, k-NN: k-nearest neighbour, LDA: linear discriminant analysis, QDA: quadratic discriminant analysis, SVM-l: support vector machine with linear kernel, SVM-q: support vector machine with quadratic kernel.

| Model | No. Gestures | Accuracy | Type of Algorithm |
|---|---|---|---|
| Hazra et al. [27] | 5 | 94.34% | DL |
| Zhang et al. [74] | 8 | 96.00% | DL |
| Ahmed et al. [75] | 8 | 95.00% | DL |
| Hazra et al. [28] | 6 | 94.50% | DL |
| Molchanov et al. [76] | 11 | 94.10% | DL |
| Lien et al. [77] | 4 | 92.10% | RF |
| Chmurski et al. [78] | 4 | 95.05% | DL |
| Chmurski et al. [79] | 4 | 98.10% | DL |
| D'Eusanio et al. [23] | 25 | 87.60% | DL |
| D'Eusanio et al. [23] | 12 | 97.20% | DL |
| Molchanov et al. [24] | 25 | 83.80% | DL |
| D'Eusanio et al. [26] | 25 | 76.10% | DL |
| D'Eusanio et al. [26] | 12 | 92.00% | DL |
| Wang et al. [85] | 6 | 95.83% | FDTW |
| Wang et al. [86] | 4 | 87.17% | DL |
| Ritchie et al. [87] | 4 | 69.7% | DT |
| Ritchie et al. [87] | 4 | 71.4% | k-NN |
| Ritchie et al. [87] | 4 | 54.6% | LDA |
| Ritchie et al. [87] | 4 | 59.7% | QDA |
| Ritchie et al. [87] | 4 | 61.9% | SVM-l |
| Ritchie et al. [87] | 4 | 74.2% | SVM-q |
| Ritchie et al. [88] | 4 | 87.0% | k-NN |
| Proposed 1 (Edge TPU) | 8 | 98.13% | DL |

**Table 3.**Comparative characteristics of model sizes for the non-optimized and the optimized versions.

| Topology | x86 Size [KB] | Edge TPU Size [KB] |
|---|---|---|
| CNN3D | 12,586.58 | N/A |
| CNN2D | 375.89 | 80.67 |
| MobileNetV2—1 bottleneck | 1770.96 | 200.67 |
| MobileNetV2—2 bottleneck | 2028.85 | 232.67 |
| MobileNetV2—3 bottleneck | 2287.06 | 264.67 |
| MobileNetV2—4 bottleneck | 2545.35 | 296.67 |
| MobileNetV2—5 bottleneck | 2804.27 | 328.67 |
| MobileNetV2—6 bottleneck | 3063.25 | 360.67 |
| Proposed 1 | 624.92 | 92.67 |
| Proposed 2 | 999.00 | 140.67 |
| Proposed 3 | 1543.89 | 220.67 |
| Proposed 4 | 2233.44 | 280.67 |

| Topology | x86 Inference [ms] | Edge TPU Inference [ms] |
|---|---|---|
| CNN3D | 3.57 | N/A |
| CNN2D | 1.16 | 3.61 |
| MobileNetV2—1 bottleneck | 2.19 | 1.19 |
| MobileNetV2—2 bottleneck | 4.17 | 1.52 |
| MobileNetV2—3 bottleneck | 5.66 | 1.65 |
| MobileNetV2—4 bottleneck | 8.52 | 1.79 |
| MobileNetV2—5 bottleneck | 8.74 | 1.92 |
| MobileNetV2—6 bottleneck | 10.42 | 2.04 |
| Proposed 1 | 5.74 | 1.28 |
| Proposed 2 | 10.18 | 1.63 |
| Proposed 3 | 14.22 | 1.76 |
| Proposed 4 | 20.73 | 1.90 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chmurski, M.; Mauro, G.; Santra, A.; Zubert, M.; Dagasan, G. Highly-Optimized Radar-Based Gesture Recognition System with Depthwise Expansion Module. *Sensors* **2021**, *21*, 7298.
https://doi.org/10.3390/s21217298
