Racquet Sports Recognition Using a Hybrid Clustering Model Learned from Integrated Wearable Sensor

Racquet sports can provide positive benefits for human health. A reliable detection device that can effectively distinguish movements with similar sub-features is therefore needed. In this paper, a racquet sports recognition wristband system and a multilayer hybrid clustering model are proposed to achieve reliable activity recognition and to count movements. Additionally, a Bluetooth mesh network enables communication between a phone and the wristband and sets up connections between multiple devices, allowing users to track their exercise on the phone and share information with other players and referees. Considering both the complexity of the classification algorithm and the user-friendliness of the measurement system, the improved multilayer hybrid clustering model applies three levels of K-means clustering to optimize feature extraction and segmentation, and then uses the density-based spatial clustering of applications with noise (DBSCAN) algorithm to determine the feature center of each movement. The model can identify unlabeled and noisy data without data calibration and is suitable for recognizing multiple racquet sports on smartwatches. The proposed system shows better recognition results and is verified in practical experiments.


Introduction
Medical research shows that physical exercise provides positive health benefits, including reduced risks of cardiovascular disease, obesity, stroke, and cancer [1], improved musculoskeletal health and stress regulation [2], and a reduced psychological health burden and risk of mental disease [3]. Physical activity recognition (PAR), which uses information acquired from a variety of sensors to automatically detect and analyze physical activities [4], has broad applications such as behavior correction and medical detection. PAR can quantify activity levels, improve exercise quality, and reduce healthcare costs, and it is regarded as an important research direction in human-computer interaction. Oja et al. [5] found that racquet sports appear to be the best forms of exercise for reducing the risk of death. Therefore, from the perspectives of health and exercise tracking, a reliable racquet sports detection device is needed.
Vision-based PAR mainly uses red-green-blue (RGB) images [6][7][8], optical flow [9], 2D depth maps [10], and 3D skeletons [11,12]. Conventional images are susceptible to illumination variations and camera viewing angles, and inevitable annotation errors make video datasets difficult to classify. Methods based on depth maps and 3D skeletons can provide fine-grained motion recognition, but they require expensive special equipment for offline computation and carry a large computational load. Acoustic, vibration, and other environment-based sensors are mostly installed in fixed locations and

Figure 1. Proposed system architecture. In the training stage, the wristband collects datasets and the feasibility of the model is verified on a personal computer (PC). In the recognition stage, data are acquired and identified on the wristband, and the system communicates via Bluetooth.

This section introduces the wristband system in terms of hardware, model, and software. The model, the most important part of this system, is composed of three parts: data collection, data processing, and a classification algorithm.

Hardware Platform
A new integrated wearable sensor platform has been designed to achieve a miniaturized system (Figure 2). The sensor is equipped with an IMU, an MCU, a Bluetooth module, an OLED screen, and a battery charge management chip. The sensor measures 38 mm × 34 mm × 20 mm. The parameters of the wristband are listed in Table 1. Firmware is programmed into the device through a universal serial bus (USB) interface.
The IMU is the MPU6050, which integrates a 3-axis gyroscope and a 3-axis accelerometer and uses three 16-bit analog-to-digital converters (ADCs) per sensor to convert the measured analog signals into digital signals. The full-scale range of the IMU is programmable: the accelerometer is set to ±4 g and the gyroscope to ±2000°/s, and the sampling rate is configured at 50 Hz, which is sufficient for capturing movement features [30].
The MCU is an STM32F103-series chip in a 48-pin package, built around an ARM Cortex-M3 core. The clock signal is provided by an internal 8 MHz RC oscillator, and the operating frequency is set to 72 MHz, which provides high-speed online computation for the racquet sports models. The MCU integrates timer, controller area network (CAN), ADC, serial peripheral interface (SPI), I2C, USB, and universal asynchronous receiver/transmitter (UART) interfaces, which facilitate data exchange.
The CC2541 chip is a 2.4 GHz BLE solution that conforms to the Bluetooth v4.0 protocol stack. The supply voltage of the hardware platform is 3.3 V, provided by the ME6206 low-dropout regulator (LDO). To suit actual use scenarios, a low-capacity battery (Li-ion, 3.7 V, 500 mAh) is used. The USB charging circuit uses the TP4056 linear Li-ion battery charger, which has an internal P-channel metal-oxide-semiconductor field-effect transistor (PMOSFET) structure and includes an anti-reverse charging circuit to ensure that the battery is not overcharged.
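For the configured ranges, raw IMU counts map to physical units through fixed sensitivity factors. The Python sketch below uses the values given in the MPU6050 datasheet for these ranges (8192 LSB/g at ±4 g and 16.4 LSB per °/s at ±2000°/s) and assumes the high byte of each reading is read first. The helper names are hypothetical, and this is an illustration rather than the wristband's STM32 firmware.

```python
# Datasheet sensitivities for the full-scale ranges configured on the wristband.
ACCEL_LSB_PER_G = 8192.0   # accelerometer at ±4 g
GYRO_LSB_PER_DPS = 16.4    # gyroscope at ±2000 °/s
G = 9.8                    # m/s², the benchmark value used in this paper
SAMPLE_PERIOD_S = 1 / 50   # 50 Hz sampling rate -> 20 ms per sample


def to_signed16(high: int, low: int) -> int:
    """Combine a high and a low register byte into a signed 16-bit reading."""
    value = (high << 8) | low
    return value - 0x10000 if value & 0x8000 else value


def accel_to_ms2(raw: int) -> float:
    """Raw accelerometer count -> acceleration in m/s²."""
    return raw / ACCEL_LSB_PER_G * G


def gyro_to_dps(raw: int) -> float:
    """Raw gyroscope count -> angular rate in degrees per second."""
    return raw / GYRO_LSB_PER_DPS


if __name__ == "__main__":
    raw = to_signed16(0x20, 0x00)  # 0x2000 = 8192 counts
    print(raw, accel_to_ms2(raw))  # 8192 9.8
```

At rest with one axis vertical, a reading of about 8192 counts therefore corresponds to 1 g.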

Data Collection
A total of 5 healthy subjects (3 males and 2 females; aged 25 ± 5 years) took part in the data collection. One subject had received 2 years of professional badminton training, and another had received 4 years of professional table tennis training; the others were untrained. All participants provided written informed consent before participation. All subjects were right-handed and wore the racquet sports recognition wristband on their dominant wrist. The datasets were collected in a real training environment (a gym).
In the experiment, each subject performed nine kinds of movements: four types of table tennis movements (service, stroke, spin, and picking up), four types of badminton movements (service, drive, smash, and picking up), and walking. Each subject performed 20 trials of each movement. For each subject, different movements took different amounts of time, and a single movement took between 1 and 1.2 s. Repetitions within the same movement set were collected continuously, and different movement sets were collected separately. In total, 100 instances were collected for each movement set (5 subjects × 20 trials), giving 900 instances (9 movement sets × 100). During the experiment, the number of actions in each movement set was recorded manually so that the dataset could be labeled later.

Data Processing
Preprocessing and feature extraction are needed for the raw data to construct the features that can effectively distinguish racquet sports.

Preprocessing
Median filtering is applied to the noisy raw signals output by the MPU6050, and the accelerometer and gyroscope signals are then fused to obtain the angle. Two common filtering algorithms, the unscented Kalman filter (UKF) [31] and nonlinear complementary filtering (HBL) [14], are considered for the model. Gravitational acceleration (g = 9.8 m/s²) serves as the benchmark for evaluating the filtering algorithms. Applying each filter to the acceleration yields the HBL formulation (1) and the UKF formulation (2). Data recorded by the wristband in the static state are filtered with HBL and with UKF, and the outputs are compared in Table 2. In (1), $ACC_{HBL}$ is the gravitational acceleration estimated by HBL, where $ACC_a$ is the accelerometer value, $ACC_{gyro}$ is the gyroscope value, and $\alpha$ is the weight coefficient.
In (2), $ACC_k$ is the gravitational acceleration estimated by UKF at time $k$, where $\theta_k$ is the rotation angle, $w_k$ is the process noise, and $v_k$ is the measurement noise at time $k$.
The data obtained by HBL are closer to the theoretical value of gravitational acceleration, although their standard deviation is slightly larger than that of UKF. Considering program portability and the computing power of the MCU, HBL is used for noise reduction. Figure 3 shows the triaxial signals of the badminton drive. Each movement has its own characteristics, so the values of the three-axis signals differ considerably; because movements are repeated during the acquisition time, the signals change periodically.
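The median-filter window and the exact HBL formulation are not given here, so the following is only a rough Python illustration under stated assumptions: a sliding median filter with a hypothetical 5-sample window, and a textbook linear complementary filter for a single tilt angle with a hypothetical weight alpha = 0.98 at the 20 ms sampling period. It is a stand-in for, not a reproduction of, the nonlinear HBL filter [14].

```python
import math
from statistics import median


def median_filter(signal, window=5):
    """Sliding median; the window is truncated at the edges so lengths match."""
    half = window // 2
    return [median(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]


def complementary_angle(gyro_rate_dps, acc_y, acc_z, alpha=0.98, dt=0.02):
    """Fuse an integrated gyroscope rate with the accelerometer tilt angle (degrees).

    alpha weights the short-term gyroscope estimate; (1 - alpha) pulls the result
    towards the accelerometer angle, which suppresses long-term gyroscope drift.
    """
    angle = math.degrees(math.atan2(acc_y[0], acc_z[0]))  # start from the accelerometer
    angles = []
    for rate, ay, az in zip(gyro_rate_dps, acc_y, acc_z):
        acc_angle = math.degrees(math.atan2(ay, az))
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * acc_angle
        angles.append(angle)
    return angles


if __name__ == "__main__":
    print(median_filter([1, 100, 1, 1, 1]))  # the single spike is removed
```

With a constant gyroscope bias of 10°/s and a level accelerometer, plain integration would drift without bound, whereas this filter settles at alpha * rate * dt / (1 - alpha) = 9.8°, which shows why the accelerometer term is needed.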

Feature Extraction
Feature extraction is an important task in racquet sports recognition. To obtain optimal classification performance, the extracted features should clearly represent the unique properties of the movements while reducing redundancy [32]. Based on the raw data, the adopted feature sets are (1) the acceleration signal magnitude vector (ASMV); (2) the velocity signal magnitude vector (VSMV); (3) the displacement signal magnitude vector (DSMV); and (4) the angle signal magnitude vector (θSMV).
ASMV is the L2 norm of the total acceleration vector, $\mathrm{ASMV} = \sqrt{a_x^2 + a_y^2 + a_z^2}$, where $a_x$, $a_y$, and $a_z$ denote the filtered accelerations along the x-, y-, and z-axes, respectively. This feature is independent of sensor orientation and measures the instantaneous intensity of human movement.
VSMV is the L2 norm of the velocity vector obtained by integrating the acceleration vector, and DSMV is obtained in the same way by integrating the velocity vector. θSMV is the L2 norm of the total angle vector. The angle obtained from the gyroscope is treated as the best estimate over short time spans, and the average of the angle obtained from the accelerometer is used to correct it periodically, which limits the drift of the integrated gyroscope angle.
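The four feature sets can be sketched in Python as follows. The sketch assumes gravity-compensated accelerations in m/s² and three per-axis angles as inputs (argument names are hypothetical) and uses cumulative trapezoidal integration at the 20 ms sampling period. The integration rule is an assumption, and plain integration accumulates drift over long segments.

```python
import math

DT = 0.02  # 20 ms sampling period (50 Hz)


def magnitude(xs, ys, zs):
    """Per-sample L2 norm of a three-axis signal."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(xs, ys, zs)]


def integrate(signal, dt=DT):
    """Cumulative trapezoidal integral starting from zero (same length as the input)."""
    out = [0.0]
    for prev, cur in zip(signal, signal[1:]):
        out.append(out[-1] + 0.5 * (prev + cur) * dt)
    return out


def feature_sets(ax, ay, az, ang_x, ang_y, ang_z):
    """ASMV, VSMV, DSMV and theta-SMV from gravity-compensated accelerations and angles."""
    vel = [integrate(a) for a in (ax, ay, az)]  # velocity per axis
    disp = [integrate(v) for v in vel]          # displacement per axis
    return {
        "ASMV": magnitude(ax, ay, az),
        "VSMV": magnitude(*vel),
        "DSMV": magnitude(*disp),
        "thetaSMV": magnitude(ang_x, ang_y, ang_z),
    }
```

Because every feature is a magnitude, all four are insensitive to how the wristband is rotated on the wrist, which matches the orientation-independence argument for ASMV.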

Proposed Algorithm
The K-means algorithm uses the Euclidean distance as the similarity measure and aims to produce compact, well-separated clusters. The dataset is denoted $T = \{T_1, T_2, \ldots, T_n\}$; the K cluster centers are initialized randomly, the clusters are denoted $C = \{C_1, C_2, \ldots, C_K\}$, and $\mu_i$ is the mean vector of cluster $C_i$.
The objective function of K-means clustering is the sum of squared errors (SSE), $\mathrm{SSE} = \sum_{i=1}^{K} \sum_{T_j \in C_i} \lVert T_j - \mu_i \rVert_2^2$.
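A minimal Python sketch of standard K-means (Lloyd's algorithm) that minimizes the SSE above: points are assigned to the nearest center by Euclidean distance, and each center is then replaced by the mean vector of its cluster. The seeded random initialization and the convergence test (stop when assignments no longer change) are assumptions for illustration, not details of the authors' implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm. Returns (centers, labels, sse) for a list of point tuples."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: each center becomes the mean vector mu_i of its cluster C_i.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse
```

For example, `kmeans([(0.0,), (0.2,), (10.0,), (10.2,)], 2)` separates the two groups with centers near 0.1 and 10.1 and an SSE of about 0.04.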
The density-based spatial clustering of applications with noise (DBSCAN) algorithm groups data of sufficient density into clusters, so it can find clusters of arbitrary shape in noisy datasets. It can therefore effectively reduce the misclassification caused by similar sub-features.
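For reference, the textbook DBSCAN procedure can be sketched in Python as below. A point with at least `min_pts` neighbors within radius `eps` is a core point; clusters grow from core points, border points join without expanding the cluster, and the remaining points are labeled -1 (noise). In the model's fourth layer, dense clusters correspond to common features and noise points to movement-specific ones. The `eps` and `min_pts` values used in the check are hypothetical, and the O(n²) neighbor search is chosen for clarity rather than speed.

```python
def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_neighbours)
    return labels
```

Unlike K-means, DBSCAN needs no preset number of clusters, which suits the fourth layer, where the number of shared sub-feature groups is not known in advance.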
Both badminton and table tennis movements consist of a variety of sub-actions. For example, a badminton drive can be decomposed into the detailed actions of swinging the arm and turning the wrist. Manually labeling such similar movement data is therefore very difficult and prone to human error. The designed hybrid clustering model selects the most important macro features of the different movements through a four-layer structure. The first three layers use the K-means clustering algorithm to classify and encode the features of different movements, decompose them into sub-features, and find the best clustering center for each movement so that movements can be distinguished. The fourth layer uses DBSCAN to eliminate the influence of sub-features shared by different movements and to determine the unique sub-features of each movement, so that each racquet sport movement can be identified effectively. The proposed multilayer hybrid clustering model is presented in Figure 4, and the output of each layer is shown in Figure 5.

Figure 4. Proposed multilayer hybrid clustering model. Layer 1: window segmentation, feature maximums, K-means clustering, sorting features by magnitude. Layer 2: K-means clustering, sorting sub-features by frequency. Layer 3: window segmentation, max, min, and mean of features, K-means clustering, sub-feature centers. Layer 4: DBSCAN clustering, decision.

The input feature vector of the first-layer model consists of the maxima of the extracted feature sets after sliding-window sampling. The unit of the sliding window is the sampling point, and the sampling period of the data is 20 ms. A window with a length of 10 (200 ms) and a step size of 5 (100 ms) is used to segment the extracted feature sets into movements. The dimension of the input feature vector is 120. Principal component analysis (PCA) is used to reduce the feature vector to one dimension, eliminating redundant features and reducing computation. K-means is then used to cluster the feature maxima to normalize all unlabeled feature sets, and the extracted features are sorted by magnitude to label movements.
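The first-layer segmentation and dimensionality reduction can be sketched in Python as follows. The sketch assumes one feature vector per 20 ms sample, takes per-feature maxima over windows of 10 samples advanced by 5, and projects the window maxima onto their first principal component using power iteration on the covariance matrix. The layout of the 120-dimensional input vector is not reproduced; the toy two-feature input in the check is purely illustrative.

```python
import math


def sliding_windows(seq, length=10, step=5):
    """Consecutive windows of `length` samples, advanced by `step` samples."""
    return [seq[s:s + length] for s in range(0, len(seq) - length + 1, step)]


def window_maxima(samples, length=10, step=5):
    """samples: one feature vector per sample -> one vector of per-feature maxima per window."""
    return [[max(col) for col in zip(*win)] for win in sliding_windows(samples, length, step)]


def pca_1d(rows, iters=100):
    """Project rows onto their first principal component (power iteration on the covariance)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(r, mean)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [sum(x * c for x, c in zip(r, v)) for r in centred]
```

The resulting one-dimensional values are what the first K-means would cluster; a 50% window overlap means every sample contributes to two windows, so a movement is not lost when it straddles a window boundary.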
The second-layer model clusters the output of the first layer with K-means to obtain the sub-features into which the movements decompose. The sub-features are then sorted by frequency, which makes it easier to distinguish common sub-features from individual ones later. The output of the first two layers is shown in Figure 5a, where the x-axis indicates the feature index and the y-axis indicates the movement sub-feature labels after classification and sorting. The third-layer model uses a sliding window with a length of 4 and a step size of 2 to segment the features obtained from the second layer, and takes the maxima, minima, and averages of the features as inputs to the third K-means. The unit of this sliding window is the sub-feature point. K-means divides the features within each dataset to obtain the sub-feature centers of the different movements. The sub-feature centers in the different movement sets are shown in Figure 5b, where the x-axis indicates the sub-feature labels and the y-axis indicates the sub-feature center labels. Different colors represent different movement sets (black: walking; blue: …).
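The frequency sorting of the second layer and the window statistics of the third layer can be sketched in Python as below. Cluster ids are renamed so that the most frequent sub-feature becomes 0 (ties keep first-seen order), and windows of 4 sub-feature points advanced by 2 yield the (max, min, mean) triples that would be fed to the third K-means. The function names are hypothetical.

```python
from collections import Counter


def relabel_by_frequency(labels):
    """Rename cluster ids so 0 is the most frequent sub-feature, 1 the next, and so on."""
    ranking = [lab for lab, _ in Counter(labels).most_common()]
    rank = {lab: r for r, lab in enumerate(ranking)}
    return [rank[lab] for lab in labels]


def window_stats(seq, length=4, step=2):
    """(max, min, mean) of each window of sub-feature labels."""
    stats = []
    for start in range(0, len(seq) - length + 1, step):
        win = seq[start:start + length]
        stats.append((max(win), min(win), sum(win) / length))
    return stats
```

Sorting by frequency makes the label values comparable across datasets: a low label always means a frequently occurring sub-feature, so the window statistics carry meaning regardless of the arbitrary ids K-means assigns.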

The fourth-layer model uses the DBSCAN algorithm to cluster the sub-feature centers obtained in the third layer. The cluster centers are extracted as the common features of the movements, and the outliers are the exclusive features of the different movements. From these, the feature center set of each movement is obtained. The target action is identified from the normalized Euclidean distance between the output of the first three clustering layers and each feature center; distances produced by unrelated actions that fall outside the distance threshold are filtered out. Because the sub-feature centers obtained by clustering may overlap, a greedy algorithm is used to optimize the feature sets and obtain the smallest feature set for each movement. The search results of the greedy algorithm are shown in Figure 5c, where the x-axis indicates the number of iterations and the y-axis indicates the subsets of movement features. The sub-feature centers obtained by this search filter out wrong features, which makes them more accurate than the results of direct clustering. The final clustering result of the DBSCAN algorithm is shown in Figure 5d, where the x-axis indicates the sub-feature center labels and the y-axis indicates the predicted movement labels. The correspondence between colors and movement sets is the same as in Figure 5b.
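A minimal Python sketch of the final decision rule, under assumptions: the normalized Euclidean distance is taken here to mean dividing each dimension by a per-dimension scale (for example, a standard deviation), the nearest feature center wins, and inputs whose nearest center lies beyond the threshold are rejected as unrelated actions. The movement names, centers, scale, and threshold in the usage are hypothetical.

```python
import math


def normalized_distance(x, center, scale):
    """Euclidean distance after dividing each dimension by a per-dimension scale."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, center, scale)))


def classify(x, centers, scale, threshold):
    """Nearest feature center, or None when even the nearest one is beyond the threshold."""
    best_label, best_dist = None, math.inf
    for label, center in centers.items():
        d = normalized_distance(x, center, scale)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None


if __name__ == "__main__":
    centers = {"smash": (9.0, 1.0), "drive": (4.0, 2.0)}  # hypothetical feature centers
    print(classify((8.8, 1.1), centers, scale=(1.0, 1.0), threshold=1.0))    # smash
    print(classify((30.0, 30.0), centers, scale=(1.0, 1.0), threshold=1.0))  # None
```

The threshold is what lets the wristband ignore motions that belong to none of the learned movements instead of forcing them into the nearest class.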

Software Platform
The software comprises model scheduling and communication on the embedded platform, together with a mobile phone application (App).

Operating System
The Lite_OS operating system is ported to the hardware platform of the racquet sports recognition wristband. Lite_OS's task module provides multitasking functions for switching and communicating between tasks. The system supports preemptive task scheduling based on priority levels and time-slice round-robin scheduling for tasks of the same priority. The wristband collects data in real time and optimizes the model through concurrent multitask processing. The program flowchart is shown in Figure 6.
Data sampling is given the highest priority so that movement data are collected in real time. When a set of data has been collected, the collection task is suspended, and the recognition task of the multilayer hybrid clustering model starts and analyzes the type of the movement data just collected. When recognition is complete, this task is suspended, and the result is sent through the communication task to the OLED screen and the App so that users can view it at any time. Because each user's movements differ slightly, previous movement data are stored and learned when the system is idle to continuously optimize the features in the model. The more often the user wears the wristband, the higher its recognition accuracy.

Communication Protocol
Based on the TI BLE-CC254x-1.4.0 protocol stack, the management mechanism of the operating system abstraction layer (OSAL) is used to implement a one-master, multi-slave Bluetooth network. At power-on, the Bluetooth network scans automatically, screens devices by MAC address matching and binds them automatically, and then distinguishes the slave read/write mode by handle. Owing to chip limitations, at most three devices can be connected simultaneously.

App
A mobile App based on the iOS operating system is designed to communicate with the wristband. The App is developed with Xcode and receives wristband information via Bluetooth, which facilitates later statistics and analysis. The badminton interface is shown in Figure 7a and the table tennis interface in Figure 7b. Note that information is transmitted between the wristbands via the Bluetooth network. To avoid breaking the current network connection when the phone connects to wristbands in Bluetooth master mode, the App connects to only one wristband at a time.

Results and Discussion
The experimental session evaluated the model and verified the wristband in a real environment.

Model Evaluation
For the evaluation, the data of three subjects (60 instances) were used for training and the data of the other two subjects (40 instances) were used to test classification performance. Fivefold cross-validation was used to verify the generalization ability of the proposed model; in each iteration, every sample belongs to either the training set or the test set, but not both. The average accuracy over the five test results is taken as the accuracy of the model, and the F1 score, the harmonic mean of precision and recall, is reported as a more reliable measure.
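The accuracy and F1 computations can be sketched in Python as below. Because the averaging scheme for the nine-class F1 score is not stated, macro averaging (the unweighted mean of per-class scores) is assumed here; a class that is never predicted contributes a precision of 0.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0  # harmonic mean
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

Macro averaging weights every movement equally, so poorly recognized hitting movements pull the F1 score below the overall accuracy, consistent with the pattern of a lower F1 than accuracy in the nine-movement results.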
There are no true movement labels in the training dataset, so ordinary accuracy cannot be used to measure the effectiveness of the proposed model. Because the number of movements in each movement instance is known, the model can instead be evaluated by comparing the ratio of predicted movement points to the total number of points with the ratio of actual movement points to the total number of points in the dataset. Although this ratio cannot fully characterize recognition accuracy, it can serve as a criterion for model parameter search. Combined with the top-down greedy algorithm, it can filter out wrong features to obtain the smallest feature subset. The search results are shown in Figure 5c.
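The parameter-search idea can be sketched in Python as follows, under assumptions: the ratio-based criterion is taken to be the negative absolute difference between the predicted and actual proportions of movement points (the exact comparison is not specified), and the top-down greedy search is written as backward elimination that keeps dropping the feature whose removal scores best until every removal lowers the score. Both function names are hypothetical.

```python
def ratio_score(pred_points, true_points, total_points):
    """Proxy criterion: negative gap between predicted and actual movement-point ratios."""
    return -abs(pred_points / total_points - true_points / total_points)


def greedy_backward_selection(features, score):
    """Top-down greedy search over a list of unique candidate features.

    Repeatedly drops the feature whose removal yields the highest score, as long as the
    score does not decrease; `score` maps a feature list to a number (higher is better).
    """
    current = list(features)
    best = score(current)
    while len(current) > 1:
        trials = [(score([f for f in current if f != g]), g) for g in current]
        trial_score, drop = max(trials, key=lambda t: t[0])
        if trial_score < best:
            break  # every single removal makes the subset worse
        current.remove(drop)
        best = trial_score
    return current, best
```

Accepting ties (a removal that leaves the score unchanged) is what drives the search toward the smallest feature subset rather than merely a good one.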
The detection results for the nine movements are shown in Figure 8. The x-axis of each graph in Figure 8 indicates the feature index. The upper part of each graph shows the processed acceleration magnitude vector, and the lower part shows the sub-feature labels obtained by the model. Figure 8 shows that each movement corresponds to a different sub-feature set after classification by the multilayer hybrid clustering model, so the model can clearly distinguish the different movements.

Table 3 shows the classification performance of the multilayer hybrid clustering model on the nine movements. The predicted proportion is the result of model classification, and the expected proportion is estimated from the number and duration of the movements. Table 3 confirms the conclusion drawn from Figure 8: the model achieves higher precision for serving and picking-up movements, whereas the precision for the different hitting movements is lower because they differ only slightly from one another. The normalized confusion matrix (Figure 9) shows the average evaluation results for each movement.
The average prediction accuracy is 86.32%, with an F1 score of 82.98%. Figure 9 shows that, for each of the nine movements, most instances are labeled with the correct type, and the precision for some movements exceeds 90%. A closer look at the results explains most of the errors. For instance, the model confuses the different types of stroke and drive: to both the MCU and human observers, the badminton drive and the table tennis stroke involve similar motions. Badminton picking up and table tennis picking up are also similar, whereas the badminton smash has a large movement amplitude that clearly distinguishes it from other movements, resulting in high precision. For these reasons, the movements of each racquet sport were also classified separately.
The classification results for a single racquet sport are shown in Figure 10. For table tennis, the prediction accuracy is 92.51%, with an F1 score of 92.73%; for badminton, the prediction accuracy is 94.69%, with an F1 score of 94.53%. Figure 10 shows that, when only a single racquet sport is recognized, the prediction of every movement improves. For reference only, the proposed model was compared with similar techniques. Martin et al. [6] proposed a Siamese spatiotemporal convolution (SSTC) method based on RGB image sequences and their computed optical flow to classify table tennis strokes, with an accuracy of 91.4%. The proposed model achieves competitive accuracy in table tennis movement recognition. Moreover, compared with video recognition based on large data streams, the motion-sensor-based method has advantages in computing time and storage cost. Wang et al. [28] used an SVM on motion-sensor data to recognize three different badminton strokes; the SVM-based system achieved 94% accuracy, and the PCA + SVM system achieved 97%. In badminton movement recognition, the proposed model is slightly more accurate than the SVM-based system but slightly less accurate than the PCA + SVM system. Its advantage is that movements can be recognized on the wristband in real time.

Comparison of Different Cluster Numbers
The model sorts the output features of the first two layers, so the features remain highly consistent after multilayer clustering and the model accuracy is not affected by the random initialization of the K-means algorithm. The first layer determines the number of sub-features, so the accuracy depends mainly on the number of clusters in the first layer. Table 4 lists the different K values and the corresponding accuracy. The best classification performance is obtained with 70 clusters. Fewer clusters cause features to overlap, which makes subsequent sub-feature discrimination more difficult, while more clusters introduce more noise; the number of clusters therefore has to be tuned.
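The effect of ranking the cluster outputs can be illustrated with a small sketch. K-means returns cluster indices in an arbitrary, seed-dependent order; sorting the centroids and relabeling each cluster by its centroid's rank removes that permutation, so different random initializations that converge to the same partition yield identical labels. The sketch assumes one-dimensional features (for example, one amplitude value per segment) and k-means++ seeding; both are illustrative choices, not details confirmed by the text.

```python
import random
from typing import List, Sequence, Tuple


def _assign(values: Sequence[float], centers: Sequence[float]) -> List[int]:
    """Index of the nearest center for every value."""
    return [min(range(len(centers)), key=lambda j: (v - centers[j]) ** 2)
            for v in values]


def kmeans_1d(values: Sequence[float], k: int, seed: int,
              max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Lloyd's K-means on scalars with k-means++ seeding; label order is arbitrary."""
    rng = random.Random(seed)
    centers = [rng.choice(values)]
    while len(centers) < k:
        # k-means++: sample the next center with probability proportional to D^2.
        weights = [min((v - c) ** 2 for c in centers) for v in values]
        if sum(weights) == 0:
            centers.append(rng.choice(values))
        else:
            centers.append(rng.choices(values, weights=weights, k=1)[0])
    for _ in range(max_iter):
        labels = _assign(values, centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return _assign(values, centers), centers


def canonical_relabel(labels: Sequence[int],
                      centers: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Rename clusters by the rank of their centroid (smallest centroid -> 0)."""
    order = sorted(range(len(centers)), key=lambda j: centers[j])
    rank = {old: new for new, old in enumerate(order)}
    return [rank[lab] for lab in labels], [centers[j] for j in order]
```

Ranking removes only label permutations: if two runs converge to genuinely different partitions, their labels will still differ.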

Comparison of Different Classifiers
We compared the linear discriminant function (LDF), random forest, SVM, and the proposed multilayer hybrid clustering model, as shown in Table 5. The time in Table 5 is the training time of each classifier, which reflects its computational cost. The proposed model achieves the best performance in terms of both recognition accuracy and training time.

Wristband Verification
A randomly selected tester wore the wristband for badminton and table tennis tests. The recognition time of the wristband includes the data collection time, the algorithm recognition time, and the time to transmit the result to the OLED screen. Recognition time varies between movements; the average recognition time of the wristband is about 1 s. The Bluetooth transmission rate is 115,200 bps, and communication from the wristband to the mobile phone takes about 0.5 s, so the time from wristband recognition to displaying the result on the mobile phone is about 1.5 s. In general, the recognition result is available on the wristband once a movement has been completed, so the wristband achieves real-time recognition of racquet sports. The recognition results of the wristband for badminton and table tennis are shown in Figure 11a,b, respectively. The test accuracy is shown in Table 6. The experiments verify the feasibility of the racquet sports recognition wristband, with an average accuracy above 77%.
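The timing figures above combine additively. The sketch below restates that arithmetic and, for scale, computes the raw serialization time of a payload at 115,200 bps. The 20-byte payload and the 10 bits per byte (UART-style 8N1 framing: start bit, 8 data bits, stop bit) are assumptions for illustration; the text specifies neither.

```python
def serialization_time_s(payload_bytes: int, bit_rate_bps: int = 115_200,
                         bits_per_byte: int = 10) -> float:
    """Time to clock payload_bytes onto a serial link (10 bits/byte assumes 8N1 framing)."""
    return payload_bytes * bits_per_byte / bit_rate_bps


def end_to_end_latency_s(recognition_s: float = 1.0, link_s: float = 0.5) -> float:
    """Wristband recognition time plus wristband-to-phone communication time."""
    return recognition_s + link_s


if __name__ == "__main__":
    print(f"end-to-end: {end_to_end_latency_s():.1f} s")  # prints "end-to-end: 1.5 s"
    print(f"20-byte payload on the wire: {serialization_time_s(20) * 1000:.2f} ms")
```

For a payload of this assumed size, raw serialization accounts for only a few milliseconds of the reported 0.5 s communication time.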

Conclusions
This paper presents a racquet sports recognition system that effectively senses movement parameters. The system consists of an IMU, BLE technology, a mobile application, and a multilayer hybrid clustering model. The wristband uses its integrated IMU to acquire movement data, then runs the recognition model and performs number counting on the MCU. Communication between the phone and the wristband, and networking among multiple devices, are realized through a Bluetooth mesh network. This lets users track their exercise through the App and provides information sharing for players and referees to improve the fairness of the game.
To improve recognition accuracy, a multilayer hybrid clustering model with a layered structure similar to that of a neural network (NN) is proposed. Multilayer K-means clustering is used for feature extraction and segmentation, and DBSCAN is used to further classify features that share the same sub-features. The model can identify unlabeled and noisy data without data calibration, which gives the sensor system greater computational efficiency and lower energy consumption.
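The DBSCAN stage can be sketched as a textbook DBSCAN over feature vectors, followed by taking the mean of each resulting cluster as its feature center. The `eps` and `min_pts` values, the Euclidean distance, and the use of the cluster mean as the center are assumptions for illustration; the text does not specify them. The O(n²) neighbour search favours clarity over MCU efficiency.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id per point, or NOISE (-1) for noise points."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return [lab if lab is not None else NOISE for lab in labels]


def cluster_centers(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, Point]:
    """Mean of the points in each cluster (noise excluded)."""
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        if lab != NOISE:
            groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(pts) for c in zip(*pts))
            for lab, pts in groups.items()}
```

Points that do not reach `min_pts` neighbours and are not within `eps` of any core point stay labeled as noise, which is how DBSCAN tolerates noisy data without manual labels.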

The experimental results confirm that the racquet sports recognition wristband designed in this paper can obtain a variety of effective movement features from the wrist and classify badminton and table tennis movements. The accuracy of the model decreases slightly in practice, but the wristband recognition results are largely consistent with the actual movements. Compared with machine vision-based methods, wristbands have clear advantages in privacy protection and tolerance to external environments. The wristband provides a reference solution for commercial applications of racquet sports recognition.
The dataset in this paper consists mainly of target movements, so the wristband is more accurate when its use is limited to racquet sports. In wider usage scenarios, such as daily activities, misidentification may occur because some non-target movements share sub-features with target movements. In the next stage, more types of movement data will be collected to improve the accuracy of the model and to evaluate more movement details.