Fitness Movement Types and Completeness Detection Using a Transfer-Learning-Based Deep Neural Network

Fitness is important in people’s lives. Good fitness habits can improve cardiopulmonary capacity, increase concentration, prevent obesity, and effectively reduce the risk of death. Home fitness does not require large equipment; instead, dumbbells, yoga mats, and horizontal bars are used to complete fitness exercises, and it effectively avoids contact with other people, so it is very popular. People who work out at home use social media to obtain fitness knowledge, but what they can learn this way is limited. Incomplete fitness movements are likely to lead to injury, and a cheap, timely, and accurate fitness detection system can reduce the risk of fitness injuries and effectively improve people’s fitness awareness. In the past, many studies have engaged in the detection of fitness movements, among which detection based on wearable devices, body nodes, and image deep learning has achieved good performance. However, a wearable device cannot detect a wide variety of fitness movements, may hinder the exercise of the fitness user, and has a high cost. Both body-node-based and image-deep-learning-based methods have lower costs, but each has some drawbacks. Therefore, this paper used a method based on deep transfer learning to establish a fitness database. After that, a deep neural network was trained to detect the type and completeness of fitness movements. We used Yolov4 and Mediapipe to detect fitness movements in real time and stored the 1D fitness signal of each movement to build a database. Finally, an MLP was used to classify the 1D signal waveforms of fitness movements. For the classification of fitness movement types, the mAP was 99.71%, accuracy was 98.56%, precision was 97.9%, recall was 98.56%, and the F1-score was 98.23%, which is quite a high performance. For the classification of fitness movement completeness, accuracy was 92.84%, precision was 92.85%, recall was 92.84%, and the F1-score was 92.83%. The average FPS during detection was 17.5.
Experimental results show that our method achieves higher accuracy compared to other methods.


Introduction
Fitness can bring many benefits to the body. With the rise in health awareness, men, women, and children have gradually begun to engage in fitness activities. Fitness exercise has many benefits; it can effectively improve cardiopulmonary capacity, increase concentration, maintain weight, etc. [1]. Most people who exercise hope to improve their posture, and improving posture can effectively reduce the risk of obesity [2]. Obese bodies are prone to many chronic diseases [3], each of which increases the likelihood of death, so regular exercise is important [4].
With the prevalence of COVID-19, people spend less time outdoors [5], which reduces their physical activity. The gym industry, in particular, has been considerably affected, leaving people unable to go to the gym to exercise. Many have therefore turned to home fitness [6], which helps them avoid contact with other people and effectively reduces the impact of the epidemic. In addition, home fitness does not require large fitness equipment; exercises are completed with dumbbells, yoga mats, horizontal bars, and other simple equipment, so it is very popular. However, people who exercise at home usually do not hire fitness trainers but learn fitness-related information from social media and mobile apps. Most of them are novices who have not received professional fitness guidance, so there is a risk of injury when exercising. Common fitness injuries are usually caused by incorrect posture, overly heavy equipment, and excessive speed [7]. This type of sports injury is not easy to avoid by obtaining fitness knowledge only through social media. Therefore, a cheap, simple, and accurate fitness movement recognition system is important; it can effectively and instantly detect fitness movements, reduce sports injuries, and improve people's fitness awareness.
Some systems use wearable devices to detect changes in human body temperature and movement; in addition to detecting fitness movements, they can also perform preliminary detection of symptoms, such as those of COVID-19 [8,9]. In this approach, the fitness user puts on an electronic device, and the three-axis changes of the device are recorded while the user is exercising. These data are then collected and analyzed using machine learning to classify fitness movements. However, this detection method has some shortcomings. When there are many types of fitness movements, it is difficult to achieve accurate detection. When the body part used for the fitness movement differs from the part where the electronic device is worn, it is even more difficult to identify the current fitness movement. If electronic devices are worn all over the body, the fitness user will be inconvenienced when exercising, and the cost will be relatively high. Another approach detects fitness movements based on computer vision, which has a lower cost and does not hinder the exercise of fitness users. Computer-vision-based methods are further divided into methods based on body nodes and methods based on image deep learning. Body-node-based methods detect fitness movements by calculating body nodes, which can be performed using OpenPose, Mediapipe, Simple Baselines, etc. [10–13]. Using these methods, the nodes of the body can be detected, and fitness movements can be recognized through changes in the coordinates of the nodes. In addition to detecting the speed of fitness movements [10], these methods can also classify the current fitness movement type [11] or measure the error between fitness movements and standard movements [12,13].
However, these methods cannot detect fitness movements from all angles; in particular, when the user is viewed from the side or the back, occlusion of nodes causes detection errors. The last type of detection is based on image deep learning. This type of method usually classifies fitness movements; for example, a convolutional neural network (CNN) method for detecting fitness movements [14] can classify the current movement well. Such classification methods do not suffer detection errors due to occlusion of body nodes, and as long as the training data are sufficient, fitness movements can be detected from various angles. However, this method usually requires more computation time and cannot detect the nodes of the body in detail. A fitness movement is usually a continuous movement, so if the body nodes cannot be detected in a timely and detailed manner, it is difficult to achieve real-time detection of the fitness movement. Therefore, this paper proposes a method that combines You Only Look Once Version 4 (Yolov4) and Mediapipe to detect fitness movements and uses a multilayer perceptron (MLP) to classify fitness states.
In our method, the deep transfer learning concept is used to train Yolov4 to detect fitness movements. Deep transfer learning is a classification approach that has been widely used in many research fields. Because data collection and labeling are costly, constructing large-scale, sophisticated datasets is difficult, and deep transfer learning can alleviate the problem of insufficient data. In previous studies, deep transfer learning methods have been applied to the detection of fitness movements [15]; that study corrected human motion estimates, which are prone to inaccuracy when complex human movements are detected. The method we propose likewise addresses the misclassification of fitness movements caused by the loss of Mediapipe nodes during complex movements. We recruited professionally trained fitness trainers and untrained fitness users, captured images of them, and used these to build an image database. This included labeling accurate user positions and fitness movements, which were then used to train Yolov4. Finally, Yolov4 was used to initially identify the type of fitness movement and was combined with Mediapipe to detect the nodes of the human body, achieving instant, high-precision detection of fitness movements and assessment of their completeness.

Proposed System Architecture
To detect the fitness status under various backgrounds, users, shooting angles, and lighting conditions, a sufficient image database is necessary. Collecting the data takes considerable time, and the image data must go through a lengthy labeling process. This paper proposes a method based on deep transfer learning [16] to detect fitness movements in real time and analyze the fitness status.
First, we collected a sufficient amount of fitness image data, established image database I, and trained Yolov4. We used Yolov4 to judge 12 types of fitness movements. Afterward, Mediapipe was used to detect the body nodes of fitness users, where different fitness movements have different nodes of interest (NoI). The current NoI was adjusted based on the detection results of Yolov4. By calculating the included angle of the NoI, the bending angle of the current joint can be obtained. The angles of these NoIs were stored as waveforms, and a waveform database W was created. The waveforms were then classified by the MLP to detect the fitness status. Finally, the classification performance of Yolov4 and the MLP was evaluated. The flowchart of the proposed method is shown in Figure 1, and the process is described as follows:

Step 1. Collect 12 types of fitness videos from 20 users and build a video database V.
Step 2. Divide the video database V into a training set V tr and a test set V tt.
Step 3. Save an image every 10 frames in V tr and V tt and create image databases I tr and I tt.
Step 4. Label the databases I tr and I tt according to the format of Yolov4 and obtain the databases L tr and L tt.
Step 5. Use L tr to train Yolov4, obtain the trained weights W f, and then use L tt to test the performance of Yolov4.
Step 6. Use V tt to detect the fitness type using Yolov4 and the body nodes of the fitness user using Mediapipe.
Step 7. Calculate the angle of the NoI for each fitness movement to obtain the angle of joint flexion.
Step 8. According to the fitness type detected by Yolov4, automatically adjust the position of the NoI.
Step 9. Output and store the angle calculated by the NoI as a waveform.
Step 10. Calculate the Completion NoI according to the included angle of the NoI and output it as a 1D waveform.
Step 11. Create a database W of the output waveforms and divide it into a training set W tr and a test set W tt.
Step 12. Use W tr to train the MLP and W tt to test the MLP's performance.
Step 13. Evaluate the classification performance of Yolov4 and the MLP.
In Figure 1, V tr, I tr, L tr, and W tr represent the data of the training set, and V tt, I tt, L tt, and W tt represent the data of the test set. V is the video database, I is the image database, L is the labeled image database, and W is the waveform database.
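As a concrete illustration of Steps 7 and 9, the NoI angle reduces to the included angle at a joint given three 2D node coordinates of the kind a pose estimator such as Mediapipe produces. The following is a minimal sketch of ours, not the paper's code; the function name and coordinates are illustrative:

```python
import math

def joint_angle(a, b, c):
    """Included angle ABC in degrees, with b at the joint (the NoI vertex)."""
    # Vectors from the joint to the two neighboring nodes.
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_t))

# A fully extended arm (shoulder, elbow, wrist collinear) gives 180 degrees:
print(joint_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0)))  # prints 180.0
# A right-angle bend at the elbow gives 90 degrees:
print(joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)))  # prints 90.0
```

Tracking this angle frame by frame yields the 1D waveform stored in database W.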

Dataset for Fitness Types Detection
Sufficient image data enable deep learning models to be trained better, so for the deep transfer learning method in this paper, a sufficient image database was important. We collected an image database I containing whole-body fitness movements of 12 types. The names and example images of the movements are shown in Table 1.

Table 1. Twelve types of fitness movements and names.

Video Dataset
These 12 movements are common fitness movements and are particularly suited to home fitness. Home fitness is usually performed with simple equipment, such as yoga mats, dumbbells, and horizontal bars, without the need for large fitness equipment. Large fitness equipment usually enforces a fixed movement trajectory, whereas these 12 types of fitness movements all have irregular movement trajectories; that is, different users complete them with different postures, which increases the difficulty of image recognition. It is therefore difficult to build an image database that can identify these fitness movements. Additionally, the image data must go through a labeling process, which is time-consuming and labor-intensive.
In the experiment, 20 users were selected, and videos of these users exercising were used to create a video database V. The videos contain the 12 types of fitness movements performed by the 20 users. The users were asked to perform the 12 exercises in a row, with each exercise repeated 3 to 5 times. For every repetition, the user was required to complete the full motion trajectory while the shooting angle was constantly changed. The video format was 30 frames per second with a resolution of 540 × 540 pixels. Table 2 shows the video time captured for each of the 20 users; the total shooting time was 62 min and 47 s.

In the selection of fitness exercises, we chose 12 exercises under the advice of fitness trainers, covering the chest, back, legs, abs, biceps, and triceps, plus a preparation movement. These movements can be performed with dumbbells or bare hands, and they are among the more introductory and popular fitness movements. To build the database, professionally trained fitness trainers and ordinary users assisted in the shooting. After screening, 10 voluntary users were selected; their movements were quite standard, so the captured images were used for the training set. Afterward, for fairness in the experiment, another 10 untrained users were recruited to assist in filming, and their images were used as the test set. Since most fitness users are not professionally trained, using these 10 untrained users for the test set made the evaluation fairer.
While the 20 users were being filmed, we instructed them to complete the 12 fitness movements, each performed 3 to 5 times according to the user's habits, with no rest in between and against the same background.
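The user-level split described above (trainer footage for training, untrained users for testing, with no user appearing in both sets) can be sketched as follows. The helper name and user IDs are ours, not the paper's:

```python
# Hypothetical sketch of a user-level train/test split: every clip from a
# given user goes entirely to one side, so no user leaks across the sets.
def split_by_user(clips, train_users):
    """clips: list of (user_id, clip_name); returns (train, test) lists."""
    train = [(u, c) for (u, c) in clips if u in train_users]
    test = [(u, c) for (u, c) in clips if u not in train_users]
    return train, test

# Placeholder IDs: users 1-10 are the trainers, 11-20 the untrained users.
clips = [(1, "squat_01"), (11, "squat_01"), (2, "plank_03")]
train, test = split_by_user(clips, train_users=set(range(1, 11)))
# train -> [(1, 'squat_01'), (2, 'plank_03')]; test -> [(11, 'squat_01')]
```

Splitting by user rather than by clip keeps the evaluation honest: the model never sees test-set users during training.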

Image Dataset
The method proposed in this paper was based on deep transfer learning, so training Yolov4 was important. Yolov4 needs images for training, so the videos in V needed to be converted into images. In the videos in V, each fitness movement was completed in 1 to 3 s on average [17]. To record the fitness trajectory from 0% to 100%, the experiment stored an image every 10 frames of video. Converting video to images in this way successfully recorded the entire fitness trajectory, as shown in Table 3. After training Yolov4 with these images, each user's continuous movements while exercising could be successfully detected. In addition to completely recording the motion trajectories, database I also contained images of each fitness movement from various shooting angles, as shown in Table 4. The images in database I thus contained complete fitness movement trajectories and images from various angles, which allowed Yolov4 to be trained better. We obtained a total of 13,160 fitness images from the 20 users.
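The every-10-frames sampling amounts to a simple stride over frame indices; at 30 fps it keeps 3 images per second of video. A minimal sketch, with a function name of our choosing:

```python
# Sketch: indices of the frames saved as images when storing one image
# every `stride` frames of a clip containing `total_frames` frames.
def sampled_frame_indices(total_frames, stride=10):
    return list(range(0, total_frames, stride))

# A 3 s clip at 30 fps (90 frames) yields 9 saved images.
indices = sampled_frame_indices(90)
# indices -> [0, 10, 20, 30, 40, 50, 60, 70, 80]
```

Since each movement takes 1 to 3 s, every repetition contributes roughly 3 to 9 images covering the trajectory from 0% to 100%.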

Image Dataset
The method proposed in this paper was based on deep transfer learning, so Yolov4 training was important. Yolov4 needs to use images for training, so the images in V needed to be converted to images. In the V video, each fitness movement was completed in 1 to 3 s on average [17]. To record the fitness track from 0% to 100%, the experiment stored the video as an image every 10 frames. Using this method to convert video to image can successfully record the entire fitness track, as shown in Table 3. After training Yolov4 with these images, each user's continuous movements while exercising are successfully detected. In addition to the complete recording of the motion trajectories, database I also contained images of various shooting angles of each fitness movement, as shown in Table  4. The images in database I contained complete fitness movement trajectories and images from various angles, which could better train Yolov4. We obtained a total of 13,160 fitness images from 20 users.  Database I contained fitness images of 20 users. To add more users, backgrounds, and shooting angles, this paper collected fitness images online. These images contained

Image Dataset
The method proposed in this paper was based on deep transfer learning, so Yolov4 training was important. Yolov4 needs to use images for training, so the images in V needed to be converted to images. In the V video, each fitness movement was completed in 1 to 3 s on average [17]. To record the fitness track from 0% to 100%, the experiment stored the video as an image every 10 frames. Using this method to convert video to image can successfully record the entire fitness track, as shown in Table 3. After training Yolov4 with these images, each user's continuous movements while exercising are successfully detected. In addition to the complete recording of the motion trajectories, database I also contained images of various shooting angles of each fitness movement, as shown in Table  4. The images in database I contained complete fitness movement trajectories and images from various angles, which could better train Yolov4. We obtained a total of 13,160 fitness images from 20 users.  Database I contained fitness images of 20 users. To add more users, backgrounds, and shooting angles, this paper collected fitness images online. These images contained

Image Dataset
The method proposed in this paper was based on deep transfer learning, so Yolov4 training was important. Yolov4 needs images for training, so the videos in V needed to be converted to images. In the V videos, each fitness movement was completed in 1 to 3 s on average [17]. To record the fitness track from 0% to 100%, the experiment stored the video as an image every 10 frames. Converting video to images in this way successfully recorded the entire fitness track, as shown in Table 3. After training Yolov4 with these images, each user's continuous movements while exercising were successfully detected. In addition to the complete recording of the motion trajectories, database I also contained images of each fitness movement from various shooting angles, as shown in Table 4. Because the images in database I contained complete fitness movement trajectories and images from various angles, they could better train Yolov4. We obtained a total of 13,160 fitness images from 20 users.

Table 3. The motion track recorded after converting the video database to images.

Database I contained fitness images of 20 users. To add more users, backgrounds, and shooting angles, this paper also collected fitness images online. These images were screenshots taken from fitness images and videos on platforms such as Youtube and Google and were stored in database I with a pixel size of 540 × 540. The total number of online images was 2964, in addition to the 13,160 fitness images from the 20 users. Therefore, database I contained a total of 16,124 images and 12 fitness types.
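The every-10-frames sampling rule can be sketched as follows. This is a minimal illustration of the selection logic only; the function name and the assumed 30 fps frame rate are not from the paper, and the actual video-decoding loop (e.g., with OpenCV) is omitted.

```python
def frames_to_save(total_frames, step=10):
    """Indices of the frames stored as images: every 10th frame,
    starting from the first, so the 0%-100% track is covered."""
    return list(range(0, total_frames, step))

# A movement lasting 1-3 s at an assumed 30 fps spans 30-90 frames,
# so each repetition yields roughly 3-9 stored images.
print(len(frames_to_save(30)))  # 3 images for a 1 s movement
print(len(frames_to_save(90)))  # 9 images for a 3 s movement
```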

Image Label
When database I was prepared, the images were labeled. This paper used the image labeling tool "LabelImg" and performed labeling according to the format required by Yolov4 training. The labeling process is shown in Figure 2. The labeling process generates a txt file for each image, which contains the object category and the coordinate position of the object. The labeled objects in the experiment included fitness users, objects on their bodies, and dumbbells.
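The per-image txt file follows the standard YOLO label convention: one line per object, holding the class index and the box center and size normalized to the image dimensions. A minimal sketch, assuming pixel-space corner coordinates as input:

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to a Yolov4 label line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# An example box on a 540 x 540 online image:
print(to_yolo_line(2, 135, 135, 405, 405, 540, 540))
# 2 0.500000 0.500000 0.500000 0.500000
```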


Training and Testing Dataset Formation
This paper collected a complete image database to train Yolov4 and implemented deep transfer learning so that Yolov4 could better detect fitness movements. To fairly verify the performance of deep transfer learning, the experiments were divided into training and test sets. The training set V_tr contained the fitness videos of 10 users, no. 1 to no. 10 in Table 2. The test set V_tt contained the fitness videos of the 10 users from no. 11 to no. 20. The videos in V_tr had a longer shooting time because they contained more shooting angles, which enabled Yolov4 to detect fitness movements from more angles. The videos in V_tt contained only one camera angle and were used to test performance. After that, the videos in video database V were stored as images every 10 frames, and image database I was established. The images in the training set I_tr came from V_tr and online, while the images in the test set I_tt came from V_tt. Finally, the image database I was labeled to generate the training set L_tr and test set L_tt, and the number of images is shown in Table 5. L_tr contained a total of 12,301 images for training Yolov4, and L_tt contained a total of 3823 images for testing its performance.
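The user-level split described above can be sketched as below. The (user_no, video_path) pairing and the file names are illustrative assumptions, since the paper does not specify its file organization.

```python
def split_by_user(videos):
    """videos: iterable of (user_no, video_path) pairs.
    Users 1-10 form the training set V_tr; users 11-20 form
    the test set V_tt, so no user appears in both sets."""
    v_tr = [path for user, path in videos if 1 <= user <= 10]
    v_tt = [path for user, path in videos if 11 <= user <= 20]
    return v_tr, v_tt

# Hypothetical file naming for the 20 users:
videos = [(u, f"user{u:02d}.mp4") for u in range(1, 21)]
v_tr, v_tt = split_by_user(videos)
print(len(v_tr), len(v_tt))  # 10 10
```

Splitting by user rather than by frame keeps every frame of a given person on one side of the split, so test performance reflects generalization to unseen users.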

Body Nodes Detection
Mediapipe is an open source tool published by Google in 2019. This tool is used for image vision detection. Mediapipe supports many image-vision-based human detection methods, such as face recognition, human body recognition, and gesture recognition [18]. Because Mediapipe supports a variety of programming languages, as well as open source databases, and has high accuracy and fast computing speed, it has been widely used.
This paper used the BlazePose algorithm provided by Mediapipe, a human body detection method that can calculate 33 nodes of the human body [19], as shown in Figure 3. The algorithm is aimed mainly at the detection of human body posture and can calculate the coordinate position of each joint of the human body. There are 33 such coordinates, numbered from 0 to 32. Except for coordinate 0, "nose," all other coordinates are symmetrical. Fitness movements are carried out mainly through the movement of the joints of the body, so Mediapipe is quite suitable for detecting the nodes of joint movements. The human body detection model has already been trained, so no additional data collection was required to train it. Mediapipe is therefore well suited to detecting fitness movements.
Currently, there is a method of using Mediapipe to identify fitness movements. This method first uses Mediapipe to identify the nodes of the whole body and obtain their coordinate positions. Each coordinate is then used to detect the current fitness category with a K-nearest-neighbor (K-NN) classifier [20]. Using this method, it is simple to count the nodes of the body and detect the fitness category.
However, when performing fitness movements, many joints of the body are blocked, which leads to the loss of body nodes detected by Mediapipe. As shown in Figure 4a,b, when exercising with the shooting angle on the side, only half of the body nodes were detected, and the other body nodes were lost. In Figure 4c, the wrist is blocked by the fitness equipment, leading to a detection node error. The loss and error of body nodes are likely to cause misjudgments when the K-NN algorithm is used to classify fitness types. Yolov4 can solve this problem: since it is a detection method based on image vision, it does not need to rely on node detection of the body. Therefore, as long as the training images are sufficient and include a variety of angles, users, and backgrounds, the classification performance for fitness types can be better.
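This failure mode can be flagged programmatically: each BlazePose landmark carries a visibility score, so counting low-visibility landmarks identifies frames in which node-based classification would be unreliable. A minimal sketch over plain (x, y, visibility) tuples; the 0.5 threshold is chosen here for illustration:

```python
def reliable_nodes(landmarks, threshold=0.5):
    """landmarks: list of (x, y, visibility) tuples, one per body node.
    Returns the indices of nodes whose visibility meets the threshold."""
    return [i for i, (_, _, vis) in enumerate(landmarks) if vis >= threshold]

# Simulated side view: roughly half of the 33 BlazePose nodes occluded.
side_view = [(0.5, 0.5, 0.9 if i % 2 == 0 else 0.1) for i in range(33)]
print(len(reliable_nodes(side_view)))  # 17 of 33 nodes usable
```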


Node Angle Detection
This paper combined two methods, Yolov4 and Mediapipe, to detect fitness movements: Yolov4 detects the fitness type, and Mediapipe detects the body nodes. The results of the two methods simultaneously detecting fitness movements are shown in Table 6. When the user performs fitness movements, Yolov4 and Mediapipe detect them together. Even if the body is blocked by fitness equipment or a node is lost due to a side shooting angle, the detection of the fitness category is not affected. After Yolov4 and Mediapipe detected a movement, the key node of the fitness movement, the NoI, was calculated. The NoI of each fitness movement is shown in Table 6, where the pink node is the NoI of each movement. From the angle of the NoI, the completion degree of the current fitness movement can be determined. Through the coordinate positions of the two yellow nodes P1 and P3 and the NoI node P2 in Table 6, the included angle of the NoI can be calculated as follows:

Angle_NoI = cos^-1( (P2P1 · P2P3) / (|P2P1| |P2P3|) )

Here, P2P1 is the vector from P2 to P1, and P2P3 is the vector from P2 to P3. Through this method, the angle of the NoI, Angle_NoI, can be calculated, and the fitness completion degree of the current user can be known according to Angle_NoI.
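The included angle at the NoI follows the standard angle-between-vectors formula. A sketch using 2D image coordinates; the function name and example points are illustrative:

```python
import math

def angle_noi(p1, p2, p3):
    """Included angle (in degrees) at p2, between the vectors
    p2->p1 and p2->p3; p1 and p3 are the two yellow nodes and
    p2 is the NoI node."""
    v1 = (p1[0] - p2[0], p1[1] - p2[1])
    v2 = (p3[0] - p2[0], p3[1] - p2[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norms = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(dot / norms))

# Three joints at a right angle, e.g., hip-knee-ankle mid-squat:
print(round(angle_noi((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))))  # 90
```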
The NoI is automatically adjusted according to the fitness type detected by Yolov4. As shown in Table 6, when the user did a squat, the NoI was adjusted to the position of the knee; when the user did a biceps-curl, the NoI was adjusted to the position of the elbow. Therefore, the NoI of each exercise differs, as does the Angle_NoI required to complete the exercise. The positions of P1, P2, and P3 and the angle range of Angle_NoI for each fitness movement are shown in Table 7 [21]. Start_Angle_NoI indicates the initial angle of the joint when the exercise is ready, and End_Angle_NoI indicates the final bending angle of the joint when the exercise is completed. Among them, "standing" is the preparation movement, so when the user's movement is "standing," the NoI is not adjusted. The angle ranges were set according to the Angle_NoI values calculated for the users in V_tr when exercising. In summary, the NoI was automatically adjusted by the fitness type detected by Yolov4, Angle_NoI was calculated from the angle of the NoI, and the fitness completion of the current user was determined through Angle_NoI. Using this method, the current fitness movement can be detected instantly and accurately.

Table 7. The positions of P1, P2, and P3 corresponding to the body nodes in Figure 3.
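Mapping Angle_NoI onto a completion percentage between Start_Angle_NoI and End_Angle_NoI can be sketched as below. The linear interpolation and the example squat range (170 deg down to 90 deg) are assumptions for illustration, not the paper's exact Table 7 values.

```python
def completeness(angle, start_angle, end_angle):
    """Linear completion of one repetition: 0% at the ready angle
    (Start_Angle_NoI), 100% at the fully bent angle (End_Angle_NoI),
    clamped to [0, 100]."""
    frac = (start_angle - angle) / (start_angle - end_angle)
    return max(0.0, min(1.0, frac)) * 100.0

# Assumed squat range: knee from 170 deg (standing) to 90 deg (bottom).
print(completeness(170.0, 170.0, 90.0))  # 0.0
print(completeness(130.0, 170.0, 90.0))  # 50.0
print(completeness(90.0, 170.0, 90.0))   # 100.0
```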
Through this method, the angle of NoI can be calculated , and the fitness completion degree of the current user can be known according to . At this time, even if the body is blocked by fitness equipment or a node is lost due to the side shooting angle, the detection of the fitness category is not affected. After Yolov4 and Mediapipe detected movements, the key nodes of each fitness movement, that is, NoI, was calculated. The NoI of each fitness movement is shown in Table 6, where the pink node is the NoI of each movement. The angle of the NoI can be calculated, and the completion degree of the current fitness movement can be determined. Through the coordinate positions of the two yellow nodes and and the NoI node in Table 6, the included angle of the NoI can be calculated, and the calculation formula is as follows: Here is the vector of to , and is the vector of to . Through this method, the angle of NoI can be calculated , and the fitness completion degree of the current user can be known according to . At this time, even if the body is blocked by fitness equipment or a node is lost due to the side shooting angle, the detection of the fitness category is not affected. After Yolov4 and Mediapipe detected movements, the key nodes of each fitness movement, that is, NoI, was calculated. The NoI of each fitness movement is shown in Table 6, where the pink node is the NoI of each movement. The angle of the NoI can be calculated, and the completion degree of the current fitness movement can be determined. Through the coordinate positions of the two yellow nodes and and the NoI node in Table 6, the included angle of the NoI can be calculated, and the calculation formula is as follows: Here is the vector of to , and is the vector of to . Through this method, the angle of NoI can be calculated , and the fitness completion degree of the current user can be known according to . 
At this time, even if the body is blocked by fitness equipment or a node is lost due to the side shooting angle, the detection of the fitness category is not affected. After Yolov4 and Mediapipe detected movements, the key nodes of each fitness movement, that is, NoI, was calculated. The NoI of each fitness movement is shown in Table 6, where the pink node is the NoI of each movement. The angle of the NoI can be calculated, and the completion degree of the current fitness movement can be determined. Through the coordinate positions of the two yellow nodes and and the NoI node in Table 6, the included angle of the NoI can be calculated, and the calculation formula is as follows: Here is the vector of to , and is the vector of to . Through this method, the angle of NoI can be calculated , and the fitness completion degree of the current user can be known according to . The NoI is automatically adjusted according to the type of fitness detected by Yolov4. As shown in Table 6, when the user did a squat, the NoI was adjusted to the position of the knee. When the user did a biceps-curl, the NoI was adjusted to the position of the elbow. Therefore, the NoI of each exercise is different from the required to complete the exercise. The position of , , and and the angle range of for each fitness movement are shown in Table 7 [21].
_ indicates the initial angle of the joint when the exercise is ready, and _ indicates the final bending angle of the joint when the exercise is completed. Among them, "standing" is the preparation movement, so when the user's movement is "standing," the NoI does not change and adjust.
was adjusted according to calculated by the user in when exercising. The NoI was automatically adjusted by the fitness type detected by Yolov4, and was calculated according to the angle of the NoI. Finally, the fitness completion of the current user was determined through . Using this method, the current fitness movement can be detected instantly and accurately. Table 7. The position of , , and corresponding to the body node in Figure 3. The NoI is automatically adjusted according to the type of fitness detected by Yolov4. As shown in Table 6, when the user did a squat, the NoI was adjusted to the position of the knee. When the user did a biceps-curl, the NoI was adjusted to the position of the elbow. Therefore, the NoI of each exercise is different from the required to complete the exercise. The position of , , and and the angle range of for each fitness movement are shown in Table 7 [21].

Fitness
_ indicates the initial angle of the joint when the exercise is ready, and _ indicates the final bending angle of the joint when the exercise is completed. Among them, "standing" is the preparation movement, so when the user's movement is "standing," the NoI does not change and adjust.
was adjusted according to calculated by the user in when exercising. The NoI was automatically adjusted by the fitness type detected by Yolov4, and was calculated according to the angle of the NoI. Finally, the fitness completion of the current user was determined through . Using this method, the current fitness movement can be detected instantly and accurately. Table 7. The position of , , and corresponding to the body node in Figure 3. The NoI is automatically adjusted according to the type of fitness detected by Yolov4. As shown in Table 6, when the user did a squat, the NoI was adjusted to the position of the knee. When the user did a biceps-curl, the NoI was adjusted to the position of the elbow. Therefore, the NoI of each exercise is different from the required to complete the exercise. The position of , , and and the angle range of for each fitness movement are shown in Table 7 [21].

Fitness
_ indicates the initial angle of the joint when the exercise is ready, and _ indicates the final bending angle of the joint when the exercise is completed. Among them, "standing" is the preparation movement, so when the user's movement is "standing," the NoI does not change and adjust.
was adjusted according to calculated by the user in when exercising. The NoI was automatically adjusted by the fitness type detected by Yolov4, and was calculated according to the angle of the NoI. Finally, the fitness completion of the current user was determined through . Using this method, the current fitness movement can be detected instantly and accurately. Table 7. The position of , , and corresponding to the body node in Figure 3. The NoI is automatically adjusted according to the type of fitness detected by Yolov4. As shown in Table 6, when the user did a squat, the NoI was adjusted to the position of the knee. When the user did a biceps-curl, the NoI was adjusted to the position of the elbow. Therefore, the NoI of each exercise is different from the required to complete the exercise. The position of , , and and the angle range of for each fitness movement are shown in Table 7 [21].

Fitness
_ indicates the initial angle of the joint when the exercise is ready, and _ indicates the final bending angle of the joint when the exercise is completed. Among them, "standing" is the preparation movement, so when the user's movement is "standing," the NoI does not change and adjust.
was adjusted according to calculated by the user in when exercising. The NoI was automatically adjusted by the fitness type detected by Yolov4, and was calculated according to the angle of the NoI. Finally, the fitness completion of the current user was determined through . Using this method, the current fitness movement can be detected instantly and accurately. Table 7. The position of , , and corresponding to the body node in Figure 3. The NoI is automatically adjusted according to the type of fitness detected by Yolov4. As shown in Table 6, when the user did a squat, the NoI was adjusted to the position of the knee. When the user did a biceps-curl, the NoI was adjusted to the position of the elbow. Therefore, the NoI of each exercise is different from the required to complete the exercise. The position of , , and and the angle range of for each fitness movement are shown in Table 7 [21].

Fitness (NoI) _ _
_ indicates the initial angle of the joint when the exercise is ready, and _ indicates the final bending angle of the joint when the exercise is completed. Among them, "standing" is the preparation movement, so when the user's movement is "standing," the NoI does not change and adjust.
Table 7. The positions of P1, P2, and P3 corresponding to the body nodes in Figure 3.

Fitness                  P1   P2   P3   Start_Angle_NoI   End_Angle_NoI
Squat                    24   26   28   170               100
Pull-up                  12   14   16   170               80
Push-up                  12   14   16   170               80
Sit-up                   12   24   26   120               100
Standing                 –    –    –    –                 –
Biceps-curl              12   14   16   160               80
Bulgarian-split-squat    24   26   28   160               110

(For "standing," the preparation movement, the NoI is not adjusted, so no values apply.)

After Yolov4 and Mediapipe detect the fitness exercise, the user's fitness type, body joint nodes, and NoI can be obtained. The included angle at the NoI gives Angle_NoI, and the change in Angle_NoI reveals the speed and completion of the user's movement. The fitness completion degree was calculated from Start_Angle_NoI and End_Angle_NoI in Table 7. The fitness completion degree Completion_NoI is calculated as follows:

Completion_NoI = (Start_Angle_NoI − Angle_NoI) / (Start_Angle_NoI − End_Angle_NoI)

Here Completion_NoI is between 0 and 1, Start_Angle_NoI indicates the initial angle set for Angle_NoI when the fitness movement begins, and End_Angle_NoI indicates the final angle of Angle_NoI when the fitness movement is completed. Completion_NoI indicates the degree of completion of the fitness movement. Generally, a complete fitness repetition increases Completion_NoI from 0% to 100% and then decreases it back to 0%, and this change is stable and slow [22].
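A minimal sketch of these two computations, assuming 2D landmark coordinates: the included angle at P2 is the standard three-point joint angle (the paper's exact Angle_NoI equation is not reproduced in this excerpt), and Completion_NoI is linearly interpolated between the start and end angles and clamped to [0, 1]:

```python
import math

def joint_angle(p1, p2, p3):
    """Included angle at p2 (degrees) formed by p1-p2-p3; points are (x, y).
    A standard formulation, assumed here for illustration."""
    a1 = math.atan2(p1[1] - p2[1], p1[0] - p2[0])
    a2 = math.atan2(p3[1] - p2[1], p3[0] - p2[0])
    ang = abs(math.degrees(a1 - a2))
    return 360 - ang if ang > 180 else ang

def completion(angle, start_angle, end_angle):
    """Completion_NoI in [0, 1]: 0 at Start_Angle_NoI, 1 at End_Angle_NoI."""
    c = (start_angle - angle) / (start_angle - end_angle)
    return max(0.0, min(1.0, c))
```

For a squat (Start_Angle_NoI = 170, End_Angle_NoI = 100), a straight knee gives completion 0 and a fully bent knee gives completion 1.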

Fitness Completeness Definition and Dataset Formation
All videos contained in V_tr and V_tt were detected by Yolov4 and Mediapipe, and then Completion_NoI of each fitness movement was calculated. Completion_NoI is displayed in the form of a 1D signal waveform, from which database W was created. The 1D waveforms produced from the videos of V_tr and V_tt were stored as training set W_tr and test set W_tt, respectively. The way database W was established is shown in Figure 5: each 1D signal record was built from Completion_NoI over a window of 100 frames, with a step of 50 frames. As shown in Table 8, after databases W_tr and W_tt were established, W_tr contained a total of 657 records and W_tt contained a total of 587 records. These data covered 12 types of fitness movements.
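The 100-frame window with a 50-frame step described above can be sketched as follows (function name is illustrative):

```python
def make_windows(signal, size=100, step=50):
    """Cut a per-frame Completion_NoI signal into fixed-length records,
    as used to build databases W_tr and W_tt (window 100, step 50)."""
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, step)]
```

A 250-frame video thus yields four overlapping 100-frame records starting at frames 0, 50, 100, and 150.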
To perform fitness movements completely, there must be a complete range of motion. Therefore, this paper simply divided the 1D waveform data into three categories: complete, no-complete, and no-movement. The three types of waveforms are shown in Figure 6. These categories were judged as follows [21,22]:

• Complete: Completion_NoI rose from 0% to 100% and then dropped back to 0%, during which the change was stable and slow. In addition, when the value was between 0% and 100%, there was a short stop.
• No-complete: Completion_NoI did not rise to 100% or drop to 0%, and it did not stop at 0% or 100%. In addition, the value changed unstably and quickly.
• No-movement: Completion_NoI showed almost no change, that is, the state of preparing for a fitness movement.
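The paper classifies these waveforms with an MLP; purely to illustrate the criteria above, a hypothetical rule-of-thumb labeler (thresholds are assumptions) might look like:

```python
def label_window(w, move_eps=0.05, jump_eps=0.15):
    """Illustrative heuristic only (the paper uses an MLP for this task).
    w: Completion_NoI values in [0, 1] for one 100-frame window."""
    if max(w) - min(w) < move_eps:                      # almost no change
        return "no-movement"
    # "complete" requires the full 0%-100% range and a stable, slow change
    full_range = min(w) <= move_eps and max(w) >= 1 - move_eps
    smooth = all(abs(b - a) <= jump_eps for a, b in zip(w, w[1:]))
    return "complete" if (full_range and smooth) else "no-complete"
```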


Fitness Type Detection
Yolo has achieved quite good performance in the task of object detection, with good detection speed and accuracy [23], so it is widely used in real-time object detection tasks [24]. Fitness movements are continuous, and each movement is usually completed in seconds, so a method that detects objects in real time was needed, and Yolo fit the bill.
Yolo continues to improve with each release, with better object detection accuracy and speed. Yolov4 was released in April 2020 [25] and has received great attention and discussion. Compared with Yolov3, Yolov4 improves AP by 10% and frames per second (FPS) by 12% and uses the Cross Stage Partial Darknet53 (CSPDarknet53) network architecture [26], which enables Yolov4 to provide faster detection with good accuracy. In this paper, Darknet was used to train Yolov4. Darknet is an open-source neural network framework [27] written in C and CUDA that can train Yolov4 simply and quickly, effectively reducing training time. Darknet supports computing on the CPU and GPU, and using the GPU brings a faster training speed.
The most important part of the deep transfer learning algorithm proposed in this paper was the training of Yolov4. The complete fitness databases L_tr and L_tt were used to train and test Yolov4. L_tr was added to Darknet and used to train Yolov4 to obtain the weights W_f. After that, L_tt was added to Darknet, and W_f was used to test the performance of Yolov4. The test results are compared and discussed in later sections.
The videos of database V_tt were used to test the performance of Yolov4; the detection results on V_tt are shown in Table 9. Each fitness category was successfully detected with a fairly high confidence score. This means that database L_tr collected in this paper had enough image data and that Yolov4 was fully trained. After detection by Yolov4, the user's fitness movement category and location were detected in real time.
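With Darknet, training and evaluation are typically run from the command line; a rough sketch of the two steps above, where the file names (obj.data, yolov4-custom.cfg) and paths are illustrative assumptions:

```shell
# Train Yolov4 from the pretrained yolov4.conv.137 backbone;
# -map periodically evaluates mAP on the validation list during training.
./darknet detector train data/obj.data cfg/yolov4-custom.cfg yolov4.conv.137 -map

# Evaluate the trained weights (W_f) on the test list referenced in obj.data.
./darknet detector map data/obj.data cfg/yolov4-custom.cfg backup/yolov4-custom_best.weights
```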

Fitness Completeness Detection
W_tr and W_tt contained 1D signals divided into three categories. Classifying 1D signals with machine learning is a relatively simple task and therefore does not require a complex network model. This paper used an MLP to classify these 1D signals [28]. The MLP is a supervised-learning artificial neural network (ANN) model [29] that can quickly solve complex classification problems. The network contains an input layer, middle hidden layers, and a final output layer. This paper used three hidden layers and two dense layers and used Dropout to reduce overfitting. The W_tr data were used to train the MLP, after which W_tt was used to test its performance.
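The described MLP (a 100-value input, three hidden layers plus two dense layers with Dropout, and a 3-class output) might be sketched as a NumPy forward pass; the layer widths and dropout rate below are assumptions, since the excerpt does not specify them:

```python
import numpy as np

def build_mlp(rng, sizes=(100, 64, 64, 64, 32, 3)):
    """Random weights for an MLP: 100-value input, three hidden layers plus
    two dense layers, 3-class output. Layer widths are assumptions."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def forward(layers, x, drop=0.0, rng=None):
    """Forward pass; inverted Dropout (rate `drop`) is applied to hidden
    activations during training only (drop=0.0 at inference)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)                # ReLU
            if drop and rng is not None:
                mask = rng.random(x.shape) >= drop
                x = x * mask / (1.0 - drop)
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # softmax over 3 classes
    return e / e.sum(axis=-1, keepdims=True)
```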

Experimental Setup
This paper used Yolov4 and Mediapipe to detect fitness movements and finally used the MLP to classify the status of the fitness movements. Yolov4 and the MLP were trained on the databases L tr and W tr established in this paper. Yolov4 was trained using the yolov4.conv.137 pre-trained weights in the Darknet framework, which involves many experimental settings; the settings used for training Yolov4 are shown in Table 10. L tr was used to train Yolov4, and the weights W f were obtained after training. The iterations were set to 100,000, but the number of iterations required differs with the data type, the number of categories, and the data quantity. To determine the iterations required for the database L tr built in this paper, Yolov4 was trained with iterations of 10,000, 20,000, ..., 100,000 and its performance was compared. The experimental settings for training the MLP are shown in Table 11; W tr was used to train the MLP. Since each sample in W tr is a 1D signal of length 100, rather than an image with width, height, and channels as in Yolov4, the input size of the MLP was set to 100.

Evaluation Index
When Yolov4 and the MLP were trained, their performance was tested. Yolov4 produced 12 detection classes and the MLP produced 3, both of which are classification problems in machine learning. According to the classification results of each category, true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs) were obtained: a TP is a positive sample correctly classified as positive, an FP is a negative sample incorrectly classified as positive, an FN is a positive sample incorrectly classified as negative, and a TN is a negative sample correctly classified as negative. From the numbers of TPs, FPs, FNs, and TNs, the classification performance of Yolov4 and the MLP can be understood; a larger number of TPs indicates more correct classifications. Accuracy, precision, recall, and the F1-score were then calculated as follows:
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 × Precision × Recall / (Precision + Recall)
In addition, the predicted box detected by Yolov4 was evaluated using the intersection over union (IoU):
IoU = Area of Overlap / Area of Union
Here, the area of overlap is the area where the actual box overlaps the estimated box, and the area of union is the area of the union of the actual and estimated boxes. The larger the overlap between the estimated and actual boxes, the better the performance.
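As a generic sketch (not the authors' code), the four metrics and the IoU above can be computed directly from the counts and box coordinates; the (x1, y1, x2, y2) box format is an assumption for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from the four confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)  # overlap / union
```

For multi-class results such as the 12 fitness categories, these counts are tallied per class and the metrics are averaged.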
In addition to these performance evaluation indicators, the mean average precision (mAP), FPS, and Yolov4 training time were also used as evaluation indicators. There are many methods of evaluating the mAP; this paper used the Pascal VOC 2010-2012 mAP algorithm [30]. The FPS is the number of frames per second that Yolov4 can process when detecting the videos of L tt . Finally, the training time of Yolov4 was also evaluated; since training time is affected by the iteration setting, later sections evaluate and compare the detection performance at different iterations.
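The Pascal VOC 2010-2012 protocol computes the average precision (AP) of one class as the area under the precision-recall curve with all-points interpolation; the mAP is then the mean of the per-class APs over the 12 movement classes. A minimal sketch of that interpolation, as a generic illustration rather than the evaluation script used in the paper:

```python
def voc_ap(recalls, precisions):
    """All-points-interpolated AP (Pascal VOC 2010-2012 style).

    `recalls` must be sorted ascending, with `precisions` aligned to it.
    """
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left
    # (the "precision envelope").
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the area of each rectangle under the envelope.
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap
```

A perfect detector (precision 1.0 at recall 1.0) yields an AP of 1.0; averaging `voc_ap` over all classes gives the mAP.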

Results and Discussion
The performance of Yolov4 is shown first. Before the performance evaluation, the iteration settings for Yolov4 training were compared: Yolov4 was trained with 10 iteration settings, 10,000, 20,000, ..., 100,000. Although loss generally continues to decrease as the iteration setting increases, this also increases the time cost of training and carries a risk of overfitting, so a suitable iteration count should be found. L tr was used to train Yolov4, and L tt was used to test its performance. The performance of Yolov4 at different iterations is shown in Figures 7 and 8; in the comparison, the IoU thresholds were set at 0.5 and 0.75. The experimental results showed that high performance was obtained when the iterations were set to 50,000. Beyond 50,000 iterations the performance decreased and did not improve again up to 100,000 iterations. Considering the time cost of training, the iterations were therefore set to 50,000.
There were two sets of experimental results: first, the performance of using Yolov4 to detect fitness movement categories, and second, the performance of using the MLP to classify the 1D signal waveforms of fitness movements. L tt was used to test the performance of Yolov4, and W tt was used to test the performance of the MLP. The experimental results of Yolov4 are shown in Table 12. The mAP achieved high performance; this is because each image in L tt contained only one fitness user, that is, only one fitness user appears in each image. The high mAP also indicates that the database L tr trained Yolov4 thoroughly. When the IoU threshold was set to 0.5, the accuracy was 98.56%, precision was 97.9%, recall was 98.56%, and the F1-score was 98.23%. The FPS averaged 17.5 when running on a laptop with an i7-1185G7 CPU and a GTX-1650Ti GPU.
This means that with 12 fitness movement categories, Yolov4's fitness category detection achieves quite high performance and can process in real time. To avoid detection errors over a few frames, a buffer of 15 frames was set in the detection of V tt , which is equivalent to a buffer time of 0.5 s. Only when 15 consecutive frames are detected as a different class will the current fitness type detected by Yolov4, and hence the position of the NoI, change. That is, according to the experimental results in Table 12, when V tt is detected by Yolov4, it is difficult for the fitness type to be detected incorrectly.
The results of classifying W tt using the MLP are shown in Table 13. The accuracy was 92.84%, precision was 92.85%, recall was 92.84%, and the F1-score was 92.83%. Although this classification did not achieve high performance, it could still effectively classify the fitness status. The confusion matrix of the MLP classification results is shown in Figure 9, which shows that the classification results for the complete and no-complete categories are poor. This is because, when the videos in database V were shot, the users were not specifically required to perform the complete and no-complete fitness movements, so the difference between these two categories is not large. Although a small amount of the W tt data is classified into different categories, the wave patterns are similar. Although this factor degrades the classification performance of the MLP, it still provides valid classification results.
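The 15-frame buffer described above behaves like a label debouncer: the reported class switches only after a fixed number of consecutive frames disagree with it. A minimal sketch under that assumption, with a hypothetical class name (the paper does not publish its implementation):

```python
class LabelDebouncer:
    """Report a stable class label; switch only after `hold` consecutive
    frames agree on a different label (the paper uses hold = 15, ~0.5 s)."""

    def __init__(self, hold=15):
        self.hold = hold
        self.current = None    # label currently reported
        self.candidate = None  # differing label being counted
        self.count = 0         # consecutive frames of `candidate`

    def update(self, label):
        if self.current is None:
            self.current = label           # first frame: adopt immediately
        elif label == self.current:
            self.candidate, self.count = None, 0   # agreement resets buffer
        elif label == self.candidate:
            self.count += 1
            if self.count >= self.hold:    # sustained disagreement: switch
                self.current, self.candidate, self.count = label, None, 0
        else:
            self.candidate, self.count = label, 1  # new differing label
        return self.current
```

With `hold=15`, a handful of misdetected frames never changes the reported fitness type or moves the NoI.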
In this paper, a method based on deep transfer learning was used to build a complete database to train and test Yolov4. Yolov4 is an image detection method based on deep learning. This paper used Yolov4 to classify fitness movements.
The methods previously introduced by Hobeom Jeon et al. [11] and Ali Bidaran et al. [14] are both image-based motion detection methods classified by machine learning. The methods of Yongpan Zou et al. [31] and Crema et al. [32] have the user wear an electronic wearable device and classify fitness movements from the device's signals. Since all of these methods classify fitness movements, the method proposed in this paper was compared with them. The experimental results are shown in Tables 14 and 15. Our method had an mAP of 99.71% and an accuracy of 98.56%. Compared to Hobeom Jeon et al. [11], the mAP improved performance by 9.21%. Compared to Yongpan Zou et al. [31], Crema et al. [32], and Ali Bidaran et al. [14], accuracy improved performance by 2.49%, 4.2%, and 5.66%, respectively. These results show that our deep-transfer-learning-based method provides better classification performance and leads to better detection results for subsequent fitness movements.
In the analysis of fitness movements, we divided the completion of fitness movements into three categories and used the MLP to classify them. The experimental results are shown in Table 16; the accuracy of our method was 92.84%. Compared to the method of Yongpan Zou et al. [31], accuracy improved the performance by 2.14%. Our method is also cheaper and does not have to consider the power consumption and hygiene issues of wearable devices.
Table 14. Comparison of the mAP for fitness movement classification.

Table 15. Comparison of accuracy for fitness movement classification.

Evaluation Index            Accuracy
Ours                        98.56%
Yongpan Zou et al. [31]     96.07%
Crema et al. [32]           94.36%
Ali Bidaran et al. [14]     92.9%
Compared to the method of Jiangkun Zhou et al. [12], accuracy improved the performance by 29.65%. Experimental results showed that our proposed method has better performance.
In [34], the Kinect sensor was used to analyze fitness movements. Compared to our method, it adds image depth information but also increases the cost. That method can successfully detect fitness movements, but its performance was not reported in the experimental results, so it cannot be compared.
According to the experimental results and the performance comparison with other methods, the method proposed in this paper makes the following contributions:

• This paper proposed a low-cost and effective method for current research on image-based fitness motion detection. The method has the advantages of low cost and real-time processing, and images captured by ordinary smartphones and network cameras can be used to detect fitness movements. The experimental results prove that the proposed method can be practically applied to a variety of different users, and the detection is effective and immediate.
• The method proposed in this paper does not require a professionally trained fitness trainer but instead trains Yolov4 and detects fitness movements through deep transfer learning. To achieve high-precision detection and a fair performance evaluation, this paper collected images of 20 users together with online images for training and testing Yolov4. The experimental results show that the database collected in this paper is sufficient to train Yolov4 and that it can detect fitness movements under different angles, backgrounds, and users' shots.
• This paper proposed a method combining Yolov4 and Mediapipe to detect fitness movements. Using Yolov4 to detect fitness categories reduces errors caused by missing nodes and allows fitness types to be detected from more angles. Further using Mediapipe to detect body nodes reveals the movement changes of the body in more detail, and the position of the NoI is adjusted automatically according to the fitness type detected by Yolov4, which effectively reduces the misjudgment of invalid body nodes and focuses on valid nodes.
• This paper proposed a method of using the MLP to classify the 1D signal waveforms of fitness movements. This relies on the method that automatically adjusts the NoI, calculates the angle of the NoI, and detects the fitness completeness and speed of the fitness user. With this method, the current fitness state can be classified simply and effectively, and basic fitness state classification results can be obtained for fitness users.
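The NoI joint-angle calculation mentioned in the contributions above can be sketched as the angle at a middle landmark formed by its two adjacent landmarks. The function name and the plain 2D coordinates are illustrative (Mediapipe returns normalized landmark coordinates per frame); sampling this angle over time yields the 1D fitness signal.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c.

    Points are (x, y) pairs, e.g. Mediapipe landmark coordinates for
    shoulder, elbow, and wrist when measuring the elbow angle.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_ang = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_ang))
```

Evaluating this at the NoI on every frame produces the angle waveform that the MLP later classifies.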
This method can detect fitness movements in real time, but there are still many areas that can be improved, which can be considered in the future as follows:

• The deep learning methods used in this paper include Yolov4, Mediapipe, and the MLP. In the future, adding other machine learning algorithms, such as a genetic algorithm, could further improve performance [35].
• In this paper, 20 users were selected to assist in shooting fitness images, and an image database was established. However, labeling these images required a lot of labor, and when shooting them, the background was usually the same. Therefore, in the future, we will consider using image processing to automatically identify fitness users and label them automatically. This can greatly reduce the personnel required and effectively increase the number of images.
• In this study, 20 users and 12 fitness movements were used for training, and another 10 users were used to test our system. In the future, we will increase the number of users and the number of fitness movements.

Conclusions
This paper proposed a method for detecting fitness movements based on deep transfer learning, an image-based method with the advantages of low cost, timeliness, and accuracy. The method is divided into four stages: image database collection, Yolov4 detection of fitness categories, Mediapipe detection of body nodes and joint angles, and MLP classification of fitness 1D signal waveforms. This paper collected image data from 20 users as well as online images to train Yolov4 and detect the type of fitness movement. After that, Yolov4 and Mediapipe were combined to further detect the nodes of the body, which were used to calculate the joint angle of the body NoI during fitness. Finally, the change in angle was converted into a 1D fitness signal waveform, and the MLP was used to classify it. The experimental results showed that Yolov4 trained with deep transfer learning has good classification performance for the detection of fitness movements: the mAP was 99.71%, accuracy was 98.56%, precision was 97.9%, recall was 98.56%, the F1-score was 98.23%, and the average FPS was 17.5, which means its classification is timely and accurate. This also means that the image database collected in this paper can fully train Yolov4 and can produce good classification results for subsequent research on fitness detection. In the experiment on MLP classification of fitness 1D signal waveforms, the accuracy was 92.84%, precision was 92.85%, recall was 92.84%, and the F1-score was 92.83%; this classified the 1D signal waveforms of fitness movements and obtained valid results. Compared to other methods, our proposed method has better performance. The experimental results show that the method proposed in this paper can effectively, timely, and accurately classify fitness movements and can effectively detect the current fitness state.