A Deep Learning Method for Foot Progression Angle Detection in Plantar Pressure Images

Foot progression angle (FPA) analysis is one of the core methods for detecting gait pathologies and provides basic information for preventing foot injury caused by excessive in-toeing and out-toeing. Deep learning-based object detection can assist in measuring the FPA from plantar pressure images. This study aims to establish a precise model for determining the FPA. Precise FPA detection can provide information on in-toeing, out-toeing, and rearfoot kinematics for evaluating the effect of physical therapy programs on knee pain and knee osteoarthritis. We analyzed a total of 1424 plantar images with three You Only Look Once (YOLO) networks, YOLOv3, v4, and v5x, to identify a suitable model for FPA detection. YOLOv4 showed the highest profile-box performance, with an average precision of 100.00% for the left foot and 99.78% for the right foot. In detecting the foot angle-box, the ground-truth and YOLOv4 gave similar FPAs (5.58 ± 0.10° vs. 5.86 ± 0.09°, p = 0.013). In contrast, the FPA differed significantly between the ground-truth and YOLOv3 (5.58 ± 0.10° vs. 6.07 ± 0.06°, p < 0.001) and between the ground-truth and YOLOv5x (5.58 ± 0.10° vs. 6.75 ± 0.06°, p < 0.001). These results imply that deep learning with YOLOv4 can enhance FPA detection.


Introduction
Plantar image analysis is an effective tool, widely used in clinical practice, for assessing pathological gait and rehabilitation effectiveness [1]. Plantar pressure patterns and distributions, such as the foot progression angle (FPA), provide detailed information for evaluating walking abnormalities [2][3][4]. The FPA is defined as the angle between the line of walking progression and the long axis of the foot; it represents the placement angle of the longitudinal foot axis during gait [5][6][7]. In-toeing and out-toeing, the most common FPA deviations, are associated with knee pain and fall risk [8,9]. In-toeing and out-toeing are defined as FPAs of <0° and >20°, respectively [10,11]. In addition, detecting the FPA can accelerate the rehabilitation process and reduce knee […]
YOLO is a deep learning model commonly used to analyze image data such as plantar images [50,51]. It is one of the most powerful and fastest deep learning-based object detection algorithms and provides fast, precise solutions for medical image detection and classification [52,53]. YOLO is a one-stage object detection algorithm that computes both the classification results and the position coordinates [54]. Several YOLO versions exist that may help detect the FPA accurately. Because the FPA must be measured precisely, calculations with minimal error are essential; therefore, this study compared several YOLO versions to determine their performance in detecting the FPA. Clinical examination of the FPA by eye has been useful for evaluating in-toeing and out-toeing as a basis for postural information [18]. Moreover, evaluating in-toeing and out-toeing is essential for knee pain assessment and provides information on the effect of knee pain rehabilitation [20].
In addition, changes in the FPA affect rearfoot eversion and the normalization of rearfoot kinematics [55]. This study uses deep learning-based object detection to localize the coordinates of the FPA. Deep learning may improve precision over reported clinical screening results and human accuracy by 10% to 27% [56]. Precise FPA detection can provide information on in-toeing [38], out-toeing [57], and rearfoot kinematics [55] for evaluating the effect of physical therapy programs on knee pain and knee osteoarthritis [5].

Materials and Methods
Data used to prepare this article were obtained from the AIdea platform provided by the Industrial Technology Research Institute (ITRI) of Taiwan (https://aidea-web.tw, accessed on 21 February 2021). This study used 1424 plantar pressure images, each 120 × 400 pixels. A professional data annotator from the data provider labeled the foot axis point coordinates in the plantar pressure dataset. The images were divided into a training set of 900 images, a validation set of 100 images, and a prediction test set of 424 images. The labeled prediction test images served as the ground-truth dataset in this study. However, the ground-truth dataset provided only the front and rear foot axis points in pixel coordinates, not the FPA itself.
The FPA was therefore calculated from these two points using the arctangent formula. All calculations were performed on a computer with an Intel Core i7-10700 CPU, 32 GB RAM, and an NVIDIA RTX 3080 (10 GB). This study was reported according to the STROBE guideline recommendations for reporting observational studies [58], which were applied during the study design, training, validation, and reporting of the prediction model.
YOLO is a state-of-the-art deep learning framework for real-time object recognition. It performs real-time object detection significantly faster than earlier detection networks [50]. The model can run at various resolutions while maintaining both speed and precision, which can be beneficial in measuring the FPA. YOLOv3 became one of the state-of-the-art object detection algorithms [59]. Instead of using mean squared error for the classification loss, YOLOv3 uses multi-label classification with a binary cross-entropy loss for each label. YOLOv3's backbone is DarkNet-53, which replaces DarkNet-19 as the feature extractor. DarkNet-53 is a chain of blocks with stride-2 convolution layers between them to reduce the spatial dimensions. Each block has a bottleneck structure of a 1 × 1 convolution followed by a 3 × 3 convolution with a skip connection [60]. Bochkovskiy et al. introduced YOLOv4, the successor to YOLOv3, which runs twice as fast as EfficientDet with comparable performance [61]. Rather than DarkNet-53, YOLOv4 uses a modified CSPDarknet53 backbone for feature extraction, in which cross-stage partial (CSP) connections split the feature map into two parts [62]. YOLOv4 also uses the Mish activation function instead of the leaky ReLU used in YOLOv3 and YOLOv4-tiny. YOLOv5 was first uploaded to GitHub in May 2020, and its maintainer named it YOLOv5 to avoid confusion with the earlier YOLOv4 release [63]. The key new features of YOLOv5 are state-of-the-art deep learning techniques, such as new activation functions and data augmentation, and the use of CSPNet as its backbone [64]. This study used YOLOv3, YOLOv4, and YOLOv5 to measure the FPA.
The training images were fed into each YOLO model for training. Predicted bounding boxes were obtained relative to the anchor boxes of the YOLO model. This study compared three versions, YOLOv3, YOLOv4, and YOLOv5x, which solve object detection efficiently and straightforwardly [65]. The hyperparameters were as follows: a batch size and mini-batch size of 16 and 4, respectively; a momentum and weight decay of 0.9 and 0.0005, respectively; an initial learning rate of 0.001; and 300 training epochs. The detectors were run on Windows 10 with Python 3.7.6, using PyTorch 1.7.0 for the YOLOv5x model and the Darknet framework for the YOLOv3 and YOLOv4 models.
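How a box is obtained "relative to the anchor boxes" can be illustrated with the parameterization published for YOLOv3: the network predicts offsets (tx, ty, tw, th) for each anchor, the box center is kept inside its grid cell with a sigmoid, and the width and height scale the anchor prior exponentially. YOLOv4 and YOLOv5 modify this scheme in details, so the following is a minimal sketch of the YOLOv3 form only; the function name, grid size, and anchor values are hypothetical, and the resizing of the image to the network input size is omitted.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic function that bounds the predicted center offset to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_yolov3_box(t, cell, anchor, grid, image):
    """Decode raw outputs t = (tx, ty, tw, th) into a pixel-space box.

    cell   = (cx, cy): column and row index of the responsible grid cell
    anchor = (pw, ph): anchor (prior) width and height in pixels
    grid   = (grid_w, grid_h): number of grid cells across and down
    image  = (W, H): image size in pixels
    Returns (bx, by, bw, bh): box center and size in pixels.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    stride_x = image[0] / grid[0]  # pixels per grid cell, horizontally
    stride_y = image[1] / grid[1]  # pixels per grid cell, vertically
    bx = (sigmoid(tx) + cx) * stride_x  # center stays inside its cell
    by = (sigmoid(ty) + cy) * stride_y
    bw = pw * math.exp(tw)  # width and height rescale the anchor prior
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Hypothetical example: a 120 x 400 plantar image split into a 4 x 10 grid.
print(decode_yolov3_box((0.0, 0.0, 0.0, 0.0), (2, 3), (40.0, 120.0), (4, 10), (120, 400)))
# (75.0, 140.0, 40.0, 120.0)
```

With all raw outputs at zero, the box sits at the center of cell (2, 3) and has exactly the anchor's size, which shows why well-chosen anchors make the regression targets small.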

Regular FPA Detection Procedure
We followed five steps to obtain the FPA (Figure 1), from training to angle calculation. First, we determined the foot profile, because the direction of the diagonal FPA differs between the left and right feet. Second, we trained the models to detect the diagonal FPA as a bounding box, the angle-box, which has four corner points. Third, from the four corner points of the angle-box, we selected two points (the front and rear foot axis points) according to the left or right foot profile. Fourth, we calculated the FPA from the diagonal using the arctangent formula. Fifth, to verify the two predicted foot axis points, we measured the distance between the predicted and ground-truth points.

The first training step labeled the foot profile as left or right using bounding boxes in the dataset (Figure 2A). We then input the labeled datasets to the three YOLO models, YOLOv3, v4, and v5x (Figure 2B,C). In the prediction test, we used 424 images to obtain the foot profile-boxes of the left and right feet (Figure 2D). The foot profile-box was used to determine whether a foot is left or right, because the foot profile determines the direction of the FPA diagonal.

Angle-Box
We used a bounding box around the diagonal FPA to obtain the angle-box prediction (Figure 3A). In the training step, we input the data labeled by a professional data annotator (Figure 3B) to the three YOLO models, v3, v4, and v5x (Figure 3C). We tested 424 images to obtain the angle-box predictions and selected the axis points according to the foot profile-box (Figure 3D). For the left foot, the diagonal FPA runs from the top-left to the bottom-right corner of the angle-box (Figure 3E); for the right foot, it runs from the top-right to the bottom-left corner.

Point Benchmark Acquisition
After obtaining the angle-box, we extracted the foot axis points and measured their distance from the ground-truth points for each of the three YOLO models (Figure 4). To do so, the four corner coordinates of the predicted angle-box were obtained by converting the YOLO coordinates (x, y, w, h) into pixel coordinates (P1x, P1y, P2x, P2y) [66]. Specifically, Equations (1) and (2) give the horizontal coordinates of the front and rear foot axis points, and Equations (3) and (4) give the vertical coordinates of the front and rear foot axis points.
In Equations (1)–(4), P1x and P1y are the coordinates of the front foot axis point, and P2x and P2y are those of the rear foot axis point; x and y are the coordinates of the box center, w and h are the width and height of the bounding box, and W and H are the width and height of the image (Figure 4). (P1x, P1y) is the top-left corner for the left foot and the top-right corner for the right foot, and (P2x, P2y) is the bottom-right corner for the left foot and the bottom-left corner for the right foot.
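The following minimal sketch illustrates this kind of conversion and corner selection; it is not a reproduction of the authors' Equations (1)–(4). It assumes YOLO's usual label format, in which x, y, w, and h are normalized to [0, 1] by the image width W and height H, and it assumes that the toes point toward the top of the image, with the image y axis growing downward. The function names are hypothetical.

```python
def yolo_to_corners(x, y, w, h, W, H):
    """Convert a normalized YOLO box (center x, y; size w, h; all in [0, 1])
    to pixel corners (x_min, y_min, x_max, y_max) in a W x H image."""
    return (x - w / 2) * W, (y - h / 2) * H, (x + w / 2) * W, (y + h / 2) * H


def foot_axis_points(box, W, H, foot):
    """Return ((P1x, P1y), (P2x, P2y)): front (top) and rear (bottom) axis points.

    Left foot: top-left -> bottom-right corner.
    Right foot: top-right -> bottom-left corner.
    Image y grows downward, so y_min is the top edge of the box.
    """
    x_min, y_min, x_max, y_max = yolo_to_corners(*box, W, H)
    if foot == "left":
        return (x_min, y_min), (x_max, y_max)
    if foot == "right":
        return (x_max, y_min), (x_min, y_max)
    raise ValueError("foot must be 'left' or 'right'")


# Hypothetical angle-box prediction on a 120 x 400 plantar pressure image.
print(foot_axis_points((0.5, 0.5, 0.25, 0.5), 120, 400, "left"))
# ((45.0, 100.0), (75.0, 300.0))
```

The same box yields mirrored points for the right foot, which is why the foot profile must be known before the axis points are selected.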

Using Diagonal FPA Detection to Get the FPA
After obtaining the foot axis points (P1x, P1y) and (P2x, P2y), we used the diagonal FPA from (P1x, P1y) to (P2x, P2y) to obtain the FPA. We calculated the FPA with the arctangent formula [67] from A1 and A2 (Figure 5), as given in Equation (5), where θ is the FPA in degrees for each image. θ is used to compare the ground-truth with the three YOLO prediction results. Here, A1 is the height of the angle-box, A2 is its width, and A3 is the diagonal FPA of the angle-box. The YOLO version with the smallest angle difference from the ground-truth was considered the most suitable model in this study.
Figure 5. Example of the diagonal foot progression angle (FPA) of the angle-box in the YOLO model. G1 (G1x, G1y), ground-truth front foot axis point; G2 (G2x, G2y), ground-truth rear foot axis point; P1 (P1x, P1y), YOLO prediction of the front foot axis point; P2 (P2x, P2y), YOLO prediction of the rear foot axis point; A1, height of the angle-box; A2, width of the angle-box; A3, diagonal FPA of the angle-box; θ, FPA in degrees.
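A minimal sketch of the angle step, assuming that the line of walking progression coincides with the image's vertical axis, so that θ is the angle between the box height A1 and the diagonal A3, i.e., θ = arctan(A2 / A1). This is our reading of Figure 5, not a reproduction of Equation (5), and the function name is hypothetical.

```python
import math


def fpa_degrees(p1, p2):
    """Angle (degrees) between the vertical line of progression and the
    diagonal from the front point p1 to the rear point p2 of the angle-box."""
    a1 = abs(p2[1] - p1[1])  # A1: height of the angle-box
    a2 = abs(p2[0] - p1[0])  # A2: width of the angle-box
    # theta = arctan(A2 / A1); atan2 also handles A1 == 0 without dividing.
    return math.degrees(math.atan2(a2, a1))


print(round(fpa_degrees((45.0, 100.0), (75.0, 300.0)), 2))  # 8.53
```

For a hypothetical angle-box 30 pixels wide and 200 pixels tall, θ = arctan(30/200) ≈ 8.53°. Because the width and height are taken as absolute values, the function returns only the magnitude of the angle; the direction comes from the foot profile.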

Measure the Distance
Errors in either the front or the rear foot axis point affect the diagonal FPA, so we verified both points. We applied the two-point distance formula [68] to each image's ground-truth coordinates and the YOLOv3, YOLOv4, and YOLOv5x coordinates, calculating the distance between Gi (G1 and G2) and Pi (P1 and P2) (Figure 4) with Equation (6).
In Equation (6), GiPi is the distance between the ground-truth diagonal FPA point coordinates and the YOLO diagonal FPA point coordinates. We applied this formula to each of the three YOLO models.
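The two-point distance is the Euclidean distance between a ground-truth point and the corresponding predicted point. A minimal sketch, in which the function names and coordinates are hypothetical:

```python
import math


def point_distance(g, p):
    """Two-point (Euclidean) distance in pixels between ground-truth g and prediction p."""
    return math.hypot(p[0] - g[0], p[1] - g[1])


def axis_point_errors(gt, pred):
    """Return (G1P1, G2P2) for one image; gt and pred are (front, rear) point pairs."""
    (g1, g2), (p1, p2) = gt, pred
    return point_distance(g1, p1), point_distance(g2, p2)


# Hypothetical ground-truth and predicted (front, rear) points for one image.
print(axis_point_errors(((45, 100), (75, 300)), ((48, 104), (75, 294))))  # (5.0, 6.0)
```

Computing G1P1 and G2P2 separately, rather than only comparing angles, shows which of the two points contributes more to the FPA error.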

Statistical Analysis
After obtaining the FPA values of the ground-truth and the three YOLO models for each image, we compared the front and rear foot axis point distances within each YOLO model using a paired t-test. The paired t-test was used to describe the differences between the points, determine which point affected FPA detection, and obtain the angle differences between the ground-truth and the YOLO models. Finally, we used one-way ANOVA with Fisher's LSD post hoc test at a significance level of 0.01 to identify significant differences between the YOLO models and the ground-truth. The data were processed using SPSS 26 (IBM, Somers, New York, NY, USA).
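The test statistics named above can be sketched with the Python standard library alone. This is an illustrative re-implementation, not the SPSS procedure used in the study; the function names and example values are hypothetical, and p-values would additionally require the t and F distributions (for example, via scipy.stats.ttest_rel and scipy.stats.f_oneway).

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two measurements
    taken on the same images (e.g., front vs. rear point distances)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


def one_way_anova(*groups):
    """One-way ANOVA: return (F, df_between, df_within, MS_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, df_between, df_within, ms_within


def fisher_lsd_t(g1, g2, ms_within):
    """Fisher's LSD t statistic for one pair of groups; compare it with a
    t distribution on df_within degrees of freedom from the ANOVA."""
    se = math.sqrt(ms_within * (1 / len(g1) + 1 / len(g2)))
    return (mean(g1) - mean(g2)) / se


# Hypothetical FPA values (degrees) for four images; not the study's data.
gt = [5.0, 6.0, 7.0, 6.0]
v4 = [5.5, 6.0, 7.5, 6.5]
print(paired_t(gt, v4))
F, df_b, df_w, msw = one_way_anova(gt, v4)
print(F, df_b, df_w, fisher_lsd_t(gt, v4, msw))
```

Fisher's LSD reuses the pooled within-group mean square from the ANOVA, which is why the ANOVA is computed first.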

Training Results
Average precision (AP) and mean average precision (mAP) are the most popular metrics for evaluating object detection models [69]; a high mAP indicates that the trained model performs well [60]. The AP and loss values of YOLOv3, YOLOv4, and YOLOv5x are shown in Table 1. For the profile-box, we used the AP of each class to evaluate the prediction of the left and right foot positions [70]. YOLOv4 detected the foot profile-box with an AP of 100.00% for the left foot and a similarly high AP of 99.78% for the right foot. For the foot angle-box, we used the mAP; because there is only one class, the mAP equals the AP. The angle-box mAP of YOLOv4 (97.98%) was 1.08 and 11.66 percentage points higher than those of YOLOv5x (96.90%) and YOLOv3 (86.32%), respectively.
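AP is commonly computed as the area under the precision-recall curve of one class's detections, ranked by confidence, with a detection counted as a true positive when its intersection over union (IoU) with an unmatched ground-truth box reaches a threshold such as 0.5. The sketch below uses PASCAL VOC-style all-point interpolation; the evaluation code bundled with Darknet and YOLOv5 may differ in interpolation and IoU threshold, and the function names and example values are hypothetical. With a single class, the mAP equals this AP.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(scores, is_tp, n_ground_truth):
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_tp: whether each detection
    matched an unmatched ground-truth box (e.g., IoU >= 0.5);
    n_ground_truth: number of ground-truth boxes of this class (> 0).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_ground_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: best precision at this or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p  # area added only where recall increases
        prev_recall = r
    return ap


print(round(average_precision([0.9, 0.8, 0.7], [True, False, True], 2), 4))  # 0.8333
```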

FPA Comparison
The test set contained 424 images, of which 367 were usable; the remaining 57 images had missing values because the deep learning models could not recognize them, and these images were excluded from further analysis [71,72]. The FPAs of the three YOLO models were compared with the ground-truth FPA using one-way ANOVA and LSD post hoc tests. The YOLOv4 FPA (5.86 ± 0.09°) did not differ significantly from the ground-truth FPA (5.58 ± 0.10°) (Table 2), whereas the YOLOv3 and YOLOv5x FPAs differed significantly from the ground-truth (Figure 6).
Table 2. FPA (°) of the ground-truth and the three YOLO models.
GT: 5.58 ± 0.10; YOLOv3: 6.07 ± 0.06; YOLOv4: 5.86 ± 0.09; YOLOv5x: 6.75 ± 0.06
p: overall <0.01*; GT vs. YOLOv3 <0.01*; GT vs. YOLOv4 0.013; GT vs. YOLOv5x <0.01*
Note: °, angle in degrees; GT, ground-truth; FPA, foot progression angle; *, significant difference (p < 0.01).
Figure 6. FPA comparison between the GT and YOLOv3, v4, and v5x. GT, ground-truth; *, significant difference (p < 0.01).

Distance between Ground-Truth Point and Prediction Point
To verify the foot axis points, we used a paired t-test to compare the distances G1P1 and G2P2 within each of the three YOLO models. We then used one-way ANOVA with Fisher's LSD post hoc test to compare G1P1 and G2P2 across the three YOLO models (Tables 3 and 4). All comparisons were significantly different (Figures 7 and 8).
Note: G1P1, the distance between the front points of the ground-truth (G1) and the YOLO model prediction (P1); G2P2, the distance between the rear points of the ground-truth (G2) and the YOLO model prediction (P2). Coordinates are in pixels; *, significant difference (p < 0.01).
G1P1 vs. G2P2 (pixels): 18.25 ± 0.62 vs. 12.80 ± 0.43, p < 0.01*.
To evaluate the FPA obtained from the YOLO predictions, we tested an example plantar image by measuring angles in digital image software (Photoshop CS5, Adobe Inc., San Jose, CA, USA) and comparing the ground-truth with the YOLO prediction [73]. First, we measured the ground-truth coordinates and calculated the FPA. Second, we measured the predicted foot axis point coordinates and calculated the FPA. The predicted angle closely approached the ground-truth angle (Figure 9).

Discussion
This study used two labels, the profile-box and the angle-box, to obtain the FPA. The profile-box covers the whole plantar pressure image and is used to determine the left and right foot profiles. The angle-box lies inside the plantar pressure image, from the heel to the metatarsal heads and excluding the toe region, and is used to predict the FPA.
This study shows the effectiveness of deep learning on a small-scale test set of 367 plantar images. For the profile-box, YOLOv4 had the highest training mAP (99.89%), with an AP of 100.00% for the left foot profile and 99.78% for the right foot profile. YOLOv4 also had the highest angle-box mAP (97.98%). Moreover, the FPA did not differ significantly between the YOLOv4 predictions and the ground-truth, indicating that YOLOv4 and the ground-truth gave similar results (Figure 6). In addition, the front foot axis point may affect the accuracy of FPA detection (Figure 7).
Therefore, YOLO is suitable for detecting the FPA from plantar pressure images based on object detection. Precise YOLO-based FPA measurement may contribute to clinical practice by providing information on in-toeing, out-toeing, and rearfoot kinematics for evaluating the effect of physical therapy programs on knee pain and knee osteoarthritis.

YOLO Deep Learning Performance
In children, the normal FPA is an out-toeing angle ranging from 5° to 13° [21]. In adults, a normal FPA is defined as between 0° and 20° [10]. Our results indicate that the FPAs in this dataset were within the normal range (Table 2). The FPA differences between the ground-truth (5.58 ± 0.10°) and the three YOLO models (v3: 6.07 ± 0.06°, v4: 5.86 ± 0.09°, and v5x: 6.75 ± 0.06°) were estimated at 1.3° to 1.9°. The YOLO model can detect and estimate the FPA direction of the plantar pressure image precisely. Deep learning has also been used to estimate angles of the spine, trunk, and limbs. For spinal disorders and deformities, Galbusera et al. trained a deep learning model to predict the kyphosis, lordosis, and Cobb angles; the automatically predicted parameters had standard errors of the estimate between 2.7° and 9.5° [74]. Alharbi et al. used deep learning object detection to measure the scoliosis angle automatically in X-ray images, with differences of 5° to 10° [75].
Furthermore, Hernandez et al. predicted lower limb joint angles from inertial measurement units using deep learning and obtained an average difference of 2.1° between their ground-truth and predicted joint angles [76]. Pei et al. used deep learning to detect hip-knee-ankle angles in X-ray images and, in comparison with other deep learning models, obtained a deviation of 1.5° from the ground-truth [77]. The FPA differences between the YOLO predictions and the ground-truth in our study (1.3° to 1.9°) are similar to those reported for the lower limb in other studies. Therefore, YOLO models are suitable for detecting the FPA from plantar pressure images based on object detection.

YOLOv4 Showed Superior Results
In our results, YOLOv4 performed excellently in detecting the FPA from plantar pressure images, a single-frame task. This may be because YOLOv4's backbone modifications optimize accuracy for image-based object detection, especially in single-frame tasks [78], whereas YOLOv5 is advantageous for video-based detection with multi-frame tasks [64]. For example, Zheng et al. compared YOLOv3, v4, and v5x for detecting concealed cracks and found that YOLOv4 gave superior predictions in single-frame tasks [79]. Similarly, Andhy et al. applied YOLOv4 to detect waste in images and obtained precise results relative to the actual data [62]. Therefore, YOLOv4's good FPA performance on plantar pressure images may be due to the single-frame nature of the task.
The profile-box training results showed that YOLOv4 achieved an AP of 99.78% (right foot) to 100.00% (left foot), owing to the characteristics of plantar images, which contain one class and one object per image. Because the labels follow the boundaries of the plantar images, YOLO can detect foot profiles more easily [80]. Our result was similar to that of Gao et al., who labeled objects on their boundaries to facilitate a robotic-arm grasping system in nonlinear, non-Gaussian environments and obtained YOLOv4 APs of 96.70% to 99.50%; in that study, YOLOv4 was chosen over YOLOv3 and YOLOv5 [81].
In addition, the angle-box mAP of YOLOv4 (97.98%) was lower than its profile-box mAP (99.89%; left foot 100.00% and right foot 99.78% AP). Angle-box prediction may be limited because the angle-box lies inside the pressure image, where the background has a similar color and pressure density. Similar background color and density are also problems when detecting flower clusters or the eyes, nose, and mouth in a face. Wu et al. detected apple flowers in natural environments and obtained a YOLOv4 mAP of 97.31%, with bounding boxes on flowers whose background had a color and density similar to those of the flower clusters [11]. Dagher et al. reported that detecting the eyes, nose, and mouth was more complex than detecting the whole face [82]. These findings suggest that YOLO may be better at detecting profiles than regions inside them.

Foot Profiles Prediction and Foot Axis Points Distance
Alternatively, the front and rear FPA points could be predicted with specific markers as two small bounding boxes. However, the front and rear boxes would be very similar, and YOLO does not perform best on similar objects within one image [83]. This low performance arises because each grid cell predicts only two small boxes, which belong to a single class, and because YOLO generalizes poorly to objects with new or unusual aspect ratios [84]. For these reasons, we used one bounding box containing both the front and rear points to obtain the FPA. Similar background color and density were problems for the angle-box and may have affected the accuracy of FPA detection. Because FPA accuracy depends on the two points of the diagonal FPA (the front and rear foot axis points), the distance between the predicted and ground-truth points needs to be investigated. Across the three YOLO models, the distance between the predicted and ground-truth points was larger for the front foot axis point (9.23 to 18.25 pixels) than for the rear foot axis point (7.34 to 12.80 pixels). The front foot axis point lies in a dense area whose pressure background resembles that of the metatarsophalangeal joints and neighboring bones, which hampers detection in plantar pressure images [85,86]. High density and a similar background can lead to poor bounding-box prediction [87]. In contrast, the rear foot axis point is clearer than the front point: it lies under the calcaneus, which allows the YOLO model to detect it well with non-maximum suppression [88,89], and it is near the boundary of the plantar pressure distribution, where density is minimal [90]. These results suggest that the front foot axis point, owing to the higher density around the metatarsophalangeal joints and neighboring bones, may affect FPA detection.

Limitations in Diagonal FPA Acquisition
The main limitation of our study was that the analyzed dataset contained no in-toeing data. In-toeing is a pathological FPA pattern that requires further intervention. Although none of the plantar images in this study showed in-toeing, our standard method can be used to measure out-toeing. Measuring in-toeing, however, requires modifying two steps of the "regular-FPA-procedure": "labeling the foot profile in left and right categories" and "point benchmark acquisition." "Labeling the foot profile in left and right categories" needs to be extended to four classes: "labeling the foot profile in left-in-toeing, left-out-toeing, right-in-toeing, and right-out-toeing categories." To identify foot profiles with in-toeing by labeling plantar images, YOLO performs this first classification to obtain the left and right in-toeing foot profiles (left-in-toeing and right-in-toeing). Because an in-toeing foot is oriented differently from an out-toeing foot, its FPA must be measured differently. For out-toeing, the diagonal FPA is acquired exactly as in the "regular-FPA-detection-procedure." For in-toeing, in contrast, the diagonal FPA is acquired according to the patient's foot profile (Figure 10). It is therefore necessary to classify the foot position before detecting the foot axis points. Furthermore, "point benchmark acquisition" is based on the angle-box: the 4-corner coordinates of the predicted angle-box are obtained through the converting stage, and the 2-point benchmark is then selected according to the in-toeing foot direction (Figure 10) [91]. The left and right in-toeing foot profiles determine the front and rear foot axis points that form the diagonal FPA, from which the angle of the FPA is measured [92].

In addition, a validation set of more than 3500 images may further increase YOLO performance [93]. However, good performance has been reported with a small-scale validation set of fewer than 350 images [42]. Accordingly, this study used a small-scale validation set of plantar pressure images and achieved suitable YOLO performance.
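The "point benchmark acquisition" step for the four foot classes can be illustrated with a short sketch. It is a minimal Python sketch under several assumptions that are not stated in the study: the predicted angle-box is axis-aligned as (x1, y1, x2, y2) in pixels, the line of walking progression points toward smaller y (toes toward the top of the image), the image is not mirrored, and out-toeing is reported as a positive angle and in-toeing as a negative one. The function names and the corner-to-diagonal rule are hypothetical and are not the study's converting stage.

```python
import math
from typing import Literal, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels
Side = Literal["left", "right"]
Toeing = Literal["in", "out"]


def box_to_axis_points(
    box: Tuple[float, float, float, float], side: Side, toeing: Toeing
) -> Tuple[Point, Point]:
    """Pick the (rear, front) foot axis points from two corners of an angle-box.

    Hypothetical rule: box = (x1, y1, x2, y2) with x1 < x2 and y1 < y2, heel
    near y2, toes near y1, and no left-right mirroring of the image.
    """
    x1, y1, x2, y2 = box
    if x1 >= x2 or y1 >= y2:
        raise ValueError("expected x1 < x2 and y1 < y2")
    # Lateral is -x for the left foot and +x for the right foot, so the toes
    # sit on the right-hand side for right out-toeing and left in-toeing.
    toe_on_right = (side == "right") == (toeing == "out")
    if toe_on_right:
        rear, front = (x1, y2), (x2, y1)
    else:
        rear, front = (x2, y2), (x1, y1)
    return rear, front


def signed_fpa_deg(rear: Point, front: Point, side: Side) -> float:
    """Signed FPA in degrees (positive = out-toeing, negative = in-toeing).

    Angle between the assumed progression direction (toward smaller y) and
    the rear-to-front foot axis.
    """
    dx = front[0] - rear[0]
    forward = rear[1] - front[1]  # positive when the toes lie ahead of the heel
    lateral = dx if side == "right" else -dx
    return math.degrees(math.atan2(lateral, forward))


if __name__ == "__main__":
    box = (100.0, 50.0, 110.0, 150.0)  # hypothetical angle-box prediction
    for side in ("left", "right"):
        for toeing in ("out", "in"):
            rear, front = box_to_axis_points(box, side, toeing)
            print(side, toeing, rear, front, round(signed_fpa_deg(rear, front, side), 2))
```

For an axis-aligned box, both diagonals have the same angular magnitude (the arctangent of width over height), so under these assumptions the four-class label only selects the diagonal and hence the sign. A foot misclassified as in-toeing rather than out-toeing would flip the sign without changing the size, which illustrates why the foot position is classified before the axis points are acquired.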

Conclusions
This study compared three YOLO models to identify a suitable model for detecting the FPA. YOLOv4 showed superior results in detecting the left and right foot profiles. Deep learning with YOLOv4 improved FPA prediction, giving the values closest to the ground truth among the three models. YOLOv4 also detected the FPA reliably from plantar pressure images. FPA accuracy appears to be limited mainly by detection of the front foot axis point. Precise FPA detection can provide information on in-toeing, out-toeing, and rearfoot kinematics for evaluating the effect of physical therapy programs on knee pain and knee osteoarthritis.