Deep Learning in Left and Right Footprint Image Detection Based on Plantar Pressure

Featured Application: In this study, the left and right footprint images were predicted based on deep learning object detection models. YOLOv4 models are the most balanced deep learning models for detecting left and right feet from footprint images. Across the different object detection models, the right foot showed higher accuracy than the left foot.

Abstract: People with cerebral palsy (CP) suffer primarily from lower-limb impairments. These impairments contribute to the abnormal performance of functional activities and ambulation. Footprints, such as plantar pressure images, are usually used to assess functional performance in people with spastic CP. Detecting left and right feet based on footprints in people with CP is a challenge due to abnormal foot progression angle and abnormal footprint patterns. Identifying left and right foot profiles in people with CP is essential to provide information on the foot orthosis, walking problems, index gait patterns, and determination of the dominant limb. Deep learning with object detection can localize and classify the object more precisely on the abnormal foot progression angle and complex footprints associated with spastic CP. This study proposes a new object detection model to auto-determine left and right footprints. The footprint images successfully represented the left and right feet with high accuracy in object detection. YOLOv4 detected the left and right feet from footprint images more successfully than the other object detection models, reaching over 99.00% in various metric performances. Furthermore, detection of the right foot (most people's dominant leg) was more accurate than that of the left foot (most people's non-dominant leg) across the different object detection models.


Introduction
Cerebral palsy (CP) is a leading cause of physical disability in children. Currently, the spastic type of CP is the most common, accounting for 50 to 60% of all cases of CP [1,2]. People with spastic CP suffer primarily from impairments such as increased tone, muscle weakness, diminished selectivity, and joint contractures. These impairments contribute to poor performance of functional activities in people with spastic CP [3]. Spastic CP also affects ambulation ability, movement, and posture, with accompanying activity restrictions that significantly burden people with CP, their families, and society [4]. Foot problems such as scissor gait with abnormal foot progression angle and toe walking produce complex footprints. Deep learning based on object detection can perform many tasks, such as localization, tracking, and image discrimination, on defective footprints. Thus, initial research is essential to analyze healthy people's foot profiles in left and right detection, providing a basis for understanding incomplete foot profiles in specific cases of CP. This study is a primary investigation aimed at discovering new information on the deep learning performance of defective footprint detection in CP. In addition, this study may have implications for providing new solutions for assessing the efficacy of ankle-foot orthoses in people with spastic CP.

Materials and Methods
In this study, we used plantar pressure images to represent the footprints. The data used to develop this study were gathered from the AIdea website of the Industrial Technology Research Institute (ITRI) of Taiwan (https://aidea-web.tw, accessed on 21 February 2021). We randomly split 974 plantar pressure images into 70% (681 images) for the training set and 30% (293 images) for the validation set. Each image is sized 120 × 400 pixels. Furthermore, a senior expert in plantar pressure images determined the left and right feet, which were set as ground truth.
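For reproducibility, the 70/30 split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the file-naming pattern and the random seed are assumptions.

```python
import random

def split_dataset(image_paths, train_pct=70, seed=42):
    """Shuffle image paths and split them into training and validation sets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = len(paths) * train_pct // 100  # 974 * 70 // 100 = 681
    return paths[:n_train], paths[n_train:]

# 974 plantar pressure images -> 681 training / 293 validation
images = [f"footprint_{i:04d}.png" for i in range(974)]  # hypothetical names
train_set, val_set = split_dataset(images)
print(len(train_set), len(val_set))  # 681 293
```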
Based on the image features of our dataset, this study compared the performance of various object detection models: Residual Neural Network with 50 layers (ResNet-50), Dense Convolutional Network (DenseNet), and You Only Look Once (YOLO) v3, v4, v5s, v5m, v5l, and v5x. As the most popular model used for plantar pressure images, ResNet-50 uses deeper networks that support feature extraction and has achieved high accuracy in recognizing plantar pressure images under different conditions [24] (Figure 1A). DenseNet is a deep neural network that transmits the input feature maps from each convolution layer to the following layers and is also popular for plantar pressure images [25] (Figure 1B). For image classification, Dewi et al. [26] showed that multiscale feature maps from DenseNet and ResNet-50 classifiers can be effectively combined with YOLOv2 detection to prevent performance loss. In this study, we used DarkNet-19 as the backbone to support DenseNet and ResNet-50 as classifiers with a YOLOv2 detector. YOLO is a state-of-the-art deep learning framework for real-time object recognition with single-stage detection. YOLO supports real-time object detection and precise prediction of the bounding box based on footprint images [18]. YOLO has several versions, including the YOLOv1, v2, v3, v4, and v5 series. However, the currently more popular object detection algorithms are the YOLOv3, YOLOv4, and YOLOv5 (YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x) models [21].
YOLOv3 has the advantage of three detection heads, which make the predictions more precise [27] (Figure 2A). YOLOv4, with its spatial pyramid pooling and path aggregation network, may improve feature extraction to localize and classify the object [28] (Figure 2B). The latest YOLO series is the YOLOv5 algorithm, divided into four models named YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x according to the depth and width of the network [21]. Unlike other versions, YOLOv5 uses an adaptive anchor strategy; the backbone uses a focus structure and a cross-stage partial connections (CSP) structure. In addition, the YOLOv5 series has an advantage in run speed that may benefit real-time detection in a clinical workload [29] (Figure 2C). Furthermore, this study used manual labeling of the left and right feet based on the foot pressure profiles. After the manual labeling, we input the data into the ResNet-50, DenseNet, YOLOv3, YOLOv4, and YOLOv5 series models to obtain a suitable model for detecting foot profiles (Figure 3). The detection was built in Windows 10 using Python 3.7.6 (YOLOv3, YOLOv4, DenseNet, and ResNet-50) and PyTorch 1.7.0 (YOLOv5 series).
The models' parameters were as follows: the batch size was 8; the momentum and weight decay were 0.937 and 0.0005, respectively; the initial learning rate was 0.01; and the number of epochs was 100. All experiments were carried out on a computer configured with the following hardware: an Intel Core i7-10700 CPU, 32 GB RAM, and an NVIDIA RTX 3080 GPU.
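To make the role of these hyperparameters concrete, the parameter update that momentum (0.937) and weight decay (0.0005) imply for SGD-style training can be sketched in plain Python. This is an illustrative single-parameter update, not the authors' training code, and the gradient value is invented.

```python
MOMENTUM, WEIGHT_DECAY, LR = 0.937, 0.0005, 0.01  # values reported above

def sgd_step(w, grad, velocity, lr=LR, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """One SGD-with-momentum step: weight decay adds an L2 penalty term to
    the gradient, which is then accumulated into the momentum buffer."""
    g = grad + weight_decay * w          # L2 regularization term
    velocity = momentum * velocity + g   # momentum accumulation
    return w - lr * velocity, velocity

w, v = 1.0, 0.0
w, v = sgd_step(w, grad=0.2, velocity=v)
print(round(w, 6))  # 0.997995
```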

Labeling Strategy
For determining the left and right feet, a senior expert in plantar pressure images with more than 15 years of experience in footprint imaging supervised the process (487 left foot images and 487 right foot images). The abnormal foot progression angle in a plantar pressure image may complicate the recognition of the left and right feet. Furthermore, defective footprint features may introduce complications due to an incomplete footprint pattern, particularly in the plantar region, which challenges the prediction of the left and right feet [30]. Nevertheless, object detection has achieved good results in medical images [22]. For plantar pressure, this study used bounding box annotation to determine the left and right feet across the different object detection models to achieve better accuracy [18,24]. The bounding box is used to localize and classify the left and right feet based on manual labeling to obtain a prediction.
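As an illustration of bounding box annotation, a pixel-space box on a 120 × 400 plantar pressure image can be converted to the normalized (class, x-center, y-center, width, height) label line used by YOLO-family models. This is a generic sketch; the class-id mapping (0 = left, 1 = right) and the example coordinates are assumptions, not taken from the study's annotation files.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w=120, img_h=400):
    """Convert a pixel-space bounding box to a normalized YOLO label line.
    class_id: assumed mapping, e.g. 0 = left foot, 1 = right foot."""
    xc = (x_min + x_max) / 2 / img_w   # normalized box center x
    yc = (y_min + y_max) / 2 / img_h   # normalized box center y
    w = (x_max - x_min) / img_w        # normalized box width
    h = (y_max - y_min) / img_h        # normalized box height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A box covering the maximum pressure area of a 120 x 400 image
print(to_yolo_label(1, 10, 40, 110, 360))
```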
The labeling process depends on the footprint features; labeling the maximum area of the plantar pressure image is recommended to make predictions more precise [29] (Figure 4A). However, abnormal foot progression angles and defective footprints proved challenging. Some of the plantar pressure images showed an abnormal foot progression angle, wherein the direction of foot placement was more or less than 15° [18]. On the other hand, defective footprint patterns cannot provide the full pressure distribution of the footprint, and sometimes the plantar pressure images cannot capture full footprint patterns during locomotion [31]. For example, plantar pressure images with pressure in the rearfoot and the metatarsal region of the forefoot have limitations such as an incomplete toe or forefoot area (Figure 4B), and an abnormal foot progression angle may appear with high pressure on the forefoot and midfoot and less pressure on the rearfoot (Figure 4C). Defective footprint images with high pressure on the heel lost pressure in the midfoot, rearfoot, and forefoot (Figure 4D).

Deep Learning Performance
Average precision can be used as a comprehensive evaluation index to balance the effects of precision and recall and evaluate a model more thoroughly. The precision and recall curve area is the average precision value, and a larger value indicates better model performance [20].
This paper reports the models' performance using precision, recall, and average precision. The experiment set a confidence threshold of 0.5 [18]. The chosen evaluation metrics are precision (P), recall (R), average precision (AP), mean average precision (mAP), and F1-score. mAP is an essential index for evaluating a model's performance: it reflects the overall performance of the network and avoids the problem of extreme performance in some categories masking weak performance in others. Furthermore, for convenience and intuitiveness of calculation, we report a simple average F1-score alongside the average precision value to show each method's performance on the complete dataset. Finally, after obtaining all the metric values for left and right foot detection, we compared the P, R, AP, mAP, and F1-score to determine a stable model based on our manual labeling.
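The metrics above can be computed from raw detection counts and the precision-recall curve. The sketch below gives the generic definitions only; it is not the study's evaluation script, and the example counts are invented.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from detection counts at a fixed
    confidence threshold."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(recalls, precisions):
    """AP as the area under a sampled precision-recall curve
    (rectangular integration over recall increments); mAP is the mean
    AP over the classes (here: left foot, right foot)."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# Illustrative counts: 99 true positives, 1 false positive, 0 false negatives
p, r, f1 = precision_recall_f1(tp=99, fp=1, fn=0)
print(p, r)  # 0.99 1.0
```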

Experimental Results
In this study, we randomly split the 974 plantar pressure images into 70% for the training set and 30% for the validation set. According to Table 1 and Figures 5 and 6, our proposed method can detect the foot profiles and classify them with over 60% accuracy without overfitting. YOLOv4 had the best comprehensive performance of all the models, exhibiting consistent accuracy across the performance evaluations, including mAP (99.45%), P (99.00%), R (100.00%), and F1-score (99.00%). Meanwhile, ResNet-50 showed the lowest performance in the evaluation metrics, including mAP (66.00%), P (66.00%), R (50.00%), and F1-score (57.00%), and thus achieved the lowest mAP in our study. Moreover, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x obtained mAP values of 91-93%, which were lower than those of YOLOv3 and YOLOv4 (99%). The average AP across the different object detection models was 89.01% on the left foot and 91.86% on the right foot. The average AP across the YOLO models was 93.35% on the left foot and 95.93% on the right foot. The right foot showed higher average AP than the left foot in all YOLO series models as well as the DenseNet and ResNet-50 models. Furthermore, in the YOLOv3 model, the highest AP was 99.80% and 99.81% on the left foot and right foot, respectively. However, the ResNet-50 model had the lowest AP: 61.88% and 70.13% on the left foot and right foot, respectively.

Testing Samples
We tested images from the validation set as prediction samples to evaluate the performance of the YOLOv4 network. However, an unusual distribution of plantar pressure decreased the accuracy. There were four frequent examples: defective plantar pressure images in the midfoot (Figure 7A); complex plantar pressure in the forefoot and midfoot (Figure 7B); complex plantar pressure with low pressure across the full foot (Figure 7C); and a plantar pressure image of the forefoot with an abnormal foot progression angle (Figure 7D).

Discussion
The deep learning performance evaluation across different object detection models showed excellent results in using plantar pressure images to identify the left and right footprints. YOLOv4 performed well on various metrics such as mAP, P, R, and F1-score (Table 1 and Figure 6). Furthermore, right foot detection achieved a higher average AP than left foot detection (Figure 5).
The YOLOv4 model showed good performance in detecting the left and right feet. Referring to Table 1, the results of YOLOv4 were balanced at over 99.00% across the performance evaluations (mAP, P, R, and F1-score). YOLOv4 showed an R value of 100.00%, indicating that the detections matched the ground truth. Our results are consistent with those of Jiang et al. [32], who tried all possible optimizations of YOLOv4 across the entire pipeline and found the best effect in each permutation and combination, improving overall performance. As far as we know, YOLOv4 employs a CSPDarkNet-53 backbone and spatial pyramid pooling (SPP) connected to the YOLOv3 head [33]. YOLOv4 has the advantages of high detection accuracy and precise positioning of the bounding box [18].
The YOLOv4 results indicate that the model may have advantages in multi-color channel imaging detection [34]. Compared with YOLOv4, YOLOv3 is more accurate on grayscale medical images [22], while YOLOv5 is beneficial for sequential imaging such as audiovisual or video data [35]. For our aim of identifying the left and right feet, YOLOv4 performed significantly better than YOLOv3 and YOLOv5.
Moreover, YOLOv4 has advantages in localizing and classifying multi-object and multi-color channel image features. In our study, YOLOv4 showed high performance in plantar pressure image detection; this may be because the plantar pressure images have multi-color features. Our results are similar to those of Dewi et al. [36], who investigated state-of-the-art object detection systems for detecting traffic sign images using YOLOv3, YOLOv4, ResNet-50, and DenseNet and found that the YOLOv4 model exhibited higher accuracy due to multi-color imaging. These advantages of YOLOv4 were beneficial for predicting the left and right feet in plantar pressure images.
On the other hand, DenseNet and ResNet-50 have limitations in object detection. DenseNet is more powerful in medical image segmentation [37]. However, plantar pressure images are not typical medical images, and for object detection DenseNet produces a simple transition-layer structure for dimensionality reduction; it is therefore difficult for a single receptive field to capture the multilevel features of the dense block layers [38]. Meanwhile, ResNet-50 may be effective in medical image categorization using residual networks when the objects in an image are of similar size [39]. In object detection with plantar pressure, images have more unusual object sizes, and loss of image features may challenge ResNet-50 networks. Therefore, DenseNet and ResNet-50 may not be suitable models for plantar pressure image detection.
The abnormal foot progression angle and complex footprint images complicated the identification of left and right feet via plantar pressure images [18]. Furthermore, plantar pressure images are needed for bounding box annotation to specify the location of the object, which can help recognize the foot features [40].
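Whether a predicted bounding box counts as a correct localization is conventionally judged by its intersection-over-union (IoU) with the ground-truth box, commonly with an IoU ≥ 0.5 criterion. The following is a standard implementation sketch, not code from the study:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x_min, y_min, x_max, y_max); a predicted foot box is typically
    counted correct when its IoU with the ground-truth box exceeds 0.5."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)   # intersection corners
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

# Two boxes sharing half their area
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```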
In this study, detection of the left and right feet was conducted by deep learning. According to Chen et al., deep learning was used to determine flatfoot in the left and right feet using 413 right foot samples and 422 left foot samples, achieving over 80.00% classification accuracy for the left and right feet (Table 2). Chen et al. and Dose et al. found higher detection accuracy for the left foot than for the right [41,42]. Meanwhile, Nadeem et al. and our results showed that accuracy for the left foot was lower than for the right foot [43,44]. In our results, the left and right feet yielded different AP values, likely due to the difference between the dominant and non-dominant legs. The dominant leg may induce higher accuracy for the right foot: the dominant leg contributes more to producing the plantar pressure distribution pattern for forward propulsion, while the non-dominant leg contributes more to support [45]. We conclude that the dominant leg bearing the main body weight may affect the pressure and lead to loss of foot features in the plantar pressure images of the non-dominant leg (left foot). This may also be influenced by short walking times, which rely more on the dominant leg (right foot) and produce more pressure in the footprint images, making determination of the left and right feet more accurate than during long walking [46]. Loss of foot features in the plantar pressure images may affect left foot detection in the YOLO series.
We compared our results with other studies that used deep learning to detect the left and right feet. Our study used plantar pressure images and object detection with bounding box annotation to localize and classify the left and right feet, whereas several other deep learning approaches have been used to identify the left and right feet (Table 2). However, the above three studies did not use an object detection method and showed accuracies roughly 0-16% below those of object detection. Therefore, the main reason for our research was to select an object detection method that can localize and classify the image features of plantar pressure images. As a result, it can achieve higher accuracy (over 93%) with small datasets. Furthermore, object detection may be suitable for determining where objects are located in a given image and which category each object belongs to [47].
We used the smallest dataset with multi-color channels in this study. The data acquisition tools may affect the accuracy of left and right foot detection. According to the comparison of studies in Table 2, our study showed high accuracy in left (93.35%) and right (95.93%) foot detection based on plantar pressure images as the data acquisition tool. Furthermore, based on a video-recorded dataset, ANN models obtained over 90.00% accuracy with 2391 sequential images. In addition, the smallest left and right foot dataset (835 images), based on smart insoles, was predicted using TGINN and achieved over 80.00% accuracy.
Conversely, left and right foot classification conducted from electroencephalogram signals achieved around 78.00% precision. However, the left and right foot detection results show different values, possibly due to the difference between the dominant and non-dominant foot. In addition, plantar pressure as a data acquisition tool may achieve high accuracy even with few images in the dataset, owing to the multi-color channels in the image features [34].
However, there are several limitations to the current work, which suggest directions for future improvement. The first limitation is the different sizes of footprint patterns in plantar pressure images [48]. Second, the diversity of data acquisition conditions, especially testing across complex target ages in spastic CP, still needs to be studied [49]. Finally, using meta-learning or few-shot classification [50] and applying fusion of multi-modal physiological data in prediction may resolve the limitations of different footprint sizes and complex target ages in future work [51].

Conclusions
This study aimed to determine the left and right feet in people with spastic CP using different object detection models to obtain a suitable detector based on a footprint image dataset. Our results showed that the object detection models achieved different performances; YOLOv4 performed best in detecting the left and right feet through plantar pressure images, reaching over 99.00% in various performance evaluations. In addition, the right foot obtained a higher prediction accuracy than the left foot. Therefore, auto-detection of the left and right feet may have implications for discovering new information about footprint images with defective features in people with CP and may provide new solutions, through beneficial information, for managing orthosis treatment strategies in people with spastic CP.

Data Availability Statement:
The data used to support the findings of this study are available from the corresponding author upon request.