The ‘DEEP’ Landing Error Scoring System

Featured Application: The Landing Error Scoring System, an injury-risk screening tool used in sports to detect high risk of anterior cruciate ligament injury, can be automated using deep-learning-based computer vision on 2D videos combined with machine learning methods. The successful application of this method paves the way for the automatic detection of individuals at high risk of injury using smartphone-based applications and opens doors to addressing other related injury prevention problems.

Abstract: The Landing Error Scoring System (LESS) is an injury-risk screening tool used in sports, but scoring is time consuming, clinician-dependent, and generally inaccessible outside of elite sports. Our aim is to evidence that LESS scores can be automated using deep-learning-based computer vision combined with machine learning and compare the accuracy of LESS predictions using different video cropping and machine learning methods. Two-dimensional videos from 320 double-leg drop-jump landings with known LESS scores were analysed in OpenPose. Videos were cropped to key frames manually (clinician) and automatically (computer vision), and 42 kinematic features were extracted. A series of 10 × 10-fold cross-validation experiments were applied on full and balanced datasets to predict LESS scores. Random forest for regression outperformed linear and dummy regression models, yielding the lowest mean absolute error (1.23) and highest correlation (r = 0.63) between manual and automated scores. Sensitivity (0.82) and specificity (0.77) were reasonable for risk categorization (high-risk LESS ≥ 5 errors). Experiments using either a balanced (versus unbalanced) dataset or manual (versus automated) cropping method did not improve predictions. Further research on the automation would enhance the strength of the agreement between clinical and automated scores beyond its current levels, enabling quasi real-time scoring.


Introduction
Lower-extremity injuries due to physical activities have devastating short-term and long-term consequences to the health and wellbeing of individuals [1,2] and burden societies worldwide [3,4]. Non-contact injuries account for approximately 20% of injuries in game situations and 37% of injuries in training situations [5]. Non-contact injuries in sport and recreation are of most practical interest to coaches and clinicians because they are preventable through neuromuscular training programs [6].
The mechanism of non-contact lower-extremity injuries and their underlying risk factors have been linked with 'risky' movement patterns [7,8], such as knee valgus and stiff landings. 3D motion analysis systems, which provide gold-standard measures for the objective quantification of human motion noninvasively, can readily identify altered movement patterns and biomechanical control. However, conventional 3D motion analysis using infrared systems requires a considerable financial outlay and an expert user, in addition to time and space to perform the analysis. These constraints limit its practical application and use for large-scale screening of injury risk factors in physically active individuals.
As a countermeasure and to reduce technological requirements, various clinician-led movement screens have been developed [9]. Even though these clinician-led screens reduce the financial costs and space requirements compared to 3D motion analysis, they nonetheless require expert clinicians and dedicated time for testing and scoring, limiting their widespread use. For instance, the Functional Movement Screen™ takes 12 to 15 min and the Tuck jump assessment takes 12 min to administer and score for one individual [9].
The Landing Error Scoring System (LESS) is one movement screen with demonstrated reliability [10,11] and validity [11,12]. Clinicians evaluate 2D video recordings from three double-leg drop-jump landing tasks per individual to detect 'movement errors' linked to non-contact anterior cruciate ligament (ACL) and other lower-extremity injury mechanisms [10]. The LESS consists of 17 items (Table 1), with the total number of possible errors ranging from 0 (best) to 17 (worst). Greater scores hence indicate more movement errors, poorer landing biomechanics, and greater relative risk of sustaining non-contact lower-extremity injuries. In a prospective study, Padua et al. [12] determined that scoring 5 or more errors on the LESS was associated with a 10.7 times greater relative risk of sustaining a non-contact ACL injury in youth soccer players (sensitivity 0.86, specificity 0.64). The total testing time (including set up) is ~5 min, with 3 to 4 min for a trained rater to score the three drop-jump landing trials of one individual once downloaded to a computer [10].
Some drawbacks of the LESS are the subjective nature of the assessment, the requirement for an expert rater, and the need to view videos at a later stage [13,14]. In recent years, researchers have striven to automate the LESS to streamline the process using depth sensor cameras [13,15]. Dar, Yehiel, and Cale' Benzoor [13] introduced the PhysiMax system (PhysiMax Technologies Ltd., Tel Aviv, Israel) to automate LESS scoring using a personal computer, 3D Microsoft Kinect, and motion analysis software that requires limited clinical input. Their results indicated high consensus between clinician and PhysiMax LESS scores (intra-class correlation, ICC = 0.80, mean absolute difference 1.13 errors), although the clinician manually inputted the overall impression item (no. 17, Table 1). Although the automated quantification of the LESS with markerless motion capture from depth cameras saves time and cost, there are still additional hardware and software expenditures to consider.
Deep-learning-based computer vision technologies enable the automatic identification and quantification of human motion without the need for depth sensor cameras. Numerous such systems are currently being developed. For example, OpenPose [16] is a system enabling real-time multi-person pose estimation in video streams captured by a camera. The system tracks body pose as well as keypoints associated with joints and anatomical features. The same technology is also being deployed for solving other related problems, such as tracking lab animal motion in laboratory settings [17,18]. In this work, we aim to apply deep-learning techniques to LESS score estimation. Applying these approaches to 2D video recordings would improve the accessibility to end-users and pave the way to smartphone-based applications for injury risk screening. Our aim is to evidence that LESS scores can be automated from 2D videos using deep-learning-based computer vision with machine learning and compare the accuracy of LESS predictions using different video cropping and machine learning methods. Our work substantiates that: LESS automation is possible without the need for 3D motion analysis or depth sensor cameras, random forest leads to more accurate predictions than linear or dummy (ZeroR) regression models, and that cropping method (manual versus automated) does not affect predictions.

Participants
A sample of 144 individuals (45 males and 99 females) volunteered to participate in this study. Age, height, and mass (mean ± standard deviation) for males were 21.0 ± 5.9 years (range 17 to 42 years), 179.1 ± 7.2 cm, and 82.2 ± 13.6 kg; and for females were 17.1 ± 3.7 years (range 12 to 31 years), 169.2 ± 6.1 cm, and 64.8 ± 9.6 kg. All participants were involved in physical activity (34% participated in netball, 19% in rugby, 9% in field hockey, 9% in soccer, and 29% in other sports). On average, participants were involved in physical activity four times per week, 6 h a week. Participants had to be free from injury, pain, or any other issue that would limit physical activity participation. Previous injuries were not an exclusion criterion. Participants were recruited via word-of-mouth, research contacts, social media, and emails sent to local sports clubs. The study protocol was approved by our institution's health research ethics committee [HREC(Health)#41] and adhered to the Declaration of Helsinki. Prior to participation, all participants, and their legal guardians when participants were younger than 16 years of age, signed a written informed consent document that explained the potential risks associated with testing.

Data Collection
We used the original LESS protocol for testing [10]. Participants jumped horizontally from a 30 cm high box to a line placed at 50% of their body height, and immediately jumped upward for maximal vertical height. We placed an emphasis on jumping off the box with both feet, landing in front of the designated line, jumping as high as possible straight up in the air once they landed from the box, and completing the task in a fluid motion. We did not provide any feedback on participants' landing technique unless they were performing the task incorrectly. Participants used their own footwear for testing.
After task instructions and practice jumps for familiarization (typically 1), each participant performed three successful trials of the double-leg drop-jump landing task in front of two standard video cameras capturing at 120 Hz (Sony RX10 II, Sony Corporation, Tokyo, Japan) with an actual focal length of 8.8 to 73.3 mm (35 mm equivalent focal length of 24-200 mm). We mounted the cameras on tripods placed 3.5 m in front of and to the right side of the landing area with a lens-to-floor distance of 1.3 m. We allowed participants to rest until they felt ready to perform the task again to limit fatigue between the three trials. Total testing time was typically 2 min per participant.

Clinical LESS
A qualified physiotherapist who completed over 400 LESS evaluations (IH) replayed the videos using the Kinovea software (version 0.8.15, www.kinovea.org), identified the two key frames of initial ground contact (IC) and maximal knee flexion (KFmax), and scored all trials using the 17-item LESS scoring sheet (Table 1). The clinician was blinded to the results from the automated computer-vision scoring. A total of 320 double-leg drop-jump landings from the potential 432 trials (3 jumps × 144 participants) were retained for analysis. The remaining trials were excluded because certain participants did not complete three trials, one or both video files were not usable, or the automatic cropper described in the following subsection clearly misidentified the time events (i.e., more than 100 ms difference with the clinician).

Automated LESS
The LESS score prediction algorithm we developed was a multistage process. Generally, the first stage consisted of processing the videos to detect the IC and KFmax key frames, which involved running the frontal and lateral videos for each jump through OpenPose v.1.21 [16], and then using a heuristic method to identify the key frames. Once that stage was complete, we extracted measurements from the key frames to use as features for machine learning. The final stage was the score prediction for the drop-jump landing trial from the features using a machine learning algorithm. The entire process is depicted in Figure 1. We further evaluated the predictive accuracy of the final machine learning stage using cross validation.
In more detail, the algorithm used to detect key frames in the first stage is described in Table 2. The inputs to the algorithm are the frontal and lateral videos for a single drop-jump landing trial, and the outputs are cropped versions of the same videos where the first and last frames correspond to the IC and KFmax key frames, respectively. The basic method is to track the location of the ankles (using OpenPose and the COCO 18-point model [16]) across the frames to detect the frame in which landing occurs based on the original and rolling window plots (Figure 2), and additionally to track the body and knee keypoints so that the ankle/knee/body angle can be calculated and used to identify the point of maximum knee flexion. Once these two points are identified in both videos, the frames before and after the key frames are cropped away. This stage generally reduces the length of the original videos from several seconds down to less than 250 ms.
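The key frame heuristic described above can be illustrated with a minimal sketch. The exact steps of Table 2 are not reproduced here, so everything below is an assumption made for illustration: the function names (`rolling_mean`, `detect_initial_contact`, `detect_max_knee_flexion`), the window size, the pixel tolerance, and the rule that IC is the first frame at which the smoothed ankle trace reaches its lowest point. The sketch also assumes image coordinates in which y grows downwards, so the ground corresponds to the largest ankle y value.

```python
from statistics import mean


def rolling_mean(values, window=3):
    """Centred rolling mean; the window is truncated at both ends."""
    half = window // 2
    n = len(values)
    return [mean(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def detect_initial_contact(ankle_y, window=3, tol=10.0):
    """Hypothetical IC rule: first frame whose smoothed ankle y is within
    `tol` pixels of the lowest point reached (largest y in image coordinates)."""
    smooth = rolling_mean(ankle_y, window)
    ground = max(smooth)
    for frame, y in enumerate(smooth):
        if ground - y <= tol:
            return frame
    raise ValueError("empty ankle trajectory")


def detect_max_knee_flexion(knee_angle_deg, ic_frame):
    """Hypothetical KFmax rule: frame at or after IC with the smallest
    included knee angle (smaller angle means deeper flexion)."""
    tail = knee_angle_deg[ic_frame:]
    return ic_frame + tail.index(min(tail))


# Toy trajectories (pixels and degrees), not data from the study.
ankle_y = [100, 100, 120, 140, 160, 180, 200, 200, 200, 200]
knee_angle = [175, 175, 172, 170, 168, 165, 160, 130, 100, 105]
ic = detect_initial_contact(ankle_y)
kf = detect_max_knee_flexion(knee_angle, ic)
cropped_frames = list(range(ic, kf + 1))  # frames kept after cropping
```

In this toy example the cropped clip starts at the detected IC frame and ends at the detected KFmax frame, mirroring the cropping step described in the text.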
Once cropping is complete, two videos in which the first frame corresponds to IC and the last frame corresponds to KFmax pass to the second stage. In the second stage of processing, features are extracted from both videos and merged into a single 'example' that will be used for machine learning. A total of 42 kinematic features from the two key frames in each video were generated. The features are a mixture of angles between specific OpenPose keypoints (shown in Figure 3) and ratios between distances. The specific features are listed in Table 3. A total of six angles were extracted from all four key frames with an additional eight features (mixture of angles, distances, and distance ratios) being extracted from the two frontal key frames only, for a total of 40 measurements. Two further features, being the length in frames of the cropped frontal and lateral videos, were also included.
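To make the feature types concrete, the following sketch shows one way an angle between three 2D keypoints and a ratio between two distances could be computed, and checks the feature count stated above (6 angles × 4 key frames + 8 features × 2 frontal key frames + 2 clip lengths = 42). The helper names `joint_angle` and `distance_ratio` are hypothetical, and the actual features of Table 3 are not reproduced.

```python
import math


def joint_angle(a, b, c):
    """Angle ABC in degrees at vertex b, for 2D keypoints given as (x, y)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident keypoints: angle undefined")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def distance_ratio(p1, p2, q1, q2):
    """Ratio of the distance p1-p2 to the distance q1-q2."""
    return math.dist(p1, p2) / math.dist(q1, q2)


# Feature count as described in the text.
N_FEATURES = 6 * 4 + 8 * 2 + 2

# Illustrative pixel coordinates only (hip, knee, ankle), not study data.
knee_angle = joint_angle((0, 2), (0, 1), (0, 0))  # straight leg
```

Ratios of distances, unlike raw pixel distances, do not depend on how large the participant appears in the image, which is one plausible reason to prefer them as features.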

Table 2. Key frame detection algorithm (Step: Description).
Input: F, frontal view video; L, lateral view video.
Following feature extraction, we then used a machine learning algorithm to predict the LESS score associated with the drop-jump landing videos. To evaluate the predictive effectiveness of the various machine learning algorithms, we generated features for all 320 drop-jump landings in the dataset using the approach described above. We also noticed that the distribution of the LESS scores in the dataset was imbalanced, with the majority of LESS scores falling in the range 4-6. Given that unbalanced datasets can potentially affect the accuracy of machine learning techniques, we additionally generated a balanced version of the dataset consisting of 153 drop-jump landing trials with at most 20 trials per LESS score. All evaluations of machine learning techniques were applied to both datasets.
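A balanced subset of this kind can be sketched as follows. The text does not state how trials were chosen when a score had more than 20 trials, so the random sampling, the `seed` parameter, the `"less"` dictionary key, and the function name `balance_by_score` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_by_score(trials, max_per_score=20, seed=0):
    """Keep at most `max_per_score` trials for each LESS score.
    Trials are dicts with a 'less' key (hypothetical data layout)."""
    rng = random.Random(seed)
    by_score = defaultdict(list)
    for trial in trials:
        by_score[trial["less"]].append(trial)
    balanced = []
    for score in sorted(by_score):
        group = by_score[score]
        if len(group) > max_per_score:
            group = rng.sample(group, max_per_score)  # assumed selection rule
        balanced.extend(group)
    return balanced


# Toy dataset: 50 trials scored 5 and 3 trials scored 9.
toy = [{"id": i, "less": 5} for i in range(50)] + [{"id": 50 + i, "less": 9} for i in range(3)]
balanced = balance_by_score(toy)
```

Scores with fewer than 20 trials are kept in full, which is why a capped dataset can still be uneven at the extremes of the score range.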
The machine learning techniques chosen to be evaluated were random forest regression, because it is a state-of-the-art machine learning approach and generally performs well 'out of the box' on most problems in practice; and linear regression, which is a widely understood linear modelling technique. Unlike random forest regression, linear regression produces an interpretable model, but it has the disadvantage of being unable to model interactions between features. Given that the full dataset was imbalanced, we also evaluated a dummy regressor (ZeroR) that simply predicts the mean LESS score from the training data. For the original dataset, this method was expected to have reasonably high accuracy, but lower accuracy for the balanced dataset. All machine learning methods implemented were available in WEKA 3.8.0 [19], and returned floating point numbers (i.e., decimals) that added granularity to the data.
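The expected behaviour of the dummy regressor can be illustrated with a short sketch. This is plain Python rather than the WEKA implementation used in the study; the class name `ZeroR`, its `fit`/`predict` interface, and the toy score lists are assumptions, and the errors are computed in-sample purely for illustration.

```python
from statistics import mean


class ZeroR:
    """Dummy regressor: always predicts the mean target of the training data."""

    def fit(self, X, y):
        self.mean_ = float(mean(y))
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy targets: scores clustered around 5 versus scores spread evenly.
clustered = [4, 5, 5, 5, 6, 5, 5, 4, 6, 5]
spread = [1, 3, 5, 7, 9]

mae_clustered = mean_absolute_error(clustered, ZeroR().fit(None, clustered).predict(clustered))
mae_spread = mean_absolute_error(spread, ZeroR().fit(None, spread).predict(spread))
```

When most scores sit close to the mean, predicting the mean already gives a small error, so a useful model has to beat this baseline rather than zero.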

Statistical Method
As noted in Section 2.3, 320 double-leg drop-jump landings were analysed. A series of 10 × 10-fold cross validation experiments were applied on the full (320 videos) and balanced (153 videos, ≤ 20 videos per LESS score) datasets to predict the scores using random forest for regression, linear regression, and dummy regression (ZeroR) models in WEKA [20]. To assess the effectiveness of the automated cropping algorithm in the context of the overall system, we additionally ran the entire pipeline with crops generated by the clinician. Mean absolute error and Pearson correlation coefficient (r) were calculated to assess the accuracy of the predictions. Predictions were then converted to a binary category, and the sensitivity and specificity for categorising individuals at high risk of non-contact ACL injury (LESS ≥ 5 errors [12]) were assessed for each method. The outcomes of the models were compared using paired corrected t-tests in WEKA [20], and the timestamps of the key frames IC and KFmax were each compared between manual (clinician) and automated (OpenPose) cropping methods using unpaired t-tests assuming homoscedasticity. Since the LESS score was treated as a regression problem, actual (clinical LESS) versus predicted (automated LESS) and Bland-Altman [21] plots were used to allow for a visual inspection of the models. Statistical significance was set at p ≤ 0.05.
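The evaluation quantities named above can be sketched in plain Python. This is not the WEKA procedure used in the study: the function names, the fold assignment by striding a shuffled index list, and the choice to threshold the unrounded prediction at 5 are assumptions made for illustration.

```python
import math
import random


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sensitivity_specificity(actual, predicted, threshold=5.0):
    """High risk is a score >= threshold; predictions are thresholded unrounded."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a >= threshold:
            tp, fn = (tp + 1, fn) if p >= threshold else (tp, fn + 1)
        else:
            tn, fp = (tn + 1, fp) if p < threshold else (tn, fp + 1)
    return tp / (tp + fn), tn / (tn + fp)


def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]
            held_out = set(test)
            yield [i for i in idx if i not in held_out], test


# Toy scores, not study data.
actual = [3, 4, 5, 6, 7, 8]
predicted = [3.5, 5.2, 4.8, 6.1, 6.9, 7.0]
mae = mean_absolute_error(actual, predicted)
sens, spec = sensitivity_specificity(actual, predicted)
splits = list(repeated_kfold(320))
```

With 320 trials, each of the 100 splits holds out 32 trials and trains on the remaining 288.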

Results
The mean LESS score from the 320 drop-jump landings was 5.5 ± 1.8 errors (range 0 to 12 errors) as rated by the clinician. The absolute time difference between manually and automatically identified key frames was 26.5 ± 17.0 ms (p = 0.484) for IC and 32.8 ± 18.0 ms (p = 0.445) for KFmax in the frontal videos, and 53.5 ± 16.2 ms (p = 0.125) for IC and 20.8 ± 16.3 ms (p = 0.827) for KFmax in the sagittal videos.
Random forest yielded the lowest mean absolute error (1.23) and greatest correlation (r = 0.63) between actual and predicted scores based on results from the cross validation experiments (Table 4). Sensitivity (0.82) and specificity (0.77) were reasonable for high (LESS ≥ 5 errors) and low (LESS < 5 errors) injury risk categorisation. Experiments using a balanced (versus unbalanced) dataset or manual (versus automated) cropping methods did not improve predictions. An actual versus predicted plot from the random forest regression is depicted in Figure 4, and two Bland-Altman plots on the same dataset in Figure 5. Note that both conventional (mean difference ± 1.96 standard deviation) and regression-based (regressed difference between methods on the mean of the two methods ± 2.46 standard deviation of the residual) Bland-Altman plots were generated given the non-uniform differences in mean [21].
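For reference, the conventional Bland-Altman limits of agreement (mean difference ± 1.96 standard deviation) can be computed as in the sketch below. The function name `bland_altman_limits` and the toy scores are assumptions, and the regression-based variant is not shown.

```python
from statistics import mean, stdev


def bland_altman_limits(method_a, method_b):
    """Return (lower limit, bias, upper limit) for paired measurements,
    using the sample standard deviation of the differences."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Toy clinical versus automated scores, not study data.
clinical = [5, 6, 7, 8]
automated = [4, 6, 6, 9]
lower, bias, upper = bland_altman_limits(clinical, automated)
```

The conventional limits assume that the differences do not change with the magnitude of the score, which is the assumption the regression-based plot relaxes.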

Discussion
The use of the LESS to assess injury risk is common in sport science and clinical practice [9,22], but scoring is time consuming, clinician-dependent, and generally inaccessible for large-scale screening outside of elite sports. This study provides evidence that the LESS can be automated using deep-learning-based computer vision combined with machine learning methods without the need for 3D motion analysis or depth sensor cameras. A clear benefit of automating LESS scoring is immediate feedback to end-users. The successful application of this method paves the way for the automatic detection of individuals at high risk of injury using smartphone-based applications of LESS videos (Video S1: https://youtu.be/q1wiGt4K8MU).
The characteristics of an ideal injury risk screening tool are good reliability, validity, and predictive value for injury incidence. In practical or field settings, an ideal screening method is easy to administer without an expert, and has minimal financial, spatial, and temporal requirements. Ideally, the screening tool provides immediate results and is accessible to everyone, from the recreational to elite athlete, as well as novice to expert rater. Overall, the LESS responds to most of these stated requirements. The test demonstrates acceptable reliability and validity [10,11,23], as well as predictive value for non-contact ACL injury using a threshold of 5 errors [12]. The inter-rater reliability of the total LESS score is good to excellent, with ICC ranging from 0.83 to 0.92 [10,11,23] and typical errors at 0.71 LESS errors [10]. The results from the current study indicate that the typical errors from the automated processing and scoring of the LESS through computer vision when applying the random forest model (Table 4) are less than half an error greater than scores taken from two expert clinicians. In fact, certain individual LESS items yield suboptimal psychometric properties between raters and 3D motion analysis [23]. More specifically, no significant agreement between raters was found for knee and trunk flexion at IC, and poor agreement between rater and 3D motion capture analysis was found for knee flexion at IC, lateral trunk flexion at IC, and symmetric foot contact at IC [23]. As such, a certain level of disagreement between clinical ratings and computerised ratings is expected.
As seen in Figures 4 and 5, the estimated error is not uniform across the range of LESS scores, but depends on the target. For example, trials with a low actual LESS score tend to have a positive error (the prediction is an overestimation) and trials with a higher actual LESS score tend to have a negative error (the prediction is an underestimation). If these biases stemmed from the over-representation of the mid-range LESS values (i.e., majority of LESS scores falling in the range 4-6), the balanced dataset should have provided more accurate predictions, which was not the case. It might be possible to attempt correcting predictions to improve accuracy in future work using probability calibration methods, such as Platt Scaling and Isotonic Regression. The large errors in LESS score predictions were attributed to inaccurate foot and IC key frame detection. The newest body model in OpenPose (Body 25) contains 25 points, including coordinates that define the feet and enable computations of angles at the ankles [24]. Improving the LESS score automation relies on either refining body part detection or training a new system specifically to solve this problem.
In previous research, depth sensor technology has been used to automate LESS scoring [13,15]. Comparisons between automated and expert clinician scores indicate a mean difference of 1.20 errors [15], mean absolute difference of 1.13 errors [13], intra-class correlation of 0.80 [13], and percentage agreement of the individual items ranging from 55% to 100% [13,15]. These research findings are comparable to our lowest mean absolute error (1.23), greatest correlation (r = 0.63), and agreement in risk classification (sensitivity 0.82, specificity 0.77) between actual and predicted scores from the cross validation experiments using random forest regression. In contrast to the PhysiMax system [13,15], our approach did not require the clinician to add the overall impression manually (no. 17 in Table 1) given that the LESS items were not scored one-by-one. Although the lack of individual-item scores might be perceived as a limitation of the deep LESS approach, neither a subjective rating from the clinician nor hardware other than a handheld camera or smart portable device is required. Furthermore, only the final LESS score has shown predictive value in terms of injury risk [12]; hence, the individual items are of lesser clinical value.
The better accuracy achieved by random forest can be explained by the fact that the features (angles, distances, and ratios) are likely correlated and related in a non-linear manner. Decision tree ensembles in general are better able to cope with correlated variables and model non-linear patterns [25]. Linear regression, on the other hand, achieves optimal results when the predictor variables are independent and do not interact. We also foresee a possibility of processing the raw video images themselves and attempting direct deep learning-based classification with minimal pre-processing. Such an approach would obviate the need to use OpenPose or a similar pose-tracking tool. However, taking such an approach would be challenging because of the lack of training data relative to the size of datasets usually used to train deep image recognisers. Another significant disadvantage of the proposed approach is that deep learning needs GPU-based acceleration hardware, and is therefore currently unable to process videos independently on consumer smartphones. That said, the rapidly increasing computational power of consumer smartphones and the current trend in research of compressing deep models [26] so that they run efficiently on mobile devices should solve this problem in the next few years.
One of the main concerns with clinical screening tools is their subjective nature and reliance on visual observations to estimate angles, which are challenging to quantify accurately [27,28]. During the LESS, a small kinematic difference (e.g., knee angle 29°, 1-error present; knee angle 30°, 0-error absent) can result in poor agreement between raters and between clinical LESS scores and motion capture scores. Recent technological advances have allowed the more objective quantification of human motion using wearable technology [29,30]. Inertial measurement units are able to measure linear and angular motion of individual body segments and centre of mass, and are proposed as more accurate means of identifying risky movement patterns than through visual observations [31]. Although inertial measurement units are relatively inexpensive, they are not commonly used in clinical environments and an expert is still needed to process and interpret data signals. The automated scoring process developed here using standard video recordings offers an alternative solution that can possibly improve consistency of LESS ratings, removing the subjective interpretation of the task. Moving forward, the reliability of deep LESS scores, validity of OpenPose derived data during the dynamic double-leg drop-landing task, and predictive ability of the method need empirical support.
An indisputable advantage of automated scoring using deep-learning-based computer vision combined with machine learning methods or markerless methods from depth sensor cameras is immediate results and feedback to patients, athletes, coaches, or healthcare professionals.
Our developed method that automates LESS scores provides a viable solution to decreasing scoring time, increasing accessibility to non-expert raters, and delivering immediate results without any additional expenditure other than conventional video recordings. Conventional 2D video recordings are adequate for quantifying kinematics [32-34] and are readily accessible through tablets or smartphones. The successful application of this method would pave the way for the automatic detection of individuals at high risk of injury using smartphone-based applications of LESS and 2D video footage (Video S1: https://youtu.be/q1wiGt4K8MU). Other than expediting mass injury risk screening initiatives in youth or team sports, LESS automation could be a valuable and convenient tool to track injury risk factors over time and to assess the effectiveness of intervention programs at improving landing mechanics (Video S2: https://youtu.be/Ve_QJu0fuLs). The proposed method could be extended to other injury risk screening methods based on 2D camera recordings to decrease manual labour and time required for screening initiatives; e.g., the Cutting Movement Assessment Scale [35] and Tuck jump assessment [36].
This preliminary investigation provides evidence that it is feasible to automate the LESS from 2D video recordings alone. Further research could lead to improved automation outcomes and enhance the strength of the agreement between clinical and automated LESS scores beyond its current levels. The newest body model in OpenPose (Body 25) contains 25 points, including coordinates that define the feet and enable computations of angles at the ankles [24]. Although the timestamped IC key frames in frontal and sagittal videos were comparable between the clinician and the scripted process (mean difference: 32.8 ms, p = 0.445 and 20.8 ± 16.3 ms, p = 0.827), using the foot coordinates rather than the ankle and body coordinates would likely enhance precision. A number of videos from the available dataset were not used because the automatic cropper clearly misidentified time events (i.e., by more than 100 ms). We were unable to determine the reason underlying the mislabelling of these videos upon visual inspection. We speculate that rerunning the current experiment using the COCO + Foot model might lead to the correct identification of key events in a greater number of our database videos, increasing the number of eligible videos for analysis. The increased number of coordinates from the 25-point Body model rather than the 18-point COCO model would also allow us to extract a greater number of features from the processed videos and use these as input in the subsequent regression experiments.
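As an illustration of how an ankle angle could be computed once foot keypoints are available, the following minimal sketch derives the angle at a joint from three 2D keypoints. It is not the feature-extraction code used in this study; the function name `joint_angle` and the example pixel coordinates are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c.

    Each point is an (x, y) pair in pixel coordinates, e.g. knee, ankle and
    toe keypoints for an ankle angle (an assumed choice of keypoints).
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must not coincide with the joint")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


# Hypothetical keypoints: shank vertical, foot horizontal.
knee, ankle, toe = (100, 200), (100, 300), (150, 300)
print(round(joint_angle(knee, ankle, toe), 1))  # 90.0
```

Because the angle is computed in the image plane, it would be sensitive to camera placement and out-of-plane motion, which is one reason the validity of such 2D measures needs empirical support.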

Conclusions
We provide evidence that the Landing Error Scoring System (LESS), an injury-risk screening tool, can be automated using deep-learning-based computer vision combined with machine learning methods. Further research on the automation could enhance the strength of the agreement between clinical (gold standard) and automated (predicted) LESS scores, as well as risk classification, beyond its current levels. Automation of the LESS using standard 2D recordings would facilitate mass injury-risk screening initiatives with quasi real-time feedback, without the need for depth cameras or expert clinicians. The successful application of this method would pave the way for the automatic detection of individuals at high risk of injury using smartphone-based applications of LESS and 2D video footage (Video S1: https://youtu.be/q1wiGt4K8MU), increasing accessibility of injury-risk assessment methods beyond elite athletes and removing depth-sensor camera requirements. It may also open doors to other related injury prevention problems. Future work includes updating the framework using the newest body model in OpenPose (Body 25) to extract a greater number of features and more accurately detect key frames.

Figure 1. Flow diagram of data processing leading to the comparison of 'gold standard' clinical LESS scores from an expert rater with 'automated' predicted LESS scores from the automation process. Abbreviations: IC, initial contact; KFmax, maximal knee flexion; LESS, Landing Error Scoring System; RF, random forest.

Figure 2. Example of the original (blue line) plot and rolling window (orange line) plot for the right ankle keypoint of one individual during a drop-jump landing trial taken from the lateral view video. More specifically, (a) the blue line depicts the distance of the right ankle to the left border (y-axis) in each video frame (x-axis); (b) the orange line is the 20-frame rolling median of the original blue line; (c) the black bars indicate the intersections of the two lines, whereas the red dotted line represents the distance between two consecutive intersection points. Panel (d) is a zoomed-in view of the intersections around the initial contact key frame. Panel (e) highlights the points (f) and (g) as the initial contact key frame on the rolling window plot and original plot, respectively.

3. Find F key frames IC and KFmax
   3.1. Based on the coordinates of the left and right ankle (both visible in F), find the intersections of the original and rolling window plotsᵃ for each ankle
   3.2. Calculate the distances between each consecutive pair of intersection points
   3.3. Find the first point of the pair of intersection points with the longest distance for each ankle
   3.4. Identify the first point of the pair of intersection points that has the lowest x value (i.e., number of frames) as IC
   3.5. Based on the coordinates of the body keypoint, find the intersections of the original and rolling window plotsᵃ
   3.6. Calculate the distances between each consecutive pair of intersection points
   3.7. Identify the first point of the pair of intersection points with the longest distance as KFmax
4. Find L key frames IC and KFmax
   4.1. Based on the coordinates of the individual's right ankle (which is closest to the camera L), find the intersections of the original and rolling window plots
   4.2. Calculate the distances between each consecutive pair of intersection points
   4.3. Identify the first point of the pair of intersection points with the longest distance as IC
   4.4. Based on the coordinates of the body keypoint, find the intersections of the original and rolling windowᵃ plots
   4.5. Calculate the distances between each consecutive pair of intersection points
   4.6. Identify the first point of the pair of intersection points with the longest distance and an upward (positive) trend as KFmax
5. Crop the videos (F and L) according to the IC and KFmax key frames.
Output: F', cropped version of frontal view video; L', cropped version of lateral view video
Notes. ᵃ Rolling window plot, plot of median values from a rolling 20-frame window. See Figure 2.
Abbreviations. F, frontal view video; IC, initial contact; KFmax, maximal knee flexion; L, lateral view video.
Appl. Sci. 2020, 10, 892
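Steps 3.1 to 3.4 and 4.1 to 4.3 of the algorithm above (initial contact from the ankle traces) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rolling median is taken over a trailing window, an intersection is approximated as a frame where the original trace changes side relative to the rolling median, and the distance between consecutive intersections is measured in frames. All function names are hypothetical.

```python
from statistics import median


def rolling_median(signal, window=20):
    """Trailing rolling median; early frames use the frames available so far."""
    return [median(signal[max(0, i - window + 1): i + 1])
            for i in range(len(signal))]


def intersections(signal, smoothed):
    """Frames at which the original trace changes side of the smoothed trace."""
    above = [s > m for s, m in zip(signal, smoothed)]
    return [i for i in range(1, len(above)) if above[i] != above[i - 1]]


def first_of_longest_gap(crossings):
    """First crossing of the consecutive pair separated by the most frames."""
    if len(crossings) < 2:
        return None
    gaps = [b - a for a, b in zip(crossings, crossings[1:])]
    return crossings[gaps.index(max(gaps))]


def initial_contact_lateral(ankle_trace, window=20):
    """Candidate IC frame from one ankle trace (steps 4.1 to 4.3)."""
    smoothed = rolling_median(ankle_trace, window)
    return first_of_longest_gap(intersections(ankle_trace, smoothed))


def initial_contact_frontal(left_trace, right_trace, window=20):
    """Earliest candidate IC frame across both ankles (steps 3.1 to 3.4)."""
    candidates = [initial_contact_lateral(t, window)
                  for t in (left_trace, right_trace)]
    candidates = [c for c in candidates if c is not None]
    return min(candidates) if candidates else None
```

The same helpers could be reused for the body keypoint trace to look for KFmax, with an additional check on the direction of the trend for the lateral view (step 4.6), which is not implemented here.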

Figure 3. OpenPose's COCO 18-point model keypoint positions (left image) [16] and example of a frontal (middle image) and lateral (right image) view processed video at the maximal knee flexion key frame.

Figure 4. Actual (clinical) versus predicted (automated) LESS score plots from the random forest regression using the full dataset (n = 320) and automatic cropping method. Dashed lines represent the 5-error threshold that defines high risk of injury (i.e., scoring 5 or more errors during LESS has been associated with a 10.7 times greater relative risk of sustaining a non-contact anterior cruciate ligament injury [12]). Note that the clinical scores are integers and predicted scores are decimals, which adds granularity. Abbreviations: LESS, Landing Error Scoring System.

Figure 5. Bland-Altman [21] plots depicting the difference in predicted (automated) and actual (clinical) LESS scores versus the mean scores with (A) conventional 95% limits of agreement (mean difference ± 1.96 standard deviation), and (B) regression-based limits of agreement (regressed difference between methods on the mean of the two methods ± 2.46 standard deviation of the residual).
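The conventional limits of agreement in panel (A) follow directly from the formula in the caption (mean difference ± 1.96 standard deviation). The sketch below illustrates that calculation only; it does not cover the regression-based limits in panel (B), the function name `limits_of_agreement` is hypothetical, and the scores are made-up values rather than data from this study.

```python
from statistics import mean, stdev


def limits_of_agreement(predicted, actual, z=1.96):
    """Return (bias, lower limit, upper limit) of the paired differences.

    The differences are predicted minus actual; the sample standard
    deviation (n - 1 denominator) is used.
    """
    diffs = [p - a for p, a in zip(predicted, actual)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias, bias - spread, bias + spread


# Made-up scores for illustration only.
predicted_scores = [4.2, 5.1, 6.3, 3.8, 7.0]
clinical_scores = [4, 6, 6, 3, 8]
bias, lower, upper = limits_of_agreement(predicted_scores, clinical_scores)
print(f"bias = {bias:.2f}, limits = [{lower:.2f}, {upper:.2f}]")
```

A Bland-Altman plot would then show each difference against the mean of the two scores, with horizontal lines at the bias and the two limits.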

Table 2. Algorithm used to detect key frames from the two input videos.

Table 3. Measurements extracted from key frames and used as kinematic features.

Table 4. Results from machine learning experiments.
