Investigation of Thresholds for Asymmetry Indices to Represent the Visual Assessment of Single Limb Lameness by Expert Veterinarians on Horses Trotting in a Straight Line

Simple Summary
Visual gait evaluation by the equine veterinarian is an essential part of the diagnosis of locomotor disorders. Measurement of movement asymmetry can provide objective support for diagnosis. However, the interpretation of these measurements remains complex, as horses considered to be healthy may show some degree of asymmetry. This study aims to establish and analyze threshold values for different indices that can be used to discriminate a healthy horse from a horse considered lame by expert clinicians in their daily practice. At least 88% of healthy horses had an upward range of movement of the withers between −10% and 7% of asymmetry. The withers asymmetry of at least 84% of the forelimb lame horses was outside these thresholds. Likewise, at least 86% of healthy horses had an upward range of movement of the pelvis between −7% and 18% of asymmetry. At least 83% of the hindlimb lame horses were outside these pelvis asymmetry thresholds. Despite the relatively low number of horses included in this study (224), these thresholds provide initial guidance to avoid overinterpretation of asymmetry when using objective gait analysis systems.

Abstract
Defining whether a gait asymmetry should be considered as lameness is challenging. Gait analysis systems now provide relatively accurate objective data, but their interpretation remains complex. Thresholds for discriminating between horses that are visually assessed as being lame or sound, as well as thresholds for locating the lame limb with known sensitivity and specificity, are essential for accurate interpretation of asymmetry measures. The goal of this study was to establish the thresholds of asymmetry indices having the best sensitivity and specificity to represent the visual single-limb lameness assessment made by expert veterinarians as part of their routine practice. Horses included in this study were evaluated for locomotor disorders at a clinic and equipped with the EQUISYM® system using inertial measurement unit (IMU) sensors. Visual evaluation by expert clinicians allocated horses into five groups: 49 sound, 62 left forelimb lame, 67 right forelimb lame, 23 left hindlimb lame, and 23 right hindlimb lame horses. Horses with grade 1/10 lameness were excluded. Sensors placed on the head (_H), the withers (_W), and the pelvis (_P) provided vertical displacement. Relative differences of minimal (AI-min) and maximal (AI-max) altitudes, and of upward (AI-up) and downward (AI-down) amplitudes between right and left stance phases were calculated. Receiver operating characteristic (ROC) curves discriminating the sound horses from each lame limb group revealed the threshold of each asymmetry index associated with the best sensitivity and specificity. AI-up_W had the best ability to discriminate forelimb lame horses from sound horses with thresholds (left: −7%; right: +10%) whose sensitivity was greater than 84% and specificity greater than 88%. AI-up_P and AI-max_P discriminated hindlimb lame horses from sound horses with thresholds (left: −7%; right: +18% and left: −10%; right: +6%) whose sensitivity was greater than 78%, and specificity greater than 82%. The identified thresholds will support the interpretation of quantitative data from lameness quantification systems. This study is mainly limited by the number of included horses and deserves further investigation with additional data; similar studies on circles are also warranted.


Introduction
Movement asymmetry is commonly used as an indicator of locomotor disorders in horses. Indeed, the aim of locomotor examination is to identify any impairment and to locate its source. Currently, lameness is visually evaluated by veterinarians. However, agreement between veterinarians about lameness grade assessment is low, particularly for subtle lameness detection [1,2]. Modern gait analysis tools provide quantitative measures of asymmetry. Inertial measurement units (IMUs), the most versatile of these tools, can be used in a clinical setting [3,4]. The question of how visual lameness assessment relates to the measurements of gait asymmetry systems has been raised. Asymmetry of vertical displacement of the head and pelvis has shown a relevant increase with induced lameness [5][6][7]. However, even horses perceived by the veterinarian to be sound have demonstrated a physiologically asymmetrical gait [8][9][10]. Despite the known capacity of withers asymmetry for detecting compensatory movements, it has been studied less than head asymmetry [11,12].
In this context, thresholds of asymmetry parameters that correspond to visual evaluation of lameness by veterinarians have been studied. Asymmetry thresholds for the vertical displacement of the head (>6 mm) and the pelvis (>3 mm) were used for the first time by McCracken et al. [13]. They were probably based on a confidence interval calculated with two repeated measures on 236 horses [14]. These thresholds have been adjusted to the method of data construction used in other IMU systems [15]. A growing number of studies have used these thresholds for objective lameness detection [11,12,16]. However, numerous asymmetry values of sound horses have exceeded these thresholds [8,17,18]. This might be explained by undetected subclinical, pain-mediated disease or by biological variation, but no consensus has yet been reached [19,20]. Recently, a discrimination method of statistical analysis was applied to 25 Thoroughbred racehorses to redefine higher thresholds (14.5 mm for the head and 7.5 mm for the pelvis) [21]. In that study, the focus was on specificity because the objective was to screen horses before racing. These results have given guidelines but require further investigation with heterogeneous horses and lameness types in the clinical environment faced by practitioners.
The goals of this clinical observational study were (i) to establish which asymmetry indices have the best sensitivity and specificity to reflect the visual assessment of single-limb lameness made by expert clinicians as part of their routine practice, and (ii) for the relevant indices, to determine the threshold of lameness detection and lame limb identification. This first study was limited to the following conditions: at trot, in hand, on a straight line and on a hard surface.

Materials and Methods
This clinical observational retrospective study was approved by the clinical research ethics committee (ComERC n°2022-01-19).

Locomotor Examination
After collecting the anamnesis and performing the examination of the locomotor system, the veterinarian evaluated the horse's locomotion without warm-up. As part of the dynamic locomotor examination, horses were trotted by their owner/groom on a 25-meter straight line. The handler was asked to run at an adequate speed and to keep a steady pace. The ground surface was made of asphalt. Visual evaluation was performed by one of five expert veterinarians holding the DESV (French certification as a specialist in equine locomotor pathology) and certified by the ISELP (International Society of Equine Locomotor Pathology). Based on this evaluation on the straight line, horses were classified into five groups: right forelimb (RF) lame, left forelimb (LF) lame, right hindlimb (RH) lame, left hindlimb (LH) lame, and sound horses.
In total, 381 horses were screened and were evaluated as lame on a straight line. Among them, 209 horses showed a lameness grade ranging between 2/10 (inclusive) and 6/10 (inclusive) on an 11-grade scale equivalent to the UK scale (where 0 is sound and 10 is non-weight-bearing lameness) [22][23][24]. Horses showing lameness on multiple limbs on the straight line were excluded (n = 33). With these criteria, 67 horses showed RF lameness, 62 horses showed LF lameness, 23 horses showed RH lameness, and 23 horses showed LH lameness. A flowchart is provided as Figure S1. Lameness grades included in each group are summarized in Table 1. Forty-nine horses were included in the group of "sound" horses. These sound horses had been presented at the clinic for pre-purchase examination or for gait evaluation prior to further training. Individuals were included in this group only if they met both of the following criteria. (1) The horses were in training and judged by their owners to be capable of performing all the exercises required for their sport level. (2) A full locomotor examination of these horses by an expert clinician revealed no abnormalities deemed significant under any of the examination conditions. This examination included: walk, trot on a hard circle on both reins, trot on a hard straight line, four flexion tests (one for each limb), and trot on a soft circle on both reins.

Data Collection
During the locomotor examination, as part of the clinical routine, horses were systematically equipped with the EQUISYM® (Arioneo, LIM France, Nouvelle-Aquitaine, France) system consisting of seven wireless IMUs placed on the head, the withers, the pelvis, and the four cannon bones (Figure 1). They recorded tri-axial angular velocity within a range of 2000°/s and tri-axial acceleration within a range of 16 g, at a frequency of 200 Hz during approximately two trot-ups, corresponding to a mean of 14.7 ± 7.8 trot strides on a straight line. Data were recorded on the sensors and downloaded wirelessly.

Data Processing
First, stance phase periods, i.e., foot-on and foot-off times, were determined from the gyroscopic signals recorded on the four cannon bones using the method developed by Hatrisse et al. [25]. One stride was defined as the time between two consecutive foot-on events of the left forelimb.
Then vertical displacements of the head, withers and pelvis were segmented into strides. The acceleration signal measured along the dorso-ventral axis of the horse was integrated twice and high-pass filtered using a fourth-order Butterworth filter with a cut-off frequency set to 1 Hz to obtain displacement curves [4,26].
Based on the vertical displacement of the head (_H), withers (_W) and pelvis (_P) occurring along a stride, four variables were calculated for each sensor location. The following asymmetry indices (AI), expressed as a percentage of the maximal range of motion within a stride, were used to compare the left vs. right part of the stride (Figure 2): AI-min was the left-right difference of the lowest point of the vertical excursion; AI-max was the left-right difference of the highest point of the vertical excursion; AI-up was the left-right difference of the upward range of motion during the propulsion phase; and AI-down was the left-right difference of the downward range of motion during the damping phase. A positive AI value indicated a smaller movement amplitude during the right stance than during the left stance, and a negative AI value indicated the opposite.

All calculations were performed with custom-made Matlab 2020a (The MathWorks, Natick, MA, USA) scripts.
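The index definitions above can be illustrated with a minimal Python sketch (the study itself used custom Matlab scripts, which are not reproduced here). The `StrideExtrema` layout, the function name, and the exact sign convention for AI-min and AI-max are assumptions chosen so that a positive value always corresponds to less movement during the right stance, as stated in the text.

```python
from dataclasses import dataclass


@dataclass
class StrideExtrema:
    """Vertical positions (mm) of the extrema of one stride (hypothetical layout)."""
    max_start: float  # peak preceding the left stance
    min_left: float   # lowest point during the left stance
    max_left: float   # peak following the left stance
    min_right: float  # lowest point during the right stance
    max_right: float  # peak following the right stance


def asymmetry_indices(s: StrideExtrema) -> dict:
    """Return the four asymmetry indices in % of the stride range of motion.

    Assumed convention: positive = smaller movement during the right stance.
    """
    rom = max(s.max_start, s.max_left, s.max_right) - min(s.min_left, s.min_right)
    if rom <= 0:
        raise ValueError("range of motion must be positive")

    up_left = s.max_left - s.min_left        # upward amplitude, left stance
    up_right = s.max_right - s.min_right     # upward amplitude, right stance
    down_left = s.max_start - s.min_left     # downward amplitude, left stance
    down_right = s.max_left - s.min_right    # downward amplitude, right stance

    def pct(diff: float) -> float:
        return 100.0 * diff / rom

    return {
        "AI-min": pct(s.min_right - s.min_left),  # right minimum less deep
        "AI-max": pct(s.max_left - s.max_right),  # right maximum less high
        "AI-up": pct(up_left - up_right),
        "AI-down": pct(down_left - down_right),
    }
```

For example, a stride whose right-stance minimum is 10 mm shallower than the left one, with a 50 mm range of motion, yields AI-min, AI-up and AI-down of +20% and AI-max of 0% under these assumptions.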

Data Analysis
Mean and standard deviation (SD) were calculated from data collected in each group. Normality was assessed using graphical methods [27]. Open software RStudio (RStudio Inc., Boston, MA, USA, version 4.1.3) was used, including the packages ROCR, pROC and boot. The four AIs calculated from head, withers, and pelvis were analyzed. Receiver operating characteristic (ROC) curves were plotted to discriminate each lame limb group (RF, LF, RH, LH) from the control group (sound horses). The area under the curve (AUC) of each ROC curve was calculated. Then, the thresholds with the highest specificity and sensitivity were calculated using the top-left method, which involves choosing the threshold corresponding to the point of the curve closest to the upper-left corner of the graph. The 95% confidence interval (95% CI; values are given in square brackets in the text) was obtained from the distribution of the best specificities and sensitivities calculated for 400 resamples, using the bootstrap method [28]. In this study, indices were considered to have good discrimination capacity if the sum of sensitivity and specificity was strictly higher than 150% [29].
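As an illustration of the analysis described above, the following standard-library Python sketch selects the top-left threshold of an empirical ROC curve and derives percentile bootstrap intervals from 400 resamples. It is not the authors' R code (which relied on ROCR, pROC and boot); the function names, the decision rule "value ≥ threshold means lame" (suited to right-sided lameness; values can be negated for left-sided lameness), the independent resampling of each group, and the percentile form of the interval are all assumptions.

```python
import random


def roc_points(sound, lame):
    """List (threshold, sensitivity, specificity) for 'value >= threshold => lame'."""
    points = []
    for t in sorted(set(sound) | set(lame)):
        sens = sum(v >= t for v in lame) / len(lame)
        spec = sum(v < t for v in sound) / len(sound)
        points.append((t, sens, spec))
    return points


def top_left(sound, lame):
    """ROC point closest to the upper-left corner (sensitivity = specificity = 1)."""
    return min(roc_points(sound, lame),
               key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)


def good_discrimination(sens, spec):
    """Criterion used in the text: sensitivity + specificity strictly above 150%."""
    return sens + spec > 1.5


def bootstrap_ci(sound, lame, n_boot=400, level=0.95, seed=0):
    """Percentile CIs for (threshold, sensitivity, specificity).

    Each group is resampled with replacement (an assumption about the design).
    """
    rng = random.Random(seed)
    draws = [
        top_left(rng.choices(sound, k=len(sound)), rng.choices(lame, k=len(lame)))
        for _ in range(n_boot)
    ]
    lo_i = int((1 - level) / 2 * n_boot)
    hi_i = min(n_boot - 1, int((1 + level) / 2 * n_boot))
    cis = []
    for k in range(3):
        column = sorted(d[k] for d in draws)
        cis.append((column[lo_i], column[hi_i]))
    return cis
```

With toy data such as `sound = [-5, -2, 0, 1, 3, 4, 6]` and `lame = [5, 8, 12, 15, 20, 25]`, `top_left` returns the threshold 5 with a sensitivity of 1 and a specificity of 6/7, which satisfies the 150% criterion.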

Descriptive Results
Mean ± SD for each AI and for each horse group are summarized in Table 2 and boxplots are plotted in Figure 3.

RF lame horses showed higher mean values (sign of a reduced movement on the right) than sound horses for all AIs of the head and withers, and slightly lower mean values (sign of a reduced movement on the left) for all AIs of the pelvis, except AI-down_P. In a mirror pattern, LF lame horses showed lower mean values than sound horses for all AIs of the head and withers, and higher mean values for all AIs of the pelvis, except AI-max_H and AI-down_P.
Horses with RH lameness showed higher mean values (sign of a reduced movement on the right) than sound horses for all AIs of the head and pelvis, except AI-down_P, and showed slightly lower mean values (sign of a reduced movement on the left) for all AIs of the withers. In a mirror pattern, LH lame horses showed lower mean values than sound horses for all AIs of the head and pelvis, except AI-max_H and AI-down_P, and they showed higher mean values for all AIs of the withers.

Forelimb Lameness Discrimination
ROC curves for forelimb lameness discrimination are presented in Figure 4. The AUC, best sensitivity and specificity, and associated thresholds calculated from these ROC curves are summarized in Table 3. ROC curves for RF lameness discrimination showed highest …

Figure 4. ROC curves discriminating horses with left forelimb (LF) lameness from sound horses, and discriminating horses with right forelimb (RF) lameness from sound horses, plotted for each sensor location (head, withers, pelvis) and for each asymmetry index: AI-min (blue), AI-max (red), AI-up (black) and AI-down (cyan). The best specificity and sensitivity point of each curve is represented by a circle. The dashed black line is the hypothesized ROC curve with discrimination capacity due only to chance.
Table 3. Area under the curve (AUC) of the receiver operating characteristic (ROC) curve discriminating sound and forelimb lame horses, for each asymmetry index (AI). Best sensitivity, specificity, and associated threshold were calculated using the top-left method of ROC analysis. The 95% confidence intervals (in square brackets) were calculated by running the ROC analysis on 400 population resamplings (bootstraps). Results for which the sum of sensitivity and specificity is over 150% for both sides are in bold.

Hindlimb Lameness Discrimination
ROC curves for hindlimb lameness discrimination are presented in Figure 5. The AUC, best sensitivity and specificity, and associated thresholds calculated from these ROC curves are summarized in Table 4.
Figure 5. ROC curves discriminating horses with left hindlimb (LH) lameness from sound horses, and discriminating horses with right hindlimb (RH) lameness from sound horses, plotted for each sensor location (head, withers, pelvis) and for each asymmetry index: AI-min (blue), AI-max (red), AI-up (black), and AI-down (cyan). The best specificity and sensitivity point of each curve is represented by a circle. The dashed black line is the hypothesized ROC curve with discrimination capacity due only to chance.

Asymmetry Thresholds of Reliable Indices
With a sum of sensitivity and specificity over 150%, head (AI-min_H and AI-up_H) and withers (AI-min_W and AI-up_W) indices discriminated the LF lame horses from sound horses. Likewise, head (AI-min_H, AI-max_H, AI-up_H, and AI-down_H) and withers (AI-min_W and AI-up_W) indices discriminated the RF lame horses from sound horses.
With a sum of sensitivity and specificity over 150%, withers (AI-up_W) and pelvis (AI-min_P, AI-max_P, and AI-up_P) indices discriminated the LH lame horses from sound horses. Likewise, head (AI-min_H and AI-up_H) and pelvis (AI-max_P and AI-up_P) indices discriminated the RH lame horses from sound horses.
Thresholds and their 95% CI associated with a sum of sensitivity and specificity over 150% for both right and left lameness discrimination are plotted in Figure 6.

Figure 6. Thresholds (red line) of asymmetry indices (AI) (in % of asymmetry) for forelimb and hindlimb lameness discrimination. Only the AIs with the sum of sensitivity and specificity over 150% for both right and left lameness are plotted. Three ranges of values are represented: yellow for the 95% confidence interval (95% CI) around the threshold, green for values below the 95% CI ("sound" horses) and red for values beyond the 95% CI ("lame" horses).

Discussion
Description of the AIs provided by the head, withers, and pelvis of sound horses showed that almost no asymmetry was detected on the withers, whereas the head and pelvis showed a slightly lower range of movement during the RF-LH stance phase. In lame horses, the withers and pelvis showed a reduced range of movement during the stance phase of the lame forelimb and hindlimb, respectively. The head also showed reduced movement during the stance phase of a lame forelimb. Conversely, the head showed increased movement during the stance phase of a lame hindlimb.
Our study confirms the hypothesis that head and withers vertical displacements are indicators of forelimb lameness. Indeed, head indices (AI-min_H and AI-up_H) and withers indices (AI-min_W and AI-up_W) are the indices with the highest sensitivity and specificity (sum of sensitivity and specificity greater than 150%) for discriminating horses with forelimb lameness from sound horses. Among them, AI-up_W discriminated forelimb lameness with the highest sensitivity (>84%) and specificity (>88%).
For hindlimb lameness, pelvic vertical displacement was the most consistent indicator. Withers and head vertical displacements were also modified by compensatory movements, but only indices from the pelvis (AI-max_P and AI-up_P) discriminated hindlimb lameness on both sides with sensitivity over 78% and specificity over 82%.
Among the four indices (AI-up, AI-max, AI-min, AI-down) used in this study, the AI-down index systematically had low sensitivity and specificity for discriminating both hindlimb and forelimb lame horses from sound horses. This result suggests that the AI-down index should not be considered the most useful index in future work.
The main results give the following guidelines. With the highest sensitivity and specificity, AI-up_W discriminates LF lame horses under −10% of asymmetry and RF lame horses over +7% of asymmetry from sound horses. With the highest sensitivity and specificity, AI-up_P discriminates LH lame horses under −7% of asymmetry and RH lame horses over +18% of asymmetry from sound horses. These observations confirm that the upward movement of the pelvis has the highest power to discriminate hindlimb lameness [30].
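The guideline in this paragraph amounts to comparing one index value with a pair of side-specific thresholds, which the following Python sketch illustrates. The function name and constants are hypothetical, and the numeric values are simply those quoted in this paragraph; the 95% CIs around them mean that such a rule can only support, not replace, clinical judgment.

```python
def flag_side(value, left_threshold, right_threshold):
    """Return 'left', 'right' or None for one asymmetry index value (in %).

    Values below the left threshold suggest left-sided lameness, values above
    the right threshold suggest right-sided lameness, anything else is unflagged.
    """
    if left_threshold >= right_threshold:
        raise ValueError("left threshold must be below right threshold")
    if value < left_threshold:
        return "left"
    if value > right_threshold:
        return "right"
    return None


# Threshold pairs (left, right) as quoted in the paragraph above, in % asymmetry.
WITHERS_UP = (-10.0, 7.0)   # AI-up_W, forelimb lameness
PELVIS_UP = (-7.0, 18.0)    # AI-up_P, hindlimb lameness
```

Under these values, `flag_side(9.0, *WITHERS_UP)` returns `"right"`, while `flag_side(10.0, *PELVIS_UP)` returns `None` because +10% lies inside the pelvis interval.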
The higher relevance of the withers data compared with the head data contradicts previous studies [5,21]. The head shows greater movement asymmetry than the withers, which helps the visual assessment of forelimb lameness. However, the head is subject to the random movements seen in restless horses [7,8,10,31]. This study demonstrates that the withers movement provides useful and relevant information to detect forelimb lameness. Although less reliable, head movement indices provide additional information useful to differentiate forelimb and hindlimb lameness [11,32].
Moreover, absolute threshold values of the head asymmetry differed in our study between right-sided and left-sided lameness discrimination (AI-up_H: +24% for RF vs. −36% for LF; AI-min_H: +6% for RF vs. −33% for LF). This difference may reflect different types of lameness between the RF and the LF lame horses in our reference population. It could also reflect an artefactual reduction of head movement during the LF stance phase, compared to the RF stance phase, in sound horses. A possible explanation is that horses were trotted in hand from their left side by their owner or groom. This artefactual asymmetry may be induced by the handler despite instructions not to hold the head too firmly and to give slack in the lead rope. The head was either pulled forward or held back depending on the spontaneous speed of the horse. This difference was not highlighted in other studies, which may have been performed under more standardized conditions [33]. To a lesser extent, a similar phenomenon in the opposite direction appeared on the pelvis in sound horses (AI-min_P mean of +6%; AI-up_P +5%; AI-down_P +5%).
Here, AIs were divided by the range of movement, generating relative indices expressed in %. Values in millimeters are also provided in Table S1. Normalizing values seems natural in order to compare movement measured in a heterogeneous population, possibly including ponies. In addition, normalizing may make it easier to compare asymmetry indices processed by different gait analysis systems [15,34]. Previous studies [13,21] have, however, expressed thresholds in millimeters. Pfau et al. [21] found that the threshold of HDmin was 14.5 mm for forelimb lameness discrimination. Likewise, PDmax discriminated hindlimb lameness from 10 mm with a low sensitivity. Contrary to our results, PDmin was more reliable than PDmax, and the head was more reliable than the withers. Pfau et al. focused on specificity in a selection context (racing Thoroughbreds for the purpose of "lameness screening") where false positives should be avoided. Conversely, in a clinical context, a fair balance between sensitivity and specificity must be found to limit both false positives (inducing unnecessary costly investigations and anxiety of the owner) and false negatives [21]. This choice may explain differences with the results of Pfau et al. [21]. Other differences were: the lack of differentiation between right and left lameness, the study of a homogeneous population, and the subjective evaluation made by five assessors using video.
In the present study, we noticed that forelimb lameness also decreased the pelvic vertical range of motion during the lame limb stance phase. This observation has been reported in other studies [35,36]. LH lameness showed a small impact on the head but decreased the withers movement during the RF stance phase. Contrary to LH lameness, RH lameness increased the head movement during the LF stance phase. This supports the hypothesis that the head, the withers, and the pelvis provide complementary information about the lameness location [11,37].
The low number of hindlimb lame horses (23 RH and 23 LH) may induce a bias. Furthermore, all cases of lameness were included regardless of the type of injury diagnosed. Some indices are likely to be modified to different degrees depending on the type of injury. To go further, more horses and specific groups for each type of diagnosed injury or clinical manifestation will be needed. This should be part of future work.
Another limitation of this study is the clinical reference used to distinguish lame from non-lame horses and single-limb from multi-limb lameness, because this clinical assessment is by definition subjective [2]. In the present study, visual examination of lameness by expert clinicians in the real context of the clinical examination, in the field, was chosen as the reference to establish thresholds and calculate their sensitivity and specificity. This choice must, of course, be discussed, as it is well known that visual assessment by experts is subject to many uncertainties (e.g., lack of repeatability and reproducibility) [1]. Visual assessment is not considered a "gold standard" in the present study, but only a reference to what exists in the best possible conditions. For clinicians to adopt the tools for quantifying locomotor asymmetries, it seems necessary to give them an idea of the threshold values which, on these devices, correspond to what they are used to seeing and concluding subjectively with the classical (even if imperfect) methods. This first step seems necessary because it is only once these benchmarks have been established that real progress can be made in interpreting the data from the quantification systems. The challenge here is to prevent the slightest asymmetry measured by a quantification system from being mistaken for lameness. It should be remembered that the definition of lameness refers to a veterinarian's diagnosis and not to a machine. The machine can only be considered a quantitative aid to a multi-factorial medical decision. In this study, the real conditions of clinical routine were deliberately chosen in order to reflect the real-life examination of lame horses. Five highly experienced veterinarians were involved. Their experience and their identical and consistent method of assessment are likely to increase the agreement rate [35,38], although this result can be discussed [1]. The agreement could for example be slightly increased if the experts had not been informed of the owner's request. In this context, the lowest grade of lameness (1/10) was deliberately excluded because of weaker agreement for very subtle asymmetries [1].

Conclusions
Although quite small 95% CIs were found, an increased number of horses would improve threshold accuracy. This study highlights the most relevant indices (AI-up_W for forelimb lameness and AI-up_P for hindlimb lameness) and indicates an order of magnitude of the thresholds and their 95% CIs. These thresholds can be used as a first support to discriminate between lame (from grade 2/10) and non-lame horses, bearing in mind that the width of the 95% CIs prohibits the use of these thresholds as absolute cut-off values. In any case, these indicators can only be interpreted in the light of global clinical expertise, taking into account that there is not only one type of lameness but multiple clinical manifestations of locomotor disorders depending on the type of lesion. Subtle lameness (grade 1/10) was not included here; further studies are warranted to refine the thresholds for horses with subtle lameness.
Moreover, forelimb and hindlimb lameness were analyzed separately and multi-limb lame horses were excluded. It should therefore be kept in mind that the interrelationship between the movements of the head, withers, and pelvis still requires further work. Future studies with multivariate analysis are needed to provide more information on the lame limb identification and relationship between the indices in various clinical circumstances.
In the longer term, the application of this study is aimed at a wider range of conditions in veterinary practice. A main limitation is that all measures were recorded under specific and standardized conditions. In the coming years, further data are needed to refine lameness detection thresholds under conditions where physiological asymmetries are known to be higher (circles, for instance) [38].
Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ani12243498/s1. Figure S1: Flowchart of the exclusion/inclusion criteria of the screened horses, which were visually considered lame on a straight line by an expert veterinarian during his routine practice. Table S1: Table of the mean asymmetry index (AI) in % and the mean of the non-normalized differences (Diff) in centimeters between minima and/or maxima of the vertical displacement of the head (_H), the withers (_W), and the pelvis (_P) for all horses included in the study at the trot on the straight line. These horses were visually assessed as sound, right forelimb (RF) lame, left forelimb (LF) lame, right hindlimb (RH) lame, or left hindlimb (LH) lame.

Funding:
This research was funded by the Agence Nationale de la Recherche (LabCom "CWD-Vetlab", Contract ANR 16-LCV2-0002-01), the European Regional Development Fund (FEDER) and the Region Normandie (Project "EquiSym", Contract 20E01636).

Institutional Review Board Statement:
The animal study protocol was approved by the Ethics Committee of COMITE D'ETHIQUE EN RECHERCHE CLINIQUE-ComERC (n°2022-01-19).

Informed Consent Statement:
Informed consent was obtained from the owners of all the subjects involved in the study.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.