Femoral Neck Thickness Index as an Indicator of Proximal Femur Bone Modeling

Simple Summary
Canine hip dysplasia development results in femoral neck modeling and an increase in thickness. The main objective of this work was to describe a femoral neck thickness index to quantify femoral neck width and to study its association with the degree of canine hip dysplasia using the Fédération Cynologique Internationale scoring scheme. A total of 53 dogs (106 hips) were randomly selected for this study. Two examiners performed femoral neck thickness index estimation to study intra- and inter-examiner reliability and agreement. Statistical analysis tests showed excellent agreement and reliability between the measurements of the two examiners and the examiners’ sessions. All joints were scored in five categories by an experienced examiner according to the Fédération Cynologique Internationale criteria, and the results from examiner 1 were compared between these categories. The comparison of mean femoral neck thickness index between hip dysplasia categories using the analysis of variance test showed significant differences between groups. These results show that femoral neck thickness index is a parameter capable of evaluating proximal femur bone modeling and that it has the potential to enrich conventional canine hip dysplasia scoring criteria if incorporated into computer-aided diagnosis software.

Abstract
The alteration in the shape of the femoral neck is an important radiographic sign for scoring canine hip dysplasia (CHD). Previous studies have reported that the femoral neck thickness (FNT) is greater in dogs with hip joint dysplasia, becoming progressively thicker with disease severity. The main objective of this work was to describe a femoral neck thickness index (FNTi) to quantify FNT and to study its association with the degree of CHD using the Fédération Cynologique Internationale (FCI) scheme. A total of 53 dogs (106 hips) were randomly selected for this study. Two examiners performed FNTi estimation to study intra- and inter-examiner reliability and agreement. The paired t-test, the Bland-Altman plots, and the intraclass correlation coefficient showed excellent agreement and reliability between the measurements of the two examiners and the examiners’ sessions. All joints were scored in five categories by an experienced examiner according to FCI criteria. The results from examiner 1 were compared between FCI categories. Hips that were assigned an FCI grade of A (n = 19), B (n = 23), C (n = 24), D (n = 24), and E (n = 16) had a mean ± standard deviation FNTi of 0.809 ± 0.024, 0.835 ± 0.044, 0.868 ± 0.022, 0.903 ± 0.033, and 0.923 ± 0.068, respectively (ANOVA, p < 0.05). Therefore, these results show that FNTi is a parameter capable of evaluating proximal femur bone modeling and that it has the potential to enrich conventional CHD scoring criteria if incorporated into computer-aided diagnosis software capable of detecting CHD.


Introduction
Canine hip dysplasia (CHD) is an inherited orthopedic disease predominant in large and giant dog breeds that causes lameness and disability. Phenotypic expression of CHD is influenced by genetic defects and environmental stresses that trigger hip joint laxity and incongruency, which often leads to bone modeling and progression to secondary osteoarthritis (OA) [1,2]. Molecular tests for the diagnosis of CHD have already been developed, but they still have not achieved acceptable diagnostic accuracy for the disease [3,4]. Radiography has remained the established imaging technology for diagnosing CHD, as it plays an important role in the selection of breeding stock with the aim of reducing the incidence of the disease in offspring [1,2]. In general, there is not a consistent relationship between clinical signs and radiographic joint changes [5,6]. The ventrodorsal hip extended (VDHE) view is recommended worldwide for CHD screening [7]. However, in young animals, a ventrodorsal hip stress view can also be used to evaluate the hip joint laxity [1,2]. Hip laxity is considered a main risk factor for CHD, but there are some important differences in the progression and final severity of CHD [5,6]. An image of good technical quality of the VDHE view requires radiographic images without pelvic tilting and with adequate femur extension and alignment [7]. However, in the last decades, despite the widespread use of CHD screening radiographs, the prevalence of the disease remains high in some breeds due to a number of factors. 
These include: variability between radiologists' assessments, which is due to different levels of expertise among radiologists; screening systems that are not yet sufficiently standardized and strict, allowing for some subjectivity; the late appearance of unequivocal pathognomonic radiographic signs, at an age stage in which sometimes the dog has already entered the breeding pool; and the absence of evaluation of potential early hip joint changes in this view [8][9][10]. Therefore, the current incidence of CHD highlights the need to introduce new parameters to improve diagnosis, such as the hip congruency index, which has the potential to confer greater objectivity to the assessment of hip congruency, consequently benefiting overall diagnostic and scoring accuracy [11].
Worldwide, there are three main international entities for CHD scoring: the Fédération Cynologique Internationale (FCI), with implementation in the countries of continental Europe, the British Veterinary Association/The Kennel Club (BVA/KC), used mainly in the United Kingdom, and the Orthopedic Foundation for Animals (OFA), used in the United States of America [1,6,12]. All of these scoring systems place great emphasis on bone modeling and OA [12]. They also take a qualitative evaluation approach to these parameters, which leaves some margin for interpretation and error [8,13,14]. Early screening for CHD, commonly referred to as the PennHIP method, is based on distinct hip abnormalities. It assesses hip joint laxity by analyzing the femoral head separation from the acetabulum under stress by measuring the distraction index (separation distance divided by the radius of the femoral head) [1,2,6].
In CHD, early joint osteoarthritis and a consequent increase in joint fluid lead to incongruency, laxity of the soft tissue of the hip joint, and subluxation that results in abnormal stresses placed on the bony and soft tissue components of the joint. The joint capsule is attached to the margin of the acetabulum and around the femoral neck. The femoral neck is considered to be normal when its diameter narrows slightly directly below the head. The presence of biomechanical imbalance results in cartilage wear and tear and in the development of mechanosensitive pathways that drive proteases to initiate the mechanism of joint breakdown, subchondral and periosteal reaction, and new bone production in the capsule attachment area and neighboring tissues, particularly around the junction between the head and neck [15,16]. One of the main signs of CHD is the apparent widening of the femoral neck on a craniocaudal radiographic view, caused by osteophyte development, in conjunction with flattening of the femoral head. The neck becomes progressively thicker until, in severe stages of CHD, bony proliferation (exostoses) makes it indistinguishable from the head [17][18][19]. The FCI proposes a 5-grade classification system to represent the severity of the disease: A (normal), B (near normal/transition), C (mild), D (moderate), and E (severe) [6,12,16,17]. Current recommendations in CHD FCI scoring are to include Norberg angle measurement, joint space evaluation, congruence, osteoarthritic signs, and all aspects of hip joint changes, commonly referred to as Brass' method [17]. A previous study reports that the femoral neck thickness (FNT) is altered in hip joints classified as near normal (grade B) by the FCI system, adopting a slightly cylindrical shape. This morphological change becomes even more evident in hip joints classified as moderate grade (grade D) [17].
The main objective of this study was to create a new measurement method focused on alterations of the femoral neck as a means of determining proximal femoral changes associated with bone modeling and OA in CHD. For this purpose, we calculated the femoral neck thickness index (FNTi), an objective parameter that relates the minimal FNT to the diameter of the ipsilateral femoral head, and compared it between the different FCI grades. Our hypothesis was that there would be an association between FCI grades and FNTi, and that FNTi would increase with disease severity. To our knowledge, the association of the FNTi with the FCI grades has not been previously studied, and the FNTi could be integrated into a classification system as a parameter for evaluating bone modeling of the proximal femur.

Materials and Methods
This was a retrospective study based on the evaluation of VDHE views that were randomly selected from the Veterinary Teaching Hospital of the University of Trás-os-Montes and Alto Douro database and from the Danish Kennel Club database, obtained between 2010 and 2023. Recorded data included breed, sex, and weight. The inclusion criteria were dogs older than 12, 15, or 18 months (according to FCI recommendations for medium, large, and giant breeds) and VDHE views with adequate technical quality in terms of image (good bone contrast and spatial resolution) and pelvis positioning (femurs parallel to each other, patellae centred between the femoral condyles, and pelvic symmetry) for CHD scoring. VDHE views showing radiographic signs compatible with other hip or hindlimb diseases, such as bone fractures, previous surgeries, neoplasia, and knee osteoarthritis, were excluded. All five FCI categories were similarly represented in the sample. Due to the observational nature of this study, owner consent and ethical committee approval were waived.

Radiographic Measurements
The minimal FNT was measured by drawing a straight line, roughly perpendicular to the anatomical axis of the femoral neck, connecting the two closest points between the proximal and distal margins of the femoral neck in a VDHE view. The femoral head diameter was determined as the diameter of a circle outlining the margin of the femoral head (Figure 1). These measurements were performed by examiner 1 (E1) in two independent sessions (S1 and S2) to evaluate repeatability and by examiner 2 (E2) to test reproducibility, using specific DICOM viewer and editor software (Dys4Vet version 2.0, accessed between 1 October and 31 December 2022). The FNTi was determined by dividing the FNT by the femoral head diameter.
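The index defined in this section is a simple ratio, so it can be illustrated with a minimal Python sketch. The function name and the example values are hypothetical and are not taken from the study. The only assumption is that both measurements are positive and expressed in the same unit, so that the unit cancels out.

```python
def femoral_neck_thickness_index(fnt: float, head_diameter: float) -> float:
    """Return FNTi = minimal femoral neck thickness / femoral head diameter.

    Both inputs must use the same unit (e.g., millimetres or pixels),
    which makes the index independent of dog size and image scale.
    """
    if fnt <= 0 or head_diameter <= 0:
        raise ValueError("measurements must be positive")
    return fnt / head_diameter


# Hypothetical measurements (illustrative only, not study data).
print(round(femoral_neck_thickness_index(fnt=17.0, head_diameter=20.0), 3))
```

Because the index is dimensionless, rescaling both measurements by the same factor leaves it unchanged, which is the property that lets it be compared across dogs of different sizes.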
In order to associate FNTi with the five FCI grades for CHD, the hip joints were scored using FCI criteria: grade A (normal hip: Norberg angle (NA) > 105° and excellent congruency); grade B (borderline or transitional hip joint: NA around 105° and mild incongruency); grade C (slight CHD: NA around 100°, centre of femoral head outside of dorsal acetabular margin, and moderate incongruency); grade D (moderate CHD: NA > 90°, signs of osteoarthritis, and obvious incongruency); and grade E (severe CHD: NA < 90°, signs of osteoarthritis, and severe incongruency) [12,17]. The NA was measured between a line joining the centres of the circles encompassing the femoral heads and another line connecting the centre of each femoral head with the ipsilateral effective cranial acetabular rim [20]. Some radiographic parameters related to the femoral head centre position, the dorsal acetabular margin, and the joint space were also considered for final FCI joint scoring [17].
The E1 and E2 measurements were performed by I.P. and P.F., respectively, and the hip FCI scoring was performed by M.G. in a single-blind fashion for each examiner (i.e., E1 and E2 were unaware of each other's FNTi measurements and of the FCI scores, and the FCI scorer was unaware of the FNTi measurements).
Parametric tests were used for statistical analysis [21]. The paired t-test was used for comparison of duplicate E1S1-E1S2 measurements to evaluate repeatability, as well as E1S1-E2 measurements to evaluate reproducibility [22]. The Bland-Altman analysis and the intraclass correlation coefficient (ICC) were used to investigate intra- and inter-examiner agreement and reliability, respectively. In the Bland-Altman method, the 95% limits of agreement (LA) were calculated as the mean difference (d) ± 1.96 standard deviations (SD) of the differences [22][23][24][25]. Measurements were considered in agreement when the 95% confidence interval (CI) of the mean differences included zero and equivalent when the 95% upper and lower LA were small (irrelevant difference) [23,24]. Reliability was classified as poor, moderate, good/acceptable, or excellent when the lower limit of the 95% CI of the ICC was <0.50, ≥0.50-0.75, ≥0.75-0.90, or ≥0.90, respectively [25,26]. Cohen's d was used to measure the effect size when significant differences between measurements were registered: negligible < 0.20, small ≥ 0.20, medium ≥ 0.50, and large ≥ 0.80 [27].
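The agreement criteria described above can be sketched in a few lines of Python. This is an illustrative sketch using only the standard library, not the software used in the study. The function names and the paired readings are hypothetical, and the ICC itself is assumed to have been estimated elsewhere, so only its lower 95% CI bound is classified here.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (mean difference, lower LA, upper LA) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired series with at least two values")
    diffs = [a - b for a, b in zip(first, second)]
    d_bar = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return d_bar, d_bar - 1.96 * sd, d_bar + 1.96 * sd


def icc_reliability(lower_ci_bound):
    """Label reliability from the lower limit of the ICC 95% CI."""
    if lower_ci_bound < 0.50:
        return "poor"
    if lower_ci_bound < 0.75:
        return "moderate"
    if lower_ci_bound < 0.90:
        return "good/acceptable"
    return "excellent"


# Hypothetical paired FNTi readings (illustrative only, not study data).
session_1 = [0.81, 0.84, 0.87, 0.90, 0.92]
session_2 = [0.80, 0.85, 0.86, 0.91, 0.92]
d_bar, lower, upper = bland_altman_limits(session_1, session_2)
print(f"mean difference = {d_bar:.3f}, 95% LA = [{lower:.3f}, {upper:.3f}]")
print(icc_reliability(0.92))
```

The sketch omits the 95% CI of the mean difference, which requires a t-distribution quantile that the Python standard library does not provide.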

Statistical Analysis
The comparison of FNTi values of E1S1 measurement between FCI categories was performed using the Welch's ANOVA, followed by the post hoc Games-Howell test. The null hypothesis was that there were no significant differences in the FNTi mean values between the FCI categories [28]. A p-value of <0.05 was considered statistically significant. The statistical analysis was performed considering each joint individually.
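As a rough illustration of the test named above, Welch's ANOVA statistic can be computed from group summaries alone. The sketch below is not the authors' analysis. The function name is hypothetical, and the inputs are the rounded per-grade sample sizes, means, and SDs reported in this article, so the result only approximates what an analysis of the raw data would give.

```python
def welch_anova(ns, means, sds):
    """Return (F, df1, df2) for Welch's one-way ANOVA from group summaries."""
    k = len(ns)
    if k < 2 or not (len(means) == len(sds) == k):
        raise ValueError("need matching summaries for at least two groups")
    # Each group is weighted by the inverse of its squared standard error.
    weights = [n / sd ** 2 for n, sd in zip(ns, sds)]
    total = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / total
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total) ** 2 / (n - 1) for w, n in zip(weights, ns))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# Summary values for FCI grades A to E as reported in this article.
ns = [19, 23, 24, 24, 16]
means = [0.809, 0.835, 0.868, 0.903, 0.923]
sds = [0.024, 0.044, 0.022, 0.033, 0.068]
f_stat, df1, df2 = welch_anova(ns, means, sds)
print(f"F = {f_stat:.2f}, df1 = {df1}, df2 = {df2:.1f}")
```

Converting the statistic into a p-value needs the survival function of the F distribution, for instance `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available. The pairwise Games-Howell comparisons are not covered by this sketch.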

Results
Measurements were performed on 106 hip joints. The FNTi mean ± SD was 0.86 ± 0.06 in E1S1, 0.86 ± 0.06 in E1S2, and 0.87 ± 0.06 in E2. The main statistical analysis results related to the intra-examiner (repeatability) comparison (p > 0.05) [...]
A total of 19 (18%) hip joints were scored as FCI grade A, with an FNTi mean ± SD of 0.809 ± 0.024; 23 (21%) as grade B, with 0.835 ± 0.044; 24 (23%) as grade C, with 0.868 ± 0.022; 24 (23%) as grade D, with 0.903 ± 0.033; and 16 (15%) as grade E, with 0.923 ± 0.068. Data were assessed for normality using the Shapiro-Wilk test, which did not reject normality in any grade (p = 0.501, 0.275, 0.739, 0.983, and 0.383 for grades A to E, respectively), and Levene's test indicated unequal variances among group samples (p < 0.05). Welch's ANOVA followed by the post hoc Games-Howell test showed statistically significant differences in mean FNTi between the FCI categories whose means are marked with different superscript letters (p < 0.05) (Table 2, Figure 4).

Discussion
The objective of this study was to create a methodology that would allow a more objective radiographic evaluation of the changes that the proximal femur undergoes in CHD (namely, thickening of the neck), and to understand this relationship across the various FCI grades of CHD in the VDHE view. The use of parametric tests in the statistical analysis was based on the Central Limit Theorem, which states that in large samples (n > 30), the distribution of standardized sample means tends to be normal independently of the distribution of the population from which they originate [21]. The FCI scoring can be considered a dynamic system that has undergone regular updates over the years in order to harmonize CHD classifications in different countries and to improve the reliability of associating morphological alterations with the genetic profile of the animal. We highlight here the CHD panelist meetings of Dortmund 1991 and Copenhagen 2007 and 2022 [29]. In dysplastic hips, progressive bone modeling and osteophyte formation are induced by osteoarthritic pathways that result in subchondral and periosteal response with new bone production. Particularly around the junction between the head and neck, biomechanical stresses in the hip joint occur at a faster rate, and new bone and osteophytes are laid down in some areas and resorbed in other areas of the femur and acetabulum [15]. These changes in the bone structure are concurrent with hip osteoarthritis [30]. Pinna et al. (2022) observed a significant increase in the thickness of the femoral neck in VDHE views of hips classified as grade B, a grade assigned to hips considered healthy and in which no osteoarthritic signs are visible [17]. In another study, Andronescu et al. (2015) studied the ratio between head volume and femoral neck volume on 3D computed tomography images of dogs at high risk of developing CHD (distraction index > 0.3) from 16 to 32 weeks of age and found a decrease in the ratio, even though the differences between values corresponding to OA severities were not statistically significant [18]. Therefore, the study of bone modeling and OA as two interrelated topics seems particularly important to us.
We use the FNTi to get around the difficulty, common in veterinary medicine, of using absolute anatomic measurements in dogs of different sizes. Relating measurements to the size of the femoral head is a strategy that has been successfully used previously for other purposes, such as the hip distraction index (mainly in young animals without severe bone changes) [1,6] and the hip congruency index [11]. In a previous study, the femoral neck width was related to the femoral neck length to create a widening index [31]. However, this methodology was not followed in this work because we think that the relationship between the length of the femoral neck and the size of the breed is much less studied and used than that of the diameter of the femoral head [6,11]. As such, there may be greater variability in the length of the femoral neck between breeds of similar sizes, which makes this parameter a less effective denominator for a ratio measure. On the other hand, artificial intelligence is currently being introduced in digital image analysis [32,33], and since the identification of the femoral head seems to us to be an essential step in the application of artificial intelligence to the diagnosis of CHD, any index that relies on it can be more easily integrated in the near future for this purpose.
Bone modeling is admittedly one of the aspects of classification that is undervalued by the FCI criteria. In some iterations of the FCI criteria [34], morphological changes of the proximal femur are only explicitly mentioned in the most severe degree of dysplasia (grade E), pointing out the characteristic mushroom-like appearance that the femoral head assumes. Other changes related to bone modeling of the proximal femur have been overlooked and are presumably relegated to the topic of "osteoarthritic signs", addressed with a yes/no question [34,35], which makes the evaluation less accurate. The BVA/KC uses phenotypic evaluation criteria based on a points system, addressing changes in bone shape from minor modeling up to severe OA [19]. However, it is still a qualitative scoring system, so there is also some subjectivity in the analysis because it depends on the level of experience of the examiner. Taking this into consideration, there is a perceived need for novel approaches that grant more objectivity to the assessment of bone modeling and, by extension, to CHD scoring.
Our results showed that the methodology behind the FNTi had excellent reliability and agreement. Given the results of the paired t-test and the Bland-Altman plot, with a mean difference near zero and a narrow 95% CI, there was no evidence of bias between the measurements of the two E1 sessions, and they can be considered statistically similar. The intra-examiner ICC was 0.94, and the lower bound of the 95% CI was 0.92, which translates to excellent reliability. This indicates adequate repeatability of the described FNTi determination methodology. Significant mean differences between the FNTi measurements of the two examiners were observed in the paired-samples t-test. However, the recorded effect size was negligible (d = 0.14) and, considering the corresponding Bland-Altman plot (Figure 3), the 95% LA can be interpreted as clinically small because 95% of all calculated inter-examiner differences lie in the short range of −0.048 to 0.034. Furthermore, the inter-examiner ICC shows excellent reliability. This shows that even though the two examiners produce somewhat different values, their measurements are clearly related and functionally consistent, which indicates adequate reproducibility. Both examiners had a similar amount of practice time with the method beforehand, so no conclusions can be drawn based on experience. On the other hand, one would expect the mean differences to be greater among larger FNTi mean values due to the subjectivity of the annotation imposed by exostosis and/or osteophyte formation in the concave fossa, which makes the bone contours more intricate and could have caused disparities between sessions and examiners. Ultimately, by analyzing the two diagrams (Figures 2 and 3), we can see that the differences do not increase or decrease in proportion to the mean FNTi values, and we therefore conclude that there is no proportional bias.
The Welch's ANOVA of FNTi values in FCI categories revealed a statistically significant main effect (p < 0.001), indicating that not all FCI categories had the same mean FNTi value. Post hoc comparisons using the Games-Howell procedure were conducted to determine which pairs of the five category means differed significantly. The results indicate that mean FNTi values increase gradually with the degree of severity of the disease: grade A hips had a significantly lower mean FNTi value than grades C, D, and E (p < 0.05); grade B had a significantly lower mean FNTi value than grades D and E (p < 0.05); and grade C had a significantly lower mean FNTi value than grade E (p < 0.05). Therefore, the null hypothesis is rejected, which supports our assumption that FNTi changes with FCI CHD grades. This parameter can help distinguish between CHD grades, potentially improving FCI scoring criteria. However, there is some overlap in the FNTi values corresponding to the different FCI categories, specifically between adjacent categories. This is one of the reasons that an assessment solely dependent on the FNTi is ambiguous and, therefore, impractical. The "whiskers" of the box plot corresponding to category E stretch over a wider range of values than the other box plots, overlapping almost every other category. This can be explained by how long hip subluxation has been in place and by the difficulty of adequately delimiting the femoral head for its diameter measurement in joints with very severe osteoarthritic bone deformations. In severe cases with long-term subluxation, the surface of the femoral head cartilage at the non-articulated margin of the head and neck becomes thickened due to a lack of contact with the opposite acetabular surface [15]. 
Hence, a long-term subluxated hip favors the appearance of the so-called mushroom head deformity, which inflates the diameter of the femoral head compared to the neck thickness, thus producing lower FNTi values pertaining to category E, values that lie in the lower "whisker" of the box plot. We strongly advocate that this parameter should never be used by itself to classify hips, but rather in complementarity with other parameters. On the other hand, given the recognized variability that exists in the progression of CHD, femoral neck bone modeling, and femoral head size, it is expected that more robust FNTi results will be obtained if a study is performed using only one breed in the sample [5].
Since our study sample was sourced from two different databases and included a variety of breeds, the results can be more easily extrapolated to general canine populations at risk of CHD. Nonetheless, as a limitation of this study, it is important to note that some breeds prone to CHD are clearly over-represented (Portuguese breeds); others, on the contrary, are under-represented (Labrador Retriever and Rottweiler). As such, in the future, there is a need to conduct more studies on a wider range of breeds and patients.

Conclusions
This study describes a methodology that allows for the evaluation of bone modeling of the proximal femur in the VDHE view, which can be used in the future with confidence as a criterion for CHD scoring. The FNTi shows adequate intra- and inter-examiner measurement agreement and reliability. Mean FNTi values increase progressively across the FCI categories, with statistically significant differences. The FNTi method shows potential to make CHD classification more objective if incorporated as a scoring criterion,