Validity and Reliability of the Helkimo Clinical Dysfunction Index for the Diagnosis of Temporomandibular Disorders

The Helkimo Clinical Dysfunction Index (HCDI) is a simple and quick test used to evaluate subjects affected by temporomandibular disorders (TMDs), but its psychometric properties have not been tested. The test evaluates movement, joint function, pain and musculature, providing a quick general overview that could be very useful at different levels of care. For this reason, the aim of this study was to validate the use of the HCDI in a sample of patients with TMD. Methods: The sample consisted of 107 subjects, 60 TMD patients and 47 healthy controls. The study evaluated concurrent validity, inter-rater concordance and predictive values. Results: The HCDI showed moderate to substantial inter-rater concordance among the items and excellent concordance for the total scores. The correlation with other TMD assessment tests was high, the correlation with dizziness was moderate and the correlation with neck pain, headache and overall quality of life was poor. The prediction of TMD showed a sensitivity of 86.67%, a specificity of 68.09% and an area under the curve (AUC) of 0.841. Conclusions: The HCDI is a valid and reliable assessment instrument; its clinimetric properties are adequate, and it has a good ability to discriminate between TMD-affected and TMD-unaffected subjects.


Introduction
Temporomandibular joint disorders (TMDs) are highly prevalent conditions that, according to some authors, are present in 27.4% of adolescents [1] and 25% of adults [2]. Costs in European public hospitals due to erroneous diagnosis of TMD range from a minimum of €52 to a maximum of €425, with a mean of €146, according to the amounts received from mutual insurance companies and insurers [3]. The analysis of the aetiology of TMDs has focused on several factors such as inflammatory diseases [4], fractures and trauma [5,6], as well as biomedical models related to temporomandibular joints, muscles of mastication and occlusal factors [7]. The management of TMDs includes clinical examination [8] and the use of imaging techniques both for diagnosis and for monitoring the efficacy of treatments [9,10], which classically included the use of botulinum toxin [11], occlusal splint therapy [12] and polyphenols as potential therapeutic agents [13]. TMDs are related to headache, neck pain, shoulder pain, insomnia, vertigo, ocular pain and hearing loss [14], and 91% of TMD patients reported pain, 61.2% joint clicks or crepitation and 53.3% limited range of movement of the temporomandibular joint [15].
Because of the long list of related symptoms, the diagnostic criteria for temporomandibular disorders (DC/TMD) were designed to allow an exhaustive assessment of each patient [16]; as a result, a considerable amount of time is needed for an adequate evaluation with these internationally accepted criteria, which are considered the gold-standard reference test for the diagnosis of temporomandibular disorders. The test examines 12 dimensions that evaluate mandibular movement, type of bite, pain on movement, pain on touch of the musculature, alterations in mandibular movement and headache [16].
Given the cost of misdiagnosis and the time needed to perform the reference test for TMD diagnosis, it would be beneficial to find a simpler and quicker tool to use as a diagnostic method for TMD in primary care. The Helkimo Clinical Dysfunction Index (HCDI) has been widely used for the clinical diagnosis of TMDs [17][18][19]. It is a simple and quick test that assesses limitations of mandibular movement, pain and joint function. However, the studies that analysed the reliability [20,21] and validity of this tool are old, used very small samples, applied incorrect statistical techniques and were limited to the analysis of a single clinimetric property [22,23].
Therefore, a thorough analysis of the main properties of the HCDI is necessary, using the DC/TMD protocol as a reference. For this reason, the aim of the study was to assess and test the psychometric properties of the HCDI in patients with TMD.

Participants
To meet the objectives of this work, a cross-sectional validation study was designed. The protocol of this study received the approval of the Research Ethics Committee of Jaén, Spain (date of approval: 27 April 2020; internal code ABR.20/2.TFM). This study was conducted in accordance with the Declaration of Helsinki, good clinical practice guidelines and all applicable laws and regulations, and written informed consent was obtained from all subjects to participate in the study.
The sample size calculation was carried out using the recruitment of at least 10 subjects per item of the scale as a criterion, with a minimum of 80 subjects for validity studies and 20 for reliability [24]. This study was carried out between May and August 2020. The sample was selected from the patients of the Dental Medical Center Doctores López Collantes (Dos Hermanas, Sevilla, Spain), which provides stomatology services, and from those at the FisioMedic Clinic (Dos Hermanas, Sevilla, Spain), which provides physiotherapy, general medicine and traumatology services. Recruitment was performed by telephone contact and personal interviews.

Measurements
Once the patients were selected, demographic data were recorded: age, sex, height, weight, body mass index (BMI), educational level, work situation, smoking status, alcoholic habits and physical activity [25].
The diagnostic validity of the HCDI was measured according to the DC/TMD protocol, which is the gold-standard diagnostic test for TMD. The DC/TMD protocol is composed of 12 items that assess muscle and joint pain, pain during jaw movement, headache, bites, noise, obstacles or blockages during jaw movement and discomfort in the palpation of the muscles of the temporomandibular joint. Finally, a diagnostic tree is used to specify a diagnostic result. The DC/TMD protocol has a sensitivity of 86%, a specificity of 98% and an inter-examination reliability of 85% [16].
The main measure was the HCDI. The instrument is comprised of five items, with each assessment having three possible answers, scored as 0, 1 or 5. The first item (A) is related to the limitation in the range of jaw movement and is subdivided into four sections: the maximum opening of the mouth and the protrusion and lateral shift to both sides. In the opening of the mouth, a value of more than 40 mm scores 0 points, a value between 30 and 39 mm scores 1 point and opening less than 30 mm scores 5 points; protrusion and lateral mouth shifts score 0 if the measurement is 7 mm or more, 1 point if the range of motion is between 4 and 6 mm and 5 points if the range is less than 4 mm. These subsections of item A are added together to obtain a subtotal that scores 0 if the sum of the four sections is 0, 1 point if the subtotal is between 1 and 4 points and 5 points if the subtotal is greater than 4 points. The second item (B) evaluates the alterations of joint function that produce deviations, sounds and/or joint locks or blockages; the third item (C) evaluates the presence of pain when performing some movements; the fourth item (D) evaluates muscular pain in the masticatory muscles; and the fifth item (E) evaluates the presence of discomfort or pain in the prearticular area of the temporomandibular joint (TMJ) through palpation. From the sum of the 5 items, we identify no TMJ involvement if the score is 0, mild TMJ involvement when the score ranges from 1 to 9, moderate TMJ involvement if the score ranges between 10 and 19 and severe TMJ involvement for a score between 20 and 25. Previous studies have shown that the HCDI is able to detect TMD-affected subjects with rheumatoid arthritis, with a statistically significant difference between affected and unaffected subjects [26][27][28].
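The scoring rules described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the handling of an opening of exactly 40 mm (scored here as 0) is an assumption, since the text gives "more than 40 mm" and "between 30 and 39 mm" without covering that value.

```python
from typing import Tuple


def score_opening(mm: float) -> int:
    """Score maximum mouth opening (mm) as 0, 1 or 5."""
    # Assumption: exactly 40 mm is scored 0; the text leaves this value open.
    if mm >= 40:
        return 0
    if mm >= 30:
        return 1
    return 5


def score_excursion(mm: float) -> int:
    """Score protrusion or a lateral shift (mm) as 0, 1 or 5."""
    if mm >= 7:
        return 0
    if mm >= 4:
        return 1
    return 5


def score_item_a(opening: float, protrusion: float,
                 lateral_right: float, lateral_left: float) -> int:
    """Collapse the four mobility sections into the item A score."""
    subtotal = score_opening(opening) + sum(
        score_excursion(x) for x in (protrusion, lateral_right, lateral_left)
    )
    if subtotal == 0:
        return 0
    if subtotal <= 4:
        return 1
    return 5


def classify_hcdi(a: int, b: int, c: int, d: int, e: int) -> Tuple[int, str]:
    """Sum items A to E and map the total to a severity label."""
    items = (a, b, c, d, e)
    if any(score not in (0, 1, 5) for score in items):
        raise ValueError("each item must be scored 0, 1 or 5")
    total = sum(items)
    if total == 0:
        label = "none"
    elif total <= 9:
        label = "mild"
    elif total <= 19:
        label = "moderate"
    else:
        label = "severe"
    return total, label
```

For example, a subject with a 35 mm opening and unrestricted protrusion and lateral shifts obtains an item A score of 1; combined with a score of 1 on item B and 0 elsewhere, the total is 2, which falls in the mild range.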
Concurrent validity was also measured with Fonseca's anamnestic index (FAI), which is made up of 10 questions that can be answered with yes, no or sometimes, and these answers are scored 10, 0 or 5, respectively. This questionnaire classifies patients according to the affectation, with a total score between 0 and 100. The test categorises temporomandibular disorder as not affected when the score is between 0 and 15 points, mild affectation when the score is between 20 and 40 points, moderate affectation when the score is between 45 and 65 points and severe affectation when the score is between 70 and 100 points. The FAI has a Cronbach alpha of 0.826, an intraclass correlation coefficient of 0.937, a cut-off point of >35 points, a sensitivity of 83.33% and a specificity of 77.97% [29,30]. Similarly, the short version of Fonseca's anamnestic index (SFAI) was also considered; it is a five-question questionnaire that is answered and scored the same as the standard version of the FAI, and the questionnaire categorises patients as unaffected by TMD when the score is between 0 and 15 points and as affected by TMD when the score is between 20 and 50 points. The SFAI has a sensitivity of 86% and a specificity of 95.5% based on a cut-off point of >17.5 [31].
Pain perception was evaluated by the Numerical Pain-Rating Scale (NPRS) test. The subjects indicate their perceived pain with a number between 0 (no pain) and 10 (the worst pain possible). This tool was used to quantify pain in both the neck and the temporomandibular joint and is the pain assessment test preferred by Spanish-speaking patients. The test has a strong correlation with the Visual Analogue Scale (VAS) and the Four-category Verbal Rating Scale (VRS-4) instruments; the Kaiser-Meyer-Olkin (KMO) value is 0.85, with a Bartlett sphericity of <0.01, a landing factor of 0.95 and a lack of implementation percentage of <0.01% [32].
To evaluate the possibility of associated neck disability, the Neck Disability Index (NDI) was used; it is a 10-question survey, with answers being reported as a number between 0 and 5. For each question, a score of 0 refers to the total absence of disability, while a score of 5 refers to total disability. Accordingly, a total score between 0 and 5 indicates absence of disability, 5-14 points indicate low disability, 15-24 points indicate moderate disability and 35-50 points indicate great disability. Cronbach's alpha is 0.89, and the intraclass coefficient is 0.98, with a Pearson's correlation coefficient with the visual analogue pain scale of r = 0.65 and with the Northwick Park neck pain questionnaire of r = 0.89 [33].
The presence of vertigo and balance problems was assessed by the Dizziness Handicap Inventory (DHI). This questionnaire is composed of 25 questions that can be answered with yes, no or sometimes, scoring 4, 0 and 2 points, respectively. This questionnaire assesses physical, emotional and functional dimensions, each of which has an independent score in addition to the total score. There is a high correlation between each of the dimensions and the total score (p < 0.01); factorial analysis shows a structure formed by three components, and there is a high correlation with the Dizziness Characteristics and Impact on Quality of Life (UCLA-DQ) (>0.75) [34][35][36].
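As an illustration of the FAI and SFAI scoring just described, the following minimal sketch converts a list of answers into a total score and a category. The function names and the answer strings are hypothetical choices for this example; the point values and category bands are those given in the text.

```python
# Points per answer, as described in the text.
ANSWER_POINTS = {"yes": 10, "sometimes": 5, "no": 0}


def questionnaire_score(answers, n_questions):
    """Sum the points for a questionnaire with a fixed number of questions."""
    if len(answers) != n_questions:
        raise ValueError(f"expected {n_questions} answers, got {len(answers)}")
    return sum(ANSWER_POINTS[answer] for answer in answers)


def fai_category(score):
    """Map a 10-question FAI total (0 to 100, in steps of 5) to a category."""
    if score <= 15:
        return "not affected"
    if score <= 40:
        return "mild"
    if score <= 65:
        return "moderate"
    return "severe"


def sfai_affected(score):
    """Return True when a 5-question SFAI total (0 to 50) is 20 or more."""
    return score >= 20
```

Because every answer is worth 0, 5 or 10 points, totals are always multiples of 5, so the bands 0-15, 20-40, 45-65 and 70-100 leave no score unclassified.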
Headache-associated symptoms were measured with the Headache Impact Test (HIT-6), which is an evaluation questionnaire consisting of six questions that can be answered with usual, almost always, sometimes, rarely and never, with a total score between 36 and 78 points. The correlation between the HIT-6 in different languages is high, it has high reliability, and its items are comparable [37].
Finally, the quality of life was assessed using the 12-item Short-Form Health Survey (SF-12). This questionnaire is the short version of the SF-36 and retains its self-administered form. It results in a Mental Component Summary score (MCS-12) and a Physical Component Summary score (PCS-12), differentiating between the two components of the quality of life. The weights of the Spanish version of the SF-12 are similar to those of the original American version, with a correlation of >0.9. The questionnaire explains 91% of the variance of the SF-36 in the sum of the components, and the coefficient of internal consistency is 0.9 for the SF-36 and slightly lower for the SF-12 [38].

Statistical Analysis
Descriptive analysis was performed by calculating means and standard deviations for continuous variables and frequencies and percentages for categorical variables. The Kolmogorov-Smirnov test was used to verify the normal distribution of the continuous variables, and the Levene test was used to test the homoscedasticity of the samples. The confidence level was set at 95% (p < 0.05).
To test the agreement between the two raters for the total HCDI score, the intraclass correlation coefficient (ICC) of Shrout and Fleiss was used in a one-way random effects model of the absolute agreement type; it estimates the reliability of single ratings [39]. Reliability was considered poor when the ICC was <0.40, moderate when the ICC was between 0.40 and 0.75, substantial when the ICC was between 0.75 and 0.90 and excellent when the ICC was >0.90. From the ICC, the standard error of measurement (SEM) and the minimum detectable change (MDC) were calculated. The SEM was calculated as the baseline standard deviation (SD) ($\sigma_{\mathrm{base}}$) multiplied by the square root of $(1 - R_{xx})$, where $R_{xx}$ is the ICC. The MDC was quantified at the 95% confidence level (MDC95) from the SEM formula as follows: $\mathrm{MDC}_{95} = 1.96 \times \sigma_{\mathrm{base}} \times \sqrt{1 - \mathrm{ICC}}$, where 1.96 is the z-value corresponding to the 95% confidence interval. The MDC provides a good tool for translating the ICC into units of change in the instrument. To measure the agreement between the two raters for the individual items, a Kappa coefficient with quadratic weights was used [40]. The agreement was considered null if Kappa was <0.00, slight if Kappa was between 0.00-0.20, fair if Kappa was between 0.21-0.40, moderate if Kappa was between 0.41-0.60, substantial if Kappa was between 0.61-0.80 and almost perfect if Kappa was between 0.81-1.00 [41]. In addition, Bland-Altman charts were generated to evaluate the limits of agreement [42].
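The reliability quantities in this paragraph can be illustrated with a short Python sketch. It assumes the conventional form SEM = SD × sqrt(1 − ICC) and takes MDC95 as 1.96 × SEM, following the formula given in the text. The weighted Kappa function is a plain implementation with quadratic weights computed on category ranks (0, 1 and 5 treated as ordered categories), which is an assumption about how the weights were applied; all function names and input values are hypothetical.

```python
import math


def sem(sd_base, icc):
    """Standard error of measurement: SD multiplied by sqrt(1 - ICC)."""
    return sd_base * math.sqrt(1.0 - icc)


def mdc95(sd_base, icc):
    """Minimum detectable change at the 95% level, as 1.96 * SEM."""
    return 1.96 * sem(sd_base, icc)


def quadratic_weighted_kappa(rater1, rater2, categories):
    """Cohen's Kappa with quadratic disagreement weights on category ranks."""
    k = len(categories)
    rank = {category: i for i, category in enumerate(categories)}
    n = len(rater1)
    # Observed contingency table of rater 1 (rows) against rater 2 (columns).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[rank[a]][rank[b]] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected
```

With a hypothetical baseline SD of 4.0 points and an ICC of 0.91, the SEM is 1.2 points and the MDC95 is about 2.35 points, which shows how the ICC translates into units of the instrument.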
To analyse the concurrent validity of the HCDI with the FAI, NPRS, NDI, DHI, HIT-6 and SF-12, Pearson's correlation coefficient r was used. The correlation coefficient was considered strong if it was >0.50 and moderate if it was between 0.30 and 0.50 [43].
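A minimal sketch of this step is shown below. The thresholds for strong and moderate correlation are those stated above; labelling anything below 0.30 as "poor" and interpreting the absolute value of r are assumptions made for the example, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Label the magnitude of r using the thresholds 0.30 and 0.50."""
    magnitude = abs(r)  # assumption: the sign is ignored when grading strength
    if magnitude > 0.50:
        return "strong"
    if magnitude >= 0.30:
        return "moderate"
    return "poor"  # assumption: label for values below 0.30
```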
The ability of the HCDI to discriminate between TMD patients and healthy subjects was determined using receiver operating characteristic (ROC) curves. First, the classification of the subjects as TMD patients or healthy controls was carried out based on the diagnostic criteria of the DC/TMD protocol, and the total score obtained in the HCDI was evaluated as a variable. In the ROC curve, the fraction of true positives (sensitivity) was represented as a function of the fraction of false positives for different cut-off points. The area under the curve (AUC) was also calculated as a measure of the ability of the score to discriminate between the two diagnostic groups (TMD patients or healthy subjects). The AUC was considered statistically significant when the 95% confidence interval did not include 0.5 [44]. Values between 0.5 and 0.7 indicated low accuracy, values between 0.7 and 0.9 indicated good accuracy and values greater than 0.9 indicated high accuracy [45].
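The ROC analysis can be sketched as follows. The AUC is computed here as the proportion of patient-control pairs in which the patient has the higher score (ties count as one half), which is equivalent to the area under the empirical ROC curve; this is a generic illustration and not necessarily the software procedure used in the study. The scores in the usage note are invented, and the treatment of the boundary values 0.7 and 0.9 is an assumption.

```python
def auc(patient_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def sensitivity_specificity(patient_scores, control_scores, cutoff):
    """Classify a score above the cut-off as positive and return both rates."""
    true_positives = sum(score > cutoff for score in patient_scores)
    true_negatives = sum(score <= cutoff for score in control_scores)
    return (true_positives / len(patient_scores),
            true_negatives / len(control_scores))


def accuracy_label(auc_value):
    """Grade an AUC as low, good or high accuracy."""
    # Assumption: boundary values 0.7 and 0.9 fall in the lower band.
    if auc_value > 0.9:
        return "high"
    if auc_value > 0.7:
        return "good"
    return "low"
```

With invented scores of [2, 6, 10, 1, 7] for patients and [0, 1, 2, 0] for controls, 18 of the 20 pairs favour the patient once ties are halved, so the AUC is 0.9; a cut-off of more than 1 point gives a sensitivity of 0.8 and a specificity of 0.75.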

Results
In all, 158 people were contacted, but the final sample was composed of 107 participants (60 TMD patients and 47 healthy controls), as 51 did not meet the selection criteria or refused to participate. The sociodemographic and anthropometric characteristics of the sample are shown in Table 1.

Inter-Rater Reliability
Results showed a maximum weighted kappa value of 0.774 for item C and a minimum value of 0.426 for item A2. Based on these values, reliability ranged from moderate to substantial, while the total score of the scale reached an excellent degree of concordance of 0.905 (Table 2). Figure 1 shows the Bland-Altman plot. Table 3 shows concurrent validity of the Helkimo Clinical Dysfunction Index with other specific and generic instruments.

Validity and Accuracy of the TMD Diagnostic Ability
ROC curve analysis found an optimal cut-off point of more than 1 point in the HCDI score, which showed a sensitivity of 86.67% and a specificity of 68.09% for the diagnosis of TMDs, taking the DC/TMD protocol as the gold standard (Table 4). This analysis showed an area under the curve (AUC) of 0.841 (Figure 2), which can be interpreted as good accuracy.
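As a worked check of these figures, the reported percentages can be related back to the group sizes (60 TMD patients and 47 healthy controls). The cell counts below are not stated in the text; they are inferred by back-calculation and should be read as an assumption. The predictive values derived from them hold only for the prevalence in this sample.

```python
n_tmd, n_healthy = 60, 47

# Inferred counts (assumption): nearest integers to the reported rates.
true_positives = round(0.8667 * n_tmd)       # TMD patients above the cut-off
true_negatives = round(0.6809 * n_healthy)   # controls at or below the cut-off
false_negatives = n_tmd - true_positives
false_positives = n_healthy - true_negatives

sensitivity = true_positives / n_tmd
specificity = true_negatives / n_healthy

# Predictive values at the sample prevalence of 60/107.
ppv = true_positives / (true_positives + false_positives)
npv = true_negatives / (true_negatives + false_negatives)

print(f"TP={true_positives} FN={false_negatives} "
      f"TN={true_negatives} FP={false_positives}")
print(f"sensitivity={sensitivity:.2%} specificity={specificity:.2%}")
print(f"PPV={ppv:.2%} NPV={npv:.2%}")
```

Under this reading, 52 of 60 patients and 32 of 47 controls were classified correctly, which reproduces the reported 86.67% and 68.09%.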

Discussion
This study evaluated the clinimetric properties of the Helkimo Clinical Dysfunction Index. The data obtained suggested that it is a valid and reliable instrument for evaluating patients with TMD, determining the degree of severity of the condition and discriminating between affected and unaffected patients with TMD. In this study, a total sample of 107 patients was used (60 TMD patients and 47 healthy subjects), and all of them were evaluated by this test, which lasted approximately 4 min. The two groups were comparable, except for a higher proportion of females among the subjects who suffered from TMD, which is a consistent observation among TMD studies [17,27]. This fact may have led to a reduction in the mean weight and height and a higher proportion of university-educated subjects among the female population [46].
Despite being a commonly used tool for TMD assessment [19], few authors have studied the HCDI in depth. In 1987, Van der Weele et al. conducted an argumentative analysis of the HCDI, studying the pertinence of the construction of such a test to evaluate patients with TMD according to the evidence of the moment. They concluded that there was insufficient scientific evidence to support the use of these items in a diagnostic test for TMD [28]. However, in the analysis of the current scientific evidence regarding the pertinence of the use of these items in a diagnostic test for TMD, there is a general consensus that supports their use, and no evidence casts doubt on it [19,47]. In 2007, Da Cunha et al. conducted a comparative study between the HCDI and the craniomandibular test. As in the present study, they found greater affectation of TMD among women, who represented 70% of the total sample of affected people in the study, and a mean age of 46 years in affected patients, which agrees with the mean age of 43 years observed in this study [27].
Oliveira de Santis et al. conducted the only study analysing the psychometric characteristics of the HCDI and the American Association of Orofacial Pain (AAOP) index in subjects aged between 6 and 18 years, using the DC/TMD protocol as a reference. The authors found no statistically significant difference between genders, a sensitivity of 53.40% and a specificity of 77.27% for the HCDI, as well as a low level of agreement between the test being considered and the gold standard [47]. In the present study, by contrast, the sensitivity obtained was 86.67%, while the specificity was 68.09%. These differences in the results may be due to the difference in age between samples (46.25 years old in our study, 8.18 years old in that of Oliveira de Santis et al.), which could indicate that the HCDI is more useful for adults than for children.
The present study had some limitations. First, the study sample had a higher proportion of women due to the higher proportion of women affected by TMD. Furthermore, although this study analysed the most common psychometric properties, we did not study the sensitivity to change or the ability to discriminate between different TMD populations. Additionally, this study was carried out on a sample of patients residing in a well-defined geographic location, which limits the generalisation of the results obtained.

Conclusions
The study shows that the HCDI is suitable for the diagnosis of TMD. The inter-observer concordance was between moderate and substantial for each of the items and excellent for the total score of the test. The HCDI has strong concurrent validity with the FAI, SFAI and NPRS orofacial assessment instruments; moderate validity with the NPRS neck pain assessment, emotional and physical facets and the total DHI value; and poor validity with respect to HIT-6 instruments, the mental and physical components of the SF-12 and the functional component of the DHI. The HCDI shows a sensitivity of 86.67%, a specificity of 68.09% and an AUC of 0.841 to predict the presence of TMD.