Repeatability of the Vibroarthrogram in the Temporomandibular Joints

Current research concerning the repeatability of the joint’s sounds examination in the temporomandibular joints (TMJ) is inconclusive; thus, the aim of this study was to investigate the repeatability of the specific features of the vibroarthrogram (VAG) in the TMJ using accelerometers. The joint sounds of both TMJs were measured with VAG accelerometers in two groups, study and control, each consisting of 47 participants (n = 94). Two VAG recording sessions consisted of 10 jaw open/close cycles guided by a metronome. The intraclass correlation coefficient (ICC) was calculated for seven VAG signal features. Additionally, a k-nearest-neighbors (KNN) classifier was defined and compared with a state-of-the-art method (joint vibration analysis (JVA) decision tree). ICC indicated excellent (for the integral below 300 Hz feature), good (total integral, integral above 300 Hz, and median frequency features), moderate (integral below to integral above 300 Hz ratio feature) and poor (peak amplitude feature) reliability. The accuracy scores for the KNN classifier (up to 0.81) were higher than those for the JVA decision tree (up to 0.60). The results of this study could open up a new field of research focused on the features of the vibroarthrogram in the context of the TMJ, further improving the diagnosing process.


Introduction
There are two general means of assessing motion in human joints -quantity and quality. The quantity of movement can be measured with a simple ruler or a goniometer, but also with highly specialized motion capture system such as VICON ® [1][2][3].
The quality of the movement is rarely analyzed due to fewer available diagnostic devices, with vibroarthrography (VAG) being one of them [4][5][6]. It is a tool for assessing the quality of motion in human joints [7][8][9]. It has been used for assessing joint's sounds, for example, in the knees, ankles, and temporomandibular joints (TMJ) [5,10,11]. Figure 1 shows exemplary vibroarthrograms of TMJ joints: Figure 1a, asymptomatic, and Figure 1b, symptomatic.
Temporomandibular disorders (TMD) are the most common cause of orofacial pain [12]. TMD is an umbrella term for different symptoms that may occur in the orofacial area, especially in temporomandibular joints [12]. It is estimated that the prevalence of TMD in the general population is between 3 and 12% [13][14][15], mostly affecting women up to 4-8 times more often than men, of an age between 16 and 45 years [16][17][18][19]. Among the pain of mastication muscles and/or pain of the TMJ itself, joint sounds are one of the most frequent intra-articular joint disorders in this joint with a prevalence between 21.1% and 73.3% [20]. According to the Diagnostic Criteria for Temporomandibular Disorders (DC/TMD), which are the most frequently used diagnostic protocol for TMD for both scientific research and clinical practice, a joint's sounds may manifest as clicks, crepitus, or eminence clicks [1]. Joint sounds mainly concern younger patients, i.e., below 27 years old [21].
There are two main clinical methods of examining TMJ's sounds. One of them is a simple tactile palpation also used in the (R)DC/TMD protocols [22,23], while the other is an examination using a stethoscope [24]. Previous research showed that both methods present low ICC values and a high number of false-positive results regarding crepitus [7]. The above-mentioned premise has led to searching for alternative and more objective examination methods regarding joint sounds in the TMJ [25][26][27]. VAG became one of the potential solutions.
Since the palpatory assessment of mechanical vibrations produced by a moving joint is subjective and seems inaccurate in its form [6,[28][29][30], the usage of VAG would be an attempt to render the examination more objective.
The most commonly used device for recording and analyzing vibroacoustic signals is a joint vibration analysis (JVA) system, which was designed to examine temporomandibular joints, and consists of two accelerometers and the accompanying software enabling registration and signal analysis [4,5,31,32]. The built-in software determines features characterizing vibroacoustic signals on the basis of the frequency spectrum (features described in the following section). On the basis of these features, the classification of the TMJ pathology in the form of a JVA decision tree was developed [33].
VAG is based on the analysis of vibroacoustic signals produced by the friction created inside the joint. Previous studies regarding knee joints showed that OA knees have a greater frequency, higher peaks, and longer duration concerning the vibroacoustic emissions compared to healthy knees [6,34,35]. An up-to-date VAG is not only used to determine the condition of the cartilage, but also for other intra-articular elements of the joints, such as the menisci in the knee joint and the temporomandibular joint's disc [5,36]. VAG gives the opportunity to dynamically examine joint function in movement, unlike static imaging showing only the structure. Information obtained by both means could be beneficial for the diagnosing process and treatment options. Additionally, VAG is a very cost-effective tool to incorporate into the process.
Some authors indicated the interpretation subjectivity of vibroacoustic signal analysis, emphasizing the low intrarater reliability [7]. On the other hand, the latest papers showed good and very good repeatability expressed with the intraclass correlation coefficient (ICC) [4,6,28,31,37].
Due to inconclusive research results, the aim of this study was to investigate the repeatability of the specific features of the vibroarthrogram in temporomandibular joints using accelerometers. Additionally, a k-nearest-neighbor classifier was defined to differentiate TMJ VAG signals into symptomatic and asymptomatic groups, and was more accurate than the state of the art.
Currently, only a few features describing the vibration signal of the TMJ joint are defined. By providing evidence for the repeatability of a signal in the measurement chain proposed in this study, a whole new field of research is opened up. In future research, it would be possible to define VAG features that allow for the better classification of a signal (that is, joint diagnosis). VAG signal classification could also be unified with clinical and scientific examination protocols, such as the (R)DC/TMD questionnaire, improving the clinical reasoning process in differentiating pathologies, especially since the JVA decision tree is based on Piper's classification and is not the most used (R)DC/TMD protocol [38,39]. In addition, the measurement chain described in this publication is used in the diagnosis of other joints, specifically the knee joint, demonstrating the versatility of the described approach. Thus, it is possible to study vibroarthrographic signals in the context of other, as-yet unexplored, joints.

Material
The study comprised 94 subjects divided into two groups on the basis of the presence/absence of joint sounds (clicks representing a disc displacement with reduction according to RDC/TMD) examined by palpation: study group (n = 47, 10 men, 37 women; age mean 27.5 (SD 5.6) years)-participants with a presence of joint sounds in the clinical examination, and control group (n = 47, 15 men, 32 women; age mean 26.8 (SD 6.8) years)participants without any sounds or other TMD symptoms in the clinical examination. The female-to-male (FM ) ratio for the examination group was 3.7:1; for the control group, it was 2.1:1. There were no significant differences between the groups regarding age and sex (p > 0.05). The group allocation was based on the RDC/TMD examination protocol. The study obtained the approval of the ethical committee of the Józef Piłsudski University of Physical Education, Warsaw, Poland (signature SKE 01-30/2021) in accordance with the latest revision and standards of the Helsinki Declaration. All participants signed a written consent form prior to joining the project.

Methods
The examination was performed with the subjects in a seated position. The VAG sensors were mounted on a headband placed over the subject's TMJs (both right and left at the same time) and fixed with a double-sided adhesive tape to avoid any excessive motion during the jaw movements ( Figure 2). The location of the joints was determined by palpation performed by a single TMD specialist.
Prior to examination, the subjects were instructed on how to perform the procedure during the recordings. Each recording session lasted for 20 s, and comprised 10 cycles of maximal unassisted opening and closing of the jaw guided by a metronome set at 80 beats per minute. The recording sessions were performed twice, with a 5-min break between them with the removal and replacement of the device. Sensor placement and all recordings were performed by the same examiner.
The VAG signals were collected with accelerators (model 4513B-002) and a multichan-

Statistical Analysis
To obtain the intraclass correlation coefficient (ICC), the two-way mixed-effects, absolute-agreement, multiple-measurement model was chosen [40], i.e., ICC(A, 1), as defined in [41]. Confidence intervals (lower: CI L and upper: CI U ) were calculated in accordance with [42]. Specific equations used to obtain both the ICC and the confidence intervals are included in Appendix A.
The ICC was calculated for three groups: separate study group (symptomatic), separate control group (asymptomatic), and the two groups combined. In each group, the ICC was determined for seven VAG features. According to [43], the sample size used in this study (94 subjects) was enough to estimate an ICC greater than around 0.7. The control and the study group comprised 47 subjects each, which, according to [43], was enough to estimate ICC above around 0.8. Comparing the ICC values calculated for the separate groups to those for the combined group allowed for us to draw additional conclusions about the differentiation capabilities of the specific features. When the ICC for the combined group was lower than the ICC of the separate groups, we could assume that the feature could be used to distinguish between the separate classes, since a lower ICC value indicates greater variability between subjects.

Vag Signal Features
Seven VAG features were calculated in accordance with the previous research mentioned earlier. All features were defined in the frequency domain on the spectrum obtained via fast Fourier transform [31]. Those features were:

2.
Integral below 300 Hz (IB3): area under the spectrum up to the 300 Hz mark.

4.
Ratio of integral below and above 300 Hz (IBAR): area under the spectrum up to the 300 Hz mark divided by the area under the spectrum curve above the 300 Hz mark.

5.
Peak amplitude (PA): value of the highest amplitude of the spectrum. 6.
Peak frequency (PF): value of the frequency at which peak amplitude occurred. 7.
Median frequency (MF): value of the frequency at which areas under the curves above and below it are equal.

Classification
To test the classification capabilities of the studied features, we constructed a simple k-nearest neighbors (KNN) classifier and compared it to the JVA decision tree classifier [33].
The KNN classifier [44] was defined for 10 neighbors and Euclidean distance. To improve classification, features were standardized, i.e., rescaled to have a mean value of 0 and standard deviation of 1 [45]. Classification metrics were obtained for 10-fold cross-validation [46].
Previous studies reported the VAG spectrum to be of 0.1 Hz resolution in the 0 to 1000 Hz range [31] or 0 to 500 Hz with a resolution of 1 Hz [47]. In our setup, the VAG signal was obtained with 5 kHz sampling frequency and lasted for 20 s, resulting in frequencies up to 2500 Hz with a resolution of 0.05 Hz.
Therefore, three types of features were used as the JVA decision tree classifier:

1.
Raw features, i.e., features obtained for the 0-2500 Hz range with a resolution of 0.05 Hz.

2.
Norm1 features, i.e., features obtained for the spectral curve up to 1 kHz and resampled to a resolution of 0.1 Hz [31].

3.
Norm2 features, i.e., features obtained for the spectral curve up to 500 Hz, resampled to a resolution of 1 Hz [47].
Each resampling was performed using simple linear interpolation. Additionally, the ICC was computed for each feature defined on those two resampled spectra, for which the results are included in Appendix C.
Classification results were, therefore, obtained for 8 setups: the JVA tree classifier with raw, norm1, and norm2 features, and the KNN classifier for raw features. Each classification was performed for two signal groups: the first and second measurements.

Results
Intraclass correlation coefficients of the raw features with the corresponding 95% confidence intervals are included in Table 1. The values of the ICC for the computed features using normalized spectra are included in Appendix C (Tables A2-A4).    A comparison of the classification metrics (true-positive and true-negative rates, and accuracy) between different classifiers is shown in Table 2. The confusion matrices of each classifier are3 included in Appendix E (Figures A7-A10).

ICC
According to Koo and Li [40], ICC values can indicate poor, moderate, good, and excellent reliability when they fall within the ranges of 0.0-0.5, 0.5-0.75, 0.75-0.9, and 0.9-1.0, respectively. For the combined groups, ICC indicated excellent reliability for the integral below 300 Hz feature. The total integral, integral above 300 Hz, and peak amplitude features had good reliability. Moderate reliability was determined for the IBAR and median frequency features. Only the peak amplitude had an ICC value below 0.5, indicating poor reliability. However, according to [43], for the sample size of 94 participants used in this study, only ICC values of around 0.7 and above were reasonably interpretable.
The values of the ICC calculated for the separate control group were generally much lower, with only two features, i.e., total integral and the integral above 300 Hz, indicating good reliability. This means that the variability of the features' values between participants was just a little greater than that between the measurements of the same participant. This can indicate that VAG signals for participants in the control group were similar in terms of mentioned features. This was to be expected to some extent, since the features were defined to differentiate between symptomatic and asymptomatic joints in the first place. The values of the study group were generally significantly greater than those of the control group. This means that the variability between participants was much greater than that between measurements of the same joint. Again, however, the specific values of the ICC determined for the control and study groups alone are to be taken with a grain of salt. The reliable interpretation of ICC values with a sample of 47 subjects is possible for values above around 0.8 [43]. Additional tables regarding the power analysis of the ICC calculation [48] are included in Appendix B.

Differences in Features between Groups
Box plots from Figures 3-5 indicate that the values of the total integral, integral below 300 Hz, integral above 300 Hz, and peak amplitude features were higher for the symptomatic group. This suggests that, in general, symptomatic joints generate signals of higher energy. The ratio (IBAR) feature indicates that, for most signals in both groups, for the raw and norm1 features, the integral above 300 Hz was lower than the integral below 300 Hz. This difference was more profound in the asymptomatic group. Figure 5 shows that, for norm2 features (i.e., spectrum resampled to 1 Hz resolution up to 500 Hz), the ratio was also higher is control group. However, because of spectral normalization, the values of the IBAR feature were generally lower than 1, i.e., the integral above 300 Hz was higher than the integral below 300 Hz. The peak frequency and median frequency features indicate that the spectrum tended to lower values in the symptomatic group.

Classification
The JVA decision tree classifier did not work with the raw and norm1 features, classifying each signal as generated by an asymptomatic joint. A couple of signals for which the features were computed using the norm2-normalized spectrum were classified as symptomatic. For this classification setup, accuracy was 0.596 and 0.564 for the first and the second measurements, respectively. Low accuracy is to be expected, however, since the JVA decision tree [33] was defined for a completely different signal acquisition setup.
The KNN classifier achieved 0.787 and 0.809 accuracy scores for the first and the second signals, respectively. This may be considered to be rather low. However, only features known from the literature using different setups were considered.
Features describing the vibroarthrogram in the same measurement setup but a different joint [6,49] could prove to be much more informative. This will be carried out in future studies. Frequency range maps [49] could also be obtained using independent feature evaluation coefficients of the classification algorithm [50].

Comparison to Previous Research
Designations can be found in the literature on the use of JVA features of vibroacoustic signals that best characterize individual TMJ pathologies. The research results indicate that the characteristics of vibroacoustic signals that best describe disk displacement with reduction (IIa according to the (R)DC/TMD) are: (1) high peak amplitude values [33], (2) total integral values above 80 Pa (even reaching 1000 Pa in sharp displacements) [33], (3) and high integral parameter below 300 Hz values [33,51].
In the current study, these parameters were characterized by good (PA and TI) and excellent (IB3) repeatability between the two measurements, which suggests that they would be valuable predictors in differentiating study-group joint clicks from those of the control group. Further analysis of the results confirmed this assumption, because all of them (PA, TI, and IB3) significantly differed in both groups, where the presence (or lack) of joint clicks was the main criterion for qualifying participants to the study and control groups. PA and IB300 features showed the highest differences between the groups among all analyzed features of the VAG signals. All "integral" features were higher for the symptomatic group, further confirming the previous findings.
In connection with the postulation outlined by the authors of the above publications to conduct further research on this problem, work has emerged over the years showing that vibroacoustic signals registered in the temporomandibular joint are repeatable [4,5,31]. The same results were presented by the authors using a piezoelectric accelerometer while examining other peripheral joints of the knee or ankle [6,28,37].
In conclusion, the results of this research are in line with previous reports on the proposed parameters of vibroacoustic signals best illustrating joint clicks.
The previous literature of the acoustic emission analysis of different joints (such as the knee joint) [52][53][54] is quite informative, especially in the context of healthy/osteoarthritis distinction (classification accuracies of over 90%). Therefore, VAG analysis in the TMJ context requires much more further research, especially in the area of feature extraction.

Further Research
Currently, there are only a few defined features of the vibration signals of the TMJ joint. However, by providing evidence for the repeatability of signals in the proposed measurement chain in this study, an entirely new field of research has opened up.
Future research could benefit from focusing on the VAG features used to diagnose other joints, such as the knee. This would allow for a more accurate diagnosis of the TMJ. Exemplary features could include previously used measures in knee-joint VAG analysis. Those include time-domain features, such as rectified average value, root mean square value, and shape factors [55,56], and frequency-domain features, such as spectral entropy [57]. Additional preprocessing such as (ensemble) empirical mode decomposition [58,59] could also be studied in the context of TMJ.
Additionally, since the measurement chain used in this study was accurate for both the TMJ and knee joint [6], it would be of great value to study vibroarthrographic signals in the context of other joints.

Conclusions
Most of the investigated features showed moderate-to-excellent reliability, indicating a high repeatability of VAG signals, thus allowing for diagnostic usage for further research.
Temporomandibular joint sound diagnostics using the KNN classifier showed higher accuracy scores compared with those of the JVA decision tree, but the results could still be considered to be rather low. Data Availability Statement: Data available on request due to privacy restrictions. The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Equations Used to Calculate the ICC and It's Confidence Intervals
To obtain the intraclass correlation coefficient (icc), the two-way mixed-effects, absoluteagreement multiple-measurements model was chosen [40], i.e., ICC(A, 1), as defined in [41]: , where: (A1) MSBS is the value of the mean square between subjects, MSE is the mean square error, MSBM is the mean square between measurements, k is the number of measurements for each subject, and n is the number of subjects. Confidence intervals (lower: CI L and upper: CI U ) were calculated in accordance with [42]: where F * and F * denote 97.5-th percentiles of the F distribution with n − 1 degrees of freedom (DOF) in the numerator and ν DOF in the denominator for F * , and inversely for F * .

Appendix B. Post Hoc Power Analysis
Post hoc power analysis was conducted for the ICC values from Table 1 according to [48]. Power values were obtained for an alpha level of 0.05, and the null hypothesis of ICC was equal to 0.0 and 0.5. Results are included in Table A1.

Appendix D. Boxplots for Separate Measurements
Figures A1-A6 include box plots of each feature for the first and the second measurements, respectively. Box plots were very similar between the measurements. Therefore, the interpretation of the box plots obtained for the separate measurements was the same as that of the combined measurements (box plots in Figures 3-5).