Validity and Reproducibility of the Peer Assessment Rating Index Scored on Digital Models Using a Software Compared with Traditional Manual Scoring

The aim of this study was to assess the validity and reproducibility of digital scoring of the Peer Assessment Rating (PAR) index and its components using software, compared with conventional manual scoring on printed model equivalents. The PAR index was scored on 15 cases at the pre- and post-treatment stages by two operators using two methods: first, digitally, on direct digital models using Ortho Analyzer software; and second, manually, on printed model equivalents using a digital caliper. All measurements were repeated after a one-week interval. Paired-sample t-tests were used to compare the PAR scores and their components between the two methods and between the two raters. Intra-class correlation coefficients (ICC) were used to assess intra- and inter-rater reproducibility, and the error of the method was calculated. Agreement between the two methods was analyzed using Bland-Altman plots. There were no significant differences in mean PAR scores between the two methods or between the two raters. ICCs for intra- and inter-rater reproducibility were excellent (≥0.95). All error-of-the-method values were smaller than the associated minimum standard deviation. Bland-Altman plots confirmed the validity of the measurements. PAR scoring on digital models showed excellent validity and reproducibility compared with manual scoring of printed model equivalents using a digital caliper.


Introduction
Maintaining high standards of orthodontic treatment requires orthodontists to monitor treatment outcomes regularly. The orthodontic indices widely used in clinical and epidemiological studies to evaluate malocclusion and treatment outcome [1-3] include the Index of Orthodontic Treatment Need (IOTN) [4], the Index of Complexity, Outcome and Need (ICON) [5], the American Board of Orthodontics Objective Grading System (ABO-OGS) [6], the Peer Assessment Rating (PAR) index [7], and the Dental Aesthetic Index (DAI) [8].
The PAR index [7] is an occlusal index developed to provide an objective and standardized measure of static occlusion at any stage of treatment using dental models. It is widely used by clinicians in both the private and the public sector, including educational institutions. In the UK, its use is obligatory in all orthodontic clinics providing public services, where it is used to audit orthodontic treatment outcome and as a measure of quality assurance. Malocclusion can be scored at any stage of orthodontic treatment, for example before and/or after treatment, and the difference in PAR scores between two stages is used to evaluate treatment outcome. The validity and reliability of the index on plaster models have been reported in England [9] and in the United States [10], and it is also a valid tool for measuring treatment need [11].
The use of plaster models and digital calipers has been acknowledged as the gold standard for study model analyses and measurements [12,13]. Traditionally, PAR scoring is performed on plaster models by means of a PAR ruler or a combination of a digital caliper and a conventional ruler. Several studies have used this method to assess malocclusion, treatment need, treatment outcomes, and stability of occlusion [11,14-19]. However, the human-machine interface has evolved and has influenced orthodontics significantly, shifting the traditional clinical workflow towards a completely digital one in which digital models have become more prevalent. Digital models allow patient records to be stored digitally and allow essential orthodontic assessments, such as diagnosis, treatment planning, and assessment of treatment outcome, to be carried out virtually through built-in features such as linear measurements [13,20], Bolton analysis, space analyses [21], treatment planning [22], and PAR scoring [12,23].
This new paradigm, however, requires that digital tools be adapted to, and assessed for, clinical orthodontic work, yet studies of the validity and reproducibility of three-dimensional (3-D) digital measurement tools remain scarce.
Digital models can be obtained either directly or indirectly and can be printed or viewed on a computer display. Scanned-in plaster models are the indirect source of digital models and are as valid and reliable as conventional plaster models [12,13,24,25]. In the present study, digital models were obtained directly from an intraoral scanner. Three recent studies have evaluated a complete digital workflow [26-28]. Brown et al. [26] concluded that 3-D printed models acquired directly from intraoral scans were clinically acceptable and should be considered a viable option for clinical applications. Luqmani et al. [28] assessed the validity of digital PAR scoring by comparing manual scoring on conventional models with a PAR ruler against automated digital scoring of both scanned-in models (indirect digital models) and intraoral scans (direct digital models). The authors concluded that automated digital PAR scoring was valid and that there were no significant differences between direct and indirect digital model scores.
However, to our knowledge, the non-automated digital PAR scoring tool of the Ortho Analyzer software has not previously been validated. Therefore, the purpose of this study was to assess the validity and reproducibility of digital scoring of the PAR index and its components on digital models using this software, compared with conventional manual scoring on printed model equivalents.

Sample Size Calculation
A sample size calculation was performed using the formula given by Walter et al. [29]. For a minimum acceptable reliability (intra-class correlation coefficient, ICC) of 0.80 and an expected reliability of 0.96, with 80% power and a significance level of 0.05, a sample of 12 subjects was required. The sample was extended to 15 subjects.
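As a rough check, the calculation can be reproduced with the closed-form approximation usually attributed to Walter et al. [29]. This is a minimal sketch, not the authors' own computation: it assumes k = 2 ratings per subject and a two-sided α, neither of which is stated above. Under those assumptions it returns the 12 subjects reported (a one-sided α would give 10). The function name and its defaults are hypothetical.

```python
import math
from statistics import NormalDist


def walter_sample_size(rho0: float, rho1: float, k: int = 2,
                       alpha: float = 0.05, power: float = 0.80,
                       two_sided: bool = True) -> int:
    """Approximate number of subjects for a reliability (ICC) study.

    Sketch of the approximation attributed to Walter et al.:
        n = 1 + 2k (z_alpha + z_beta)^2 / ((k - 1) ln(C0)^2)
    with C0 = (1 + k*theta0) / (1 + k*theta1) and theta = rho / (1 - rho).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    theta0 = rho0 / (1 - rho0)   # minimum acceptable ICC (null hypothesis)
    theta1 = rho1 / (1 - rho1)   # expected ICC
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + 2 * k * (z_alpha + z_beta) ** 2 / ((k - 1) * math.log(c0) ** 2)
    return math.ceil(n)          # round up to whole subjects


print(walter_sample_size(0.80, 0.96))  # 12 under these assumptions
```

The unrounded value is about 11.9, so the rounding step, rather than the formula, accounts for the reported 12.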

Setting
The study was conducted at the Section of Orthodontics, School of Dentistry and Oral Health, Aarhus University, Denmark. This type of study is exempt from ethics approval in Denmark (Health Research Ethics Committee-Central Jutland, Denmark, case no. 1-10-72-1-20).

Sample Collection
The study sample consisted of 15 consecutive patient records (the first record being randomly chosen) selected from the archives according to the following inclusion criteria: (1) patients had undergone orthodontic treatment with full fixed appliances at the postgraduate orthodontic clinic between 2016 and 2018; and (2) digital models before and after treatment were available. No restrictions were applied with regard to age, initial malocclusion, or end-of-treatment results.
The digital models for both treatment stages, pre-treatment (T0) and post-treatment (T1), had been generated directly by a TRIOS intraoral scanner (3Shape, Copenhagen, Denmark) as stereolithographic (STL) files, then imported and analyzed in Ortho Analyzer software (3Shape, Copenhagen, Denmark). Subsequently, the 30 digital models were printed to produce 15 model equivalents for each stage, using model design software (Objet Studio, Stratasys, Eden Prairie, MN, USA) and a 3-D printing machine (Polyjet prototyping technique; Objet30 Dental prime, Stratasys, Eden Prairie, MN, USA), in the same laboratory and with the same technique.

Measurements
PAR scoring was performed at T0 and T1 by two methods: (1) digitally, on the direct digital models using a built-in feature of the Ortho Analyzer software (Figure 1); and (2) manually, on the printed model equivalents using a digital caliper (Orthopli, Philadelphia, PA, USA) measuring to the nearest 0.01 mm with an orthodontic tip accuracy of 0.001, except for overjet and overbite, which were measured with a conventional ruler. Two operators (AG and SG), previously trained and calibrated in both techniques, performed all measurements independently. Reproducibility was determined by repeating the measurements on all models with both methods and by both raters after a one-week interval, under identical circumstances.
PAR scoring followed the UK weighting system of Richmond et al. [7] and included five components, each scoring occlusal traits that constitute malocclusion: anterior segment, posterior segment, overjet, overbite, and centerline (Table 1). Within each component, the trait scores were summed and multiplied by the component weight, and the weighted component scores were summed to give the total weighted PAR score. Essential information about each case was also taken into account, such as impacted, missing, or extracted teeth, planned prosthetic replacements, and previous restorative work affecting the malocclusion.
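The weighting arithmetic can be illustrated with a short sketch. It assumes the UK weights commonly cited for Richmond et al. [7] (anterior segment 1, posterior segment 1, overjet 6, overbite 2, centerline 4); Table 1 remains the authoritative source. The component keys and the trait scores in the example are hypothetical. The final printed value is the T0 minus T1 difference used to evaluate treatment outcome.

```python
# Assumed UK weightings (as commonly cited for Richmond et al.);
# check against Table 1 before relying on them.
UK_WEIGHTS = {
    "anterior_segment": 1,   # upper and lower anterior segments
    "posterior_segment": 1,  # buccal occlusion, right and left
    "overjet": 6,
    "overbite": 2,
    "centerline": 4,
}


def weighted_par(trait_scores: dict[str, list[int]]) -> int:
    """Sum the trait scores within each component, multiply by the component
    weight, then add the weighted components to give the total PAR score."""
    total = 0
    for component, weight in UK_WEIGHTS.items():
        raw = sum(trait_scores.get(component, []))  # unweighted component score
        total += raw * weight
    return total


# Hypothetical trait scores for one case before (T0) and after (T1) treatment.
t0 = {"anterior_segment": [3, 2], "posterior_segment": [1, 1],
      "overjet": [2], "overbite": [1], "centerline": [1]}
t1 = {"anterior_segment": [1, 0], "posterior_segment": [0, 0],
      "overjet": [0], "overbite": [0], "centerline": [0]}

par_t0, par_t1 = weighted_par(t0), weighted_par(t1)
print(par_t0, par_t1, par_t0 - par_t1)  # 25 1 24
```

The heavy overjet and centerline weights mean that a small change in either trait moves the total far more than the same change in a segment score.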

Statistical Analyses
Data collection and management were performed by means of the Research Electronic Data Capture (REDCap) tool hosted at Aarhus University [30,31]. Statistical analyses were carried out with Stata software (Release 16, StataCorp. 2019, College Station, TX, USA).
Descriptive statistics were used to summarize the total PAR scores by time point, rater, and method. Paired-sample t-tests were used to compare PAR scores between the two methods and between the two raters, with significance set at p < 0.05. Intra- and inter-rater reproducibility of both methods were assessed by ICC. Intra- and inter-rater variability were determined by calculating the error of the method according to Dahlberg's formula [32]. Agreement between the digital and manual scoring methods performed by the two raters was assessed using a scatter plot and Bland-Altman plots.
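Dahlberg's formula [32] estimates the error of the method from duplicate measurements as ME = √(Σd²/2n), where d is the difference between the two measurements of a case and n is the number of cases. A minimal Python sketch follows; the function name and the scores are hypothetical.

```python
import math


def dahlberg_error(first: list[float], second: list[float]) -> float:
    """Dahlberg's error of the method: sqrt(sum(d_i**2) / (2n)),
    where d_i is the difference between repeated measurements of case i."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty series")
    sq = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(sq / (2 * len(first)))


# Hypothetical total PAR scores from two sessions one week apart.
session1 = [25, 30, 18, 12, 40]
session2 = [24, 30, 19, 12, 38]
print(round(dahlberg_error(session1, session2), 3))  # 0.775
```

Because the differences are squared, a single large discrepancy (here the 2-point difference) contributes more to ME than several 1-point ones.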

Validity
Paired-sample t-tests showed no significant differences between the two methods in the mean total PAR scores or in the PAR components (Tables 2 and 3). The scatter plot (Figure 2a) and Bland-Altman plots (Figure 2b) illustrate the agreement between measurements obtained with the two methods.
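Both the paired t-test and the Bland-Altman analysis start from the per-case differences between the two methods. The sketch below assumes differences taken as digital minus manual for a single rater and uses hypothetical scores. It computes the mean difference (bias), the conventional 95% limits of agreement (bias ± 1.96 SD of the differences), and the paired t statistic. The p-value would then be read from a t distribution with n − 1 degrees of freedom; `scipy.stats.ttest_rel`, for example, performs the complete test.

```python
import math
from statistics import mean, stdev


def paired_agreement(digital: list[float], manual: list[float]) -> dict[str, float]:
    """Paired-difference summaries (digital minus manual) for one rater."""
    if len(digital) != len(manual) or len(digital) < 2:
        raise ValueError("need two equally long series with at least 2 cases")
    diffs = [d - m for d, m in zip(digital, manual)]
    n = len(diffs)
    bias = mean(diffs)   # Bland-Altman bias (mean difference)
    sd = stdev(diffs)    # sample SD of the differences
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,  # 95% limits of agreement
        "loa_upper": bias + 1.96 * sd,
        # Paired t statistic (n - 1 df); undefined if all differences are equal.
        "t": bias / (sd / math.sqrt(n)) if sd > 0 else math.nan,
        "df": n - 1,
    }


# Hypothetical total PAR scores for five cases (one rater, T0).
digital = [25, 31, 18, 12, 40]
manual = [24, 30, 19, 12, 38]
print(paired_agreement(digital, manual))
```

The t-test only asks whether the mean difference departs from zero. The limits of agreement also show how far individual cases can differ, which is why the two analyses are reported together.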

Figure 2. (a) Scatter plot of the total weighted PAR scores measured by the digital and manual methods for both raters, with a line of unity; (b) Bland-Altman plots of inter-rater agreement for the total PAR scores measured by the digital and manual PAR scoring methods at T0.

Reproducibility
ICCs for the total PAR scores and the PAR components fell within the 0.95-1.00 range for intra- and inter-rater reproducibility at both time points and for both methods (Tables 2 and 3). All error-of-the-method values for the total PAR score and its components were smaller than the associated minimum standard deviation.
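The ICC form (model, type, and unit) is not specified in the text. For illustration only, the sketch below assumes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement), computed from two-way ANOVA mean squares. Each row holds the scores for one case; the columns hold the two raters or the two sessions. The function name is hypothetical.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1) (Shrout and Fleiss): two-way random effects, absolute
    agreement, single measurement. One row per case, one column per rater
    or session; requires at least 2 rows and 2 columns."""
    n = len(ratings)      # cases
    k = len(ratings[0])   # raters or sessions
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between cases
    ms_cols = ss_cols / (k - 1)                # between raters/sessions
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical totals for four cases scored by two raters.
print(round(icc_2_1([[25, 24], [30, 30], [18, 19], [40, 38]]), 3))
```

Applied to the six-case, four-judge example in Shrout and Fleiss (1979), this sketch returns approximately 0.29, the ICC(2,1) value reported there. Other ICC forms, such as consistency or average-measures ICCs, use different denominators and would give different values.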

Discussion
Orthodontic model analysis is a prerequisite for diagnosis, evaluation of treatment need, treatment planning and analysis of treatment outcome. The present study assessed the validity and reproducibility of PAR index scoring for digital models and their printed model equivalents.
Digitization of plaster models (scanned-in models) was introduced in the 1990s [33]. The advantages of digital models include the absence of physical storage requirements, instant accessibility, and no risk of breakage or wear [22,34]. The analysis of scanned-in models is as valid as that of plaster models [24]. Over the last ten years, however, technology has evolved considerably, offering high-resolution digital models and upgraded platforms for their analysis. Hence, intraoral scanning has gained popularity worldwide. Several studies have confirmed that direct digital models are as accurate as plaster models, and direct digital models are consequently used as an alternative to conventional impression techniques and materials [35,36]. In the present study, we used the direct digital model technique.
Numerous intraoral scanners and software packages with various diagnostic tools have been developed over the last decade. In the present study, Ortho Analyzer software was used for scoring the digital models, and a digital caliper was used for scoring the printed models. Analyzing digital models raises some concerns. The main concern with digital software is the visualization of a 3-D object on a two-dimensional screen, and an appropriate evaluation requires correct model orientation. In this study, for instance, crossbites were difficult to visualize, and the model had to be rotated to fully appreciate the magnitude of the crossbite. This problem was also reported by Stevens et al. [12]. In addition, segmentation of the dental crowns, an unavoidable step in creating a virtual setup before scoring, is time-consuming. To measure tooth displacements accurately, the points should be placed carefully, parallel to the occlusal plane, and the model rotated adequately to give a good view of the contact points.
Another concern when dental measurements are performed on printed models is the printing technique and the model base design. Two studies [26,27] evaluated the accuracy of printed models acquired from intraoral scans. Brown et al. [26] compared plaster models with printed models produced by two types of 3-D printing technique and concluded that both digital light processing (DLP) and polyjet printers produced clinically acceptable models. Camardella et al. [27] compared 3-D printed models with different base designs produced by two printing techniques and concluded that models printed from intraoral scans with the polyjet technique were accurate regardless of the model base design. By contrast, models with a horseshoe-shaped base printed with a stereolithography printer showed significant transversal (intra-arch) contraction, whereas a horseshoe-shaped base with a posterior connection bar was accurate compared with printed models with a regular base. Therefore, in the present study, a polyjet printing technique and regular model base designs were used to ensure accuracy.
Luqmani et al. [28] compared automated PAR scoring of direct and indirect digital models (CS 3600 software; Carestream Dental, Stuttgart, Germany) with manual scoring of plaster models using the PAR ruler. The authors found that manual PAR scoring was the most time-efficient and indirect digital model scoring the least time-efficient, because minor faults in the dental casts required time-consuming software adjustments. However, their automated scoring was more efficient than the software scoring used by Mayers et al. [23], which required operators to identify each relevant landmark. Hence, indirect scoring depends on the quality of the dental casts and can be time-consuming, depending on the software used. In the present study, automated PAR scoring was not possible because the software used (Ortho Analyzer) does not offer an automated scoring feature.
In the present study, the slightly higher reproducibility of PAR scoring with the digital method was not statistically significant; both methods proved highly reproducible. The high reproducibility of the manual method agrees with the results of Richmond et al. [7], and the reproducibility and variability of the direct digital method were similar to those described by Luqmani et al. [28]. Furthermore, the limited variability between the two methods indicates high validity of the digital method compared with the conventional manual method, which served as the gold standard.

Conclusions
PAR scoring on digital models using software showed excellent reproducibility and good validity compared with manual scoring, which was considered the gold standard.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.