Short Versions of the Arabic Psychosocial Impact of Dental Aesthetics Questionnaire for Yemeni Adolescents: Cross-Sectional Derivation and Validation

Objectives: To shorten the 24-item Arabic Psychosocial Impact of Dental Aesthetics Questionnaire (PIDAQ(A)) for adolescents in Yemen. Material and methods: Two shortening methods were each used to derive six-item and nine-item versions. The item impact method selected the items with the highest impact scores in each subscale, as rated by 30 participants. The regression method was applied to data of 385 participants from the PIDAQ(A) validity study, with the total PIDAQ(A) score as the dependent variable and its individual items as the independent variables. The four derived versions were assessed for validity and reliability. Results: The means of the six-item and nine-item short versions of both methods were close. Cronbach’s alpha values ranged from 0.90 to 0.92 (intra-class correlations = 0.85–0.88). In criterion validity, strong significant correlations were detected between the scores of all short versions and the 24-item PIDAQ(A) score (0.96–0.98; p < 0.001). Construct validity tests showed significant associations between all short versions and both the self-perceived dental appearance rank and the self-perceived need for orthodontic braces rank (p < 0.05). In discriminant validity tests, the mean scores of all short versions differed significantly between adolescents with severe malocclusion and those with slight malocclusion. In conclusion, all PIDAQ(A) short versions are valid and reliable.


Introduction
Oral-health-related quality of life (OHRQoL) questionnaires are used to obtain patient-based outcomes that reflect the individual's self-evaluation of a disease and its perceived impact on quality of life [1]. In this context, OHRQoL measures tend to focus on the functional, psychological, and social impacts of oral conditions on the life of the patient [2], and each of these instruments has its specific focus, purpose, and length [3].
Nevertheless, in epidemiological surveys, and even in clinical settings, the use of a questionnaire may be restricted by its length and the burden it places on respondents [4]. Large-scale population surveys are facing a persistent decrease in their response rates [5][6][7][8].
Various factors impact the decrease in response rates, including the effect of the instrument length [9]. Subsequently, item non-responses, if they occur, affect the validity of the results, and the utility of the data [10]. To address such situations, the application can

Derivation of Short Version(s) of PIDAQ(A)
The PIDAQ(A) short versions were derived following the processes used in a previous study [4]. Two methods were used, each with the intention of deriving subsets of 6 and 9 items that could capture as much information as possible from the 24-item PIDAQ(A).

The Item Impact Method
The item impact method used data acquired from the PIDAQ item impact study. Thirty Yemeni schoolchildren aged 12 to 17 years participated in face-to-face interviews; 56.7% were female and 43.3% were male. Participants were asked to specify which of the 24 questionnaire items described problems that they had experienced in the past three months and, for each such item, to rate its importance on a four-point Likert scale: a little bothered (score = 1), somewhat bothered (score = 2), bothered a lot (score = 3), or extremely bothered (score = 4) [4]. An impact score was calculated for each item by multiplying the item's mean importance rating by the percentage of participants who reported positive responses [4]. Items were then ranked within the three PIDAQ(A) subscales, dental self-confidence (DSC), psychosocial impact (PSI), and aesthetic concern (AC), according to their impact scores. The higher the impact score, the greater the psychosocial effect attributed to malocclusion problems. The final short versions included the three or two items with the highest impact scores in each subscale [4,25]. The 9-item and 6-item short versions derived by the impact method are referred to as PIDAQ(A)-ISV9 and PIDAQ(A)-ISV6, respectively.
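The impact score calculation and the within-subscale ranking can be illustrated with a minimal Python sketch. It rests on two assumptions that the text does not spell out: the mean importance rating is taken over the participants who reported the problem, and the percentage of positive responses is expressed as a fraction between 0 and 1. The function names, item labels, and ratings are hypothetical toy values, not study data.

```python
from collections import defaultdict


def impact_scores(ratings):
    """Return {item: impact score}.

    ratings maps each item to one value per participant: 0 if the problem
    was not reported, otherwise the importance rating (1-4).
    """
    scores = {}
    for item, values in ratings.items():
        reported = [v for v in values if v > 0]
        if not reported:
            scores[item] = 0.0
            continue
        # Assumption: mean importance among those who reported the problem.
        mean_importance = sum(reported) / len(reported)
        # Assumption: share of positive responses as a fraction (0-1).
        share_reporting = len(reported) / len(values)
        scores[item] = mean_importance * share_reporting
    return scores


def top_items(scores, subscale_of, k):
    """Pick the k highest-scoring items within each subscale."""
    grouped = defaultdict(list)
    for item, score in scores.items():
        grouped[subscale_of[item]].append((score, item))
    return {
        subscale: [item for _, item in sorted(pairs, key=lambda p: (-p[0], p[1]))[:k]]
        for subscale, pairs in grouped.items()
    }


# Toy data: four hypothetical items, four participants each.
toy_ratings = {
    "d1": [0, 2, 4, 0],
    "d2": [1, 1, 1, 1],
    "p1": [4, 4, 0, 0],
    "p2": [0, 0, 0, 1],
}
toy_subscales = {"d1": "DSC", "d2": "DSC", "p1": "PSI", "p2": "PSI"}
toy_scores = impact_scores(toy_ratings)
toy_selection = top_items(toy_scores, toy_subscales, k=1)
```

Calling `top_items` with `k=3` or `k=2` on real ratings would correspond to the nine-item and six-item selections, respectively.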

The Regression Method
The regression method was applied to data collected in a cross-sectional study of adolescents in Yemen that tested the psychometric properties of the PIDAQ(A). Details of this study have been reported previously [24]. The short versions derived by stepwise regression were designated PIDAQ(A)-RSV9 and PIDAQ(A)-RSV6. The dependent variable was the total PIDAQ(A) score, obtained by adding the scores of all items, and the independent variables were the individual items [4,26]. First, a single model containing all PIDAQ(A) items was generated; then a forward stepwise procedure with the total score as the dependent variable was applied to identify the strongest predictors of the overall score [4]. The stepwise regression yielded the adjusted R square (R²) and the estimated coefficients for all items. The three and two items from each subscale that entered the model and made the largest contribution to the coefficient of determination (R²) were selected for the PIDAQ(A)-RSV9 and PIDAQ(A)-RSV6, respectively.
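The idea of forward selection against the total score can be sketched in plain Python. This is an illustration under stated assumptions, not the SPSS procedure used in the study: items enter purely by the largest gain in (unadjusted) R², with no p-value entry or removal criteria, and the helper names and toy data are hypothetical. The sketch relies on the fact that, once the outcome and the remaining items are residualized against the items already entered, the gain in explained sum of squares from adding an item is the squared covariance with the residual outcome divided by the item's residual variance.

```python
def _center(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_selection(items, total, n_steps):
    """Return [(item, cumulative R^2), ...] in order of entry.

    items maps item names to lists of scores; total is the list of total
    scores. Centering every variable is equivalent to fitting an intercept.
    """
    resid_y = _center(total)
    ss_total = _dot(resid_y, resid_y)
    resid_x = {name: _center(values) for name, values in items.items()}
    explained = 0.0
    entered = []
    for _ in range(n_steps):
        best_name, best_gain = None, 0.0
        for name, x in resid_x.items():
            sxx = _dot(x, x)
            if sxx < 1e-12:  # item is redundant with those already entered
                continue
            gain = _dot(x, resid_y) ** 2 / sxx
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break
        x_best = resid_x.pop(best_name)
        s_best = _dot(x_best, x_best)
        # Remove the entered item's direction from the outcome ...
        slope = _dot(x_best, resid_y) / s_best
        resid_y = [r - slope * xb for r, xb in zip(resid_y, x_best)]
        # ... and from every remaining candidate item.
        for name in list(resid_x):
            c = _dot(x_best, resid_x[name]) / s_best
            resid_x[name] = [xi - c * xb for xi, xb in zip(resid_x[name], x_best)]
        explained += best_gain
        entered.append((best_name, explained / ss_total))
    return entered


# Toy data: three hypothetical items; the total is their sum.
toy_items = {
    "x1": [1, 5, 1, 5, 1, 5],
    "x2": [1, 1, 2, 2, 3, 3],
    "x3": [2, 2, 2, 3, 2, 2],
}
toy_total = [sum(vals) for vals in zip(*toy_items.values())]
toy_entry = forward_selection(toy_items, toy_total, n_steps=3)
```

Because the total score is the sum of the items, R² reaches 1 once every item has entered; the entry order is what identifies the strongest predictors. Selecting the first three or two entered items within each subscale would then mirror the rule described above.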

The Psychometric Validations
The psychometric evaluation of the four PIDAQ(A) short versions comprised reliability (internal consistency and reproducibility) and validity (criterion, convergent, and discriminant). The data used for this analysis were collected as part of the PIDAQ(A) 24-item questionnaire validation study in the cross-cultural adaptation work [24]. The PIDAQ(A) dataset included responses from 385 Yemeni school adolescents aged 12 to 17 years. The proportion of females (55.8%) was higher than that of males (44.2%). Participants answered each item of the long version (PIDAQ(A)-24) on a five-point Likert scale ranked from 1 (never) to 5 (strongly agree), so each item yields a score from 1 to 5. The scores of all items were summed into a total score. For the four short versions, scores were computed by summing the scores of their items. The higher the total score, the higher the level of perception of being affected by malocclusion. The clinical examination data assessing malocclusion in the same participants were also used in the analysis. In this clinical examination, the dental health and aesthetics components of the index of orthodontic treatment needs (IOTN-DHC and IOTN-AC), and the awareness component of the Perception of Occlusion Scale (POS), were measured [24].
A reliability assessment was conducted to examine the internal consistency and reproducibility of all short versions. Criterion validity was examined by determining the extent to which each of the short versions correlates with the PIDAQ(A)-24. In addition, criterion validity was estimated by testing the correlation between each short version score and the scores of the Arabic version of the Child Oral Impacts on Daily Performances (Child-OIDP) questionnaire [27]. The Child-OIDP questionnaire assesses the oral impact of malocclusion on eight daily activities, i.e., eating, speaking, cleaning teeth, relaxing, smiling, emotional stability, doing schoolwork, and socializing, where the "Position of the teeth" and "Spaces between" represent the malocclusion problems [28].
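The criterion validity check against the long version can be sketched as follows: a short-version score is the sum of its selected items, and it is rank-correlated with the total score. The sketch assumes Spearman's rank correlation with average ranks for ties, implemented in plain Python rather than in the statistical package used for the study; the item numbers and responses are hypothetical toy values.

```python
from statistics import mean


def sum_score(responses, item_ids):
    """Sum the selected items for each participant (responses: list of dicts)."""
    return [sum(r[i] for i in item_ids) for r in responses]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: four participants, four hypothetical items scored 1-5.
toy_responses = [
    {1: 1, 2: 1, 3: 1, 4: 2},
    {1: 2, 2: 2, 3: 3, 4: 2},
    {1: 4, 2: 3, 3: 4, 4: 5},
    {1: 5, 2: 5, 3: 5, 4: 5},
]
toy_long = sum_score(toy_responses, [1, 2, 3, 4])
toy_short = sum_score(toy_responses, [1, 3])
toy_rho = spearman_rho(toy_short, toy_long)
```

In this toy case the short and long scores order the participants identically, so the rank correlation is 1; real data would give values below that.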
Convergent and discriminant validities, being the two dimensions of construct validity, were assessed by comparing the strength of associations of the four short versions with scores of global measures. For convergent validity, the associations of the short versions with perceived dental appearance were evaluated. Similarly, the associations between the short versions and satisfaction with dental appearance were also assessed. Participants ranked their response as excellent, good, average, or poor for perceived dental appearance, and very satisfied, satisfied, dissatisfied, or very dissatisfied for satisfaction with dental appearance [17,18,24].
For discriminant validity, the associations between the four short versions and each of the following were assessed: self-rated perceived need (MI-S) and investigator-rated need for orthodontic treatment (MI-D), the dental health component of the index of orthodontic treatment needs (IOTN-DHC), and the perceived need for orthodontic treatment. In this study, the aesthetics component of the index of orthodontic treatment needs (IOTN-AC), and the awareness component of the Perception of Occlusion Scale (POS), represented the malocclusion index (MI-S and MI-D) [17,18,24]. Participants in the upper quartile of the MI-S and MI-D were compared with those in the lower quartile. For perceived need for orthodontic treatment, the item response option was dichotomous (yes/no).

Statistical Analysis
IBM SPSS v.23 was used for data analysis. The analytic process for assessing the association of the short versions with other measures was identical to that used in our previous study [24]. For reliability analysis, the internal consistency of each PIDAQ(A) short version was assessed by measuring Cronbach's α, Cronbach's α if item deleted, inter-item correlation, and item-total correlation. For reproducibility, the intraclass correlation coefficient (ICC) was calculated for the four PIDAQ(A) short versions. Bland and Altman analysis and the paired t-test were performed to assess any significant change between the scores of the first and second administrations for each of the short versions [17,18,24].
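Two of the reliability quantities named above have simple closed forms, sketched here in plain Python as an illustration rather than a reproduction of the SPSS output. Cronbach's α is computed as k/(k − 1) × (1 − Σ item variances / variance of the total score), and the Bland and Altman limits of agreement as the mean test-retest difference ± 1.96 standard deviations of the differences. The function names and toy data are hypothetical.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per participant (same item order)."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def limits_of_agreement(first, second):
    """Bland-Altman 95% limits for paired scores from two administrations."""
    diffs = [b - a for a, b in zip(first, second)]
    centre, spread = mean(diffs), stdev(diffs)
    return centre - 1.96 * spread, centre + 1.96 * spread


# Toy data: three perfectly consistent items, and paired test-retest scores.
toy_rows = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [5, 5, 5]]
toy_alpha = cronbach_alpha(toy_rows)

toy_t1 = [10, 12, 14, 16]
toy_t2 = [11, 12, 15, 16]
toy_low, toy_high = limits_of_agreement(toy_t1, toy_t2)
```

With perfectly consistent items, as in the toy rows, α equals 1; values of 0.90 to 0.92 as reported for the short versions indicate high but not perfect consistency.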
For criterion validity, the correlation between the PIDAQ long version and each short version was determined using Spearman's correlation, whereas the associations between each short version score and the Child-OIDP total score were tested using the Pearson correlation coefficient. The score of the Child-OIDP index was computed based on previous studies [17,18,24]. The frequency of the oral impact in any of the eight activities was multiplied by its severity. If there was no impact due to malocclusion, the score was recorded as 0. For convergent validity, the short version scores were compared across categories of satisfaction with dental appearance and of perceived dental appearance using the Kruskal-Wallis test. For discriminant validity, the scores of all short versions were compared across the malocclusion index groups (MI-S and MI-D) using an independent t-test. The MI-S and MI-D malocclusion index, the analysis of the severity of malocclusion, and the effect size estimate were adapted from previous studies [17,18,24]. Cohen's standardized effect size (ES) was computed to evaluate the difference between the measurements of the upper and lower groups [29]. ES was interpreted at three levels of clinical meaningfulness (small: 0.2 ≤ ES < 0.5; moderate: 0.5 ≤ ES < 0.8; large: 0.8 ≤ ES) [29]. The short version scores were compared across IOTN-DHC groups using the independent t-test, whereas the Mann-Whitney test was applied for the perceived need for orthodontic treatment. Floor or ceiling effects were considered present when more than 15% of participants had scores at the lower or upper limit, respectively [30].
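The effect size thresholds and the 15% floor and ceiling rule can be expressed as a short Python sketch. It assumes the pooled-standard-deviation form of Cohen's standardized effect size, which the text does not specify, and it labels values below 0.2 as "negligible", a label introduced here for illustration only. Function names and toy data are hypothetical.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_variance = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_variance ** 0.5


def effect_size_label(es):
    """Map |ES| to the three levels given in the text."""
    es = abs(es)
    if es >= 0.8:
        return "large"
    if es >= 0.5:
        return "moderate"
    if es >= 0.2:
        return "small"
    return "negligible"  # label assumed here; not named in the text


def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Flag a floor (ceiling) effect if more than 15% sit at the minimum (maximum)."""
    n = len(scores)
    floor_share = sum(s == min_score for s in scores) / n
    ceiling_share = sum(s == max_score for s in scores) / n
    return {"floor": floor_share > threshold, "ceiling": ceiling_share > threshold}


# Toy data: upper- versus lower-quartile groups, and six-item scores (range 6-30).
toy_upper = [4, 5, 6]
toy_lower = [1, 2, 3]
toy_d = cohens_d(toy_upper, toy_lower)
toy_label = effect_size_label(toy_d)

toy_scores = [6, 6, 7, 8, 9, 10, 11, 12, 13, 30]
toy_flags = floor_ceiling(toy_scores, min_score=6, max_score=30)
```

In the toy scores, 20% of participants sit at the minimum and 10% at the maximum, so only the floor effect is flagged.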
Ethical approval for the primary study was obtained from the Faculty of Dentistry, Universiti Malaya, Malaysia, and the Faculty of Dentistry, Thamar University, Yemen.

Derivation of the PIDAQ(A) Short Versions
Four short versions of the PIDAQ(A) were derived. Two versions (six and nine items) were developed using the item impact method, and the other two versions (six and nine items) were developed using the regression method [4].

Item Impact Method
In the item impact method, the two or three items with the highest impact score from each of the three PIDAQ(A) subscales were included (Table 1). Item 23, item 6, and item 2 had the highest impact scores (1.817, 1.524, and 1.412) in the DSC, PSI, and AC subscales, respectively.
Note to Table 1. DSC: dental self-confidence; PSI: psychosocial impact; AC: aesthetic concern; * items selected for the item impact short versions.

Stepwise Regression Method
In the regression method, the three and two items from each subscale (DSC, PSI, and AC) that entered the model and made the largest contribution to the coefficient of determination (R²) were selected for the nine-item and six-item short versions, respectively. The best predictors from the PSI, DSC, and AC subscales, in order, were items 24, 14, and 6; items 16, 10, and 1; and items 21, 4, and 12, respectively (Table 2). In addition, Table 2 shows the coefficient (B) average for item predictors of the three PIDAQ(A) subscales. The regression coefficients (B) differed significantly from zero (p < 0.001), indicating a significant association between the predictors and the PIDAQ(A). All item predictors were positively related to the total PIDAQ(A) score.

Content of the PIDAQ Short Versions
The PIDAQ-ISV9 and PIDAQ-RSV9 are broadly similar: they share five of their nine items. The items specific to the PIDAQ-ISV9 are 2, 1, 20, and 23, and the items specific to the PIDAQ-RSV9 are 10, 14, 19, and 21 (Table 3, Supplemental Figures S1 and S3). The PIDAQ-ISV6 and PIDAQ-RSV6 share four of their six items (Table 4, Supplemental Figures S2 and S4). The PSI subscales of the six-item short versions were identical.

Descriptive Statistics
Table 5 shows that all PIDAQ(A) short versions captured considerable variability in adolescents' perceptions of the impact of malocclusion on their psychosocial condition. The means of the nine-item short versions were similar, and their standard deviations (SD) were almost identical. The means and SDs of the six-item short versions were almost identical. Floor and ceiling effects, i.e., the proportions of participants at the minimum and maximum observed values, were below the recommended maximum frequency of 15% in all short versions (Table 5). According to the PIDAQ(A)-ISV9 and PIDAQ(A)-RSV9, 78.7% and 74.3% of the participants, respectively, experienced malocclusion problems that impact "strongly and very strongly". These proportions indicate that the PIDAQ(A)-ISV9 was more sensitive than the PIDAQ(A)-RSV9 in detecting the most affected adolescents. The PIDAQ(A)-ISV6 and PIDAQ(A)-RSV6 were similar in their sensitivity in detecting the affected adolescents (60.5% and 61.0%). However, the proportions for all short versions were smaller than that of the PIDAQ(A) long version, which was 85.5%.
Reproducibility test findings for the four short versions are shown in Table 7. The ICC values of all short versions were above 0.80 (p < 0.05). Only the RSV-9 version showed no statistically significant difference between test and retest (T1 and T2). The smallest detectable change (SDC) scores were lower in the regression short versions (RSV-9 and RSV-6). A Bland and Altman analysis showed that more than 90% of the scores of the second administration (T2) were within the limits of agreement for all versions.
Table 8 shows the correlations between all short versions of the PIDAQ(A) and the PIDAQ(A) long version. The correlations between the four short versions and the long version were almost perfect, and the coefficients were nearly identical across versions. The correlation coefficient for the PIDAQ(A)-RSV9 was the highest (rho: 0.983).
Table 8. The criterion validity: correlations between the short versions and the long version of PIDAQ(A) (n = 385).

Convergent Validity
For convergent validity, the four short versions and self-perceived dental appearance were significantly associated (p < 0.01) (Table 10). Similarly, all short versions were significantly associated with perceived satisfaction with dental appearance (p < 0.01) (Table 11). The trends in self-perceived dental appearance and satisfaction were statistically significant.

Discriminant Validity
The results of the discriminant validity tests showed that the mean scores of the short versions increased gradually with increasing severity of malocclusion. Severity of malocclusion was measured by the investigator-rated (MI-D) and self-rated (MI-S) malocclusion scales. For both indices, statistically significant differences (p < 0.01) were found between participants with no or slight malocclusion and those with severe malocclusion (Table 12).
Table 13 presents the associations between the four short versions and the self-perceived need for braces. Each of the four short versions was significantly associated with the self-perceived need for braces. The associations between IOTN-DHC and the PIDAQ(A) short versions were also statistically significant (Table 14): for all four short versions, the mean scores differed significantly between adolescents graded by the investigator as having little malocclusion and those graded as having very great malocclusion (p < 0.01).
Overall, for all PIDAQ(A) short versions, higher mean scores corresponded to greater effects of malocclusion problems on Yemeni adolescents. In addition, the correspondence between the four short versions and the long version of the PIDAQ(A) was high.

Discussion
The original PIDAQ instrument is a 23-item self-reporting questionnaire assessing the extent to which a person has experienced psychosocial problems due to malocclusion. The PIDAQ(A) was validated in the first part of our study [24]. In this study, the PIDAQ(A) short versions were derived, tested for their psychometric properties, and compared with the 24-item PIDAQ(A).
This study produced four short versions of the PIDAQ(A), referred to as PIDAQ(A)-ISV-9, PIDAQ(A)-RSV-9, PIDAQ(A)-ISV-6, and PIDAQ(A)-RSV-6. The methods used to produce them, the item impact method and the regression method, followed those of previous studies [4,15]. Each shortening technique produced a nine-item and a six-item measure, that is, three or two items per subscale; two items per subscale were considered the minimum number of items [4]. The item impact method reduced the PIDAQ(A)-24 to the items with the highest impact scores in each subscale, whereas in the forward regression method, the first three and two items from each subscale that entered the model and produced the highest adjusted R² were selected.
As recommended by Jokovic et al. [4], the shortening of an instrument should use more than one method to ascertain the effect of the approach on outcomes, because different methods can produce short versions that differ in items and properties. Therefore, this study used two different methods to generate the short versions.
Shortening a questionnaire is an effective way to increase the response rate [10,31]. Sahlqvist et al. [10] reported that shortening a relatively lengthy instrument significantly increased the response rate. A short version is also a useful substitute for the long questionnaire when time and financial resources are limited.
To date, no published study has shortened the PIDAQ instrument, but other OHRQoL measures have been shortened, such as the Child Perceptions Questionnaire (CPQ11-14) [4,32] and the Oral Health Impact Profile (OHIP) Questionnaire, which underwent item reductions [33][34][35][36].
The results of the study showed that the PIDAQ(A) short versions exhibited considerable sensitivity: the proportions for PIDAQ(A)-ISV-9, PIDAQ(A)-RSV-9, PIDAQ(A)-ISV-6, and PIDAQ(A)-RSV-6 (78.7%, 74.3%, 61.5%, and 61.0%, respectively) indicated that the short versions detected substantial variability in adolescents' perceptions of the effect of malocclusion on their lives. Among them, the value for PIDAQ(A)-ISV9 was the highest (78.7%). Therefore, it seems reasonable to assert that the items selected for the short versions concern the most frequent and bothersome problems reported by adolescents. However, the values of all short versions were smaller than that of the long questionnaire (85.5%).
The psychometric properties were established in terms of reliability (internal consistency and reproducibility) and validity (criterion, convergent, and discriminant validities). The short versions were comparable with the long version of the PIDAQ(A) in terms of sensitivity and discriminant validity. This assessment is based on the responses of the 385 adolescents who participated in the main study.
For internal consistency, the ISV-9 and RSV-9 versions had similar Cronbach's α values (0.92 and 0.91, respectively), whereas the ISV-6 and RSV-6 versions had identical Cronbach's α values (0.90). Reproducibility, evaluated by the ICC, was generally good for the four short versions, with scores ranging from 0.90 to 0.92. These scores were similar to those of the PIDAQ(A) long version, whose subscale values were between 0.89 and 0.96 [24]. Nevertheless, the RSV-9 version performed better than the others in the reproducibility test: it was the only version that showed no statistically significant difference between test and retest administrations. Assessment of criterion validity by correlating each short version with the PIDAQ(A)-24 revealed that the correlation coefficient was highest for the PIDAQ(A)-RSV9 (rho: 0.983), whereas that for the PIDAQ(A)-ISV9 was lower (rho: 0.969). In addition, the correlations between the short versions' scores and the Child-OIDP score were statistically significant for all short versions (p < 0.001). The correlation coefficients with the Child-OIDP scores were similar across the four short versions, with the ISV-9 being the highest (0.656). For construct validity, analyses of convergent and discriminant validity showed statistically significant associations with the other global scales for all short versions.
Consequently, the high correlations between the PIDAQ(A)-24 and the short versions suggest that they are measuring the same construct [4]. All short versions were examined for cross-sectional validation, and all showed good validity and reliability. Thus, the null hypotheses were rejected. According to the results of the study, the short versions performed almost identically. It is worth pointing out that, although the nine-item short versions presented interesting items, further shortening to six items also yielded a valid questionnaire that was as reliable as the nine-item version. The differences between the versions were mostly negligible. However, the regression short versions were slightly stronger than the impact short versions in reproducibility and criterion validity, which can be explained by the fact that the items identified for the regression short versions were those that explain the most variance in the total PIDAQ(A)-24 score.
The final consideration is whether the regression method is better than the item impact method, or vice versa. Coste et al. [37] took the view that the expert-based method is more expedient. The utility of the item impact method lies in selecting the items that are considered most meaningful by the individuals who will be answering the questionnaire. These individuals may be considered to have experienced the impact of the condition in question on their quality of life [4]. On the other hand, the short version developed on statistical grounds by the regression method performed reasonably well [4]. Nonetheless, Locker and Allen [15] considered the approach used to generate a short version to be less important than its properties and content, a view supported by the findings of this study.
In summary, the PIDAQ(A) short versions can be used to distinguish Yemeni adolescents according to the psychosocial impact of their dental aesthetics when it is difficult to use the PIDAQ(A) long version. A potential limitation of the study was that no short version was administered on its own. For further studies, we recommend measuring any change in response rate when using both the long and short versions of the PIDAQ(A). Close attention should be paid to whether using the PIDAQ(A) short versions increases the response rate.

Conclusions
The PIDAQ(A) short versions were empirically shown to be valid, reliable, and appropriate for use in cross-sectional studies among Yemeni adolescents. Shortening the PIDAQ(A) appeared to have no negative impact on validity and reliability when compared with the long version. The regression short versions are slightly preferable to the impact short versions.
Author Contributions: Conceptualization, study design, methodology, data assessment, data analysis, data interpretation, manuscript drafting, manuscript writing, and final manuscript approval
Informed Consent Statement: Informed consent was obtained from all participants included in the study and their parents prior to conducting the study.
Data Availability Statement: Not applicable.