1. Introduction
Facial appearance refers to the visual characteristics of the face, including its shape, symmetry, skin quality, and overall look, and plays a vital role across numerous aspects of life, affecting not only superficial interactions, but also deeper social, psychological, and economic outcomes [1]. In professional settings, facial appearance has been shown to influence hiring decisions, career advancement, and perceptions of leadership. Socially, facial attractiveness is associated with acceptance, popularity, and relationships, impacting social networking and romantic connections [1,2,3]. From a psychological perspective, self-perceived facial appearance can profoundly impact self-esteem, confidence, and mental health [4]. Concerning the economic implications of facial appearance, more attractive individuals may earn higher wages than their less attractive counterparts and are more persuasive in marketing and advertising, influencing consumer preferences and decisions, particularly for appearance-relevant products [5]. Lastly, the significance of facial appearance in human life also has ethical and cultural dimensions. Ethical debates arise in the context of cosmetic surgery and the societal pressures to conform to certain aesthetic standards. Cultural differences in what is considered attractive emphasize the diversity of human societies and the subjective nature of beauty [6]. Thus, the impact of facial appearance on human life is multifaceted [1,7].
A convex facial profile, frequently associated with Class II malocclusion, is a prevalent skeletal configuration that is often perceived as less aesthetically pleasing, particularly when characterized by a prognathic maxilla and a retrognathic mandible [8,9]. Consequently, patients often seek treatment for this condition, which involves orthodontic interventions, more invasive orthognathic surgery approaches, or a combination of both [10,11]. Enhancing facial appearance is a primary motivator for seeking treatment and is closely tied to patient satisfaction [12,13,14].
The extent to which orthodontic interventions alone can significantly enhance facial appearance has been questioned, even in growing patients. This skepticism is supported by cephalometric data [15,16] as well as facial perception studies [17,18]. Once growth has ceased, patients with convex profiles have two main treatment alternatives. The first option is orthodontic treatment, which focuses on specific modifications within the dentoalveolar structure and is commonly referred to as camouflage orthodontic treatment. This approach relies on the retraction of protruding maxillary incisors to improve both dental occlusion and facial aesthetics, without addressing the underlying skeletal discrepancy. Beyond enhancing dental aesthetics, camouflage orthodontic treatment also aims to optimize dental function, establish proper occlusion, and support long-term oral health [19]. The second option involves orthognathic surgery, a more invasive intervention that also seeks to enhance facial appearance, a key consideration for many patients. However, the tangible benefits derived from each intervention are not always well-defined, contributing to an ongoing debate in the scientific literature that influences patient decision-making and, consequently, treatment planning [19,20,21].
In a prior investigation that assessed treatment effects on facial profile photos, the perception of facial appearance alterations strongly favored the combined orthodontic and orthognathic approach over exclusive orthodontic treatment [
20]. Nevertheless, earlier research on convex profile adolescents who received conventional orthodontic appliances suggested that the observed profile improvements largely diminished when frontal and profile facial images were presented simultaneously to the evaluators [
17,
18]. Thus, we assessed here the facial outcomes of combined orthodontic and orthognathic intervention compared to orthodontic camouflage treatment through the simultaneous presentation of profile and frontal facial photos to rater groups. We hypothesized that Class II Division 1 convex profile patients would exhibit similarly perceivable changes in facial appearance whether treated with a combination of orthognathic and orthodontic treatment or with orthodontic (camouflage) treatment alone, particularly when both the frontal and profile resting views are evaluated. Understanding perceptions across diverse rater groups on this critical issue will aid in informed decision-making and treatment planning for our patients.
2. Materials and Methods
The study protocol was approved by the Research Ethics Committee of the Dental School, National and Kapodistrian University of Athens, Greece prior to study commencement (date of approval: 22 June 2018, protocol number: 361). This retrospective comparative study was reported in accordance with the STROBE guidelines for cohort studies (
Supplementary File S1). All evaluated patients provided informed consent, allowing their data to be used for research. No eligible patient refused participation. The methods are similar to those of a previous publication from our team where the evaluators rated pairs of profile facial images [
20]. In the present study, originally derived data from simultaneous assessments of profile and frontal facial image configurations were analyzed and compared to previously published data on single profile photo ratings of the same sample [
20]. Our goal was to determine whether presenting additional facial aspects—beyond the profile, which is the primary treatment target—reduces or eliminates perceived differences in outcomes [
20], as observed in another setting [
17,
18]. Viewing more than one facial aspect is considered a more realistic representation of real-life perception, as people naturally observe faces from multiple angles during daily interactions. Apart from this presentation strategy, both studies used the same patient sample, and all other methodological aspects were applied similarly to ensure comparability. Nevertheless, most methodological details are repeated here so that readers can follow the study without consulting the earlier publication.
2.1. Sample
The study sample was sourced from the Postgraduate Clinic of the Department of Orthodontics, Dental School, National and Kapodistrian University of Athens, Greece and was identical to that of the previous publication [
20]. The sample was selected consecutively from the most recently treated Class II Division 1 patients with a convex facial profile who met the inclusion criteria. The goal was to form two sex-matched groups of 18 patients each (Groups A and B). The sample size was selected based on empirical data, considering also resource constraints and practical feasibility in terms of available patients and number of required raters [
17,
18,
22]. A post-hoc power analysis was conducted using G*Power (version 3.1.9.6) [
23] to determine the required sample size for a MANOVA with two independent variables (2 × 4 design) and five dependent variables. The analysis assumed a medium effect size (f²(V) = 0.2) based on Cohen’s guidelines [
24], an alpha level of 0.05, and a desired statistical power of 0.80. The results indicate that a total sample size of 32 evaluated patients was required. Therefore, the present sample size of 36 patients was deemed adequate for the primary study outcomes. Group A included non-growing patients with a convex profile who received orthodontic treatment with full-fixed appliances in conjunction with mono- or bi-maxillary orthognathic surgery. Group B included non-growing, convex-profile patients who received solely orthodontic treatment with fixed appliances. The specific plans were tailored per case, according to each patient’s needs and demands, and were not considered in sample selection.
Eligibility criteria required patients to have complete initial and final diagnostic records, including medical and dental histories, details of orthodontic or orthognathic treatment, pre-treatment panoramic and cephalometric radiographs, pre- and post-treatment dental models, as well as adequate-quality intraoral and facial photographs. Evaluated patients needed a Class II Division 1 malocclusion before treatment (molar Class II > half cusp on both sides, overjet 6–12 mm, and no functional shift ≥ 2 mm), a convex skeletal configuration (5° < ANB < 9°), and a convex facial profile (males: 15° < facial contour angle < 25°, females: 17° < facial contour angle < 27°) [25]. Additional criteria included a Frankfort mandibular plane angle (FMA) of 17.5–32.5°, treatment duration of 1–5 years, no history of aesthetic facial surgery, White European ancestry, no craniofacial anomalies or syndromes, no marked facial asymmetries assessed independently by two authors through visual examination, ceased skeletal growth (CS5-CS6, age over 15 years), a full dental arch without considering the third molars, completed treatment without discontinuation, and no use of fixed mandibular advancement devices [20].
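The numeric thresholds above can be expressed as a simple screening function. The following is only an illustrative sketch: the `Candidate` record, its field names, and the function name are hypothetical, the closed ranges (overjet, FMA, treatment duration) are assumed to include their endpoints, and the qualitative criteria (complete records, ancestry, asymmetry, growth stage) are not modelled.

```python
from dataclasses import dataclass

# Facial contour angle limits by sex (degrees), as quoted in the text.
FCA_LIMITS = {"M": (15.0, 25.0), "F": (17.0, 27.0)}


@dataclass
class Candidate:
    """Hypothetical record holding the numeric screening variables."""
    sex: str                          # "M" or "F"
    age_years: float
    overjet_mm: float
    anb_deg: float
    facial_contour_angle_deg: float
    fma_deg: float
    treatment_years: float


def meets_numeric_criteria(c: Candidate) -> bool:
    """Return True if the candidate satisfies every numeric threshold."""
    fca_low, fca_high = FCA_LIMITS[c.sex]
    return (
        6.0 <= c.overjet_mm <= 12.0          # assumed inclusive range
        and 5.0 < c.anb_deg < 9.0            # strict, as written
        and fca_low < c.facial_contour_angle_deg < fca_high
        and 17.5 <= c.fma_deg <= 32.5        # assumed inclusive range
        and 1.0 <= c.treatment_years <= 5.0  # assumed inclusive range
        and c.age_years > 15.0
    )


eligible = Candidate("F", 24.0, 8.5, 6.5, 21.0, 25.0, 2.5)
too_mild = Candidate("M", 19.0, 4.0, 6.0, 18.0, 25.0, 2.0)  # overjet below 6 mm
print(meets_numeric_criteria(eligible), meets_numeric_criteria(too_mild))
```

Keeping the sex-specific limits in a lookup table makes the asymmetry between the male and female facial contour ranges explicit rather than burying it in branching logic.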
During sample selection, only the initial diagnostic records were utilized, while the final records were reviewed solely to confirm their availability. From each patient’s diagnostic data file, the initial and final lateral and frontal facial photographs were used for the assessment of the perceived changes in facial appearance by the raters. These were captured with the Frankfort horizontal plane parallel to the floor, the teeth lightly occluded in maximum intercuspation, and the lips in a relaxed position.
2.2. Facial Photographs
To ensure consistency across images, digital photographs were processed using Adobe Photoshop (Version 22.0.1, Adobe Inc., San Jose, CA, USA) to minimize variation in hairstyle, brightness and contrast, standardize vertical facial height using Na’ to Me’ soft tissue points, and adjust the background to white [
20]. Three independent authors visually examined the photographs to detect any noticeable features (e.g., moles, scars) or accessories (e.g., earrings, tattoos) that might influence the evaluations. Such elements were masked during image processing. Following image adjustment, a configuration of four images per patient, consisting of pre- and post-treatment profile and frontal facial photos, was set in a landscape-oriented A4-size page as shown in Figure 1. The resulting 36 patient photo configurations were printed and presented to the raters as described below.
2.3. Rater Groups
Following a previously published method [
20], image sets were evaluated by four rater groups: (a) orthodontists, (b) oral and maxillofacial surgeons, (c) patients with convex profiles, and (d) laypeople. The number of rated patients per rater session was defined at 12 so that the raters would not experience fatigue or difficulty in the process [
17,
18,
22]. Therefore, the patients under evaluation were randomly assigned into three groups of twelve (six patients from each treatment group, with equal representation of sexes) using the website www.random.org (accessed on 23 June 2021). Each patient was then evaluated by 10 members from each rater group for the first three groups, and by 20 laypersons. For this, 30 orthodontists, 30 oral and maxillofacial surgeons, 30 Class II patients, and 60 laypersons rated the patient photos to assess perceived changes in facial appearance following the two treatment regimes.
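The allocation and the resulting rater counts can be reproduced with a short sketch. It is illustrative only: Python's `random` module stands in for www.random.org, the function and field names are hypothetical, and the example assumes nine male and nine female patients in each treatment group so that every session receives three of each.

```python
import random

# Raters who assess each patient, per rater group (from the text).
RATERS_PER_PATIENT = {
    "orthodontists": 10,
    "surgeons": 10,
    "convex_profile_patients": 10,
    "laypeople": 20,
}


def build_example_patients():
    """Hypothetical sample: 2 treatments x 2 sexes x 9 patients = 36."""
    return [
        {"id": f"{t}-{s}-{i}", "treatment": t, "sex": s}
        for t in ("A", "B") for s in ("M", "F") for i in range(9)
    ]


def allocate_sessions(patients, n_sessions=3, seed=None):
    """Stratified random allocation: shuffle within each treatment-by-sex
    stratum, then deal the members round-robin across the sessions."""
    rng = random.Random(seed)
    strata = {}
    for p in patients:
        strata.setdefault((p["treatment"], p["sex"]), []).append(p)
    sessions = [[] for _ in range(n_sessions)]
    for members in strata.values():
        rng.shuffle(members)
        for i, p in enumerate(members):
            sessions[i % n_sessions].append(p)
    return sessions


def total_raters(n_sessions=3):
    """Each session is rated by its own set of raters."""
    return {g: n * n_sessions for g, n in RATERS_PER_PATIENT.items()}


sessions = allocate_sessions(build_example_patients(), seed=1)
print([len(s) for s in sessions])      # 12 patients per session
print(sum(total_raters().values()))    # 150 raters in total
```

With three sessions, 10 raters per professional or patient group and 20 laypeople per session give the 30, 30, 30, and 60 raters reported above.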
To form the rater groups of convex profile patients and laypeople, the first white European subjects that agreed to participate were included, aiming at equal sex distribution, a wide age range between 15 and 65 years of age, and a wide range of educational level and socioeconomic status. Patients with a convex profile were recruited from the waiting area of the Postgraduate Orthodontic Clinic, with the goal of matching their age and sex to those of the post-treatment study sample (within ±3 years, or ±1 year for individuals under 19). Laypeople were selected from various locations and were not patients of the dental clinic. The first thirty specialists and final-year resident physicians who agreed to participate were included in each rater group. None of the raters had any relation to the patients. Certain raters of the specialists’ groups also participated in an analogous previous study [
20].
2.4. Questionnaires
Each rater completed a brief personal details questionnaire before assessing the sets of photographs for each patient (Figure 1) one by one. Within each group, six cases (equally divided by sex) were randomly assigned a display format in which the initial photographs appeared on the right and the final ones on the left, while the other six cases followed the opposite configuration.
Each photograph set was paired with a validated questionnaire [
17,
18] containing five items. The raters were asked to assess the change in facial appearance, in the facial area below the nose, in the upper lip, in the lower lip, and in the chin between the left and right photos, and to rate each change on a 100 mm visual analogue scale (VAS) from “extremely negative” to “extremely positive” (Figure 2).
All questionnaires were administered between November 2021 and July 2022 by two researchers who were calibrated to approach the raters similarly. A pilot evaluation using a non-sample case was conducted. The raters were not informed about the study’s specific purpose or that the images depicted treated cases. Raters completed all questionnaires in a quiet, well-lit, and controlled setting to minimize distractions, and under the discreet supervision of a primary researcher (S.P.). At the rater level, the different photographic setups, each involving distinct patient sets, were assessed at separate time points, with a minimum interval of three months between evaluations to prevent potential carry-over effects.
2.5. Data Collection and Verification
The measurement from the starting point of the visual analogue scale (VAS) to the point marked by each rater for each question was recorded using an electronic digital caliper (Jainmed, Seoul, Republic of Korea), converting the ratings into continuous variables. These values were documented in millimeters with an accuracy of two decimal places and entered into a Microsoft Excel spreadsheet (Microsoft Corporation, Redmond, WA, USA). In instances where the final photographs were positioned on the left side, the VAS scores were adjusted by subtracting the recorded value from 100 to maintain consistency with the rest of the dataset.
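The score adjustment described above amounts to mirroring the scale whenever the post-treatment photographs were shown on the left. A minimal sketch follows, with a hypothetical function name and argument names; the range check is an added safeguard, not something reported in the study.

```python
def harmonize_vas(recorded_mm: float, final_on_left: bool) -> float:
    """Return a VAS score oriented as if the final photographs had been
    shown on the right.

    recorded_mm: caliper distance from the start of the 100 mm scale.
    final_on_left: True when the post-treatment photos were on the left.
    """
    if not 0.0 <= recorded_mm <= 100.0:
        raise ValueError("VAS reading must lie between 0 and 100 mm")
    if final_on_left:
        # Mirror the scale; keep the two-decimal precision of the caliper.
        return round(100.0 - recorded_mm, 2)
    return recorded_mm


print(harmonize_vas(30.25, final_on_left=True))   # 69.75
print(harmonize_vas(62.50, final_on_left=False))  # 62.5
```

After this step, values above 50 mm consistently indicate a perceived positive change from the initial to the final photographs, regardless of the display format.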
The method error in measuring rater responses on the VAS was assessed previously and proved negligible [
20]. Intra-rater reliability for the same questionnaire, used with a similar sample and rater population, has been tested previously and found to be satisfactory [
18], and the questionnaire validity has been verified [
17,
18].
2.6. Statistical Analysis
Statistical analyses were performed with IBM SPSS statistics for Windows (Version 29.0. IBM Corp., Armonk, NY, USA), following the approach used in a previous related study [
20]. Levene’s test was used to assess the homogeneity of variances, while the Shapiro–Wilk test, along with Q-Q plots and histograms, was employed to evaluate data normality. Depending on the data distribution, either parametric or non-parametric methods were applied.
Group similarity in key characteristics was assessed using the Mann–Whitney U test.
The newly generated dataset for the present study comprised 3600 questionnaire ratings collected from 150 evaluators, using a new photographic setup (simultaneous presentation of profile and frontal photos to the raters). Each patient was rated by 20 laypeople and 10 members from each of the other rater groups. Median scores per patient were calculated for each rater group and considered reliable indicators for further statistical analysis. All collected data were used, and there were no missing data or dropouts.
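The aggregation step can be sketched as follows. This is an illustration under assumptions: the tuple layout `(patient_id, rater_group, item, vas_mm)` and the function name are hypothetical, and medians are assumed to be taken separately for each questionnaire item.

```python
from collections import defaultdict
from statistics import median


def median_by_patient_and_group(ratings):
    """Collapse individual ratings into one median per patient,
    rater group, and questionnaire item.

    ratings: iterable of (patient_id, rater_group, item, vas_mm) tuples.
    """
    buckets = defaultdict(list)
    for patient_id, rater_group, item, vas_mm in ratings:
        buckets[(patient_id, rater_group, item)].append(vas_mm)
    return {key: median(values) for key, values in buckets.items()}


example = [
    (1, "orthodontists", "chin", 61.0),
    (1, "orthodontists", "chin", 70.0),
    (1, "orthodontists", "chin", 66.5),
    (1, "laypeople", "chin", 50.0),
    (1, "laypeople", "chin", 58.0),
]
medians = median_by_patient_and_group(example)
print(medians[(1, "orthodontists", "chin")])  # 66.5
print(medians[(1, "laypeople", "chin")])      # 54.0
```

Using the median rather than the mean limits the influence of a single rater who marks an extreme position on the scale.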
To assess consistency across rater groups, the intraclass correlation coefficient (ICC) was calculated using a two-way mixed-effects model with absolute agreement and average measures. ICC values exceeding 0.7 were interpreted as having strong inter-rater reliability, while values between 0.5 and 0.7 indicated moderate consistency. This assessment, together with group comparisons, supported the questionnaires’ concurrent and statistical conclusion validity.
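For readers unfamiliar with this coefficient, the sketch below computes an absolute-agreement, average-measures ICC from the two-way ANOVA mean squares, using the standard expression (MSR − MSE) / (MSR + (MSC − MSE)/n). It is not the SPSS procedure used in the study; the function names are hypothetical, and it assumes a complete table with one row per patient and one column per rater group (for example, the per-patient median scores).

```python
def icc_absolute_average(table):
    """ICC for absolute agreement, average measures.

    table: list of rows (one per rated patient), each a list with one
    score per rater (here, per rater group); no missing values.
    """
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def interpret_icc(value):
    """Apply the thresholds stated in the text; values below 0.5 are
    not labelled there, so None is returned for them."""
    if value > 0.7:
        return "strong"
    if value >= 0.5:
        return "moderate"
    return None


# Classic 6-subject, 4-judge teaching example (Shrout and Fleiss).
scores = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
          [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc = icc_absolute_average(scores)
print(round(icc, 2), interpret_icc(icc))  # 0.62 moderate
```

Because the absolute-agreement form penalizes systematic differences between columns, a rater group that scores uniformly higher than the others lowers the coefficient even when the ranking of patients is identical.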
Group differences between the orthognathic surgery and conventional orthodontic treatment cohorts were examined using a two-way multivariate analysis of variance (MANOVA). The five questionnaire responses were treated as dependent variables, whereas treatment type and rater group functioned as independent factors. When MANOVA yielded significant results, individual ANOVAs were performed for each questionnaire item, followed by post-hoc analyses using Fisher’s least significant difference (LSD) test to identify specific group differences. The present data were analyzed for the first time in this manuscript and were additionally compared with previously published data based on single-profile photo ratings of the same patient sample [
20]. Differences between perceived changes in facial appearance by viewing profile only versus combined profile and frontal photos were tested with analogous multivariate analysis, followed by post-hoc tests, where applicable.
All tests were two-sided with an alpha level of 0.05. Bonferroni correction was applied for pairwise post-hoc multiple comparisons where necessary.
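As a worked illustration of the Bonferroni adjustment, the sketch below enumerates the pairwise comparisons among the four rater groups and divides alpha by their number. The choice of comparison family is an assumption made for the example (the text does not specify which families were corrected), and the function name is hypothetical.

```python
from itertools import combinations

RATER_GROUPS = ["orthodontists", "surgeons", "convex_profile_patients", "laypeople"]


def bonferroni_threshold(groups, alpha=0.05):
    """Return all unordered pairs and the per-comparison alpha."""
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)


pairs, adjusted_alpha = bonferroni_threshold(RATER_GROUPS)
print(len(pairs))                # 6 pairwise comparisons
print(round(adjusted_alpha, 4))  # 0.0083
```

Under this assumption, a pairwise difference would be declared significant only if its unadjusted p-value fell below roughly 0.0083.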
4. Discussion
The present study evaluated perceived differences in facial changes induced by two distinct treatment regimens in Class II Division 1 malocclusion patients with convex facial profiles. The first approach comprised orthodontic treatment combined with orthognathic surgery and the second approach orthodontic treatment alone. The two approaches differed significantly, with the combined orthodontic and orthognathic surgery approach showing clear benefits in enhancing facial appearance despite its invasiveness and associated risks [
26,
27,
28,
29]. The lower third of the face, particularly the lower lip and chin, contributed most to the perceived differences in facial appearance between the treatment groups. However, several patients refuse to undergo orthognathic surgery due to the increased costs and fear of the operation itself, as well as of the morbidity and complications related to the postoperative period [
30,
31]. On the other hand, a primary reason individuals with increased facial convexity seek treatment is to improve their facial appearance [
12,
13,
14,
30]. Therefore, these findings underscore the value of combined orthodontic and orthognathic surgery treatment for patients prioritizing aesthetic improvement, offering critical insights to guide decision-making in treatment planning. This should not be viewed as diminishing the value of orthodontic treatment alone, since it might positively affect dental and eventually smile aesthetics [
32,
33]. However, if the enhancement of facial appearance is a major treatment goal, which is often the case, the limitations of orthodontic treatment alone should be clearly communicated to patients.
At the sample selection stage, only the initial diagnostic data of consecutively treated patients were considered, while the availability of final diagnostic data was confirmed separately. This process prioritized soft tissue parameters relevant to facial appearance, ensuring baseline similarity between treatment groups in the primary variable of interest—the soft tissue facial convexity—which is critical for valid comparisons. While there was a slight difference in severity between the groups, with the surgical cases being more severe, this reflects the clinical reality and does not undermine the validity or relevance of the comparisons. Instead, it provides a realistic basis for evaluating how these treatment approaches affect facial appearance in patients with such facial configurations. Features such as symmetry or nose shape may influence aesthetic perception, especially in frontal views. However, these characteristics are expected to be randomly distributed between the treatment groups, reflecting normal variation within the sample. As such, they are not expected to systematically bias the results in favor of one group over the other. As shown in
Supplementary Tables S1 and S2, both treatment groups achieved similar overjet and overbite values at the end of treatment, which fall within the range of normal occlusion. This suggests that, although final diagnostic records were not used as inclusion criteria, both treatment modalities were capable of producing satisfactory occlusal outcomes, with no major post-treatment occlusal differences observed between the groups. Nonetheless, we acknowledge that dentoalveolar movements differed between the groups, particularly in the camouflage group, where upper incisor retraction or retroclination is commonly expected. Such movements may have influenced soft tissue changes and contributed to the observed differences in facial appearance between the groups. Treatment plans were tailored individually to each patient’s unique needs and preferences and were not part of the criteria for sample selection. This approach allowed the sample to reflect real-world clinical variability. Our analysis focused on the average morphological changes achieved by each treatment approach and their corresponding perceptions. As a result, the applicability of the findings is grounded in the actual morphological outcomes rather than in the specifics of individual treatment plans and responses, which may vary even among patients with similar clinical conditions [
34].
The effectiveness of orthodontic treatment alone in improving facial appearance has been questioned, even among growing patients in whom treatment aims to enhance mandibular growth [
15,
16,
17,
18]. Previous research on growing patients treated with functional appliances reported only modest improvements in profile appearance. Perceptions of different groups were consistent [
17] and the small improvements attributed to treatment diminished when profile and frontal facial images were presented simultaneously to the raters [
18]. The aforementioned studies reported a modest improvement of approximately 10% in the facial appearance of all tested groups, attributed primarily to the maturation from preadolescence to adolescence. On the contrary, significant improvements of about 20% in facial profile appearance were consistently perceived by different groups of evaluators in non-growing patients that were subjected to orthognathic surgery [
20] compared to no improvement with orthodontic camouflage treatment. The present study highlighted that the considerable differences between treatment groups were similarly perceived when frontal and profile photos were presented simultaneously to the raters. This is a noteworthy finding, especially given that the orthognathic intervention only modifies facial morphology, which is just one among several factors that could influence the perception of facial appearance [
35]. Previous studies have shown that assessments are modified when different facial views are presented to the raters [
18,
36,
37]. The fact that the considerable improvement perceived in facial profiles remained similarly perceivable when frontal photos were also presented indicates that the changes in overall facial appearance were fundamental. Therefore, the present study offers important insights into the actual treatment effects on facial appearance, so that patients can receive evidence-based information regarding the expected outcomes and the anticipated positive impact of treatment on social, psychological, and even economic outcomes. This will facilitate evidence-based decision making during individualized treatment planning with respect to the important outcome of facial appearance, which should be considered along with a number of other factors [
38].
Previous research has shown that raters with diverse backgrounds perceive certain facial outcomes differently [
21,
34,
39] and that although objectively measured phenotypic traits contribute significantly to facial attractiveness, a series of other factors is also important [
35]. Importantly, the absence of significant effects from the rater group and of any interaction between rater and treatment group indicates that perceptions of facial changes were consistent across evaluator types. This suggests that the observed improvements in facial appearance following orthognathic surgery were perceived similarly by professionals and laypersons alike. Such consistency enhances the external validity of the findings, as they are unlikely to be biased by rater background. The significant effects were solely attributed to the treatment type, with surgical intervention consistently yielding higher ratings of perceived improvement. The use of actual patient images instead of largely modified ones is considered a closer approximation of the reality of human interactions, and thus, of associated effects [
40,
41]. Actual patient photos have been rarely used previously for such outcomes, and the existing studies present conflicting findings [
42]. The study of Shell and Woods [
43] showed similar effects of both treatments on facial attractiveness, whereas the study of Proffit et al. [
44] identified only minor differences of about 5%, favoring surgical outcomes. Both studies rated sets of frontal and profile facial images for facial attractiveness, but assessed separately the pre- and post-treatment conditions. Rater groups were also not precisely defined and analyzed. We used slightly modified facial photos to retain the original appearance of individuals, while limiting the effects of confounding factors such as hairstyle and prominent marks or jewelry. Moreover, we applied a robust methodology regarding questionnaire validity and different rater groups that are important from various perspectives in decision-making or treatment impact [
17,
18,
20,
22]. Another strength of the present study is the simultaneous presentation of the pre- and post-treatment images asking the raters to assess changes after randomizing the treatment phase status. With this design, various individual factors that could potentially confound the assessments—such as skin color, texture, hair color, hairstyle, and certain local morphological features [
1,
35]—are controlled, enhancing the precision of the outcome, which specifically focuses on the perceived impact of morphological changes on facial appearance. Although the surgery cases appeared to be slightly more severe at baseline, the differences between the groups were relatively small and not statistically significant, ensuring their baseline comparability for analysis, particularly regarding soft tissue parameters relevant to facial appearance. We acknowledge that there was a slight, statistically significant difference in skeletal severity between the groups, which might reflect clinical reality and should be taken into account during outcome interpretation. The present study identified clear, substantial differences that remained consistent across different types of raters and facial views [
20]. These differences likely emerged as a result of the methodological considerations employed.
The absence of an a priori power calculation could be considered a study limitation, particularly for testing interaction terms. However, a post-hoc analysis confirmed that the sample size was sufficient to achieve adequate statistical power for detecting a medium effect size. We determined a reasonable sample size based on empirical evidence and resource availability, balancing feasibility in terms of both patient and rater numbers. All data are comprehensively presented, allowing readers to critically evaluate the outcomes. Consistent with previous similar studies [
17,
18,
22], the detection of significant differences between treatment groups suggests sufficient power to address the primary outcomes. We did not conduct a specific study to determine the optimal number of raters, but based our selection on the assumption that the median response from the chosen number of raters would provide a representative assessment for each patient. While no universally established number of raters exists for facial appearance assessment, our decision was informed by empirical evidence. Saito et al. explored the effective number of subjects and raters for inter-rater reliability studies, supporting that our chosen sample size does not introduce significant sampling issues [
45]. Based on this and our findings, we believe our approach strikes a reasonable balance between reliability and feasibility. The sample comprised individuals of white European ancestry, as they represented the vast majority of patients treated at the sample collection site. Including a small number of individuals from other racial backgrounds would not have allowed for proper control of potential confounding effects due to this factor. The present results need to be tested in diverse racial groups to determine their generalizability and to explore potential variations in outcomes that may be influenced by genetic, cultural, or environmental factors. The inclusion criteria were defined to exclude extremes in facial morphology, where we would expect greater effects from the surgical approach, but this, as well as the respective effects of camouflage orthodontic treatment, remains to be tested. This study did not test the factors underlying the decisions made by the patients and their doctors, but focused on the perceived morphological outcomes of the interventions. Pre-treatment facial attractiveness and facial morphology were not thoroughly assessed and should be investigated in future studies as potential mediators of these findings. Additionally, a separate evaluation of frontal views was not performed. In this project, we prioritized two image configurations: profile-only and combined frontal with profile views. This decision was based on the clinical relevance of the profile in treatment evaluation and the need to reflect real-life facial perception. Adding a third, frontal-only assessment would have required additional rater sessions spaced sufficiently to avoid confounding effects. Future studies could benefit from independently assessing frontal views to explore their specific contribution to perceived outcomes. The present assessment used static images. Functional assessment through actual interactions or presentations of videos might have modified the outcomes. Finally, this study assessed exclusively changes in facial appearance, not accounting for dental or smile aesthetics, which might have affected the outcomes [
33].
Future research could incorporate three-dimensional video recordings or live assessments that include dynamic evaluations of functioning during interpersonal communication. Additionally, studies could investigate the long-term impact of combined treatment on patient satisfaction and functional outcomes, offering a more comprehensive understanding of its overall benefits.