The Use of Eye-Tracking Technology in Pediatric Orofacial Clefts: A Systematic Review and Meta-Analysis

This systematic review and meta-analysis assessed the quality of the peer-reviewed literature and evaluated the usefulness of eye-tracking technology in assessing observers' perceptions of pediatric patients with orofacial clefts. PubMed, Science Direct, Wiley, and Web of Science were searched. Articles were screened in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, and their methodological quality was assessed. Of the 10,254 identified studies, 12 were included. Eleven studies were cross-sectional, and one was a prospective cohort study. The main areas of interest analyzed were the eyes, nose, and mouth. Nine studies used assessment scales to analyze the link between perceived attractiveness and visualization patterns and measures. Six studies were eligible for inclusion in the meta-analysis of the fixation duration outcome; all reported fixation duration in milliseconds together with a standard deviation. The meta-analysis demonstrated a significant difference in fixation duration between the control groups and the patients with orofacial clefts. This might indicate the usefulness of eye-tracking technology as a metric for assessing the success of cleft repairs based on the perceptions of different populations. Future studies should be comprehensively reported to allow for comparability and reproducibility.


Introduction
Cleft lip and/or palate (CL+/P) affects more than 10 million infants worldwide [1]. Approximately every 3 min, a child with some form of orofacial cleft is born [1]. Primary lip surgery is performed from infancy to 3 months of age to enhance esthetics and function [2]. This is followed by surgical correction of the palate by 6–9 months of age to allow for better dental and facial growth, feeding, and speech [3]. Once the upper canines begin their eruption path, alveolar bone grafting is needed at 7–9 years of age [4]. Repeated surgical interventions to enhance facial esthetic outcomes during the developmental years of these patients might result in secondary deformities [4].
A negative social perception is often associated with patients with CL+/P due to their perceived unattractive facial appearance [5]. Thus, numerous studies have recruited laypeople, professionals, and potential peers to assess their perceptions of the attractiveness of patients with CL+/P [6–8]. Furthermore, the facial perceptions of different population groups are essential because of their indirect effects on the emotional and social well-being of children with orofacial clefts [6–8].
The use of eye-tracking technology in dentistry is in its infancy. Eye-tracking machines allow for the tracing of gaze patterns, fixation duration (the time spent gazing at a certain region), fixation counts (the number of eye visits to a single location), time to first fixation, and other parameters [9]. Recently, an increasing number of studies have utilized eye-tracking technology to assess observers' perceptions of patients with CL+/P [7,10–12].

For the quantitative data analysis of the study outcomes, we estimated the mean difference between the experimental and control groups with a 95% confidence interval using a random-effects meta-analysis. We pooled data across studies when the following three criteria were met: a minimum of three studies were sufficiently homogeneous in outcome reporting (i.e., used the same measurement tool), reported results for an experimental and a control group, and completely reported the outcome and its measure of variance. To determine the sample size of the experimental and control groups, we calculated the product of the number of participants and the number of images assessed in each group (Appendix B).
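As an illustration only (no such code appears in the reviewed studies, and all numbers below are hypothetical), the pooling procedure described above can be sketched in Python: a standardized mean difference and its variance are computed per study, the effective sample size per arm is taken as participants × images, and the study effects are combined with a DerSimonian–Laird random-effects model.

```python
import math

def smd(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Cohen's d between experimental and control arms, with its variance."""
    pooled_sd = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2)
                          / (n_e + n_c - 2))
    d = (mean_e - mean_c) / pooled_sd
    var = (n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))
    return d, var

def dersimonian_laird(effects):
    """Pool a list of (effect, variance) pairs under a random-effects model."""
    w = [1.0 / v for _, v in effects]
    fixed = sum(wi * e for wi, (e, _) in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, (e, _) in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_re = [1.0 / (v + tau2) for _, v in effects]
    pooled = sum(wi * e for wi, (e, _) in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)  # SMD, 95% CI

# Hypothetical study: 40 observers each viewing 10 images per arm, so the
# effective sample size per arm is 40 * 10 = 400 (as described in Appendix B).
effect, variance = smd(mean_e=850, sd_e=210, n_e=400,
                       mean_c=760, sd_c=195, n_c=400)
```

This mirrors the standard inverse-variance approach implemented in common meta-analysis software; the exact estimator used by the review's software may differ in small-sample corrections.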

Results
In total, 10,254 studies were identified, of which 7313 duplicates were removed. As a result, 2941 studies were screened by title and abstract, and 2918 irrelevant studies were excluded. The remaining 23 full-text studies were assessed for eligibility, and 11 were excluded because they did not fulfill the inclusion criteria. Thus, a total of 12 studies were included in the review [6–8,10–12,18–23] (Figure 1).

Study Design and Characteristics
The details of the study design of each included article are presented in Table 2. The studies were conducted on populations across seven countries, primarily the United States (3 studies) [7,11,23], Brazil (3 studies) [6,19,20], Germany (2 studies) [8,12], the United Kingdom (2 studies) [10,22], and Canada (1 study) [21]. One study obtained results from a sample population from several countries (the United States, Egypt, and Thailand) [18]. All these studies were published in the past decade, from 2017 to 2022. Eleven studies were cross-sectional, and one was a prospective cohort study.


Observers and Stimulus Specifications
Table 3 lists the observers and stimulus material specifications. The observers in the study sample populations included laypeople, professionals with plastic surgery experience, children and adolescents, and mothers of infants with CL+/P. The sample size across all studies ranged from 11 to 403 participants (mean = 79.8), with an even sex distribution (female-to-male ratio 1:1.04). Participants' visual acuity was determined by self-report in eight studies and formally tested in one study [18]; three studies did not report on it [7,10,23]. The stimulus materials presented to the observers in all the included studies consisted of pediatric populations with ages ranging from infancy to adolescence. Eleven studies used static images that displayed unrepaired CL+/P defects, unilateral repaired CL+/P with or without a secondary defect, or bilateral CL+/P repair. One study investigated live interactions of mothers and their infants with CL+/P defects [10].

Eye-Tracking Apparatus and Settings
The eye-tracking apparatuses and applications used in all the included studies are detailed in Table 4. Various eye-tracking systems were used to record visual gaze. The most frequently used eye-tracking system was a screen-based machine (n = 9), followed by eye-tracker glasses (n = 2) [10,11] and a head-mounted eye tracker (n = 1) [22]. Calibration of the observers to the eye-tracker system was performed prior to the viewing task in most studies (n = 11); one study did not report on the calibration process. In most studies, the observers were seated at a viewing distance ranging from 50 to 75 cm (median, 60 cm). The sampling rate varied greatly because of the different eye-tracking systems used (range, 30–500 Hz; median, 60 Hz), and five studies did not report on the sampling rate. The time given to complete each viewing task was reported in eight studies and ranged from 3 to 10 s (median, 5 s). In one study, where the stimulus material was live infants, the viewing time was not restricted, and mothers were allowed to interact freely with their infants [10].
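For context, the sampling rate bounds the temporal resolution of any fixation measure: one gaze sample at the median 60 Hz spans roughly 17 ms. A minimal illustration (not taken from any included study):

```python
def sample_interval_ms(rate_hz):
    """Duration in milliseconds covered by a single gaze sample."""
    return 1000.0 / rate_hz

# At the median 60 Hz across the included studies, fixation durations are
# resolved in ~16.7 ms steps; at the fastest reported 500 Hz, in 2 ms steps.
print(round(sample_interval_ms(60), 1))   # 16.7
print(round(sample_interval_ms(500), 1))  # 2.0
```

This is one reason fixation durations measured on different hardware are not directly comparable at millisecond precision.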

Assessment Tools and Major Findings
Table 5 describes the different assessment measures and summarizes the findings of each included study. For the data analysis, each study had defined specific areas of interest (AOIs), the regions of the face or body on which the gaze patterns were to be assessed. The main AOIs analyzed were the eyes, nose, and mouth. The studies identified certain measures to analyze participants' gaze patterns. All the studies used the total fixation duration for each AOI as a primary measure to describe their findings. Other reported measures were total fixation counts, time to first fixation, duration of first fixation, and fixation point heatmaps. Nine studies used Likert scales or questionnaires to assess the link between perceived attractiveness and the recorded visualization patterns and measures [6–8,10,12,18–20,22].
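To make the total-fixation-duration measure concrete, the per-AOI aggregation that all of these studies rely on can be sketched as follows. The AOI coordinates and fixation data here are hypothetical and not drawn from any included study:

```python
from collections import defaultdict

# Hypothetical AOIs as (x_min, y_min, x_max, y_max) in screen pixels
AOIS = {
    "eyes":  (300, 200, 700, 300),
    "nose":  (420, 300, 580, 420),
    "mouth": (400, 420, 600, 520),
}

def aoi_fixation_durations(fixations, aois=AOIS):
    """Sum fixation durations (ms) per AOI.

    `fixations` is a list of (x, y, duration_ms) tuples, the typical
    output of an eye tracker's fixation-detection step.
    """
    totals = defaultdict(float)
    for x, y, dur in fixations:
        for name, (x0, y0, x1, y1) in aois.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[name] += dur
                break  # AOIs assumed non-overlapping
    return dict(totals)

demo = [(500, 250, 320.0), (510, 460, 180.0), (520, 350, 150.0)]
print(aoi_fixation_durations(demo))
# {'eyes': 320.0, 'mouth': 180.0, 'nose': 150.0}
```

Commercial eye-tracking software performs this step internally; the sketch only illustrates the logic behind the numbers the studies report.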

Assessment of Risk of Bias
The assessment results for the risk of bias are shown in Table 6. The risk of bias was moderate in eight studies, low in two studies, and high in one study.
The major findings of the included studies (Table 5) can be summarized as follows. Mothers of infants with a cleft gazed less often at their infant's mouth and more often at facial areas other than the eyes or mouth compared to the control group. Participants fixated significantly longer on the mouths of infants with CL, with even longer fixations on the mouths of infants with the most severe clefts; infants with CL were also rated as significantly less cute than unaffected infants. Participants' visual attention was directed most strongly to the upper-lip AOI in cleft-repaired faces, and individuals with a personal or family history of facial deformity fixated more on the perioral region of faces with repaired CL. Cleft-repaired faces were rated as less attractive by an independent rater group and garnered greater visual attention on the upper- and lower-lip AOIs when compared with naive observers. Participants spent more time looking at the mouth and scar region when viewing images of CL repair with secondary deformities; when a normal lip was compared with a scarred lip without a secondary deformity, there was no significant difference in the total fixation duration at the mouth region, indicating that a successful primary lip repair does not attract observers' attention to the mouth. Images of bilateral CLP strongly captured observers' attention: when shown faces at rest, observers fixed their attention more frequently on the upper lip than on the eyes, whereas in images without fissures the nose area was secondary to the lips and eyes, and in images without scarring attention was captured by the eye area rather than the upper lip; images without scars also received higher attractiveness grades. Observers showed shorter fixations on the eyes and longer fixations on the nose and mouth of adolescents with CLP compared with their unaffected peers, and CLP adolescents were rated more negatively in the attractiveness/valence ratings; smiling shifted the scan path toward the mouth for all faces, and valence was rated higher compared with neutral faces. Mouths and teeth had greater fixation durations regardless of the Index of Orthodontic Treatment Need (IOTN) grade, although there were significant differences in the time until the first fixation on the scar of the repaired CL region for IOTN grade 1. The presence of a CL scar on the upper lip did not attract the eyes of lay observers of different ages, regardless of the degree of malocclusion, in the non-smile image. IOTN grade 1 repaired CL regions received the highest VAS scores, and for the same malocclusion, older observers tended to give higher VAS scores than younger ones.

Quantitative Analysis
For the fixation duration outcome, six studies were eligible for inclusion in the meta-analysis. All studies reported fixation duration in milliseconds and reported a standard deviation. The meta-analysis results, obtained from six studies with 2776 image assessments in the experimental arm and 2288 image assessments in the control arm, showed that fixation duration differed between the experimental and control groups by approximately one standard deviation (standardized mean difference (SMD) 0.98, 95% CI 0.23–1.72, p = 0.01) (Figure 2).
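As an illustrative sanity check (not part of the review's methods), the reported p-value can be recovered from the SMD and its 95% CI under the usual assumption that the pooled estimate is normally distributed:

```python
import math

def p_from_smd_ci(smd, lo, hi):
    """Two-sided p-value implied by an effect estimate and its 95% CI,
    assuming a normal sampling distribution (as in standard meta-analysis)."""
    se = (hi - lo) / (2 * 1.96)  # back out the standard error from the CI
    z = smd / se
    # two-sided p via the normal CDF, expressed with the error function
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(round(p_from_smd_ci(0.98, 0.23, 1.72), 2))  # 0.01
```

The recovered value agrees with the reported p = 0.01, which supports the internal consistency of the pooled estimate.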
Children 2023, 10, x FOR PEER REVIEW 15 of 19


Discussion
This systematic review is the first to comprehensively summarize the existing literature on the application of eye-tracking technology in the assessment of observers' facial perceptions of pediatric patients with orofacial clefts. Twelve articles met the inclusion criteria; high levels in the hierarchy of evidence were not found among the study designs. Variability in the reporting of study methodology was found among the studies, possibly due to the relatively recent introduction of eye-tracking technology in dental research [24]. Comparatively, eye-tracking technology was introduced in the field of medicine in 1991, and its popularity peaked in 2011, 2015, and 2017 [25].
Several strengths were identified in the included studies, allowing for the reproducibility of the study settings. First, all the included studies allocated and identified AOIs, including the eyes, nose, and lips. In addition, they all reported the name of the eye-tracking machine, completed the calibration process (except for one study), and reported on the software used (except for two studies). Moreover, eye calibration prior to the experiment was completed in all the studies except for the study that used live infants [10], where the use of live stimuli hindered the applicability of the calibration process. This justified the use of images rather than live interactions in most studies.
The production of the stimulus images involved digitally creating defects in normal images or enhancing the defect, mirroring the images to mask asymmetries, and removing any facial distractors. Although the images might have closely mimicked reality, one study asked the observers whether the images looked original or digitally manipulated [21]. This indicates the possibility that the images looked artificial, which might have affected the study outcomes. Control stimuli were applied in 83% of the studies, either to compare the perceptions of the same observer toward patients with or without orofacial clefts or to control the viewers' knowledge of the study.
Observers were selected from different population groups. For example, one study considered mothers of children with orofacial clefts as observers [10], other studies recruited laypeople and normal children, and one study targeted adolescents with CL+/P [8]. The perceptions of these populations are of great value because they make up the social surroundings and communities in which patients with CL+/P interact in their daily lives. One study included plastic surgeons as observers [7], providing important information for guiding surgeons toward the best-practice recommendations used for repairing orofacial clefts.
Although observers' genders were thoroughly reported, the stimulus gender was not addressed in one third of the studies. This might have affected the outcomes, as gender differences play a role in the visual perception of observers [26].
In the studies that used static images, the viewing distance was reported to be within the recommended range of 50–75 cm [14], except in two studies that did not report on it [11,20]. Fixing the viewing distance at the beginning of and throughout the test time is essential to ensure that the machine can track the gaze. Another way to ensure that the observer maintains the viewing distance for proper eye-movement reading is to provide a means to control head movement, which was reported in some studies [18,21,22]. Only one study, in which mothers looked at their live infants [10], reported free viewing.
Several parameters can be used to assess the numbers reported by eye-tracking software, including fixation counts, fixation duration, first fixation duration, and time until the first fixation. Additionally, the visualization pattern can be recorded and viewed as a short video clip. In addition to the measurements retrieved from the machine, data from other assessment scales were reported in approximately 75% of the studies. For example, Likert scales were used to rate depression, cuteness, attractiveness, esthetics, and attention in most of the studies. The findings obtained from these assessment scales supported the machine findings and provided an understanding of how observers internally perceived the stimuli, especially regarding attractiveness.
This study has limitations. First, in the NIH risk-of-bias assessment tool, four items on the checklist were given a score of zero because of the nature of the cross-sectional study design. Of the 12 included studies, nine articles had a moderate level of evidence, and two had a low risk-of-bias level. Second, there may have been heterogeneity among the studies included in the meta-analysis due to differences in the eye-tracker hardware used and in the methods of application. If the measurement tool had been unified across the studies, more relevant parameters could have been compared in the quantitative analysis. Lastly, although the major relevant databases were searched for this systematic review, future studies should utilize other databases and include studies in other languages.

Based on these findings, the following recommendations are made for future studies:

• The inclusion and exclusion criteria for the observers and stimuli (for example, whether participants were conditioned to viewing syndromic children) must be thoroughly reported;

• The genders of the stimulus material must be diversified, as gender differences could play a role in observers' perceptions;

• The time considered for fixation must be defined, and the time given to view each image should be uniform to allow for reproducibility and comparability of the reported results;

• An assessment tool specifically designed for observers' perceptions of esthetics in CL+/P individuals is needed;

• Future studies should consider using a modified NIH risk-of-bias scoring tool to exclude questions irrelevant to the study design from the rating.

Conclusions
This systematic review and meta-analysis assessed the quality of the studies that applied eye-tracking technology in evaluating the perceptions of different populations toward pediatric patients with orofacial clefts. The methodological quality of most of the included studies was found to be moderate. Most studies measured fixation duration and utilized a supplemental measurement scale to assess viewers' perceptions. The meta-analysis demonstrated a significant difference in fixation duration between the control groups and the patients with orofacial clefts. This might indicate the usefulness of eye-tracking technology as a metric for assessing the success of cleft repairs based on the perceptions of different populations. However, study designs, eye-tracking hardware, and eye-tracking software should be unified to allow for future comparability and reproducibility.

Figure 1.
Figure 1. PRISMA flowchart diagram presenting the selection scheme for the articles.


VAS: visual analog scale of attractiveness, from 0 = complete disagreement to 100 = complete agreement.

Table 1.
Search strategy on databases.

Table 2.
Study characteristics of the included studies (n = 12).
CL+/P = cleft lip and/or cleft palate; NR = not reported.

Table 3.
Observers and stimuli characteristics of the included studies (n = 12).

Table 4.
Eye-tracking apparatus and application of the included studies (n = 12).

Table 5.
Assessment methods and findings of the included studies (n = 12).

Table 6.
Assessment of studies' quality using the NIH quality assessment tool for observational cohort and cross-sectional studies for the included articles (n = 12).

CD = cannot determine; NA = not applicable; NR = not reported.
