Does Head Orientation Influence 3D Facial Imaging? A Study on Accuracy and Precision of Stereophotogrammetric Acquisition

This study investigates the reliability and precision of anthropometric measurements collected from 3D images acquired under different conditions of head rotation. Various sources of error were examined, and the equivalence between craniofacial data generated from alternative head positions was assessed. 3D captures of a mannequin head were obtained with a stereophotogrammetric system (Face Shape 3D MaxiLine). Image acquisition was performed with no rotation and with various pitch, roll, and yaw angulations. On the 3D images, 14 linear distances were measured. Various indices were used to quantify error magnitude, among them the acquisition error, the mean and maximum intra- and inter-operator measurement error, the repeatability and reproducibility error, the standard deviation, and the standard error of errors. Two one-sided tests (TOST) were performed to assess the equivalence between measurements recorded in different head angulations. The maximum intra-operator error was very low (0.336 mm), followed by the acquisition error (0.496 mm). The maximum inter-operator error was 0.532 mm, and the highest error was found in reproducibility (0.890 mm). Anthropometric measurements from alternative acquisition conditions were statistically equivalent according to TOST, with the exception of the Zygion (l)–Tragion (l) and Cheek (l)–Tragion (l) distances measured with pitch angulation compared to the no-rotation position. Face Shape 3D MaxiLine has sufficient accuracy for orthodontic and surgical use. Precision was not altered by head orientation, making the acquisition simpler and not constrained to the critical positioning required for 2D photographs.


Introduction
The quantitative analysis of the human face has always received great attention from both scientists and artists [1]. Qualitative analysis of the face is a daily, often unconscious, process. Facial appearance allows personal identification, communication, and interaction with the environment, and it gives information about the individual's state of health. Anthropometric evaluation is carried out by several medical specialists using techniques that require accuracy and precision [2]. In this way, aesthetic and maxillofacial surgeons, otolaryngologists, dentists, oral surgeons, and orthodontists can all document clinical cases and compare different images of the same patient (e.g., pre-treatment and post-treatment) [3]. An objective, accurate, and reliable system for quantifying the soft tissues of the face in three dimensions and in color is still being studied [4].
Anthropometric analysis of soft tissues is an integral part of orthodontic diagnosis and treatment planning, and two-dimensional (2D) photography has been, for years, one of the main tools for the analysis of facial measurements. Nevertheless, this technique carries considerable limitations. It reproduces reality in only two dimensions, leaving out valuable information about the depth and the transverse dimension of the face. Moreover, in 2D the position of the patient within the image frame is critical: any error in the vertical, horizontal, or rotational positioning of the face during the shot creates distortions that can make the images practically useless. A 3D photograph, by contrast, can be taken even with less than ideal positioning, which can then be easily corrected on the computer. The 3D image, therefore, is not merely an improvement on 2D but provides a completely new concept of the image: it allows measurements and superimpositions to be performed, thus making objective what was previously only a clinical impression. In addition, stereophotogrammetry provides an accurate assessment of the aesthetic outcome of therapy and, being non-invasive, can be repeated frequently over time. Finally, while clinicians and researchers are still discussing how to reach a reproducible natural head position (NHP), even the slightest variations in the positioning of the patient's head during a 2D photography session can unavoidably alter results within a single assessment or in comparisons over time.
Interest in overcoming the limitations of direct measurements and of 2D photogrammetry has led to the development of numerous non-invasive methods for capturing and quantifying craniofacial surface morphology [5].
Today, sophisticated digital scanning devices complement our evaluations with three-dimensional (3D) data and, more recently, with a fourth dimension (4D) that includes movement [6][7][8]. Various 3D imaging systems for facial soft tissues are described in the literature; currently, the gold standard is three-dimensional stereophotogrammetry [9,10]. Each new stereophotogrammetric system requires a validation process that establishes accuracy and precision before clinical use by identifying possible intrinsic and extrinsic errors of the system. Chung et al. [11] give an overview of the types of three-dimensional scanning devices available. Despite the large body of literature on new three-dimensional systems, a clear and objective evaluation of accuracy and reliability under different circumstances is missing in many studies. Verifying any in vivo hypothesis necessarily presupposes a prior in vitro validation of the system, in terms of technical validation and knowledge of system errors under different acquisition conditions.
The goals of this study are (1) to validate the present digital 3D photogrammetric device in terms of measurement error, and (2) to compare craniofacial measurements obtained from 3D images generated from alternative captures of a mannequin head with different degrees of yaw, pitch, and roll rotation. The hypothesis is that anthropometric measurements recorded in different conditions of head orientation are equivalent to each other.

Polishape Technology
The scanner we used was the Face Shape 3D Maxi Line, developed by Polishape 3D Srl (Bari, Italy). This photogrammetric system comprised six Canon EOS 1100D reflex cameras (12.2 megapixels, lens focal length: 50 mm), each fixed on a rigid support with a specific inclination. Two lateral, external professional flashes were used to minimize any distortion from ambient light and to obtain uniform lighting on the surface of the object.

Object and Data Acquisition
After the required calibration procedure, the subject could be shot and the corresponding 3D image created by dedicated software through a 3D rendering function; in this study, we used Viewbox 0.4® (dHAL Software, Kifissia, Greece).
We used a mannequin head as the subject, which is ideal for in vitro experimentation owing to its immobility and absence of facial mimicry. To improve image acquisition, the texture of the dummy was faded and opaque to reduce light reflection, and, as suggested by the literature [12], 22 anthropometric points were marked with an eyeliner (Figure 1).
The dummy was placed on an external support equipped with a graduated scale, which allowed the operator to orient it in the three planes of space with precise control of the angular values during the pitch, roll, and yaw movements (Figure 1).
To investigate the accuracy and precision of the system, a series of captures was taken with the mannequin head with no rotation (reference position) and with various pitch, roll, and yaw angulations. The protocol of image acquisition is reported in Table 1.
Next, all shots were processed, and the anthropometric points previously marked on the dummy were also marked on the resulting three-dimensional reconstructions. Their x, y, and z coordinates were collected from the software and saved in an Excel spreadsheet.
The 14 measurements were chosen to cover various facial regions, with different sizes and orientations in the transverse, frontal, and sagittal planes.

Data Processing and Operational Definitions
The Euclidean distance between two landmarks was calculated as the square root of the sum of the squared differences in the three dimensions of space, as indicated in the following formula: $d = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}$, which is analogous to the target registration error (TRE) described in several articles [13,14].
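As an illustration of the distance formula, the following minimal Python sketch computes the linear distance between two landmarks from their exported x, y, and z coordinates. The function name `landmark_distance` and the coordinate values are hypothetical and are not taken from the study data.

```python
import math

def landmark_distance(p, q):
    """Euclidean distance between two 3D landmarks given as (x, y, z) tuples in mm."""
    dx, dy, dz = (b - a for a, b in zip(p, q))
    return math.sqrt(dx ** 2 + dy ** 2 + dz ** 2)

# Hypothetical landmark coordinates (mm), chosen only for illustration.
landmark_a = (10.0, 20.0, 30.0)
landmark_b = (11.0, 22.0, 32.0)

print(landmark_distance(landmark_a, landmark_b))
```

In practice the same function would be applied to each of the 14 landmark pairs, once per 3D reconstruction.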
Errors may be introduced during imaging acquisition, placement of landmarks on the images, or calculation of distances. To estimate the relative contributions of these sources of errors, the precision of the system was investigated in terms of repeatability (same team, same experimental set-up) and reproducibility (different team, different experimental setup). Notably, repeatability included intra-operator, inter-operator, and acquisition errors.
All the investigators were orthodontists, with at least 20 years of clinical experience in recognizing anthropometric points.
To assess intra-operator error, the same investigator repeated (10 times) the placement of anthropometric landmarks and measurements of linear distances on the same 3D reconstruction taken in the reference position. All parameters were measured again by a second investigator on the previous acquisition and compared (inter-operator error). Acquisition error was assessed by measuring, using the same operator, the selected parameters of 5 different 3D image captures in reference position. To investigate reproducibility error, we compared measurements performed by the two operators on different acquisitions. To avoid recall bias, a minimum of 24 h was allowed to elapse between measurement sessions.
To assess how the head position might affect the accuracy of anthropometric measurements, each set of linear measurements of yaw, roll, and pitch was compared with data obtained from the reference position, and the mean value of the linear measurements derived from each set of 3D images taken with yaw, roll, and pitch angles was compared to the mean of those obtained in the reference position.

Data Analysis
Statistical analysis was performed using several software packages, among them the open-source statistical software Jamovi [15], which is based on the widespread open-source statistical system "R", the free epidemiological software Winpepi [16], and Microsoft Excel.
To quantify measurement error magnitude, the following measurement error indices were calculated: mean error, maximum error, standard deviation (of errors), standard error (of errors), and coefficient of variation (CV).
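The indices listed above can be sketched in a few lines of Python for one linear distance measured repeatedly. The exact operational definitions are not spelled out in the text, so this sketch assumes that an "error" is the absolute deviation of a repeated measurement from the mean of the repeats, that the standard error is the sample standard deviation divided by the square root of the number of repeats, and that the CV is the standard deviation expressed as a percentage of the mean. The function name `error_indices` and the input values are hypothetical.

```python
import math
import statistics

def error_indices(measurements):
    """Error magnitude indices for repeated measurements (mm) of one distance.

    Assumption: error = absolute deviation from the mean of the repeats.
    """
    n = len(measurements)
    mean = statistics.fmean(measurements)
    abs_errors = [abs(m - mean) for m in measurements]
    sd = statistics.stdev(measurements)  # sample standard deviation (n - 1)
    return {
        "mean_error": statistics.fmean(abs_errors),
        "max_error": max(abs_errors),
        "sd": sd,
        "se": sd / math.sqrt(n),
        "cv_percent": 100.0 * sd / mean,
    }

# Hypothetical repeated measurements (mm) of a single distance.
repeats = [100.0, 100.2, 99.8, 100.1, 99.9]
for name, value in error_indices(repeats).items():
    print(f"{name}: {value:.3f}")
```

Other definitions (for example, errors taken as differences between pairs of sessions) would change the numerical values but not the structure of the calculation.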
To demonstrate the equivalence between measurements obtained in alternative head positions, the mean difference between repeated measures was calculated, and the Westlake-Schuirmann two one-sided test (TOST) was used [17,18]. The mean equivalence of two sets of measurements was defined as a difference not higher than the maximum error of the system, which had been previously calculated in the reference position. The null hypothesis for equivalence was that there was a substantial difference (greater than the maximum error) between the measurements performed in different conditions of head orientation. In the case of rejection of the null hypothesis, equivalence can be assumed. It must be noted that the null hypothesis for equivalence test is just the opposite of that conventionally used in superiority tests, where rejection of the null states that a difference exists. To prove equivalence, the test must reject the hypothesis of difference [17,18]. The test was repeated for each set of measurements generated from the captures of the dummy with yaw, pitch, and roll angulations compared to those with no rotation. Results were considered statistically significant for a p-value less than 5% (p < 0.05). Bonferroni's correction was applied for multiple comparisons.
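The logic of the two one-sided tests can be written compactly as follows. This is a generic sketch of the TOST procedure rather than the exact computation performed by the software used in the study; the symbols are assumptions introduced here: $\bar{d}$ is the mean difference between a rotated position and the reference position, $SE$ its standard error, $\delta$ the equivalence limit (the maximum error of the system), $T_\nu$ a Student's t variable with $\nu$ degrees of freedom (which depend on the study design), and $m$ the number of comparisons entering Bonferroni's correction.

```latex
% Assumed notation: \bar{d} = mean difference, SE = its standard error,
% \delta = equivalence limit, \nu = degrees of freedom, m = number of comparisons.
\begin{align*}
H_{01} &: \mu_d \le -\delta, & H_{02} &: \mu_d \ge +\delta, \\
t_L &= \frac{\bar{d} + \delta}{SE}, & t_U &= \frac{\bar{d} - \delta}{SE}, \\
p_L &= P(T_\nu \ge t_L), & p_U &= P(T_\nu \le t_U), \\
p_{\mathrm{TOST}} &= \max(p_L,\, p_U), & \alpha_{\mathrm{Bonf}} &= \frac{\alpha}{m}.
\end{align*}
```

Equivalence is concluded only when both one-sided null hypotheses are rejected, that is, when $p_{\mathrm{TOST}}$ falls below the chosen significance level.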

Intra-Operator Error
The mean of the standard errors was 0.041 mm, with a minimum value of 0.023 mm in the Glabella-Frontotemporale (l) measurement and a maximum value of 0.081 mm in the Nasion-Pogonion measurement; the latter also showed the highest maximum error (0.336 mm).

Inter-Operator Error
The inter-operator standard error was higher than the intra-operator value, with a mean of 0.064 mm, a minimum of 0.04 mm in the Cheilion-Gonion (r) measurement, and a maximum of 0.111 mm in the Nasion-Pogonion measurement, whose maximum error was 0.532 mm.

Acquisition Error
The mean standard error of acquisition was 0.103 mm, with a minimum value of 0.066 mm for Glabella-Frontotemporale (r) and a maximum value of 0.177 mm for Zygion (r)-Tragion (r). The highest maximum error (0.496 mm) was found in the Cheek (l)-Tragion (l) measurement.

Reproducibility Error
The mean standard error was 0.086 mm with a distribution of values between 0.032 mm for Glabella-Frontotemporale (r) and 0.164 mm for Cheek (l)-Tragion (l). The latter is also characterized by the largest value of the maximum recorded error of 0.890 mm.
The error magnitude statistics are reported in Table 2, and the maximum error distribution is represented in Figure 3.
Table 2. Error magnitude statistics: the table shows the descriptive statistics used to quantify measurement error magnitude. Maximum error, standard deviation, standard error, and coefficient of variation were calculated for each distance between landmarks in investigating intra-operator, inter-operator, acquisition, and reproducibility error. All measures are expressed in mm.

Figure 4 reports the impact of the various sources of error. As expected, the maximum intra-operator error was very low (0.336 mm), as it involved only one variable: the operator. It was followed by the acquisition error (0.496 mm), which resulted from repeated captures under the same conditions and is intrinsic to the imaging device. Adding a second operator (inter-operator error) increased the maximum error to 0.532 mm, suggesting that the "operator" variable played a greater role in increasing the variability of the results. The highest maximum error was found in reproducibility (0.890 mm), which combined the errors due to digitization and to the imaging system.

Error Analysis and Rotations Equivalence
The statistical equivalence between anthropometric measurements coming from the alternative acquisition conditions was calculated considering an equivalence limit of 0.89 mm, which represented the maximum error of the measurement system at the reference position.
The mean difference between linear measures and equivalence data is reported in Table 3.
Yaw versus reference position: the two sets of measurements were found to be statistically equivalent (test of equivalence: p < 0.01). The highest mean difference was 0.422 mm for the Cheek (l)-Tragion (l) measurement.
Roll versus reference position: the two sets of measurements were found to be statistically equivalent (test of equivalence: p < 0.01). The maximum difference between the averages was 0.543 mm, calculated for Zygion (l)-Tragion (l).
Pitch versus reference position: the two sets of measurements were statistically equivalent (test of equivalence: p < 0.05) with the exception of two distances, Zygion (l)-Tragion (l) (p = 0.510) and Cheek (l)-Tragion (l) (p = 0.166). The maximum difference between the averages was 0.814 mm, calculated for Zygion (l)-Tragion (l).
Table 3. Mean differences with 95% confidence intervals between linear measurements performed on 3D images taken in the reference position and those obtained on acquisitions with yaw, roll, and pitch angulations. When the TOST p-value was less than 5% (p < 0.05), we rejected the null hypothesis of nonequivalence and concluded that the measurements were equivalent.

Discussion
The goal of the present study was to analyze a new six-camera stereophotogrammetry system and to evaluate the accuracy and the reliability of craniofacial measurements obtained from 3D surface captures with different degrees of head orientation. In particular, the analysis focused on searching for errors, whether they were specific to the system or related to the operator.
To the best of our knowledge, no studies have investigated the accuracy of the present stereophotogrammetric system in a clinical setting, and there are no studies on the variations in measurements induced by head rotations in the three planes of space.
A previous study [19] has demonstrated that facial landmarks do not all have the same reproducibility and divided them into highly, moderately, and poorly reproducible landmarks. Landmarks such as Zygion, Gonion, and Tragion have been shown to have poor reproducibility.
In this study, the highest maximum error was recorded in the measurements involving the Tragion. We can assume that excluding these regions, thereby restricting the measurements to the facial oval, would improve the precision. If we exclude measurements made in the lateral regions, the maximum error drops considerably (0.535 vs. 0.89 mm).
The intra-operator maximum error was very low (0.336 mm), and the acquisition and inter-operator errors were very similar (0.496 mm and 0.532 mm, respectively). As expected, the maximum error of reproducibility was the highest (0.89 mm). Reproducibility refers to the variation of measurements made under changing conditions, which in the present experiment were due to measurements being made by two operators on acquisitions with different head angulations; thus, such a value combines various types of errors owing both to digitization and to the imaging system itself. Lateral regions, along with inter-operator variability, were sources of greater acquisition and reproducibility errors, especially in measurements involving the Tragion area.
In the context of medical facial treatment, a patient's photographs represent an extremely important datum both for the follow-up and simple clinical documentation. So far, in 2D photographs it has always been the practice to search the NHP of the patient's head as a reference position. Despite that, among the scientific community, there is still discussion on how to reach the NHP and whether it can be reproducible.
Cassi et al. [20], in a recent review, focused on techniques to establish the NHP, and how to transfer it to the cephalostat, together with an overview of the three-dimensional recording methods recently introduced into clinical practice. Several studies have successfully measured the reproducibility and stability of the NHP, both in a short and long timelapse [21][22][23].
On the other hand, although the NHP has less variability than intracranial reference lines, it is also influenced by balance, vision, and proprioception from joints and muscle involved in maintaining erect posture. Therefore, it depends on the subject's neuromuscular condition as well, and it may be difficult to obtain in some patients, especially children, and subjects with neuromuscular disorders, vertebral column deformity, and alterations in eye muscles balancing [24].
In the literature, some protocols for obtaining the NHP might influence reproducibility, and there is also some evidence that success might depend on the operator [25]. According to other authors [26], the perception of correct anatomical alignment changed considerably with time. They report that different observers disagreed on the correct anatomical alignment, and that the agreement among multiple observers was poor for pitch, moderate for yaw, and good for roll. Therefore, even if the NHP were perfectly repeatable, the problem of the correct alignment of the camera would remain. The evidence that a change in the relative position of the face/camera system in pitch, roll, or yaw does not compromise the result (at least in clinical terms) is a relief, not only from the "NHP problem" but also from the problem of aligning the image detection system.
In light of this, since the facial scanning system is continuously being developed as a valid alternative to 2D photography, we wanted to investigate if head rotations on the three spatial planes could represent any critical aspects in terms of reproducibility and precision, with notable implications on the success of the results.
We can accordingly conclude that the position of the dummy does not influence the precision, accuracy, and repeatability of anthropometric measurements, at least within 16 degrees on right and left for yaw and roll, and within 11 degrees upward and downward for pitch.
These results agree with data from studies conducted with other stereophotogrammetric systems. In fact, Lubber et al. stated that, when the spatial orientation of a dummy was progressively changed (by rotating and translating its head from a neutral position), the mean error of measurements on the corresponding 3D image remained very low and steady within the central range of movement (TRE: 0.195 mm) and showed only small increases with large spatial variations of the dummy [27]. Moreover, other authors report that images captured by a stereophotogrammetric device are highly repeatable. In their study, the same authors found the error associated with the placement of landmarks on the 3D images to be sub-millimetric, and therefore small enough to assert that landmark digitization can be performed with a high degree of precision using this technology [28]. Ayoub et al. [29] identified operator error to be accurate within 0.2 mm, and the average discrepancy of point location for the three operators involved was 0.79 mm.

Conclusions
In conclusion, we can state that the Face Shape 3D Maxi Line has sufficient accuracy for orthodontic and surgical use, especially in the median areas of the face. Despite the presence of areas of non-equivalence (lateral areas), the differences are clinically acceptable. Considering the magnitude of the intra-operator and inter-operator errors, which represent a significant proportion of the total error, we might also suppose that acting on them, and on the operators' learning curve, could reduce the system error.
Based on the results, we might assert that such precision is not altered by the rotation of the head on three planes of space, making the acquisition process even simpler and less constrained to a predetermined position, which is not always easy to obtain with all patients. This improves the comparison of acquisitions obtained at different times and conditions, facilitating the clinician during long-term treatment.

Strengths and Limitations
The mannequin head represented an ideal object, as it did not move or perform facial expressions. This made it possible to reach a very high degree of precision, without the influence of human variability in the photographed subject. On the other hand, the estimates of precision and accuracy might be inflated by the experimental setting, and the lack of a human sample is a limitation of the study: it was not possible to assess the effect of the stretching of the soft tissues on the precision of the measurements.
Further studies are needed to confirm the results obtained in vitro by repeating the study in vivo. Living subjects, as opposed to inanimate mannequin heads, may be affected by motion artefacts such as breathing or swallowing, and thickness and soft tissue characteristics, also related to different age ranges, might influence the measurement error of 3D surface imaging systems.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.