Investigating the Reliability of Novel Nasal Anthropometry Using Advanced Three-Dimensional Digital Stereophotogrammetry

Three-dimensional surface imaging systems (3DSI) provide an effective and applicable approach for the quantification of facial morphology. Several researchers have implemented 3D techniques for nasal anthropometry; however, they only included a limited set of classic nasal facial landmarks and parameters. In our clinical routines, we have identified a considerable number of novel facial landmarks and nasal anthropometric parameters, which could be of great benefit to personalized rhinoplasty. Our aim is to verify their reliability, thus laying the foundation for the comprehensive application of 3DSI in personalized rhinoplasty. We determined 46 facial landmarks and 57 anthropometric parameters. A total of 110 volunteers were recruited, and the intra-assessor, inter-assessor, and intra-method reliability of nasal anthropometry were assessed through 3DSI. Our results displayed high intra-assessor reliability, with MAD (0.012–0.29°, 0.003–0.758 mm), REM (0.008–1.958%), TEM (0–0.06), rTEM (0.001–0.155%), and ICC (0.77–0.995); inter-assessor reliability of 0.216–1.476°, 0.003–2.013 mm; 0.01–7.552%, 0–0.161, 0.001–1.481%, and 0.732–0.985, respectively; and intra-method reliability of 0.006–0.598°, 0–0.379 mm; 0–0.984%, 0–0.047, 0–0.078%, and 0.996–0.998, respectively. This study provides conclusive evidence for the high reliability of novel facial landmarks and anthropometric parameters for comprehensive nasal measurements using the 3DSI system. Considering this, the proposed landmarks and parameters could be widely used for digital planning and evaluation in personalized rhinoplasty, otorhinolaryngology, and oral and maxillofacial surgery.


Introduction
Objective and comprehensive craniofacial soft tissue measurements provide a quantitative basis for surgeons' consultation, as well as for preoperative and post-operative outcome comparison and follow-up. Advances in craniofacial anthropometry in the last few decades have combined the objective, yet mostly two-dimensional (2D), common and direct anthropometric examinations with tape measure, caliper, and angular measurement, as well as subjective 2D photography. Moreover, 2D photogrammetry has been widely applied to evaluate rhinoplasty outcomes such as nasal tip position, nasal alare width, and nostril shape, as well as in nasal analysis in ethnic groups [1,2]. However, a significant amount of time is required to record the complexity of the nose in detail [3][4][5].
Compared to direct measurements, 3DSI offers more detailed and extensive measurements, including distance, curvature, volume, angle, and surface area [6,7]. With the rise of 3DSI in the last decade, there have been some studies on its application for nasal anthropometry. Their findings have shown considerable reliability and feasibility in the planning and follow-up of rhinoplasty and craniomaxillofacial surgery. Nevertheless, these reports have significant limitations. Firstly, the diverse demographics of the subjects suggest that they are insufficient for stratification [8,9]. In addition, the 3DSI devices involved in these studies are technically obsolete and lack supporting analysis software for data computation. The 3D models need to be exported to third-party software, which increases systematic errors [9,10]. Furthermore, although some scholars have compared 3DSI measurements with direct measurements using tape and calipers, there is little discussion of the errors and biases generated within 3DSI and the errors in the placement of landmarks by different assessors, which is not adequate to prove the reliability of 3DSI in nasal anthropometry [10,11]. Most importantly, only a very limited number of facial landmarks and nasal anthropometric parameters were included in these studies, which limits their relevance to clinical practice [8][9][10][11].
For these reasons, we have introduced a considerable number of novel nasal landmarks into an established portfolio of 3D-derived landmarks, together with more comprehensive and original nasal anthropometric parameters, including angles, distances, and ratios, to ensure standardized and accurate coverage of the nasal region. To date, there is no evidence to indicate whether different users and 3D image capture sessions produce consistent measurements. Therefore, the reliability of these nasal landmarks and parameters should be rigorously validated before their broad application in clinical practice. Our study focused on validating the reliability and consistency of these novel nasal landmarks and anthropometric parameters using a new-generation 3DSI system, the latest Vectra XT-based 3D technology with matched specialized 3D medical measurement software developed by Canfield Inc. (Parsippany, NJ, USA), to depict the 3D information of nasal anthropometry and to provide objective and personalized stereo instructions for related clinical consultation.

Volunteers and Recruitment
A total of 110 healthy Caucasian volunteers (55 males and 55 females) between 18 and 65 years were enrolled in this study (Table 1). Each participant gave written informed consent before enrolment. Exclusion criteria were facial malformations, former maxillofacial surgery, and a diagnosis of epilepsy or other seizure disorders. The study was performed in line with the Declaration of Helsinki and was approved by the local university's ethical committee (REF: 266-13).

3D Surface Imaging Device (3DSI)
The Vectra XT 3DSI System (Canfield Inc., Parsippany, NJ, USA) is a three-pod passive stereophotogrammetry system with six cameras at a fixed position, specially created for the healthcare sector. The cameras simultaneously capture all images in 3.5 ms, which limits the risk of motion artefacts. The system builds a 3D model that can be viewed from every angle, so that various forms of treatment or before-and-after comparisons can be evaluated. Its high reliability for intraoperative facial imaging and measuring facial volume changes has been validated in our previous research reports [6,7]. Based on its widespread use and former validation, it served as the 3DSI in our current study [4,6,7,11,12].

Image Acquisition
In this study, 3DSI was performed for each session with Vectra XT consecutively. Subjects had any jewelry and hair removed from the face, forehead, and ears to fully expose the area to be scanned. Male volunteers were asked to shave, as hair is a major limitation for 3D imaging. All volunteers were asked to keep their mouth closed, without clenching the teeth, and remain in a relaxed, neutral facial expression in the same chair with a fixed backrest. They had to adopt an upright, non-excessive sitting position, and close their eyes. In our consultation rooms, the lighting conditions and background were not specifically altered, so as to achieve conditions similar to our daily lives. Each assessor underwent a separate 3D scan session.

Data Evaluation
A variety of anatomical landmarks and clinical measurements can be used during the clinical routine of facial surgery. Surgeons use validated anatomical landmarks to obtain a wide range of 2D and 3D measurements. In this study, for each subject, 19 novel landmarks and 27 classic standardized facial landmarks (Figure 1 and Table 2) were placed using the system's supporting software Mirror (Canfield Scientific; NJ, USA). The reproducibility of landmark designation, landmark positioning, and the data collection procedure was based on Farkas, former related studies, and the cephalometry literature [13][14][15][16]. Subsequently, 49 novel anthropometric parameters and 8 classic parameters were defined. These parameters were divided into four types of measurements: 2 surface linear distances, 27 projective linear distances, 18 angles, and 10 ratios. Table 3 and Figures 2-4 show the composition and visualization of these measurements.
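As an illustration of how three of the four measurement types can be derived from landmark coordinates, the following is a minimal Python sketch. It is not the Mirror software's implementation: the function names and the coordinates are hypothetical, and it assumes each landmark is available as an (x, y, z) position in millimetres. Surface linear distances are not covered, because they follow the mesh surface and cannot be computed from the landmark positions alone.

```python
import math


def projective_distance(p, q):
    """Straight-line distance (mm) between two 3D landmarks."""
    return math.dist(p, q)


def angle_deg(a, vertex, b):
    """Angle (degrees) at `vertex` between the rays towards landmarks a and b."""
    u = [ai - vi for ai, vi in zip(a, vertex)]
    v = [bi - vi for bi, vi in zip(b, vertex)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def ratio(numerator_mm, denominator_mm):
    """Dimensionless ratio of two distances."""
    return numerator_mm / denominator_mm


if __name__ == "__main__":
    # Hypothetical landmark coordinates (mm), for illustration only.
    left = (-17.0, 0.0, 0.0)
    right = (17.0, 0.0, 0.0)
    tip = (0.0, -5.0, 25.0)
    width = projective_distance(left, right)
    print("distance:", width)
    print("angle at tip:", angle_deg(left, tip, right))
    print("ratio:", ratio(projective_distance(left, tip), width))
```

Keeping the three helpers separate mirrors the grouping of parameters into distances, angles, and ratios, so each measurement in a table such as Table 3 maps to a single call.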


Statistical Analysis
Five statistical indices were calculated to assess intra- and inter-assessor reliability (Table 4). The intraclass correlation coefficient (ICC) indicates high reliability when the value is close to 1 and low reliability when the value is close to 0. Four classes of ICC were defined according to consensus: <0.5, poor; 0.5 to 0.75, moderate reliability; 0.75 to 0.9, good reliability; and ≥0.9, excellent [17,18].

Table 4. Summary of reliability estimates evaluated.
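The text does not state which ICC model was used, so the following Python sketch is only one plausible reading: it computes ICC(2,1) (two-way random effects, absolute agreement, single measurement) from a subjects-by-sessions table and then applies the four classes listed above. The function names are hypothetical, and treating a value of exactly 0.75 as "good" is an assumption, since the stated class boundaries overlap.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject and one column per
    assessor or session. Assumed model; the source does not name one."""
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of repeated measurements per subject
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def classify_icc(icc):
    """Map an ICC value to the four classes used in this study.
    Assumption: boundary values fall into the higher class."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"
```

With two columns holding the first and second measurement of one parameter, `classify_icc(icc_2_1(table))` returns the reliability class for that parameter.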
Mean absolute difference (MAD) is expressed as the absolute value of the difference between the average value of each variable between two measurements. Technical error of measurement (TEM) is the square root of the variance of measurement error and is calculated as listed in Table 4 [18].
As the magnitudes of MAD and TEM were highly positively correlated with the magnitude of the measurements, we combined relative error measurement (REM) and relative TEM (rTEM) to compare the measurement bias of different variables. REM provides an estimate of diversity relative to the magnitude of the measurement, and rTEM reflects bias. REM and rTEM were calculated by dividing the MAD and TEM by the grand mean of the target variables, then multiplying by 100 [8]. Based on the classification criteria proposed in previous research, REM was classified into five levels: <1%, excellent; 1-3.9%, very good; 4-6.9%, good; 7-9.9%, moderate; and >10%, poor [19,20]. The range of excellence for intra-examiner rTEM was <1.5% and inter-examiner <2.0% [21].
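Because Table 4 is not reproduced here, the sketch below should be read as an assumption rather than the exact formulas used. It takes MAD as the mean of the per-subject absolute differences between two measurements (the wording above could also be read as the absolute difference between the two session means) and TEM in its common two-measurement form, the square root of the sum of squared differences divided by 2n. REM and rTEM follow the description above: MAD and TEM divided by the grand mean, multiplied by 100. All function names are hypothetical.

```python
import math


def reliability_indices(first, second):
    """MAD, TEM, REM (%) and rTEM (%) for paired measurements of one parameter.
    Formulas are assumed conventional forms, not taken from Table 4."""
    n = len(first)
    diffs = [a - b for a, b in zip(first, second)]
    mad = sum(abs(d) for d in diffs) / n
    tem = math.sqrt(sum(d * d for d in diffs) / (2 * n))
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    return {
        "MAD": mad,
        "TEM": tem,
        "REM": mad / grand_mean * 100,
        "rTEM": tem / grand_mean * 100,
    }


def classify_rem(rem_percent):
    """Five REM levels as listed in the text; values between the printed
    bands (for example 3.95%) are assigned to the lower band by assumption."""
    if rem_percent < 1:
        return "excellent"
    if rem_percent < 4:
        return "very good"
    if rem_percent < 7:
        return "good"
    if rem_percent < 10:
        return "moderate"
    return "poor"


def rtem_is_excellent(rtem_percent, inter_examiner=False):
    """rTEM threshold: <1.5% intra-examiner, <2.0% inter-examiner."""
    limit = 2.0 if inter_examiner else 1.5
    return rtem_percent < limit
```

Dividing by the grand mean is what makes parameters of very different magnitude, such as a face width and a nostril ratio, comparable on one scale.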
Statistical analyses were performed using SPSS Statistics 23.00 (IBM, Armonk, NY, USA). Data normality was tested using the Kolmogorov-Smirnov test for all measurements, and all results were consistent with a normal distribution. Figures were created with GraphPad Prism 8 (GraphPad Software Inc., San Diego, CA, USA). A difference was considered statistically significant at a probability level of ≤0.05.

Ethical Approval
Written informed consent for participation in this study was obtained from all participants following the Declaration of Helsinki protocols (1996). This study was conducted in accordance with regional legislation and good clinical practice (1996) and with approval from the Ethics Committee of the Ludwig Maximilians University of Munich (REF: 266-13).

Overall
The age of the study participants ranged from 18 to 65 years. Among the recruited volunteers, there was no statistically significant difference in age between males (42.23 ± 8.31 years old) and females (40.53 ± 7.99 years old). Supplementary Table S1 shows the descriptive statistics (mean and standard deviation, SD) for the classic and novel parameters of the intra- and inter-assessor measurements, as well as the corresponding p-values; p-values marked with an asterisk are less than 0.05 and therefore statistically significant. The measurements were categorized into four types (surface distance, linear distance, angle, and ratio). The intra-assessor, inter-assessor, and intra-method reliability are described in the following.

Intra-Assessor Reliability with 3D Images
For classical measurements, all eight parameters showed excellent reliability, with an ICC above 0.9. The MAD of almost all linear distances was less than 0.3 mm; only the magnitude for face width (FW) reached 0.758 mm. The angle parameters' MAD were 0.044 and 0.29 degrees, respectively.
The REM of all classical parameters were less than 1% and the rTEM of all classical parameters showed very good reliability (Table 5, Figure 5).

Table 5. The intra-assessor and inter-assessor reliability of classic parameters.

All 49 novel measurements displayed good reliability, with an ICC above 0.75 (Table 6). Moreover, 33 measurements showed excellent reliability, with an ICC larger than or equal to 0.9. TFCA had the highest ICC at 1 and NOAI the lowest at 0.77. The MAD of the surface distances GNS and DSL were 0.1 and 0.216 mm, respectively. For most linear distances, the MAD were less than 0.3 mm, except 0.335 mm for alare length left (ALLl). All the angles' MAD were less than 0.3 degrees and the MAD of ratio measurements were less than 0.01, except for nostril aspect ratio left (NARl, 0.013).
For REM, 42 parameters were less than 1% and the remaining seven measurements were between 1 and 2%. Among them, the DBW showed the highest REM, with 1.958%. Furthermore, the rTEM of all novel measurements showed very good reliability (Table 6, Figure 6).

Table 6. The intra-assessor and inter-assessor reliability of novel parameters.

Inter-Assessor Reliability with 3D Images
For classical measurements, the inter-assessor ICC of all eight parameters fell into the good reliability category, with values greater than 0.81. The MAD of angular measurements were less than 1 degree. For linear distance, the MAD of six parameters were less than 1 mm, except for face width (FW) and nasal length (NL), which showed MAD of 1.791 and 2.013 mm, respectively. Five parameters had an excellent REM of less than 1% and the remaining three parameters (FW, NRW, NL) had a REM of between 1 and 3.9% (very good). The rTEM of all classical parameters were less than 0.25% (excellent) (Table 5, Figure 7).
For novel measurements, 18 parameters had ICC values greater than or equal to 0.9 (excellent), and 25 parameters had ICC values between 0.80 and 0.89, indicating good reliability. The ICCs of TW, NLAl, IFAL, NDA, and CSn ranged from 0.75 to 0.79. All measurements showed good reliability, with ICCs above 0.75, except for TL, which was 0.73. The MAD of surface distance were 0.803 and 0.513 mm for GNS and DSL, respectively. The MAD of most linear distances were less than 1 mm. Only two parameters had a MAD slightly greater than 1 mm. The MAD were less than 1 degree for most angular parameters except VNAr, VNAl, and FCA. The REMs of 13 parameters were excellent and less than 1%. A total of 29 parameters had a REM between 1 and 3.9% (very good), while six parameters had a REM between 4 and 7% (good). The NSAr had the highest REM, with 7.522% (moderate). The rTEM was excellent for all parameters (<2%), with a maximum of 1.45% for TL (Table 6, Figure 8).

Intra-Method Reliability with VECTRA XT 3D Imaging System
ICCs were excellent across classical and novel measurements; all parameters were greater than or equal to 0.95 (Tables 7 and 8). For classical measurements, the MAD was less than 0.3 mm for all linear distances, and 0.598 and 0.145 degrees for the angles. The REM were less than 0.5% and the rTEM were less than 0.04% for all parameters (excellent) (Table 7, Figure 9).
For novel measurements, the MAD of the surface distances GNS and DSL were 0.05 and 0.379 mm, respectively. Similar to intra-assessor reliability, the MAD was less than 0.2 degrees for all angular parameters. For the ratios, all MADs were less than 0.01 and the largest MAD value was 0.006 for NARI.
The REM of all parameters were less than 1%. A total of 15 parameters had a REM of less than 0.1% and 13 parameters had a REM between 0.1 and 0.2%. The remaining 21 parameters had a REM greater than 0.2%, with DBW having the largest REM at 0.9%. The rTEM was less than 0.1% for all parameters, placing them in the excellent category. The rTEM was less than 0.005% for 14 parameters, between 0.005 and 0.02% for 19 parameters, and above 0.02% for 16 parameters (Table 8, Figure 10).

Discussion
This study assessed the accuracy and reliability of nasal anthropometry derived from 3D stereophotogrammetry. We introduced 46 novel and conventional 3D landmarks, as well as 57 corresponding novel and classical linear and surface distances and angular and ratio parameters for the quantitative analysis of perinasal morphometric parameters. These landmarks and parameters provide complete coverage of the nose and perinasal surface.
The mean values of the measurements ranged from 6.264 to 186.334 mm for distance parameters, 23.824° to 174.779° for angular parameters, and from 0.179 to 2.437 for ratios. A very high level of agreement was found for intra-assessor reliability, with ICCs above 0.9 for 42 of the 57 parameters and above 0.8 for all parameters except NOAI. Furthermore, the validation results of the intra-assessor reliability showed that the REM (<1%) for almost all parameters and the rTEM (<1.5%) for all parameters were in the excellent category.
For inter-assessor reliability, the ICCs of 42 of the 57 parameters were greater than 0.85, and those of 50 parameters were greater than 0.8. The results showed that inter-assessor reliability was slightly lower than the intra-assessor reliability, suggesting individual bias in the placement of landmarks despite the same workflow [22]. In terms of MAD, the largest differences were 2.013 mm for the distance parameters, 1.471° for the angle parameters, and 0.065 for the ratio parameters. However, their corresponding REMs were less than 3.9%, which is in the very good category. We suggest that the main reason for the relatively large MAD of these parameters is their larger measured values. In terms of the rTEM, all parameters were less than 1%, except for TL (1.458%). Even so, the rTEMs for all parameters were in the excellent category. These findings suggest that, despite slight deviations in the locating of landmarks, the measurements were highly consistent and reliable between assessors.
For intra-method reliability, the ICC was above 0.95 for most measurements, except for the NOAI of 0.948. The superb results of the intra-method assessment demonstrated the high reliability of the 3D imaging system. Considering the intra- and inter-assessor reliability, the landmark determination and placement protocol has been thoroughly evaluated and provides an effective and valid reference for further comparative and clinical research.
In the comparison of classic and novel measurements, most of the newly introduced nasal anthropometric parameters performed as reliably as the classical parameters. In terms of intra-assessor reliability, the REM of 42 novel parameters were in the excellent category and seven showed very good reliability. In terms of inter-assessor reliability, 42 novel parameters were in the very good category or better, and six showed good reliability. In terms of intra-method reliability, all 49 novel parameters were in the excellent category. The rTEM of all novel parameters showed excellent reliability across intra- and inter-assessor and intra-method reliability. The largest deviations were concentrated around the nasal tip and nostrils. The reason for this might be the lack of a consensual definition of the nasal tip boundary and the nostril short axis on 3D images. This resulted in variation in the identification of the tip defining points and nostril short axis by different assessors. Nevertheless, almost all parameters displayed good reliability in the intra- and inter-assessor as well as the intra-method validation.
Our study demonstrated the excellent reliability of a novel 3D-derived nasal anthropometry as well as the landmark-based setting approach. The results showed that most landmarks on 3D images obtained with the VECTRA XT, as well as the distances and angles between landmarks, are highly reliable. Reliability is one of the most commonly used indicators to assess the errors arising from a novel measurement process. It refers to the overall consistency of a measurement. The measurement is highly reliable if it produces similar results under consistent conditions or is consistent from one test occasion to another [23]. In this study, we implemented the five most frequently used estimates (MAD, REM, TEM, rTEM, ICC) based on previous studies, in order to avoid terminological confusion and make the results easier for the reader to understand [11,18,19,24].
Although there have been some previous studies on nasal anthropometry using 3DSI, the facial landmarks and measurement parameters included are far from adequate for normal clinical practice. In particular, researchers did not perform angular or linear measurements of the nasal tip area, and they similarly ignored the proportional relationship between the nose and the entire face [8][9][10,15,25]. To achieve harmony through invasive and non-invasive procedures, it is often necessary to correct disproportions. The proportions of the face are of inestimable value when assessing the patient's facial profile in consultations, as well as in surgical planning and assessment [26]. For this reason, we have attempted to fill this gap by introducing a richer set of facial landmarks and more detailed measurement parameters in order to provide a more comprehensive and objective reference. They need to be rigorously validated before being widely used in clinical practice and relevant research. In the current study, our results show that the newly defined facial landmarks and all measurement parameters used are sufficiently reliable for clinical nasal anthropometry or basic research, especially for personalized rhinoplasty and consultation, as well as the design of maxillofacial surgery and pre- and post-operative follow-up.
In recent years, personalized medicine has become an increasingly popular concept [27,28]. Patients seeking aesthetic procedures also demand a more detailed and comprehensive level of personalized plastic surgery. The main drawback for patients undergoing traditional rhinoplasty is that the post-operative results can be far from what is expected. Currently, the implant materials commonly used in rhinoplasty are manufactured in a standard mold and then sculpted by the surgeon on the operating table to match the external shape of the patient's nose. The surgical outcome therefore depends largely on the aesthetic judgment, experience, and skill level of the surgeon, and many patients do not realize until after the operation that the result is far from what they expected and have to have the implant removed [29,30]. With the introduction and popularity of 3DSI, digitally personalized plastic surgery has become possible. The facial landmarks and nasal anthropometric parameters involved in this study can provide plastic surgeons with more accurate and precise data on the patient's nasal morphology, providing a more quantitative and objective reference for implant design and customization. They also allow an accurate comparison of changes in the patient's pre- and post-operative nasal morphology. Furthermore, these 3DSI-based measurements combined with 3D printing technology can be used to personalize the design of the implant to better match the patient's nasal morphology and avoid intra-operative re-sculpting, thereby significantly reducing the surgery time and alleviating the patient's discomfort.

Limitations and Perspectives
There were also some limitations in our study. Although manual soft-tissue landmark placement was used in addition to automated landmark placement with the Vectra software, assessor-dependent errors could not be completely excluded. In this regard, we believe that, for landmarks with relatively lower reliability, such as TDP, NOAr, and NOAl, assessors can mark these points on the face manually in advance to reduce the assessor-dependent error and obtain more objective and accurate results.
In our future research, we will recruit various groups of participants, evaluating the method's reliability in subjects in wider age ranges and in different races. Moreover, we will use these facial landmarks and anthropometric parameters to compare the differences in nasal morphology between ethnic groups.

Conclusions
This research introduces 46 facial landmarks and 57 detailed three-dimensional digital nasal and perinasal anthropometric parameters, demonstrating their high reliability for the analysis of nasal morphological features. It offers essential evidence and an initial reference for the application of 3D nasal anthropometry in clinical practice. Compared to previous studies in the oral and maxillofacial area on mannequins and nasal or morphology analysis based on 3DSI, our study included much more comprehensive and detailed anthropometric parameters. This provides clinicians with a more extensive range of nasal morphological data. As the first study we know of that uses 3D stereophotography to evaluate the reliability of nose measurements in detail, this study could serve as a primary foundation for this field. This technology can be used for surgical planning and the evaluation of post-operative effects in the fields of otolaryngology, plastic and cosmetic surgery, and maxillofacial surgery, wherever the aim is to change the nasal morphology.

Institutional Review Board Statement:
This study was conducted in accordance with regional legislation and good clinical practice (1996) and with approval from the Ethics Committee of the Ludwig Maximilians University of Munich (ID: 266-13).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data are not publicly available due to the protection of patients' privacy and image rights.