A Novel Framework for Assessing Facial Attractiveness Based on Facial Proportions

In this paper, we present a novel framework for automatically assessing facial attractiveness that considers four ratio feature sets as objective elements of facial attractiveness. In our framework, these feature sets are combined with three regression-based predictors to estimate a facial beauty score. To make the system's performance comparable with human scoring, we apply a score fusion technique. Experimental results show that the attractiveness score obtained by the proposed framework correlates better with human assessments than the scores from other predictors. The framework's modularity allows any features or predictors to be integrated into the facial attractiveness measure. Our proposed framework can be applied to many beauty-related fields, such as the plastic surgery, cosmetics, and entertainment industries.


Introduction
Attractiveness is one of the most critical social characteristics of the human face. Several studies have shown that facial attractiveness is considered to be a basis for social and intellectual competencies, and that it affects outcomes such as mate choice and income [1,2]. In recent years, beauty-related industries have grown rapidly in many countries [3]. Thus, the analysis and measurement of facial attractiveness have garnered attention from scientists, physicians, and artists because of their many potential applications in the entertainment, virtual media, plastic surgery, and cosmetic industries. Although it is debatable whether facial attractiveness is objective or subjective, recent empirical results support the idea that attractiveness is objective, as evidenced by cross-cultural consistency [4], brain activity patterns [5], and infant preference [6]. Studies in psychology and medical science have also explored aesthetic evidence of what makes a face more attractive, finding facial averageness [7] and symmetry [8] to be important. That is, facial attractiveness increases as averageness and symmetry increase. Facial skin color and texture are also significantly related to facial attractiveness [9,10], and have been included in evaluations of facial attractiveness [11,12]. Moreover, research has shown that attractive faces should follow certain defined ratios of facial proportions, such as the neoclassical canons [13] and the golden ratio [14], which artists, physicians, and orthodontists have believed to be ideal ratios for beautiful faces since ancient times [15].

In machine learning, several methods have been proposed to assess facial attractiveness by encoding these objective factors from a face. However, because of inadequate feature extraction, these elements have not been investigated thoroughly. Moreover, they have not been compared and combined in an overall framework, even though this approach might decrease the gap between human and machine performance. Therefore, there is a need to develop an efficient framework as an initial step towards a machine obtaining human-level performance in predicting facial attractiveness. In this paper, we focus on developing a framework for assessing facial attractiveness based on facial proportion factors widely believed to be objective elements of facial beauty.

The remainder of the paper is organized as follows. Section 2 presents a brief summary of previous machine learning approaches for assessing or enhancing facial attractiveness. Section 3 explains the proposed framework. Section 4 discusses the experimental results and possible applications. Conclusions and future work are addressed in Section 5.

Related Work
Research in psychology and biology has approached this problem by hypothesizing which aspects of faces make them attractive. Studies indicate that features such as symmetry, averageness, and sexual dimorphism influence the perception of attractiveness. Jones [12] demonstrated that healthy-looking women appear more attractive based on these three features. Galton [16] showed that average faces correlate significantly with facial beauty, although very attractive faces are not average [17]. Averaging faces also increases their symmetry, which is consistent with evolutionary biology studies that indicate a positive influence of facial symmetry on attractiveness [18]. Further, sexual dimorphism (the secondary sexual characteristics appearing during puberty) can affect attractiveness judgments [19]. Specifically, different sexually dimorphic features for males and females make them appear more masculine or feminine. Many studies have provided evidence that masculinity and femininity are more convincing attractiveness features than symmetry [12,20,21].

Skin color and texture are also known to be related to attractiveness. As many researchers have proposed a link between attractiveness and traits that appear healthy, the health of facial skin might be a surface property that positively influences attractiveness judgments. Fink et al. [9] evaluated facial beauty via human scoring of skin colors and textures. In Jones et al. [22], facial skin is correlated with male facial attractiveness and is a visual cue for attractiveness judgments.

Since facial attractiveness is influenced by a range of factors, studies have considered both facial shape and appearance for facial attractiveness judgments. Kagian et al. [11] analyzed facial attractiveness based on facial appearance and geometry. Bronstad and Russell [21] described the effect of facial shape and appearance (texture) on attractiveness measurements using pixel information in an image, and confirmed that the two criteria had similar effects. Jones [12] also evaluated facial attractiveness using landmark-based feature distances (facial shape) and skin color, and showed that both facial shape and skin characteristics significantly affect facial beauty measurements.
Recently, machine learning approaches have been developed to assess facial attractiveness. Aarabi et al. [23] developed a preliminary automatic facial beauty scoring system based on a vector of eight ratio values between facial landmarks (eyes, eyebrows, mouth, and nose). Twelve human assessors scored 40 training images on a four-point scale. To estimate a test image's score, its feature vector was computed and then the average score of the 10 nearest faces in the training set was calculated. Gunes and Piccardi [24] collected 215 female face images, which were rated on a 10-point scale by 48 human assessors. A vector of 13 ratio values, including the golden ratio, was calculated between facial landmarks. Then a trained, tree-based classifier was used to determine the beauty score of the test images. Kagian et al. [11] extended their earlier work [25] to develop a regression-based facial attractiveness predictor. They adopted both geometric distance features and skin color features to estimate facial attractiveness. From the 84 facial landmarks, 3486 distance values were calculated; these were reduced to 90 values by principal component analysis (PCA). Eight feature components calculated from skin and hair color values were combined with these 90 features. A total of 98 features were used by three predictors (linear regression, a support vector machine, and a Gaussian process). The results of rating 91 frontal face images on a seven-point scale showed that the system with the best predictor (linear regression) had a Pearson correlation of 0.82 with ratings by 27 human raters.

A similar regression-based approach [26] was proposed to determine the relation between three facial proportion factors (the neoclassical canon, symmetry, and the golden ratio) and human ratings. From the face recognition technology (FERET) database, 420 frontal face images were selected and scored on a 10-point scale by 36 human assessors. These three feature sets were extracted from 29 facial landmarks, and statistical analysis software was used to calculate the correlation between the machine and human ratings. Unfortunately, these predictors showed little relation to the beauty scores. Motivated by [11], a data-driven approach to enhancing facial attractiveness was proposed in [27]. Similarly, 234 distance feature components were calculated from 84 facial landmarks. Then, with a trained beautification engine based on support vector regression (SVR) or a K-nearest neighbor (K-NN) algorithm, the system searched the face space for a nearby point with a higher predicted attractiveness rating. Finally, the triangulation of the original face was warped toward those of the beautiful faces most similar to the original one.

In recent years, attractiveness measurement and evaluation of 3D faces have been introduced because they produce more precise and accurate prediction results. O'Toole et al. [28] calculated average face shapes and textures based on 3D scan data of 200 men and women, and showed that averageness influences attractiveness in 3D faces, just as in 2D faces. Fink et al. [29] revealed that skin texture has a considerable influence on the attractiveness evaluation of 3D faces by constructing a skin map with various skin colors that is fitted to a 3D face template to normalize face shape. Jang et al. [30] estimated facial attractiveness through manually detected landmarks on a 3D face, and calculated the height of the nose, the horizontal and vertical curvatures of the forehead, the curvature of the cheek, and the chin volume. They showed that more attractive faces have features such as more protruding noses with greater nasolabial angles and greater vertical curvature of the forehead. They also reported that more attractive faces are highly correlated with the beauty scores measured by the neoclassical canon and symmetry. Since 3D faces have volume and surface curvature, unlike 2D faces, steps such as facial landmark extraction and distance calculation need to be modified to measure the beauty of 3D faces. Vezzetti et al. [31] introduced an automatic landmark extraction method for 3D faces. Marcolin et al. [32] also devised facial geometrical descriptors representing symmetry features for the attractiveness analysis of 3D faces. Liao et al. [33] manually selected landmarks for the golden ratio, neoclassical canon, and symmetry on a 3D face, and these distances and ratio values were revised by considering the surface characteristics of a 3D face. Table 1 summarizes the related studies.

In this paper, we assume that facial attractiveness is related to harmonious facial proportions. To test this assumption, our method is based on ratio features that reflect several important facial proportions. Moreover, a facial beauty score can be calculated automatically by learning the relation between ratio features and human ratings.

Proposed Framework
A flowchart of the proposed framework is illustrated in Figure 1. The framework consists of training and testing procedures. In training, four facial ratio feature sets (RFSs) were extracted from each training image, along with the average attractiveness score from human raters. The optimal parameter sets of each regression-based predictor were obtained for each feature set. In testing, given an arbitrary face image, the trained predictors for each feature set were applied to the corresponding feature set to obtain a score for each. Then, the final score was obtained by fusion at the score level. Due to the modular structure of the framework, any features and predictors can be applied. In the remainder of this section, we explain the proposed method in detail.

Facial Landmark Localization
To detect facial landmarks automatically, the active shape model (ASM)-based method [34] is applied to each face. The open source software STASM [35] was utilized as an initial detector for 81 predefined facial landmarks. Then, only the 31 landmarks sufficient for calculating ratio feature values (covering the eyebrows, eyes, nose, mouth, forehead, and head contour) were selected (see Figure 2).

Ratio Feature Set (RFS) Extraction
After acquiring facial landmarks, ratio feature components that have been reported as objective elements of facial attractiveness are extracted. The neoclassical canon refers to the ratios used by Renaissance artists to draw a beautiful face. Farkas et al. [13] summarized these rules into nine neoclassical features. Neoclassical canonical features were proposed long ago and, despite the changes in beauty standards over time, are still valid measures in present-day anatomy, art, and medicine studies [36,37]. Although there is controversy over the correlation between the golden ratio and facial beauty [38-41], we tested it as one of our ratio feature sets, because it accounts for the ratio between elements of the eyes, nose, and mouth, as well as the aspect ratio of the width and height of the face. Symmetry is known to be important in evaluating facial attractiveness [12,18], and has been defined and studied in various ways. In many previous studies, symmetry was measured along the vertical axis of the face. In this study, we measured symmetry based on the center point of the face, thus considering both the left and right sides of the face [12,26]. In addition, eight ratio values have been proposed by artists and scientists [23,42-44]. This measure assumes that the facial contour is an ellipse and detects the exact position of the eyes and mouth to calculate eight proportional features, enabling simple measurements with several facial landmarks.

Schmid et al. [26] analyzed the role of symmetry, neoclassical canons, and the golden ratio in the determination of facial attractiveness. The basic premise is that portions of an attractive face should follow certain defined ratios. They summarized these principles in nine neoclassical canons (RFS-1), 14 golden ratios (RFS-2), and nine symmetry values (RFS-3). Moreover, following [23], eight ratio values (RFS-4) were applied to determine facial attractiveness. We refer to these four types of feature vectors as ratio feature sets (RFSs) hereafter. To assess facial attractiveness based on facial proportion features, we applied nine neoclassical canon (RFS-1), 14 golden ratio (RFS-2), nine symmetry (RFS-3), and eight ratio (RFS-4) values. A detailed description of the feature sets and the landmarks used to calculate them is given in Table 2 and Figure 3. In Table 2, dist(a,b) denotes the Euclidean distance between landmarks a and b, and mid(a,b) denotes the average point of landmarks a and b. These RFSs were incorporated as input vectors in our framework.

Because the feature sets have different value ranges, the RFSs were normalized. In the golden ratio set (RFS-2), attractive faces should have ratios approaching a value of 1.618. In the neoclassical canon (RFS-1) and symmetry (RFS-3) sets, the ratios should approach 1. Therefore, to normalize these features into the unit interval [0, 1], we applied an exponential function to each ratio value, where $x$ and $x_n$ denote the original and normalized ratio values, respectively, and $m$ is the target ratio: 1.618 for RFS-2 and 1 for RFS-1 and RFS-3. Meanwhile, RFS-4 does not follow any predefined ratio, so min-max normalization was performed.
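The helper operations and the two normalization schemes described above can be sketched as follows. This is only an illustration: the exact exponential function is not reproduced in this text, so `exp_normalize` assumes the form exp(-|x - m|), which equals 1 when the ratio matches the target m and decays toward 0 as it departs from it. The function names are hypothetical and not taken from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two landmarks given as (x, y) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def mid(a, b):
    """Average point of two landmarks."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)

def exp_normalize(x, m):
    """Map a ratio x into (0, 1], peaking at the target ratio m.

    ASSUMPTION: exp(-|x - m|) is one plausible exponential form;
    the paper's actual equation may differ.
    """
    return math.exp(-abs(x - m))

def min_max_normalize(values):
    """Min-max normalization into [0, 1], used here for RFS-4."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

GOLDEN = 1.618  # target ratio m for RFS-2; m = 1 for RFS-1 and RFS-3
```

Under this assumed form, a golden-ratio feature of exactly 1.618 normalizes to 1, and any deviation in either direction lowers the value.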

Dataset and Facial Attractiveness Rating by Humans
We used a collection of 80 frontal face images of Asian women with neutral expressions. Half were randomly selected for training, and the remainder were used for testing. Thirteen human assessors (five men and eight women) were asked to score the attractiveness of each face on a seven-point scale (one being the least attractive and seven being the most attractive). There is subjectivity in each rater's scoring [24]. For example, a score of four could be considered a high score by one rater and an average score by another. To compensate, we applied both z-score normalization and a linear scaling transformation to keep the score values of each human rater within the same range. First, given a score set $S$ of a human rater, the following z-score normalization was applied:

$$z_i = \frac{s_i - \mathrm{mean}(S)}{\mathrm{std}(S)}$$

where $s_i$ and $z_i$ denote the $i$th original and normalized score values, respectively, and mean(·) and std(·) denote the mean and standard deviation of a score set $S$. Then, to scale the scores to a range from one (lower bound) to seven (upper bound), the z-scores are transformed according to:

$$\hat{s}_i = l_b + (u_b - l_b)\,\frac{z_i - \min(Z)}{\max(Z) - \min(Z)}$$

where $\hat{s}_i$ denotes the rescaled score, $Z$ is the set of z-scores of the rater, $l_b$ and $u_b$ denote the lower bound and upper bound of the target score range, and min(·) and max(·) denote the minimum and maximum values of a given score set, respectively. Then, the final attractiveness score of each image was calculated by averaging the scores from all human referees. Figure 4 shows the distribution of the average scores of each face in the training set (40 images).
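The per-rater normalization and the averaging over raters can be sketched as below. The function names are hypothetical, and the sketch assumes the population standard deviation (the text does not say whether the population or sample form was used) and that each rater used at least two distinct score values.

```python
import statistics

def z_scores(scores):
    """Z-score normalization of one rater's score set."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)  # ASSUMPTION: population standard deviation
    return [(s - mean) / std for s in scores]

def rescale(z, lower=1.0, upper=7.0):
    """Linearly map z-scores onto the target range [lower, upper]."""
    z_min, z_max = min(z), max(z)
    return [lower + (upper - lower) * (v - z_min) / (z_max - z_min) for v in z]

def normalize_rater(scores, lower=1.0, upper=7.0):
    """Apply z-score normalization followed by linear scaling."""
    return rescale(z_scores(scores), lower, upper)

def average_over_raters(per_rater_scores):
    """Average normalized scores image by image.

    per_rater_scores holds one list per rater, all in the same image order.
    """
    normalized = [normalize_rater(r) for r in per_rater_scores]
    return [statistics.mean(column) for column in zip(*normalized)]
```

Because both steps are affine with a positive scale, the combined result for a single rater coincides with min-max scaling of that rater's raw scores onto [1, 7]; the sketch keeps the two steps separate only to mirror the description.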

Predictor Construction
The predictors used in our framework were trained on the feature sets (Section 3.1.2) and the attractiveness ratings from humans (Section 3.1.3). In this study, we adopted three well-known regression-based predictors: support vector regression (SVR), K-nearest neighbor (K-NN), and an artificial neural network (ANN). SVR transforms and analyzes data in a high-dimensional feature space mapped by a kernel function; accordingly, it is possible to arrive at an optimal linear function from a small amount of data. Here, we used a Gaussian radial basis function (RBF) as the kernel; the parameters of the RBF-SVR can be selected during training and validation. K-NN regression estimates a facial attractiveness score from the K training feature vectors nearest to the query feature vector. We used a weighted average of the K nearest neighbors, with weights determined by the Euclidean distances to the K closest training samples. For ANN regression, we applied a multi-layer perceptron (MLP) composed of an input layer, hidden layers, and an output layer. Each layer has one or more neurons directionally linked with the neurons of the previous and next layers. A sigmoid function was applied as the activation function to compute the output of each neuron in the hidden layers. Because large training data were not available, we used leave-one-out cross validation (LOOCV) to estimate the optimal parameter sets for each predictor. LOOCV uses one training sample as the validation set and the remaining samples as the training set when building a target model. This process was repeated for each of the available training samples. Then, the parameter sets yielding the lowest mean square error (MSE) were used to build each predictor. The predictors then estimated attractiveness scores in the testing procedure. The parameter ranges considered for each score predictor are shown in Table 3.
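As an illustration of the parameter selection described above, the following sketch implements distance-weighted K-NN regression and selects K by LOOCV with the MSE criterion. It is a minimal pure-Python sketch, not the authors' implementation: the inverse-distance weighting and the function names are assumptions, since the text only states that weights depend on the Euclidean distance.

```python
import math

def knn_predict(train_x, train_y, query, k):
    """Distance-weighted K-NN regression with Euclidean distance."""
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    # ASSUMPTION: inverse-distance weights; eps guards against zero distance.
    eps = 1e-9
    weights = [1.0 / (d + eps) for d, _ in neighbours]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, neighbours))
    return weighted_sum / sum(weights)

def loocv_mse(xs, ys, k):
    """Leave-one-out cross validation: hold out each sample once."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = knn_predict(train_x, train_y, xs[i], k)
        errors.append((pred - ys[i]) ** 2)
    return sum(errors) / len(errors)

def select_k(xs, ys, candidates):
    """Return the candidate K with the lowest LOOCV mean square error."""
    return min(candidates, key=lambda k: loocv_mse(xs, ys, k))
```

The same LOOCV loop applies to the SVR and ANN predictors by swapping `knn_predict` for the corresponding training and prediction step and iterating over their parameter grids.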

Testing Procedure
In testing, given a new image, RFSs were extracted by the same procedure as in the training step. Then, the trained score predictors were applied to each RFS. Hence, for each RFS, three intermediate attractiveness scores were obtained. Even though the score from each predictor could be used on its own to estimate the attractiveness score, we applied a score fusion scheme to reduce the dependence on a single score predictor and increase prediction performance. Score level fusion techniques are widely used in pattern recognition applications to enhance accuracy [45]. Among the various score fusion schemes (average, product, maximum, and minimum rules, etc.), the average rule performed best in our case.

Table 4 shows the Pearson correlation values for different fusion scenarios (different combinations of predictors). As shown in Table 4, the average score of the three predictors performs better than any single predictor or the average of only two predictors. Accordingly, for each RFS, the average score of the three predictors, $S_{RFS} = \mathrm{avg}(S_{SVR}, S_{KNN}, S_{ANN})$, was calculated. Then the final attractiveness score ($S_F$) was calculated as the average of the four RFS scores:

$$S_F = \frac{1}{4}\sum_{i=1}^{4} S_{RFS\text{-}i}$$
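The two-stage average-rule fusion can be sketched as follows. The sketch assumes, consistent with the evaluation section, that the final score is the unweighted mean of the four per-RFS scores; the function names and the example values are hypothetical.

```python
from statistics import mean

def fuse_predictors(s_svr, s_knn, s_ann):
    """Average rule over the three predictors for one feature set."""
    return mean([s_svr, s_knn, s_ann])

def fuse_feature_sets(rfs_scores):
    """Average rule over the per-RFS scores (assumed unweighted)."""
    return mean(rfs_scores)

def final_score(predictions):
    """predictions maps an RFS name to its (SVR, K-NN, ANN) scores."""
    per_rfs = [fuse_predictors(*scores) for scores in predictions.values()]
    return fuse_feature_sets(per_rfs)

# Made-up intermediate scores, for illustration only.
example = {
    "RFS-1": (3.0, 4.0, 5.0),
    "RFS-2": (4.0, 4.0, 4.0),
    "RFS-3": (2.0, 3.0, 4.0),
    "RFS-4": (5.0, 5.0, 5.0),
}
print(final_score(example))
```

Other fusion rules (product, maximum, minimum) could be tried by replacing `mean` in either stage.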

Comparison with Human-Level Performance
The aim of this evaluation was to compare the performance of each RFS with that of human assessors and to validate the performance improvement achieved by our framework. As described in Section 3.1.3, 40 female faces were evaluated on a seven-point scale by 13 human assessors; additionally, the SVR, K-NN, and ANN predictors were applied to each of the four RFSs, and their average was the final attractiveness score. We therefore compared how well machine scoring of attractiveness correlates with human scoring. Comparative performance was evaluated using the Pearson correlation and MSE. As shown in Figure 5 and Table 5, each feature set has a relatively high correlation with human raters in the test set.
In Figure 5, the x-axis represents scores from human raters on a seven-point scale, and the y-axis denotes the corresponding prediction values from our framework. Neoclassical canons (RFS-1) and golden ratio values (RFS-2) have similarly high correlation values. However, in the case of symmetry (RFS-3), the distribution of predicted values is more scattered, which corroborates studies showing that symmetry has less influence on facial attractiveness than other measures [12,20,21]. Score fusion over the RFSs shows the highest correlation with human judgement. Since the factors that determine facial attractiveness are diverse and complicated, a single feature does not fully capture the attractiveness of a face; this supports the view that fusing complementary feature sets can improve attractiveness evaluation performance.
We also performed a multiple regression analysis based on a second-order polynomial model to find which measure is the most predictive of human scores. As shown in Figure 5f, the regression curve for score fusion (blue curve) is the closest to the ideal case (black dotted line). The next closest measure is RFS-2, and the less predictive ones are RFS-1 and RFS-3. As a result, fusion of the four feature sets was found to be the most predictive of the human scores. After fitting the data with our model, we evaluated the goodness of fit using the well-known R-squared ($R^2$) statistic. $R^2$ measures how successful the fit is in explaining the variation of the data, so it represents the reliability of the predicted regression function. It is defined as the ratio of the sum of squares of the regression to the total sum of squares. It takes a value between 0 and 1, with a value closer to 1 indicating that a greater proportion of variance is accounted for by the model. As shown in Table 5, the $R^2$ value of score fusion is higher than those of the four individual RFSs. Consequently, it was confirmed that score fusion is the most predictive of the human score among the attractiveness scores of each RFS.
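The three evaluation measures used in this section can be sketched in a few lines. This is a plain-Python illustration with hypothetical function names; `r_squared` follows the definition given above (regression sum of squares over total sum of squares) and expects the values of the fitted regression curve, not raw predictions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def mse(predicted, target):
    """Mean square error between predicted and human scores."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

def r_squared(fitted, target):
    """Regression sum of squares divided by total sum of squares."""
    mean_t = sum(target) / len(target)
    ss_reg = sum((f - mean_t) ** 2 for f in fitted)
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return ss_reg / ss_tot
```

The bound between 0 and 1 for this ratio holds for least-squares fits that include an intercept term, which is the case for a second-order polynomial model.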

Applications
Our framework can be applied to many areas, such as beauty ranking and plastic surgery. For example, Figure 6 shows the predicted attractiveness scores of two female Korean celebrities. Both celebrities have higher attractiveness scores than the average of our test set (3.69). Figure 7 shows the validation of a facial beauty enhancement method [27] using the proposed framework. An increase in beauty score is observed from the original to the beauty-enhanced face image. Hence, our framework makes it possible to evaluate surgery results or to recommend methods of attractiveness enhancement.

Conclusions
In this study, we have proposed a novel framework for assessing facial attractiveness. The proposed framework utilizes four types of ratio feature sets derived from universal standards of facial beauty. To make the system's performance comparable to human rating, three types of regression-based predictors were incorporated to estimate an attractiveness score, and score level fusion was performed. Experimental results showed that the attractiveness score obtained by score fusion correlates better with the scores of human assessors than scores from other predictors, indicating that a fusion of multiple facial proportion features, rather than a single feature, can provide better performance for automatic facial beauty evaluation and enhancement. In addition, our results showed that symmetry performed more poorly than other proportion-based features, supporting the conclusion of other studies.
Our proposed method has shown that simple proportion-based facial attractiveness measurements, such as symmetry, the golden ratio, and the neoclassical canon, are associated with human judgement. Combining complementary proportion features improves prediction performance and is the most predictive of the human scores. Moreover, the modularity of the proposed framework enables any features or predictors to be integrated into automatic facial attractiveness measurements. Therefore, the framework can be utilized in various attractiveness evaluation and enhancement applications. Our future work consists of consolidating the framework by enlarging the face dataset and recruiting more diverse human referees; adding more facial shape features, such as averageness and femininity; considering facial appearance features such as colors, tones, and texture; finding the most predictive features and the modeling method most compatible with human scoring; analyzing facial attractiveness across races and genders; and extending the framework to measure facial impressions (trustworthiness, dominance, etc.) in social dimensions.

Figure 1. A proposed framework for assessing facial attractiveness.

Figure 4. Histogram of average scores by human raters in the training set. The mean and standard deviation of the scores are 3.42 and 1.12, respectively.

Figure 5. Correlation between predicted scores and average human scores in the test set (a-e) and regression curves of predicted scores and average human scores in the test set (f): (a) RFS-1; (b) RFS-2; (c) RFS-3; (d) RFS-4; (e) score level fusion; (f) regression curves.

Figure 6. Examples of predicted attractiveness scores. Two Korean female celebrities, (a,b), obtain scores of 5.1 and 5.34, respectively. Note that the average and standard deviation of our test set are 3.69 and 0.69, respectively.

Table 1. Related work for assessing facial attractiveness.

Table 2. Descriptions of the four feature sets (RFSs).

Table 3. Parameter ranges of each score predictor.

Table 4. Performance (Pearson correlation value) comparison of different fusion scenarios.

Table 5. Correlation, MSE, and R-squared between the predicted and human scores.