Integrated Multi-Model Face Shape and Eye Attributes Identification for Hair Style and Eyelashes Recommendation

Abstract: Identifying human face shape and eye attributes is the first and most vital step before selecting the right hairstyle and eyelash extension. This research work develops a decision support program that automatically analyses eye and face features from an image taken of a user. Based on the automatically reported eye and face features, the system suggests suitable eyelash types and hairstyles. To achieve this aim, we develop a multi-model system comprising three separate models, each targeting a different task: face shape classification, eye attribute identification, and gender detection. The face shape classification system is designed as a hybrid framework of handcrafted and learned features. Eye attributes are identified by exploiting geometrical eye measurements computed from detected eye landmarks. The gender identification system is realised with a deep learning-based approach. The outputs of the three models are merged into a decision support system for haircut and eyelash extension recommendation. The obtained detection results demonstrate that the proposed method effectively identifies face shape and eye attributes. Developing such computer-aided systems is convenient for users and would be beneficial to the beauty industry.


Introduction
The human face plays an important role in daily life. Pursuing beauty, especially facial beauty, is the nature of human beings. As the demand for aesthetic surgery has increased widely over the past few years, an understanding of beauty is becoming increasingly important in medical settings. The physical beauty of faces affects many social outcomes, such as mate choices and hiring decisions [1,2]. Thus, the cosmetic industry has produced various products that target different parts of the human body, including hair, skin, eyes, eyebrows, and lips. Not surprisingly, research topics based on facial features have a long track record in psychology and many other scientific fields [3]. During recent decades, computer vision systems have played a major role in obtaining an image from a camera and processing and analysing it in a manner similar to the natural human vision system [4]. Computer vision algorithms have recently attracted increasing attention and are considered among the hottest topics due to their significant role in healthcare, industrial, and commercial applications [4][5][6][7][8][9]. Facial image processing and analysis are essential techniques that help extract information from images of human faces. The extracted information, such as the locations of facial features (eyes, nose, eyebrows, mouth, and lips), can play a major role in several fields, such as medical purposes, security, the cosmetic industry, social media applications, and recognition [5]. Several techniques have been developed to localise these parts and extract them for analysis [10]. The landmarks used in computational face analysis often resemble the anatomical soft tissue landmarks used by physicians [11].
Recently, advanced technologies such as artificial intelligence and machine/deep learning algorithms have helped the beauty industry in several ways, from providing statistical bases for attractiveness and helping people alter their looks to developing products that tackle specific needs of customers [12,13]. Furthermore, cloud computing facilities and data centre services have gained a lot of attention due to their significant role in customers' access to such products through web-based and mobile applications [14].
In the literature, many facial attribute analysis methods have been presented to recognise whether a specific facial attribute is present in a given image. The main aim of developing facial attribute analysis methods is to build a bridge between the feature representations required by real-world computer vision tasks and human-understandable visual descriptions [15,16]. Deep learning-based facial attribute analysis methods can generally be grouped into two categories: holistic methods, which exploit the relationships among attributes to get more discriminative cues [17][18][19], and part-based methods, which emphasise facial details for detecting localised features [20,21]. Unlike the existing facial attribute analysis methods, which focus on recognising whether a specific facial attribute is present in a given face image or not, our proposed method assumes that all attributes of interest are present, each taking one of several possible labels.
Furthermore, many automated face shape classification systems have been presented in the literature. Many of these methods extract the face features manually and then pass them to classifiers such as linear discriminant analysis (LDA), artificial neural networks (ANN), and support vector machines (SVM) [22,23], k-nearest neighbours [24], and probabilistic neural networks [25]. Furthermore, Bansode et al. [26] proposed a face shape identification method based on three criteria: region similarity, correlation coefficient, and fractal dimensions. Recently, Pasupa et al. [27] presented a hybrid approach combining a VGG convolutional neural network (CNN) with an SVM for face shape classification. Moreover, Emmanuel [28] adopted a pretrained Inception CNN for classifying face shape using features extracted automatically by the CNN. These works show progress in face shape interpretation; however, existing face classification systems require further effort to achieve better performance. The aforementioned methods perform well only on images taken from subjects looking straight towards the camera with their body in a controlled position, acquired under clear lighting settings.
Recently, many approaches have been developed for building fashion recommender systems [29,30]. A deep search of the literature for existing recommendation systems reveals only two hairstyle recommendation systems [27,31]. The system developed by Liang and Wang [31] considers many attributes, such as age, gender, skin colour, occupation, and customer rating, for recommending a hairstyle. However, the recommended haircut style might not fit the beauty experts' recommendations based on face shape attributes. Furthermore, the hairstyle recommender system presented by [27] is not general and focuses only on women. To the best of our knowledge, our proposed eyelash and hairstyle recommendation system is the first study conducted to automatically recommend a suitable eyelash type based on computer vision techniques.
Facial attributes such as eye features (including shape, size, position, and setting) and face shape and contour determine what makeup technique, eyelash extension, and haircut styles should be applied. Therefore, it is important for beauticians to know clients' face and eye features before applying makeup, a haircut style, or eyelash extensions. The face and eye features are essential for the beauty expert because the different face shapes and eye features are critical information for deciding what kind of eye shadows, eyeliners, eyelash extensions, haircut style, and colour of cosmetics are best suited to a particular individual. Thus, automating facial attribute analysis tasks with computer-based models for cosmetic purposes would help ease people's lives and reduce the time and effort spent by beauty experts. Furthermore, virtual consultation and recommendation systems based on facial and eye attributes [27,[31][32][33][34][35][36] have secured a foothold in the market, and individuals are opening up to the likelihood of substituting a visit to a physical facility with an online option or a task-specific software. The innovation of virtual recommendation systems has numerous advantages, including accessibility, enhanced privacy and communication, cost savings, and comfort. Hence, our work's motivation is to automate the identification of eye attributes and face shape and subsequently produce a recommendation to the user for appropriate eyelashes and a suitable hairstyle.
This research work includes the development of a system with a user-friendly graphical user interface that can analyse eye and face attributes from an input image. Based on the detected eye and face features, the system suggests a suitable recommendation of eyelashes and hairstyles for both men and women. The proposed framework integrates three main models: a face shape classification model, a gender prediction model for predicting the gender of the user, and an eye attribute identification model, to make a decision on the input image. Machine and deep learning approaches with various facial image analysis strategies, including facial landmark detection, facial alignment, and geometry measurement schemes, are designed to establish and realise the developed system. The main contributions and advantages of this work are summarised as follows:

• We introduce a new framework merging two types of virtual recommendation: hairstyle and eyelashes. The overall framework is novel and promising.
• The proposed method is able to extract many complex facial features, including eye attributes and face shape, accomplishing feature extraction and detection simultaneously.
• The developed system could help workers in beauty centres and reduce their workload by automating their manual work, and can produce objective results, particularly on a large dataset.
• Our user-friendly interface system has been evaluated on a dataset with diversity in lighting, age, and ethnicity.
• A beauty expert labelled a publicly available dataset with three eye attributes, five face shape classes, and gender to run our experiments.
• For face shape classification, we developed a method based on merging handcrafted features with deep learning features.
• We present a new geometrical measurement method to determine the eye features, including eye shape, position, and setting, based on the coordinates of twelve detected eye landmarks.
The remainder of this paper is organised as follows: Section 2 describes the materials and the proposed framework; Section 3 reveals and discusses the results of the proposed system; and Section 4 concludes the work.

Data Preparation: Facial and Eye Attributes
A publicly available dataset, MUCT [37], is used to conduct our experiments and evaluate the developed face shape and eye attribute identification system. The MUCT database consists of 3755 faces captured from 276 subjects with 76 manual landmarks. The database was created to provide more diversity of lighting, age, and gender, with occlusions (some subjects wear glasses), than other available landmarked 2D face databases. The resolution of the images is 480 × 640. Five webcams were used to photograph each subject, located in five different positions but not to the left of the subject. Each subject was photographed with two or three lighting settings, producing ten different lighting setups across all five webcams. To achieve diversity without too many images, not every subject was photographed with every lighting setup.
The MUCT dataset does not provide face shape and eye attribute labels. Therefore, we recruited a beauty expert to provide the ground truth by labelling the images in the dataset. To formulate our research problem, we initially set up the attributes required to be identified. The face shape and eye specifications/attributes to be detected in this research work are defined as follows:
Identifying these attributes helps automatically report eyelash and haircut style recommendations that meet the eye and face attributes. Illustrations of these attributes are shown in Figures 1 and 2. The process of determining the facial shape and eye attributes manually by beauty experts can be carried out in several stages, such as outlining the face and measuring the width and length of the face, jaw, forehead, and cheekbones, the gap between the eyes, and the outer and inner corners of the eyes. According to beauty experts, the face shape can be categorised into five classes, namely: oval, square, round, oblong, and heart shapes [41]. The guidelines provided by beauty experts Derrick and Brooke [41,42] have been followed to label the face shapes of the 276 subjects into the five classes. Furthermore, to provide the ground truth of eye attributes, the detailed guideline and criteria described in [43] have been followed by a beauty expert. The distribution of the manually labelled data is depicted by exploratory data analysis in Figures 3 and 4, describing the distribution of attributes over the 276 subjects and per camera, respectively.

Developed Multi-Model System
To develop a personalised and automatic approach to finding the appropriate haircut and/or eyelash style for men and women, we develop a multi-model system comprising three separate models, as shown in Figure 5. Each individual model is dedicated to a specific task: face shape identification, eye attribute identification, or gender identification. The diagram shows that the eye features extracted from an input image by the eye feature identification model are merged with the identified face shape class. The gender of the individual in the input image is detected by the gender identification model, and the face and eye attributes are then combined with the identified gender information. If the gender in the input image is male, then a hairstyle recommendation for men is given; otherwise, both hairstyle and eyelash recommendations are reported. The three models are described as follows:
Figure 5. Block diagram of the proposed method, including three models: the face shape identification model, the eye feature identification model, and the gender identification model.

Face Shape Identification Model
To identify the face shape, our developed model [44], designed by merging hand-crafted features with automatically learned features, was trained and tested on data from MUCT. The MUCT data were randomly split into 3255 images for training, and 500 images were retained for testing. The developed model (shown in Figure 6) achieves the classification as follows: (1) the face region is detected and cropped using a model trained on histogram of oriented gradients (HOG) features with a support vector machine (SVM) classifier [45]; (2) the detected face is aligned using the 68 face landmarks detected by the ensemble of regression trees (ERT) method [46]; and (3) finally, the aligned images are used for training and evaluating an Inception V3 convolutional neural network [47], along with hand-engineered HOG features and landmarks, to classify the face into one of five classes. For further details about the face shape identification model, please refer to our developed system in [44].
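Step (2), the alignment, can be sketched in plain numpy: the face is rotated about the mid-point between the eye centres so that the line joining them becomes horizontal. This is a minimal illustration under that assumption, not the authors' implementation; `left_eye` and `right_eye` are hypothetical eye-centre coordinates derived from the detected landmarks.

```python
import numpy as np

def alignment_rotation(left_eye, right_eye):
    """Return the tilt angle (degrees) of the inter-eye line and the 2x3
    affine matrix that rotates the face about the eyes' mid-point so the
    line joining the eye centres becomes horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))          # current tilt
    cx = (left_eye[0] + right_eye[0]) / 2.0
    cy = (left_eye[1] + right_eye[1]) / 2.0
    theta = np.radians(angle)
    c, s = np.cos(theta), np.sin(theta)
    # rotation by -angle about (cx, cy), in the usual 2x3 affine form
    M = np.array([[c,  s, (1 - c) * cx - s * cy],
                  [-s, c, s * cx + (1 - c) * cy]])
    return angle, M

# already-horizontal eyes need no rotation
angle, M = alignment_rotation((100, 120), (180, 120))
```

In practice the same matrix is typically obtained from `cv2.getRotationMatrix2D` and applied with `cv2.warpAffine`; the sketch only makes the geometry explicit.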

Eye Attributes Identification
The geometry of face/eye measurements based on the coordinates of detected eye landmarks is determined to develop this model. Thus, the eye specifications (shape, setting, position) are obtained. First, the face region is detected, and the landmarks of the eye region are localised using the methods presented in our face shape classification system [44][45][46]. We further customised the landmark predictor to return just the locations of the eyes using the dlib library. Only six landmarks per eye (12 across both eyes) are taken into consideration for predicting the eye attributes, as shown in Figure 7. Given the 12 eye landmark points Pn = {P0, P1, P2, ..., P11}, where n = 0, 1, 2, ..., 11, the eye specifications can be calculated using geometrical eye measurements as follows:
Figure 7. The six landmarks per eye taken into consideration to predict the eye attributes using geometrical measurements.
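The twelve points P0–P11 can be obtained by slicing the full landmark output; a short sketch, assuming dlib's standard 68-point annotation scheme in which the eyes occupy indices 36–47:

```python
import numpy as np

def eye_landmarks(landmarks68):
    """Slice the 12 eye points out of a (68, 2) landmark array.

    In dlib's standard 68-point scheme, indices 36-41 are one eye and
    42-47 the other; here they are re-indexed as P0..P5 and P6..P11.
    """
    pts = np.asarray(landmarks68)
    assert pts.shape == (68, 2), "expected 68 (x, y) landmarks"
    return pts[36:48]
```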
• Eye Setting: The eye setting Se can be determined by calculating the ratio of the distance between the two eyes, L1, to the eye width, Weye. The eye width Weye can be found by measuring the distance from the start to the end of an individual eye. The distance d between two facial landmark points is obtained using the Euclidean distance.
The threshold values for category classification are empirically chosen and estimated as follows. From Equation (6), the threshold value used to determine the (Close) label is calculated by finding the median of the subjects' eye setting measurements. Thus, an eye setting value less than the median is labelled (Close). Likewise, the middle number between the median and the highest value (the maximum) is used to differentiate between the (Proportional) and (Wide) labels: an eye setting value greater than the median and less than the middle number is labelled (Proportional); otherwise, it is labelled (Wide).
• Eye Shape: The criterion for discriminating round eyes from almond eyes is that, for round eyes, the white sclera area below the iris is visible [43], as illustrated in Figure 8. Our proposed criterion exploits the eye-aspect-ratio rule to determine the eye shape as follows:
Figure 8. Round vs. almond shape.
The threshold values for category classification are empirically chosen and estimated as follows. From Equation (8), the threshold value used to differentiate between the (Almond) and (Round) labels is calculated by finding the median of the subjects' eye shape measurements. Thus, an eye shape value less than the median is labelled (Almond); otherwise, it is labelled (Round).
• Eye Position: Eye position, categorised as up-turned, down-turned, and straight, can be determined using geometrical calculations by measuring the slope angle (in degrees).
The slope angle is defined as the angle measured between a horizontal line and a slanted line (the line drawn between landmark points (P3, P6) and the line (P0, P3) for eye 1 or (P6, P9) for eye 2). The equation for eye position identification can be defined as follows: where θ represents the angle between the line drawn between landmark points (P3, P6) and the line (P0, P3) for eye 1 or (P6, P9) for eye 2, and each point is represented as (x, y).
where m1 and m2 are the slopes of the two lines formulated to calculate the angle θ, which is defined as follows: The threshold values for category classification are empirically chosen and estimated as follows. From Equation (12), the threshold value used to determine the (Up-turned) label is calculated by finding the median of the subjects' eye position measurements. Thus, an eye position value greater than the median is labelled (Up-turned). Likewise, a value between zero and the median is labelled (Straight). Any value less than zero (negative) is identified as (Down-turned).
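The geometrical measurements above can be sketched as follows. Since Equations (1)–(12) are not reproduced in this text, the specific landmark pairs used for each measurement (inner corners P3/P6 for the inter-eye gap, corner pair P0/P3 for the eye width, vertical pairs P1/P5 and P2/P4 for the eye height) are assumptions consistent with the six-points-per-eye layout of Figure 7, not the authors' exact equations.

```python
import numpy as np

def dist(p, q):
    """Euclidean distance d between two landmark points (x, y)."""
    return float(np.hypot(p[0] - q[0], p[1] - q[1]))

def eye_setting(P):
    """Se: ratio of the gap between the eyes (assumed inner corners
    P3, P6) to the eye width Weye (assumed P0 to P3)."""
    return dist(P[3], P[6]) / dist(P[0], P[3])

def eye_aspect_ratio(P):
    """Height-to-width ratio of eye 1; round eyes are relatively
    taller than almond eyes (vertical pairs assumed P1-P5, P2-P4)."""
    height = (dist(P[1], P[5]) + dist(P[2], P[4])) / 2.0
    return height / dist(P[0], P[3])

def slope_angle(P):
    """Angle theta (degrees) between the line (P3, P6) with slope m1
    and the line (P0, P3) with slope m2, via
    theta = atan((m2 - m1) / (1 + m1 * m2))."""
    m1 = (P[6][1] - P[3][1]) / (P[6][0] - P[3][0])
    m2 = (P[3][1] - P[0][1]) / (P[3][0] - P[0][0])
    return float(np.degrees(np.arctan((m2 - m1) / (1.0 + m1 * m2))))

def label_setting(value, all_values):
    """Median-based thresholds for the eye setting, as described above:
    below the median -> Close; between the median and its midpoint with
    the maximum -> Proportional; otherwise -> Wide."""
    med = float(np.median(all_values))
    mid = (med + max(all_values)) / 2.0
    if value < med:
        return "Close"
    return "Proportional" if value < mid else "Wide"
```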

Gender Identification
Gender identification is a pre-requisite step in our developed framework. If the gender in the input image is male, then the recommendation engine reports only a suitable hairstyle for men; otherwise, the system recommends both a hairstyle and eyelash extensions for women. The deep learning approach developed in [48] has been adopted to achieve this task. A convolutional neural network composed of three convolutional layers, each followed by a rectified linear unit and a pooling layer, is implemented. The first two layers are also followed by local response normalisation. The first convolutional layer contains 96 filters of 7 × 7 pixels, the second convolutional layer contains 256 filters of 5 × 5 pixels, and the third and final convolutional layer contains 384 filters of 3 × 3 pixels. Finally, two fully connected layers are added, each containing 512 neurons.
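The architecture described above can be sketched in PyTorch. The filter counts and kernel sizes follow the text; the strides, padding, and 3 × 3 pooling windows are assumptions in the spirit of the original network [48], not its exact hyperparameters.

```python
import torch
import torch.nn as nn

# Sketch of the gender CNN: three conv blocks (96x7x7, 256x5x5, 384x3x3),
# the first two followed by local response normalisation, then two
# 512-unit fully connected layers and a 2-way (male/female) output.
gender_net = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=7, stride=4),   # stride assumed
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.LocalResponseNorm(5),
    nn.Conv2d(96, 256, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.LocalResponseNorm(5),
    nn.Conv2d(256, 384, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(384 * 6 * 6, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 2),
)

# a 227x227 crop, as described below, yields a 2-way prediction
out = gender_net(torch.zeros(1, 3, 227, 227))
```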
The dataset of the Adience Benchmark [48] is used to train the gender identification model. The Adience Benchmark is a collection of unfiltered face images collected from Flickr. It contains 26,580 images of 2284 unique subjects that are unconstrained, with extreme blur (low resolution), occlusions, out-of-plane pose variations, and different expressions. Images are first rescaled to 256 × 256, and a crop of 227 × 227 is fed to the network. All MUCT images, of which 52.5% are female, were used for testing.

Rule-Based Hairstyle and Eyelash Recommendation System
The right shape of artificial eyelash extensions can be used to correct and balance facial symmetry and open the eyes wider. Eyelash extensions can produce the illusion of various eye effects, giving the appearance of eye makeup without applying it. They can also create the appearance of a lifted eye or even produce a more feline look [49]. For the best hairstyle, instead of choosing the latest fashion, one should pick a style that fits the face shape. The correct haircut for the face shape will frame and balance it expertly while showing off the best features for a flattering and complementary appearance [50,51].
The guidelines from specialised websites [49][50][51] are followed to implement the rules of the recommendation engine for hairstyles and eyelashes in our proposed system. The rule-based inference engine focuses on knowledge representation and applies rules to derive new information from existing knowledge. When the data (eye attributes, face shape, gender) match the rule conditions, the inference engine interprets the facts in the knowledge base by applying logical rules and deduces the suitable recommendation using if-then operations.
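The if-then matching step can be sketched as follows. The rule table here is purely illustrative — the actual rules follow the guidelines in [49][50][51] and are not reproduced in this text.

```python
# Each rule is (conditions, recommendation); a rule fires when every
# condition matches the detected facts (gender, face shape, eye attributes).
RULES = [
    ({"gender": "male", "face_shape": "round"},
     {"hairstyle": "short sides with volume on top"}),        # illustrative
    ({"gender": "female", "face_shape": "oval", "eye_shape": "almond"},
     {"hairstyle": "long layers",
      "eyelashes": "natural-sweep extensions"}),              # illustrative
]

def recommend(facts):
    """Fire the first rule whose conditions all match the detected facts."""
    for conditions, recommendation in RULES:
        if all(facts.get(key) == value for key, value in conditions.items()):
            return recommendation
    return {}

result = recommend({"gender": "female", "face_shape": "oval",
                    "eye_shape": "almond", "eye_setting": "proportional"})
```

A real engine would hold many more rules and, as described above, report only the hairstyle entry when the detected gender is male.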

Results and Discussions
A new approach based on merging three models in one framework has been proposed for the simultaneous detection of face shape, eye attributes, and gender from webcam images. The designed framework achieves the detection by extracting hand-crafted features, learning features automatically, and exploiting face geometry measurements. To evaluate the performance of the proposed system, different measurements, including accuracy (Acc.), sensitivity (Sn.), specificity (Sp.), Matthews correlation coefficient (MCC), precision (Pr), and F1 score, have been calculated. These metrics are defined as follows:

Accuracy (Acc) = (tp + tn) / (tp + tn + fp + fn) × 100
Sensitivity (Sn) = tp / (tp + fn) × 100
Specificity (Sp) = tn / (tn + fp) × 100 (14)
Precision (Pr) = tp / (tp + fp)
F1 = 2 × (Precision × Recall) / (Precision + Recall)
MCC = (tp × tn − fp × fn) / √((tp + fp)(tp + fn)(tn + fp)(tn + fn))

where Recall = Sensitivity, tp: true positive, tn: true negative, fp: false positive, and fn: false negative. Table 1 depicts the confusion matrix of the face shape classes, reporting an identification accuracy of 85.6% on 500 test images (100 images per class). To compare our face shape classification model with the existing face shape classification methods, Table 2 presents a comparison in terms of classification accuracy. It can be noticed that our developed face shape classification system [44] outperforms the other methods in the literature.
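The metric definitions above can be sketched directly from the confusion counts:

```python
import math

def metrics(tp, tn, fp, fn):
    """Compute Acc, Sn, Sp, Pr, F1, and MCC from confusion counts
    (as fractions in [0, 1] rather than percentages)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)                 # sensitivity / recall
    sp = tn / (tn + fp)                 # specificity
    pr = tp / (tp + fp)                 # precision
    f1 = 2 * pr * sn / (pr + sn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, sn, sp, pr, f1, mcc
```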

Method | Accuracy
Inception V3 [28] | 84.4%
Region Similarity, Correlation and Fractal Dimensions [26] | 80%
Active Appearance Model (AAM), segmentation, and SVM [22] | 72%
Hybrid approach VGG and SVM [27] | 70.3%
3D face data and SVM [23] | 73.68%
Geometric features [24] | 80%
Probability Neural Network and Invariant Moments [25] | 80%
Our model | 85.6%

Initially, the images belonging to one subject were excluded from the data because the eye landmarks of this subject could not be detected by the eye attribute identification system, due to extreme eye closure and occlusion. Thus, 275 subjects are considered for evaluation. For gender identification, the gender detection model achieved prediction performance of 0.9018, 0.8923, 0.9103, 0.8992, and 0.8958 in terms of the accuracy, sensitivity, specificity, precision, and F1 score measures, respectively, on the 275 subjects. Likewise, the gender detection model achieved 0.9064, 0.8989, 0.9136, 0.9088, and 0.9038, respectively, using the same metrics on the 748 images per camera. Table 3 presents the confusion matrix obtained from the gender identification system. The first part of Table 3 shows the gender prediction in terms of the number of subjects, whereas the second part explores the prediction in terms of images per camera. In comparison with existing methods, the authors in [48] achieved an accuracy of 0.868 on the Adience dataset [48]. On the same dataset, Duan et al. [52] reported an accuracy of 0.778 using a convolutional neural network (CNN) and extreme learning machine (ELM) method. On the Pascal VOC dataset [53], De et al. [54] obtained an accuracy of 0.8739 based on a deep learning approach. Five webcams were used to photograph each subject. In the following experiments, we name the webcams with five letters (A-E): Cam A, Cam B, Cam C, Cam D, and Cam E. Symbols X and Y (used in Cam X vs. Cam Y) refer to any individual webcam among the five cameras.
Our developed system has provided favourable results on eye attribute identification. Tables 4, A1 and A2 in Appendix A report eye setting detection performance using various evaluation metrics, compare the eye setting predictions for each camera with the ground truth using confusion matrices, and compare the prediction performance between each pair of cameras using confusion matrices, respectively. Likewise, Tables 5, A3 and A4 report eye position detection performance using the same scheme, and, in the same way, Tables 6, A5 and A6 present eye shape prediction performance. From the reported results, it can be noticed that the model achieves the highest performance on eye shape detection, reporting an accuracy of 0.7535, sensitivity of 0.7168, specificity of 0.7861, precision of 0.7464, F1 score of 0.7311, and MCC of 0.5043, while the lowest results were obtained for eye position identification, giving 0.6869, 0.7075, 0.8378, 0.6917, 0.6958, and 0.5370 on the same evaluation metrics, respectively. Likewise, the model achieves 0.7021, 0.6845, 0.8479, 0.7207, 0.6926, and 0.5501 on eye setting detection in terms of the same metrics. Although the system shows many misclassified predictions of eye attributes, these can be considered tolerable prediction cases (the skew is small). By "tolerable prediction", we mean that most misclassified cases result from predicting the class into average labels, which is clear in all the reported confusion matrices.
For example, in Table A3, the majority of the misclassified images in Cam E were predicted as straight rather than up-turned (down-turned: 149, straight: 64, up-turned: 0). This shows that the predicted attribute measurements are close to the ground truth values, which can also be clearly noticed in the reported confusion matrix of the eye setting shown in Table A1.
To study the correlation of predictions among the different webcams, all the evaluation metrics mentioned earlier, as well as confusion matrices, have been utilised to depict the variance and closeness of prediction. For instance, Table 4 presents the correlation between cameras (such as Cam A vs. Cam B, and so on) in terms of the evaluation metrics, whilst Table A2 reports the confusion matrices explaining the correlation of predictions among the different webcams, which show good prediction agreement among images captured from different camera positions. The Matthews correlation coefficient (MCC) is a correlation measurement that returns a value between +1 and −1, where −1 refers to total disagreement between two assessments and +1 represents perfect agreement. The range of MCC values obtained between pairs of webcams was 0.3250-0.9145, which indicates a strong correlation of predictions among the various webcam positions.
In comparison with existing methods considering eye attribute detection, Borza et al. [55] and Aldibaja [56] presented eye shape estimation methods. However, they did not report the accuracy of shape detection, instead reporting eye region segmentation performance. To the best of our knowledge, our proposed method for eye attribute detection aimed at suitable eyelash recommendation is the first work carried out automatically based on computer vision approaches; thus, we could not find further related works in the literature for comparison with the results obtained from our framework. To further investigate our model's performance, we validated the developed framework on external data represented by sample images of celebrities, as shown in Table 7. Thus, we have demonstrated that the system generalises well to other data and performs well under varying conditions such as camera position, occlusion (wearing glasses), lighting conditions, gender, and age. On the other hand, it should be noted that the proposed framework has some limitations: (1) the developed system has not been evaluated on diverse ethnic and racial groups, and (2) the presented research work has not considered eye-condition characteristics such as mono-lid, hooded, crease, deep-set, and prominent eyes. These limitations are due to a lack of labelled data captured from various races and ethnicities, and these challenges can be overcome when larger annotated data become available. Moreover, the threshold parameter values used for converting a scalar value into a nominal eye attribute label are selected empirically. For future studies, the possibility of using automatic parameter tuning strategies (heuristic mechanisms) for determining and setting optimum threshold values could be investigated.

Conclusions
In this paper, a decision support system for appropriate eyelash and hairstyle recommendation was developed. The developed framework was evaluated on a dataset with diversity in lighting, age, and ethnicity. The developed system, based on integrating three models, has proven efficient in the identification of face shape and eye attributes. Face and eye attribute measurement tasks are usually carried out manually by an expert before applying individual eyelash extension and hairstyle treatments. Measuring face and eye characteristics costs time and effort, and the proposed system could alleviate the need for the additional time and effort spent by the expert before each treatment. Furthermore, automatic face shape identification is a challenging task due to the complexity of the face and the possible variations in rotation, size, illumination, hairstyle, age, and expression. Moreover, face occlusion from hats and glasses also adds difficulty to the classification process. Our model, trained on diverse images for automatic face shape classification, would be viable in many other decision support systems, such as eyeglasses style recommendation systems. In future work, we aim to extend our developed GUI desktop software and build a cloud-based decision support system using a mobile application. The extended system targets users' facial and eye analysis via cloud server programs based on images captured and uploaded from mobile phones to the server. Furthermore, the detected facial and eye attributes could be harnessed to develop a makeup recommendation system. Another important aspect that could be considered as potential future work is studying the algorithm's computational complexity to achieve the best and fastest convergence in the developed framework.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study through a data collection process conducted by the authors who created this data and made it publicly available for scientific research purposes.

Data Availability Statement:
The data presented in this study are openly available at http://www.milbo.org/muct, accessed on 26 April 2021.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: