Down Syndrome Face Recognition: A Review

Abstract: One of the most pertinent applications of image analysis is face recognition, and one of the most common genetic disorders is Down syndrome (DS), which is caused by a chromosome abnormality in humans. It remains a challenge in computer vision, in the domain of DS face recognition, to build an automated system that equals the human ability to recognize the face, one of the symmetrical structures in the body. Consequently, the use of machine learning methods has facilitated the recognition of facial dysmorphic features associated with DS. This paper presents a concise review of DS face recognition using the currently published literature, following the generic face recognition pipeline (face detection, feature extraction, and classification), and identifies critical knowledge gaps and directions for future research. The technologies underlying the facial analyses presented in recent studies have helped expert clinicians in the prediction of genetic disorders in general and DS in particular.


Introduction
One of the most frequent congenital birth defects is Down syndrome (DS). It is also one of the most common genetic causes of mental retardation, with clinically variable features affecting various human organs [1], and occurs in approximately one per 1000 babies born each year [2]. It was described in detail by John Langdon Down in 1866 [3] and its chromosomal cause was identified by French researchers in 1959 [4]. Certain clinical and phenotypic features are shared among affected individuals, such as congenital heart disease, cognitive impairment, and a characteristic physical and facial appearance, as a result of an extra copy of chromosome 21 (HSA21) [1]. Moreover, DS patients have been found to be clinically associated with multiple blood-cell-related phenotypes, including decreased lymphocyte counts, an increased risk of developing leukemia, an incidence of auto-immune disorders, intrinsic defects of the immune system, cancer, pneumonia and other types of respiratory infections, as well as vulnerability to recurrent viral and bacterial infections [5][6][7][8][9][10].
According to Farkas et al. [11], reports on the quality of the relationships between measurements in DS have been based chiefly on the observations of investigators rather than on valid anthropometric norms. By comparing pairs of linear measurements with appropriate anthropometric proportion norms, many studies have determined the quality of facial proportion [12], but have not focused on the variations of proportionate and disproportionate indices. Among other physical characteristics, certain distinctive features in the human face are typically found in people with DS, such as slanting eyes, Brushfield spots in the iris, a round face, abnormal outer ears, a flat nasal bridge, a small chin, and a flattened nose, which are assessed through facial dysmorphology [13]. Dysmorphology is a branch of medicine in which physicians are involved in the identification of congenital anomalies caused by many syndromes [14]. Early and accurate analysis of a genetic disorder is imperative for patients and their families, enabling more effective care. It is worth stressing, however, that accurate recognition of dysmorphic features depends on experienced clinicians and health staff, as well as on complex laboratory techniques [13]. The application of computer-aided tools based on machine learning and morphometric approaches can ease the recognition of facial dysmorphic features associated with genetic syndromes [13]. Through the detection and extraction of relevant facial points and the computation of measurements from images, structural malformations may be identified automatically. However, there are still only a few studies available in the literature that have investigated the identification of DS from facial images. This review aimed to identify articles relevant to this domain, in which only a few studies have been conducted.
Consequently, this study combines both the binary (Down syndrome vs. healthy patients) and multi-class (Down syndrome and other related genetic disorders) face recognition literature. This work therefore summarizes some of the current methods in the various areas of DS face recognition and identifies those at the forefront of this challenging and exciting field.
Figure 1 shows the schematic representation of the structural order of DS face recognition for this review.


Face Recognition Background
Face recognition is one of the biometric approaches that employ automated methods to recognize the facial identity of human beings based on physiological characteristics. When it comes to face recognition, human facial traits play an essential role in identification. This is due to the fact that the face hosts the most important sensory organs in humans and acts as the central interface for appearance, communication, expression, and mutual identification. The earliest systematic studies of the development of face recognition abilities were carried out by Goldstein and Chance [15] on school children of different ages. Children of 5, 8, and 13 years of age were required to perform a recognition memory task by first being shown a set of unfamiliar faces and then, at a later stage, being required to select those previously encountered faces from a larger set of unfamiliar faces. There was steady improvement in the performance of this task across the age groups [16]. Early automated face recognition was carried out by Kanade [17], who employed simple image processing methods to extract a vector of 16 facial parameters. These parameters are ratios of distances, areas, and angles. Through the application of Euclidean distance (ED) measurement for matching, he achieved a peak performance of 75% on a database of 20 subjects using two images per person [18]. Since then, face recognition has drastically improved in its approaches, algorithms, and methods.
Face recognition techniques are arguably categorized under three approaches: (1) The feature-based approach, which analyzes local features such as the nose, eyes, and mouth and their geometric relationships (e.g., Elastic Bunch Graph Matching (EBGM), hidden Markov model (HMM), etc.); (2) the holistic approach, in which the entire facial region is taken as input to the face recognition system (e.g., eigenfaces, principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis (ICA), etc.); and (3) the hybrid approach, which is the combination of the two approaches above.
A face can be recognized either by identification or by verification. Identification is a 1:M match that compares a queried face image against all the template images in the database to find the identity of the query; by contrast, verification is a 1:1 match that compares a face image against a single template image to confirm a claimed identity [19]. However, Grother [20] experimented with another approach in which the test face may not necessarily be in the database: the query face is compared against all faces in the database, a score is computed for each face, and the scores are numerically ranked so that the highest score comes first, with the possibility of raising an alarm if the similarity score exceeds a threshold. With advances in face recognition technology, many recognition algorithms are capable of achieving more than 90% accuracy, but due to the wide range of variations involved in the face acquisition process, a real-time system remains a significant challenge.
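The two matching modes above can be sketched with a toy gallery of geometric feature vectors, in the spirit of Kanade's 16 ratio features. The subject names, dimensions, and threshold below are illustrative, not drawn from any cited study:

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def verify(query, template, threshold=0.5):
    """1:1 verification: accept the claimed identity if the distance is small enough."""
    return euclidean(query, template) <= threshold

def identify(query, gallery):
    """1:M identification: rank all enrolled templates, smallest distance first."""
    scores = {name: euclidean(query, vec) for name, vec in gallery.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Hypothetical gallery of 16-dimensional feature vectors (ratios of distances, areas, angles).
rng = np.random.default_rng(0)
gallery = {f"subject_{i}": rng.random(16) for i in range(20)}
query = gallery["subject_7"] + rng.normal(0, 0.01, 16)  # noisy re-capture of subject_7

ranking = identify(query, gallery)
print(ranking[0][0])  # best match
```

Identification returns a full ranked list (as in Grother's open-set experiment), while verification reduces to one distance comparison against a threshold.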
Generic face recognition usually assumes a symmetrical structure, in which the facial structure is mirrored across the midline; when the human face is deformed or affected by disease or a genetic syndrome, facial asymmetry may develop. In biology, left-right symmetry is regarded as an indicator of genetic quality and health [21]. To quantify these patterns, a morphometric analysis of face shape based on landmark configurations is often employed to identify patterns of symmetric variation and asymmetry [22].
In the early days of face recognition, the most evident facial features were used. This was a sensible approach to mimicking the human face recognition ability, using certain intuitive features (eyes, mouth, and cheeks) [23] and geometric measures (distances between points, width-height ratios) [24]. These observations are still relevant today, because discarding part of the face or certain facial features can improve recognition performance [25]. It is therefore imperative to understand which facial features contribute to optimum performance and which ones add noise. However, the emergence of abstract mathematical methods such as eigenfaces [26,27] introduced another approach to face recognition, gradually leaving the anthropometric approach behind. Even so, there are still some features, such as skin color, that remain relevant for face detection [28]. It is, however, essential to develop abstractions and tackle the challenges from a purely computational or mathematical viewpoint.

Challenges Facing Face Recognition in Down Syndrome (DS)
Face recognition generally deals with some well-known challenges, which are attributed to factors such as pose variation (due to the movement of subject or camera's angle), occlusion (presence of elements such as glasses, beards, or hats), facial expression (different facial gestures), and imaging condition (ambient conditions or differences in cameras) [29,30]. These challenges, as listed below, are also found in generic face recognition algorithms.
• Illumination: Performance drops when illumination changes as a result of skin reflectance and internal camera control. The appearance of the face changes drastically under varying illumination. Notably, the difference between images of two different subjects under the same illumination condition can be smaller than the difference between images of the same subject taken under varying illumination conditions.
• Timing: Because the face changes over time, a delay between captures may affect the identification process in a nonlinear way over a long period, which has proven difficult to solve.
• Aging Variation: Human beings can identify faces quite easily even as those faces age, but computer algorithms cannot. Increasing age changes a person's appearance, which in turn reduces the recognition rate.
• Occlusion: The unavailability of the entire input face as a result of glasses, a moustache, a beard, a scarf, etc., can drastically degrade recognition performance.
• Resolution: Images captured by a surveillance camera generally have a very small face area and low resolution, and the acquisition distance at which an image is captured, even with a good camera, strongly determines how much information is available for face identification.

Down Syndrome Datasets
Due to the privacy and sensitive nature of Down syndrome subjects, few works have been proposed using a Down syndrome dataset. Some studies have used Down syndrome and healthy controls for binary classification, while others have used multiple genetic disorders that include Down syndrome for multi-class classification.
The study in [31] used 51 DS and 56 non-DS faces collected from the Internet. The images were of people from different races between the ages of 0 and 50 years and had been resized to 100 by 100 pixels. The authors noted that the differences between images made the problem harder and more complex. A total of 48 images were used in [32], consisting of 24 DS and 24 non-DS cases resized to 256 by 256 pixels with various backgrounds, illumination, poses, and expressions. The study in [33] used frontal images collected from 13 countries in Africa, Latin America, and Asia, consisting of 129 DS subjects and 132 healthy ethnically matched controls. A total of 2878 frontal images were used in [34], consisting of 1363 images of eight known genetic disorders (Apert, Angelman, Cornelia de Lange, Fragile X, Treacher-Collins, Williams-Beuren, progeria, and Down syndrome) collected from Internet sources and 1515 healthy controls. The authors performed manual checks to exclude images in which faces were not clearly visible or expert clinicians could not verify the diagnosis. A total of 130 face images were used in [35], consisting of 50 DS subjects and 80 healthy controls under variable illumination, expressions, and poses. The subjects (86 males and 44 females) were from different races, including 20 African Americans, 98 Caucasians, and 12 Asians; ages varied from zero to three years. Thirty subjects (15 DS and 15 non-DS) were used in [36] as a custom face database to propose a novel method to distinguish DS.
Images of six different developmental disorders (fetal alcohol syndrome, cerebral palsy, intellectual disabilities, progeria, autism spectrum disorder, and Down syndrome) were collected for the study in [37] from various organizations working with special needs in the Uttarakhand region of India, covering ages 0 to 12 years. The dataset contained 1126 images, including 537 of DS patients. A total of 160 face images of five different dysmorphic syndromes (Hurler, Fragile X, Prader-Willi, Wolf-Hirschhorn, and Down syndrome) and healthy subjects between 1 and 12 years of age were used in [38]. The images presented illumination, resolution, and pose challenges owing to the way they were obtained from the Internet, and the dataset contained only 22 DS subjects.
The study in [13] involved a total of 306 subjects, comprising a subset of 153 face images of DS patients from [34] and 153 face images from the Dartmouth Database of Children's Faces between 6 and 16 years of age. The subjects were captured under different lighting conditions and eight facial expressions (contempt, happiness, anger, surprise, sadness, fear, disgust, and neutral). A total of 17,000 face images were used in [39], covering 216 known genetic syndromes including DS, based on unconstrained 2D images. The images came from a growing phenotype-genotype database, curated through a community-driven platform, consisting of thousands of validated clinical cases. However, the number of DS images used was not declared.

DS Face Recognition Pipeline
In computer vision, face recognition generally follows a logical pipeline, with a three step process [30]: face and landmark detection, feature extraction, and classification (or recognition). These will be looked into separately in the subsequent sections under the domain of Down syndrome face recognition.

Face Detection
Face and landmark detection is the crucial first step in facial analysis for face recognition and numerous other applications, as it enables the system to detect and localize facial features in an image. It has many uses, such as face tracking, compression, and pose estimation. The concept of face detection encompasses many sub-problems. Some methods detect and locate faces simultaneously; some first execute a detection routine and then, if positive, locate the face; and in some, tracking algorithms may be required. In [29], various face detection methods are presented, but only a few, such as feature-invariant and appearance-based methods, are currently applied to DS face recognition systems.

Feature Invariant
Feature-invariant methods are based on the observation that humans can effortlessly detect faces under different lighting conditions and poses; there must exist features or properties that are invariant over this variability. In these methods, facial features such as the eyes, eyebrows, mouth, and nose are commonly detected. Several studies have used the classical Viola-Jones technique [40] to detect faces. The approach by Ferry et al. [34] used a variety of face detection algorithms, such as OpenCV [41] and Viola-Jones [40], to detect the subjects' faces. The output takes the form of a square bounding box delimiting the area of the picture where a face is found. However, no landmarks were automatically detected by the approach beyond the bounding box over the face; all 36 facial landmarks used were manually annotated.
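The Viola-Jones detector owes its speed to the integral image, which lets any rectangular (Haar-like) feature be evaluated in constant time. The following is a minimal sketch of that mechanism only, not the full cascade of boosted classifiers:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y, :x] (zero-padded on top/left)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of the h x w rectangle with top-left corner (y, x), in O(1) via 4 lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_vertical(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: top half minus bottom half (h must be even)."""
    top = rect_sum(ii, y, x, h // 2, w)
    bottom = rect_sum(ii, y + h // 2, x, h // 2, w)
    return top - bottom

img = np.arange(36).reshape(6, 6)  # toy "grayscale image"
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))  # 0+1+6+7 = 14
```

Because every rectangle sum costs four array lookups regardless of its size, thousands of Haar-like features can be evaluated per window, which is what makes the cascaded detector fast enough for real-time use.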
Zhao et al. [35] presented model-building and searching stages called the constrained local texture model (CLM), which consists of a local texture model and a statistical shape model to detect faces in the subjects. A shape is represented by n landmarks in two dimensions, so that x = [x1, y1, x2, y2, ..., xn, yn]^T. All shapes were Procrustes-fitted, and the mean shape of the training set was computed and subtracted from the aligned shapes. The shape matrix X, containing all centered shapes in the training set, is expressed as a mixture of independent components (ICs), X = A·S, where S contains the independent components and A is the matrix of mixing parameters. In vector form, x = Σ_{i=1}^{n} a_i·s_i, where s_i is the i-th IC and the a_i are the columns of the mixing matrix A. After the matrix A is estimated, the de-mixing matrix W = A^(-1) is formed and the ICs are computed as S = W·X.
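The ICA algebra above can be illustrated with synthetic shape data. In the sketch below the mixing matrix A is assumed known, so that the de-mixing step S = W·X with W = A^(-1) can be shown directly; in practice A must be estimated by an ICA algorithm such as FastICA:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy independent components: n_components x n_samples
# (stand-ins for centered shape coordinates).
S = rng.laplace(size=(3, 200))

# Mixing matrix A: each observed (centered) shape is a mixture x = A @ s.
A = rng.random((3, 3)) + np.eye(3)  # kept well-conditioned for the demo
X = A @ S                           # observed shape matrix, X = A.S

# With A known, the de-mixing matrix is W = A^-1 and the ICs are S = W.X.
W = np.linalg.inv(A)
S_rec = W @ X

print(np.allclose(S_rec, S))  # True
```

The point of the decomposition in [35] is that each centered shape can then be described compactly by its IC coefficients, which the shape model constrains during landmark search.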
Searching for landmarks with the CLM is equivalent to estimating the landmark coordinates by minimizing the probabilities produced by an SVM, subject to shape constraints. The SVM response image, denoted R(x, y), is fitted with a quadratic function (Equation (1)):

r(x, y) = a(x − x0)^2 + b(y − y0)^2 + c, (1)

where y = (a, b, c) are the function parameters and (x0, y0) is the center point of the function. To make the search more robust and efficient, a three-level multi-resolution search was performed. Additionally, the initial estimate for the first search is the shape generated by the Viola-Jones face detector [40]. Finally, the fiducial point positions are obtained by joint optimization of the quadratic function and the shape reconstruction error using independent component analysis (ICA), and 44 landmarks were identified (Figure 2). Both Kruszka et al. [33] and Kruszka et al. [42] applied the method proposed in [35] to detect 44 landmarks (Figure 3). The method presented in [43] was also used by [37] to detect 68 landmarks on the human face. However, this has proven to be a very demanding process due to the numerous sources of variation in high-dimensional data. Figure 3. Facial landmarks used in [33]: red represents the inner facial landmarks, while blue represents the external landmarks; the calculated distances are shown as blue lines and the calculated angles as green circles.
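Fitting a quadratic to a classifier response patch can be posed as a linear least-squares problem. The sketch below assumes the form r(x, y) = a(x − x0)^2 + b(y − y0)^2 + c with a fixed center, which may differ in detail from the exact formulation in [35]:

```python
import numpy as np

def fit_quadratic_response(R, center):
    """Least-squares fit of r(x, y) = a*(x - x0)^2 + b*(y - y0)^2 + c
    to a classifier response patch R (assumed form; parameters (a, b, c))."""
    x0, y0 = center
    ys, xs = np.indices(R.shape)
    # Design matrix: one row per pixel, one column per unknown (a, b, c).
    M = np.column_stack([(xs.ravel() - x0) ** 2,
                         (ys.ravel() - y0) ** 2,
                         np.ones(R.size)])
    coeffs, *_ = np.linalg.lstsq(M, R.ravel(), rcond=None)
    return coeffs  # (a, b, c)

# Synthetic response peaked at the patch center (negative a, b => maximum at (x0, y0)).
ys, xs = np.indices((9, 9))
R = -0.5 * (xs - 4.0) ** 2 - 0.25 * (ys - 4.0) ** 2 + 1.0
a, b, c = fit_quadratic_response(R, (4.0, 4.0))
print(round(a, 3), round(b, 3), round(c, 3))  # -0.5 -0.25 1.0
```

Once the response surface is summarized by a smooth quadratic, its optimum can be combined analytically with the shape constraints during the joint optimization.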

Appearance-Based
In appearance-based methods, templates are learned from the images. These methods rely on machine learning and statistical analysis to find relevant face characteristics [29]. One appearance-based method is the neural network approach, which has been successfully applied to many pattern recognition problems.
The work in [39] first detects genetic disorders, including Down syndrome, in the patient's face in an input image by adopting a deep learning approach. This was based on the deep convolutional neural network (DCNN) cascade of [44] for face detection in an uncontrolled environment. The technique was adjusted to fit the needs of their data and detected 130 landmarks on the patient's face. The detected landmarks were used to geometrically normalize the faces, which reduced pose variation among patients and improved recognition performance. Table 1 shows a summary of face detection methods for DS face recognition.
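The surveyed papers do not spell out their normalization step, but a common way to geometrically normalize a face from detected landmarks is a least-squares similarity transform (Umeyama/Procrustes alignment). The landmark coordinates below are invented for illustration:

```python
import numpy as np

def similarity_align(src, dst):
    """Estimate the similarity transform (scale, rotation, translation) that maps
    landmarks `src` onto `dst` (both N x 2) in the least-squares sense (Umeyama)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    sc, dc = src - mu_s, dst - mu_d
    cov = dc.T @ sc / len(src)
    U, D, Vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(U @ Vt))   # guard against reflections
    S = np.diag([1.0, sign])
    R = U @ S @ Vt
    scale = np.trace(np.diag(D) @ S) / sc.var(0).sum()
    t = mu_d - scale * R @ mu_s
    return scale, R, t

# A rotated, scaled, shifted copy of a toy landmark set is mapped back onto the reference.
ref = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 70.0], [50.0, 90.0]])
theta = 0.3
Rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
detected = (1.7 * ref @ Rot.T) + np.array([12.0, -5.0])

scale, R, t = similarity_align(detected, ref)
normalized = scale * detected @ R.T + t
print(np.allclose(normalized, ref))  # True
```

Warping every face so that its landmarks land on a canonical template removes in-plane rotation, scale, and translation, which is one plausible reading of the pose-reduction step described in [39].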


Feature Extraction
The next step after face detection is feature extraction. This involves obtaining relevant facial features, which may concern certain face regions, variations, angles, or measures. It is the process of extracting relevant information from a face image that is valuable for classification, with an acceptable error rate, and it must be efficient in terms of memory usage and computing time. Recognizing people we know, even when they are occluded, is not difficult for us as human beings because of familiar features in their faces, but it is a challenge for computers. The feature extraction process may include dimensionality reduction and feature selection, and these steps sometimes overlap.
Basically, the feature data are based on a combination or transformation of the original data. Feature selection selects the relevant and best subset of the input feature set by discarding the non-relevant features, seeking the smallest classification error. Dimensionality reduction can be performed before feature extraction or feature selection, or be embedded in one of these steps. The number of features must ultimately be chosen carefully, since factors such as speed, the curse of dimensionality, the number of features, the number of sample images, and classifier complexity all affect the recognition algorithm, and redundant features can negatively affect the accuracy of the recognition system. Nonetheless, feature extraction methods can be placed into four categories: local feature-based, holistic, statistical shape models, and neural network approaches [45,46]. In this section, only the three categories directly related to DS feature extraction are discussed.
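As one concrete example of dimensionality reduction, PCA projects high-dimensional feature vectors onto the few directions that carry most of the variance. The feature matrix below is synthetic, constructed so that three latent directions dominate:

```python
import numpy as np

def pca_reduce(F, k):
    """Project feature vectors (rows of F) onto the top-k principal components,
    returning the reduced data and the variance fraction each component explains."""
    Fc = F - F.mean(axis=0)                  # center each feature
    U, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)    # fraction of variance per component
    return Fc @ Vt[:k].T, explained[:k]

rng = np.random.default_rng(2)
# 100 samples of 50 features in which most variance lies in 3 latent directions.
latent = rng.normal(size=(100, 3)) * np.array([10.0, 5.0, 2.0])
F = latent @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(100, 50))

reduced, explained = pca_reduce(F, 3)
print(reduced.shape, float(explained.sum()))
```

Here three components recover essentially all the variance, illustrating why a careful choice of the number of retained features can shrink the classifier's input without losing discriminative information.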

Local Feature-Based
The local feature-based approach, which is the most primitive method of facial feature extraction, primarily focuses on local texture features such as the eyes, mouth, eyebrows, and nose by locally analyzing small image patches and aggregating the local patches information into a full image representation [46]. Examples are scale-invariant feature transform (SIFT) [47] and local binary patterns (LBP), which were implemented in [31,33,48].
Local features can also be appearance-based, which uses low-level image features such as color and shape. This approach preserves the most significant information of the face image while the unnecessary information is discarded. Examples are independent component analysis (ICA) [35,49], principal component analysis (PCA) which was applied in [50][51][52] and Gabor Wavelet Transform (GWT), which was used by [34,36,53].
Local features can also be geometric-based, combining the distances and angles among these features, as well as other geometric characteristics, into a single feature vector that can be used in subsequent analysis. Geometric features are sometimes combined with texture-based or appearance-based features, as applied in [33,42]. A hierarchical constrained local model (HCLM) was proposed in [54] using 100 facial images (50 DS and 50 healthy) of subjects between 0 and 3 years of age, based on ICA. It was applied to detect Down syndrome from facial images and extracted 22 landmarks using local binary patterns (LBPs). The method consists of two levels, as shown in Figure 4A,B: Figure 4A shows the annotated inner facial landmarks, while Figure 4B shows the extracted local texture features. The first level was trained using the whole landmark set and the facial images for both DS and healthy subjects; the second level was trained using the inner facial landmarks for both cases. The method refined the locations of the inner facial landmarks, which are clinically relevant for diagnosis. The geometric and local texture features were concatenated into 159 combined features. A non-invasive and automated framework for Down syndrome detection was proposed in [35] based on disease-specific facial patterns. To describe structure and facial morphology, local texture and geometric information were extracted at automatically detected anatomical landmarks.
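A minimal sketch of how distances and angles between landmarks can be concatenated into one geometric feature vector follows; the landmark set and the feature choices here are illustrative, not those of [54] or [35]:

```python
import numpy as np

def geometric_features(landmarks):
    """Concatenate pairwise distances and interior angles of consecutive landmark
    triplets into a single feature vector (illustrative; papers use curated subsets)."""
    pts = np.asarray(landmarks, dtype=float)
    n = len(pts)
    feats = []
    # Pairwise Euclidean distances between all landmarks.
    for i in range(n):
        for j in range(i + 1, n):
            feats.append(np.linalg.norm(pts[i] - pts[j]))
    # Angle (degrees) at vertex j for each consecutive triplet (j-1, j, j+1).
    for j in range(1, n - 1):
        u, v = pts[j - 1] - pts[j], pts[j + 1] - pts[j]
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        feats.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return np.array(feats)

# Four toy landmarks yield 6 distances + 2 angles = 8 features.
f = geometric_features([[0, 0], [3, 0], [3, 4], [0, 4]])
print(f.shape)  # (8,)
```

Such a vector is what gets concatenated with local texture descriptors (e.g., the 159 combined features in [54]) before classification.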
Burçin and Vasif [31] proposed a local binary pattern (LBP) approach for feature extraction on 107 subjects (51 DS and 56 non-DS); LBP is a very effective feature descriptor. The authors defined the LBP operator as a gray-scale measure derived from a general definition of texture in a local neighborhood (Figure 5A). A binary code was generated for each pixel in the image by thresholding its neighborhood against the value of the center pixel (Figure 5B), where R, defined over a set of three different circularly symmetric neighborhoods, indicates the radius of the sampling circle and P is the number of neighbors. Although the method obtained successful results on the proposed database, different races, lighting conditions, facial expressions, and facial wear such as glasses and hair made the problem more difficult.
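The thresholding step described above can be sketched with the simplest fixed-neighborhood LBP variant (P = 8, R = 1 on a 3x3 patch); this is not the multi-radius circular form used in [31]:

```python
import numpy as np

def lbp_code(patch):
    """Basic 3x3 LBP: threshold the 8 neighbours against the centre pixel and
    read the resulting bits clockwise from the top-left corner as one byte (0-255)."""
    c = patch[1, 1]
    # Clockwise neighbour order starting at the top-left corner.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[y, x] >= c else 0 for y, x in order]
    return sum(b << i for i, b in enumerate(reversed(bits)))

def lbp_image(img):
    """LBP code for every interior pixel of a grayscale image."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = lbp_code(img[y:y + 3, x:x + 3])
    return out

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_code(patch))  # bits 10001111 -> 143
```

In practice, histograms of these codes over face regions, rather than the raw code image, form the texture feature vector handed to the classifier.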
Burçin and Vasif [31] proposed a local binary pattern (LBP) approach on 107 subjects (51 DS and 56 non DS) for feature extraction, which is a very effective feature descriptor. The authors defined a LBP operator as a gray-scale measure, derived from a general definition of facial expression in a local neighborhood ( Figure 5A). A binary code was generated by thresholding its value with the value of the center pixel for each pixel in the image ( Figure 5B) where R is defined by a set of three different circular symmetric neighborhoods and indicates the radius of the sample and P is the number of neighbors. Although the method obtained successful results on the proposed database, different races, different light conditions, different facial expressions, and facial wears such as glasses, hairs, etc., made the problem more difficult.
Zhao et al. [32] presented a method for detecting Down syndrome in 48 facial images (24 DS and 24 non-DS) through the combination of geometric information (Figure 6A) and local texture (Figure 6B). The images contained various backgrounds, illumination conditions, poses, and expressions. The authors manually defined 17 clinically relevant facial anatomical points, which covered most of the inner facial features such as the mouth, eyes, and nose. The study in [33] focused on examining digital facial analysis in individuals with DS in diverse populations. The study collected 65 DS images from 13 countries, with ages ranging from one month to 26 years.
When the algorithms were run for feature extraction, feature selection, and classification, a total of 126 facial features were extracted from a set of 44 facial landmarks. The findings demonstrated that clinical features differed across ethnicities (Asian, African, and Latin American); these features included ear anomalies, brachycephaly, sandal gap, clinodactyly, and abundant neck skin. The results indicated that all features were significant, except in Africans. The study was evaluated using digital facial analysis technology on a larger, diverse cohort ranging from newborns to adults, with 132 controls and 129 cases.
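Geometric feature vectors of the kind used above are typically built from distances between landmark pairs. The sketch below shows one generic way to do this, normalizing by a reference distance for scale invariance; the actual feature selection in [33] differs (126 features from 44 landmarks), and the function name and reference-pair convention here are our own.

```python
import numpy as np
from itertools import combinations

def pairwise_distance_features(landmarks, ref_pair=(0, 1)):
    """Concatenate normalized pairwise landmark distances into one vector.

    `landmarks` is an (N, 2) array of (x, y) points; each distance is
    divided by a reference distance (e.g. inter-ocular) so the features
    are invariant to image scale.
    """
    pts = np.asarray(landmarks, dtype=np.float64)
    ref = np.linalg.norm(pts[ref_pair[0]] - pts[ref_pair[1]])
    feats = [np.linalg.norm(pts[i] - pts[j]) / ref
             for i, j in combinations(range(len(pts)), 2)]
    return np.array(feats)
```

With N landmarks this yields N(N-1)/2 distances, which is why studies then apply feature selection to keep only the clinically discriminative subset.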
Cornejo et al. [13] investigated the problem of Down syndrome detection from a collection of 306 face images using a compact geometric descriptor for facial feature extraction. The authors conducted their experiment on the CENTRIST descriptor, a geometric representation, and the fusion of both features through four approaches: PCA + SVM, PCA + LDA + SVM, PCA + K-NN, and PCA + LDA + K-NN. The study applied CENTRIST to each image to generate feature vectors of length 256, which were normalized by dividing the occurrence count of each intensity level r by the total number of pixels n in each image, as shown in Equation (2), that is, p(r) = h(r)/n, where h(r) represents the occurrence frequency of each intensity level in the image and p(r) represents the probability of occurrence of that intensity. The experiment showed that the geometric features were able to reach high accuracy rates for DS detection, while PCA + LDA achieved a higher detection rate.
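A CENTRIST-style descriptor pairs a census transform with exactly the normalization of Equation (2). The sketch below uses one common census-transform convention (bit set where the neighbor does not exceed the center); the implementation details in [13] may differ, and the function name is ours.

```python
import numpy as np

def centrist_descriptor(image):
    """Sketch of a CENTRIST-style descriptor: census transform followed by
    the normalized histogram of Equation (2), p(r) = h(r) / n."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbor <= center).astype(np.int64) << bit
    # Equation (2): divide each occurrence count h(r) by the pixel count n.
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / codes.size
```

The result is the 256-length probability vector described in the text, summing to one by construction.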
Özdemir et al. [38] classified dysmorphology in DS and related genetic disorders using an artificial neural network-based hierarchical decision tree (ANN-HDT), where 29 landmark points were marked on the face images of 30 subjects from five genetic syndromes (Fragile X, Hurler, Down, Prader-Willi, and Wolf-Hirschhorn syndromes) and a healthy group. The ratios between the distances of these points were used as features for classification. Furthermore, Cerrolaza et al. [55] presented a landmark-specific local texture descriptor to identify DS dysmorphology in 175 cases, which combined geometric and texture features. Using LDA, the method achieved 95% accuracy in genetic syndrome detection.

Statistical-Based
These methods depend on the theory of statistical shape analysis for feature extraction. Examples are the active appearance model (AAM) [56], which was applied in [34] to detect eight different genetic disorders, and the active shape model (ASM) [57,58]. The authors of [34] used AAM to compute a mean face within a set of images, representing the consistent shape and appearance features within the group. The algorithm showed evidence of effectively capturing the characteristic features of dysmorphic syndromes. Erogul et al. [59] presented a method for predicting DS in photographs of eighteen children aged 5 to 6 years. An elastic face bunch graph (EFBG) method was constructed to obtain the critical points.
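The core of an ASM/AAM-style model, as described above, is a mean shape plus the principal modes of variation around it. A minimal sketch under the assumption that the landmark sets are already aligned (the function name and array layout are our own):

```python
import numpy as np

def shape_model(shapes, n_modes=2):
    """Minimal statistical shape model: mean shape plus principal modes.

    `shapes` is a (num_subjects, num_landmarks, 2) array of aligned
    (x, y) landmarks; each shape is flattened and an SVD of the centered
    data yields the main modes of variation, as in ASM/AAM-style models.
    """
    S = np.asarray(shapes, dtype=np.float64).reshape(len(shapes), -1)
    mean = S.mean(axis=0)                       # the "mean face" shape
    U, s, Vt = np.linalg.svd(S - mean, full_matrices=False)
    return mean, Vt[:n_modes]                   # mean and top modes
```

New shapes are then described compactly by their coordinates along these modes, which is what makes the representation useful as a feature vector.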
Saraydemir et al. [36] presented a novel method to distinguish DS in a custom face database of 30 subjects (15 DS and 15 non-DS) using the Gabor wavelet transform (GWT) for feature extraction. The normalized elementary Gabor function is defined in [60] by Equation (3), where a Gaussian function is modulated by a sinusoidal signal.
A constant ratio γ = f0/α is defined so that the functions at different frequencies behave as scaled versions of each other, where α is the sharpness of the Gaussian and f0 is the center frequency of the sinusoidal signal. The 2000 most valuable components, chosen by the relationship between the pooled variances of the feature vectors, are used as the training set for PCA. The pooled variance s_p^2 is calculated by Equation (4), s_p^2 = Σ_i (n_i − 1) s_i^2 / Σ_i (n_i − 1),
where n_i is the sample size of the i-th sample, s_i^2 is the variance of the i-th sample, and k is the number of samples being combined. The term n − 1 is used instead of n when estimating the variances from samples. A new dimension containing the most valuable information was then derived with LDA.
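The pooled-variance formula of Equation (4) is simple enough to state directly as code, using the same n_i and s_i^2 defined above (the function name is ours):

```python
def pooled_variance(variances, sizes):
    """Equation (4): s_p^2 = sum((n_i - 1) * s_i^2) / sum(n_i - 1).

    `variances` holds the per-sample variances s_i^2 and `sizes` the
    corresponding sample sizes n_i.
    """
    num = sum((n - 1) * s2 for s2, n in zip(variances, sizes))
    den = sum(n - 1 for n in sizes)
    return num / den
```

For example, pooling variances 2.0 and 4.0 from samples of size 11 and 21 gives (10·2 + 20·4)/30 = 10/3.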

Neural Network-Based
Neural networks are not only suitable as classification models, but are also a common method for feature extraction, as presented in [37]. The study in [39] also demonstrated the superiority of CNNs over other methods, such as local feature-based and statistical-based models, for facial feature extraction. A few neural networks have been proposed by different authors as feature extraction algorithms for DS face recognition. A mathematical analysis of deep convolutional neural networks (DCNNs) for feature extraction was propounded by Bruna and Mallat [61]. Their analysis considered the deformation stability of the corresponding feature extractor, with proven translation invariance, in scattering networks based on a wavelet transform followed by a modulus non-linearity in each network layer. Since then, more studies have been conducted, promising improved performance in the pre-diagnosis of genetic disorders.
Down syndrome recognition probabilities were presented in [62] for Thai children using de-identified computer-aided facial analysis on 30 frontal face images of children with DS and 140 non-DS. The Face2Gene technology was employed to compare the face images. The method started with face detection and background discarding, followed by measuring multiple lengths, angles, and ratios between 130 points. The input face was cropped into facial regions and fed into the DCNN. This resulted in a vector indicating its correspondence to each syndrome in the Face2Gene database. The authors suggested that the ranking precision could increase with clinical features or anthropometric measurements. A novel framework to detect developmental disorders from facial images was proposed in [37]. Six categories of disorders were considered for recognition, including DS, using a DCNN. A framework for facial analysis called DeepGestalt was developed in [39] using computer vision and deep learning algorithms. It quantified similarities to hundreds of genetic syndromes based on unconstrained 2D facial images. The algorithm was trained on over 17,000 patient cases from a public phenotype-genotype database (Figure 7).

Feature Extraction Comparison
Although appearance-based local feature extraction preserves the most significant information using the shape and color of the face, it does not cover specific facial regions, unlike texture-based methods, which target regions such as the eyes, mouth, eyebrows, and nose. This may help in locating the inner facial landmarks that are clinically relevant for diagnosis. While geometric features alone have achieved a high accuracy rate for DS detection, as shown in [13], combining them with other local extraction methods may yield better results. The advantage of geometric-based local feature extraction is the ability to combine geometric characteristics with either appearance-based or texture-based features, which improves classification performance, as seen in [33,35].
Statistical-based approaches such as PCA, LDA, and ICA, which are among the most widely used in genetic disorder prediction, can effectively capture the characteristic features of dysmorphic syndromes. PCA uses expressive features and approximates the data by a linear subspace under the mean-square-error criterion. LDA extracts the most discriminatory features using the category information associated with each pattern. Conversely, ICA is more appropriate for non-Gaussian distributions because it does not depend only on the second-order properties of the data.
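The practical difference between these three projections is easy to see in code. This is an illustrative sketch on synthetic stand-in data (40 ten-dimensional "feature vectors" with binary labels), not data from any of the cited studies:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))   # stand-in for extracted facial features
y = np.repeat([0, 1], 20)       # stand-in DS / non-DS labels

# PCA: unsupervised, mean-square-error-optimal linear subspace.
X_pca = PCA(n_components=3).fit_transform(X)

# LDA: supervised, uses the class labels to find discriminative axes
# (at most n_classes - 1 components, so one axis for a binary problem).
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

# ICA: seeks statistically independent, non-Gaussian components,
# going beyond the second-order statistics that PCA relies on.
X_ica = FastICA(n_components=3, random_state=0, max_iter=1000).fit_transform(X)
```

Note that only LDA consumes the labels, which is why it tends to produce the most class-separating projection when labels are reliable.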
Neural network-based methods, with the introduction of CNNs, have demonstrated higher performance than local feature-based and statistical-based methods, as presented in [39]. However, they have been challenged by the small sample sizes and few features available in DS face recognition, except in [39], where a large sample size was used. It is worth noting that the feature extraction process should not only be optimized for classification purposes, but must also be efficient in memory usage and computing time. It is observed that geometric features seem to predominate over textural features. Although the fusion of texture and geometric features provides high accuracy, the result is not superior to the accuracy obtained with geometric features alone. Table 2 shows the summary of feature extraction methods for DS face recognition.


Classification
The next step after feature extraction and selection is the recognition or classification of the images. Sometimes, two or more classifiers may be combined to achieve optimum performance. Although the goal of this study is not to present classifiers as a whole, it is worth noting that three key concepts are involved in building classifiers: similarity (intuitive and simple patterns that group similar classes, establishing a metric that defines the representation and similarity of same-class samples); probability (rules such as the Bayes decision rule can be modified to account for factors that could lead to misclassification); and decision boundaries (minimizing a measure of error between the testing patterns and the candidate pattern based on the chosen metric) [63]. The concept of the support vector machine (SVM) is based on decision planes that define decision boundaries [64]. It is a useful supervised learning technique for data classification involving training and testing data consisting of data instances [65], where each instance in the training set contains the target value and several attributes. Using an SVM classifier in [33] on DS and healthy patients, the features were trained with leave-one-out cross-validation, reaching an accuracy of 94.3%. However, capturing people from the different ethnicities and tribes found throughout the world is an inherent limitation of any study of this kind; the study represents only a small fraction of the global population, even though it encompasses many participants and countries.
Using PCA for feature extraction in [34], the authors used an SVM to classify the images with an accuracy of 94.4%. Although the method is not intended to determine diagnoses of genetic disorders, it can help narrow the diagnostic search space in an unbiased manner [34]. Local shape variation was described using ICA, a statistical shape model technique, for feature extraction in [35]. The method was validated on a dataset of 130 images and achieved 96.7% accuracy with an SVM. Nineteen pairwise landmark distances were extracted as the geometric features in [32], representing the key clues for syndrome diagnosis. An SVM classifier was employed to distinguish between normal and abnormal cases using leave-one-out validation, achieving an accuracy of 97.92%. The CENTRIST descriptor outperformed the other presented approaches with 98.39% accuracy when classified by an SVM in [13], validated through a 10-fold cross-validation protocol. Although the classifier achieved high accuracy, 16 landmarks and 14 features may be insufficient to accurately predict a genetic disorder. Among the four classifiers employed to compare identification results in [54], SVM-RBF showed the best recognition accuracy at 95.6%. However, the number of landmarks used may not be sufficient to characterize facial variation for a genetic disorder. The classification process reported 97.34% and 96% accuracy with SVMs in [36] for recognizing dysmorphic faces. However, the number of DS subjects may be too small to characterize variation for a genetic disorder.
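The SVM-with-leave-one-out protocol recurring in these studies can be sketched as follows. The data here are synthetic stand-ins for extracted facial feature vectors, not any of the cited datasets, and an RBF kernel is chosen to mirror the SVM-RBF setup of [54]:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
# Two synthetic classes of 8-D "facial feature" vectors (DS vs. non-DS stand-in).
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 8)),
               rng.normal(2.0, 1.0, size=(30, 8))])
y = np.repeat([0, 1], 30)

# Leave-one-out: train on n-1 samples, test on the held-out one, repeat n times.
clf = SVC(kernel="rbf")
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
accuracy = scores.mean()
```

Leave-one-out is the natural choice for the small cohorts in these studies, since it wastes no samples on a fixed test split, at the cost of n model fits.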

Neural Network Approach
Neural networks (NNs) are a popular tool for pattern recognition and classification. A neural network was first demonstrated in [66], where a network of neurons was used to recognize normalized and aligned facial images. Since then, many neural network-based methods have been presented; some use them for both feature extraction and classification, while others apply them only for classification. One of the most commonly used neural network methods is the CNN, which is a very powerful feature extraction [37] and classification model when enough labeled data from the target domain are available for training [39]. The networks are trained on an adjacent domain as a baseline for knowledge transfer, or for extraction in a target domain where such data do not exist [39].
To classify the faces of DS patients, ten facial feature vectors were trained and classified using an artificial neural network (ANN) in [59]. The results pre-diagnosed DS with an accuracy of 68.7%; however, both the number of subjects and the number of features may be too small for pre-diagnosing a genetic disorder. In [38], ANN-HDT was applied to geometrically extracted features. The accuracy reached 86.7%, while clinical experts achieved a recognition rate of 46.7% on the same images. Although the results indicated that the method could be used for pre-diagnosis of dysmorphic syndromes, only four of the 30 images belonged to DS patients.
A DCNN was used in [62], achieving a prediction accuracy of 89% with the proposed pipeline. The authors concluded that further studies on other genetic syndromes and ethnicities using software algorithms are still needed, because Face2Gene cannot be considered a substitute for clinicians' knowledge of phenotypes. In [37], a deep learning framework for recognizing developmental disorders was proposed. The framework was further tested on different age groups using a DCNN, which outperformed average human intelligence in differentiating disabilities, with an accuracy of 98.8%.
Furthermore, an application of DCNNs classified major known syndromes, including Down syndrome, in [39]. The method achieved a 91% top-10 accuracy in identifying over 215 different genetic syndromes and outperformed clinical experts in three separate experiments. However, due to limited access to large datasets, the method used small benchmarks in the two binary experiments and the specialized experiment. Aside from the comparison results on the binary problem of detecting Angelman and Cornelia de Lange syndrome patients, no detection results for DS were presented or compared.

Other Classifiers
Aside from the above-mentioned classifiers, k-nearest neighbor (kNN) and Euclidean distance (ED) classifiers have also been used in a few studies to classify Down syndrome patients. kNN is one of the most commonly used supervised learning algorithms, where a new instance query is classified based on the majority category of its k nearest neighbors [67]. The ED classifier has the advantages of design simplicity and fast computation, though its classification accuracy may be poor; however, through normalization and feature weighting, its performance can be improved substantially [68]. The classification process reported 96% accuracy with kNN for recognizing dysmorphic faces in [36] on 15 healthy and 15 Down syndrome patients. However, the number of DS subjects may not be sufficient to characterize variation for a genetic disorder. Using LBP for feature extraction on healthy and DS patients, the method proposed in [31] used ED for classification, achieving 95.35% accuracy. Table 3 shows the summary of classification methods for DS face recognition.
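Both classifiers described above are only a few lines of numpy, which is precisely their appeal for small cohorts. This is a generic sketch of the two decision rules (function names are ours), not the tuned implementations of [31,36]:

```python
import numpy as np

def euclidean_classify(x, prototypes, labels):
    """Minimum-Euclidean-distance classifier: assign the label of the
    nearest class prototype (e.g. a per-class mean feature vector)."""
    d = np.linalg.norm(prototypes - x, axis=1)
    return labels[np.argmin(d)]

def knn_classify(x, X_train, y_train, k=3):
    """kNN: majority vote among the k training samples closest to x."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]
```

As noted above, normalizing or weighting the features before computing the distances can substantially improve both rules.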

Conclusions
This paper presented a review of face recognition in the field of Down syndrome, covering facial fiducial point detection, feature extraction, and classification. Among the literature reviewed, some works addressed a binary problem while others addressed a multi-class problem. Among the state-of-the-art face detection methods mentioned in this review, Viola-Jones (Haar-like) is the most commonly applied for detecting DS faces, although it has proven to be a demanding process due to the numerous sources of variation in high-dimensional data. Generally, feature-invariant face detection methods are challenged by illumination, occlusion, or noise, while shadows can render numerous edges useless.
Among the state-of-the-art extraction methods mentioned, geometric representation has been observed to achieve the best performance for DS recognition in the feature extraction process. Although features are extracted from different image regions for different research purposes, it must be noted that facial feature extraction should not be generalized to other kinds of images, due to its unique recognition purpose.
Apart from the work proposed in [34,37,39], most other works used small-scale data of fewer than 200 training images, which is considered a small number in the field of deep learning. Additionally, many works in the field rely on a relatively small number of DS samples (30, 15, or fewer), although performance can be evaluated and optimized by means of cross-validation. The door to the emerging field of precision medicine has been opened in an attempt to present phenotypes in a standardized manner. This has also opened new ways to rapidly reach an accurate genetic diagnosis for patients with genetic disorders [39].
Early and accurate diagnosis of a genetic disorder is imperative for patients and their families, enabling more effective care. In any case, it is worth stressing that accurate recognition of dysmorphic features depends on experienced clinicians and health staff as much as on sophisticated laboratory techniques [13]. It was observed from the literature that SVM or SVM-based classifiers showed higher recognition accuracy than neural network-based classifiers. This has more to do with the features extracted for classification; however, since the majority of neural network-based recognition systems did not present details of the extracted features, it is difficult to judge. Despite this, the neural network approach offers many advantages, such as a unified approach to feature extraction and classification with flexible procedures for finding moderately non-linear solutions [69]. The performance of a classifier depends not only on its complexity, but also on the number of features and sample images used. When the number of training samples is small relative to the number of features, classification performance may degrade. This may be avoided by using at least ten times as many training samples as features [69]. This is the "curse of dimensionality"; moreover, a large set of features can produce false positives when the features are redundant.
So far, several attempts have been made to predict genotype from phenotype data using facial landmarks or facial features from DS patients. In the future, it will be worthwhile to predict phenotype from genotype data using artificial intelligence techniques in the domain of DS face recognition, using sufficient samples and relevant features, and to identify the patterns of symmetric and asymmetric variation in DS as one of the most common genetic disorders.