Improving Juvenile Age Estimation Based on Facial Landmark Points and Gravity Moment

Facial age estimation is of interest due to its potential to be applied in many real-life situations. However, recent age estimation efforts do not consider juveniles. Consequently, we introduce a juvenile age detection scheme called LaGMO, which focuses on the juvenile aging cues of facial shape and appearance. LaGMO is a combination of facial landmark points and Term Frequency Inverse Gravity Moment (TF-IGM). Inspired by the formation of words from morphemes, we obtained facial appearance features comprising facial shape and wrinkle texture and represented them as terms that describe the age of the face. By leveraging the implicit ordinal relationship between the frequencies of the terms in the face, TF-IGM was used to compute the weights of the terms. From these weights, we built a matrix corresponding to the probabilities of the face belonging to each age. Next, we reduced the reference matrix according to the juvenile age range (0–17 years), avoiding an exhaustive search through the entire training set. LaGMO detects the age by projecting an unlabeled face image onto the reference matrix; the larger the projection value, the higher the probability of the image belonging to that age. With a Mean Absolute Error (MAE) of 4.42 and a Cumulative Score (CS) of 89.8% on the Face and Gesture Recognition Research Network (FG-NET) dataset, our proposal demonstrated superior performance in juvenile age estimation.


Introduction
Age estimation enables the automatic tagging of a person's age with a specific number or age bracket. It is relevant in real-world applications such as web access control [1], criminal investigations [2], forensics [3] and healthcare [4], where it can be particularly useful for addressing the problem of estimating the age of a separated or unaccompanied child [5].
Age estimation systems require inputs of features such as the iris [6], voice [1], teeth [7] or blood [8]. However, the majority of the systems rely on facial images [2,9,10]. This is due to the face being the most visible part of the body and natural storage for personal traits, making it the most representative part of an individual. Moreover, it is easy to acquire facial images through the use of portable, non-invasive cameras. Sharing the images is easier due to the widespread availability of the Internet. Consequently, the potential applications of automatic age estimation systems continue to attract both researchers and practitioners.
However, estimating age from the facial image is challenging due to the variability of the face, which results from intrinsic (genes) and extrinsic (environmental) factors impacting the face in different ways. As a result, aging manifests differently for different individuals, or commonly for age groups. In the juvenile group, aging manifests mainly through the subtle movement and growth of facial bones. Our main contributions are as follows:

• Utilizing the 68 facial landmark points to build high-level terms that describe the shape and appearance features. By exploiting the implicit ordinal relationships among the frequencies of the terms (features) in the various ages, we aggregated the features into a weight matrix for the entire age range.
• The weights of the features and their contributions to the age prediction task were computed by TF-IGM.
• We demonstrated the effectiveness of LaGMO on a discriminating dictionary of juvenile age descriptors, which we obtained from the FG-NET dataset.
• With a MAE of 4.42 and a CS of 89.8%, LaGMO advances juvenile age estimation.
The rest of the paper is organized as follows: Section 2 reviews previous work and introduces the various components of our proposal. Section 3 discusses the fundamental concepts of our proposal. We present details of the proposed scheme in Section 4. In Section 5, experimentation and results are discussed. Finally, Section 6 concludes our contribution.

Related Work
Automatic age estimation continues to attract interest in research and practice. In this section, we discuss recent works that motivate our proposal.

Feature Extraction
Feature extraction attempts to estimate representations of the face, which are as close as possible to the ground truth. This is consistent with the knowledge that aging information is encoded in the face and can be represented by a set of facial landmarks [33]. However, extracting accurate features remains a difficult task. Consequently, various feature extraction methods have been proposed for age estimation [14,15,25]. Pioneering works included that of Kwon et al. [15], who distinguished faces into baby, young-adult and senior-adult classes from geometrical features and wrinkle patterns. The authors used snakelets to detect curves and extracted wrinkle patterns from certain areas of the skin. The approach learned from a small private dataset of 47 images. Additionally, Ramanathan and Chellappa [20] proposed a shape model to examine facial shape differences for children under 18 years of age. They identified face shape by a set of facial muscle landmarks and their corresponding coordinates, which formed 48 fiducial features. The authors presented a model for measuring childhood aging deformations by warping faces per the model for rejuvenation or aging. The model enabled face age estimation based on both childhood aging features and adulthood aging features. Efraty et al. [34] proposed an automatic facial landmark detection method that analyzed image intensities with adaptive bag-of-words. Tong et al. [35] introduced a landmark extraction method by minimizing an objective function that was trained on both labeled and unlabeled face images. Segundo et al. [36] extracted facial landmarks with a method that combined relief curves and surface curvature. Facial landmark extraction continues to attract attention. Recently, Su and Geng [37] proposed a multi-scale cascaded bivariate label distribution (BLD) learning method to improve facial landmarks.
The authors used an ensemble of low and high-resolution images and an optimization mechanism to establish mappings from an input patch to the BLD for each image that most likely represented the true BLD.
From the preceding, the majority of the feature extraction methods utilize techniques that aim to locate landmarks on the face. In this regard, the Active Appearance Model (AAM) is predominantly utilized [10].

The Active Appearance Model (AAM)
Initially proposed by Cootes et al. [16], AAM is a statistical model for representing shape and appearance variations of the face. The shape representations, obtained from key landmark points on the face, result from the AAM algorithm conducting a series of Procrustes and Principal Component Analyses (PCA). Due to its success, various extensions of the AAM have been proposed for different system objectives. Lanitis et al. [11] extended AAM to obtain person-specific features, wherein the authors successfully extracted craniofacial growth-related representations for child and adult faces.
Since AAM relies on landmark points, it is intuitive that the number of points should impact the performance of the proposed system. Although the selection of the number has yet to be reviewed comprehensively, the available literature suggests the use of different numbers for different proposals; for example, 48 points were considered in [20], 17 points in [38], 79 points in [39], etc. However, the majority utilized 68 points [10,40,41]. We observe that the choice depends on the objectives of the proposal. In this proposal, we utilized the 68 facial landmark points for the following reasons:
1. They are extensively utilized for age estimation and allied systems, attesting to their ability to represent face aging features.
2. Some age estimation datasets are already annotated with the 68 facial landmark points.
Based on these advances, we utilized the 68 landmark points of the AAM to represent facial features, similarly to the proposal by Chen et al. [14], wherein the features were represented as terms that described the shape and appearance of the face.

Age Estimation
Predicting the age from the extracted features can be achieved by classification or regression methods. However, the choice and combination of the methods depend on the system objectives. Recently, a promising approach that alleviates the lack of datasets, known as label distribution or soft classification, has attracted interest [42,43].

Label Distribution
The idea of the label distribution method is to present ages as distributed labels covering a certain number of neighboring ages, with each number describing the extent to which the corresponding label defines an instance of the face. Consider an image f; the label distribution is represented as a vector d describing the age g, such that the description degrees are real numbers d_{g,f} ∈ [0, 1]. The label distribution is expected to satisfy two conditions:

1. The description degree of the chronological age of f should have the highest value.
2. The description degrees of the neighboring ages should decrease while moving away from the chronological age of f.

Inspired by the label distribution method, Kohli et al. [44] used an ensemble of classifiers to distinguish children from adults. Setting the child threshold at 21 years, they proceeded to estimate the age using an aging function. To mitigate dataset constraints, Geng et al. [42] proposed label distribution by learning the chronological age and adjacent ages for each image. Their proposal included the image itself in the distribution and assumed the representation to be consistent with the entropy condition. Additionally, He et al. [43] combined the distribution learning of age labels with age prediction. Their approach considered context relationships by using different face samples to find cross-age correlations. The method was proposed for problems that require either preconception-less label distribution learning or the learning of sample-specific, context-conscious label distribution properties capable of solving multiple tasks, including age distribution and age prediction. The label distribution method can be extended to address specific age estimation problems if the ordinal relationship is adequately exploited to represent the aging features.
From the related works, we made the following observations:
1. Aging features come in two major kinds, global and local, corresponding to face shape-related and skin texture-related features.
2. Shape-related features are more useful for juvenile age estimation, whereas skin texture (wrinkles) better serves adult age estimation proposals. However, both can be exploited to address specific age estimation objectives.
3. Age prediction methods can be categorized as classification and regression, but considering the implicit ordinal relationship of faces within a similar aging subspace could advance the age estimation task.
4. Although some publicly available datasets are commonly used for age estimation, in general, there is a lack of datasets for juvenile age estimation.

Preliminaries
This section briefly describes the theoretical concepts and definitions that are vital for understanding our proposal.

Facial Landmark-Term by AAM
AAM is a statistical shape and appearance model that presents a holistic view of the face [10,16]. Given an annotated facial image, the shape information can be considered as the dominant points defining specific landmarks of the face, including the forehead, eye corners, cheeks, etc., or an interpolated linkage of the points around the whole face [45]. However, when considered separately, we observe that the landmark points have no inherent ability to describe the face. Their independence appears similar to the individual units of a natural language. As with the 26 letters of the English language, the landmark points cannot by themselves describe the age of a face. However, when represented at a higher level, they can be considered to represent age.
As with the work in [14], we corroborate that the 68 landmark points can be transferred into meaningful terms (features), similar to words in natural language; different term vectors (features) can then represent the age of the face. For the AAM of a face image, the 68 landmark points can be expanded such that the same terms for different AAMs are treated as the same features. We denote the AAM of an image img as the vector LM(img) and expand it to cover the 68 points as LM(img) = (x_1, x_2, x_3, x_4, ..., x_135, x_136), where (x_1, x_2) is the first landmark point, (x_3, x_4) the second and (x_135, x_136) the last. We aim to obtain terms of the form F-j-LOC, where F represents the appearance information, j is the j-th element and LOC is the location of the j-th point, measured as LOC = round(x_j / width). However, as shown in Figure 1, the 68 points of the AAM do not fully represent the holistic appearance of the face, as the distribution of the points falls within the areas marked off by the points. Additionally, certain regions of the face tend to show early signs of aging and must therefore be considered for a more holistic appearance representation. Therefore, we adapt the proposal in [46] to obtain wrinkle information from the corners of the eyes and the forehead region. We annotate the wrinkles in these regions with five points representing each wrinkle's shape. The wrinkle information is obtained using a bounding box placed around the annotations, with only high-frequency information maintained by a difference-of-Gaussians filter. The wrinkle is warped into a mean shape and then transformed into pose parameters. By fitting the second derivative of the Lorentzian function, the average of every parameter is obtained. Thus, the process transforms the wrinkle into a vector of meaningful parameters representing its shape and appearance.
The parameters of the wrinkle vector include the following:
• cx, cy, the center of the wrinkle;
• d, the geodesic distance between the first and last points;
• a, the angle in degrees;
• C, the curvature, computed by least-squares minimization using Equation (1), with Y (resp. X) the ordinates (resp. abscissas) of the wrinkle centered at the origin and the first and last points aligned horizontally.
Since there can be an unequal number of wrinkles across faces, a technique is used to obtain the wrinkle information from six equally sized, non-overlapping segments of the target region. For every wrinkle, an estimate of the probability density is obtained. The strategy leads to a vector that is, however, high-dimensional. The problem is mitigated by approximating the arbitrary joint probability of the random variables pairwise, computing the joint probability for every pair of random variables. The computations are by means of Kernel Density Estimation (KDE) with a Gaussian kernel and a standard deviation of 1.5. Thus, a new vector containing the following approximated information for each of the six zones of every face is obtained.

• The number of wrinkles in the current zone.

By fusing the wrinkle information with the shape information, we hoped to obtain a term that comprehensively describes the age of the face.

Fusing the Facial Shape and Wrinkle Information
Since the shape and wrinkle vectors have different properties and sizes and cannot be directly concatenated, we employ the z-score method proposed in [22,47] to normalize them into a new vector we denote as f_j. In this paper, normalization is achieved by the following expression.

f_j = (f'_j − μ) / σ

where f_j is the normalized vector, f'_j represents the vector before normalization, μ is the mean and σ is the standard deviation. We further applied PCA to reduce the dimension of the fused vector. Subsequently, the term vector f_j is utilized in our proposal.
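The normalization-then-fusion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the z-score follows the standard definition, and the PCA reduction is shown fitted over a stacked matrix of fused vectors, since a per-image PCA would be meaningless.

```python
import numpy as np

def zscore(v):
    """z-score normalization: f_j = (f'_j - mu) / sigma."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def fuse(shape_vec, wrinkle_vec):
    """Normalize the shape and wrinkle vectors separately, then
    concatenate them into one term vector."""
    return np.concatenate([zscore(shape_vec), zscore(wrinkle_vec)])

def pca_reduce(X, k):
    """Project the rows of X (one fused vector per image) onto their
    first k principal components via SVD of the centered matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```

After normalization, both halves of the fused vector have zero mean and unit variance, which puts the differently scaled shape and wrinkle properties on a common footing before the dimensionality reduction.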

Term Frequency Inverse Gravity Moment (TF-IGM)
Perceived as a gravitational inverse problem, TF-IGM is a two-part term weighting mechanism comprising the local factor (TF) and the global factor (IGM). The main idea of TF-IGM is to measure the weight of a term (feature) in a corpus (dataset) such that the more evenly a term distributes in several classes within a dataset, the less important its inter-class discriminating power is. Thus, terms have more discriminating power if they are more concentrated in one class than other classes in the dataset. As the distribution of terms in the classes is considered uneven, the concentration of the terms can be used to:

1. Distinguish different classes, when the terms with stronger class-distinguishing power are assigned heavier weights than the others.
2. Measure the fine-grained inter-class distribution of a term in different classes, so that the obtained weight can represent the term's contribution to the classification task.
3. Provide a coefficient to achieve optimal balance between the local and global factors contributing to the weight.
Furthermore, since in this method weights are independent of classes, the unique IGM-based global weighting factor for a term can be obtained by a one-time calculation from the dataset. To the best of our knowledge, the idea was initially proposed by Chen et al. [48] for text classification and information retrieval. However, the base version, known as Term Frequency Inverse Document Frequency (TFIDF), was exploited for age estimation in [14], wherein the authors represented the age features as a set of terms that described the face. For a dataset of facial images, they determined the extent to which the features corresponded to the age. The proposal demonstrated state-of-the-art performance. However, TFIDF has since been improved to TF-IGM [48]. In this proposal, we leverage the class-distinguishing abilities of TF-IGM and hope to utilize it for juvenile age estimation.

Establishing Ordinal Relationships among the Features (Terms)
To compute the inter-class distribution concentration of feature f_j, we sort in descending order all the frequencies of f_j's occurrence in the individual age classes as follows.

f_j1 ≥ f_j2 ≥ . . . ≥ f_jn

where f_jr (r = 1, 2, . . . , n) is the frequency of f_j occurring in the r-th age class after sorting, and n is the number of age classes. If f_jr is taken as the frequency of the feature (TF), then in the sorted list the far left, which is heavier, exerts more weight than the far right, which is comparatively lighter. Hence, the center position is biased to the left. As the feature occurs in fewer age classes, the balance keeps shifting left until it reaches the starting position, where r = 1. At this point, the feature has occurred in only one age class. Similarly, the position n/2 is considered the "gravity center" of the overall inter-class distribution. The following expression depicts a uniform inter-class distribution of the feature.

f_j1 = f_j2 = . . . = f_jn

Clearly, the position of the gravity center can reflect the inter-class distribution concentration of a feature and ultimately contributes to the inter-class distinguishing power of the feature.

Inverse Gravity Moment (IGM)
However, if the class-specific gravity f_jr is ranked r and the distance from rank r to the origin is considered, then the gravity moment is the product of the class-specific gravity and the rank r, expressed as follows.

GM = f_jr · r

In line with the TF-IGM concept, the distribution concentration of the feature is proportional to the reciprocal of the total gravity moment. Hence, the following expression is employed to measure the inverse gravity moment.

IGM(f_j) = f_j1 / Σ_{r=1}^{n} (f_jr · r)     (5)

where IGM(f_j) denotes the inverse gravity moment of feature f_j; f_jr (r = 1, 2, . . . , n) are the frequencies of f_j's occurrence in the various age classes sorted in descending order, and r is the rank. Typically, the inverse gravity moment of a feature's inter-class distribution lies between 2/((1 + n) · n) and 1.0. Since, in the descending order, the first element f_j1 is the maximum of the list {f_jr | r = 1, 2, . . . , n}, Equation (5) can further be expressed as follows.

IGM(f_j) = 1 / Σ_{r=1}^{n} (f_jr / max_r(f_jr)) · r     (6)

Consequently, Equation (6) shows that the IGM of a feature is the reciprocal of the gravity moment computed from the normalized frequencies of the feature's occurrence in the individual classes. Since the IGM value can be transformed to fall within the range [0, 1.0], with the minimum value close to zero, the basic IGM model defined in Equation (5) is adopted in our proposal.
The preceding satisfies the two conditions of the label distribution approach stated in Section 2.2. Additionally, it demonstrates that the ordinal relationship is implicit in the TF-IGM mechanism.
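The IGM computation of Equation (5) can be sketched in a few lines. A feature concentrated in a single class yields the maximum value of 1.0, while a perfectly uniform inter-class distribution yields the minimum of 2/((1 + n) · n).

```python
def igm(class_freqs):
    """Inverse Gravity Moment (Equation (5)) of a feature's inter-class
    distribution: IGM(f_j) = f_j1 / sum_r (f_jr * r), with the class
    frequencies sorted in descending order and ranks r starting at 1."""
    f = sorted(class_freqs, reverse=True)
    total_gm = sum(fr * r for r, fr in enumerate(f, start=1))  # total gravity moment
    return f[0] / total_gm if total_gm else 0.0
```

For example, the concentrated feature of case 1 below, with frequencies {100, 100, 0, 0, 0, 0}, yields 100 / (100·1 + 100·2) = 1/3, matching the value quoted in the limitations discussion.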

Measuring Weight by TF-IGM
With TF-IGM, the weight of a feature in a sample image is determined by its frequency in the face image and its contribution to the classification, corresponding to the local factor (TF) and the global factor (IGM), respectively. A feature's contribution to classification depends on its class-distinguishing power, which is reflected by its inter-class distribution concentration. The higher the level of concentration, the greater the weight assigned to the feature. This concentration can be measured by the IGM model expressed by Equation (5).

Limitations of IGM
However, empirical tests revealed some cases of redundancy in the computation of the weight based on the IGM factor. For instance, consider five features f_j1, f_j2, f_j3, f_j4 and f_j5 with individual feature frequencies (TF) across six age classes of {100, 100, 0, 0, 0, 0}, {40, 40, 0, 0, 0, 0}, {23, 23, 0, 0, 0, 0}, {11, 11, 0, 0, 0, 0} and {2, 2, 0, 0, 0, 0}, respectively. If there are 100 samples in each age class, then the standard IGM computes the weight of each feature as 0.333, regardless of the distinguishing power ordering f_j1 > f_j2 > f_j3 > f_j4 > f_j5. We refer to this as case 1. Additionally, consider five features f_j6, f_j7, f_j8, f_j9 and f_j10 and a dataset of five age classes, each with 10 samples, and corresponding feature frequencies (TF) of {10, 0, 0, 0, 0}, {8, 0, 0, 0, 0}, {5, 0, 0, 0, 0}, {3, 0, 0, 0, 0} and {1, 0, 0, 0, 0}, respectively. Intuitively, the order of the class-distinguishing power should be f_j6 > f_j7 > f_j8 > f_j9 > f_j10. However, per Equation (5), the IGM values are all 1.0. This can be considered as case 2. Obviously, computing the weight based on the standard IGM does not fully depict the distinguishing power, since in both cases, features with different frequencies are assigned the same values. To address these limitations, we introduced a common logarithm, denoted by K, into the standard IGM equation such that K = log10[S_total(f_j(max)) / S_{f_j(max)}], where S_total(f_j(max)) is the total number of face samples in the age class in which f_j occurs the most and S_{f_j(max)} is the number of those face samples that contain f_j. For clarity, we denote the intervention as GMO, expressed as follows.
GMO(f_j) = f_j1 / Σ_{r=1}^{n} (f_jr · r) + K     (7)

As shown in Table 1, the weights computed by GMO using Equation (7) are unique for features f_j1 through f_j10. Whereas in case 1 the IGM values are the same for the different features, the values computed by GMO are unique, which is indicative of the distinguishing ability of GMO over IGM. Consequently, Equation (8) is proposed to compute the weight as follows.
W(f_j) = TF(f_j, s_k) · (1 + λ · GMO(f_j))     (8)

where TF represents the local weighting factor, GMO is the global weighting factor and λ is a coefficient that maintains the balance between the two factors; W is the weight and s_k is the sample image. Intuitively, the local weighting factor should be dampened, since a feature occurring 68 times in an age class is generally less than 68 times as important as a feature that rarely occurs [48]. Similar to the method in [49], the TF was reduced by introducing a square root into Equation (8), resulting in the new weight expressed by Equation (9).

W(f_j) = √(TF(f_j, s_k)) · (1 + λ · GMO(f_j))     (9)
We utilize the new GMO-based weight for our proposed scheme.
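The GMO factor of Equation (7) and the final weight of Equation (9) can be sketched together. Two caveats: the reading of the two sample counts in K follows our interpretation of the (somewhat ambiguous) definition above, and the default value of λ is our assumption rather than a value specified in the text.

```python
import math

def gmo(class_freqs, class_total):
    """GMO (Equation (7)): the basic IGM factor plus the log correction K.

    class_total: number of face samples in the age class where the feature
    occurs most; K = log10(class_total / f_j1) per our reading of the text.
    """
    f = sorted(class_freqs, reverse=True)
    total_gm = sum(fr * r for r, fr in enumerate(f, start=1))
    k = math.log10(class_total / f[0]) if f[0] else 0.0
    return (f[0] / total_gm if total_gm else 0.0) + k

def weight(tf, gmo_value, lam=7.0):
    """Equation (9): the square root damps the local TF factor; lam (lambda)
    balances the local and global factors (default chosen here as an
    assumption, not taken from the paper)."""
    return math.sqrt(tf) * (1.0 + lam * gmo_value)
```

For the case 1 feature f_j1 (frequencies {100, 100, 0, 0, 0, 0} with 100 samples per class), K = log10(100/100) = 0 and the GMO value reduces to the plain IGM value of 1/3.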

The Proposed Method
The overall illustration of our proposal is shown in Figure 2. As can be seen, facial images annotated with the 68 facial landmark points serve the system, which obtains a combination of shape and wrinkle information and represents it as terms that can be used to describe the age. In order to explore the distinguishing power of the terms, TF-IGM is used to assign weights to the terms. Next, the computed weights are used to build a matrix of predictive probabilities, which is utilized for the detection of the age. Consequently, we introduce a scheme named LaGMO, which is a combination of facial landmark points and TF-IGM for juvenile age detection.
The following is an encapsulation of the key tasks that constitute LaGMO.

Definition 1. Transferring Landmark points to landmark-term vectors (features):
Given the 68 facial landmark points, we denote the AAM of an image img as the vector LM(img) = (x_1, x_2, x_3, x_4, ..., x_135, x_136), where (x_1, x_2) is the first landmark point, (x_3, x_4) the second and (x_135, x_136) the last. For the vector LM(img), we transferred the points into string form such that the j-th member represents the j-th string of the form F-j-LOC, where F represents the appearance information, j is the j-th element and LOC = round(x_j / width), with the width hard-coded. In line with the process described in Section 3.1, we obtained a compact term denoted by f_j.
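Definition 1 can be sketched as follows. The string encoding mirrors the F-j-LOC form described above; since the exact serialization is not fully specified in the text, the hyphen-separated layout here is an illustrative assumption.

```python
def landmark_terms(points, width):
    """Transfer landmark points into string terms of the form F-j-LOC.

    points: the (x, y) landmark coordinates (68 of them for a full AAM);
    width: the hard-coded image width used to quantize each x-coordinate
    into LOC = round(x_j / width).
    """
    return [f"F-{j}-{round(x / width)}"
            for j, (x, y) in enumerate(points, start=1)]
```

Because LOC depends only on the quantized coordinate and the position index j, identical terms produced from different face images are treated as the same feature, which is what allows term frequencies to be counted per age class.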

Definition 2. Obtaining the term dictionary:
If for the whole space there are n different terms, then they constitute a dictionary denoted as TDIC = (f_j1, f_j2, ..., f_jn). Then, for any arbitrary f_j, we assign a weight by Equation (9), resulting in a new vector denoted as LMM(img) = (w(f_j1), w(f_j2), ..., w(f_jn)), where each element is between 0 and 1; the bigger the weight, the higher the importance of the term in the class. We then define a weighted term matrix as follows.

wMtrx = [w(f_j)] of size TDIC_size × |n|     (10)

where w(f_j) is the weight of the term, TDIC_size is the size of the dictionary and |n| is the total number of age classes.

Definition 3. Establishing the relationship between the features and ages:
To establish the relationship between the features and ages, we classified all LMM(img) samples into different age classes according to age, denoted as c = {c_1, c_2, ..., c_n}, with n representing the n-th class. All samples in a class have the same ground-truth age c.

Definition 4. Restricting the matrix table to correspond to the juvenile ages:
Since our proposal considers the relative order of the vector LMM(img) as age labels, we treated LMM(img) as labels of the order LMM(img) ∈ {1, 2, ..., n}, where n is the number of age classes in the entire dataset. We reduced the dataset to correspond to juveniles using Equation (11).

X_n^+ = {(LMM(img), c_n) | c_n > n},  X_n^− = {(LMM(img), c_n) | c_n ≤ n}     (11)

Next, we resolved X_n^− into a new dataset denoted by tdic and utilized Equation (10) to build the new search table, which we denote wMtrx_new. By this strategy, we avoided the rather exhaustive search through the larger table, thereby reducing the overhead in both time and space.

Age Prediction
Generally, age estimation aims to predict age based on a mapping from an input vector to an output age space. In this proposal, for an unlabeled facial image img, we projected its landmark feature vector LMM(img) into the age group space using the matrix wMtrx_new. Equation (12) was utilized for the projection, where the projected value represents the closeness of the image to the class; the bigger the value, the higher the possibility of assigning the image to the age class.
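The prediction step can be sketched as follows. Equation (12) itself is not reproduced in the text, so a plain dot-product projection onto the columns of the reference matrix is assumed here for illustration.

```python
import numpy as np

def predict_age(lmm_vec, w_mtrx, class_ages):
    """Project the weighted term vector of an unlabeled face onto the
    reference matrix and return the age of the best-scoring column.

    lmm_vec: the LMM(img) weight vector; w_mtrx: reference matrix with
    one column per age class; class_ages: the age label of each column.
    """
    scores = np.asarray(lmm_vec) @ np.asarray(w_mtrx)  # one score per age class
    return class_ages[int(np.argmax(scores))]
```

The argmax over the projection scores realizes the rule stated above: the larger the projected value, the more likely the image belongs to that age class.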

Experimentation
The image dataset, which is the basis for experimentation, can be challenging to obtain due to privacy, quality and other issues. There are currently many facial-image datasets with age labels, but most of them are unavailable or do not contain enough juveniles. Therefore, we used only the Face and Gesture Recognition Research Network (FG-NET) database for experimentation. FG-NET is publicly available, is already annotated, and has a large number of juvenile images.

Datasets and Evaluation
FG-NET consists of 1002 facial photographs of 82 individuals with ages ranging from 0 to 69 years. Although the number seems small, there are at least 12 age-separated images per person, and a substantial majority of the images are younger faces within the 0-40 age group. We collected images from ages 0 to 34 years to constitute our dataset. Since FG-NET contains only 1002 faces, it could not be divided with the typical 80-20 train-test strategy. Therefore, we adopted the leave-one-person-out (LOPO) cross-validation strategy. In order to train our model, the dataset was divided into the various age classes such that each class had approximately the same number of samples. This strategy maintained balance in the dataset. The training was first conducted on the entire set, with ages ranging from 0-34 years, to obtain a large reference matrix. Since the focus is on juveniles, we reduced the set to reflect ages 0-17 years, as depicted in Figure 3. Consequently, we utilized the new dataset to construct a new reference matrix. To validate the performance of our scheme, the Mean Absolute Error (MAE) was used to indicate the effectiveness of our proposal. MAE is expressed as:

MAE = (1/N) Σ |Ag_0 − Ag_1|

where N is the total number of test samples, Ag_0 is the actual age and Ag_1 is the estimated age. Additionally, we investigated the accuracy by another metric known as the Cumulative Score (CS), expressed as follows.

CS = (N_{e≤th} / N) · 100%     (14)

where N is the total number of test images and N_{e≤th} is the number of test images whose absolute error is not greater than th.
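Both evaluation metrics are straightforward to compute; a minimal sketch:

```python
def mae(actual, predicted):
    """Mean Absolute Error over the N test samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def cumulative_score(actual, predicted, th):
    """CS: percentage of test images whose absolute error is <= th."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= th)
    return 100.0 * hits / len(actual)
```

MAE rewards estimates that are close on average, while CS at a given threshold th reports how often the estimate lands within th years of the ground truth.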

Performance Evaluation
We conducted two tests: one to indicate the effect of reducing the local factor (TF) on the weight, and the other to compare our scheme with similar approaches.

The Effect of the TF Factor on the Weight
In order to verify the effect of reducing the TF factor on the weight, we randomly selected a small sample of images and utilized Equations (8) and (9) to compute the weight. For clarity, we denote the two computations of the weight as standardTF(W) and lowTF(w), corresponding to the weight with the standard local factor and the weight with the reduced local factor, respectively. We noticed from the initial rough checks a significant difference between the two, as shown in Figure 4 and Table 2. This suggests that reducing the TF factor could improve the performance of our scheme.

Comparison with Similar Approaches
To validate our scheme, we investigated our proposal against the few approaches that offer similar solutions, including Chen et al. [14] (LM+TFIDF), Wang et al. [13] (LBP+SVM) and Kohli et al. [44] (EOC+CTAF). We compared LaGMO with LM+TFIDF because that approach also utilizes facial landmark points and a term weighting scheme (TF-IDF). Although LM+TFIDF focuses on age estimation in general, Table 3 and Figure 5 illustrate that LaGMO outperformed LM+TFIDF. We further compared our scheme with LBP+SVM. Although the LBP+SVM proposal focuses on distinguishing juveniles from adults, only texture-based features obtained from the coordinates of the facial landmark points were considered for feature representation. Regardless, it remains the only proposal that is fully dedicated to juvenile detection. As illustrated in Table 3, LaGMO performed better than LBP+SVM. Regarding EOC+CTAF, the authors proposed it for age estimation in general. However, the approach incorporated a channel for child age estimation and reported an impressive CS of 95.0% and a MAE of 2.69. We assumed the performance was due to the age threshold being 21 years for EOC+CTAF but 17 years for LaGMO. Therefore, we adjusted the threshold in EOC+CTAF to 17 years and observed a degradation in its performance. Finally, with a CS of 89.86% and a MAE of 4.42, LaGMO demonstrated state-of-the-art performance for juvenile age estimation.

Conclusions and Future Work
This proposal characterized juvenile aging cues based on the 68 facial landmark points of the Active Appearance Model, where the shape and appearance features were presented as terms that described the age of the face. The scheme effectively exploited the new term weighting scheme known as Term Frequency Inverse Gravity Moment (TF-IGM), first to establish the ordinal relationship among the terms in the various age classes and ultimately to compute the weights of the terms for the classification task. The implicit ability of TF-IGM to establish ordinal relationships made it possible to demonstrate impressive performance, even with limited datasets. Therefore, this proposal demonstrates that facial landmark points can be applied to juvenile age detection. Accordingly, an age estimation scheme called LaGMO, which is the combination of facial landmark points and TF-IGM, was presented to alleviate the lack of juvenile age estimation schemes. We hope to extend the method to cover the adult aging subspace and utilize more datasets in the future.