Optimized Dimensionality Reduction Methods for Interval-Valued Variables and Their Application to Facial Recognition

The center method, first proposed by Cazes et al. (Rev. Stat. Appl., 1997) and by Douzal-Chouakria et al. (Stat. Anal. Data Mining, 2011), extends the well-known Principal Component Analysis (PCA) method to symbolic objects that are characterized by multivalued interval-type variables. In contrast to classical data, symbolic data have internal variation. The authors who originally proposed the center method used the center of a hyper-rectangle in $\mathbb{R}^m$ as a base point to carry out PCA, followed by the projection of all vertices of the hyper-rectangles as supplementary elements. Since these publications, the center point of the hyper-rectangle has typically been assumed to be the best point for the initial PCA. However, in this paper, we show that this is not always the case if the aim is to maximize the variance of the projections or to minimize the squared distance between the vertices and their respective projections. Instead, we propose the use of an optimization algorithm that maximizes the variance of the projections (or that minimizes the squared distances between the vertices and their respective projections) and finds the optimal point for the initial PCA. The vertices of the hyper-rectangles are then projected as supplementary elements onto the PCA of this optimal point, which we call the “Best Point” for projection. For this purpose, we propose four new algorithms and two new theorems. The proposed methods and algorithms are illustrated using a data set comprising measurements of facial characteristics from a study on facial recognition patterns for use in surveillance. The performance of our approach is compared with that of another procedure in the literature, and the results show that our symbolic analyses provide more accurate information. Our approach can be regarded as an optimization method, as it maximizes the explained variance or minimizes the squared distance between projections and the original points.
In addition, the symbolic analyses generate more informative conclusions, compared with the classical analysis in which classical surrogates replace intervals. All the methods proposed in this paper can be executed in the RSDA package developed in R.


The Center Method
Symbolic data were introduced by Diday in [1]. In contrast to classical data analysis, in which a variable takes a single value, a variable in symbolic data can take a finite or infinite set of values: For example, an interval variable can take an infinite set of numerical values that range from low to high. As Principal Component Analysis (PCA) is one of the most popular multivariate methods for dimension reduction, its extension to symbolic data is important. Many generalizations of PCA have been developed and several studies have contributed to its extension to interval-valued data.
Two of the best-known methods in the literature are the vertex method and the center method [2][3][4]. In [5], the authors introduced new PCA techniques to visualize and compare the structures of interval data. The authors of [6] then proposed an approach that extends classical PCA to interval-valued data by using the symbolic covariance to determine the principal component space, so that it reflects the total variation in the interval-valued data. PCA has also been extended to histogram data in a number of studies (see [7][8][9][10][11]).
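Most of these methods were developed for interval matrices, where an interval matrix $X$ is defined as

$$X = \begin{pmatrix} [a_{11}, b_{11}] & \cdots & [a_{1m}, b_{1m}] \\ \vdots & \ddots & \vdots \\ [a_{n1}, b_{n1}] & \cdots & [a_{nm}, b_{nm}] \end{pmatrix}, \qquad (1)$$

where $a_{ij} \le b_{ij}$ for all $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$ (other authors denote the interval $[a_{ij}, b_{ij}]$ by $[x^{lo}_{ij}, x^{up}_{ij}]$ or $[\underline{x}_{ij}, \overline{x}_{ij}]$). An interval matrix can be considered a subset of the set $M_{n \times m}$ of real $n \times m$ matrices, which we also denote by $X$, such that $X = \{ Z \in M_{n \times m} \mid \forall i \in \{1, 2, \ldots, n\}, \forall j \in \{1, 2, \ldots, m\}, Z_{ij} \in [a_{ij}, b_{ij}] \}$. In this case, we write $Z \in X$.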
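As a minimal sketch of this definition, an interval matrix can be stored as two classical matrices of lower and upper bounds, and the condition Z ∈ X becomes a cell-by-cell check. The names `lower`, `upper`, `is_valid_interval_matrix`, and `contains` are illustrative assumptions, and the numbers are made up rather than taken from Table 1.

```python
# An interval matrix stored as two classical matrices of bounds.
# The values are invented for illustration; they are not from Table 1.
lower = [[1.0, 4.0], [2.0, 5.0], [0.5, 3.0]]   # a_ij
upper = [[2.0, 6.0], [2.0, 7.0], [1.5, 3.5]]   # b_ij


def is_valid_interval_matrix(lower, upper):
    """Check that a_ij <= b_ij holds in every cell."""
    return all(a <= b for ra, rb in zip(lower, upper) for a, b in zip(ra, rb))


def contains(lower, upper, z):
    """Return True when the classical matrix z satisfies a_ij <= z_ij <= b_ij."""
    return all(a <= v <= b
               for ra, rb, rz in zip(lower, upper, z)
               for a, b, v in zip(ra, rb, rz))


z_inside = [[1.5, 5.0], [2.0, 6.5], [1.0, 3.2]]
z_outside = [[1.5, 5.0], [2.5, 6.5], [1.0, 3.2]]  # 2.5 lies outside [2.0, 2.0]

print(is_valid_interval_matrix(lower, upper))
print(contains(lower, upper, z_inside), contains(lower, upper, z_outside))
```

The second row of the example contains a trivial interval, [2.0, 2.0], which is why changing that cell moves the matrix outside X.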
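The center matrix of $X$ is defined as

$$X^c = \left( X^c_{ij} \right)_{n \times m}, \quad \text{where} \quad X^c_{ij} = \frac{a_{ij} + b_{ij}}{2}. \qquad (2)$$

Here, $X^c \in X$ and, for $i = 1, \ldots, n$ and $j = 1, \ldots, m$, $X^c_{ij}$ is a real number and not an interval. Thus, the center matrix (2) is a classical matrix. The center principal component method starts from the center matrix; in other words, classical PCA is applied to the center matrix $X^c$. Then, the $k$th principal components of the centers are

$$y^c_{ik} = \sum_{j=1}^{m} \left( X^c_{ij} - \bar{X}^c_{(j)} \right) v^c_{kj},$$

where $v^c_k = (v^c_{k1}, \ldots, v^c_{km})$ is the $k$th eigenvector of the variance-covariance matrix of $X^c$, defined in Equation (2), and $\bar{X}^c_{(j)}$ is the average of column $j$ of $X^c$. For cases (rows) $i = 1, \ldots, n$, the $k$th principal component for an interval variable is constructed as follows. Let $Y^c_{ik} = [y^{lo}_{ik}, y^{up}_{ik}]$ be the interval principal component for an interval variable. Then,

$$y^{lo}_{ik} = \sum_{j \in J^-_c} \left( b_{ij} - \bar{X}^c_{(j)} \right) v^c_{kj} + \sum_{j \in J^+_c} \left( a_{ij} - \bar{X}^c_{(j)} \right) v^c_{kj},$$

$$y^{up}_{ik} = \sum_{j \in J^-_c} \left( a_{ij} - \bar{X}^c_{(j)} \right) v^c_{kj} + \sum_{j \in J^+_c} \left( b_{ij} - \bar{X}^c_{(j)} \right) v^c_{kj},$$

where $J^-_c = \{ j \mid v^c_{kj} < 0 \}$ and $J^+_c = \{ j \mid v^c_{kj} \ge 0 \}$. More details can be found in [2].

The dual problem in the center PCA method was introduced by Rodriguez in [12]. To generalize duality relations, we let $D$ be an interval matrix, defined as

$$D = \left( \left[ \frac{a_{ij} - \bar{X}^c_{(j)}}{\sqrt{n}\, \sigma_{(j)}}, \; \frac{b_{ij} - \bar{X}^c_{(j)}}{\sqrt{n}\, \sigma_{(j)}} \right] \right)$$

for $i = 1, \ldots, n$ and $j = 1, \ldots, m$, with $\bar{X}^c_{(j)}$ and $\sigma_{(j)}$ denoting the average and standard deviation of column $j$, respectively. The formulas shown in Theorem 1 are thus obtained and can be used to calculate the projections of interval variables.

Theorem 1. If the hyper-rectangle defined by the $j$th column of $D$ in the $i$th principal component is projected in the direction of $v_i$, then the minimum and maximum values can be computed by Equations (7) and (8), respectively:

$$\underline{r}_{ij} = \sum_{k:\, v_{ik} < 0} \frac{b_{kj} - \bar{X}^c_{(j)}}{\sqrt{n}\, \sigma_{(j)}}\, v_{ik} + \sum_{k:\, v_{ik} \ge 0} \frac{a_{kj} - \bar{X}^c_{(j)}}{\sqrt{n}\, \sigma_{(j)}}\, v_{ik}, \qquad (7)$$

$$\overline{r}_{ij} = \sum_{k:\, v_{ik} < 0} \frac{a_{kj} - \bar{X}^c_{(j)}}{\sqrt{n}\, \sigma_{(j)}}\, v_{ik} + \sum_{k:\, v_{ik} \ge 0} \frac{b_{kj} - \bar{X}^c_{(j)}}{\sqrt{n}\, \sigma_{(j)}}\, v_{ik}, \qquad (8)$$

where $v_{ik}$ is the $k$th component of $v_i$.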
The proof of this theorem can be found in [12,13].
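The center method described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the RSDA implementation: the data are made up, only the first principal component is computed, and the leading eigenvector is obtained with a plain power iteration (the helper names `center_matrix`, `covariance`, `leading_eigenvector`, and `interval_component` are hypothetical).

```python
import math

# Made-up interval data (3 cases, 2 variables); not taken from Table 1.
lower = [[1.0, 4.0], [2.0, 5.0], [0.5, 3.0]]   # a_ij
upper = [[2.0, 6.0], [2.0, 7.0], [1.5, 3.5]]   # b_ij


def center_matrix(lower, upper):
    """Midpoint of every interval: (a_ij + b_ij) / 2."""
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(lower, upper)]


def column_means(z):
    return [sum(col) / len(col) for col in zip(*z)]


def covariance(z):
    """Variance-covariance matrix of the columns of z (divisor n)."""
    mu = column_means(z)
    n, m = len(z), len(mu)
    return [[sum((r[j] - mu[j]) * (r[k] - mu[k]) for r in z) / n
             for k in range(m)] for j in range(m)]


def leading_eigenvector(c, iters=500):
    """Power iteration; assumes the start vector is not orthogonal to the answer."""
    m = len(c)
    v = [1.0 + 0.1 * j for j in range(m)]
    for _ in range(iters):
        w = [sum(c[j][k] * v[k] for k in range(m)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def interval_component(lower, upper, mu, v):
    """Interval principal component: pick a_ij or b_ij by the sign of v_j."""
    out = []
    for ra, rb in zip(lower, upper):
        lo = sum(((b if vj < 0 else a) - mj) * vj
                 for a, b, mj, vj in zip(ra, rb, mu, v))
        up = sum(((a if vj < 0 else b) - mj) * vj
                 for a, b, mj, vj in zip(ra, rb, mu, v))
        out.append((lo, up))
    return out


xc = center_matrix(lower, upper)
mu = column_means(xc)
v1 = leading_eigenvector(covariance(xc))
center_scores = [sum((x - m) * vj for x, m, vj in zip(row, mu, v1)) for row in xc]
intervals = interval_component(lower, upper, mu, v1)
for (lo, up), yc in zip(intervals, center_scores):
    print(f"[{lo:.3f}, {up:.3f}] contains the center score {yc:.3f}")
```

Because the bounds are symmetric around the center of each interval, the score of the center is the midpoint of the projected interval, whichever sign the eigenvector happens to have.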

The Best Point Method
Let $X$ be an $n \times m$ matrix of interval variables and let $Z \in X$. If we apply PCA to the matrix $Z$, then the $k$th principal component of $Z$ for an observation $\xi_i$, with $k = 1, \ldots, s < m$ and $i = 1, \ldots, n$, is

$$y^Z_{ik} = \sum_{j=1}^{m} \left( Z_{ij} - \bar{Z}_{(j)} \right) w^Z_{kj},$$

where $\bar{Z}_{(j)}$ is the average of the variable $Z_{(j)}$ (i.e., $\bar{Z}_{(j)} = \frac{1}{n} \sum_{i=1}^{n} Z_{ij}$), and $w^Z_k = (w^Z_{k1}, \ldots, w^Z_{km})$ is the $k$th eigenvector of the variance-covariance matrix of $Z$. It is clear that $\beta(Z) = \{ w^Z_1, \ldots, w^Z_m \}$ is an orthonormal basis of $\mathbb{R}^m$.
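For the matrix $X$ defined in (1), we let $X_i = ([a_{i1}, b_{i1}], \ldots, [a_{im}, b_{im}])$ for $i = 1, \ldots, n$. Then, we define the vertex matrix for an observation $i$ as the matrix whose rows are the vertices of the hyper-rectangle $X_i$, that is, all combinations of lower and upper bounds:

$$X^v_i = \begin{pmatrix} a_{i1} & a_{i2} & \cdots & a_{im} \\ a_{i1} & a_{i2} & \cdots & b_{im} \\ \vdots & \vdots & & \vdots \\ b_{i1} & b_{i2} & \cdots & b_{im} \end{pmatrix}.$$

Thus, the vertex matrix of $X$ is obtained by stacking these blocks:

$$X^v = \begin{pmatrix} X^v_1 \\ \vdots \\ X^v_n \end{pmatrix}.$$

Next, the rows of the vertex matrix of $X$ are projected as supplementary elements in the PCA of $Z$. We define the supplementary vertex matrix $\tilde{X}^v(Z)$, with blocks $\tilde{X}^v_i(Z)$, as the vertex matrix $X^v$ with its $j$th column centered by $\bar{Z}_{(j)}$ and scaled by $\sigma_{(j)}$, where $\sigma_{(j)}$ is the standard deviation of $Z_{(j)}$. To simplify this approach, we denote the entries of the matrix $\tilde{X}^v_i(Z)$ by $\tilde{x}^v_{i,tj}(Z)$, with $t = 1, \ldots, 2^{m_i}$, in which $m_i$ is the number of nontrivial intervals, and $j = 1, \ldots, m$. Then, the co-ordinates of the vertices on the $k$th axis are obtained:

$$y^{vZ}_{itk} = \sum_{j=1}^{m} \tilde{x}^v_{i,tj}(Z)\, w^Z_{kj},$$

with $t = 1, \ldots, 2^{m_i}$. Then, the minimum and maximum of the interval $Y^{vZ}_{ik} = [\underline{y}^{vZ}_{ik}, \overline{y}^{vZ}_{ik}]$ can be calculated:

$$\underline{y}^{vZ}_{ik} = \min_{t = 1, \ldots, 2^{m_i}} y^{vZ}_{itk} \qquad (15)$$

and

$$\overline{y}^{vZ}_{ik} = \max_{t = 1, \ldots, 2^{m_i}} y^{vZ}_{itk}. \qquad (16)$$

The formulas in the following theorem allow us to compute Equations (15) and (16) much more quickly.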
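Theorem 2. The co-ordinates of $Y^{vZ}_{ik} = [\underline{y}^{vZ}_{ik}, \overline{y}^{vZ}_{ik}]$ can be found as follows:

$$\underline{y}^{vZ}_{ik} = \sum_{j \in J^-_Z} \left( b_{ij} - \bar{Z}_{(j)} \right) w^Z_{kj} + \sum_{j \in J^+_Z} \left( a_{ij} - \bar{Z}_{(j)} \right) w^Z_{kj},$$

$$\overline{y}^{vZ}_{ik} = \sum_{j \in J^-_Z} \left( a_{ij} - \bar{Z}_{(j)} \right) w^Z_{kj} + \sum_{j \in J^+_Z} \left( b_{ij} - \bar{Z}_{(j)} \right) w^Z_{kj},$$

where $J^-_Z = \{ j \mid w^Z_{kj} < 0 \}$, $J^+_Z = \{ j \mid w^Z_{kj} \ge 0 \}$, and $\bar{Z}_{(j)}$ is the mean of the $j$th column of $Z$.

Proof. Let $Z \in X$. As $a_{ij}$ and $b_{ij}$ are supplementary elements in the PCA of $Z$, they must first be centered with respect to the columns (variables) of $Z$. Every vertex $x$ of the hyper-rectangle $X_i$ has co-ordinates $x_j \in \{ a_{ij}, b_{ij} \}$, so its co-ordinate on the $k$th axis is $\sum_{j=1}^{m} (x_j - \bar{Z}_{(j)}) w^Z_{kj}$. This sum is separable in $j$: when $w^Z_{kj} \ge 0$, the $j$th term is smallest for $x_j = a_{ij}$ and largest for $x_j = b_{ij}$; when $w^Z_{kj} < 0$, the roles of $a_{ij}$ and $b_{ij}$ are exchanged. Making this choice for every $j$ yields one vertex that attains the minimum in Equation (15) and another that attains the maximum in Equation (16), which gives the two formulas of the theorem.

The following theorem provides the co-ordinates in the variable space; this is a dual relationship. For this purpose, we need to center and standardize the matrix $Z$.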
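The vertex construction and the sign-rule shortcut can be compared numerically. The sketch below is an illustration under assumptions: the bounds, the column means `mu`, and the unit direction `w` are invented, and only centering (no scaling) is applied to the vertices. It enumerates the 2^(m_i) vertices of each hyper-rectangle with `itertools.product` and checks that the brute-force minimum and maximum agree with the closed-form bounds.

```python
from itertools import product

# Made-up bounds for 2 cases and 3 variables; the third interval of the
# first case is trivial (a == b), so that case has 2**2 = 4 vertices.
lower = [[1.0, 4.0, 2.0], [2.0, 5.0, 1.0]]
upper = [[2.0, 6.0, 2.0], [3.0, 7.0, 4.0]]
mu = [2.0, 5.5, 2.0]            # hypothetical column means of some Z in X
w = [2 / 3, -2 / 3, 1 / 3]      # hypothetical unit-length axis


def vertices(row_lower, row_upper):
    """All vertices of one hyper-rectangle; trivial intervals add one value."""
    choices = [(a,) if a == b else (a, b) for a, b in zip(row_lower, row_upper)]
    return [list(p) for p in product(*choices)]


def project(point, mu, w):
    """Co-ordinate of a centered point on the axis w."""
    return sum((x - m) * wj for x, m, wj in zip(point, mu, w))


def brute_force_bounds(row_lower, row_upper, mu, w):
    """Minimum and maximum over all projected vertices."""
    values = [project(p, mu, w) for p in vertices(row_lower, row_upper)]
    return min(values), max(values)


def shortcut_bounds(row_lower, row_upper, mu, w):
    """Closed form: choose a_ij or b_ij according to the sign of w_j."""
    lo = sum(((b if wj < 0 else a) - m) * wj
             for a, b, m, wj in zip(row_lower, row_upper, mu, w))
    up = sum(((a if wj < 0 else b) - m) * wj
             for a, b, m, wj in zip(row_lower, row_upper, mu, w))
    return lo, up


n_vertices = sum(len(vertices(ra, rb)) for ra, rb in zip(lower, upper))
print("total number of vertices N =", n_vertices)
for ra, rb in zip(lower, upper):
    print(brute_force_bounds(ra, rb, mu, w), shortcut_bounds(ra, rb, mu, w))
```

The brute-force route costs on the order of 2^m projections per case, whereas the shortcut needs a single pass over the m variables, which is the practical reason for preferring the closed form.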
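Next, we focus on the centered and standardized matrix $\tilde{Z} = (\tilde{Z}_{ij})$, with $\tilde{Z}_{ij} = \frac{Z_{ij} - \bar{Z}_{(j)}}{\sqrt{n}\, \sigma_{(j)}}$ for all $i$ and $j$. Let $\tilde{z}_j$ be the $j$th column of $\tilde{Z}$, with $(\tilde{z}_j)^t \cdot \tilde{z}_i = R(i, j) \le 1$, where $R(i, j)$ is the correlation between variables $i$ and $j$. Then, the interval matrix is centered and standardized with respect to $Z$, entry by entry:

$$\left[ \frac{a_{ij} - \bar{Z}_{(j)}}{\sqrt{n}\, \sigma_{(j)}}, \; \frac{b_{ij} - \bar{Z}_{(j)}}{\sqrt{n}\, \sigma_{(j)}} \right], \qquad i = 1, \ldots, n, \; j = 1, \ldots, m.$$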
The inertia matrix $\tilde{Z} \tilde{Z}^t$ is symmetric and positive semidefinite, so its eigenvectors can be chosen to be orthonormal and its eigenvalues are real and nonnegative. We let $v^Z_1, v^Z_2, \ldots, v^Z_s$ denote the $s$ eigenvectors of $\tilde{Z} \tilde{Z}^t$ associated with the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_s \ge 0$. Then, $V(Z) = [v^Z_1 \mid v^Z_2 \mid \ldots \mid v^Z_s]$ is defined as a matrix of size $n \times s$ whose columns are the eigenvectors of $\tilde{Z} \tilde{Z}^t$. We can compute the co-ordinates of the variables in the correlation circle as $\tilde{Z}^t V$, and we can then compute the co-ordinate of the $i$th column of $Z$ in the $j$th principal component (in the $v^Z_j$ direction) using Equation (23):

$$p^z_{ij} = (\tilde{z}_i)^t\, v^Z_j. \qquad (23)$$
The next theorem proves the duality relation of any matrix that belongs to an interval matrix.
Proof. The proof of this theorem is similar to the proof of Theorem 2.
In the above, we prove that $p^z_{ij} \in [\underline{r}_{ij}, \overline{r}_{ij}]$ and that $\underline{r}_{ij}$ and $\overline{r}_{ij}$ are combinations of the projections of the vertices of the hyper-rectangle. We can form duality relations between the eigenvectors of $\tilde{Z} \tilde{Z}^t$ and $\tilde{Z}^t \tilde{Z}$: both matrices have the same $s$ positive eigenvalues $\lambda^Z_1, \lambda^Z_2, \ldots, \lambda^Z_s$, and if $u^Z_1, u^Z_2, \ldots, u^Z_s$ are the first $s$ eigenvectors of $\tilde{Z}^t \tilde{Z}$, then the relations between the eigenvectors of $\tilde{Z} \tilde{Z}^t$ and $\tilde{Z}^t \tilde{Z}$ can be computed by Equations (26) and (27):

$$u^Z_k = \frac{1}{\sqrt{\lambda^Z_k}}\, \tilde{Z}^t v^Z_k, \qquad (26)$$

$$v^Z_k = \frac{1}{\sqrt{\lambda^Z_k}}\, \tilde{Z}\, u^Z_k. \qquad (27)$$

Above, we provide the theory to apply PCA to all matrices $Z \in X$. Now, we aim to find a matrix $Z^* \in X$ that is optimal for one of two criteria: (1) the minimization of the squared distance from the vertices of the hyper-rectangles to the principal axes of $Z$, or (2) the maximization of the variance of the first components of $Z$. We develop these concepts in the following two sections.

We let $X^v$ denote the vertex matrix of $X$ and $N = \sum_{i=1}^{n} 2^{m_i}$, in which $m_i$ is the number of nontrivial intervals for case $\xi_i$. Then, the centered and standardized vertex matrix with respect to $Z$, denoted by $\tilde{X}^v(Z)$, is the $N \times m$ matrix obtained by centering and standardizing each column of $X^v$ with the mean and standard deviation of the corresponding column of $Z$.

Algorithm 1 The computation of ϕ.
Require: X, an n × m matrix of intervals; Z ∈ X; s, the number of principal components.
Ensure: ϕ(Z).
1: Apply PCA to Z.
2: β = {w 1 , . . . , w s }, with s ≤ m and w i the eigenvectors of the variance-covariance matrix of Z.
3: Compute the vertex matrix of X (X v ).
4: Compute the vertex matrix of the centered and standardized X with respect to Z (X v (Z)).
As $X$ is a Cartesian product of closed and bounded intervals, it is compact; since $\phi(Z)$ is a continuous function of $Z \in X$, $\phi$ always reaches its minimum and its maximum on $X$. In this case, the aim is to obtain the matrix $Z$ that minimizes the distance to the vertex matrix $X^v$. The problem that we aim to solve is

$$\min_{Z}\ \phi(Z) \quad \text{subject to} \quad Z \in X. \qquad (30)$$

Definition 1. The matrix $Z \in X$ that solves Problem (30) is the optimal matrix with respect to distance, which is denoted by $Z_\phi$.
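A rough sketch of how an objective of this kind could be evaluated is given below. It rests on explicit assumptions, because the formula for ϕ is not reproduced here: ϕ(Z) is taken to be the sum of squared distances between the vertices, centered by the column means of Z (no scaling), and their orthogonal projections onto the first s principal axes of Z. The eigenvectors come from a simple power iteration with deflation, and all names and numbers are hypothetical.

```python
import math
from itertools import product

# Made-up interval data (3 cases, 2 variables); not taken from Table 1.
lower = [[1.0, 4.0], [2.0, 5.0], [0.5, 3.0]]
upper = [[2.0, 6.0], [2.0, 7.0], [1.5, 3.5]]
z0 = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(lower, upper)]


def column_means(z):
    return [sum(col) / len(col) for col in zip(*z)]


def covariance(z):
    mu = column_means(z)
    n, m = len(z), len(mu)
    return [[sum((r[j] - mu[j]) * (r[k] - mu[k]) for r in z) / n
             for k in range(m)] for j in range(m)]


def top_eigenpairs(c, s, iters=1000):
    """First s (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    m = len(c)
    c = [row[:] for row in c]          # work on a copy
    pairs = []
    for _ in range(s):
        v = [1.0 + 0.1 * j for j in range(m)]
        for _ in range(iters):
            w = [sum(c[j][k] * v[k] for k in range(m)) for j in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:           # remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        lam = sum(v[j] * c[j][k] * v[k] for j in range(m) for k in range(m))
        pairs.append((lam, v))
        for j in range(m):             # deflate: remove the found component
            for k in range(m):
                c[j][k] -= lam * v[j] * v[k]
    return pairs


def phi(lower, upper, z, s):
    """Assumed objective: squared distance from vertices to the first s axes of z."""
    mu = column_means(z)
    axes = [w for _, w in top_eigenpairs(covariance(z), s)]
    total = 0.0
    for ra, rb in zip(lower, upper):
        choices = [(a,) if a == b else (a, b) for a, b in zip(ra, rb)]
        for p in product(*choices):
            d = [x - m for x, m in zip(p, mu)]
            sq_norm = sum(x * x for x in d)
            sq_proj = sum(sum(x * wj for x, wj in zip(d, w)) ** 2 for w in axes)
            total += sq_norm - sq_proj
    return total


phi_one_axis = phi(lower, upper, z0, 1)
phi_all_axes = phi(lower, upper, z0, 2)
print(phi_one_axis, phi_all_axes)
```

With s equal to the number of variables the axes span the whole space, so the residual distance should vanish up to rounding; this gives a quick sanity check on the implementation.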
To perform the optimization that computes $Z_\phi$, we propose Algorithm 2:
Algorithm 2 Computation of the Best Matrix with respect to the distances of the vertices.
Require: X, a symbolic matrix of intervals of dimension n × m; Z ∈ X; s, the number of principal components; TOL, the variation tolerance between iterations; and N, the maximum number of iterations.
Ensure: $\tilde{Y}^{V}_{Z_\phi}$.
1: Consider Z = X c , the center matrix (2), to be the initial value.
2: Get Z ϕ by means of an optimization algorithm (initialvalue = Z, function = ϕ(Z), TOL, N).

Maximizing the Variance of the First Components
Let $X$ be an interval matrix of dimension $n \times m$, $Z \in X$, and $\beta(Z) = \{ w^Z_1, \ldots, w^Z_s \}$, with $s \le m$, where the $w^Z_i$ are eigenvectors of the variance-covariance matrix of $Z$ and $\lambda(Z) = \{ \lambda^Z_1, \ldots, \lambda^Z_s \}$ denotes the set of associated eigenvalues. We define the function $\Lambda(Z, s)$ as the inertia accumulated in the first $s$ principal components:

$$\Lambda(Z, s) = \sum_{k=1}^{s} \lambda^Z_k.$$

To compute $\Lambda(Z, s)$, we propose Algorithm 3:
As above, since $X$ is compact and, for a fixed number $s$ of principal components, $\Lambda(Z, s)$ is a continuous function of $Z \in X$, $\Lambda$ always reaches its minimum and its maximum on $X$. In this case, the aim is to obtain the matrix $Z$ that maximizes the accumulated inertia in the first $s$ principal components. The problem that we want to solve is

$$\max_{Z}\ \Lambda(Z, s) \quad \text{subject to} \quad Z \in X. \qquad (31)$$

Definition 2. The matrix $Z \in X$ that solves Problem (31) is the optimal matrix with respect to inertia, denoted by $Z_\Lambda$.
To perform the optimization that computes Z Λ , we propose Algorithm 4:

Algorithm 4 The computation of the Best Matrix with respect to inertia.
Require: X, a symbolic matrix of intervals of dimension n × m; Z ∈ X; s, the number of principal components.
Ensure: $\tilde{Y}^{V}_{Z_\Lambda}$.
1: Consider Z = X c , the center matrix (2), as the initial value.
2: Get Z Λ by means of an optimization algorithm (initialvalue = Z, function = Λ(Z, s)).
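The two steps of this algorithm can be illustrated with a deliberately simple optimizer. The sketch below is not the optimizer used in the RSDA package; it assumes s = 1, takes Λ(Z, 1) to be the largest eigenvalue of the variance-covariance matrix of Z, starts from the center matrix, and runs a coordinate-wise local search that moves one cell at a time to an interval endpoint whenever this increases the objective. Data and function names are hypothetical.

```python
import math

# Made-up interval data (3 cases, 2 variables); not taken from Table 1.
lower = [[1.0, 4.0], [2.0, 5.0], [0.5, 3.0]]
upper = [[2.0, 6.0], [2.0, 7.0], [1.5, 3.5]]


def center_matrix(lower, upper):
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(lower, upper)]


def largest_eigenvalue(z, iters=300):
    """Lambda(Z, 1): top eigenvalue of the covariance of z, by power iteration."""
    mu = [sum(col) / len(col) for col in zip(*z)]
    n, m = len(z), len(mu)
    c = [[sum((r[j] - mu[j]) * (r[k] - mu[k]) for r in z) / n
          for k in range(m)] for j in range(m)]
    v = [1.0 + 0.1 * j for j in range(m)]
    for _ in range(iters):
        w = [sum(c[j][k] * v[k] for k in range(m)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:               # zero covariance: no variance to explain
            return 0.0
        v = [x / norm for x in w]
    return sum(v[j] * c[j][k] * v[k] for j in range(m) for k in range(m))


def best_matrix_inertia(lower, upper, sweeps=20):
    """Local search inside X, started at the center matrix (step 1)."""
    z = center_matrix(lower, upper)
    best = largest_eigenvalue(z)
    for _ in range(sweeps):
        improved = False
        for i in range(len(z)):
            for j in range(len(z[0])):
                for candidate in (lower[i][j], upper[i][j]):
                    old = z[i][j]
                    if candidate == old:
                        continue
                    z[i][j] = candidate
                    value = largest_eigenvalue(z)
                    if value > best + 1e-12:
                        best, improved = value, True   # keep the move
                    else:
                        z[i][j] = old                  # undo the move
        if not improved:
            break
    return z, best


start_value = largest_eigenvalue(center_matrix(lower, upper))
z_best, best_value = best_matrix_inertia(lower, upper)
print("center matrix:", round(start_value, 4), "after search:", round(best_value, 4))
```

Only endpoints are tried because the largest eigenvalue of the covariance matrix is a convex function of Z, so its maximum over the box X is attained at a vertex; the search is still local and is not guaranteed to find the global optimum.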

Experimental Evaluation: The Application to Facial Recognition
Automatic facial recognition has recently gained momentum, especially in the context of security issues such as access to buildings, and in the context of monitoring and continued surveillance. A well-known application of facial recognition is its incorporation in the iPhone X. According to Apple's support website, the technology that enables facial ID is some of the most advanced hardware and software that has ever been created. The TrueDepth camera captures accurate facial data by projecting and analyzing over 30,000 invisible dots to create a facial depth map, while also capturing an infrared image. These images are transformed into a mathematical representation, which is compared with registered facial data. In both R and Python, a significant number of libraries, such as the videoplayR package, have also been developed for facial recognition. The link below contains more details: http://www.stoltzmaniac.com/facial-recognition-in-r/.
As described in this section, we applied all the proposed methods using p = 6 interval-valued variables in a data set of m = 27 faces for a total of 27,000 photos. The data set was taken from [14], in which the authors investigated facial characteristics for detection purposes in a surveillance study. In this study, the center PCA method in [4] was applied, as shown in Figure 1. The data are provided in Table 1.
The data set contains measurements of p = 6 random variables designed to identify each face: The length spanned by the eyes X 1 (distance AD in Figure 1), the length between the eyes X 2 (distance BC), the length from the outer right eye to the upper-middle lip at point H between the nose and mouth X 3 (distance AH), the corresponding length for the left eye X 4 (DH), the length from point H to the outside of the mouth on the right side X 5 (EH), and the corresponding distance to the left side of the mouth X 6 (GH). For each facial image in this facial recognition process, salient features, such as the nose, mouth, and eyes, are located using morphological operators. The boundaries of the located elements are extracted by using a specific active contour method based on Fourier descriptors, which incorporates information about the global shape of each object. Finally, the specific points delimiting each extracted boundary are located, and the distance is measured between a specific pair of points, as represented by the random variables in Figure 1. This distance measure is expressed as the number of pixels in a facial image. As there is a sequence of such images, the actual measured distances are interval-valued variables. Thus, for example, the eye span distance X 1 for case HUS1 is X 1 = [168.86, 172.84] for this series of images. Notably, different conditions of alignment, illumination, pose, and occlusion cause variation in the distances extracted from different images of the same person. The study that generated the data set involved nine men and three sequences for each subject for a total of m = 27 cases. The complete data set is provided in Table 1.   It is important to note that the data in Table 1 are aggregated. There are 27 interval-valued cases; if each case is drawn from a sequence of 1000 images, then there are 27,000 classical point observations in R 6 . 
An underlying assumption of the standard classical analysis is that all 27,000 observations are independent. However, this is not the case in this data set. The data values for each face form a set of 1000 dependent observations. Therefore, if we were to use each image as a statistical unit by performing classical analysis, then we would lose information about the dependence contained in the 27,000 observations. The resulting principal component analysis would look for axes that maximize the variability across all 27,000 images, regardless of whether some images belong to the same sequence. In contrast, as interval-valued observations are obtained from each sequence, the Best Point method extracts the principal component axes that maximize the variability in each interval (i.e., those that maximize the internal variability), thereby retaining information on the dependency among the 1000 images in each sequence.
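The aggregation step described here, from classical point observations to one interval-valued case per sequence, can be sketched as follows. The sketch assumes that each interval is simply the observed range (minimum and maximum) of a variable within a sequence, which may differ from how the intervals in Table 1 were built; the labels and numbers are invented.

```python
from collections import defaultdict


def aggregate_to_intervals(records):
    """Group classical observations by case label and return, for each case,
    one (min, max) interval per variable."""
    groups = defaultdict(list)
    for label, values in records:
        groups[label].append(values)
    intervals = {}
    for label, rows in groups.items():
        # zip(*rows) iterates over the variables (columns) of this sequence
        intervals[label] = [(min(col), max(col)) for col in zip(*rows)]
    return intervals


# Hypothetical measurements: (sequence label, two distances in pixels)
records = [
    ("seqA", [170.1, 58.2]),
    ("seqA", [168.9, 59.0]),
    ("seqA", [172.8, 58.5]),
    ("seqB", [155.3, 51.0]),
    ("seqB", [157.0, 52.4]),
]

symbolic_table = aggregate_to_intervals(records)
for label, row in symbolic_table.items():
    print(label, row)
```

Each sequence collapses to a single row of intervals, so the dependence among the images of one sequence is carried by the width of the intervals instead of by repeated rows.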

Comparison between the Center, Vertex, and Best Point Methods
We applied the vertex, center, and Best Point principal component methods to the data in Table 1. The Best Point principal component method was run with two different goals: (1) to minimize the squared distance and (2) to maximize the variance. Hereafter, the Best Point principal component method that minimizes the squared distance is called Best Point Distance, and the one that maximizes the variance is called Best Point Variance. Table 2 shows the first two principal components generated by the four methods. Figure 2 compares the four methods on the data in Table 1. The numerical results confirm that the Best Point Distance and Best Point Variance methods separated the classes much better than the other methods. Table 3 compares the accumulated variance of the vertex, center, Best Point Distance, and Best Point Variance principal component methods. The best methods are clearly Best Point Distance and Best Point Variance, which, by the third principal component, reached 91.25% and 99.72% of the accumulated variance, respectively; both results were far superior to those of the center and vertex methods.
As shown in Table 4, for the criterion of the minimum distance between the corners of the original hyper-rectangles in R m and the principal components, the Best Point Distance method outperformed the other methods, with a minimum distance of 6676.43. This distance was significantly less than the distances obtained by the other methods.
Table 5. Correlation between the first component of each method and the variables.

Conclusions
This work focused on improving the center and vertex principal component methodology for interval-valued data. Compared with classical methods, symbolic methods based on interval-valued variables have important advantages, such as a lower computational cost and shorter execution times, because much smaller data tables are used. For instance, in the facial recognition example, the data table was reduced from 27,000 cases to only 27 cases. In addition, symbolic methods allow for much better handling and interpretation of data variability. In the facial recognition scenario, the variation of the measured distances from one photo to another of the same person (such as the variation in the eye span X 1 ) was due to the variation in the angle at which the photo was taken.
The Best Point methods proposed in this paper considerably improved both the center method and the vertex method. This is because Best Point Variance maximized the variance explained by the components and Best Point Distance minimized the squared distance between the vertices of the hyper-rectangles and their respective projections. As shown in the tables above, this led to a substantial improvement in all the quality indices used in principal component analysis. The result is better data clustering and, therefore, better prediction.
In future work, a consensus between the Best Point Variance and Best Point Distance methods could be constructed by applying a multiobjective optimization method to the functions ϕ and Λ. Finally, all the proposed algorithms for executing symbolic analyses of interval-valued data are available in the RSDA package in R (see [15]).