Optimized Dimensionality Reduction Methods for Interval-Valued Variables and Their Application to Facial Recognition

Arce Garro, Jorge; Rodríguez Rojas, Oldemar

doi:10.3390/e21101016


by Jorge Arce Garro ¹ and Oldemar Rodríguez Rojas ²,*

¹ National Bank of Costa Rica, 10101 San José, Costa Rica

² School of Mathematics, Research Center in Pure and Applied Mathematics (CIMPA), University of Costa Rica, 10101 San José, Costa Rica

* Author to whom correspondence should be addressed.

Entropy 2019, 21(10), 1016; https://doi.org/10.3390/e21101016

Submission received: 24 September 2019 / Revised: 13 October 2019 / Accepted: 17 October 2019 / Published: 19 October 2019

(This article belongs to the Special Issue Symbolic Entropy Analysis and Its Applications II)


Abstract

The center method, first proposed by Cazes et al. (Rev. Stat. Appl. 1997) and by Douzal-Chouakria et al. (Stat. Anal. Data Mining 2011), extends the well-known Principal Component Analysis (PCA) method to symbolic objects that are characterized by multivalued interval-type variables. In contrast to classical data, symbolic data have internal variation. The authors who originally proposed the center method used the center of a hyper-rectangle in $R^{m}$ as a base point to carry out PCA, followed by the projection of all vertices of the hyper-rectangles as supplementary elements. Since these publications, the center point of the hyper-rectangle has typically been assumed to be the best point for the initial PCA. However, in this paper, we show that this is not always the case if the aim is to maximize the variance of the projections or to minimize the squared distance between the vertices and their respective projections. Instead, we propose the use of an optimization algorithm that maximizes the variance of the projections (or that minimizes the squared distances between the vertices and their respective projections) and thereby finds the optimal point for the initial PCA. The vertices of the hyper-rectangles are then projected as supplementary elements with respect to this optimal point, which we call the “Best Point” for projection. For this purpose, we propose four new algorithms and two new theorems. The proposed methods and algorithms are illustrated using a data set composed of measurements of facial characteristics from a study on facial recognition patterns for use in surveillance. The performance of our approach is compared with that of another procedure in the literature, and the results show that our symbolic analyses provide more accurate information. Our approach can be regarded as an optimization method, as it maximizes the explained variance or minimizes the squared distance between projections and the original points. In addition, the symbolic analyses generate more informative conclusions, compared with the classical analysis in which classical surrogates replace intervals. All the methods proposed in this paper can be executed in the RSDA package developed in R.

Keywords: interval-valued variables; principal component analysis; symbolic data analysis; Best Point method

1. The Center Method

Symbolic data were introduced by Diday in [1]. In contrast to classical data analysis, in which a variable takes a single value, a variable in symbolic data can take a finite or infinite set of values; for example, an interval variable can take any of the infinitely many numerical values between a lower and an upper bound. As Principal Component Analysis (PCA) is one of the most popular multivariate methods for dimension reduction, its extension to symbolic data is important. Many generalizations of PCA have been developed, and several studies have contributed to its extension to interval-valued data. Two of the methods available in the literature are the vertex method and the center method [2,3,4]. In [5], the authors introduced new PCA techniques in order to visualize and compare the structures of interval data. The authors of [6] then proposed an approach that extends the classical PCA method to interval-valued data by using the symbolic covariance to determine the principal component space, so that it reflects the total variation in the interval-valued data. PCA has also been extended to histogram data in a number of studies (see [7,8,9,10,11]).

Most of these methods were developed for interval matrices, where an interval matrix X is defined as

X = [\begin{matrix} [a_{11}, b_{11}] & [a_{12}, b_{12}] & \dots & [a_{1 m}, b_{1 m}] \\ [a_{21}, b_{21}] & [a_{22}, b_{22}] & \dots & [a_{2 m}, b_{2 m}] \\ ⋮ & ⋮ & ⋱ & ⋮ \\ [a_{n 1}, b_{n 1}] & [a_{n 2}, b_{n 2}] & \dots & [a_{n m}, b_{n m}] \end{matrix}],

(1)

where $a_{ij} \leq b_{ij}$ for all $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$ (other authors denote the interval $[a_{ij}, b_{ij}]$ by $[x_{ij}^{lo}, x_{ij}^{up}]$ or $[\underline{x}_{ij}, \overline{x}_{ij}]$). An interval matrix can be regarded as a subset of the set $M_{n \times m}$ of real $n \times m$ matrices; we also denote this subset by $X$, so that $X = \{Z \in M_{n \times m} \mid \forall i \in \{1, 2, \dots, n\}, \forall j \in \{1, 2, \dots, m\}, Z_{ij} \in [a_{ij}, b_{ij}]\}$. In this case, we write $Z \in X$.
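As an illustration of this set-based view, the following minimal sketch stores an interval matrix as two real arrays of lower and upper bounds and tests whether a classical matrix Z belongs to X. The array values and the helper names `is_valid_interval_matrix` and `belongs_to` are hypothetical and chosen only for this example.

```python
import numpy as np

# Hypothetical 3 x 2 interval matrix X, stored as lower bounds a and upper bounds b.
a = np.array([[1.0, 4.0], [2.0, 5.0], [0.0, 3.0]])
b = np.array([[2.0, 6.0], [2.5, 5.0], [1.0, 7.0]])

def is_valid_interval_matrix(a, b):
    """Check a_ij <= b_ij for every entry, as required in Equation (1)."""
    return a.shape == b.shape and bool(np.all(a <= b))

def belongs_to(Z, a, b):
    """Return True when Z_ij lies in [a_ij, b_ij] for all i, j, i.e., Z is in X."""
    return Z.shape == a.shape and bool(np.all((a <= Z) & (Z <= b)))

Z_inside = (a + b) / 2.0   # the midpoints always belong to X
Z_outside = b + 1.0        # every entry exceeds its upper bound
```

Note that a trivial interval (such as the entry [5, 5] above) is allowed, since only $a_{ij} \leq b_{ij}$ is required.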

The center matrix of X is defined as

X^{c} = [\begin{matrix} X_{11}^{c} & X_{12}^{c} & \dots & X_{1 m}^{c} \\ X_{21}^{c} & X_{22}^{c} & \dots & X_{2 m}^{c} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ X_{n 1}^{c} & X_{n 2}^{c} & \dots & X_{n m}^{c} \end{matrix}],

(2)

where

X_{i j}^{c} = \frac{a_{i j} + b_{i j}}{2} .

(3)

Here, $X^{c} \in X$ and, for $i = 1, \dots, n$ and $j = 1, \dots, m$, $X_{ij}^{c}$ is a real number and not an interval. Thus, the center matrix (2) is a classical matrix. The center principal component method starts from the center matrix; in other words, classical PCA is applied to the center matrix $X^{c}$. Then, the kth principal component of the centers is

Y_{(k)}^{c} = X^{c} v_{k}^{c},

(4)

where $v_{k}^{c}$ is the kth eigenvector of the variance–covariance matrix of $X^{c}$, defined in Equation (2). For cases (rows) $i = 1, \dots, n$, the kth principal component for an interval variable is constructed as follows. Let $Y_{ik}^{c} = [y_{ik}^{lo}, y_{ik}^{up}]$ be the kth interval principal component of case i. Then,

y_{i k}^{l o} = \sum_{j \in J_{c}^{-}} (b_{i j} - {\bar{X}}_{(j)}) v_{k j}^{c} + \sum_{j \in J_{c}^{+}} (a_{i j} - {\bar{X}}_{(j)}) v_{k j}^{c},

(5)

y_{i k}^{u p} = \sum_{j \in J_{c}^{-}} (a_{i j} - {\bar{X}}_{(j)}) v_{k j}^{c} + \sum_{j \in J_{c}^{+}} (b_{i j} - {\bar{X}}_{(j)}) v_{k j}^{c},

(6)

where $J_{c}^{-} = \{j \mid v_{kj}^{c} < 0\}$, $J_{c}^{+} = \{j \mid v_{kj}^{c} \geq 0\}$, and $\bar{X}_{(j)}$ is the mean of the jth column of $X^{c}$. More details can be found in [2].
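The construction in Equations (3) to (6) can be sketched as follows. This is a minimal illustration, not the RSDA implementation: the function name `center_pca_intervals`, the random test data, and the use of the population covariance (`bias=True`) are assumptions made here, and the center co-ordinates are computed on column-centered data.

```python
import numpy as np

def center_pca_intervals(a, b, s):
    """Center method sketch: PCA on the midpoints, then Eqs. (5)-(6) for the intervals."""
    centers = (a + b) / 2.0                         # center matrix X^c, Eq. (3)
    mean = centers.mean(axis=0)                     # column means of X^c
    cov = np.cov(centers, rowvar=False, bias=True)  # variance-covariance matrix of X^c
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    V = eigvecs[:, np.argsort(eigvals)[::-1][:s]]   # column k holds v_k^c
    neg, pos = np.minimum(V, 0.0), np.maximum(V, 0.0)  # weights for J_c^- and J_c^+
    y_lo = (b - mean) @ neg + (a - mean) @ pos      # Eq. (5)
    y_up = (a - mean) @ neg + (b - mean) @ pos      # Eq. (6)
    y_center = (centers - mean) @ V                 # centered version of Eq. (4)
    return y_lo, y_up, y_center

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 3))                         # hypothetical lower bounds
b = a + rng.uniform(0.1, 1.0, size=(5, 3))          # hypothetical upper bounds
y_lo, y_up, y_center = center_pca_intervals(a, b, s=2)
```

Splitting each eigenvector into its negative and nonnegative parts implements the index sets $J_{c}^{-}$ and $J_{c}^{+}$ without explicit loops, and the projected center always lies inside the resulting interval.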

The dual problem in the center PCA method was introduced by Rodriguez in [12]. To generalize duality relations, we let D be an interval matrix, defined as

D_{i j} = [\frac{a_{i j} - {\bar{X}}_{(j)}^{c}}{\sqrt{n} σ_{(j)}}, \frac{b_{i j} - {\bar{X}}_{(j)}^{c}}{\sqrt{n} σ_{(j)}}],

for $i = 1, \dots, n$ and $j = 1, \dots, m$, with $\bar{X}_{(j)}^{c}$ and $σ_{(j)}$ denoting the average and standard deviation of column j, respectively. The formulas shown in Theorem 1 are thus obtained and can be used to calculate the projections of interval variables.

Theorem 1.

If the hyper-rectangle defined by the ith column of D is projected onto the jth principal component (in the direction of $v_{j}$), then the minimum and maximum values of the projection can be computed by Equations (7) and (8), respectively.

$\underline{r}_{ij} = \sum_{k = 1, v_{kj} < 0}^{n} \overline{d}_{ki}^{c} v_{kj} + \sum_{k = 1, v_{kj} > 0}^{n} \underline{d}_{ki}^{c} v_{kj},$

(7)

$\overline{r}_{ij} = \sum_{k = 1, v_{kj} < 0}^{n} \underline{d}_{ki}^{c} v_{kj} + \sum_{k = 1, v_{kj} > 0}^{n} \overline{d}_{ki}^{c} v_{kj} .$

(8)

Here, $\underline{d}_{ki}^{c}$ and $\overline{d}_{ki}^{c}$ denote the lower and upper endpoints of the interval $D_{ki}$.

The proof of this theorem can be found in [12,13].

2. The Best Point Method

Let X be an $n \times m$ matrix of interval variables and let $Z \in X$. If we apply PCA to the matrix Z, then the kth principal component of Z for an observation $ξ_{i}$, with $k = 1, \dots, s < m$ and $i = 1, \dots, n$, is

$y_{ik}^{Z} = \sum_{j = 1}^{m} (Z_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z},$

(9)

where $\bar{Z}_{(j)}$ is the average of the variable $Z_{(j)}$ (i.e., $\bar{Z}_{(j)} = \frac{1}{n} \sum_{i = 1}^{n} Z_{ij}$), and $w_{k}^{Z} = (w_{k_{1}}^{Z}, \dots, w_{k_{m}}^{Z})$ is the kth eigenvector associated with the variance–covariance matrix of Z. It is clear that $β (Z) = \{w_{1}^{Z}, \dots, w_{m}^{Z}\}$ is an orthonormal basis of $R^{m}$.

For the matrix X defined in (1), we let $X_{i} = ([a_{i1}, b_{i1}], \dots, [a_{im}, b_{im}])$ for $i = 1, \dots, n$. Then, we define the vertex matrix for an observation i as

X_{i}^{v} = [\begin{matrix} a_{i 1} & a_{i 2} & \dots & a_{i m} \\ a_{i 1} & a_{i 2} & \dots & b_{i m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ b_{i 1} & b_{i 2} & \dots & a_{i m} \\ b_{i 1} & b_{i 2} & \dots & b_{i m} \end{matrix}] .

(10)

Thus, the vertex matrix of X is

X^{v} = [\begin{matrix} [\begin{matrix} a_{11} & a_{12} & \dots & a_{1 m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ b_{11} & b_{12} & \dots & b_{1 m} \end{matrix}] \\ \begin{matrix} ⋮ & ⋮ & ⋮ & ⋮ \end{matrix} \\ [\begin{matrix} a_{i 1} & a_{i 2} & \dots & a_{i m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ b_{i 1} & b_{i 2} & \dots & b_{i m} \end{matrix}] \\ \begin{matrix} ⋮ & ⋮ & ⋮ & ⋮ \end{matrix} \\ [\begin{matrix} a_{n 1} & a_{n 2} & \dots & a_{n m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ b_{n 1} & b_{n 2} & \dots & b_{n m} \end{matrix}] \end{matrix}] .

(11)

Next, the rows of the vertex matrix of X are projected as supplementary elements in the PCA of Z. We define the supplementary vertex matrix as

{\tilde{X}}_{i}^{v} (Z) = [\begin{matrix} \frac{a_{i 1} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}} & \frac{a_{i 2} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}} & \dots & \frac{a_{i m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}} \\ \frac{a_{i 1} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}} & \frac{a_{i 2} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}} & \dots & \frac{b_{i m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \frac{b_{i 1} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}} & \frac{b_{i 2} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}} & \dots & \frac{b_{i m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}} \end{matrix}],

(12)

where $σ_{(j)}$ is the standard deviation of $Z_{(j)}$. To simplify the notation, we denote the entries of the matrix $\tilde{X}_{i}^{v} (Z)$ by $\tilde{x}_{i_{tj}}^{v} (Z)$, with $t = 1, \dots, 2^{m_{i}}$ indexing the rows (vertices), in which $m_{i}$ is the number of nontrivial intervals, and $j = 1, \dots, m$ indexing the columns. Then, the co-ordinates are obtained:

C^{k} (x_{i_{j}}^{v}) = \sum_{h = 1}^{m} {\tilde{x}}_{i_{j h}}^{v} (Z) w_{k_{h}},

(13)

with $j = 1, \dots, 2^{m_{i}}$, in which $m_{i}$ is the number of nontrivial intervals. Then, the minimum and maximum of the interval can be calculated:

$\tilde{Y}_{ik}^{v} = \tilde{y}_{ik} = [\tilde{y}_{ik}^{a_{Z}}, \tilde{y}_{ik}^{b_{Z}}] \quad \text{with } k = 1, \dots, s < m,$

(14)

and

$\tilde{y}_{ik}^{a_{Z}} = \min_{j = 1, \dots, 2^{m_{i}}} C^{k} (x_{i_{j}}^{v}),$

(15)

$\tilde{y}_{ik}^{b_{Z}} = \max_{j = 1, \dots, 2^{m_{i}}} C^{k} (x_{i_{j}}^{v}) .$

(16)

The formulas in the following theorem allow us to compute Equations (15) and (16) much more quickly.

Theorem 2.

The co-ordinates of $\tilde{Y}_{ik}^{v_{Z}}$ can be found as follows:

$\tilde{y}_{ik}^{a_{Z}} = \sum_{j \in J_{Z}^{-}} (b_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} + \sum_{j \in J_{Z}^{+}} (a_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z},$

$\tilde{y}_{ik}^{b_{Z}} = \sum_{j \in J_{Z}^{-}} (a_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} + \sum_{j \in J_{Z}^{+}} (b_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z},$

where $J_{Z}^{-} = \{j \mid w_{k_{j}}^{Z} < 0\}$, $J_{Z}^{+} = \{j \mid w_{k_{j}}^{Z} \geq 0\}$, and $\bar{Z}_{(j)}$ is the mean of the jth column of Z.

Proof.

Let $Z \in X$; then,

$\forall i, \forall j, \quad Z_{ij} \in X_{ij} = [a_{ij}, b_{ij}] .$

(17)

As $a_{ij}$ and $b_{ij}$ are supplementary elements in the PCA of Z, they must first be centered with respect to the columns (variables) of Z. Thus, we consider

$\forall i, \forall j, \quad a_{ij} - \bar{Z}_{(j)}, \quad b_{ij} - \bar{Z}_{(j)} .$

(18)

Then, from Equations (17) and (18),

$\forall i, \forall j, \quad Z_{ij} - \bar{Z}_{(j)} \in X_{ij} - \bar{Z}_{(j)} = [a_{ij} - \bar{Z}_{(j)}, b_{ij} - \bar{Z}_{(j)}] .$

(19)

Case 1: $\forall j, w_{k_{j}}^{Z} > 0$. Multiplying (19) by a positive weight preserves the order of the endpoints, so

$\forall j, \forall i, \quad w_{k_{j}}^{Z} (Z_{ij} - \bar{Z}_{(j)}) \in [w_{k_{j}}^{Z} (a_{ij} - \bar{Z}_{(j)}), w_{k_{j}}^{Z} (b_{ij} - \bar{Z}_{(j)})] \Rightarrow \sum_{j = 1}^{m} (a_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j = 1}^{m} (Z_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j = 1}^{m} (b_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} .$

Case 2: $\forall j, w_{k_{j}}^{Z} < 0$. Multiplying (19) by a negative weight reverses the order of the endpoints, so

$\forall j, \forall i, \quad w_{k_{j}}^{Z} (Z_{ij} - \bar{Z}_{(j)}) \in [w_{k_{j}}^{Z} (b_{ij} - \bar{Z}_{(j)}), w_{k_{j}}^{Z} (a_{ij} - \bar{Z}_{(j)})] \Rightarrow \sum_{j = 1}^{m} (b_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j = 1}^{m} (Z_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j = 1}^{m} (a_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} .$

Case 3 (general case): Let $J_{Z}^{-} = \{j \mid w_{k_{j}}^{Z} < 0\}$ and $J_{Z}^{+} = \{j \mid w_{k_{j}}^{Z} \geq 0\}$.

For Case 1 applied to $J_{Z}^{+}$,

$\sum_{j \in J_{Z}^{+}} (a_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j \in J_{Z}^{+}} (Z_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j \in J_{Z}^{+}} (b_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} .$

(20)

For Case 2 applied to $J_{Z}^{-}$,

$\sum_{j \in J_{Z}^{-}} (b_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j \in J_{Z}^{-}} (Z_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j \in J_{Z}^{-}} (a_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} .$

(21)

Therefore, adding Equations (20) and (21), we obtain

$\sum_{j \in J_{Z}^{-}} (b_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} + \sum_{j \in J_{Z}^{+}} (a_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j = 1}^{m} (Z_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} \leq \sum_{j \in J_{Z}^{-}} (a_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} + \sum_{j \in J_{Z}^{+}} (b_{ij} - \bar{Z}_{(j)}) w_{k_{j}}^{Z} .$

Therefore,

{\tilde{y}}_{i k}^{a_{Z}} = \sum_{j \in J_{Z}^{-}} (b_{i j} - {\bar{Z}}_{(j)}) w_{k_{j}}^{Z} + \sum_{j \in J_{Z}^{+}} (a_{i j} - {\bar{Z}}_{(j)}) w_{k_{j}}^{Z},

{\tilde{y}}_{i k}^{b_{Z}} = \sum_{j \in J_{Z}^{-}} (a_{i j} - {\bar{Z}}_{(j)}) w_{k_{j}}^{Z} + \sum_{j \in J_{Z}^{+}} (b_{i j} - {\bar{Z}}_{(j)}) w_{k_{j}}^{Z} .

□
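The practical benefit of Theorem 2 is that the extremes over all $2^{m_{i}}$ projected vertices are obtained with a single pass over the m variables. The sketch below compares a brute-force enumeration of the vertices with the closed form for one observation. The function names and the numerical values are hypothetical, and, as in the statement of the theorem, the vertices are only centered (a positive rescaling of each variable would not change which endpoint is selected).

```python
import itertools
import numpy as np

def vertex_extremes(a_i, b_i, z_mean, w_k):
    """Brute force over all vertices of one hyper-rectangle, in the spirit of Eqs. (13)-(16)."""
    coords = [
        float(np.dot(np.array(vertex) - z_mean, w_k))
        for vertex in itertools.product(*zip(a_i, b_i))
    ]
    return min(coords), max(coords)

def theorem2_extremes(a_i, b_i, z_mean, w_k):
    """Closed form of Theorem 2: choose a_ij or b_ij from the sign of w_kj."""
    negative = w_k < 0                                   # index set J_Z^-
    low = np.dot(np.where(negative, b_i, a_i) - z_mean, w_k)
    high = np.dot(np.where(negative, a_i, b_i) - z_mean, w_k)
    return float(low), float(high)

a_i = np.array([1.0, -2.0, 0.5])       # hypothetical lower bounds of observation i
b_i = np.array([2.0, 0.0, 0.5])        # the third interval is trivial (a = b)
z_mean = np.array([1.5, -1.0, 0.3])    # hypothetical column means of Z
w_k = np.array([0.6, -0.7, 0.3])       # hypothetical kth eigenvector
brute = vertex_extremes(a_i, b_i, z_mean, w_k)
fast = theorem2_extremes(a_i, b_i, z_mean, w_k)
```

The brute-force version costs on the order of $m \cdot 2^{m}$ operations per observation and component, whereas the closed form costs on the order of m.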

The following theorem provides the co-ordinates in the variable space; this is a dual relationship. We first need to center and standardize the matrix Z:

$\tilde{Z}_{ij} = \frac{Z_{ij} - \bar{Z}_{(j)}}{\sqrt{n} σ_{(j)}} .$

Next, we focus on the matrix $\tilde{Z} = (\tilde{Z}_{ij})$, with $i = 1, \dots, n$ and $j = 1, \dots, m$. Let $\tilde{z}^{j}$ be the jth column of $\tilde{Z}$, with $(\tilde{z}^{j})^{t} \cdot \tilde{z}^{i} = R (i, j) \leq 1$, where $R (i, j)$ is the correlation between columns i and j of Z. Then, the interval matrix is centered and standardized with respect to Z:

\tilde{X} (Z) = [\begin{matrix} [\frac{a_{11} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}}, \frac{b_{11} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}}] & [\frac{a_{12} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}}, \frac{b_{12} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}}] & \dots & [\frac{a_{1 m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}}, \frac{b_{1 m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}}] \\ ⋮ & ⋮ & ⋱ & ⋮ \\ [\frac{a_{i 1} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}}, \frac{b_{i 1} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}}] & [\frac{a_{i 2} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}}, \frac{b_{i 2} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}}] & \dots & [\frac{a_{i m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}}, \frac{b_{i m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}}] \\ ⋮ & ⋮ & ⋱ & ⋮ \\ [\frac{a_{n 1} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}}, \frac{b_{n 1} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}}] & [\frac{a_{n 2} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}}, \frac{b_{n 2} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}}] & \dots & [\frac{a_{n m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}}, \frac{b_{n m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}}] \end{matrix}] .

(22)

To facilitate the analysis, we define

{(\tilde{X} (Z))}_{i j} = [\frac{a_{i j} - {\bar{Z}}_{(j)}}{\sqrt{n} σ_{(j)}}, \frac{b_{i j} - {\bar{Z}}_{(j)}}{\sqrt{n} σ_{(j)}}] = [a_{i j}^{Z}, b_{i j}^{Z}] .

The inertia matrix $\tilde{Z} \tilde{Z}^{t}$ is symmetric and positive semidefinite, so its eigenvectors can be chosen to be orthonormal and its eigenvalues are real and nonnegative. We let $v_{1}^{Z}, v_{2}^{Z}, \dots, v_{s}^{Z}$ denote the s eigenvectors of $\tilde{Z} \tilde{Z}^{t}$ associated with eigenvalues $λ_{1}, λ_{2}, \dots, λ_{s} \geq 0$. Then, $V (Z) = [v_{1}^{Z} | v_{2}^{Z} | \dots | v_{s}^{Z}]$ is defined as the matrix of size $n \times s$ whose columns are these eigenvectors of $\tilde{Z} \tilde{Z}^{t}$. We can compute the co-ordinates of the variables in the correlation circle as $\tilde{Z}^{t} V$, and we can then compute the co-ordinate of the ith column of $\tilde{Z}$ on the jth principal component (in the $v_{j}^{Z}$ direction) using Equation (23):

$r_{ij}^{Z} = \sum_{k = 1}^{n} \tilde{Z}_{ki} v_{kj}^{Z} .$

(23)

The next theorem proves the duality relation of any matrix that belongs to an interval matrix.

Theorem 3.

If the hyper-rectangle defined by the ith column of $\tilde{X} (Z)$ is projected onto the jth principal component (in the direction of $v_{j}^{Z}$), then the minimum and maximum values of the projection can be obtained by Equations (24) and (25), respectively.

$\underline{r}_{ij} = \sum_{k = 1, v_{kj} < 0}^{n} b_{ki}^{Z} v_{kj} + \sum_{k = 1, v_{kj} > 0}^{n} a_{ki}^{Z} v_{kj},$

(24)

$\overline{r}_{ij} = \sum_{k = 1, v_{kj} < 0}^{n} a_{ki}^{Z} v_{kj} + \sum_{k = 1, v_{kj} > 0}^{n} b_{ki}^{Z} v_{kj} .$

(25)

Proof.

The proof of this theorem is similar to the proof of Theorem 2. □

In the above, we prove that $p {\hat{z}}_{ij} \in [\underline{r}_{ij}, \overline{r}_{ij}]$ and that $\underline{r}_{ij}$ and $\overline{r}_{ij}$ are a combination of the projections of the vertices of the hyper-rectangle in $R^{m}$. We can form duality relations between the eigenvectors of $\tilde{Z} \tilde{Z}^{t}$ and $\tilde{Z}^{t} \tilde{Z}$: both matrices have the same s positive eigenvalues $λ_{1}^{Z}, λ_{2}^{Z}, \dots, λ_{s}^{Z}$, and if $u_{1}^{Z}, u_{2}^{Z}, \dots, u_{s}^{Z}$ are the first s eigenvectors of $\tilde{Z}^{t} \tilde{Z}$, then the relations between the eigenvectors of $\tilde{Z} \tilde{Z}^{t}$ and $\tilde{Z}^{t} \tilde{Z}$ can be computed by Equations (26) and (27):

$u_{ℓ}^{Z} = \frac{\tilde{Z}^{t} v_{ℓ}^{Z}}{\sqrt{λ_{ℓ}}} \quad \text{for } ℓ = 1, 2, \dots, s .$

(26)

$v_{ℓ}^{Z} = \frac{\tilde{Z} u_{ℓ}^{Z}}{\sqrt{λ_{ℓ}}} \quad \text{for } ℓ = 1, 2, \dots, s .$

(27)
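The duality relations (26) and (27) can be checked numerically. The following sketch uses a random matrix Z as a stand-in for a member of X (the data and variable names are hypothetical) and takes $σ_{(j)}$ to be the population standard deviation, which is an assumption made here so that $\tilde{Z}^{t} \tilde{Z}$ has a unit diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))                        # hypothetical Z with n = 6, m = 3
n = Z.shape[0]
Z_tilde = (Z - Z.mean(axis=0)) / (np.sqrt(n) * Z.std(axis=0))  # centered and standardized

lam_v, V = np.linalg.eigh(Z_tilde @ Z_tilde.T)     # n x n inertia matrix
lam_u, U = np.linalg.eigh(Z_tilde.T @ Z_tilde)     # m x m matrix
lam_v, V = lam_v[::-1], V[:, ::-1]                 # reorder to descending eigenvalues
lam_u, U = lam_u[::-1], U[:, ::-1]

s = 2
U_from_V = Z_tilde.T @ V[:, :s] / np.sqrt(lam_v[:s])   # Eq. (26)
V_from_U = Z_tilde @ U[:, :s] / np.sqrt(lam_u[:s])     # Eq. (27)
```

Eigenvectors are only defined up to sign, so `U_from_V` and `U[:, :s]` agree column by column up to a factor of plus or minus one. In practice, this means that only the smaller of the two eigenproblems needs to be solved.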

Above, we provide the theory to apply PCA to all matrices $Z \in X$. Now, we aim to find a matrix $Z^{*} \in X$ that is optimal for one of two criteria: (1) the minimization of the square of the distance from the vertices of the hypercubes to the principal axes of Z, or (2) the maximization of the variance of the first components of Z. We develop these concepts in the following two sections.

2.1. Minimizing the Square of the Distance from the Hypercube Vertices to the Principal Axes of Z

Let X be an $n \times m$ interval matrix, $Z \in X$, and $β (Z) = \{w_{1}^{Z}, \dots, w_{s}^{Z}\}$, with $s \leq m$ and $w_{i}^{Z}$ the eigenvectors of the variance–covariance matrix of Z. We let $X^{v}$ denote the vertex matrix of X and $N = \sum_{i = 1}^{n} 2^{m_{i}}$ its number of rows, in which $m_{i}$ is the number of nontrivial intervals for case $ξ_{i}$. Then, the centered and standardized vertex matrix with respect to Z has the following form:

{\tilde{X}}^{v} (Z) = [\begin{matrix} [\begin{matrix} \frac{a_{11} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}} & \frac{a_{12} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}} & \dots & \frac{a_{1 m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ \frac{b_{11} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}} & \frac{b_{12} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}} & \dots & \frac{b_{1 m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}} \end{matrix}] \\ \begin{matrix} ⋮ & ⋮ & ⋮ & ⋮ \end{matrix} \\ [\begin{matrix} \frac{a_{n 1} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}} & \frac{a_{n 2} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}} & \dots & \frac{a_{n m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ \frac{b_{n 1} - {\bar{Z}}_{(1)}}{\sqrt{n} σ_{(1)}} & \frac{b_{n 2} - {\bar{Z}}_{(2)}}{\sqrt{n} σ_{(2)}} & \dots & \frac{b_{n m} - {\bar{Z}}_{(m)}}{\sqrt{n} σ_{(m)}} \end{matrix}] \end{matrix}] .

(28)

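Let $φ : X \to R^{+} \cup \{0\}$ be the function defined by

$φ (Z) = \sum_{i = 1}^{N} {|| \tilde{X}_{i}^{v} (Z) - P r_{β (Z)} (\tilde{X}_{i}^{v} (Z)) ||}^{2},$

(29)

where $|| \cdot ||$ is the Euclidean norm, $\tilde{X}_{i}^{v} (Z)$ is the ith row of $\tilde{X}^{v} (Z)$, and $P r_{β (Z)}$ denotes the orthogonal projection onto the subspace spanned by $β (Z)$. To compute $φ (Z)$, we propose Algorithm 1: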

Algorithm 1 The computation of $φ$.

Require: X, an $n \times m$ matrix of intervals; $Z \in X$; s, the number of principal components.
Ensure: $φ (Z)$.

1: Apply PCA to Z.
2: $β (Z) = \{w_{1}, \dots, w_{s}\}$, with $s \leq m$ and $w_{i}$ the eigenvectors of the variance–covariance matrix of Z.
3: Compute the vertex matrix of X, $X^{v}$.
4: Compute the vertex matrix of X centered and standardized with respect to Z, $\tilde{X}^{v} (Z)$.
5: $φ (Z) = \sum_{i = 1}^{N} {|| \tilde{X}_{i}^{v} (Z) - P r_{β (Z)} (\tilde{X}_{i}^{v} (Z)) ||}^{2}$.
6: return $φ (Z)$.
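The steps of Algorithm 1 can be sketched in a few lines. This is an illustrative sketch and not the RSDA code: the function name `phi`, the random test data, and the choice of running the PCA on the standardized matrix (so that the principal axes and the standardized vertices live in the same scale) are assumptions made here. It also assumes that every column of Z has a nonzero standard deviation.

```python
import itertools
import numpy as np

def phi(Z, a, b, s):
    """Sum of squared distances from the standardized vertices to the first s principal axes of Z."""
    n = Z.shape[0]
    mean = Z.mean(axis=0)
    scale = np.sqrt(n) * Z.std(axis=0)                 # sqrt(n) * sigma_(j), assumed nonzero
    Z_tilde = (Z - mean) / scale
    eigvals, eigvecs = np.linalg.eigh(Z_tilde.T @ Z_tilde)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:s]]      # m x s orthonormal basis beta(Z)
    total = 0.0
    for a_i, b_i in zip(a, b):
        # A nontrivial interval contributes two endpoints, a trivial one a single value.
        sides = [(lo,) if lo == hi else (lo, hi) for lo, hi in zip(a_i, b_i)]
        vertices = (np.array(list(itertools.product(*sides))) - mean) / scale
        residual = vertices - vertices @ W @ W.T       # part orthogonal to span(beta(Z))
        total += float(np.sum(residual ** 2))
    return total

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 3))                            # hypothetical lower bounds
b = a + rng.uniform(0.1, 1.0, size=(5, 3))             # hypothetical upper bounds
Z = (a + b) / 2.0                                      # the center matrix is one member of X
```

Because the vertex matrix has $N = \sum_{i} 2^{m_{i}}$ rows, the cost of one evaluation grows exponentially with the number of nontrivial interval variables, which matters when φ is evaluated repeatedly inside an optimizer.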

As X is a finite Cartesian product of compact intervals, it is a compact set; since $φ$ is a continuous function on X, it always reaches its minimum and its maximum. In this case, the aim is to obtain the matrix $Z \in X$ that minimizes the squared distance between the rows of the vertex matrix $X^{v}$ and their projections. The problem that we aim to solve is

\begin{matrix} Minimize & φ (Z) = \sum_{i = 1}^{N} {|| {\tilde{X}}_{i}^{v} (Z) - P r_{β (Z)} ({\tilde{X}}_{i}^{v} (Z)) ||}^{2} \\ Subject to & Z \in X . \end{matrix}

(30)

Definition 1.

The matrix $Z \in X$ that solves Problem (30) is the optimal matrix with respect to distance, which is denoted by $Z^{φ}$.

To perform the optimization that computes $Z^{φ}$, we propose Algorithm 2:

Algorithm 2 The computation of the Best Matrix with respect to the distances of the vertices.

Require: X, a symbolic matrix of intervals of dimension $n \times m$; $Z \in X$; s, the number of principal components; $T O L$, the variation tolerance between iterations; N, the maximum number of iterations.
Ensure: $\tilde{Y}^{V_{Z^{φ}}}$.

1: Consider $Z = X^{c}$, the center matrix (2), to be the initial value.
2: Get $Z^{φ}$ by means of an optimization algorithm $(initialvalue = Z, function = φ (Z), T O L, N)$.
3: Get $\tilde{Y}^{V_{Z^{φ}}}$ using Theorem 2.
4: return $\tilde{Y}^{V_{Z^{φ}}}$.
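Algorithm 2 does not prescribe a particular optimizer. The sketch below illustrates steps 1 and 2 with a naive random local search that stays inside the box $a_{ij} \leq Z_{ij} \leq b_{ij}$; this search is a stand-in chosen here for simplicity and is not the optimizer used by the authors. The function names, the step size, and the way the tolerance is used (as a minimum required improvement) are assumptions. Step 3 would then apply Theorem 2 to the returned matrix.

```python
import itertools
import numpy as np

def phi(Z, a, b, s):
    """Squared distance of the standardized vertices to the first s principal axes of Z."""
    mean = Z.mean(axis=0)
    scale = np.sqrt(Z.shape[0]) * Z.std(axis=0)        # assumed nonzero
    Z_tilde = (Z - mean) / scale
    eigvals, eigvecs = np.linalg.eigh(Z_tilde.T @ Z_tilde)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:s]]
    total = 0.0
    for a_i, b_i in zip(a, b):
        vertices = (np.array(list(itertools.product(*zip(a_i, b_i)))) - mean) / scale
        total += float(np.sum((vertices - vertices @ W @ W.T) ** 2))
    return total

def best_point_distance(a, b, s, tol=1e-8, max_iter=200, seed=0):
    """Algorithm 2 sketch: random local search as a placeholder for the optimizer."""
    rng = np.random.default_rng(seed)
    Z = (a + b) / 2.0                                  # step 1: start from the center matrix
    best = phi(Z, a, b, s)
    step = (b - a) / 4.0                               # hypothetical perturbation size
    for _ in range(max_iter):
        candidate = np.clip(Z + rng.uniform(-1.0, 1.0, Z.shape) * step, a, b)
        value = phi(candidate, a, b, s)
        if value < best - tol:                         # keep only clear improvements
            Z, best = candidate, value
    return Z, best

rng = np.random.default_rng(1)
a = rng.normal(size=(5, 3))                            # hypothetical lower bounds
b = a + rng.uniform(0.1, 1.0, size=(5, 3))             # hypothetical upper bounds
Z_phi, phi_value = best_point_distance(a, b, s=1, max_iter=50)
```

Since candidates are clipped to the bounds and only improvements are accepted, the returned matrix belongs to X and its objective value is never worse than that of the center matrix; a gradient-based or derivative-free box-constrained optimizer would normally converge much faster.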

2.2. Maximizing the Variance of the First Components

Let X be an interval matrix of dimension $n \times m$, $Z \in X$, and $β (Z) = \{w_{1}^{Z}, \dots, w_{s}^{Z}\}$, with $s \leq m$ and $w_{i}^{Z}$ the eigenvectors of the variance–covariance matrix of Z, and let $λ (Z) = \{λ_{1}^{Z}, \dots, λ_{s}^{Z}\}$ denote the set of associated eigenvalues of the variance–covariance matrix of Z. We define the function $Λ : X \times N \to R^{+}$ as $Λ (Z, s) = \sum_{i = 1}^{s} λ_{i}^{Z}$. To compute $Λ (Z, s)$, we propose Algorithm 3:

Algorithm 3 The computation of $Λ$.

Require: X, an $n \times m$ symbolic matrix of intervals; $Z \in X$; s, the number of principal components.
Ensure: $Λ (Z, s)$.

1: Apply PCA to Z.
2: $λ (Z) = \{λ_{1}^{Z}, \dots, λ_{s}^{Z}\}$, the set of associated eigenvalues of the variance–covariance matrix of Z.
3: $Λ (Z, s) = \sum_{i = 1}^{s} λ_{i}^{Z}$.
4: return $Λ (Z, s)$.
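Algorithm 3 amounts to summing the s largest eigenvalues of the variance–covariance matrix of Z. A minimal sketch follows; the function name `accumulated_inertia`, the test matrix, and the use of the population covariance (`bias=True`) are assumptions made for illustration.

```python
import numpy as np

def accumulated_inertia(Z, s):
    """Lambda(Z, s): sum of the s largest eigenvalues of the variance-covariance matrix of Z."""
    cov = np.cov(Z, rowvar=False, bias=True)       # m x m population covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]        # eigenvalues in descending order
    return float(np.sum(eigvals[:s]))

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))                        # hypothetical member of X
```

When s equals m, the sum is the trace of the covariance matrix, that is, the total variance of Z.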

As above, since $Z \in X$ and X is a compact set (a finite Cartesian product of compact intervals), and since, for a fixed number s of principal components, $Λ (Z, s)$ is a continuous function of Z, it always reaches its minimum and its maximum on X. In this case, the aim is to obtain the matrix Z that maximizes the accumulated inertia in the first s principal components. The problem that we want to solve is

\begin{matrix} Maximize & Λ (Z, s) = \sum_{i = 1}^{s} λ_{i}^{Z} \\ Subject to & Z \in X . \end{matrix}

(31)

Definition 2.

The matrix $Z \in X$ that solves Problem (31) is the optimal matrix with respect to inertia, denoted by $Z^{Λ}$.

To perform the optimization that computes $Z^{Λ}$, we propose Algorithm 4:

Algorithm 4 The computation of the Best Matrix with respect to inertia.

Require: X, an $n \times m$ symbolic matrix of intervals; $Z \in X$; s, the number of principal components.
Ensure: $\tilde{Y}^{V_{Z^{Λ}}}$.

1: Consider $Z = X^{c}$, the center matrix (2), as the initial value.
2: Get $Z^{Λ}$ by means of the optimization algorithm $(initialvalue = Z, function = Λ (Z, s))$.
3: Get $\tilde{Y}^{V_{Z^{Λ}}}$ using Theorem 2.
4: return $\tilde{Y}^{V_{Z^{Λ}}}$.
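Steps 1 and 2 of Algorithm 4 can be illustrated with a simple coordinate search. In this sketch, each entry of Z is moved in turn to one of its interval endpoints whenever doing so increases Λ; restricting the candidates to endpoints is a heuristic adopted here, not the authors' optimizer, and the function names and test data are hypothetical.

```python
import numpy as np

def accumulated_inertia(Z, s):
    """Lambda(Z, s): sum of the s largest eigenvalues of the covariance matrix of Z."""
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False, bias=True))[::-1]
    return float(np.sum(eigvals[:s]))

def best_point_variance(a, b, s, max_sweeps=20):
    """Algorithm 4 sketch: coordinate search over interval endpoints."""
    Z = (a + b) / 2.0                                  # step 1: start from the center matrix
    best = accumulated_inertia(Z, s)
    for _ in range(max_sweeps):
        improved = False
        for i, j in np.ndindex(*Z.shape):
            for value in (a[i, j], b[i, j]):
                old = Z[i, j]
                Z[i, j] = value                        # try one endpoint for entry (i, j)
                trial = accumulated_inertia(Z, s)
                if trial > best + 1e-12:
                    best, improved = trial, True       # keep the move
                else:
                    Z[i, j] = old                      # undo the move
        if not improved:
            break                                      # no entry could be improved
    return Z, best

rng = np.random.default_rng(2)
a = rng.normal(size=(5, 3))                            # hypothetical lower bounds
b = a + rng.uniform(0.1, 1.0, size=(5, 3))             # hypothetical upper bounds
Z_lambda, lambda_value = best_point_variance(a, b, s=2)
```

The search stops at a point that no single-entry move can improve, which need not be the global maximizer of Problem (31). Step 3 would then compute the interval components of $Z^{Λ}$ with Theorem 2.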

3. Experimental Evaluation: The Application to Facial Recognition

Automatic facial recognition has recently gained momentum, especially in the context of security issues such as access to buildings, and in the context of monitoring and continued surveillance. A well-known application of facial recognition is its incorporation in the iPhone X. According to Apple’s support website, the technology that enables facial ID is some of the most advanced hardware and software that has ever been created. The TrueDepth camera captures accurate facial data by projecting and analyzing over 30,000 invisible dots to create a facial depth map, while also capturing an infrared image. These images are transformed into a mathematical representation, which is compared with registered facial data. In both R and Python, a significant number of libraries, such as the videoplayR package, have also been developed for facial recognition. The link below contains more details: http://www.stoltzmaniac.com/facial-recognition-in-r/.

As described in this section, we applied all the proposed methods using $p = 6$ interval-valued variables in a data set of $m = 27$ faces for a total of 27,000 photos (in this section, p denotes the number of variables and m the number of cases). The data set was taken from [14], in which the authors investigated facial characteristics for detection purposes in a surveillance study. In that study, the center PCA method in [4] was applied, as shown in Figure 1. The data are provided in Table 1.

The data set contains measurements of $p = 6$ random variables designed to identify each face: the length spanned by the eyes $X_{1}$ (distance AD in Figure 1), the length between the eyes $X_{2}$ (distance BC), the length from the outer right eye to the upper-middle lip at point H between the nose and mouth $X_{3}$ (distance AH), the corresponding length for the left eye $X_{4}$ (DH), the length from point H to the outside of the mouth on the right side $X_{5}$ (EH), and the corresponding distance to the left side of the mouth $X_{6}$ (GH). For each facial image in this facial recognition process, salient features, such as the nose, mouth, and eyes, are located using morphological operators. The boundaries of the located elements are extracted by using a specific active contour method based on Fourier descriptors, which incorporates information about the global shape of each object. Finally, the specific points delimiting each extracted boundary are located, and the distance is measured between a specific pair of points, as represented by the random variables in Figure 1. This distance measure is expressed as the number of pixels in a facial image. As there is a sequence of such images, the actual measured distances are interval-valued variables. Thus, for example, the eye span distance $X_{1}$ for case HUS1 is $X_{1} = [168.86, 172.84]$ for this series of images. Notably, different conditions of alignment, illumination, pose, and occlusion cause variation in the distances extracted from different images of the same person. The study that generated the data set involved nine men and three sequences for each subject, for a total of $m = 27$ cases. The complete data set is provided in Table 1.

It is important to note that the data in Table 1 are aggregated. There are 27 interval-valued cases; if each case is drawn from a sequence of 1000 images, then there are 27,000 classical point observations in $R^{6}$. An underlying assumption of the standard classical analysis is that all 27,000 observations are independent. However, this is not the case in this data set. The data values for each face form a set of 1000 dependent observations. Therefore, if we were to use each image as a statistical unit by performing classical analysis, then we would lose information about the dependence contained in the 27,000 observations. The resulting principal component analysis would look for axes that maximize the variability across all 27,000 images, regardless of whether some images belong to the same sequence. In contrast, as interval-valued observations are obtained from each sequence, the Best Point method extracts the principal component axes that maximize the variability in each interval (i.e., those that maximize the internal variability), thereby retaining information on the dependency among the 1000 images in each sequence.
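The aggregation step can be pictured with a small sketch. Here, each sequence of images is summarized by the smallest and largest value of each measured distance; this min/max rule is an assumption made for illustration (the text does not state how the intervals in Table 1 were built), and the numbers are synthetic, not the data of Table 1.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in: 3 sequences x 1000 images x 6 distances (in pixels).
images = 150.0 + rng.normal(scale=2.0, size=(3, 1000, 6))

lower = images.min(axis=1)   # a_ij: smallest distance observed in sequence i
upper = images.max(axis=1)   # b_ij: largest distance observed in sequence i
```

Each sequence of 1000 dependent point observations is thus replaced by a single row of six intervals, which is the form of data that the interval PCA methods take as input.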

Comparison between the Center, Vertex, and Best Point Methods

We applied the vertex, center, and best point principal component methods to the data in Table 1. The Best Point principal component method was run with two different goals: (1) To minimize the squared distance and (2) to maximize the variance. From this point, the Best Point principal component method that minimizes the squared distance was designated as the Best Point Distance, and the Best Point principal component method that maximizes the variance was designated as the Best Point Variance. Table 2 shows the first two principal components generated by the four methods.

Figure 2 compares the data in Table 1 with the principal planes generated by the vertex, center, Best Point Distance, and Best Point Variance principal component methods. The plots of these, along with the first principal component (PC1) and second principal component (PC2) axes, are shown in Figure 2. The proximity of the three sequences for the three faces for each individual can be readily observed, which validates their within-subject coherence. Furthermore, with all four methods, the same four classes of faces were distinguished. Faces {INC, FRA} can be regarded as one class, faces {HUS, PHI, JPL} and {ISA, ROM} constitute two other classes, and faces {LOT, KHA} form the fourth class. Of the four principal planes in the graph, those corresponding to the Best Point Distance and Best Point Variance methods show much better separation of the face classes, which results in superior facial classification. In other words, the proposed methods more accurately predicted the individual in the photo.

Numerical analysis results confirm that the separation of classes obtained with the Best Point Distance and Best Point Variance methods was much better than that obtained with the other methods. Table 3 compares the accumulated variance of the vertex, center, Best Point Distance, and Best Point Variance principal component methods. The best-performing methods are clearly Best Point Distance and Best Point Variance, which reached 91.25% and 99.72% of the accumulated variance, respectively, by the third principal component; both values are far superior to the results of the center and vertex methods.

As shown in Table 4, for the criterion of the minimum distance between the corners of the original hyper-rectangles in $R^{m}$ and the principal components, the Best Point Distance method outperformed the other methods, with a minimum distance of 6676.43. This distance was significantly less than the distances obtained by the other methods.

In Table 5, we show the correlation of the original variables and the first principal component generated by the vertex, center, Best Point Distance, and Best Point Variance methods. For all variables except for $X_{6}$, the correlation was stronger with Best Point Distance or Best Point Variance. It can be inferred that the original variables were better represented by the PC1 component of the Best Point Distance and Best Point Variance methods. Interestingly, the correlation of the PC1 component from the Best Point Distance and Best Point Variance methods was stronger with the variables for the upper part of the face. The same result was generated if the correlations of the original variables were analyzed with respect to the other principal components of Best Point Distance and Best Point Variance.

4. Conclusions

This work focused on improving the center and vertex principal component methods for interval-valued data. Compared with classical methods, symbolic methods based on interval-valued variables have important advantages, such as shorter execution times, because much smaller data tables are used. In the facial recognition example, the data table was reduced from 27,000 cases to only 27 cases. In addition, symbolic methods allow for much better handling and interpretation of data variability. In the facial recognition scenario, the variation from one photo to another of the same person in the distances measured by the variables (such as the distance between the eyes, $X_{1}$) was due to the variation in the angle at which the photo was taken.
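The reduction from many classical cases to a few interval-valued cases can be sketched as a group-wise aggregation in which each group is summarized by the minimum and maximum of its measurements. The example below is a minimal illustration under assumptions: the raw values are invented for the sketch (they are not the study's measurements), and `to_intervals` is a hypothetical helper, not an RSDA function.

```python
from collections import defaultdict

# Invented raw measurements of one variable: (sequence label, value).
# These numbers are for illustration only and do not come from the study.
raw = [
    ("FRA", 155.0), ("FRA", 157.0), ("FRA", 156.2),
    ("HUS", 168.9), ("HUS", 172.8), ("HUS", 170.1),
]


def to_intervals(rows):
    """Collapse classical rows into one [min, max] interval per label."""
    grouped = defaultdict(list)
    for label, value in rows:
        grouped[label].append(value)
    return {label: (min(vals), max(vals)) for label, vals in grouped.items()}


intervals = to_intervals(raw)

for label, (low, high) in intervals.items():
    center = (low + high) / 2      # base point used by the center method
    half_range = (high - low) / 2  # internal variation kept by the interval
    print(f"{label}: [{low}, {high}]  center={center:.2f}  half-range={half_range:.2f}")
```

Six classical rows become two symbolic rows here, and the width of each interval retains the within-group variation that a single classical surrogate such as the mean would discard.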

The Best Point methods proposed in this paper considerably improved both the center method and the vertex method. This is because Best Point Variance maximized the variance explained by the components and Best Point Distance minimized the squared distance between the vertices of the hyper-rectangles and their respective projections. As shown in the tables above, this led to a substantial improvement in all the quality indices used in principal component analysis. The result is better data clustering and, therefore, better prediction.

In future work, a consensus between the Best Point Variance and Best Point Distance methods could be constructed by applying a multiobjective optimization method to the functions $φ$ and $Λ$. Finally, all the proposed algorithms for executing symbolic analyses of interval-valued data are available in the RSDA package in R (see [15]).

Author Contributions

These authors contributed equally to this work.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Diday, E. Introduction à l’Approche Symbolique en Analyse des Données. RAIRO Oper. Res. 1989, 23, 193–236.
2. Billard, L.; Diday, E. Symbolic Data Analysis: Conceptual Statistics and Data Mining; John Wiley & Sons Ltd.: Chichester, UK, 2006.
3. Cazes, P.; Chouakria, A.; Diday, E.; Schektman, Y. Extension de l’analyse en composantes principales à des données de type intervalle. Rev. Stat. Appl. 1997, 3, 5–24.
4. Douzal-Chouakria, A.; Billard, L.; Diday, E. Principal component analysis for interval-valued observations. Stat. Anal. Data Min. 2011, 4, 229–246.
5. Lauro, C.; Palumbo, F. Principal Component Analysis of Interval Data: A Symbolic Data Analysis Approach. Comput. Stat. 2000, 15, 73–87.
6. Le-Rademacher, J.; Billard, L. Symbolic Covariance Principal Component Analysis and Visualization for Interval-Valued Data. J. Comput. Gr. Stat. 2012, 21, 413–432.
7. Diday, E. Principal component analysis for bar charts and metabins tables. Stat. Anal. Data Min. 2013, 6, 403–430.
8. Diday, E.; Emilion, R.; Wang, C.; Wang, H.; Wang, S. Sampling Based Histogram PCA and its Mapreduce Parallel Implementation on Multicore. Symmetry 2018, 10, 5.
9. Ichino, M. Symbolic PCA for histogram-valued data. In Proceedings of the IASC, Yokohama, Japan, 5–8 December 2008.
10. Kallyth, S.M. Analyse en Composantes Principales de Variables Symboliques de Type Histogramme. Ph.D. Thesis, Université Paris-Dauphine, Paris, France, 2010.
11. Rodríguez, O.; Diday, E.; Winsberg, S. Generalization of the principal components analysis to histogram data. In Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, Lyon, France, 13–16 September 2000.
12. Rodríguez, O. Classification et Modèles Linéaires en Analyse des Données Symboliques. Ph.D. Thesis, Paris IX-Dauphine University, Paris, France, 2000.
13. Arce, J. Dimensionality Reduction Methods for Symbolic Interval Variables. Master’s Thesis, University of Costa Rica, San José, Costa Rica, 2018.
14. Leroy, B.; Chouakria, A.; Herlin, I.; Diday, E. Approche géométrique et classification pour la reconnaissance de visage. In Proceedings of the Congrès de Reconnaissance des Formes et Intelligence Artificielle, Rennes, France, January 1996; pp. 548–557. (In French).
15. Rodríguez, O. RSDA: R to Symbolic Data Analysis. R Package Version 2.0.8. 2019. Available online: http://CRAN.R-project.org/package=RSDA (accessed on 16 October 2019).

Figure 1. Random variables for facial description.

Figure 2. Principal component analysis (PCA) comparison.

Table 1. Faces data set.

Case	$X_{1} =$ AD	$X_{2} =$ BC	$X_{3} =$ AH	$X_{4} =$ DH	$X_{5} =$ EH	$X_{6} =$ GH
FRA1	[155.00, 157.00]	[58.00, 61.01]	[100.45, 103.28]	[105.00, 107.30]	[61.40, 65.73]	[64.20, 67.80]
FRA2	[154.00, 160.01]	[57.00, 64.00]	[101.98, 105.55]	[104.35, 107.30]	[60.88, 63.03]	[62.94, 66.47]
FRA3	[154.01, 161.00]	[57.00, 63.00]	[99.36, 105.65]	[101.04, 109.04]	[60.95, 65.60]	[60.42, 66.40]
HUS1	[168.9,172.84]	[58.55,63.39]	[102.83,106.53]	[122.38,124.52]	[56.73,61.07]	[60.44,64.54]
HUS2	[169.8,175.03]	[60.21,64.38]	[102.94,108.71]	[120.24,124.52]	[56.73,62.37]	[60.44,66.84]
HUS3	[168.8,175.15]	[61.4,63.51]	[104.35,107.45]	[120.93,125.18]	[57.2,61.72]	[58.14,67.08]
INC1	[155.3,160.45]	[53.15,60.21]	[95.88,98.49]	[91.68,94.37]	[62.48,66.22]	[58.9,63.13]
INC2	[156.3,161.31]	[51.09,60.07]	[95.77,99.36]	[91.21,96.83]	[54.92,64.2]	[54.41,61.55]
INC3	[154.5,160.31]	[55.08,59.03]	[93.54,98.98]	[90.43,96.43]	[59.03,65.86]	[55.97,65.8]
ISA1	[164,168]	[55.01,60.03]	[120.28,123.04]	[117.52,121.02]	[54.38,57.45]	[50.8,53.25]
ISA2	[163,170]	[54.04,59]	[118.8,123.04]	[116.67,120.24]	[55.47,58.67]	[52.43,55.23]
ISA3	[164,169.01]	[55,59.01]	[117.38,123.11]	[116.67,122.43]	[52.8,58.31]	[52.2,55.47]
JPL1	[167.1,171.19]	[61.03,65.01]	[118.23,121.82]	[108.3,111.2]	[63.89,67.88]	[57.28,60.83]
JPL2	[169.1,173.18]	[60.07,65.07]	[118.85,120.88]	[108.98,113.17]	[62.63,69.07]	[57.38,61.62]
JPL3	[169,170.11]	[59.01,65.01]	[115.88,121.38]	[110.34,112.49]	[61.72,68.25]	[59.46,62.94]
KHA1	[149.3,155.54]	[54.15,59.14]	[111.95,115.75]	[105.36,111.07]	[54.2,58.14]	[48.27,50.61]
KHA2	[149.3,155.32]	[52.04,58.22]	[111.2,113.22]	[105.36,111.07]	[53.71,58.14]	[49.41,52.8]
KHA3	[150.3,157.26]	[52.09,60.21]	[109.04,112.7]	[104.74,111.07]	[55.47,60.03]	[49.2,53.41]
LOT1	[152.6,157.62]	[51.35,56.22]	[116.73,119.67]	[114.62,117.41]	[55.44,59.55]	[53.01,56.6]
LOT2	[154.6,157.62]	[52.24,56.32]	[117.52,119.67]	[114.28,117.41]	[57.63,60.61]	[54.41,57.98]
LOT3	[154.8,157.81]	[50.36,55.23]	[117.59,119.75]	[114.04,116.83]	[56.64,61.07]	[55.23,57.8]
PHI1	[163.1,167.07]	[66.03,68.07]	[115.26,119.6]	[116.1,121.02]	[60.96,65.3]	[57.01,59.82]
PHI2	[164,168.03]	[65.03,68.12]	[114.55,119.6]	[115.26,120.97]	[60.96,67.27]	[55.32,61.52]
PHI3	[161,167]	[64.07,69.01]	[116.67,118.79]	[114.59,118.83]	[61.52,68.68]	[56.57,60.11]
ROM1	[167.2,171.24]	[64.07,68.07]	[123.75,126.59]	[122.92,126.37]	[51.22,54.64]	[49.65,53.71]
ROM2	[168.2,172.14]	[63.13,68.07]	[122.33,127.29]	[124.08,127.14]	[50.22,57.14]	[49.93,56.94]
ROM3	[167.1,171.19]	[63.13,68.03]	[121.62,126.57]	[122.58,127.78]	[49.41,57.28]	[50.99,60.46]

Table 2. Comparison of the first two principal components from the four methods.

	Vertex		Center		Best Point Distance		Best Point Variance
Cases	PC1	PC2	PC1	PC2	PC1	PC2	PC1	PC2
FRA1	[1.61,2.66]	[0.27,1.57]	[-2.97,-1.75]	[0.24,1.72]	[-2.21,-1.39]	[0.24,1.25]	[-2.83,-1.57]	[0.45,1.87]
FRA2	[1.03,2.49]	[-0.11,1.61]	[-2.73,-1.11]	[-0.2,1.79]	[-2.09,-0.93]	[-0.06,1.29]	[-2.57,-0.92]	[0.01,1.95]
FRA3	[0.81,2.99]	[-0.4,1.88]	[-3.29,-0.86]	[-0.52,2.08]	[-2.49,-0.73]	[-0.29,1.48]	[-3.15,-0.73]	[-0.31,2.21]
HUS1	[-1.1,0.24]	[0.39,2.05]	[-0.37,1.16]	[0.38,2.28]	[-0.19,0.86]	[0.28,1.58]	[-0.24,1.34]	[0.47,2.31]
HUS2	[-1.41,0.4]	[0.56,2.65]	[-0.58,1.48]	[0.59,2.98]	[-0.33,1.11]	[0.42,2.04]	[-0.42,1.66]	[0.66,2.97]
HUS3	[-1.42,0.24]	[0.43,2.52]	[-0.4,1.51]	[0.46,2.82]	[-0.21,1.13]	[0.33,1.94]	[-0.22,1.63]	[0.53,2.8]
INC1	[2.29,3.77]	[-0.67,1.23]	[-4.07,-2.38]	[-0.9,1.29]	[-3.11,-1.95]	[-0.53,0.95]	[-3.97,-2.23]	[-0.55,1.56]
INC2	[1.35,3.66]	[-2.05,0.92]	[-3.91,-1.21]	[-2.51,0.93]	[-3.02,-1.22]	[-1.63,0.7]	[-3.8,-0.97]	[-2.02,1.22]
INC3	[1.86,4.02]	[-1.2,1.41]	[-4.33,-1.84]	[-1.5,1.47]	[-3.34,-1.61]	[-0.94,1.08]	[-4.14,-1.67]	[-1.08,1.77]
ISA1	[-2.01,-0.8]	[-1.83,-0.46]	[0.88,2.24]	[-2.06,-0.48]	[0.66,1.61]	[-1.42,-0.34]	[0.75,2.15]	[-2.11,-0.59]
ISA2	[-1.86,-0.37]	[-1.71,-0.08]	[0.39,2.04]	[-1.93,-0.07]	[0.32,1.51]	[-1.32,-0.07]	[0.26,1.95]	[-1.99,-0.19]
ISA3	[-2.11,-0.41]	[-1.84,-0.12]	[0.44,2.36]	[-2.09,-0.11]	[0.34,1.7]	[-1.43,-0.09]	[0.35,2.32]	[-2.1,-0.21]
JPL1	[-0.92,0.36]	[0.54,2.03]	[-0.57,0.89]	[0.67,2.38]	[-0.29,0.73]	[0.43,1.59]	[-0.7,0.79]	[0.49,2.14]
JPL2	[-1.17,0.34]	[0.48,2.37]	[-0.59,1.17]	[0.58,2.76]	[-0.25,0.93]	[0.37,1.84]	[-0.76,1.1]	[0.43,2.47]
JPL3	[-0.93,0.52]	[0.5,2.28]	[-0.78,0.92]	[0.59,2.64]	[-0.4,0.73]	[0.38,1.78]	[-0.89,0.91]	[0.45,2.43]
KHA1	[-0.39,1.18]	[-3.08,-1.46]	[-1.1,0.64]	[-3.51,-1.65]	[-1.01,0.25]	[-2.34,-1.09]	[-1.21,0.58]	[-3.32,-1.56]
KHA2	[-0.15,1.46]	[-3.17,-1.32]	[-1.43,0.39]	[-3.65,-1.5]	[-1.23,0.05]	[-2.42,-0.98]	[-1.52,0.37]	[-3.41,-1.39]
KHA3	[-0.25,1.71]	[-2.95,-0.72]	[-1.73,0.47]	[-3.39,-0.82]	[-1.42,0.13]	[-2.26,-0.52]	[-1.83,0.43]	[-3.18,-0.72]
LOT1	[-0.61,0.74]	[-2.51,-0.87]	[-0.79,0.74]	[-2.86,-0.98]	[-0.6,0.47]	[-1.91,-0.64]	[-0.94,0.66]	[-2.82,-1.01]
LOT2	[-0.4,0.69]	[-1.94,-0.62]	[-0.77,0.47]	[-2.2,-0.69]	[-0.55,0.31]	[-1.47,-0.44]	[-0.91,0.36]	[-2.21,-0.75]
LOT3	[-0.34,0.82]	[-2.12,-0.7]	[-0.93,0.41]	[-2.44,-0.78]	[-0.64,0.26]	[-1.63,-0.51]	[-1.09,0.33]	[-2.41,-0.85]
PHI1	[-1.51,-0.22]	[0.56,1.84]	[0.11,1.57]	[0.72,2.18]	[0.14,1.19]	[0.49,1.47]	[0.05,1.53]	[0.59,1.97]
PHI2	[-1.66,0.09]	[0.33,2.29]	[-0.27,1.74]	[0.45,2.69]	[-0.1,1.3]	[0.3,1.82]	[-0.35,1.67]	[0.32,2.46]
PHI3	[-1.38,0.25]	[0.25,2.25]	[-0.45,1.44]	[0.36,2.67]	[-0.22,1.07]	[0.25,1.8]	[-0.62,1.4]	[0.24,2.39]
ROM1	[-3.45,-2.19]	[-1.2,0.29]	[2.41,3.84]	[-1.27,0.44]	[1.74,2.74]	[-0.89,0.26]	[2.39,3.84]	[-1.38,0.26]
ROM2	[-3.63,-1.85]	[-1.3,0.97]	[1.96,4.04]	[-1.39,1.2]	[1.49,2.89]	[-0.98,0.79]	[1.94,4.06]	[-1.5,0.99]
ROM3	[-3.57,-1.48]	[-1.33,1.31]	[1.53,3.98]	[-1.44,1.58]	[1.17,2.83]	[-1,1.06]	[1.57,4.04]	[-1.5,1.39]

Table 3. Comparison of the variance resulting from different methods using facial recognition data.

	Vertex	Center	Best Point Distance	Best Point Variance
PC1	42.67%	46.47%	45.49%	56.01%
PC2	72.64%	80.53%	81.05%	88.31%
PC3	83.35%	89.65%	91.25%	99.72%
PC4	91.28%	95.06%	95.80%	99.85%
PC5	96.86%	98.96%	99.28%	99.97%
PC6	100.00%	100.00%	100.00%	100.00%

Table 4. Comparison of the distances resulting from different methods in facial recognition data.

	Vertex	Center	Best Point Distance	Best Point Variance
	10368.00	12719.64	6676.43	12457.09

Table 5. Correlation between the first component of each method and the variables.

	$X_{1} =$ AD	$X_{2} =$ BC	$X_{3} =$ AH	$X_{4} =$ DH	$X_{5} =$ EH	$X_{6} =$ GH
Vertex-PC1	0.64	0.49	0.84	0.89	−0.47	−0.43
Center-PC1	0.61	0.47	0.83	0.88	−0.52	−0.47
BestPointDistance-PC1	0.65	0.48	0.84	0.90	−0.46	−0.43
BestPointVariance-PC1	0.63	0.49	0.80	0.88	−0.55	−0.43

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
