An Entropy-Based Tool to Help the Interpretation of Common-Factor Spaces in Factor Analysis

Eshima, Nobuoki; Borroni, Claudio Giovanni; Tabata, Minoru; Kurosawa, Takeshi

doi:10.3390/e23020140


¹ Center for Educational Outreach and Admissions, Kyoto University, Yoshida-machi, Sakyoku, Kyoto 660-8501, Japan

² Department of Statistics and Quantitative Methods, University of Milano Bicocca, 20126 Milano, Italy

³ Department of Mathematical Sciences, Osaka Prefecture University, Osaka 599-8532, Japan

⁴ Department of Applied Mathematics, Tokyo University of Science, Kagurazaka, Shinzyukuku, Tokyo 162-0825, Japan

* Author to whom correspondence should be addressed.

Entropy 2021, 23(2), 140; https://doi.org/10.3390/e23020140

Submission received: 27 December 2020 / Revised: 18 January 2021 / Accepted: 19 January 2021 / Published: 24 January 2021

(This article belongs to the Section Information Theory, Probability and Statistics)


Abstract

This paper proposes a method for deriving interpretable common factors based on canonical correlation analysis applied to the vectors of common factors and manifest variables in the factor analysis model. First, an entropy-based method for measuring factor contributions is reviewed. Second, the entropy-based contribution measure of the common-factor vector is decomposed into those of canonical common factors, and it is also shown that the importance order of factors is that of their canonical correlation coefficients. Third, the method is applied to derive interpretable common factors. Numerical examples are provided to demonstrate the usefulness of the present approach.

Keywords:

canonical factor analysis; canonical common factor; entropy; factor contribution

1. Introduction

In factor analysis, extracting interpretable factors is important for practical data analysis. To this end, methods for factor rotation have been studied, e.g., varimax [1] and orthomax [2] for orthogonal rotations and oblimin [3] and orthoblique [4] for oblique rotations. The basic idea of factor rotation goes back to Thurstone's criteria of simple structure for factor analysis models [5]; the rotation methods maximize the variation of the squared factor loadings in order to derive such simple structures. Let

X_{i}

be manifest variables, let

ξ_{j}

be latent variables (common factors), let

ε_{i}

be unique factors related to

X_{i}

, and finally, let

λ_{i j}

be factor loadings that are weights of common factors

ξ_{j}

to explain

X_{i}

. Then, the factor analysis model is given as follows:

X_{i} = \sum_{j = 1}^{m} λ_{i j} ξ_{j} + ε_{i}, i = 1, 2, \dots, p,

(1)

where

\left\{ \begin{matrix} E (X_{i}) = E (ε_{i}) = 0, i = 1, 2, \dots, p; \\ E (ξ_{j}) = 0, j = 1, 2, \dots, m; \\ Var (ξ_{j}) = 1, j = 1, 2, \dots, m; Cov (ξ_{k}, ξ_{l}) = ϕ_{k l}; \\ Var (ε_{i}) = ω_{i}^{2} > 0, i = 1, 2, \dots, p; \\ Cov (ε_{k}, ε_{l}) = 0, k \neq l . \end{matrix} \right.

To derive simple structures of factor analysis models, for example, in the varimax method, the following variation function of squared factor loadings is maximized with respect to factor loadings:

V = \sum_{i = 1}^{p} \sum_{j = 1}^{m} {(λ_{i j}^{2} - \bar{λ^{2}})}^{2},

(2)

where

\bar{λ^{2}} = \frac{1}{p m} \sum_{i = 1}^{p} \sum_{j = 1}^{m} λ_{i j}^{2}

. In this sense, the basic factor rotation methods can be viewed as tools for the exploratory analysis of multidimensional common-factor spaces. Factors are interpreted according to the manifest variables that have large weights on them. As far as we know, no novel methods for factor rotation have been investigated apart from variants of the above basic ones. In real data analyses, manifest variables can usually be classified in advance into groups that may share common factors and concepts. For example, suppose we have a test battery including the following five subjects: Japanese, English, Social Science, Mathematics, and Natural Science. It is then reasonable to classify the five subjects into two groups, {Japanese, English, Social Science} and {Mathematics, Natural Science}. In such cases, it is meaningful to determine common factors related to the two groups of manifest variables. For this objective, it is useful to develop a novel method that derives the common factors based on a factor contribution measure. Conventional rotation criteria do not provide this; for example, variation function (2) for the varimax method is not related to factor contribution.
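As a rough illustration of variation function (2), the following sketch evaluates V for a two-factor loading matrix and scans planar rotations for the angle that maximizes it. The loadings are the rounded values displayed in Numerical Example 1; the function names and the grid search are assumptions made for illustration, not the rotation algorithm of any particular software.

```python
import numpy as np

# Rounded loadings as displayed in Numerical Example 1 (rows X1..X5, columns xi1, xi2).
LOADINGS = np.array([
    [0.60, 0.39],
    [0.75, 0.24],
    [0.65, 0.00],
    [0.32, 0.59],
    [0.00, 0.92],
])


def squared_loading_variation(loadings: np.ndarray) -> float:
    """Variation function (2): squared deviations of the squared loadings from their overall mean."""
    squared = np.asarray(loadings, dtype=float) ** 2
    return float(np.sum((squared - squared.mean()) ** 2))


def rotate(loadings: np.ndarray, angle: float) -> np.ndarray:
    """Apply an orthogonal planar rotation to a two-column loading matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return loadings @ np.array([[c, -s], [s, c]])


# Hypothetical grid search over rotation angles (illustration only).
angles = np.linspace(0.0, np.pi / 2, 181)
values = [squared_loading_variation(rotate(LOADINGS, a)) for a in angles]
best_angle = float(angles[int(np.argmax(values))])
```

Note that V depends only on the loadings; the unique variances, and hence the factor contributions discussed in the text, do not enter the criterion.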

An entropy-based method for measuring factor contribution was proposed by Eshima et al. [6]; it can measure factor contributions to manifest variable vectors and can decompose them into those of manifest subvectors and individual manifest variables. By using the method, we can derive important common factors related to the manifest subvectors and the manifest variables. The aim of the present paper is to propose a new entropy-based method for deriving simple structures, that is, for extracting common factors that are easy to interpret. In Section 2, the entropy-based method for measuring factor contribution [6] is reviewed so that its properties can be applied to deriving simple structures in factor analysis models. Section 3 discusses canonical correlation analysis between common factors and manifest variables, and the contributions of common factors to the manifest variables are decomposed into components related to the extracted pairs of canonical variables. A numerical example is given to demonstrate the approach. In Section 4, canonical correlation analysis is applied to obtain common factors that are easy to interpret, and the contributions of the extracted factors are measured. Numerical examples are given to illustrate the present approach. Finally, Section 5 provides a discussion and conclusions.

2. Entropy-Based Method for Measuring Factor Contributions

First, in order to derive factor contributions, factor analysis model (1) with error terms

ε_{i}, i = 1, 2, \dots, p

, which are normally distributed, can be discussed in the framework of generalized linear models (GLMs) [7]. A general path diagram among manifest variables

X_{i}, i = 1, 2, \dots, p

and common factors

ξ_{j}, j = 1, 2, \dots, m

in the factor analysis model is illustrated in Figure 1. The conditional density functions of manifest variables

X_{i}, i = 1, 2, \dots, p

, given the factors

ξ_{j}, j = 1, 2, \dots, m

, are expressed as follows:

\begin{matrix} f_{i} (x_{i} | ξ) = \frac{1}{\sqrt{2 π ω_{i}^{2}}} \exp (- \frac{{(x_{i} - \sum_{j = 1}^{m} λ_{i j} ξ_{j})}^{2}}{2 ω_{i}^{2}}) \\ = \exp (\frac{x_{i} \sum_{j = 1}^{m} λ_{i j} ξ_{j} - \frac{1}{2} {(\sum_{j = 1}^{m} λ_{i j} ξ_{j})}^{2}}{ω_{i}^{2}} - \frac{x_{i}^{2}}{2 ω_{i}^{2}} - \log \sqrt{2 π ω_{i}^{2}}), i = 1, 2, \dots, p . \end{matrix}

Let

θ_{i} = \sum_{j = 1}^{m} λ_{i j} ξ_{j}

and

d (x_{i}, ω_{i}^{2}) = - \frac{x_{i}^{2}}{2 ω_{i}^{2}} - \log \sqrt{2 π ω_{i}^{2}}

. Then, the above density function is described in a GLM framework as

f_{i} (x_{i} | ξ) = \exp (\frac{x_{i} θ_{i} - \frac{1}{2} θ_{i}^{2}}{ω_{i}^{2}} + d (x_{i}, ω_{i}^{2})), i = 1, 2, \dots, p .

(3)
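The rewriting of the normal density into the GLM form (3) can be checked numerically. The sketch below compares the two expressions at a few arbitrary points; the function names and the test values are hypothetical.

```python
import math


def normal_density(x: float, theta: float, omega2: float) -> float:
    """Normal density of X_i given the predictor theta_i, with unique variance omega_i^2."""
    return math.exp(-((x - theta) ** 2) / (2.0 * omega2)) / math.sqrt(2.0 * math.pi * omega2)


def glm_density(x: float, theta: float, omega2: float) -> float:
    """The same density written in the exponential-family form (3)."""
    d = -(x ** 2) / (2.0 * omega2) - math.log(math.sqrt(2.0 * math.pi * omega2))
    return math.exp((x * theta - 0.5 * theta ** 2) / omega2 + d)


# Arbitrary (x, theta, omega^2) triples used only to compare the two forms.
CHECK_POINTS = [(0.3, 0.5, 0.4), (-1.2, 0.7, 1.5), (2.0, -0.4, 0.15)]
max_gap = max(abs(normal_density(*p) - glm_density(*p)) for p in CHECK_POINTS)
```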

According to the local independence of the manifest variables in factor analysis model (1), the conditional density function of

X = {(X_{1}, X_{2}, \dots, X_{p})}^{T}

given

ξ = {(ξ_{1}, ξ_{2}, \dots, ξ_{m})}^{T}

is expressed as

f (x | ξ) = \prod_{i = 1}^{p} \exp (\frac{x_{i} θ_{i} - \frac{1}{2} θ_{i}^{2}}{ω_{i}^{2}} + d (x_{i}, ω_{i}^{2})) = \exp (\sum_{i = 1}^{p} \frac{x_{i} θ_{i} - \frac{1}{2} θ_{i}^{2}}{ω_{i}^{2}} + \sum_{i = 1}^{p} d (x_{i}, ω_{i}^{2})) .

(4)

Let

g (ξ)

be the joint density function of common-factor vector

ξ = {(ξ_{1}, ξ_{2}, \dots, ξ_{m})}^{T}

; let

f_{i} (x_{i})

be the marginal density functions of

X_{i}, i = 1, 2, \dots, p

; let

f (x)

be the marginal density function of

X

; and let us set

KL (X, ξ) = \iint f (x | ξ) g (ξ) \log \frac{f (x | ξ)}{f (x)} d x d ξ + \iint f (x) g (ξ) \log \frac{f (x)}{f (x | ξ)} d x d ξ,

(5)

\begin{matrix} KL (X_{i}, ξ) = \iint f_{i} (x_{i} | ξ) g (ξ) \log \frac{f_{i} (x_{i} | ξ)}{f_{i} (x_{i})} d x_{i} d ξ + \iint f_{i} (x_{i}) g (ξ) \log \frac{f_{i} (x_{i})}{f_{i} (x_{i} | ξ)} d x_{i} d ξ, \\ i = 1, 2, \dots, p . \end{matrix}

(6)

where “KL” stands for “Kullback–Leibler information” [8]. From (3) and (4), we have

KL (X_{i}, ξ) = \frac{Cov (X_{i}, θ_{i})}{ω_{i}^{2}} = \sum_{j = 1}^{m} \frac{λ_{i j} Cov (X_{i}, ξ_{j})}{ω_{i}^{2}}, i = 1, 2, \dots, p;

(7)

KL (X, ξ) = \sum_{i = 1}^{p} \frac{Cov (X_{i}, θ_{i})}{ω_{i}^{2}} = \sum_{i = 1}^{p} \sum_{j = 1}^{m} \frac{λ_{i j} Cov (X_{i}, ξ_{j})}{ω_{i}^{2}} .

(8)
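The step from (3) to (7) can be sketched as follows. In the sum of the two integrals in (6), the terms in the logarithm of the marginal density cancel, because both joint densities have the same marginal for X_i. What remains is the difference of the expectations of the logarithm of (3) under the joint density and under the product of the marginals. The terms in θ_i² and in d depend on only one of the two variables, so they cancel as well, leaving the covariance term:

```latex
\begin{aligned}
KL(X_i, \xi)
  &= \iint \bigl\{ f_i(x_i \mid \xi)\, g(\xi) - f_i(x_i)\, g(\xi) \bigr\}
     \log f_i(x_i \mid \xi)\, dx_i\, d\xi \\
  &= \frac{E(X_i \theta_i) - E(X_i)\, E(\theta_i)}{\omega_i^{2}}
   = \frac{\mathrm{Cov}(X_i, \theta_i)}{\omega_i^{2}}
   = \sum_{j=1}^{m} \frac{\lambda_{ij}\, \mathrm{Cov}(X_i, \xi_j)}{\omega_i^{2}} .
\end{aligned}
```

The last equality uses only the linearity of the covariance in θ_i; summing over i under local independence (4) gives (8).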

The above quantities (7) and (8) are interpreted as the signal-to-noise ratios for dependent variables

X_{i}

and predictors

θ_{i}

and as the signal-to-noise ratio for dependent-variable vector

X

and common-factor vector

ξ

, respectively.
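Quantities (7) and (8) are straightforward to evaluate once the loadings, the factor correlation matrix, and the unique variances are available. The following sketch does so for the rounded loadings displayed in Numerical Example 1; it assumes standardized manifest variables, so that each unique variance equals one minus the communality, and the function name is hypothetical.

```python
import numpy as np

# Rounded loadings as displayed in Numerical Example 1 (rows X1..X5, columns xi1, xi2).
LOADINGS = np.array([
    [0.60, 0.39],
    [0.75, 0.24],
    [0.65, 0.00],
    [0.32, 0.59],
    [0.00, 0.92],
])
PHI = np.eye(2)  # orthogonal initial factors


def kl_per_variable(loadings, phi, unique_var):
    """Equation (7): Cov(X_i, theta_i) / omega_i^2 for every manifest variable."""
    cov_x_xi = loadings @ phi                           # Cov(X_i, xi_j)
    cov_x_theta = np.sum(loadings * cov_x_xi, axis=1)   # sum_j lambda_ij Cov(X_i, xi_j)
    return cov_x_theta / unique_var


# Assumption: unit-variance manifest variables, so omega_i^2 = 1 - communality.
UNIQUE_VAR = 1.0 - np.sum((LOADINGS @ PHI) * LOADINGS, axis=1)

kl_i = kl_per_variable(LOADINGS, PHI, UNIQUE_VAR)   # equation (7)
kl_total = float(kl_i.sum())                        # equation (8)
```

With these rounded inputs the total is about 9.74, and the fifth variable has by far the largest signal-to-noise ratio.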

From (7) and (8), the following theorem can be derived [6]:

Theorem 1.

In factor analysis model (1), let

X = {(X_{1}, X_{2}, \dots, X_{p})}^{T}

and

ξ = {(ξ_{1}, ξ_{2}, \dots, ξ_{m})}^{T}

. Then,

KL (X, ξ) = \sum_{i = 1}^{p} KL (X_{i}, ξ) .

Consistently, the following theorem, which is actually an extended version of Corollary 1 in [6], can also be obtained:

Theorem 2.

Let manifest variable subvectors

X^{(a)}, a = 1, 2, \dots, A

be any decomposition of manifest variable vector

X = {(X_{1}, X_{2}, \dots, X_{p})}^{T}

. Then,

KL (X, ξ) = \sum_{a = 1}^{A} KL (X^{(a)}, ξ) .

(9)

Following Eshima et al. [6], the contribution of factor vector

ξ = {(ξ_{1}, ξ_{2}, \dots, ξ_{m})}^{T}

to manifest variable vector

X = {(X_{1}, X_{2}, \dots, X_{p})}^{T}

is thus defined as

C (ξ \to X) = KL (X, ξ),

so that, in Theorem 2, the contributions of factor vector

ξ = {(ξ_{1}, ξ_{2}, \dots, ξ_{m})}^{T}

to manifest variable vectors

X^{(a)}, a = 1, 2, \dots, A

are defined by

C (ξ \to X^{(a)}) = KL (X^{(a)}, ξ), a = 1, 2, \dots, A .

Let

ξ^{\backslash j}

be the subvector of

ξ = {(ξ_{1}, ξ_{2}, \dots, ξ_{m})}^{T}

that contains all variables

ξ_{i}

except

ξ_{j}

, i.e.,

ξ^{\backslash j} = {(ξ_{1}, ξ_{2}, \dots, ξ_{j - 1}, ξ_{j + 1}, \dots, ξ_{m})}^{T}, j = 1, 2, \dots, m;

and let

KL (X, ξ^{\backslash j} | ξ_{j})

and

KL (X^{(a)}, ξ^{\backslash j} | ξ_{j})

be the conditional Kullback–Leibler information as defined in (5) and (6). The contributions of common factors

ξ_{j}

are defined by

C (ξ_{j} \to X) = KL (X, ξ) - KL (X, ξ^{\backslash j} | ξ_{j}),

C (ξ_{j} \to X^{(a)}) = KL (X^{(a)}, ξ) - KL (X^{(a)}, ξ^{\backslash j} | ξ_{j}), j = 1, 2, \dots, m .

Remark 1.

Information

KL (X, ξ^{\backslash j} | ξ_{j})

and

KL (X^{(a)}, ξ^{\backslash j} | ξ_{j})

can be expressed by using the conditional covariances

Cov (X_{i}, θ_{i} | ξ_{j})

. For example,

KL (X, ξ^{\backslash j} | ξ_{j}) = \sum_{i = 1}^{p} \frac{Cov (X_{i}, θ_{i} | ξ_{j})}{ω_{i}^{2}} .

Finally, the following decomposition of

KL (X, ξ)

holds for orthogonal factors ([6], Theorem 3):

Theorem 3.

If the common factors are mutually independent, it follows that

C (ξ \to X) = \sum_{j = 1}^{m} \sum_{a = 1}^{A} C (ξ_{j} \to X^{(a)}) = \sum_{j = 1}^{m} \sum_{i = 1}^{p} C (ξ_{j} \to X_{i}) .

The entropy coefficient of determination (ECD) [9] between

ξ

and

X

is defined by

ECD (ξ, X) = \frac{KL (ξ, X)}{KL (ξ, X) + 1},

so that the total relative contribution of factor vector

ξ

to manifest variable vector

X

in entropy can be defined as

\tilde{RC} (ξ \to X) = ECD (ξ, X) = \frac{C (ξ \to X)}{C (ξ \to X) + 1},

while, for a single factor

ξ_{j}

, two relative contribution ratios can be defined:

\begin{matrix} RC (ξ_{j} \to X) = \frac{C (ξ_{j} \to X)}{C (ξ \to X)} = \frac{KL (X, ξ) - KL (X, ξ^{\backslash j} | ξ_{j})}{KL (ξ, X)}, \\ \tilde{RC} (ξ_{j} \to X) = \frac{C (ξ_{j} \to X)}{KL (ξ, X) + 1} = \frac{KL (X, ξ) - KL (X, ξ^{\backslash j} | ξ_{j})}{KL (ξ, X) + 1} \end{matrix}

(see [6] for details).
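For mutually independent factors, the conditional covariance in Remark 1 simply drops the j-th term, so the contribution of a single factor reduces to the sum over i of λ_ij²/ω_i². The sketch below uses this reduction (an assumption that holds only for independent factors) to compute the contributions, the two relative contribution ratios, and the ECD for the rounded loadings displayed in Numerical Example 1. Unit-variance manifest variables are assumed, and the variable names are hypothetical.

```python
import numpy as np

# Rounded loadings as displayed in Numerical Example 1 (rows X1..X5, columns xi1, xi2).
LOADINGS = np.array([
    [0.60, 0.39],
    [0.75, 0.24],
    [0.65, 0.00],
    [0.32, 0.59],
    [0.00, 0.92],
])
# Assumption: unit-variance manifest variables and orthogonal factors.
UNIQUE_VAR = 1.0 - np.sum(LOADINGS ** 2, axis=1)

# C(xi_j -> X) = sum_i lambda_ij^2 / omega_i^2 (valid for independent factors only).
contrib = (LOADINGS ** 2 / UNIQUE_VAR[:, None]).sum(axis=0)

total = float(contrib.sum())          # C(xi -> X) = KL(X, xi), by Theorem 3
rc = contrib / total                  # RC(xi_j -> X)
rc_tilde = contrib / (total + 1.0)    # RC-tilde(xi_j -> X)
ecd = total / (total + 1.0)           # ECD(xi, X)
```

In this orthogonal case the ratios RC sum to one and the ratios RC-tilde sum to the ECD, which is roughly 0.91 for these inputs.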

Second, factor analysis model (1) in a general case is discussed. Let

Σ

be the variance–covariance matrix of manifest variable vector

X = {(X_{1}, X_{2}, \dots, X_{p})}^{T}

; let

Ω

be the

p \times p

variance–covariance matrix of unique factor vector

ε = {(ε_{1}, ε_{2}, \dots, ε_{p})}^{T}

; let

Λ

be the

p \times m

factor loading matrix of

λ_{i j}

; and let

Φ

be the correlation matrix of common-factor vector

ξ = {(ξ_{1}, ξ_{2}, \dots, ξ_{m})}^{T}

. Then, model (1) can be expressed as

X = Λ ξ + ε

and we have

Σ = Λ Φ Λ^{T} + Ω .

Now, the above discussion is extended to a general factor analysis model (1) with the following variance–covariance matrix of

X

and

ξ

:

(\begin{matrix} Λ Φ Λ^{T} + Ω & Λ Φ \\ Φ Λ^{T} & Φ \end{matrix}) .

(10)

Let

θ = Λ ξ

be the predictor vector of manifest variable vector

X^{T} = (X_{1}, X_{2}, \dots, X_{p})

. Then, the contribution of common-factor vector

ξ

to manifest variable vector

X

is defined by the following generalized signal-to-noise ratio:

E (X^{T} Ω^{- 1} θ) = \frac{E (X^{T} \tilde{Ω} Λ ξ)}{| Ω |} = \frac{tr \tilde{Ω} Λ Φ Λ^{T}}{| Ω |},

(11)

where

\tilde{Ω}

is the cofactor matrix of

Ω

. The signal is

tr \tilde{Ω} Λ Φ Λ^{T}

and the noise

| Ω |

, and both are positive. Hence, the above quantity is defined as the entropy explained by the factor analysis model, and the same notation

KL (X, ξ)

as above is used, because it coincides with the Kullback–Leibler information for the factor analysis model with normally distributed errors (4). Similarly, in the general model, as in (9), signal-to-noise ratio (11) is decomposed into

\frac{tr \tilde{Ω} Λ Φ Λ^{T}}{| Ω |} = \sum_{i = 1}^{p} \frac{Cov (X_{i}, θ_{i})}{ω_{i}^{2}} = \sum_{i = 1}^{p} \sum_{j = 1}^{m} \frac{λ_{i j} Cov (X_{i}, ξ_{j})}{ω_{i}^{2}},

so the above theorems hold true as well. Thus, the results mentioned above are applicable to factor analysis models with error terms with non-normal distributions.
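The generalized signal-to-noise ratio (11) and its decomposition can be compared numerically. The sketch below uses a hypothetical oblique model (the factor correlation 0.3 and the unique variances are invented for illustration) and computes the ratio in three ways: with the inverse of Ω, with the cofactor matrix divided by the determinant, and as the sum of per-variable ratios. The last form requires Ω to be diagonal, as in model (1).

```python
import numpy as np

# Loading pattern borrowed from Numerical Example 1; PHI and OMEGA are hypothetical.
LOADINGS = np.array([
    [0.60, 0.39],
    [0.75, 0.24],
    [0.65, 0.00],
    [0.32, 0.59],
    [0.00, 0.92],
])
PHI = np.array([[1.0, 0.3], [0.3, 1.0]])      # hypothetical factor correlations
OMEGA = np.diag([0.5, 0.4, 0.6, 0.5, 0.2])    # hypothetical unique variances

signal_cov = LOADINGS @ PHI @ LOADINGS.T      # Lambda Phi Lambda^T

# (11) written with the inverse of Omega.
snr_inverse = float(np.trace(np.linalg.inv(OMEGA) @ signal_cov))

# (11) written with the cofactor matrix; for a symmetric matrix it equals det * inverse.
det_omega = np.linalg.det(OMEGA)
cofactor = det_omega * np.linalg.inv(OMEGA)
snr_cofactor = float(np.trace(cofactor @ signal_cov) / det_omega)

# Per-variable decomposition: Cov(X_i, theta_i) / omega_i^2 summed over i.
snr_by_variable = float(np.sum(np.diag(signal_cov) / np.diag(OMEGA)))
```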

3. Canonical Factor Analysis

In order to derive interpretable factors from the common-factor space, we propose taking advantage of the results of canonical correlation analysis applied to manifest variables and common factors. This approach can be referred to as “canonical factor analysis” [10]. In the factor analysis model (1), the variance–covariance matrix of

X = {(X_{1}, X_{2}, \dots, X_{p})}^{T}

and

ξ = {(ξ_{1}, ξ_{2}, \dots, ξ_{m})}^{T}

is given by (10). Then, we have the following theorem:

Theorem 4.

For canonical correlation coefficients

ρ_{k}, k = 1, 2, \dots, m

between

X

and

ξ

in factor analysis model (1) with (10), it follows that

KL (X, ξ) = \sum_{j = 1}^{m} \frac{ρ_{j}^{2}}{1 - ρ_{j}^{2}} .

Proof.

Let

B^{(1)}

,

B^{(2)}

, and

F

be

m \times p

,

(p - m) \times p

, and

m \times m

matrices, respectively; let

V^{(1)} = {(V_{1}, V_{2}, \dots, V_{m})}^{T} = B^{(1)} X

,

V^{(2)} = B^{(2)} X

, and

η = {(η_{1}, η_{2}, \dots, η_{m})}^{T} = F ξ

. It is assumed that

(V_{j}, η_{j})

are the pairs of canonical variables with correlation coefficients

ρ_{j}, j = 1, 2, \dots, m

; that matrices

(\begin{matrix} B^{(1)} \\ B^{(2)} \end{matrix})

and

F

are nonsingular; and that

V^{(1)}

and

V^{(2)}

are statistically independent. Since all pairs of canonical variables

(V_{j}, η_{j})

and

V^{(2)}

are mutually independent, we have

KL (V^{(2)}, η) = 0, KL (V^{(1)}, η_{j}) = KL (V_{j}, η_{j}), j = 1, 2, \dots, m .

From Theorem 2, it follows that

\begin{matrix} KL (X, ξ) = KL (V, F ξ) = KL ((\begin{matrix} V^{(1)} \\ V^{(2)} \end{matrix}), η) = KL (V^{(1)}, η) + KL (V^{(2)}, η) = KL (V^{(1)}, η) = \sum_{j = 1}^{m} KL (V^{(1)}, η_{j}) \\ = \sum_{j = 1}^{m} KL (V_{j}, η_{j}) = \sum_{j = 1}^{m} \frac{ρ_{j}^{2}}{1 - ρ_{j}^{2}} . \end{matrix}

This completes the proof. □

In the proof of the above theorem, we have

KL (X, η_{j}) = KL (V_{j}, η_{j}) = \frac{ρ_{j}^{2}}{1 - ρ_{j}^{2}}, j = 1, 2, \dots, m .

(12)

This implies that

C (η_{j} \to X) = C (η_{j} \to V_{j}) = \frac{ρ_{j}^{2}}{1 - ρ_{j}^{2}};

\tilde{RC} (η_{j} \to X) = \frac{KL (X, η_{j})}{KL (X, ξ) + 1} = \frac{KL (X, η_{j})}{KL (η, V) + 1} = \frac{\frac{ρ_{j}^{2}}{1 - ρ_{j}^{2}}}{\sum_{a = 1}^{m} \frac{ρ_{a}^{2}}{1 - ρ_{a}^{2}} + 1};

RC (η_{j} \to X) = \frac{KL (X, η_{j})}{KL (ξ, X)} = \frac{KL (V_{j}, η_{j})}{KL (V, η)} = \frac{\frac{ρ_{j}^{2}}{1 - ρ_{j}^{2}}}{\sum_{a = 1}^{m} \frac{ρ_{a}^{2}}{1 - ρ_{a}^{2}}}, j = 1, 2, \dots, m .

Theorem 4 shows that the contribution of common-factor vector

ξ

to manifest variable vector

X

is decomposed into those of canonical common factors

η_{j}

, i.e.,

KL (X, ξ) = \sum_{j = 1}^{m} KL (X, η_{j}) = \sum_{j = 1}^{m} KL (V_{j}, η_{j}) .

Let us assume

1 > ρ_{1}^{2} \geq ρ_{2}^{2} \geq \dots \geq ρ_{m}^{2} \geq 0 .

(13)

According to the entropy-based criterion in Theorem 4, the order of importance of canonical common factors is that of canonical correlation coefficients. The interpretation of factors

η_{j}

can be made with the corresponding manifest canonical variables

V_{j}

and the factor loading matrix of canonical common factors

η = F ξ

. For the canonical common factors, the factor loading matrix can be obtained as

Λ^{*} = Λ F^{- 1}

. We refer to the canonical correlation analysis in Theorem 4 as canonical factor analysis [10].
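Theorem 4 can be checked numerically from the blocks of matrix (10). The sketch below obtains the squared canonical correlations between X and ξ as the eigenvalues of the standard product of covariance blocks and compares the sum of ρ²/(1 − ρ²) with the signal-to-noise ratio (11). It uses the rounded loadings displayed in Numerical Example 1 and assumes unit-variance manifest variables; the function name is hypothetical.

```python
import numpy as np

# Rounded loadings as displayed in Numerical Example 1 (rows X1..X5, columns xi1, xi2).
LOADINGS = np.array([
    [0.60, 0.39],
    [0.75, 0.24],
    [0.65, 0.00],
    [0.32, 0.59],
    [0.00, 0.92],
])
PHI = np.eye(2)
# Assumption: unit-variance manifest variables, so omega_i^2 = 1 - communality.
UNIQUE_VAR = 1.0 - np.diag(LOADINGS @ PHI @ LOADINGS.T)


def squared_canonical_correlations(sigma_xx, sigma_xf, sigma_ff):
    """Eigenvalues of Sigma_ff^{-1} Sigma_fx Sigma_xx^{-1} Sigma_xf, sorted in decreasing order."""
    inner = sigma_xf.T @ np.linalg.solve(sigma_xx, sigma_xf)
    eigenvalues = np.linalg.eigvals(np.linalg.solve(sigma_ff, inner)).real
    return np.sort(eigenvalues)[::-1]


# Blocks of the variance-covariance matrix (10).
sigma_xx = LOADINGS @ PHI @ LOADINGS.T + np.diag(UNIQUE_VAR)
sigma_xf = LOADINGS @ PHI

rho2 = squared_canonical_correlations(sigma_xx, sigma_xf, PHI)
kl_from_rho = float(np.sum(rho2 / (1.0 - rho2)))                                  # Theorem 4
kl_from_snr = float(np.trace(np.diag(1.0 / UNIQUE_VAR) @ (sigma_xx - np.diag(UNIQUE_VAR))))  # (11)
```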

Theorem 5.

In factor analysis model (1), for any

p \times p

and

m \times m

nonsingular matrices

P

and

Q

, the canonical factor analysis between manifest variable vector

P X

and common-factor vector

Q ξ

is invariant.

Proof.

Since the variance–covariance matrix of

P X

and

Q ξ

is given by

(\begin{matrix} P & 0 \\ 0 & Q \end{matrix}) (\begin{matrix} Σ & Λ Φ \\ Φ Λ^{T} & Φ \end{matrix}) {(\begin{matrix} P & 0 \\ 0 & Q \end{matrix})}^{T},

the theorem follows. □

Notice that we also have

KL (P X, Q ξ) = KL (X, ξ) .

From the above theorem, the results of the canonical factor analysis do not depend on the initial common factors

ξ_{j}

in factor analysis model (1). For factor analysis model (1), it follows that

KL (X, ξ) = \sum_{j = 1}^{m} KL (V_{j}, η_{j}) = \sum_{i = 1}^{p} KL (X_{i}, ξ),

implying that

\frac{tr \tilde{Ω} Λ Φ Λ^{T}}{| Ω |} = \sum_{j = 1}^{m} \frac{ρ_{j}^{2}}{1 - ρ_{j}^{2}} = \sum_{i = 1}^{p} \frac{R_{i}^{2}}{1 - R_{i}^{2}},

where

R_{i}

are the multiple correlation coefficients between manifest variables

X_{i}

and factor vector

ξ = (ξ_{1}, ξ_{2}, \dots, ξ_{m}), i = 1, 2, \dots, p

.
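The invariance stated in Theorem 5 and the identity with the multiple correlation coefficients can be illustrated numerically. In the sketch below, P and Q are hypothetical diagonally dominant (hence nonsingular) matrices generated from a fixed seed; the loadings are the rounded values displayed in Numerical Example 1, and unit-variance manifest variables are assumed.

```python
import numpy as np

LOADINGS = np.array([
    [0.60, 0.39],
    [0.75, 0.24],
    [0.65, 0.00],
    [0.32, 0.59],
    [0.00, 0.92],
])
PHI = np.eye(2)
COMMUNALITY = np.diag(LOADINGS @ PHI @ LOADINGS.T)
UNIQUE_VAR = 1.0 - COMMUNALITY          # assumption: unit-variance manifest variables


def squared_canonical_correlations(sigma_xx, sigma_xf, sigma_ff):
    """Eigenvalues of Sigma_ff^{-1} Sigma_fx Sigma_xx^{-1} Sigma_xf, sorted in decreasing order."""
    inner = sigma_xf.T @ np.linalg.solve(sigma_xx, sigma_xf)
    eigenvalues = np.linalg.eigvals(np.linalg.solve(sigma_ff, inner)).real
    return np.sort(eigenvalues)[::-1]


sigma_xx = LOADINGS @ PHI @ LOADINGS.T + np.diag(UNIQUE_VAR)
sigma_xf = LOADINGS @ PHI

# Hypothetical transformations; strict diagonal dominance guarantees nonsingularity.
rng = np.random.default_rng(0)
P = 3.0 * np.eye(5) + 0.5 * rng.uniform(-1.0, 1.0, size=(5, 5))
Q = 3.0 * np.eye(2) + 0.5 * rng.uniform(-1.0, 1.0, size=(2, 2))

rho2 = squared_canonical_correlations(sigma_xx, sigma_xf, PHI)
rho2_transformed = squared_canonical_correlations(
    P @ sigma_xx @ P.T, P @ sigma_xf @ Q.T, Q @ PHI @ Q.T
)

# Squared multiple correlations R_i^2 of each X_i on xi (here the communalities).
r2 = COMMUNALITY / np.diag(sigma_xx)
kl_from_rho = float(np.sum(rho2 / (1.0 - rho2)))
kl_from_r2 = float(np.sum(r2 / (1.0 - r2)))
```

The two sets of squared canonical correlations agree up to rounding error, and so do the two decompositions of KL(X, ξ).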

Numerical Example 1

Table 1 shows the results of orthogonal factor analysis (varimax method by S-PLUS ver. 8.2) as reported in [6]; the same example is used here to demonstrate the canonical factor analysis mentioned above. In Table 1, manifest variables

X_{1}, X_{2}, and X_{3}

are scores in some subjects in the liberal arts, while variables

X_{4} and X_{5}

are those in the sciences. We refer to the factors as the initial common factors. In this example, from Table 1, the variance–covariance matrices in (10) are given as follows:

Σ = (\begin{matrix} \begin{matrix} 1 & 0.54 & 0.39 & 0.42 & 0.36 \\ 0.54 & 1 & 0.49 & 0.38 & 0.22 \\ 0.39 & 0.49 & 1 & 0.21 & 0 \\ 0.42 & 0.38 & 0.21 & 1 & 0.54 \\ 0.36 & 0.22 & 0 & 0.54 & 1 \end{matrix} \end{matrix}),

Φ = (\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}) .

where covariance matrix

Λ^{T}

is given in Table 1.

From the above matrices, to obtain the pairs of canonical variables, linear transformation matrices

B^{(1)}

and

F

in Theorem 4 are as follows:

B^{(1)} = (\begin{matrix} \begin{matrix} 0.19 & 0.20 \\ 0.32 & 0.58 \end{matrix} & \begin{matrix} 0.06 & 0.20 & 0.94 \\ 0.37 & 0.07 & - 0.65 \end{matrix} \end{matrix}),

and

F = (\begin{matrix} 0.32 & 0.95 \\ 0.95 & - 0.32 \end{matrix}) .

By the above matrices, we have the following pairs of canonical variables

(V_{i}, η_{i})

and their squared canonical correlation coefficients

ρ_{i}^{2}

:

\left\{ \begin{matrix} V_{1} = 0.19 X_{1} + 0.20 X_{2} + 0.06 X_{3} + 0.20 X_{4} + 0.94 X_{5}, \\ η_{1} = 0.32 ξ_{1} + 0.95 ξ_{2}, \\ ρ_{1}^{2} = 0.88, \end{matrix} \right.

\left\{ \begin{matrix} V_{2} = 0.32 X_{1} + 0.58 X_{2} + 0.37 X_{3} + 0.07 X_{4} - 0.65 X_{5}, \\ η_{2} = 0.95 ξ_{1} - 0.32 ξ_{2}, \\ ρ_{2}^{2} = 0.73 . \end{matrix} \right.

According to the above canonical variables, the factor loadings for canonical factors

η_{i}, i = 1, 2

are calculated with the initial loading matrix

Λ

and the rotation matrix

F

, and we have

\begin{matrix} Λ^{* T} = {(Λ F^{- 1})}^{T} = {(\begin{matrix} 0.32 & 0.95 \\ 0.95 & - 0.32 \end{matrix})}^{- 1} (\begin{matrix} \begin{matrix} 0.6 & 0.75 \\ 0.39 & 0.24 \end{matrix} & \begin{matrix} 0.65 & 0.32 & 0.00 \\ 0.00 & 0.59 & 0.92 \end{matrix} \end{matrix}) \\ = (\begin{matrix} \begin{matrix} 0.56 & 0.47 \\ 0.45 & 0.64 \end{matrix} & \begin{matrix} 0.21 & 0.66 & 0.87 \\ 0.62 & 0.12 & - 0.29 \end{matrix} \end{matrix}) . \end{matrix}

From the above results, the first canonical factor

η_{1}

can be viewed as a general common ability (factor) to solve all five subjects. The second factor

η_{2}

can be regarded as a factor related to subjects in the liberal arts, which is independent of the first canonical factor. In the canonical correlation analysis, the contributions of canonical factors are calculated. Since the squared multiple correlation coefficient between

η_{1}

and

X = {(X_{1}, X_{2}, \dots, X_{5})}^{T}

is

ρ_{1}^{2} = 0.88

and that between

η_{2}

and

X

is

ρ_{2}^{2} = 0.73

, we have

C (η_{1} \to X) = \frac{ρ_{1}^{2}}{1 - ρ_{1}^{2}} = 7.06, C (η_{2} \to X) = \frac{ρ_{2}^{2}}{1 - ρ_{2}^{2}} = 2.70 .

Let

ξ = (ξ_{1}, ξ_{2})

. From the above results, we have

\begin{matrix} C (ξ \to X) = KL (ξ, X) = C (η_{1} \to X) + C (η_{2} \to X) = 9.76, \\ \tilde{CR} (ξ \to X) = \frac{KL (ξ, X)}{KL (ξ, X) + 1} = 0.91 (= ECD (ξ, X)) . \end{matrix}

From this,

91 %

of the variation of manifest random vector

X

in entropy is explained by the common latent factors

ξ

. The contribution ratios of canonical common factors are calculated as follows:

CR (η_{1} \to X) = \frac{7.06}{7.06 + 2.70} = 0.72, CR (η_{2} \to X) = \frac{2.70}{7.06 + 2.70} = 0.28 .

The contribution of the first canonical factor is about 2.6 times greater than that of the second one.
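The canonical factors of this example can be approximately reproduced from the rounded loadings shown above. The sketch relies on an identity that holds for orthogonal initial factors (Φ = I): the eigenvalues of Λ^T Ω^{-1} Λ equal ρ_j²/(1 − ρ_j²), and its eigenvectors give the rows of F. Unit-variance manifest variables are assumed, and all names are hypothetical. Because the inputs are rounded to two decimals, the results may differ slightly in the second decimal from the values reported in the text.

```python
import numpy as np

# Rounded initial loadings (rows X1..X5, columns xi1, xi2).
LOADINGS = np.array([
    [0.60, 0.39],
    [0.75, 0.24],
    [0.65, 0.00],
    [0.32, 0.59],
    [0.00, 0.92],
])
UNIQUE_VAR = 1.0 - np.sum(LOADINGS ** 2, axis=1)   # assumption: unit-variance variables

# Symmetric 2 x 2 matrix Lambda^T Omega^{-1} Lambda (valid shortcut for Phi = I).
m_matrix = LOADINGS.T @ (LOADINGS / UNIQUE_VAR[:, None])
eigenvalues, eigenvectors = np.linalg.eigh(m_matrix)    # ascending order
order = np.argsort(eigenvalues)[::-1]
contributions = eigenvalues[order]                      # C(eta_j -> X)
vectors = eigenvectors[:, order].copy()

# Fix the arbitrary sign so that the largest coefficient of each factor is positive.
for k in range(vectors.shape[1]):
    pivot = int(np.argmax(np.abs(vectors[:, k])))
    if vectors[pivot, k] < 0:
        vectors[:, k] *= -1.0

F = vectors.T                        # rows: coefficients of eta_1 and eta_2 on (xi_1, xi_2)
loadings_star = LOADINGS @ vectors   # Lambda* = Lambda F^{-1}; F is orthogonal here
rho2 = contributions / (1.0 + contributions)
ratios = contributions / contributions.sum()
```

With these inputs F is close to the matrix with rows (0.32, 0.95) and (0.95, −0.32), and the contribution ratios are roughly 0.72 and 0.28.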

4. Deriving Important Common Factors Based on Decomposition of Manifest Variables into Subsets

From (9) in Theorem 2,

KL (X, ξ)

is decomposed into those for manifest variable subvectors

X^{(a)}

,

KL (X^{(a)}, ξ), a = 1, 2, \dots, A

. Thus, we have the following theorem:

Theorem 6.

Let manifest variable vector

X

be decomposed into subvectors

X^{(a)}, a = 1, 2, \dots, A

. Let

ρ_{(a) j}, j = 1, 2, \dots, m_{(a)}

be the canonical correlation coefficients between manifest variable subvector

X^{(a)}

and common-factor vector

ξ

,

a = 1, 2, \dots, A

in the factor analysis model (1), where

m_{(a)} \leq \min \{ \text{dimension of } X^{(a)}, m \}

. Then,

KL (X, ξ)

is decomposed into canonical components as follows:

KL (X, ξ) = \sum_{a = 1}^{A} \sum_{j = 1}^{m_{(a)}} \frac{ρ_{(a) j}^{2}}{1 - ρ_{(a) j}^{2}} .

Proof.

For manifest variable vector

X^{(a)}

and common-factor vector

ξ

, applying canonical correlation analysis, we have

m_{(a)}

pairs of canonical variables

(V_{j}^{(a)}, η_{j}^{(a)})

with squared canonical correlation coefficients

ρ_{(a) j}^{2}, j = 1, 2, \dots, m_{(a)}

. Then, applying Theorem 4 to

KL (X^{(a)}, ξ)

, it follows that

KL (X^{(a)}, ξ) = \sum_{j = 1}^{m_{(a)}} KL (V_{j}^{(a)}, η_{j}^{(a)}) = \sum_{j = 1}^{m_{(a)}} \frac{ρ_{(a) j}^{2}}{1 - ρ_{(a) j}^{2}}, a = 1, 2, \dots, A .

From Theorem 2, the theorem follows. □

Remark 2.

As shown in the above theorem, the following relations hold:

KL (X^{(a)}, η_{j}^{(a)}) = KL (V_{j}^{(a)}, η_{j}^{(a)}) = \frac{ρ_{(a) j}^{2}}{1 - ρ_{(a) j}^{2}}, j = 1, 2, \dots, m_{(a)}; a = 1, 2, \dots, A .

In this sense,

C (η_{j}^{(a)} \to X^{(a)}) = \frac{ρ_{(a) j}^{2}}{1 - ρ_{(a) j}^{2}}, j = 1, 2, \dots, m_{(a)}; a = 1, 2, \dots, A .

To derive important common factors, the above theorem can be used. In many data sets analyzed by factor analysis, manifest variables can be classified into subsets that have common concepts (factors) to be measured. For example, in the data used for Table 1, it is meaningful to classify the five variables into two subsets

X^{(1)} = (X_{1}, X_{2}, X_{3})

and

X^{(2)} = (X_{4}, X_{5})

, where the first subset is related to the liberal arts and the second one is related to the sciences. From the pairs

(X^{(1)}, ξ)

and

(X^{(2)}, ξ)

, it is possible to derive the latent ability for the liberal arts and that for the sciences, respectively.

4.1. Numerical Example 1 (Continued)

For

(X^{(1)}, ξ)

and

(X^{(2)}, ξ)

, two sets of canonical variables are obtained, respectively, as follows:

\left\{ \begin{matrix} η_{1}^{(1)} = 0.95 ξ_{1} + 0.32 ξ_{2}, V_{1}^{(1)} = 0.52 X_{1} + 0.76 X_{2} + 0.39 X_{3}, ρ_{(1) 1}^{2} = 0.77, \\ η_{2}^{(1)} = 0.32 ξ_{1} - 0.95 ξ_{2}, V_{2}^{(1)} = 0.71 X_{1} - 0.07 X_{2} - 0.71 X_{3}, ρ_{(1) 2}^{2} = 0.12, \end{matrix} \right.

\left\{ \begin{matrix} η_{1}^{(2)} = 0.06 ξ_{1} + 1.00 ξ_{2}, V_{1}^{(2)} = 0.18 X_{4} + 0.98 X_{5}, ρ_{(2) 1}^{2} = 0.86, \\ η_{2}^{(2)} = 1.00 ξ_{1} - 0.06 ξ_{2}, V_{2}^{(2)} = 0.83 X_{4} - 0.55 X_{5}, ρ_{(2) 2}^{2} = 0.14 . \end{matrix} \right.

According to the above canonical variables, we have the following factor contributions:

\left\{ \begin{matrix} C (η_{1}^{(1)} \to X^{(1)}) = C (η_{1}^{(1)} \to V_{1}^{(1)}) = \frac{ρ_{(1) 1}^{2}}{1 - ρ_{(1) 1}^{2}} = 3.27, CR (η_{1}^{(1)} \to X^{(1)}) = 0.96, \\ C (η_{2}^{(1)} \to X^{(1)}) = \frac{ρ_{(1) 2}^{2}}{1 - ρ_{(1) 2}^{2}} = 0.14, CR (η_{2}^{(1)} \to X^{(1)}) = 0.04; \end{matrix} \right.

\left\{ \begin{matrix} C (η_{1}^{(2)} \to X^{(2)}) = 6.14, CR (η_{1}^{(2)} \to X^{(2)}) = 0.97, \\ C (η_{2}^{(2)} \to X^{(2)}) = 0.17, CR (η_{2}^{(2)} \to X^{(2)}) = 0.03 . \end{matrix} \right.

From the above results, canonical factors

η_{1}^{(1)}

and

η_{1}^{(2)}

can be interpreted as general common factors for the liberal arts and for the sciences, respectively. By using the factors, the factor loadings are given in Table 2. In this case, Table 2 is similar to Table 1; however, the factor analysis model is oblique and the correlation coefficient between

η_{1}^{(1)}

and

η_{1}^{(2)}

is 0.374. The contributions of the factors to manifest variable vector

X = (X_{1}, X_{2}, X_{3}, X_{4}, X_{5}) = (X^{(1)}, X^{(2)})

are calculated as follows:

\left\{ \begin{matrix} C (η_{1}^{(1)} \to X) = 6.563, CR (η_{1}^{(1)} \to X) = 0.687, \tilde{CR} (η_{1}^{(1)} \to X) = 0.60, \\ C (η_{1}^{(2)} \to X) = 4.223, CR (η_{1}^{(2)} \to X) = 0.442, \tilde{CR} (η_{1}^{(2)} \to X) = 0.39 . \end{matrix} \right.

In this case, factors

η_{1}^{(1)}

and

η_{1}^{(2)}

are correlated, so it follows that

CR (η_{1}^{(1)} \to X) + CR (η_{1}^{(2)} \to X) = 1.129 > 1 .
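The decomposition in Theorem 6 can be checked for the two subsets of this example. The sketch below runs a canonical correlation analysis of each subvector against ξ and compares the summed contributions with the one obtained from the full vector X. It uses the rounded loadings displayed in Numerical Example 1 and assumes unit-variance manifest variables and orthogonal initial factors; the function names are hypothetical, and small second-decimal differences from the reported values can arise from the rounding.

```python
import numpy as np

LOADINGS = np.array([
    [0.60, 0.39],
    [0.75, 0.24],
    [0.65, 0.00],
    [0.32, 0.59],
    [0.00, 0.92],
])
UNIQUE_VAR = 1.0 - np.sum(LOADINGS ** 2, axis=1)   # assumption: unit-variance variables
PHI = np.eye(2)                                    # orthogonal initial factors


def squared_canonical_correlations(sigma_xx, sigma_xf, sigma_ff):
    """Eigenvalues of Sigma_ff^{-1} Sigma_fx Sigma_xx^{-1} Sigma_xf, sorted in decreasing order."""
    inner = sigma_xf.T @ np.linalg.solve(sigma_xx, sigma_xf)
    eigenvalues = np.linalg.eigvals(np.linalg.solve(sigma_ff, inner)).real
    return np.sort(eigenvalues)[::-1]


def canonical_contributions(rows):
    """rho^2 / (1 - rho^2) for the subvector of X given by the row indices."""
    lam = LOADINGS[rows]
    sigma_xx = lam @ PHI @ lam.T + np.diag(UNIQUE_VAR[rows])
    rho2 = squared_canonical_correlations(sigma_xx, lam @ PHI, PHI)
    return rho2 / (1.0 - rho2)


group1 = canonical_contributions([0, 1, 2])       # liberal arts: X1, X2, X3
group2 = canonical_contributions([3, 4])          # sciences: X4, X5
full = canonical_contributions([0, 1, 2, 3, 4])   # whole vector X

kl_by_groups = float(group1.sum() + group2.sum())
kl_full = float(full.sum())
```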

4.2. Numerical Example 2

Table 3 shows the results of the maximum likelihood factor analysis (orthogonal) for six scores

X_{i}, i = 1, 2, \dots, 6

([11], pp. 61–65); such results are treated as the initial estimates in the present analysis. In this example, variables are classified into the following three groups: variable

X_{1}

is related to Spearman’s

g

factor; variables

X_{2}

,

X_{3}

, and

X_{4}

account for problem-solving ability; and variables

X_{5}

and

X_{6}

are associated with verbal ability [11]; however, it is difficult to explain the three factors by using Table 3. In this example, the present approach is employed for deriving the three factors. From (10) and Table 3, the correlation matrix of the manifest variables is given as follows:

\hat{Σ} = (\begin{matrix} \begin{matrix} 1 & 0.417 & 0.576 \\ 0.417 & 1 & 0.567 \\ 0.576 & 0.567 & 1 \end{matrix} & \begin{matrix} 0.312 & 0.576 & 0.514 \\ 0.306 & 0.265 & 0.263 \\ 0.427 & 0.355 & 0.354 \end{matrix} \\ \begin{matrix} 0.312 & 0.306 & 0.427 \\ 0.576 & 0.265 & 0.355 \\ 0.514 & 0.263 & 0.354 \end{matrix} & \begin{matrix} 1 & 0.193 & 0.193 \\ 0.193 & 1 & 0.799 \\ 0.193 & 0.799 & 1 \end{matrix} \end{matrix}) .

Let

X^{(2)} = (X_{2}, X_{3}, X_{4})

, let

X^{(3)} = (X_{5}, X_{6})

, and let

ξ = (ξ_{1}, ξ_{2})

. Canonical correlation analysis is carried out for

(X_{1}, ξ)

,

(X^{(2)}, ξ)

, and

(X^{(3)}, ξ)

, and we have the following canonical variables, respectively:

η_{1}^{(1)} = \frac{1}{\sqrt{{0.64}^{2} + {0.37}^{2}}} (0.64 ξ_{1} + 0.37 ξ_{2}) = 0.87 ξ_{1} + 0.50 ξ_{2}, V_{1}^{(1)} = X_{1}, ρ_{(1) 1}^{2} = 0.55,

\left\{ \begin{matrix} η_{1}^{(2)} = 0.52 ξ_{1} + 0.85 ξ_{2}, V_{1}^{(2)} = 0.24 X_{2} + 0.96 X_{3} + 0.14 X_{4}, ρ_{(2) 1}^{2} = 0.83, \\ η_{2}^{(2)} = 0.85 ξ_{1} - 0.52 ξ_{2}, V_{2}^{(2)} = 0.81 X_{2} - 0.59 X_{3} - 0.02 X_{4}, ρ_{(2) 2}^{2} = 0.00, \end{matrix} \right.

\left\{ \begin{matrix} η_{1}^{(3)} = 0.99 ξ_{1} - 0.12 ξ_{2}, V_{1}^{(3)} = 0.99 X_{5} + 0.11 X_{6}, ρ_{(3) 1}^{2} = 0.96, \\ η_{2}^{(3)} = 0.12 ξ_{1} + 0.99 ξ_{2}, V_{2}^{(3)} = 0.64 X_{5} - 0.77 X_{6}, ρ_{(3) 2}^{2} = 0.01 . \end{matrix} \right.

The contributions of canonical factors

η_{i}^{(k)}, i = 1, 2; k = 2, 3

are calculated as follows:

\left\{ \begin{matrix} C (η_{1}^{(2)} \to X^{(2)}) = C (η_{1}^{(2)} \to V_{1}^{(2)}) = \frac{0.83}{1 - 0.83} = 4.88, CR (η_{1}^{(2)} \to X^{(2)}) = 1.00, \\ C (η_{2}^{(2)} \to X^{(2)}) = \frac{0.00}{1 - 0.00} = 0.00, CR (η_{2}^{(2)} \to X^{(2)}) = 0.00; \end{matrix} \right.

\left\{ \begin{matrix} C (η_{1}^{(3)} \to X^{(3)}) = \frac{0.96}{1 - 0.96} = 24.00, CR (η_{1}^{(3)} \to X^{(3)}) = 0.99, \\ C (η_{2}^{(3)} \to X^{(3)}) = 0.01, CR (η_{2}^{(3)} \to X^{(3)}) = 0.01 . \end{matrix} \right.
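The conversion from a squared canonical correlation to a contribution is the same in every group. A minimal helper, with a hypothetical name and the rounded first squared canonical correlations reported above as inputs, might look as follows.

```python
def contribution(rho2: float) -> float:
    """Contribution rho^2 / (1 - rho^2) of a canonical factor, as in (12)."""
    if not 0.0 <= rho2 < 1.0:
        raise ValueError("squared canonical correlation must lie in [0, 1)")
    return rho2 / (1.0 - rho2)


# Rounded first squared canonical correlations of the three groups in this example.
REPORTED_RHO2 = {"g": 0.55, "eta1_2": 0.83, "eta1_3": 0.96}
contributions = {name: contribution(value) for name, value in REPORTED_RHO2.items()}
```

Because the contribution grows without bound as ρ² approaches one, small rounding differences in a large ρ² translate into visible differences in the contribution.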

The common factor

η_{1}^{(1)} (= g)

can be interpreted as Spearman’s

g

factor (general intelligence) and canonical common factors

η_{1}^{(2)}

and

η_{1}^{(3)}

can be interpreted as problem-solving ability and verbal ability, respectively. The correlation coefficients between the three factors are given by

Corr (g, η_{1}^{(2)}) = 0.88, Corr (g, η_{1}^{(3)}) = 0.80, Corr (η_{1}^{(2)}, η_{1}^{(3)}) = 0.42 .
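Since the initial factors are orthogonal with unit variances, the correlation between two derived factors is the cosine of the angle between their coefficient vectors. The sketch below recomputes the three correlations from the rounded coefficients given above; the dictionary keys and the function name are hypothetical, and the vectors are renormalized because the rounded coefficients do not have exactly unit length.

```python
import numpy as np

PHI = np.eye(2)  # orthogonal initial factors, as in the maximum likelihood solution

# Rounded coefficients of the derived factors on (xi_1, xi_2).
COEFFS = {
    "g": np.array([0.87, 0.50]),
    "eta1_2": np.array([0.52, 0.85]),
    "eta1_3": np.array([0.99, -0.12]),
}


def factor_correlation(a: np.ndarray, b: np.ndarray, phi: np.ndarray = PHI) -> float:
    """Correlation between the linear combinations a'xi and b'xi."""
    return float(a @ phi @ b / np.sqrt((a @ phi @ a) * (b @ phi @ b)))


corr_g_2 = factor_correlation(COEFFS["g"], COEFFS["eta1_2"])
corr_g_3 = factor_correlation(COEFFS["g"], COEFFS["eta1_3"])
corr_2_3 = factor_correlation(COEFFS["eta1_2"], COEFFS["eta1_3"])
```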

The contributions of the above three factors to manifest variable vector

X = (X_{1}, X_{2}, X_{3}, X_{4}, X_{5}, X_{6})

are computed as follows:

\left\{ \begin{matrix} C (g \to X) = 19.93, & CR (g \to X) = 0.68, & \tilde{CR} (g \to X) = 0.66, \\ C (η_{1}^{(2)} \to X) = 9.77, & CR (η_{1}^{(2)} \to X) = 0.33, & \tilde{CR} (η_{1}^{(2)} \to X) = 0.32, \\ C (η_{1}^{(3)} \to X) = 25.02, & CR (η_{1}^{(3)} \to X) = 0.85, & \tilde{CR} (η_{1}^{(3)} \to X) = 0.82 . \end{matrix} \right.

The common-factor space is two-dimensional, and the factor loadings with common factors

η_{1}^{(2)}

and

η_{1}^{(3)}

are calculated as in Table 4. The table shows a clear interpretation of the common factors. Thus, the present method is effective for deriving interpretable factors in situations such as that of this example. The expressions of the factor analysis model can also be given by factor vectors

(g, η_{1}^{(2)})

and

(g, η_{1}^{(3)})

, respectively. The present method is applicable to any subsets of manifest variables.

5. Discussion

Factor rotation is commonly used to find interpretable common factors in factor analysis models. Rotation methods maximize variation functions of the squared factor loadings and yield either orthogonal or oblique factors. Factors derived in this way may be interpretable; however, a method that detects interpretable common factors on the basis of factor contribution, i.e., the importance of common factors, may be more useful. The entropy-based method for measuring factor contribution [6] measures the contribution of the common-factor vector to the manifest variable vector, and this contribution can be decomposed into those of the single manifest variables (Theorem 1) and into those of manifest variable subvectors (Theorem 2). A characterization for the case of orthogonal factors can also be given (Theorem 3).

This paper shows that the most important common factor with respect to entropy can be identified by canonical correlation analysis between the factor vector and the manifest variable vector. Specifically, Theorem 4 shows that the contribution of the common-factor vector to the manifest variable vector can be decomposed into those of the canonical factors and that the order of the canonical correlation coefficients is the order of the factor contributions. In most multivariate data, manifest variables can be naturally classified into subsets according to common concepts, as in Examples 1 and 2. By using Theorems 2 and 5, canonical correlation analysis can also be applied to derive canonical common factors from subsets of manifest variables and the initial common-factor vector (Theorem 6). With this analysis, interpretable common factors can be obtained easily, as demonstrated in Examples 1 and 2.

In Example 1, Table 1 and Table 2 have similar factor patterns; however, the factors in Table 1 are orthogonal, whereas those in Table 2 are oblique. In Example 2, the factors in Table 3, produced by the varimax method, may be difficult to interpret, whereas those in Table 4, obtained by the present method, can be interpreted clearly. Finally, according to Theorem 5, the results of the present method are invariant with respect to linear transformations of the common factors, so the method does not depend on the initial common factors. The present method is the first to derive interpretable factors based on a factor contribution measure, and the interpretable factors can be obtained easily through canonical correlation analysis between manifest variable subvectors and the factor vector.
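The decomposition and the invariance property discussed above can be checked numerically. The following is a minimal sketch, not the authors' implementation. It uses the loadings and uniquenesses of Table 1 and rests on two assumptions that are not restated in this section: that the total entropy-based contribution has the form $\mathrm{tr}(Ψ^{-1}ΛΦΛ^{T})$ (our reading of the measure in [6]), and that the contribution of a canonical factor with canonical correlation $ρ_{j}$ is $ρ_{j}^{2}/(1-ρ_{j}^{2})$. The function name and the transformation matrix `T` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Loadings (rows X_1..X_5, columns xi_1, xi_2) and uniquenesses from Table 1.
Lam = np.array([[0.60, 0.39],
                [0.75, 0.24],
                [0.65, 0.00],
                [0.32, 0.59],
                [0.00, 0.92]])
Psi = np.diag([0.50, 0.38, 0.58, 0.55, 0.16])
Phi = np.eye(2)  # factor correlation matrix; identity for orthogonal factors


def squared_canonical_correlations(Lam, Phi, Psi):
    """Squared canonical correlations between X = Lam @ xi + eps and xi."""
    Sigma = Lam @ Phi @ Lam.T + Psi  # model-implied covariance of X
    # Eigenvalues of Cov(xi)^{-1} Cov(xi, X) Cov(X)^{-1} Cov(X, xi),
    # which simplifies to Lam' Sigma^{-1} Lam Phi under the model.
    M = Lam.T @ np.linalg.solve(Sigma, Lam) @ Phi
    rho2 = np.linalg.eigvals(M).real
    return np.sort(rho2)[::-1]  # largest first


rho2 = squared_canonical_correlations(Lam, Phi, Psi)

# Assumed per-factor contributions and assumed trace form of the total.
contrib = rho2 / (1.0 - rho2)
total = np.trace(np.linalg.solve(Psi, Lam @ Phi @ Lam.T))

# Arbitrary nonsingular transformation xi* = T @ xi (illustrative values).
T = np.array([[1.0, 0.5],
              [-0.3, 2.0]])
Lam_t = Lam @ np.linalg.inv(T)  # loadings on the transformed factors
Phi_t = T @ Phi @ T.T           # covariance of the transformed factors
rho2_t = squared_canonical_correlations(Lam_t, Phi_t, Psi)

print("squared canonical correlations:", rho2)
print("sum of contributions:", contrib.sum(), "trace form:", total)
print("invariant under T:", np.allclose(rho2, rho2_t))
```

Because $ρ^{2}/(1-ρ^{2})$ is increasing in $ρ^{2}$, sorting the canonical correlations also sorts the assumed contributions, which matches the ordering statement attributed to Theorem 4. The transformed model has the same implied covariance $ΛΦΛ^{T}+Ψ$, so the canonical correlations do not change.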

Author Contributions

Conceptualization, N.E.; methodology, N.E., C.G.B., M.T., T.K.; formal analysis, N.E.; writing—original draft preparation, N.E.; writing—review and editing, C.G.B.; funding acquisition, T.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Grant-in-aid for Scientific Research 18K11200, Ministry of Education, Culture, Sports, Science, and Technology of Japan.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the three referees for their useful comments and suggestions for improving the first version of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kaiser, H.F. The varimax criterion for analytic rotation in factor analysis. Psychometrika 1958, 23, 187–200.
2. Ten Berge, J.M.F. A joint treatment of VARIMAX rotation and the problem of diagonalizing symmetric matrices simultaneously in the least-squares sense. Psychometrika 1984, 49, 347–358.
3. Jennrich, R.I.; Sampson, P.F. Rotation for simple loadings. Psychometrika 1966, 31, 313–323.
4. Harris, C.W.; Kaiser, H.F. Oblique factor analytic solutions by orthogonal transformation. Psychometrika 1964, 29, 347–362.
5. Thurstone, L.L. The Vectors of Mind: Multiple Factor Analysis for the Isolation of Primary Traits; University of Chicago Press: Chicago, IL, USA, 1935.
6. Eshima, N.; Tabata, M.; Borroni, C.G. An entropy-based approach for measuring factor contributions in factor analysis models. Entropy 2018, 20, 634.
7. Nelder, J.A.; Wedderburn, R.W.M. Generalized linear models. J. R. Stat. Soc. A 1972, 135, 370–384.
8. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
9. Eshima, N.; Tabata, M. Entropy coefficient of determination for generalized linear models. Comput. Stat. Data Anal. 2010, 54, 1381–1389.
10. Rao, C.R. Estimation and tests of significance in factor analysis. Psychometrika 1955, 20, 93–111.
11. Bartholomew, D.J. Latent Variable Models and Factor Analysis; Oxford University Press: New York, NY, USA, 1987.

Figure 1. Path diagram of a general factor analysis model.

Table 1. Factor loadings of orthogonal (varimax) factor analysis.

	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$
$ξ_{1}$	0.60	0.75	0.65	0.32	0.00
$ξ_{2}$	0.39	0.24	0.00	0.59	0.92
uniqueness	0.50	0.38	0.58	0.55	0.16

Uniqueness is the proportion of unique factor $ε_{i}$ related to manifest variable $X_{i}$.
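As a quick consistency check on Table 1, the uniquenesses can be compared with the values implied by the loadings. This is only a sketch under two assumptions: the manifest variables are standardized to unit variance, and the factors are orthogonal, so that the communality of $X_{i}$ is the sum of its squared loadings and the uniqueness is one minus the communality. The variable names are illustrative.

```python
import numpy as np

# Rows are the factors xi_1 and xi_2; columns are X_1..X_5, as in Table 1.
loadings = np.array([[0.60, 0.75, 0.65, 0.32, 0.00],
                     [0.39, 0.24, 0.00, 0.59, 0.92]])
reported_uniqueness = np.array([0.50, 0.38, 0.58, 0.55, 0.16])

# Communality under orthogonal factors: sum of squared loadings per variable.
communality = (loadings ** 2).sum(axis=0)
implied_uniqueness = 1.0 - communality

# Largest absolute gap between implied and reported uniqueness.
max_gap = np.abs(implied_uniqueness - reported_uniqueness).max()
print("implied uniqueness:", implied_uniqueness.round(2))
print("largest gap:", round(float(max_gap), 4))
```

Small gaps are expected because the tabulated loadings are rounded to two decimals.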

Table 2. Factor loadings by using canonical common factors $η_{1}^{(1)}$ and $η_{1}^{(2)}$.

	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$
$η_{1}^{(1)}$	0.62	0.80	0.70	0.31	$-0.06$
$η_{1}^{(2)}$	0.19	$-0.02$	$-0.22$	0.49	0.94
uniqueness	0.50	0.38	0.58	0.55	0.16

Table 3. The initial maximum likelihood estimates of factor loadings (varimax).

	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$
$ξ_{1}$	0.64	0.34	0.46	0.25	0.97	0.82
$ξ_{2}$	0.37	0.54	0.76	0.41	$-0.12$	$-0.03$
uniqueness	0.45	0.59	0.21	0.77	0.04	0.33

Table 4. The factor loadings with common factors $η_{1}^{(2)}$ and $η_{1}^{(3)}$.

	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$
$η_{1}^{(2)}$	0.49	0.63	0.89	0.48	$-0.01$	0.07
$η_{1}^{(3)}$	0.39	0.01	0.00	0.00	0.98	0.79
uniqueness	0.45	0.59	0.21	0.77	0.04	0.33

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

