An Entropy-Based Tool to Help the Interpretation of Common-Factor Spaces in Factor Analysis

This paper proposes a method for deriving interpretable common factors based on canonical correlation analysis applied to the vectors of common factors and manifest variables in the factor analysis model. First, an entropy-based method for measuring factor contributions is reviewed. Second, the entropy-based contribution measure of the common-factor vector is decomposed into those of canonical common factors, and it is shown that the order of importance of the factors coincides with that of their canonical correlation coefficients. Third, the method is applied to derive interpretable common factors. Numerical examples are provided to demonstrate the usefulness of the approach.


Introduction
In factor analysis, extracting interpretable factors is important for practical data analysis. To this end, methods for factor rotation have been studied, e.g., varimax [1] and orthomax [2] for orthogonal rotations and oblimin [3] and orthoblique [4] for oblique rotations. The basic idea of factor rotation derives from Thurstone's criteria for simple structure in factor analysis models [5], and the rotation methods are constructed as maximizations of variation functions of the squared factor loadings in order to derive simple structures. Let X_i be manifest variables, let ξ_j be latent variables (common factors), let ε_i be unique factors related to X_i, and finally, let λ_ij be factor loadings, i.e., the weights of common factors ξ_j in explaining X_i. Then, the factor analysis model is given as follows:

X_i = Σ_{j=1}^m λ_ij ξ_j + ε_i, i = 1, 2, …, p,  (1)

where E(X_i) = E(ε_i) = 0, i = 1, 2, …, p; E(ξ_j) = 0 and Var(ξ_j) = 1, j = 1, 2, …, m; Cov(ξ_k, ξ_l) = φ_kl; Var(ε_i) = ω_i² > 0, i = 1, 2, …, p; and Cov(ε_k, ε_l) = 0, k ≠ l. To derive simple structures of factor analysis models, for example, in the varimax method, the following variation function of the squared factor loadings is maximized with respect to the factor loadings:

V = Σ_{j=1}^m [ (1/p) Σ_{i=1}^p (λ_ij²/h_i²)² − { (1/p) Σ_{i=1}^p λ_ij²/h_i² }² ],  (2)

where h_i² = Σ_{j=1}^m λ_ij², i = 1, 2, …, p, are the communalities of the manifest variables X_i. In this sense, the basic factor rotation methods can be viewed as methods for exploratively analyzing multidimensional common-factor spaces. The interpretation of factors is made according to the manifest variables with large weights on the common factors. As far as we know, no novel methods for factor rotation have been investigated beyond rotation methods similar to the basic ones above. In real data analyses, manifest variables are usually classified in advance into groups of variables that may have common factors and concepts of their own.
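The model and criterion above can be sketched numerically. The following Python fragment (the 5 × 2 loading matrix is hypothetical, not taken from the paper's tables) simulates data from model (1) with orthogonal factors and evaluates a raw varimax-type criterion, namely the column-wise variance of the squared loadings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5 x 2 loading matrix (illustration only, not from the paper).
Lam = np.array([[0.8, 0.1],
                [0.7, 0.2],
                [0.6, 0.3],
                [0.1, 0.9],
                [0.2, 0.7]])

def simulate(n, Lam):
    """Draw n observations from model (1) with orthogonal factors:
    X = Lam @ xi + eps, Var(xi_j) = 1, Var(eps_i) = 1 - h_i^2."""
    p, m = Lam.shape
    xi = rng.standard_normal((n, m))
    omega2 = 1.0 - (Lam**2).sum(axis=1)          # unique variances
    eps = rng.standard_normal((n, p)) * np.sqrt(omega2)
    return xi @ Lam.T + eps

def varimax_criterion(L):
    """Raw varimax-type criterion: total column-wise variance of the
    squared loadings (large when each column has few large loadings)."""
    return float(np.sum(np.var(L**2, axis=0)))

# A perfectly simple structure scores higher than its 45-degree rotation,
# whose squared loadings are all equal (criterion 0).
simple = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
c, s = np.sqrt(0.5), np.sqrt(0.5)
rotated = simple @ np.array([[c, -s], [s, c]])
```

Rotation methods search over orientations of the loading matrix for the one maximizing such a criterion; the sketch only evaluates it at two fixed orientations.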
For example, suppose we have a test battery including the following five subjects: Japanese, English, Social Science, Mathematics, and Natural Science. It is then reasonable to classify the five subjects into two groups, {Japanese, English, Social Science} and {Mathematics, Natural Science}. In such cases, it is meaningful to determine common factors related to the two manifest variable groups. For this objective, it is useful to develop a novel method to derive the common factors based on a factor contribution measure. In conventional methods of factor rotation, for example, as mentioned above, variation function (2) for the varimax method is not related to factor contribution.
An entropy-based method for measuring factor contribution was proposed in [6]; it measures factor contributions to the manifest variable vector and decomposes those contributions into contributions to manifest subvectors and to individual manifest variables. By using the method, we can derive important common factors related to the manifest subvectors and the manifest variables. The aim of the present paper is to propose a new method for deriving simple structures based on entropy, that is, for extracting common factors that are easy to interpret. In Section 2, the entropy-based method for measuring factor contribution [6] is reviewed in order to apply its properties to deriving simple structures in factor analysis models. Section 3 discusses canonical correlation analysis between common factors and manifest variables, and the contributions of common factors to the manifest variables are decomposed into components related to the extracted pairs of canonical variables. A numerical example is given to demonstrate the approach. In Section 4, canonical correlation analysis is applied to obtain common factors that are easy to interpret, and the contributions of the extracted factors are measured. Numerical examples illustrate the approach, and finally, Section 5 provides a discussion and conclusions.

Remark 1.
The information quantities KL(X, ξ_{\j} | ξ_j) and KL(X^(a), ξ_{\j} | ξ_j) can be expressed by using the conditional covariances Cov(X_i, θ_i | ξ_j) (see [6]). Finally, the following decomposition of KL(X, ξ) holds for orthogonal factors ([6], Theorem 3):

Theorem 3. If the common factors are mutually independent, it follows that

KL(X, ξ) = Σ_{j=1}^m KL(X, ξ_j).

The entropy coefficient of determination (ECD) [9] between ξ and X is defined by

ECD(ξ, X) = KL(X, ξ)/(1 + KL(X, ξ)),

so that the total relative contribution of factor vector ξ to manifest variable vector X in entropy can be defined as CR(ξ → X) = ECD(ξ, X), while, for a single factor ξ_j, two relative contribution ratios can be defined (see [6] for details).
Second, factor analysis model (1) in a general case is discussed. Let Σ be the variance-covariance matrix of manifest variable vector X = (X_1, X_2, …, X_p)^T; let Ω be the p × p variance-covariance matrix of unique factor vector ε = (ε_1, ε_2, …, ε_p)^T; let Λ be the p × m factor loading matrix of λ_ij; and let Φ be the correlation matrix of common-factor vector ξ = (ξ_1, ξ_2, …, ξ_m)^T. Then, model (1) can be expressed as X = Λξ + ε, and we have Σ = ΛΦΛ^T + Ω. Now, the above discussion is extended to a general factor analysis model (1) with variance-covariance matrices Σ and Ω of X and ε, respectively. Let θ = Λξ be the predictor vector of manifest variable vector X = (X_1, X_2, …, X_p)^T. Then, the contribution of common-factor vector ξ to manifest variable vector X is defined by the following generalized signal-to-noise ratio:

KL(X, ξ) = tr(Ω̃ ΛΦΛ^T)/|Ω|,  (11)

where Ω̃ is the cofactor (adjugate) matrix of Ω. The signal is tr(Ω̃ ΛΦΛ^T) and the noise is |Ω|, and both are positive. Hence, the above quantity is defined as the entropy explained by the factor analysis model, and the same notation KL(X, ξ) as above is used, since it coincides with the Kullback-Leibler information for the factor analysis model with normally distributed errors (4). Similarly, in the general model, as in (9), signal-to-noise ratio (11) is decomposed into the contributions of manifest variable subvectors, so the above theorems hold true as well. Thus, the results mentioned above are applicable to factor analysis models whose error terms have non-normal distributions.
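Ratio (11) is straightforward to evaluate numerically. Since the cofactor matrix satisfies Ω̃ = |Ω| Ω^{-1} for nonsingular Ω, the ratio equals tr(Ω^{-1} ΛΦΛ^T); for diagonal Ω and orthogonal factors it reduces to Σ_i Σ_j λ_ij²/ω_i². A minimal sketch with a hypothetical one-factor model:

```python
import numpy as np

def kl_contribution(Lam, Phi, Omega):
    """Generalized signal-to-noise ratio (11):
    KL(X, xi) = tr(adj(Omega) Lam Phi Lam^T) / |Omega|.

    For nonsingular Omega, adj(Omega) = |Omega| * Omega^{-1}, so the
    ratio equals tr(Omega^{-1} Lam Phi Lam^T)."""
    det = np.linalg.det(Omega)
    adj = det * np.linalg.inv(Omega)    # cofactor (adjugate) matrix
    return float(np.trace(adj @ Lam @ Phi @ Lam.T) / det)

# Hypothetical one-factor model: loadings 0.8 and 0.6, unit total variances.
Lam = np.array([[0.8], [0.6]])
Phi = np.eye(1)
Omega = np.diag([1 - 0.8**2, 1 - 0.6**2])   # unique variances 0.36, 0.64
kl = kl_contribution(Lam, Phi, Omega)        # = 0.64/0.36 + 0.36/0.64
```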

Canonical Factor Analysis
In order to derive interpretable factors from the common-factor space, we propose taking advantage of the results of canonical correlation analysis applied to manifest variables and common factors. This approach can be referred to as "canonical factor analysis" [10].
In the proof of the above theorem, we have KL(V_j, η_j) = ρ_j²/(1 − ρ_j²), j = 1, 2, …, m. It implies that

KL(X, ξ) = Σ_{j=1}^m KL(V_j, η_j).

Thus, Theorem 4 shows that the contribution of common-factor vector ξ to manifest variable vector X is decomposed into those of the canonical common factors η_j, i.e., KL(V_j, η_j), j = 1, 2, …, m.
Let us assume ρ_1² ≥ ρ_2² ≥ … ≥ ρ_m². According to the entropy-based criterion in Theorem 4, the order of importance of the canonical common factors is that of the canonical correlation coefficients. The interpretation of factors η_j can be made with the corresponding manifest canonical variables V_j and the factor loading matrix of the canonical common factors η = Fξ. For the canonical common factors, the factor loading matrix can be obtained as Λ* = ΛF^{-1}. We refer to the canonical correlation analysis in Theorem 4 as canonical factor analysis [10].
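At the population level, the squared canonical correlations in Theorem 4 can be computed directly from the model matrices: with Cov(X) = Σ = ΛΦΛ^T + Ω and Cov(X, ξ) = ΛΦ, they are the nonzero eigenvalues of Σ^{-1}(ΛΦ)Φ^{-1}(ΛΦ)^T = Σ^{-1}ΛΦΛ^T. A sketch under hypothetical matrices (this computes population quantities, not estimates from data):

```python
import numpy as np

def canonical_sq_corrs(Lam, Phi, Omega):
    """Squared canonical correlations between X and xi, computed from
    the model covariance structure as the leading eigenvalues of
    Sigma^{-1} Lam Phi Lam^T (at most m of them are nonzero)."""
    Sigma = Lam @ Phi @ Lam.T + Omega
    M = np.linalg.solve(Sigma, Lam @ Phi @ Lam.T)
    rho2 = np.sort(np.linalg.eigvals(M).real)[::-1]
    return rho2[:Lam.shape[1]]

# Hypothetical one-factor model: the single rho^2 equals
# Lam^T Sigma^{-1} Lam, the squared multiple correlation of xi on X.
Lam = np.array([[0.8], [0.6]])
Phi = np.eye(1)
Omega = np.diag([0.36, 0.64])
rho2 = canonical_sq_corrs(Lam, Phi, Omega)
```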
Theorem 5. In factor analysis model (1), for any p × p and m × m nonsingular matrices P and Q, the canonical factor analysis between manifest variable vector PX and common-factor vector Qξ is invariant.
Proof. The variance-covariance matrix of the combined vector of PX and Qξ is given by

[ PΣP^T, PΛΦQ^T ; QΦΛ^T P^T, QΦQ^T ],

and canonical correlation analysis is invariant under nonsingular linear transformations of each of the two variable vectors; hence the theorem follows.
From the above theorem, the results of the canonical factor analysis do not depend on the initial common factors ξ_j in factor analysis model (1). For factor analysis model (1), the squared canonical correlations can be related to the multiple correlation coefficients R_i between manifest variables X_i and factor vector ξ = (ξ_1, ξ_2, …, ξ_m)^T, i = 1, 2, …, p.
Numerical Example 1. Table 1 shows the results of orthogonal factor analysis (varimax method, S-PLUS ver. 8.2) as reported in [6]; the same example is used here to demonstrate the canonical factor analysis described above. In Table 1, manifest variables X_1, X_2, and X_3 are scores in subjects in the liberal arts, while variables X_4 and X_5 are scores in the sciences. We refer to these factors as the initial common factors. In this example, the variance-covariance matrices in (10) are computed from the factor loading matrix Λ given in Table 1. The uniqueness is the variance proportion of the unique factor ε_i related to manifest variable X_i.
From the above matrices, the linear transformation matrices B^(1) and F in Theorem 4, which yield the pairs of canonical variables, are obtained. By these matrices, we have the pairs of canonical variables (V_i, η_i) with squared canonical correlation coefficients ρ_1² = 0.88 and ρ_2² = 0.73. According to the above canonical variables, the factor loadings for canonical factors η_i, i = 1, 2, are calculated with the initial loading matrix Λ and the rotation matrix F. From the results, the first canonical factor η_1 can be viewed as a general common ability (factor) for all five subjects. The second factor η_2 can be regarded as a factor related to the subjects in the liberal arts, which is independent of the first canonical factor. In the canonical correlation analysis, the contributions of the canonical factors are calculated. Since the squared multiple correlation coefficient between η_1 and X = (X_1, X_2, …, X_5)^T is ρ_1² = 0.88 and that between η_2 and X is ρ_2² = 0.73, we have C(η_j → X) = KL(V_j, η_j) = ρ_j²/(1 − ρ_j²), j = 1, 2. Let ξ = (ξ_1, ξ_2)^T. From the above results, we have C(ξ → X) = KL(ξ, X) = C(η_1 → X) + C(η_2 → X) = 9.86 and CR(ξ → X) = KL(ξ, X)/(KL(ξ, X) + 1) = 0.91 (= ECD(ξ, X)).
From this, 91% of the variation of manifest random vector X in entropy is explained by the common latent factors ξ. The contribution ratios of the canonical common factors are calculated accordingly; the contribution of the first canonical factor is about 2.6 times greater than that of the second one.
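The contribution arithmetic of Example 1 can be reproduced from the reported squared canonical correlations: under the decomposition in Theorem 4, KL(V_j, η_j) = ρ_j²/(1 − ρ_j²), so the rounded values ρ_1² = 0.88 and ρ_2² = 0.73 give totals close to, but not exactly equal to, the paper's 9.86:

```python
def contribution(rho_sq):
    """Entropy contribution of one canonical pair: KL = rho^2 / (1 - rho^2)."""
    return rho_sq / (1.0 - rho_sq)

# Rounded squared canonical correlations reported in Example 1.
c1 = contribution(0.88)        # first canonical factor
c2 = contribution(0.73)        # second canonical factor
total = c1 + c2                # close to the paper's 9.86 (rounding of rho^2)
ecd = total / (1.0 + total)    # ECD, close to the reported 0.91
ratio = c1 / c2                # close to the reported factor of about 2.6
```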

Deriving Important Common Factors Based on Decomposition of Manifest Variables into Subsets
From (9) and Theorem 2, the theorem follows.
To derive important common factors, the above theorem can be used. In many data sets analyzed by factor analysis, manifest variables can be classified into subsets that have common concepts (factors) to be measured. For example, in the data used for Table 1, it is meaningful to classify the five variables into two subsets X^(1) = (X_1, X_2, X_3)^T and X^(2) = (X_4, X_5)^T, where the first subset is related to the liberal arts and the second to the sciences. By applying canonical factor analysis to (X^(1), ξ) and (X^(2), ξ), it is possible to derive the latent ability for the liberal arts and that for the sciences, respectively.
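The subset-based derivation admits the same population-level sketch as before: restrict the canonical correlation analysis to the rows of Σ and ΛΦ that belong to X^(k). The matrices below are again hypothetical, and `idx` selects the rows of subset X^(k):

```python
import numpy as np

def subset_canonical_sq_corrs(Lam, Phi, Omega, idx):
    """Squared canonical correlations between subvector X^(k) (rows idx)
    and the full factor vector xi, from the model covariance structure."""
    Sigma = Lam @ Phi @ Lam.T + Omega
    S11 = Sigma[np.ix_(idx, idx)]            # Cov(X^(k))
    S12 = (Lam @ Phi)[idx, :]                # Cov(X^(k), xi)
    M = np.linalg.solve(S11, S12 @ np.linalg.solve(Phi, S12.T))
    return np.sort(np.linalg.eigvals(M).real)[::-1]

# Hypothetical one-factor model; the subset {X_1} alone yields
# rho^2 = 0.8^2 = 0.64, the squared loading of X_1.
Lam = np.array([[0.8], [0.6]])
Phi = np.eye(1)
Omega = np.diag([0.36, 0.64])
rho2_first = subset_canonical_sq_corrs(Lam, Phi, Omega, [0])
```

Taking `idx` to cover all manifest variables recovers the full-vector canonical analysis as a special case.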
From the above results, canonical factors η_1^(1) and η_1^(2) can be interpreted as general common factors for the liberal arts and for the sciences, respectively. Using these factors, the factor loadings are given in Table 2. In this case, Table 2 is similar to Table 1; however, the factor analysis model is oblique, and the correlation coefficient between η_1^(1) and η_1^(2) is 0.374. The contributions of the factors to manifest variable vector X = (X_1, X_2, X_3, X_4, X_5)^T = (X^(1)T, X^(2)T)^T are calculated; since factors η_1^(1) and η_1^(2) are correlated, it follows that CR(η_1^(1) → X) + CR(η_1^(2) → X) = 1.129 > 1.

Table 2. Factor loadings by using canonical common factors η_1^(1) and η_1^(2).

Numerical Example 2. Table 3 shows the results of the maximum likelihood factor analysis (orthogonal) for six scores X_i, i = 1, 2, …, 6 ([11], pp. 61-65); these results are treated as the initial estimates in the present analysis. In this example, the variables are classified into the following three groups: variable X_1 is related to Spearman's g factor; variables X_2, X_3, and X_4 account for problem-solving ability; and variables X_5 and X_6 are associated with verbal ability [11]; however, it is difficult to explain the three factors by using Table 3. In this example, the present approach is employed to derive the three factors. From (10) and Table 3, the correlation matrix of the manifest variables is obtained. Let X^(2) = (X_2, X_3, X_4)^T, let X^(3) = (X_5, X_6)^T, and let ξ = (ξ_1, ξ_2)^T. Canonical correlation analysis is carried out for (X_1, ξ), (X^(2), ξ), and (X^(3), ξ), and we obtain the corresponding canonical variables. The contributions of the canonical factors η_i^(k), i = 1, 2; k = 2, 3, are calculated; for example, C(η_2^(2) → X^(2)) = 0.01.
The common-factor space is two-dimensional, and the factor loadings with respect to common factors η_1^(2) and η_1^(3) are calculated as in Table 4. The table shows a clear interpretation of the common factors. Thus, the present method is effective for deriving interpretable factors in situations such as that of this example. The expressions of the factor analysis model can also be given with factor vectors (g, η_1^(2)) and (g, η_1^(3)), respectively. The present method is applicable to any subsets of manifest variables.

Discussion
In order to find interpretable common factors in factor analysis models, methods of factor rotation are often used. These methods are based on maximizations of variation functions of the squared factor loadings, with orthogonal or oblique factors. The factors derived by the conventional methods may be interpretable; however, it may be more useful to have a method for detecting interpretable common factors based on factor contribution measurement, i.e., on the importance of common factors. An entropy-based method for measuring factor contribution [6] can measure the contribution of the common-factor vector to the manifest variable vector, and one can decompose such a contribution into those of single manifest variables (Theorem 1) and into those of manifest variable subvectors as well (Theorem 2). A characterization in the case of orthogonal factors can also be given (Theorem 3). This paper shows that the most important common factor with respect to entropy can be identified by using canonical correlation analysis between the factor vector and the manifest variable vector (Theorem 4). Theorem 4 shows that the contribution of the common-factor vector to the manifest variable vector can be decomposed into those of the canonical factors and that the order of the canonical correlation coefficients is that of the factor contributions. In most multivariate data, manifest variables can be naturally classified into subsets according to common concepts, as in Examples 1 and 2. By using Theorems 2 and 5, canonical correlation analysis can also be applied to derive canonical common factors from subsets of manifest variables and the initial common-factor vector (Theorem 6). According to this analysis, interpretable common factors can be obtained easily, as demonstrated in Examples 1 and 2. In Example 1, Tables 1 and 2 have similar factor patterns; however, the derived factors in Table 1 are orthogonal, while those in Table 2 are oblique.
In Example 2, it may be difficult to interpret the factors in Table 3 produced by the varimax method. On the other hand, Table 4, obtained by using the present method, can be interpreted clearly. Finally, according to Theorem 5, the present method produces results that are invariant with respect to linear transformations of common factors, so that the method is independent of the initial common factors. The present method is the first one to derive interpretable factors based on a factor contribution measure, and the interpretable factors can be obtained easily through canonical correlation analysis between manifest variable subvectors and the factor vectors.