When it counts -- Econometric identification of the basic factor model based on GLT structures

Despite the popularity of factor models with sparse loading matrices, little attention has been given to formally addressing the identifiability of these models beyond standard rotation-based identification such as the positive lower triangular (PLT) constraint. To fill this gap, we review the advantages of variance identification in sparse factor analysis and introduce the generalized lower triangular (GLT) structures. We show that the GLT assumption is an improvement over PLT without compromise: GLT is also unique but, unlike PLT, a non-restrictive assumption. Furthermore, we provide a simple counting rule for variance identification under GLT structures, and we demonstrate that within this model class the unknown number of common factors can be recovered in an exploratory factor analysis. Our methodology is illustrated for simulated data in the context of post-processing posterior draws in Bayesian sparse factor analysis.


1 Introduction
Ever since the pioneering work of Thurstone (1935, 1947), factor analysis has been a popular method to model the covariance matrix $\Omega$ of correlated, multivariate observations $y_t$ of dimension $m$; see, e.g., Anderson (2003) for a comprehensive review. Assuming $r$ uncorrelated factors, the basic factor model yields the representation $\Omega = \Lambda\Lambda^\top + \Sigma_0$, with an $m \times r$ factor loading matrix $\Lambda$ and a diagonal matrix $\Sigma_0$. The considerable reduction in the number of parameters compared to the $m(m+1)/2$ elements of an unconstrained covariance matrix $\Omega$ is the main motivation for applying factor models to covariance estimation, especially if $m$ is large; see, among many others, Fan et al. (2008) in finance and Forni et al. (2009) in economics. In addition, shrinkage estimation has been shown to lead to very efficient covariance estimation; see, for example, Kastner (2019) in Bayesian factor analysis and Ledoit and Wolf (2020) in a non-Bayesian context.
In numerous applications, factor analysis reaches beyond covariance modelling. From the very beginning, the goal of factor analysis has been to extract the underlying loading matrix $\Lambda$ to understand the driving forces behind the observed correlation between the features; see, e.g., Owen and Wang (2016) for a recent review. However, also in this setting, the only source of information is the observed covariance of the data, which makes decomposing the covariance matrix $\Omega$ into the cross-covariance matrix $\Lambda\Lambda^\top$ and the variance $\Sigma_0$ of the idiosyncratic errors more challenging than estimating $\Omega$ itself.
A huge literature, dating back to Koopmans and Reiersøl (1950) and Reiersøl (1950), has addressed this problem of identification, which can be resolved only by imposing additional structure on the factor model. Anderson and Rubin (1956) considered identification as a two-step procedure, namely identification of $\Sigma_0$ from $\Omega$ (variance identification) and subsequent identification of $\Lambda$ from $\Lambda\Lambda^\top$ (solving rotational invariance). The most popular constraint in econometrics, statistics and machine learning for solving rotational invariance is to consider positive lower triangular loading matrices, see, e.g., Geweke and Zhou (1996), West (2003) and Lopes and West (2004), although other strategies have been put forward; see, e.g., Neudecker (1990), Bai and Ng (2013), Aßmann et al. (2016), Chan et al. (2018) and Williams (2020). Only a few papers have addressed variance identification (e.g., Bekker, 1989), and, to the best of our knowledge, no structure has so far been put forward that addresses both identification problems simultaneously.
In this work, we discuss a new identification strategy based on generalized lower triangular (GLT) structures; see Figure 1 for an illustration. This concept was originally introduced as part of an MCMC sampler for sparse Bayesian factor analysis with an unknown number of factors in the (unpublished) work of Frühwirth-Schnatter and Lopes (2018). In the present paper, GLT structures are given a full and comprehensive mathematical treatment; they are applied in Frühwirth-Schnatter et al. (2022) to develop an efficient reversible jump MCMC (RJMCMC) sampler for sparse Bayesian factor analysis under very general shrinkage priors. We prove that GLT structures simultaneously address rotational invariance and variance identification in factor models. Variance identification relies on a counting rule for the number of non-zero elements in the loading matrix $\Lambda$, which is a sufficient condition that extends previous work by Sato (1992).
In addition, we will show that GLT structures are useful in exploratory factor analysis where the factor dimension $r$ is unknown. Identification of the number of factors in applied factor analysis is a notoriously difficult problem, with considerable ambiguity about which method works best, be it BIC-type criteria (Bai and Ng, 2002), marginal likelihoods (Lopes and West, 2004), techniques from Bayesian nonparametrics involving infinite-dimensional factor models (Bhattacharya and Dunson, 2011; Ročková and George, 2017; Legramanti et al., 2020) or more heuristic procedures (Kaufmann and Schuhmacher, 2019). Imposing an unordered GLT structure in exploratory factor analysis allows one to identify the true loading matrix $\Lambda$ and the matrix $\Sigma_0$ and to easily spot all spurious columns in a possibly overfitting model. This strategy underlies the RJMCMC sampler of Frühwirth-Schnatter et al. (2022) for estimating the number of factors.

Figure 1: Left: ordered sparse GLT matrix with six factors. Center: one of the $2^6 \cdot 6!$ corresponding unordered sparse GLT matrices. Right: a corresponding sparse PLT matrix, i.e. with enforced non-zeros on the main diagonal. The pivot rows $(l_1, \ldots, l_6) = (1, 3, 10, 11, 14, 17)$ are marked by triangles. Non-zero loadings are marked by circles, zero loadings are left blank.
The paper is structured as follows. Section 2 reviews the role of identification in factor analysis using illustrative examples. Section 3 introduces GLT structures, proves identification for sparse GLT structures and shows that any unconstrained loading matrix has a unique representation as a GLT matrix. Section 4 addresses variance identification under GLT structures. Section 5 discusses exploratory factor analysis under unordered GLT structures, while Section 6 presents an illustrative application. Section 7 concludes.

2 The role of identification in factor analysis
Let $y_t = (y_{1t}, \ldots, y_{mt})^\top$ be an observation vector of $m$ measurements, which is assumed to arise from a multivariate normal distribution, $y_t \sim N_m(0, \Omega)$, with zero mean and covariance matrix $\Omega$. In factor analysis, the correlation among the observations is assumed to be driven by a latent $r$-variate random variable $f_t = (f_{1t}, \ldots, f_{rt})^\top$, the so-called common factors, through the following observation equation:
$$y_t = \Lambda f_t + \epsilon_t, \tag{1}$$
where the $m \times r$ matrix $\Lambda$ containing the factor loadings $\Lambda_{ij}$ is of full column rank, $\mathrm{rk}(\Lambda) = r$, equal to the factor dimension $r$. In the present paper, we focus on the so-called basic factor model, where the vector $\epsilon_t = (\epsilon_{1t}, \ldots, \epsilon_{mt})^\top$ accounts for independent, idiosyncratic variation of each measurement and is distributed as $\epsilon_t \sim N_m(0, \Sigma_0)$, with $\Sigma_0 = \mathrm{Diag}(\sigma_1^2, \ldots, \sigma_m^2)$ being a positive definite diagonal matrix. The common factors are orthogonal, meaning that $f_t \sim N_r(0, I_r)$, and independent of $\epsilon_t$. In this case, the observation equation (1) implies the following covariance matrix $\Omega$ when we integrate with respect to the latent common factors $f_t$:
$$\Omega = \Lambda\Lambda^\top + \Sigma_0. \tag{2}$$
Hence, all dependence among the measurements in $y_t$ is explained through the latent common factors, and the off-diagonal elements of $\Lambda\Lambda^\top$ define the marginal covariance between any two measurements $y_{i_1,t}$ and $y_{i_2,t}$:
$$\mathrm{Cov}(y_{i_1,t}, y_{i_2,t}) = \Lambda_{i_1,\cdot}\Lambda_{i_2,\cdot}^\top, \qquad i_1 \neq i_2, \tag{3}$$
where $\Lambda_{i,\cdot}$ is the $i$th row of $\Lambda$. Consequently, we will refer to $\Lambda\Lambda^\top$ as the cross-covariance matrix.
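A minimal numerical sketch of (1) and (2) may help fix ideas. The loading values, idiosyncratic variances, sample size and random seed below are hypothetical choices made only for illustration; the sketch simulates $y_t = \Lambda f_t + \epsilon_t$ with NumPy and checks that the sample covariance is close to $\Lambda\Lambda^\top + \Sigma_0$.

```python
import numpy as np

# Hypothetical example: m = 6 measurements, r = 2 factors (values are made up)
Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
sigma2 = np.array([0.3, 0.4, 0.5, 0.3, 0.4, 0.5])  # diagonal of Sigma_0
m, r = Lam.shape

# Covariance implied by the basic factor model, equation (2)
Omega = Lam @ Lam.T + np.diag(sigma2)

# Simulate T observations from the observation equation (1)
rng = np.random.default_rng(seed=1)
T = 200_000
f = rng.standard_normal((T, r))                      # rows are f_t ~ N_r(0, I_r)
eps = rng.standard_normal((T, m)) * np.sqrt(sigma2)  # rows are eps_t ~ N_m(0, Sigma_0)
y = f @ Lam.T + eps                                  # rows are y_t = Lam f_t + eps_t

S = np.cov(y, rowvar=False)                          # sample covariance of y_t
max_abs_err = np.max(np.abs(S - Omega))
print(f"largest deviation of S from Omega: {max_abs_err:.4f}")

# Off-diagonal elements follow (3): Cov(y_1t, y_2t) = Lam_1. Lam_2.^T
print(Omega[0, 1], Lam[0] @ Lam[1])
```

With a large $T$ the deviation is small, and the zero blocks of $\Lambda\Lambda^\top$ show up as (near-)zero sample covariances between the two groups of measurements.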
Since the number of factors, $r$, is often considerably smaller than the number of measurements, $m$, (2) can be seen as a parsimonious representation of the dependence between the measurements, often with considerably fewer parameters in $\Lambda$ than the $m(m-1)/2$ off-diagonal elements of an unconstrained covariance matrix $\Omega$.
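To make the parameter reduction concrete, the following sketch compares the $m(m-1)/2$ off-diagonal covariances with the number of free loadings in a dense lower triangular $m \times r$ loading matrix, $mr - r(r-1)/2$. Using this particular parametrization of $\Lambda$ is an assumption made only for the count; the function names are hypothetical.

```python
def n_offdiag_cov(m: int) -> int:
    """Number of off-diagonal elements of an unconstrained m x m covariance matrix."""
    return m * (m - 1) // 2


def n_free_loadings_dense_plt(m: int, r: int) -> int:
    """Free loadings in a dense m x r lower triangular loading matrix
    (all entries on and below the main diagonal unrestricted)."""
    return m * r - r * (r - 1) // 2


for m, r in [(10, 2), (100, 5), (500, 10)]:
    print(f"m={m:4d}, r={r:3d}: {n_offdiag_cov(m):7d} covariances "
          f"vs {n_free_loadings_dense_plt(m, r):5d} loadings")
```

For $m = 100$ and $r = 5$, for instance, 490 loadings replace 4950 free covariances.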
Since the factors $f_t$ are unobserved, the only information available to estimate $\Lambda$ and $\Sigma_0$ is the covariance matrix $\Omega$. A rigorous approach toward identification of factor models was first offered by Reiersøl (1950) and Anderson and Rubin (1956). Identification in the context of a basic factor model means the following. For any pair $(\beta, \Sigma)$, where $\beta$ is an $m \times r$ matrix and $\Sigma$ is a positive definite diagonal matrix, that satisfies (2), i.e.
$$\Omega = \beta\beta^\top + \Sigma = \Lambda\Lambda^\top + \Sigma_0, \tag{4}$$
it follows that $\beta = \Lambda$ and $\Sigma = \Sigma_0$. Note that both parameter pairs imply the same Gaussian distribution $y_t \sim N_m(0, \Omega)$ for every possible realisation $y_t$. Anderson and Rubin (1956) considered identification as a two-step procedure. The first step is identification of the variance decomposition, i.e. identification of $\Sigma_0$ from (2), which implies identification of $\Lambda\Lambda^\top$. The second step is the subsequent identification of $\Lambda$ from $\Lambda\Lambda^\top$, also known as solving the rotational invariance problem. The literature on factor analysis often reduces identification of factor models to the second problem; however, as we will argue in the present paper, variance identification is equally important.
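Rotational invariance. Let us assume for the moment that $\Lambda\Lambda^\top$ is identified. Consider, for further illustration, the following factor loading matrix $\Lambda$ and a loading matrix $\beta = \Lambda P_{\alpha b}$ defined as a rotation of $\Lambda$:
$$\Lambda = \begin{pmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{pmatrix}, \qquad P_{\alpha b} = \begin{pmatrix} \cos\alpha & (-1)^b \sin\alpha \\ -\sin\alpha & (-1)^b \cos\alpha \end{pmatrix}. \tag{5}$$
For any $\alpha \in [0, 2\pi)$ and $b \in \{0, 1\}$, the factor loading matrix $\beta$ yields the same cross-covariance matrix for $y_t$ as $\Lambda$, as is easily verified:
$$\beta\beta^\top = \Lambda P_{\alpha b} P_{\alpha b}^\top \Lambda^\top = \Lambda\Lambda^\top. \tag{6}$$
The rotational invariance apparent in (6) holds more generally for any basic factor model (1). Take any arbitrary $r \times r$ rotation matrix $P$ (i.e. $PP^\top = I_r$) and define the basic factor model
$$y_t = \beta f_t^\star + \epsilon_t, \qquad f_t^\star \sim N_r(0, I_r), \tag{7}$$
where $\beta = \Lambda P$ and $f_t^\star = P^\top f_t$. Then both models imply the same covariance $\Omega$, given by (2). Hence, without imposing further constraints, $\Lambda$ is in general not identified from the cross-covariance matrix $\Lambda\Lambda^\top$. If interest lies in interpreting the factors through the factor loading matrix $\Lambda$, rotational invariance has to be resolved. The usual way of dealing with rotational invariance is to constrain $\Lambda$ in such a way that the only possible rotation is the identity $P = I_r$. For orthogonal factors, at least $r(r-1)/2$ restrictions on the elements of $\Lambda$ are needed to eliminate rotational indeterminacy (Anderson and Rubin, 1956).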
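The rotational invariance is easy to check numerically. In the sketch below, the numerical values of the dedicated loading matrix and the parametrization `P_alpha_b` of a $2 \times 2$ rotation ($b = 0$) or reflection ($b = 1$) are illustrative assumptions. Any orthogonal $P$ would do, as the last lines show with a random orthogonal matrix obtained from a QR decomposition.

```python
import numpy as np


def P_alpha_b(alpha: float, b: int) -> np.ndarray:
    """Hypothetical parametrization of a 2 x 2 orthogonal matrix:
    a rotation by alpha if b = 0, a reflection if b = 1."""
    s = (-1) ** b
    return np.array([[np.cos(alpha), s * np.sin(alpha)],
                     [-np.sin(alpha), s * np.cos(alpha)]])


# Hypothetical values for the dedicated 6 x 2 loading matrix
Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])

# beta = Lam P_{alpha b} has the same cross-covariance matrix for every (alpha, b)
for alpha in np.linspace(0.0, 2 * np.pi, 12, endpoint=False):
    for b in (0, 1):
        P = P_alpha_b(alpha, b)
        assert np.allclose(P @ P.T, np.eye(2))      # P is orthogonal
        beta = Lam @ P
        assert np.allclose(beta @ beta.T, Lam @ Lam.T)

# The same holds for an arbitrary orthogonal matrix, here from a QR decomposition
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))
beta = Lam @ Q
print(np.allclose(beta @ beta.T, Lam @ Lam.T))  # True
```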
The most popular constraints are positive lower triangular (PLT) loading matrices, where the upper triangular part is constrained to be zero and the main diagonal elements $\Lambda_{11}, \ldots, \Lambda_{rr}$ of $\Lambda$ are strictly positive; see Figure 1 for an illustration. Despite its popularity, the PLT structure is restrictive, as already outlined by Jöreskog (1969). Let $\beta\beta^\top$ be an arbitrary cross-covariance matrix with factor loading matrix $\beta$. A PLT representation of $\beta\beta^\top$ is possible iff a rotation matrix $P$ exists such that $\beta$ can be rotated into a PLT matrix $\Lambda = \beta P$. However, as example (5) illustrates, this is not necessarily the case. Obviously, $\Lambda$ is not a PLT matrix, since $\Lambda_{22} = 0$. None of the possible rotations $\beta = \Lambda P_{\alpha b}$ is a PLT matrix either: since the first row of $\Lambda$ is $(\lambda_{11}, 0)$, the element $\beta_{12}$ above the main diagonal is non-zero unless $P_{\alpha b}$ is diagonal, in which case $\beta$ differs from $\Lambda$ only by sign switches and $\beta_{22} = 0$. This example demonstrates that the PLT representation is restrictive. To circumvent this problem in example (5), one could reorder the measurements in an appropriate manner. However, in applied factor analysis, such an appropriate ordering is typically not known in advance, and the choice of the first $r$ measurements is an important modeling decision under PLT constraints; see, e.g., Lopes and West (2004) and Carvalho et al. (2008).
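The failure of the PLT representation for the dedicated example can also be checked by brute force over a grid of rotations. The helper `is_plt`, the numerical tolerance and the loading values are hypothetical choices for this sketch; the last line shows that simply reordering the measurements (moving measurement 4 to the second position) yields a PLT matrix.

```python
import numpy as np


def P_alpha_b(alpha, b):
    """Hypothetical 2 x 2 rotation (b = 0) or reflection (b = 1)."""
    s = (-1) ** b
    return np.array([[np.cos(alpha), s * np.sin(alpha)],
                     [-np.sin(alpha), s * np.cos(alpha)]])


def is_plt(M, tol=1e-10):
    """True if the first r rows of M are lower triangular with positive diagonal."""
    r = M.shape[1]
    top = M[:r]
    upper_is_zero = np.all(np.abs(np.triu(top, k=1)) < tol)
    diag_is_positive = np.all(np.diag(top) > tol)
    return bool(upper_is_zero and diag_is_positive)


# Hypothetical values for the dedicated loading matrix of example (5)
Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])

grid = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)  # includes 0 and pi
found_plt = any(is_plt(Lam @ P_alpha_b(a, b)) for a in grid for b in (0, 1))
print(found_plt)  # False: no rotation on the grid is PLT

# Reordering the measurements (measurement 4 moved to position 2) gives a PLT matrix
print(is_plt(Lam[[0, 3, 1, 2, 4, 5]]))  # True
```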
We discuss in Section 3 a new identification strategy to resolve rotational invariance in factor models based on the concept of generalized lower triangular (GLT) structures. Loosely speaking, GLT structures generalize PLT structures by freeing the position of the first non-zero factor loading in each column; see the loading matrix $\Lambda$ in (5) and Figure 1 for examples. We show in Section 3.1 that a unique GLT structure $\Lambda$ can be identified for any cross-covariance matrix $\beta\beta^\top$, provided that variance identification holds and, consequently, $\beta\beta^\top$ itself is identified. Even if $\beta\beta^\top$ is obtained from a loading matrix $\beta$ that does not take the form of a GLT structure, such as the matrix $\beta$ in (5), we show in Section 3.3 that a unique orthogonal matrix $G$ exists which represents $\beta$ as a rotation of a unique GLT structure $\Lambda$:
$$\beta = \Lambda G^\top, \tag{8}$$
which we call rotation into GLT. Hence, the GLT representation is unrestrictive in the sense of Jöreskog (1969) and is, indeed, a new and generic way to resolve rotational invariance for any factor loading matrix.
Sparse factor loading matrices. The factor loading matrix $\Lambda$ given in (5) is an example of a sparse loading matrix. While only a single zero loading would be needed to resolve rotational invariance, six zeros are present and each factor loads only on dedicated measurements. Such sparse loading matrices are generated by a binary indicator matrix $\delta$ of 0s and 1s of the same dimension as $\Lambda$, where $\Lambda_{ij} = 0$ iff $\delta_{ij} = 0$, and $\Lambda_{ij} \in \mathbb{R}$ is unconstrained otherwise. The binary matrix $\delta = I(\Lambda \neq 0)$, where the indicator function is applied element-wise, is called the sparsity matrix corresponding to $\Lambda$. The sparsity matrix $\delta$ contains a lot of information about the structure of $\Lambda$; see Figure 1 for an illustration. The indicator matrix on the right-hand side tells us that $\Lambda$ obeys the PLT constraint. The fifth row of the left and center matrices contains only zeros, which tells us that observation $y_{5t}$ is uncorrelated with the remaining observations, since $\mathrm{Cov}(y_{it}, y_{5t}) = 0$ for all $i \neq 5$.
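A short sketch of the sparsity matrix $\delta = I(\Lambda \neq 0)$ and of what an all-zero row implies. The loading matrix is a made-up example with a zero fifth row; it is not the matrix shown in Figure 1.

```python
import numpy as np

# Made-up sparse loading matrix; measurement 5 loads on no factor
Lam = np.array([[0.9, 0.0],
                [0.5, 0.0],
                [0.0, 0.8],
                [0.4, 0.6],
                [0.0, 0.0],
                [0.0, 0.7]])

delta = (Lam != 0).astype(int)                           # element-wise I(Lam != 0)
zero_rows = np.flatnonzero(delta.sum(axis=1) == 0) + 1   # 1-based row indices
print(delta)
print(zero_rows)                                         # [5]

# A zero row in Lam gives a zero row in Lam Lam^T,
# i.e. Cov(y_it, y_5t) = 0 for all i != 5
cross_cov = Lam @ Lam.T
others = np.arange(Lam.shape[0]) != 4
print(np.all(cross_cov[4, others] == 0))                 # True
```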
Variance identification. Constraints that resolve rotational invariance typically take variance identification, i.e. identification of $\Lambda\Lambda^\top$, for granted; see, e.g., Geweke and Zhou (1996). Variance identification refers to the problem that the idiosyncratic variances $\sigma_1^2, \ldots, \sigma_m^2$ in $\Sigma_0$ are identified only from the diagonal elements of $\Omega$, as all other elements are independent of the $\sigma_i^2$s; see again (3). To achieve variance identification of $\sigma_i^2$ from $\Omega_{ii} = \Lambda_{i,\cdot}\Lambda_{i,\cdot}^\top + \sigma_i^2$, all factor loadings have to be identified solely from the off-diagonal elements of $\Omega$. Variance identification, however, is easily violated, as the following considerations illustrate.
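For a column with three dedicated measurements, as in the first column of (5), the loadings are indeed recovered from off-diagonal elements alone, e.g. $\lambda_{11}^2 = \Omega_{12}\Omega_{13}/\Omega_{23}$, after which $\sigma_1^2 = \Omega_{11} - \lambda_{11}^2$ follows from the diagonal. The numerical values below are hypothetical. Only $\lambda_{11}^2$, not the sign of $\lambda_{11}$, is recovered, in line with rotational invariance.

```python
import numpy as np

# Hypothetical parameter values for the dedicated example (5)
Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
sigma2 = np.array([0.3, 0.4, 0.5, 0.3, 0.4, 0.5])
Omega = Lam @ Lam.T + np.diag(sigma2)

# Omega_12 = l11*l21, Omega_13 = l11*l31, Omega_23 = l21*l31 (off-diagonal only)
lam11_sq = Omega[0, 1] * Omega[0, 2] / Omega[1, 2]
# Given lam11^2, the idiosyncratic variance follows from the diagonal element
sigma2_1 = Omega[0, 0] - lam11_sq
print(lam11_sq, sigma2_1)  # approximately 0.81 and 0.3
```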
While the three factor loadings $(\lambda_{11}, \lambda_{21}, \lambda_{31})$ are still identified from the off-diagonal elements of $\Omega$ as before, variance identification of $\sigma_4^2$ and $\sigma_5^2$ fails. Since $\mathrm{Cov}(y_{4t}, y_{5t}) = \lambda_{42}\lambda_{52}$ is the only non-zero off-diagonal element of $\Omega$ that depends on the loadings $\lambda_{42}$ and $\lambda_{52}$, infinitely many different parameters $(\lambda_{42}, \lambda_{52}, \sigma_4^2, \sigma_5^2)$ imply the same covariance matrix $\Omega$. From these considerations it is evident that a minimum of three non-zero loadings in each column is necessary to achieve variance identification, a condition which was noted as early as Anderson and Rubin (1956). At the same time, this condition is not sufficient, as it is satisfied by the loading matrix $\beta$ in (10), although variance identification does not hold. In general, variance identification is not straightforward to verify. We will introduce in Section 4.1 a new and convenient way to verify variance identification for GLT structures.
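The failure can be reproduced numerically for a two-factor example in which the second factor loads only on measurements 4 and 5, as in (10); the numerical values are hypothetical. Keeping $\lambda_{42}\lambda_{52}$ and the diagonal elements $\Omega_{44}$, $\Omega_{55}$ fixed produces several distinct parameter vectors with exactly the same $\Omega$.

```python
import numpy as np


def omega_two_factor(l42, l52, s4, s5):
    """Covariance for a hypothetical 5 x 2 loading matrix shaped like (10):
    factor 1 loads on measurements 1-3, factor 2 only on measurements 4 and 5."""
    Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                    [0.0, l42], [0.0, l52]])
    sigma2 = np.array([0.3, 0.4, 0.5, s4, s5])
    return Lam @ Lam.T + np.diag(sigma2)


Omega = omega_two_factor(0.9, 0.8, 0.3, 0.4)

# Keep Omega_45 = l42*l52, Omega_44 and Omega_55 fixed while changing l42
c = Omega[3, 4]
solutions = []
for l42 in (0.75, 0.85, 1.0):
    l52 = c / l42
    s4 = Omega[3, 3] - l42**2
    s5 = Omega[4, 4] - l52**2
    assert s4 > 0 and s5 > 0                  # valid idiosyncratic variances
    assert np.allclose(omega_two_factor(l42, l52, s4, s5), Omega)
    solutions.append((l42, l52, s4, s5))

for sol in solutions:
    print(np.round(sol, 4))
```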
The row deletion property. As explained above, we need to verify uniqueness of the variance decomposition, i.e. the identification of the idiosyncratic variances $\sigma_1^2, \ldots, \sigma_m^2$ in $\Sigma_0$ from the covariance matrix $\Omega$ given in (2). The identification of $\Sigma_0$ guarantees that $\Lambda\Lambda^\top$ is identified. The second step of identification is then to ensure uniqueness of the factor loadings, i.e. unique identification of $\Lambda$ from $\Lambda\Lambda^\top$. To verify variance identification, we rely in the present paper on a condition known as the row-deletion property.
Definition 1 (Row deletion property AR (Anderson and Rubin, 1956)). An $m \times r$ factor loading matrix $\Lambda$ satisfies the row-deletion property if the following condition is satisfied: whenever an arbitrary row is deleted from $\Lambda$, two disjoint submatrices of rank $r$ remain.

Anderson and Rubin (1956, Theorem 5.1) prove that the row-deletion property is a sufficient condition for the identification of $\Lambda\Lambda^\top$ and $\Sigma_0$ from the marginal covariance matrix $\Omega$ given in (2). For any (not necessarily GLT) factor loading matrix $\Lambda$, the row deletion property AR can be tested in a straightforward manner by a step-by-step analysis, where each single row of $\Lambda$ is sequentially deleted and two disjoint submatrices of rank $r$ are sought in the remaining matrix, as suggested, e.g., by Hayashi and Marcoulides (2006). However, this procedure is inefficient and challenging in higher dimensions.
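A direct, exponential-time implementation of the step-by-step check may look as follows. It is a sketch for small matrices only: the function names are hypothetical and ranks are computed numerically with `numpy.linalg.matrix_rank`. It uses the fact that two disjoint rank-$r$ submatrices exist iff the remaining rows can be split into a subset $S$ and its complement that both have rank $r$, because adding rows never lowers the rank.

```python
import itertools

import numpy as np


def two_disjoint_full_rank(M, r):
    """True if the rows of M split into a subset S and its complement,
    both of rank r (equivalent to two disjoint rank-r submatrices)."""
    n = M.shape[0]
    rows = list(range(n))
    for k in range(r, n - r + 1):
        for S in itertools.combinations(rows, k):
            rest = [i for i in rows if i not in S]
            if (np.linalg.matrix_rank(M[list(S)]) == r
                    and np.linalg.matrix_rank(M[rest]) == r):
                return True
    return False


def row_deletion_property(Lam):
    """Brute-force check of condition AR (Definition 1); exponential in m."""
    m, r = Lam.shape
    return all(two_disjoint_full_rank(np.delete(Lam, i, axis=0), r)
               for i in range(m))


# Hypothetical values: dedicated 6 x 2 matrix as in (5) and 5 x 2 matrix as in (10)
Lam5 = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                 [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
Lam10 = Lam5[:5]
print(row_deletion_property(Lam5))   # True
print(row_deletion_property(Lam10))  # False
```

In the second case, deleting row 4 or row 5 leaves only one row that loads on the second factor, so two disjoint rank-2 submatrices cannot be formed.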
Hence, it is helpful to have more structural conditions for verifying variance identification under the row deletion property AR. The literature provides several necessary conditions for the row deletion property AR that are based on counting the number of non-zero factor loadings in $\Lambda$. Anderson and Rubin (1956), for instance, prove the following necessary conditions for AR: for every nonsingular $r \times r$ matrix $G$, the matrix $\beta = \Lambda G$ contains at least 3 nonzero factor loadings in each column and at least 5 nonzero rows in each pair of columns. Sato (1992, Theorem 3.3) extends these necessary conditions as follows: every subset of $1 \le q \le r$ columns of $\beta = \Lambda G$ contains at least $2q + 1$ nonzero rows for every nonsingular matrix $G$. We call this the 3579 counting rule, since it requires at least 3, 5, 7, 9, … nonzero rows for $q = 1, 2, 3, 4, \ldots$ columns.
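The counting itself is easy to code for a given 0/1 pattern. The sketch below (function name hypothetical) checks that every subset of $q$ columns of a sparsity matrix has at least $2q + 1$ non-zero rows. It checks one fixed pattern only; as a necessary condition for AR the rule has to hold for all $\beta = \Lambda G$, which this function does not attempt.

```python
import itertools

import numpy as np


def satisfies_3579(delta):
    """Every subset of q columns of the 0/1 matrix delta must have at least
    2q + 1 rows with a non-zero entry in those columns (q = 1, ..., r)."""
    nz = np.asarray(delta) != 0
    r = nz.shape[1]
    for q in range(1, r + 1):
        for cols in itertools.combinations(range(r), q):
            n_rows = np.count_nonzero(nz[:, list(cols)].any(axis=1))
            if n_rows < 2 * q + 1:
                return False
    return True


delta5 = np.array([[1, 0]] * 3 + [[0, 1]] * 3)   # pattern of example (5)
delta10 = delta5[:5]                              # pattern of example (10)
dense_plt_5x2 = np.tril(np.ones((5, 2), dtype=int))
dense_plt_4x2 = np.tril(np.ones((4, 2), dtype=int))
print(satisfies_3579(delta5), satisfies_3579(delta10))              # True False
print(satisfies_3579(dense_plt_5x2), satisfies_3579(dense_plt_4x2)) # True False
```

For dense lower triangular patterns the rule holds exactly when $m \ge 2r + 1$, which matches the condition quoted below for dense PLT matrices.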
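For illustration, let us return to the examples in (5) and (10). First, apply the 3579 counting rule to the unrestricted matrix $\beta$ in (10). Although the variance decomposition $\Omega = \beta\beta^\top + \Sigma = \Lambda\Lambda^\top + \Sigma_0$ is not unique, the counting rules are not violated, since each column of $\beta$ has five non-zero elements, except for the cases
$$\beta = \begin{pmatrix} \pm\lambda_{11} & 0 \\ \pm\lambda_{21} & 0 \\ \pm\lambda_{31} & 0 \\ 0 & \pm\lambda_{42} \\ 0 & \pm\lambda_{52} \end{pmatrix} \quad \text{or} \quad \beta = \begin{pmatrix} 0 & \pm\lambda_{11} \\ 0 & \pm\lambda_{21} \\ 0 & \pm\lambda_{31} \\ \pm\lambda_{42} & 0 \\ \pm\lambda_{52} & 0 \end{pmatrix}, \tag{11}$$
where the sign is common to all loadings within a column. Only for these eight specific cases, which correspond to the trivial rotations $P_{\alpha b}$ with $(\alpha, b) \in \{0, \pi/2, \pi, 3\pi/2\} \times \{0, 1\}$, do we find that the counting rules are violated, since one of the two columns has only two non-zero elements. This example shows the need to check such counting rules not only for a single loading matrix $\beta$, but also for all rotations $\beta P$ admissible under the chosen strategy toward rotational invariance. On the other hand, if we apply the 3579 counting rule to the unrestricted matrix $\beta$ in (5), we find that the necessary counting rules are satisfied for all rotations $\Lambda P_{\alpha b}$. For this specific example, we have already verified explicitly that variance identification holds, and one might wonder whether, in general, the 3579 counting rule can lead to a sufficient criterion for variance identification under AR.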
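The contrast between generic and trivial rotations of (10) can be verified numerically. The loading values, the tolerance used to declare an element zero and the parametrization `P_alpha_b` are assumptions of this sketch, and `satisfies_3579` is a hypothetical helper that checks the $2q + 1$ non-zero-row condition for a single 0/1 pattern.

```python
import itertools

import numpy as np


def satisfies_3579(delta):
    # Hypothetical helper: each set of q columns needs >= 2q + 1 non-zero rows
    nz = np.asarray(delta) != 0
    r = nz.shape[1]
    return all(np.count_nonzero(nz[:, list(c)].any(axis=1)) >= 2 * q + 1
               for q in range(1, r + 1)
               for c in itertools.combinations(range(r), q))


def P_alpha_b(alpha, b):
    s = (-1) ** b
    return np.array([[np.cos(alpha), s * np.sin(alpha)],
                     [-np.sin(alpha), s * np.cos(alpha)]])


def sparsity(M, tol=1e-10):
    # Treat numerically tiny entries (e.g. cos(pi/2)) as exact zeros
    return (np.abs(M) > tol).astype(int)


# Hypothetical values for the 5 x 2 matrix of example (10)
Lam10 = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                  [0.0, 0.9], [0.0, 0.8]])

trivial = [(a, b) for a in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2) for b in (0, 1)]
generic = [(0.3, 0), (1.1, 1), (2.0, 0)]
trivial_ok = [satisfies_3579(sparsity(Lam10 @ P_alpha_b(a, b))) for a, b in trivial]
generic_ok = [satisfies_3579(sparsity(Lam10 @ P_alpha_b(a, b))) for a, b in generic]
print(trivial_ok)  # eight times False: one column has only two non-zeros
print(generic_ok)  # [True, True, True]
```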
Sufficient conditions for variance identification have hardly been investigated in the literature. One exception is the popular factor analysis model where $\Lambda$ takes the form of a dense PLT matrix, in which all factor loadings on and below the main diagonal are left unrestricted and can take any value in $\mathbb{R}$. For this model, condition AR and hence variance identification hold, except for a set of measure 0, if the condition $m \ge 2r + 1$ is satisfied. Conti et al. (2014) investigate identification of a dedicated factor model, where equation (1) is combined with correlated (oblique) factors, $f_t \sim N_r(0, R)$, and the factor loading matrix $\Lambda$ has a perfect simple structure, i.e. each observation loads on at most one factor, as in (5) and (10); however, the exact position of the non-zero elements is unknown. They prove necessary and sufficient conditions that imply uniqueness of the variance decomposition as well as uniqueness of the factor loading matrix, namely: the correlation matrix $R$ is of full rank ($\mathrm{rk}(R) = r$) and each column of $\Lambda$ contains at least three nonzero loadings.
In the present paper, we build on and extend this previous work. We provide sufficient conditions for variance identification of a GLT structure $\Lambda$. These conditions are formulated as counting rules for the $m \times r$ sparsity matrix $\delta = I(\Lambda \neq 0)$ of $\Lambda$ and are equivalent to the 3579 counting rules of Sato (1992, Theorem 3.3). More specifically, if the 3579 counting rule holds for the sparsity matrix $\delta$ of a GLT matrix $\Lambda$, then this is a sufficient condition for the row deletion property AR and consequently for variance identification, except for a set of measure 0.
Identification of the number of factors. Identification of the number of factors is a notoriously difficult problem, and analysing this problem from the viewpoint of variance identification is helpful in understanding some fundamental difficulties. Assume that $\Omega$ has a representation as in (2) with $r$ factors which is variance identified. Then, on the one hand, no equivalent representation exists with $r' < r$ factors. On the other hand, as shown in Reiersøl (1950, Theorem 3.3), any such structure $(\Lambda, \Sigma_0)$ gives rise to solutions $(\beta_k, \Sigma_k)$ with $m \times k$ loading matrices $\beta_k$ of dimension $k = r + 1, r + 2, \ldots, m$ bigger than $r$ and $\Sigma_k$ being a positive definite diagonal matrix different from $\Sigma_0$, which imply the same covariance matrix $\Omega$ as $(\Lambda, \Sigma_0)$, i.e.
$$\Omega = \beta_k\beta_k^\top + \Sigma_k = \Lambda\Lambda^\top + \Sigma_0. \tag{12}$$
Furthermore, for any fixed $k > r$, infinitely many such solutions $(\beta_k, \Sigma_k)$ can be created that satisfy the decomposition (12), which, consequently, is no longer variance identified. This problem is prevalent regardless of the chosen strategy toward rotational invariance. For illustration, we return to example (5) and construct an equivalent solution for $k = 3$. While the first two columns of $\beta_3$ are equal to $\Lambda$, the third column is a so-called spurious factor with a single non-zero loading, and $\Sigma_3$ is defined as follows:
$$\beta_3 = \begin{pmatrix} \Lambda & \beta_{i3}\, e_i \end{pmatrix}, \qquad \Sigma_3 = \Sigma_0 - \beta_{i3}^2\, e_i e_i^\top, \tag{13}$$
where $e_i$ denotes the $i$th unit vector of length $m$. We can place the spurious factor loading $\beta_{i3}$ in any row $i$, and $\beta_{i3}$ can take any value satisfying $0 < \beta_{i3}^2 < \sigma_i^2$. It is easy to verify that any such pair $(\beta_3, \Sigma_3)$ indeed implies the same covariance matrix $\Omega$ as in (9). This ambiguity in an overfitting model renders the estimation of the true number of factors $r$ a challenging problem and leads to considerable uncertainty about how to choose the number of factors in applied factor analysis. In Section 5, we follow up on this problem in more detail. An important necessary condition for $k$ to be the true number of factors is that variance identification of $\Sigma_k$ in (12) holds. Therefore, the counting rules that we introduce in this paper will also be useful in cases where the true number of factors $r$ is unknown.
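The spurious-factor construction for $k = 3$ can be checked numerically: appending a third column whose only non-zero loading $\beta_{i3}$ sits in row $i$, and reducing $\sigma_i^2$ by $\beta_{i3}^2$, leaves $\Omega$ unchanged. The parameter values are hypothetical and the helper name `spurious_solution` is made up for this sketch.

```python
import numpy as np

# Hypothetical values for the dedicated example (5)
Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
sigma2 = np.array([0.3, 0.4, 0.5, 0.3, 0.4, 0.5])
Omega = Lam @ Lam.T + np.diag(sigma2)


def spurious_solution(Lam, sigma2, i, b_i3):
    """Hypothetical helper: append a spurious third column whose only non-zero
    loading b_i3 sits in row i (0-based) and shrink sigma_i^2 accordingly."""
    if not 0 < b_i3**2 < sigma2[i]:
        raise ValueError("need 0 < b_i3^2 < sigma_i^2")
    e_i = np.zeros((Lam.shape[0], 1))
    e_i[i, 0] = 1.0
    beta3 = np.hstack([Lam, b_i3 * e_i])
    sigma2_3 = sigma2.copy()
    sigma2_3[i] -= b_i3**2
    return beta3, sigma2_3


n_checked = 0
for i in range(Lam.shape[0]):
    for b_i3 in (0.1, -0.3, 0.5):
        beta3, sigma2_3 = spurious_solution(Lam, sigma2, i, b_i3)
        assert np.allclose(beta3 @ beta3.T + np.diag(sigma2_3), Omega)
        n_checked += 1
print(n_checked, "overfitting solutions with k = 3 reproduce Omega")
```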
Overfitting GLT structures. Finally, we investigate in Section 5 the class of potentially overfitting GLT structures where the matrix $\beta_k$ in (12) is constrained to be an unordered GLT structure. We apply results by Tumura and Sato (1980) to this class and show how easily spurious factors and the underlying true factor loading matrix $\Lambda$ are identified under GLT structures, even if the model is overfitting. Our strategy relies on the concept of extended variance identification and the extended row deletion property introduced by Tumura and Sato (1980), where more than one row is deleted from the loading matrix. An extended counting rule for the sparsity matrix of GLT loading matrices $\beta_k$, which is useful in this context, will be introduced in Section 4.
3 Solving rotational invariance through GLT structures

3.1 Ordered and unordered GLT structures
In this work, we introduce a new identification strategy to resolve rotational invariance based on the concept of generalized lower triangular (GLT) structures.First, we introduce the notion of pivot rows of a factor loading matrix Λ.
Definition 2 (Pivot rows). Consider an $m \times r$ factor loading matrix $\Lambda$ with $r$ non-zero columns. For each column $j = 1, \ldots, r$ of $\Lambda$, the pivot row $l_j$ is defined as the row index of the first non-zero factor loading in column $j$, i.e. $\Lambda_{ij} = 0$ for all $i < l_j$ and $\Lambda_{l_j,j} \neq 0$. The factor loading $\Lambda_{l_j,j}$ is called the leading factor loading of column $j$.
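Definition 2 translates into a one-line search per column. The sketch returns 1-based pivot rows to match the notation of the text; the function name, the optional tolerance and the example values are hypothetical.

```python
import numpy as np


def pivot_rows(Lam, tol=0.0):
    """1-based pivot rows l_j: first row with |Lam_ij| > tol in each column j."""
    nonzero = np.abs(Lam) > tol
    if not nonzero.any(axis=0).all():
        raise ValueError("every column needs at least one non-zero loading")
    # argmax of a boolean column returns the index of its first True entry
    return tuple(int(np.argmax(nonzero[:, j])) + 1 for j in range(Lam.shape[1]))


Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])   # hypothetical values
print(pivot_rows(Lam))           # (1, 4)
print(pivot_rows(Lam[:, ::-1]))  # (4, 1)
```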
For PLT factor loading matrices, the pivot rows lie on the main diagonal, i.e. $(l_1, \ldots, l_r) = (1, \ldots, r)$, and the leading factor loadings $\Lambda_{jj} > 0$ are positive for all columns $j = 1, \ldots, r$. GLT structures generalize the PLT constraint by freeing the pivot rows of a factor loading matrix $\Lambda$ and allowing them to take arbitrary positions $(l_1, \ldots, l_r)$, the only constraint being that the pivot rows are pairwise distinct. GLT structures contain PLT matrices as the special case where $l_j = j$ for $j = 1, \ldots, r$. Our generalization is particularly useful if the ordering of the measurements $y_{it}$ is in conflict with the PLT assumption. Since $\Lambda_{jj}$ is allowed to be 0, measurements other than the first $r$ ones may lead the factors. For each factor $j$, the leading variable is the response variable $y_{l_j,t}$ corresponding to the pivot row $l_j$.
We will distinguish between two types of GLT structures, namely ordered and unordered GLT structures.The following definition introduces ordered GLT matrices.Unordered GLT structures will be motivated and defined below.Examples of ordered and unordered GLT matrices are displayed in Figure 1 for a model with r = 6 factors.
Definition 3 (Ordered GLT structures). An $m \times r$ factor loading matrix $\Lambda$ with full column rank $r$ has an ordered GLT structure if the pivot rows $l_1, \ldots, l_r$ of $\Lambda$ are ordered, i.e. $l_1 < \ldots < l_r$, and the leading factor loadings are positive, i.e. $\Lambda_{l_j,j} > 0$ for $j = 1, \ldots, r$.
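A direct check of Definition 3 is sketched below; the function name, the tolerance argument and the example values are hypothetical.

```python
import numpy as np


def is_ordered_glt(Lam, tol=0.0):
    """Check Definition 3: full column rank, strictly increasing pivot rows,
    positive leading loadings."""
    m, r = Lam.shape
    nonzero = np.abs(Lam) > tol
    if np.linalg.matrix_rank(Lam) < r or not nonzero.any(axis=0).all():
        return False
    piv = [int(np.argmax(nonzero[:, j])) for j in range(r)]   # 0-based pivot rows
    increasing = all(a < b for a, b in zip(piv, piv[1:]))
    positive = all(Lam[piv[j], j] > 0 for j in range(r))
    return increasing and positive


Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])   # hypothetical values
print(is_ordered_glt(Lam))                 # True
print(is_ordered_glt(Lam[:, ::-1]))        # False: pivot rows (4, 1)
print(is_ordered_glt(Lam * [1.0, -1.0]))   # False: negative leading loading
```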
Evidently, imposing an ordered GLT structure resolves rotational invariance if the pivot rows are known: for any two ordered GLT matrices $\beta$ and $\Lambda$ with identical pivot rows $l_1, \ldots, l_r$, $\beta = \Lambda P$ with an orthogonal matrix $P$ holds only for $P = I_r$. In practice, the pivot rows $l_1, \ldots, l_r$ of a GLT structure are unknown and need to be identified from the marginal covariance matrix $\Omega$ for a given number of factors $r$. Given variance identification, i.e. assuming that the cross-covariance matrix $\Lambda\Lambda^\top$ is identified, a particularly important issue for the identification of a GLT factor model is whether $\Lambda$ is uniquely identified from $\Lambda\Lambda^\top$ if the pivot rows $l_1, \ldots, l_r$ are unknown. Non-trivial rotations $\beta = \Lambda P$ of a loading matrix $\Lambda$ with pivot rows $l_1, \ldots, l_r$ might exist such that $\beta\beta^\top = \Lambda\Lambda^\top$, while the pivot rows $\tilde{l}_1, \ldots, \tilde{l}_r$ of $\beta$ differ from the pivot rows of $\Lambda$. Reassuringly, Theorem 1 shows that this is not the case: not only the pivot rows, but the entire loading matrices $\Lambda$ and $\beta$ are identical if $\Lambda\Lambda^\top = \beta\beta^\top$ (see Appendix A for a proof).
Definition 4 introduces, as an extension of Definition 3, unordered GLT structures, under which $\Lambda$ is identified from $\Lambda\Lambda^\top$ only up to signed permutations. A signed permutation permutes the columns of the factor loading matrix $\Lambda$ and switches the sign of all factor loadings in any specific column. This leads to a trivial case of rotational invariance. For $r = 2$, for instance, the eight signed permutations of the loading matrix $\Lambda$ defined in (10) are depicted in (11). More formally, $\beta$ is a signed permutation of $\Lambda$ iff
$$\beta = \Lambda P_\pm P_\rho, \tag{14}$$
where the permutation matrix $P_\rho$ corresponds to one of the $r!$ permutations of the $r$ columns of $\Lambda$ and the reflection matrix $P_\pm = \mathrm{Diag}(\pm 1, \ldots, \pm 1)$ corresponds to one of the $2^r$ ways to switch the signs of the $r$ columns of $\Lambda$. Often, it is convenient to employ identification rules that guarantee identification of $\Lambda$ only up to such column and sign switching; see, e.g., Conti et al. (2014). Any structure $\Lambda$ obeying such an identification rule represents a whole equivalence class of matrices given by all $2^r r!$ signed permutations $\beta = \Lambda P_\pm P_\rho$ of $\Lambda$. This trivial form of rotational invariance does not pose any additional mathematical challenges and is often convenient from a computational viewpoint, in particular for Bayesian inference; see, e.g., Conti et al. (2014) and Frühwirth-Schnatter et al. (2022).
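The equivalence class of signed permutations is small enough to enumerate for small $r$. The sketch below (function name and example values hypothetical) builds all $2^r r!$ matrices $\Lambda P_\pm P_\rho$ and confirms that they share the cross-covariance matrix of $\Lambda$.

```python
import itertools

import numpy as np


def signed_permutations(Lam):
    """All 2^r r! signed permutations beta = Lam P_pm P_rho of the columns of Lam."""
    r = Lam.shape[1]
    out = []
    for perm in itertools.permutations(range(r)):
        P_rho = np.eye(r)[:, list(perm)]           # column permutation matrix
        for signs in itertools.product((1.0, -1.0), repeat=r):
            P_pm = np.diag(signs)                  # reflection matrix
            out.append(Lam @ P_pm @ P_rho)
    return out


Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8]])   # hypothetical values, shaped like (10)
betas = signed_permutations(Lam)
print(len(betas))  # 8
# All of them share the cross-covariance matrix of Lam
print(all(np.allclose(b @ b.T, Lam @ Lam.T) for b in betas))  # True
```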
It is easy to see how identification up to trivial rotational invariance can be achieved for GLT structures, which motivates the following definition of unordered GLT structures as loading matrices $\beta$ whose pivot rows $l_1, \ldots, l_r$ simply occupy $r$ different rows. In Definition 4, no order constraint is imposed on the pivot rows and no sign constraint is imposed on the leading factor loadings. This very general structure allows the design of highly efficient sampling schemes for sparse Bayesian factor analysis under GLT structures; see Frühwirth-Schnatter et al. (2022).
Definition 4 (Unordered GLT structures). An $m \times r$ factor loading matrix $\beta$ with full column rank $r$ has an unordered GLT structure if the pivot rows $l_1, \ldots, l_r$ of $\beta$ are pairwise distinct.
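Compared with the check for ordered GLT structures, Definition 4 drops the ordering and sign requirements and only asks for distinct pivot rows. A sketch, with hypothetical function name and example matrices:

```python
import numpy as np


def is_unordered_glt(beta, tol=0.0):
    """Check Definition 4: full column rank and pairwise distinct pivot rows
    (no ordering or sign constraint)."""
    m, r = beta.shape
    nonzero = np.abs(beta) > tol
    if np.linalg.matrix_rank(beta) < r or not nonzero.any(axis=0).all():
        return False
    piv = [int(np.argmax(nonzero[:, j])) for j in range(r)]
    return len(set(piv)) == r


Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])   # hypothetical values
beta = Lam[:, ::-1] * [-1.0, 1.0]    # a signed permutation of Lam
print(is_unordered_glt(beta))        # True
same_pivot = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
print(is_unordered_glt(same_pivot))  # False: both columns start in row 1
```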
Theorem 1 is easily extended to unordered GLT structures: any unordered GLT structure $\beta$ with $\beta\beta^\top = \Lambda\Lambda^\top$ is a signed permutation $\beta = \Lambda P_\rho P_\pm$ of $\Lambda$, provided that $\Lambda\Lambda^\top$ is identified. Hence, under unordered GLT structures, the factor loading matrix $\Lambda$ is uniquely identified up to signed permutations. Full identification can easily be obtained from unordered GLT structures $\beta$. Any unordered GLT structure $\beta$ has unordered pivot rows $l_1, \ldots, l_r$, occupying different rows. The corresponding ordered GLT structure $\Lambda$ is recovered from $\beta$ by sorting the columns in ascending order according to the pivot rows. In other words, the pivot rows of $\Lambda$ are equal to the order statistics $l_{(1)}, \ldots, l_{(r)}$ of the pivot rows $l_1, \ldots, l_r$ of $\beta$; see again Figure 1. This procedure resolves column switching, since the pivot rows $l_1, \ldots, l_r$ of the unordered GLT structure are distinct. Furthermore, imposing the condition $\Lambda_{l_j,j} > 0$ in each column $j$ resolves sign switching: if $\Lambda_{l_j,j} < 0$, then the sign of all factor loadings $\Lambda_{ij}$ in column $j$ is switched.
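The sorting and sign-fixing step can be sketched in a few lines. The function name and the example matrix are hypothetical; the example scrambles an ordered GLT matrix by a signed permutation and recovers it exactly.

```python
import numpy as np


def to_ordered_glt(beta, tol=0.0):
    """Map an unordered GLT matrix to its ordered GLT representative: sort the
    columns by pivot row, then flip columns with a negative leading loading."""
    nonzero = np.abs(beta) > tol
    piv = np.array([np.argmax(nonzero[:, j]) for j in range(beta.shape[1])])
    if len(set(piv.tolist())) != len(piv):
        raise ValueError("pivot rows are not pairwise distinct")
    order = np.argsort(piv)                 # ascending pivot rows l_(1) < ... < l_(r)
    Lam = beta[:, order]
    leading = Lam[piv[order], np.arange(Lam.shape[1])]
    Lam = Lam * np.sign(leading)            # enforce Lam_{l_j, j} > 0
    return Lam, tuple(int(p) + 1 for p in piv[order])


# Hypothetical ordered GLT matrix with pivot rows (1, 3, 4)
Lam = np.array([[ 0.8, 0.0, 0.0],
                [ 0.4, 0.0, 0.0],
                [ 0.3, 0.7, 0.0],
                [-0.2, 0.5, 0.9],
                [ 0.0, 0.6, 0.1],
                [ 0.5, 0.0, 0.4]])
beta = Lam[:, [2, 0, 1]] * [-1.0, 1.0, -1.0]   # an unordered GLT signed permutation
Lam_rec, piv = to_ordered_glt(beta)
print(piv)                            # (1, 3, 4)
print(np.array_equal(Lam_rec, Lam))   # True
```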

3.2 Sparse GLT structures
In Definitions 3 and 4, "structural" zeros are introduced for a GLT structure for all factor loadings above the pivot row $l_j$, while the factor loading $\Lambda_{l_j,j}$ in the pivot row is non-zero by definition. We call $\Lambda$ a dense GLT structure if all loadings below the pivot row are unconstrained and can take any value in $\mathbb{R}$.
A sparse GLT structure results if factor loadings at unspecified places below the pivot rows are zero and only the remaining loadings are unconstrained. A sparse loading matrix $\Lambda$ can be characterized by the so-called sparsity matrix, defined as a binary indicator matrix $\delta$ of 0s and 1s of the same size as $\Lambda$, where $\delta_{ij} = I(\Lambda_{ij} \neq 0)$. Let $\delta_\Lambda$ be the sparsity matrix of a GLT matrix $\Lambda$. The sparsity matrix $\delta$ corresponding to the signed permutation $\beta = \Lambda P_\rho P_\pm$ is equal to $\delta = \delta_\Lambda P_\rho$ and is invariant to sign switching. Hence, for any sparse unordered GLT matrix $\beta$, the corresponding sparsity matrix $\delta$ obeys an unordered GLT structure with the same pivot rows as $\beta$; see Figure 1 for an illustration.
In sparse factor analysis, individual factor loadings take zero values with positive probability, and the corresponding sparsity matrix $\delta$ is a binary matrix that has to be identified from the data. Identification in sparse factor analysis has to provide conditions under which the entire 0/1 pattern in $\delta$ can be identified from the covariance matrix $\Omega$ if $\delta$ is unknown. Whether this is possible hinges on variance identification, i.e. on whether the decomposition of $\Omega$ into $\Lambda\Lambda^\top$ and $\Sigma_0$ is unique. How variance identification can be verified for (sparse) GLT structures is investigated in detail in Section 4. Let us assume at this point that variance identification holds, i.e. the cross-covariance matrix $\Lambda\Lambda^\top$ is identified. Then an important step toward the identification of a sparse factor model is to verify whether the 0/1 pattern of $\Lambda$, characterized by $\delta$, is uniquely identified from $\Lambda\Lambda^\top$. Very importantly, if $\Lambda$ is assumed to be a GLT structure, then the entire GLT structure $\Lambda$, and hence the indicator matrix $\delta$, is uniquely identified from $\Lambda\Lambda^\top$, as follows immediately from Theorem 1, since $\delta_{ij} = 0$ iff $\Lambda_{ij} = 0$ for all $i, j$. By identifying the 0/1 pattern in $\delta$, we can uniquely identify the pivot rows of $\Lambda$ and the sparsity pattern below them.
We would like to emphasize that in sparse factor analysis with unconstrained loading matrices $\Lambda$ this is not necessarily the case. The indicator matrix $\delta$ is in general not uniquely identified from $\Lambda\Lambda^\top$, because (non-trivial) rotations $P$ change the zero pattern in $\beta = \Lambda P$, while $\beta\beta^\top = \Lambda\Lambda^\top$. For illustration, let us return to the example in (5), where we showed that $\Lambda\Lambda^\top$ is uniquely identified if the true sparsity matrix $\delta_\Lambda$ is known. Now assume that $\delta_\Lambda$ is unknown and allow the loading matrix $\beta = \Lambda P_{\alpha b}$ to be any rotation of $\Lambda$. It is then evident that the corresponding sparsity matrix $\delta$ is not unique and two solutions exist. For all rotations where $(\alpha, b) \in \{0, \pi/2, \pi, 3\pi/2\} \times \{0, 1\}$, $\beta$ corresponds to one of the eight signed permutations of $\Lambda$ (analogous to those given in (11)), and the sparsity matrix $\delta$ is equal to $\delta_\Lambda$ up to this signed permutation. For all other rotations, all elements of $\beta$ are different from zero and $\delta$ is simply a matrix of ones.
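Both cases can be checked in a few lines: a signed permutation $\Lambda P_\rho P_\pm$ has sparsity matrix $\delta_\Lambda P_\rho$ regardless of the signs, whereas a generic rotation produces a matrix of ones. The loading values, the tolerance and the parametrization `P_alpha_b` are assumptions of this sketch.

```python
import numpy as np


def sparsity(M, tol=1e-10):
    # Element-wise I(|M_ij| > tol); tiny round-off values count as zeros
    return (np.abs(M) > tol).astype(int)


def P_alpha_b(alpha, b):
    # Hypothetical 2 x 2 rotation (b = 0) or reflection (b = 1)
    s = (-1) ** b
    return np.array([[np.cos(alpha), s * np.sin(alpha)],
                     [-np.sin(alpha), s * np.cos(alpha)]])


Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])   # hypothetical values
delta_Lam = sparsity(Lam)

# Signed permutation beta = Lam P_rho P_pm: delta = delta_Lam P_rho
P_rho = np.eye(2)[:, [1, 0]]
P_pm = np.diag([-1.0, 1.0])
beta_sp = Lam @ P_rho @ P_pm
print(np.array_equal(sparsity(beta_sp), delta_Lam @ P_rho.astype(int)))  # True

# A generic rotation destroys the zero pattern: delta becomes a matrix of ones
beta_rot = Lam @ P_alpha_b(0.7, 0)
print(sparsity(beta_rot).min())  # 1
```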

Rotation into GLT
As discussed above, GLT structures generalize the PLT constraint, but one might wonder how restrictive this structure still is. We will show in this section that, for a basic factor model with an unconstrained loading matrix β, there exists an equivalent representation involving a unique GLT structure Λ, which is related to β by an orthogonal transformation, provided that uniqueness of the variance decomposition holds.
The proof of this result uses a relationship between a matrix with GLT structure and the so-called reduced row echelon form in linear algebra that results from Gauss-Jordan elimination for solving linear systems, see e.g. Anton and Rorres (2013). Any transposed GLT loading matrix Λ⊤ has a row echelon form, which can be turned into a reduced row echelon form (RREF) $B = A^\top \Lambda^\top$ with the help of an r × r matrix A that is constructed from the pivot rows $l_1, \ldots, l_r$ of Λ and is invertible by definition, namely $A = \Lambda_1^{-1}$, where $\Lambda_1$ denotes the r × r submatrix of Λ containing the rows $l_1, \ldots, l_r$. Since the RREF of any matrix is unique, see e.g. Yuster (1984), we find that the pivot columns of B coincide with the pivot rows $l_1, \ldots, l_r$ of Λ. Hence, for a basic factor model with an arbitrary, unstructured loading matrix β with full column rank r, we prove in Theorem 2 that the RREF of β⊤ can be used to represent β as a unique GLT structure Λ, where the pivot rows $l_1, \ldots, l_r$ of Λ coincide with the pivot columns of the RREF of β⊤ (see Appendix A for a proof).
Theorem 2 (Rotation into GLT). Let β be an arbitrary loading matrix with full column rank r. Then the following holds: (a) There exists an equivalent representation of β involving a unique GLT structure Λ,
$$\Lambda = \beta G,$$
where G is a unique orthogonal matrix. Λ is called the GLT representation of β.
(b) Let $l_1 < \ldots < l_r$ be the pivot columns of the RREF B of β⊤ and let $\beta_1$ be the r × r submatrix of β containing the corresponding rows $l_1, \ldots, l_r$. The GLT representation Λ = βG of β has pivot rows $l_1, \ldots, l_r$ and is obtained through rotation into GLT with the rotation matrix G = Q, which results from the QR decomposition of $\beta_1^\top$:
$$\beta_1^\top = QR, \qquad \Lambda = \beta Q, \qquad (16)$$
where Q is orthogonal and R is upper triangular with positive diagonal elements.
Would it be possible to obtain a similar result with the factor loading matrix Λ constrained to be a PLT structure? The answer is definitely no, as has already been established in Section 2 for example (5). As mentioned above, GLT structures encompass PLT structures as a special case. Hence, if a PLT representation Λ exists for a loading matrix β = ΛP, then the GLT representation in (16) automatically reduces to the PLT structure Λ, since $R = \beta_1^\top$ is obtained from the first r rows of β and the "rotation into GLT" is equal to the identity, $Q = I_r$. On the other hand, if the GLT representation Λ differs from a PLT structure, then no equivalent PLT representation exists. Hence, forcing a PLT structure in the representation (1) may introduce a systematic bias in estimating the marginal covariance matrix Ω.
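A minimal NumPy sketch of rotation into GLT, assuming that the rotation in (16) is the orthogonal factor Q of the QR decomposition $\beta_1^\top = QR$, normalized so that R has a positive diagonal. Instead of computing the RREF of β⊤ explicitly, the pivot rows are found greedily as the rows of β that are not linear combinations of the rows preceding them, which is one way to characterize the pivot columns of an RREF. Function names and the rank tolerance are illustrative choices.

```python
import numpy as np

def pivot_rows(beta, tol=1e-10):
    """Pivot columns of the RREF of beta^T (0-based): the rows of beta that are
    not linear combinations of the rows preceding them."""
    pivots = []
    for i in range(beta.shape[0]):
        candidate = pivots + [i]
        if np.linalg.matrix_rank(beta[candidate], tol=tol) == len(candidate):
            pivots.append(i)
    return pivots

def rotate_to_glt(beta):
    """Rotation into GLT: Lambda = beta Q with beta_1^T = Q R and diag(R) > 0."""
    piv = pivot_rows(beta)
    if len(piv) != beta.shape[1]:
        raise ValueError("beta must have full column rank")
    beta1 = beta[piv]                       # r x r submatrix of pivot rows
    Q, R = np.linalg.qr(beta1.T)            # beta_1^T = Q R
    D = np.diag(np.sign(np.diag(R)))        # sign flips so that diag(R) > 0
    return beta @ (Q @ D), piv

# usage: disguise a hypothetical GLT matrix by a random rotation and rotate it back
rng = np.random.default_rng(1)
Lam_true = np.array([[1.0, 0.0, 0.0],
                     [0.5, 0.0, 0.0],
                     [0.3, 0.8, 0.0],
                     [-0.4, 0.2, 0.9],
                     [0.7, -0.6, 0.1],
                     [0.2, 0.3, -0.5],
                     [0.1, 0.4, 0.6]])
P, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
Lam_hat, piv = rotate_to_glt(Lam_true @ P)
print(piv)                             # [0, 2, 3]
print(np.allclose(Lam_hat, Lam_true))  # True
```

Because the GLT representation is unique, the rotated matrix is mapped back exactly onto the original GLT structure, including its pivot rows (rows 1, 3 and 4 in 1-based numbering).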

Variance identification and GLT structures
As mentioned in the previous sections, constraints imposed on the structure of a factor loading matrix Λ will resolve rotational invariance only if uniqueness of the variance decomposition holds and the cross-covariance matrix ΛΛ⊤ is identified. However, rotational constraints alone do not necessarily guarantee uniqueness of the variance decomposition. Consider, for instance, a sparse PLT loading matrix where, in some column j, in addition to the diagonal element $\Lambda_{jj}$ (which is non-zero by definition), only a single further factor loading $\Lambda_{n_j,j}$ in some row $n_j > j$ is non-zero. Such a loading matrix obviously violates the necessary condition for variance identification that each column contains at least three non-zero elements. Similarly, while GLT structures resolve rotational invariance, they do not guarantee uniqueness of the variance decomposition either.
In Section 4.1, we derive sufficient conditions for variance identification of GLT structures based on the 3579 counting rule of Sato (1992, Theorem 3.3).In Section 4.2, we discuss how to verify variance identification for sparse GLT structures in practice.

Counting rules for variance identification
We will show how to verify, from the 0/1 pattern δ of an unordered GLT structure β, whether the row deletion property AR holds for β and all its signed permutations. Our condition is a structural counting rule expressed solely in terms of the sparsity matrix δ underlying β and does not involve the values of the unconstrained factor loadings in β, which can take any value in ℝ. For any factor model, variance identification is invariant to signed permutations. If we can verify variance identification for a single signed permutation $\beta = \Lambda P_\pm P_\rho$ of Λ, as defined in (14), then variance identification of Λ holds, since β and Λ imply the same cross-covariance matrix ΛΛ⊤. Hence, we focus in this section on variance identification of unordered GLT structures.
In Definition 5, we recall the so-called extended row deletion property, introduced by Tumura and Sato (1980).
Definition 5 (Extended row deletion property RD(r, s)). An m × r factor loading matrix β satisfies the extended row deletion property RD(r, s) if the following condition is satisfied: whenever $s \in \mathbb{N}_0$ rows are deleted from β, two disjoint submatrices of rank r remain.
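The row deletion property AR of Anderson and Rubin (1956) results as the special case s = 1. As will be shown in Section 5, the extended row deletion properties RD(r, s) for s > 1 are useful in exploratory factor analysis when the factor dimension r is unknown. In Definition 6, we introduce a counting rule for binary matrices.
Definition 6 (Counting rule CR(r, s)). A binary m × r matrix δ satisfies the counting rule CR(r, s) for some $s \in \mathbb{N}_0$ if, for each q = 1, ..., r, every submatrix of δ consisting of q columns has at least 2q + s non-zero rows.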
Note that the counting rule CR(r, s), like the extended row deletion property RD(r, s), is invariant to signed permutations.Lemma 8 in Appendix A summarizes further useful properties of CR(r, s).
For a given binary matrix δ of dimension m × r, let $\Theta_\delta$ be the space generated by the non-zero elements of all unordered GLT structures β with sparsity matrix δ and all their $2^r r! - 1$ trivial rotations $\beta P_\pm P_\rho$. We prove in Theorem 3 that, for GLT structures, the counting rule CR(r, s) and the extended row deletion property RD(r, s) are equivalent conditions for all loading matrices in $\Theta_\delta$, except for a set of measure 0.
Theorem 3. Let δ be a binary m × r matrix with unordered GLT structure. Then the following holds: (a) If δ violates the counting rule CR(r, s), then the extended row deletion property RD(r, s) is violated for all $\beta \in \Theta_\delta$ generated by δ.
(b) If δ satisfies the counting rule CR(r, s), then the extended row deletion property RD(r, s) holds for all $\beta \in \Theta_\delta$ except for a set of measure 0.
See Appendix A for a proof. The special case s = 1 is relevant for verifying the row deletion property AR. It shows that, for unordered GLT structures, the 3579 counting rule of Sato (1992) is not only a necessary but also a sufficient condition for AR to hold. In addition, the counting rule needs to be verified only for the sparsity matrix δ of a single trivial rotation $\beta = \Lambda P_\pm P_\rho$, rather than for every nonsingular matrix G. This result is summarized in Corollary 4.
Corollary 4 (Variance identification rule for GLT structures). For any unordered m × r GLT structure β, the following holds: (a) If δ satisfies the 3579 counting rule, i.e. every column of δ has at least 3 non-zero rows, every pair of columns at least 5 and, more generally, every possible combination of q = 3, ..., r columns has at least 2q + 1 non-zero rows, then variance identification is given for all $\beta \in \Theta_\delta$ except for a set of measure 0; i.e. for any other factor decomposition $\Omega = \tilde\beta\tilde\beta^\top + \tilde\Sigma$ of the marginal covariance matrix $\Omega = \beta\beta^\top + \Sigma_0$, it follows that $\tilde\Sigma = \Sigma_0$ and $\tilde\beta\tilde\beta^\top = \beta\beta^\top$.
(c) For r = 1, r = 2, and r = 3, condition CR(r, 1) is both sufficient and necessary for variance identification.
A few comments are in order.If δ satisfies CR(r, 1), then AR holds for all β ∈ Θ δ and a sufficient condition for variance identification is satisfied.As shown by Anderson and Rubin (1956), AR is a necessary condition for variance identification only for r = 1 and r = 2. Tumura and Sato (1980, Theorem 3) show the same for r = 3, provided that m ≥ 7. It follows that CR(r, 1) is a necessary and sufficient condition for variance identification for the models summarized in (c).In all other cases, variance identification may hold for loading matrices β ∈ Θ δ , even if δ violates CR(r, 1).
The definition of unordered GLT structures given in Section 3 imposes no constraint on the pivot rows $l_1, \ldots, l_r$ beyond the assumption that they are distinct. This flexibility can lead to GLT structures that can never satisfy the 3579 rule, even if all elements below the pivot rows are non-zero. Consider, for instance, a GLT matrix where the pivot row of column r is $l_r = m - 1$. The loading matrix then has at most two non-zero elements in column r and violates the necessary condition for variance identification. This example shows that there is an upper bound for the pivot rows beyond which the 3579 rule can never hold. This insight is formalized in Definition 7.
Definition 7. An unordered GLT structure β fulfills condition GLT-AR if its pivot rows $l_1, \ldots, l_r$ satisfy the constraint
$$l_j \le m - 2(r - z_j + 1), \qquad j = 1, \ldots, r,$$
where $z_j$ is the rank of $l_j$ in the ordered sequence $l_{(1)} < \ldots < l_{(r)}$.
Evidently, an ordered GLT structure Λ fulfills condition GLT-AR if the pivot rows $l_1, \ldots, l_r$ of Λ satisfy the constraint $l_j \le m - 2(r - j + 1)$. For the special case of a PLT structure, where $l_j = j$, this constraint reduces to m ≥ 2r + 1, which is equivalent to a well-known upper bound for the number of factors. For dense unordered GLT structures with m (non-zero) rows, condition GLT-AR is a sufficient condition for AR. For sparse GLT structures, GLT-AR is only a necessary condition for AR, and the 3579 rule has to be verified explicitly, as shown by the example discussed above. Very conveniently for verifying variance identification in sparse factor analysis based on GLT structures, Theorem 3 and Corollary 4 operate solely on the sparsity matrix δ corresponding to β.
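Since condition GLT-AR involves only the pivot rows, it can be checked in a few lines. The sketch below assumes the constraint $l_j \le m - 2(r - z_j + 1)$, with $z_j$ the rank of $l_j$ among the sorted pivot rows; this is our reading of Definition 7 and reduces to the ordered-case constraint $l_j \le m - 2(r - j + 1)$ when $z_j = j$. The function name is hypothetical and pivot rows are 1-based, as in the text.

```python
def satisfies_glt_ar(pivots, m):
    """Condition GLT-AR for the 1-based pivot rows l_1, ..., l_r of an
    unordered m x r GLT structure: l_j <= m - 2(r - z_j + 1), where z_j is
    the rank of l_j among the sorted pivot rows."""
    r = len(pivots)
    rank = {l: z for z, l in enumerate(sorted(pivots), start=1)}
    return all(l <= m - 2 * (r - rank[l] + 1) for l in pivots)

print(satisfies_glt_ar([1, 2, 3], m=7))   # True: PLT with m = 2r + 1
print(satisfies_glt_ar([1, 2, 3], m=6))   # False: m < 2r + 1
print(satisfies_glt_ar([1, 2, 9], m=10))  # False: l_r = m - 1
```

The last call reproduces the example above, where a pivot in row m − 1 leaves room for at most two non-zero loadings in the corresponding column.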

Variance identification in practice
To verify CR(r, s) in practice, all submatrices of q columns have to be extracted from the sparsity matrix δ to check whether at least 2q + s rows of each submatrix are non-zero. For q = 1, 2, r − 1, r, this condition is easily verified from simple functionals of δ; see Corollary 5, which follows immediately from Theorem 3 (see Appendix A for details).
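Corollary 5 (Simple counting rules for CR(r, s)). Let δ be an m × r unordered GLT sparsity matrix.
The following conditions on δ are necessary for CR(r, s) to hold:
$$\mathbf{1}_{1\times m}\,\delta \ge (2 + s)\,\mathbf{1}_{1\times r}, \qquad (q = 1)$$
$$(\mathbf{1}_{1\times m}\delta)_j + (\mathbf{1}_{1\times m}\delta)_\ell - (\delta^\top\delta)_{j\ell} \ge 4 + s \quad \text{for all } j \ne \ell, \qquad (q = 2)$$
$$\mathbf{1}_{1\times m}\, I\big(\delta\,\mathbf{1}_{r\times r} - \delta > 0\big) \ge (2r - 2 + s)\,\mathbf{1}_{1\times r}, \qquad (q = r - 1)$$
$$\mathbf{1}_{1\times m}\, I\big(\delta\,\mathbf{1}_{r\times 1} > 0\big) \ge 2r + s, \qquad (q = r)$$
where the indicator function I(δ⋆ > 0) is applied element-wise, $\mathbf{1}_{n\times k}$ denotes an n × k matrix of ones, and inequalities between vectors hold element-wise. For r ≤ 4, these conditions are also sufficient for CR(r, s) to hold for δ.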
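The conditions of Corollary 5 are simple functionals of δ. The following sketch writes them out for the cases q = 1, 2, r − 1 and r, assuming that CR(r, s) means that every set of q columns has at least 2q + s non-zero rows; the pairwise count uses inclusion-exclusion on δ⊤δ. It is an illustration with a hypothetical function name, not the formulation used in Appendix A.

```python
import numpy as np

def simple_counting_rules(delta, s=1):
    """Check CR(r, s) for column subsets of size q = 1, 2, r - 1 and r only
    (necessary conditions; also sufficient when r <= 4)."""
    delta = (np.asarray(delta) != 0).astype(int)
    m, r = delta.shape
    n = delta.sum(axis=0)                       # non-zero rows per column
    ok = bool(np.all(n >= 2 + s))               # q = 1
    if r >= 2:
        # q = 2: |rows(j) or rows(l)| = n_j + n_l - |rows(j) and rows(l)|
        union = n[:, None] + n[None, :] - delta.T @ delta
        ok &= bool(np.all(union[~np.eye(r, dtype=bool)] >= 4 + s))
        # q = r - 1: non-zero rows that remain after dropping column j
        rest = (delta @ np.ones((r, r), dtype=int) - delta) > 0
        ok &= bool(np.all(rest.sum(axis=0) >= 2 * (r - 1) + s))
    ok &= bool((delta.sum(axis=1) > 0).sum() >= 2 * r + s)   # q = r
    return ok

delta_ok = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
print(simple_counting_rules(delta_ok))        # True
print(simple_counting_rules(delta_ok[:5]))    # False: column 2 has only 2 non-zero rows
print(simple_counting_rules(delta_ok, s=2))   # False: CR(2, 2) needs 4 rows per column
```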
Using Corollary 5 with s = 1, one can efficiently verify whether the 3579 counting rule, and hence the row deletion property AR, holds for unordered GLT factor models with up to four factors (r ≤ 4). For models with more than four factors (r > 4), a more elaborate strategy is needed. After checking the conditions of Corollary 5, CR(r, s) could be verified for a given binary matrix δ by iterating over all remaining r!/(q!(r − q)!) subsets of q = 3, ..., r − 2 columns of δ. While this is a finite task, such a naïve approach may need to visit $2^r - 1$ submatrices in order to reach a decision, and the combinatorial explosion quickly becomes an issue in practice as r increases. Recent work by Hosszejni and Frühwirth-Schnatter (2022) establishes the applicability of this framework for large models.
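A direct implementation of the naïve strategy enumerates all column subsets with `itertools.combinations`. The sketch assumes the reading of CR(r, s) as "every set of q columns has at least 2q + s non-zero rows"; the example matrix is hypothetical and shows a case where the conditions for q = 1 and q = 2 hold, while the condition for q = r = 3 fails.

```python
from itertools import combinations
import numpy as np

def counting_rule(delta, s=1):
    """Naive check of CR(r, s): every set of q columns of delta, q = 1, ..., r,
    must have at least 2q + s non-zero rows. Visits up to 2^r - 1 column subsets."""
    delta = np.asarray(delta) != 0
    r = delta.shape[1]
    for q in range(1, r + 1):
        for cols in combinations(range(r), q):
            if np.count_nonzero(delta[:, list(cols)].any(axis=1)) < 2 * q + s:
                return False
    return True

# 3 non-zero rows per column and 5 per pair, but only 6 < 7 non-zero rows in total
delta = np.zeros((6, 3), dtype=int)
delta[[0, 1, 2], 0] = 1
delta[[2, 3, 4], 1] = 1
delta[[4, 5, 0], 2] = 1
print(counting_rule(delta))  # False
```

The loop exits at the first violated subset, but in the worst case (when the rule holds) it still visits all $2^r - 1$ subsets, which is exactly the combinatorial cost discussed above.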

Identification in exploratory factor analysis
In this section, we discuss how the concept of GLT structures is helpful for addressing identification problems in exploratory factor analysis (EFA). Consider data $\{y_1, \ldots, y_T\}$ from a multivariate Gaussian distribution, $y_t \sim N_m(0, \Omega)$, where an investigator wants to perform factor analysis because she expects that the covariances of the measurements $y_{it}$ are driven by common factors. In practice, the number of factors is typically unknown, and often it is not obvious whether all m measurements in $y_t$ are actually correlated. It is then common to employ EFA by fitting a basic factor model to the entire collection of measurements in $y_t$, i.e. assuming the model
$$y_t = \beta_k f_t + \epsilon_t, \qquad f_t \sim N_k(0, I_k), \qquad \epsilon_t \sim N_m(0, \Sigma_k), \qquad (21)$$
with an assumed number of factors k, an m × k loading matrix $\beta_k$ with elements $\beta_{ij}$, and a diagonal matrix $\Sigma_k$ with strictly positive entries. The EFA model (21) is potentially overfitting in two ways. First, the true number of factors r is possibly smaller than k, i.e. $\beta_k$ has too many columns. Second, some measurements in $y_t$ are possibly irrelevant, which means that $\beta_k$ allows for too many non-zero rows.
The goal is then to determine the true number of factors and to identify irrelevant measurements from the EFA model ( 21).
We will address identification under the assumption that the data are generated by a basic factor model with loading matrix $\beta_0$ with r factors, which implies the following covariance matrix Ω:
$$\Omega = \beta_0\beta_0^\top + \Sigma_0. \qquad (22)$$
Instead of (22), for a given k, the EFA model (21) yields the alternative representation of Ω:
$$\Omega = \beta_k\beta_k^\top + \Sigma_k. \qquad (23)$$
The question is then under which conditions the true loading matrix $\beta_0$ can be recovered from (23). Let us assume for the moment that no constraint that resolves rotational invariance is imposed on $\beta_0$ or $\beta_k$.
"Revealing the truth" in an overfitting EFA model. A fundamental problem in factor analysis is the following. If the EFA model is overfitting, i.e. k > r, could we nevertheless recover the true loading matrix $\beta_0$ directly from $\beta_k$? We will show how this can be achieved mathematically by combining the important work of Tumura and Sato (1980) with the framework of GLT structures. We demonstrated in Section 2, using example (13), that solutions in an overfitting model can be constructed by adding spurious columns (Reiersøl, 1950; Geweke and Singleton, 1980). Additional solutions are obtained as rotations of such solutions. For instance, one of the following solutions may result, both with the same $\Sigma_3$ as in (13). The first case is a signed permutation of $\beta_3$, while the second case combines a signed permutation of $\beta_3$ with a rotation, involving $P_{\alpha b}$, of the spurious column and the first column of Λ.
In the first case, despite the rotation, both the spurious column and the columns of Λ are clearly visible, while in the second case the presence of a spurious column is by no means obvious and the columns of Λ are disguised.
In general, for an EFA model that is overfitting by a single column, i.e. k = r + 1, with $\beta_k$ left unconstrained, infinitely many representations $(\beta_k, \Sigma_k)$ with covariance matrix $\Omega = \beta_k\beta_k^\top + \Sigma_k$ can be constructed in the following way. Let the first r columns of $\beta_k$ be equal to $\beta_0$ and append an extra column to the right. In this extra column, which will be called a spurious column, add a single non-zero loading $\beta_{l_k,k}$ in any row $1 \le l_k \le m$, taking any value that satisfies $0 < \beta_{l_k,k}^2 < \sigma_{l_k}^2$; then reduce the idiosyncratic variance in row $l_k$ to $\sigma_{l_k}^2 - \beta_{l_k,k}^2$; and finally apply an arbitrary rotation P:
$$\beta_k = \begin{pmatrix} \beta_0 & \beta_{l_k,k}\, e_{l_k} \end{pmatrix} P, \qquad \Sigma_k = \Sigma_0 - \beta_{l_k,k}^2\, e_{l_k} e_{l_k}^\top, \qquad (24)$$
where $e_{l_k}$ denotes the $l_k$-th unit vector of length m. Interesting questions are then the following: under which conditions is (24) an exhaustive representation of all possible solutions $\beta_k$ in an EFA model where the degree of overfitting, defined as s = k − r, is equal to one? How can all solutions $\beta_k$ be represented if s > 1?
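The construction of (24) is easy to reproduce numerically. In the sketch below, which uses a hypothetical function name and a random example, $\Sigma_0$ is stored as the vector of its diagonal elements and P is a random 3 × 3 orthogonal matrix; the final check confirms that the overfitting pair $(\beta_k, \Sigma_k)$ reproduces the same Ω as $(\beta_0, \Sigma_0)$.

```python
import numpy as np

def add_spurious_column(beta0, sigma2, row, loading, P=None):
    """Overfitting representation (beta_k, diag(sigma2_k)) with k = r + 1, built
    as in the text: append a spurious column with one non-zero loading in `row`,
    reduce the idiosyncratic variance there, then apply an orthogonal matrix P."""
    m = beta0.shape[0]
    if not 0 < loading**2 < sigma2[row]:
        raise ValueError("need 0 < loading^2 < sigma2[row]")
    spurious = np.zeros((m, 1))
    spurious[row, 0] = loading
    beta_k = np.hstack([beta0, spurious])
    sigma2_k = np.array(sigma2, dtype=float)   # copy of the idiosyncratic variances
    sigma2_k[row] -= loading**2
    if P is not None:
        beta_k = beta_k @ P
    return beta_k, sigma2_k

rng = np.random.default_rng(0)
beta0 = rng.normal(size=(7, 2))
sigma2 = np.ones(7)
P, _ = np.linalg.qr(rng.normal(size=(3, 3)))
beta_k, sigma2_k = add_spurious_column(beta0, sigma2, row=4, loading=0.6, P=P)
Omega = beta0 @ beta0.T + np.diag(sigma2)
print(np.allclose(beta_k @ beta_k.T + np.diag(sigma2_k), Omega))  # True
```

Any admissible choice of row, loading and P gives another valid decomposition, which is why an unconstrained overfitting model has infinitely many solutions.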
Such identifiability problems in overfitting EFA models have been analyzed in depth by Tumura and Sato (1980).They show that a stronger condition than RD(r, 1) is needed for β 0 in the underlying variance decomposition (22) to ensure that only spurious and no additional common factors are added in the overfitting representation (23).In addition, Tumura and Sato (1980) provide a general representation of the factor loading matrix β k in overfitting representation (23) with k > r.
Theorem 6 (Tumura and Sato, 1980, Theorem 1). Suppose that Ω has a decomposition as in (22) with r factors and that, for some S ∈ ℕ with m ≥ 2r + S + 1, the extended row deletion property RD(r, 1 + S) holds for $\beta_0$. If Ω has another decomposition $\Omega = \beta_k\beta_k^\top + \Sigma_k$, where $\beta_k$ is an m × (r + s) matrix of rank k = r + s with 1 ≤ s ≤ S, then there exists an orthogonal matrix $T_k$ of rank k such that
$$\beta_k T_k = \begin{pmatrix} \beta_0 & M_s \end{pmatrix}, \qquad \Sigma_k = \Sigma_0 - M_s M_s^\top, \qquad (25)$$
where the off-diagonal elements of $M_s M_s^\top$ are zero.
The m × s matrix $M_s$ is a so-called spurious factor loading matrix that does not contribute to explaining the covariance in $y_t$, since $M_s M_s^\top$ is diagonal and $\beta_k\beta_k^\top + \Sigma_k = \beta_0\beta_0^\top + M_s M_s^\top + \Sigma_k = \beta_0\beta_0^\top + \Sigma_0$. While this theorem is an important result, without imposing further structure on the factor loading matrix $\beta_k$ in the EFA model it cannot be applied immediately to "recover the truth", as the separation of $\beta_k$ into the true factor loading matrix $\beta_0$ and the spurious factor loading matrix $M_s$ is possible only up to a rotation $T_k$ of $\beta_k$. However, "the truth" in an overfitting EFA model can be recovered if Tumura and Sato (1980, Theorem 1) is applied within the class of unordered GLT structures introduced in this paper. If we assume that Λ is a GLT structure which satisfies the extended row deletion property RD(r, 1 + S), we prove in Theorem 7 the following result: if $\beta_k$ in an overfitting EFA model is an unordered GLT structure, then $\beta_k$ has a representation where the rotation in (25) is a signed permutation, $T_k = P_\pm P_\rho$. Hence, spurious factors in $\beta_k$ are easily spotted and Λ can be recovered immediately from $\beta_k$.
Definition 8 (Unordered spurious GLT structure). An m × s unordered GLT factor loading matrix $M^\Lambda_s$ with pivot rows $\{n_1, \ldots, n_s\}$ is an unordered spurious GLT structure if all columns are spurious columns with a single non-zero loading in the corresponding pivot row.
Theorem 7. Let Λ be an m × r GLT factor loading matrix with pivot rows $l_1 < \ldots < l_r$ which obeys the extended row deletion property RD(r, 1 + S) for some S ∈ ℕ. Assume that the m × k matrix $\beta_k$ in the EFA variance decomposition (23), with k = r + s and 1 ≤ s ≤ S, is an unordered GLT matrix. Then (25) reduces to
$$\beta_k P_\pm P_\rho = \begin{pmatrix} \Lambda & M^\Lambda_s \end{pmatrix}, \qquad \Sigma_k = \Sigma_0 - M^\Lambda_s (M^\Lambda_s)^\top,$$
where $M^\Lambda_s$ is a spurious ordered GLT structure with pivot rows $n_1 < \ldots < n_s$, which are distinct from the r pivot rows of Λ. Hence, r columns of $\beta_k$ are a signed permutation of the true loading matrix Λ, while the remaining s columns of $\beta_k$ form an unordered spurious GLT structure with pivots $n_1, \ldots, n_s$.
See Appendix A for a proof.
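Theorem 7 suggests a simple post-processing step: within the GLT class, spurious columns are those with a single non-zero loading, and the remaining columns only need to be reordered by pivot row and sign-flipped. The sketch below assumes that no column of the true loading matrix has a single non-zero loading (as implied by RD(r, 1 + S)) and uses the relation $\Sigma_0 = \Sigma_k + M M^\top$ with diagonal $M M^\top$ from Theorem 6; the function name and the example matrices are hypothetical.

```python
import numpy as np

def recover_glt(beta_k, sigma2_k, tol=1e-12):
    """Split an overfitting unordered GLT loading matrix into the GLT loadings
    Lambda and the spurious columns M; return Lambda and diag(Sigma_k + M M^T)."""
    nz = np.abs(beta_k) > tol
    counts = nz.sum(axis=0)
    M = beta_k[:, counts == 1]                  # spurious columns
    active = beta_k[:, counts >= 2]
    sigma2_0 = sigma2_k + (M**2).sum(axis=1)    # M M^T is diagonal
    # undo the signed permutation: sort columns by pivot row, make pivots positive
    pivots = np.argmax(np.abs(active) > tol, axis=0)
    order = np.argsort(pivots)
    Lam = active[:, order]
    signs = np.sign(Lam[pivots[order], np.arange(Lam.shape[1])])
    return Lam * signs, sigma2_0

Lam_true = np.array([[1.0, 0.0], [0.5, 0.8], [0.3, -0.4], [0.2, 0.6],
                     [-0.7, 0.9], [0.4, 0.3], [0.6, 0.5]])
spur = np.zeros(7)
spur[3] = 0.5                                   # spurious loading in row 4 (1-based)
# a signed permutation of (Lambda, M): column order and signs are scrambled
beta_k = np.column_stack([-Lam_true[:, 1], spur, Lam_true[:, 0]])
sigma2_k = np.ones(7)
sigma2_k[3] -= 0.25
Lam_hat, sigma2_0 = recover_glt(beta_k, sigma2_k)
print(np.allclose(Lam_hat, Lam_true), np.allclose(sigma2_0, 1.0))  # True True
```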
Identifying irrelevant variables. In applied factor analysis, the assumption that each measurement $y_{it}$ is correlated with at least one other measurement is too restrictive, because irrelevant measurements might be present that are uncorrelated with all other measurements. As argued by Boivin and Ng (2006), it is useful to identify such variables. Within the framework of sparse factor analysis, irrelevant variables are identified in Kaufmann and Schuhmacher (2017) by exploring the sparsity matrix δ of a factor loading matrix $\beta_0$ with respect to zero rows. Since $\mathrm{Cov}(y_{it}, y_{lt}) = 0$ for all $l \ne i$ if the entire ith row of $\beta_0$ is zero (see also (3)), the presence of $m_0$ irrelevant measurements causes the corresponding $m_0$ rows of $\beta_0$ and δ to be zero. As before, we assume that the variance decomposition (22) of the underlying basic factor model is variance identified.
Let us first investigate identification of the zero rows in $\beta_0$ and the corresponding sparsity matrix δ for the case that the assumed and the true number of factors in the EFA model (21) are identical, i.e. k = r. Since variance identification of (22) holds in the underlying model, we obtain that $\Sigma_0 = \Sigma_r$, $\beta_0\beta_0^\top = \beta_r\beta_r^\top$, and $\beta_r = \beta_0 P$ is a rotation of $\beta_0$. Therefore, the positions of the zero rows in $\beta_0$ and $\beta_r$ are identical, and all irrelevant variables can be identified from $\beta_r$ or the corresponding sparsity matrix δ, regardless of the strategy toward rotational invariance.
What makes this task challenging in applied factor analysis is that in practice only the total number m of measurements is known, whereas the investigator is ignorant both about the number of factors r and the number of irrelevant measurements $m_0$. In such a situation, variance identification of $\Sigma_k$ for an EFA model with k assumed factors is easily lost if too many irrelevant variables are included relative to k. These considerations have important implications for exploratory factor analysis. While the investigator can choose k, she is ignorant about the number of irrelevant variables, and the recovered model might not be variance identified. For this reason, it is important to verify in every case that the solution $\beta_k$ obtained from any EFA model satisfies variance identification.
Under AR, this means that the loading matrix of the correlated measurements, i.e. the non-zero rows of $\beta_0$, satisfies RD(r, 1). If variance identification relies on AR, then a minimum requirement for $\beta_k$ to satisfy RD(k, 1) is that $2k + 1 \le m - m_0$. If no irrelevant measurements are present, then the well-known upper bound $k \le (m - 1)/2$ results. However, if irrelevant measurements are present, then there is a trade-off between $m_0$ and k: the more irrelevant measurements are included, the smaller the maximum number of assumed factors k has to be. Hence, the presence of $m_0$ zero rows in $\beta_0$, while $\beta_k$ in the EFA model is allowed to have m potentially non-zero rows, requires stronger conditions for variance identification than an EFA model where the underlying loading matrix $\beta_0$ contains only non-zero rows. More specifically, for a given number $m_0 \in \mathbb{N}$ of irrelevant measurements, variance identification necessitates the more stringent upper bound $k \le (m - m_0 - 1)/2$, where $m - m_0$ is the number of non-zero rows. On the other hand, for a given number of factors k in an EFA model, the maximum number of irrelevant measurements that can be included is given by $m_0 \le m - (2k + 1)$.
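The trade-off between $m_0$ and k is simple integer arithmetic, sketched below with hypothetical function names. For m = 30 and no irrelevant measurements it gives k = 14, which is the maximum number of factors used in the simulation study.

```python
def max_factors(m, m0=0):
    """Largest k with 2k + 1 <= m - m0 (negative if no k >= 0 is admissible)."""
    return (m - m0 - 1) // 2

def max_irrelevant(m, k):
    """Largest m0 with 2k + 1 <= m - m0 (negative if k is too large for m)."""
    return m - (2 * k + 1)

print(max_factors(30))         # 14
print(max_factors(30, m0=5))   # 12
print(max_irrelevant(30, 5))   # 19
```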
Identifying the number of factors through an EFA model.Let us assume that the variance decomposition (22) of the unknown underlying basic factor model is identified.As shown by Reiersøl (1950), the true number of factors r is equal to the smallest value k that satisfies (23).However, in practice, it is not obvious how to solve this "minimization" problem.As the following considerations show, verifying variance identification for β k in an EFA model can be helpful in this regard.
If r is unknown, then we need to find a decomposition of Ω as in (23) where $\Sigma_k$ is variance identified. Since the true underlying decomposition (22) is variance identified, any solution where $\Sigma_k$ is not variance identified can be rejected. As discussed above, any overfitting EFA model, where k > r, has infinitely many decompositions of Ω and is therefore never variance identified. Hence, if a solution $\Sigma_k$ of an EFA model with k assumed factors is not variance identified, then we can deduce that k > r. On the other hand, if variance identification holds for $\Sigma_k$, then the decompositions (22) and (23) are equivalent and we can conclude that r = k, $\Sigma_0 = \Sigma_k$, and therefore $\beta_0\beta_0^\top = \beta_k\beta_k^\top$. As a consequence, we can identify the true loading matrix $\beta_0 = \beta_k P$ from $\beta_k$ mathematically up to a rotation P (Anderson and Rubin, 1956, Lemma 5.1).
This insight shows that verifying variance identification is relevant beyond resolving rotational invariance and is essential for recovering the true number of factors. This has important implications for applied factor analysis. Most importantly, the rank or the number of non-zero columns of a factor loading matrix $\beta_k$ recovered from an EFA model with an assumed number k of factors might overfit the true number of factors r if variance identification for $\Sigma_k$ is not satisfied and the variance decomposition is not unique. Hence, extracting the number of factors from an EFA model only makes sense in connection with ensuring that variance identification holds.

Sparse Bayesian factor analysis
A common goal of Bayesian factor analysis is to identify the unknown factor dimension r of a factor loading matrix from the overfitting factor model (21) with potentially k > r factors, see, among many others, Ročková and George (2017), Frühwirth-Schnatter and Lopes (2018), and Ohn and Kim (2022). Often, spike-and-slab priors are employed, where the elements $\beta_{ij}$ of the loading matrix $\beta_k$ are a priori allowed to be exactly zero with positive probability. This is achieved through a prior on the corresponding m × k sparsity matrix $\delta_k$. In each column j, the indicators $\delta_{ij}$ are active a priori with a column-specific probability $\tau_j$, i.e. $\Pr(\delta_{ij} = 1 \mid \tau_j) = \tau_j$ for i = 1, ..., m, where the slab probabilities $\tau_1, \ldots, \tau_k$ arise from an exchangeable shrinkage prior:
$$\tau_j \mid k, \alpha, \gamma \sim B\left(\frac{\gamma\alpha}{k}, \gamma\right), \qquad j = 1, \ldots, k. \qquad (26)$$
If γ is unknown, then (26) is called a two-parameter-beta (2PB) prior. If γ = 1, then (26) is called a one-parameter-beta (1PB) prior and takes the form:
$$\tau_j \mid k, \alpha \sim B\left(\frac{\alpha}{k}, 1\right), \qquad j = 1, \ldots, k. \qquad (27)$$
Prior (27) converges to the Indian buffet process prior (Teh et al., 2007) for k → ∞. As recently shown by Frühwirth-Schnatter (2022), prior (27) has a representation as a cumulative shrinkage process (CUSP) prior (Legramanti et al., 2020).
This specification leads to a Dirac spike-and-slab prior for the factor loadings, where the columns of the loading matrix are increasingly pulled toward 0 as the column index increases. In (28), a Gaussian slab distribution is assumed with a random global shrinkage parameter κ, although other slab distributions are possible, see e.g. Zhao et al. (2016) and Frühwirth-Schnatter et al. (2022).
The hyperparameters α and γ are instrumental in controlling prior sparsity. Choosing α = k and γ = 1 leads to a uniform distribution for $\tau_j$, with the smallest slab probability $\tau_{(1)} = \min_{j=1,\ldots,k} \tau_j$ also being uniform, while the largest slab probability $\tau_{(k)} = \max_{j=1,\ldots,k} \tau_j \sim B(k, 1)$, see Frühwirth-Schnatter (2022). Such a prior is likely to overfit the number of factors, regardless of all other assumptions. A prior with α < k and γ = 1 induces sparsity, since the largest slab probability $\tau_{(k)} \sim B(\alpha, 1)$, while the smallest slab probability $\tau_{(1)} \sim B(\alpha/k, 1)$. To control the small probabilities, which are important in identifying the true number of factors, α is assumed to be a random parameter and is learnt from the data under the prior $\alpha \sim G(a_\alpha, b_\alpha)$. The parameter γ controls the prior information in (26). Priors with γ > 1 and γ < 1, respectively, decrease and increase the difference between $\tau_{(1)}$ and $\tau_{(k)}$. Typically, γ is unknown and is estimated from the data using the prior $\gamma \sim G(a_\gamma, b_\gamma)$.
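A minimal sketch of sampling slab probabilities, assuming the exchangeable 2PB form $\tau_j \mid k, \alpha, \gamma \sim B(\gamma\alpha/k, \gamma)$, which reduces to the 1PB prior $B(\alpha/k, 1)$ for γ = 1 and gives uniform $\tau_j$ for α = k and γ = 1. The Monte Carlo check compares the largest slab probability under the 1PB prior with the $B(\alpha, 1)$ distribution quoted above, whose mean is α/(α + 1). The function name and the chosen values of k and α are illustrative.

```python
import numpy as np

def sample_slab_probs(k, alpha, gamma=1.0, n_draws=1, rng=None):
    """Draw n_draws vectors (tau_1, ..., tau_k) with tau_j ~ B(gamma * alpha / k, gamma),
    the assumed 2PB form; gamma = 1 gives the 1PB prior B(alpha / k, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.beta(gamma * alpha / k, gamma, size=(n_draws, k))

rng = np.random.default_rng(42)
tau = sample_slab_probs(k=14, alpha=3.0, n_draws=100_000, rng=rng)
# largest slab probability: approximately B(alpha, 1), with mean alpha / (alpha + 1) = 0.75
print(round(tau.max(axis=1).mean(), 2))
```

Under this form, the prior mean α/(α + k) of each $\tau_j$ does not depend on γ, so γ only changes the spread of the slab probabilities.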
MCMC estimation. For a given choice of hyperparameters, Markov chain Monte Carlo (MCMC) methods are applied to sample from the posterior distribution $p(\beta_k, \Sigma_k, \delta_k \mid y)$, given T multivariate observations $y = (y_1, \ldots, y_T)$, see e.g. Kaufmann and Schuhmacher (2019) among many others. In Frühwirth-Schnatter et al. (2022), such a sampler is developed for GLT factor models. To move between factor models of different factor dimension, Frühwirth-Schnatter et al. (2022) exploit Theorem 7 to add and delete spurious columns through a reversible jump MCMC (RJMCMC) sampler. For each posterior draw $\beta_k$, the active columns $\beta_r$ (i.e. all columns with at least 2 non-zero elements) and the corresponding sparsity matrix $\delta_r$ are determined. If $\delta_r$ satisfies the counting rule CR(r, 1), then $\beta_r$ is a signed permutation of Λ with the corresponding idiosyncratic covariance matrix $\Sigma_r = \Sigma_k + M^\Lambda_s (M^\Lambda_s)^\top$, where $M^\Lambda_s$ contains the spurious columns of $\beta_k$. These variance identified draws are kept for further inference, and the number of columns of $\beta_r$ is considered a posterior draw of the unknown factor dimension r. This algorithm is easily extended to EFA models without any constraints.
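The screening of posterior draws described above can be sketched as follows, assuming each draw is available as a pair of the m × k matrix $\beta_k$ and the diagonal of $\Sigma_k$. This is a simplified illustration of the post-processing step only, not the RJMCMC sampler of Frühwirth-Schnatter et al. (2022); the function names and the two toy draws are hypothetical.

```python
from itertools import combinations
import numpy as np

def screen_draw(beta_k, sigma2_k, tol=1e-12):
    """Return (r, diag(Sigma_r)) if the active columns satisfy CR(r, 1), else None."""
    nz = np.abs(beta_k) > tol
    counts = nz.sum(axis=0)
    delta_r = nz[:, counts >= 2]                 # sparsity matrix of the active columns
    M = beta_k[:, counts == 1]                   # spurious columns
    r = delta_r.shape[1]
    for q in range(1, r + 1):                    # 3579 rule, i.e. CR(r, 1)
        for cols in combinations(range(r), q):
            if np.count_nonzero(delta_r[:, list(cols)].any(axis=1)) < 2 * q + 1:
                return None
    return r, sigma2_k + (M**2).sum(axis=1)      # Sigma_r = Sigma_k + M M^T

def posterior_factor_draws(draws):
    """Keep only variance identified draws and collect their factor dimension r."""
    kept = [screen_draw(beta_k, sigma2_k) for beta_k, sigma2_k in draws]
    return [res[0] for res in kept if res is not None]

d1 = np.zeros((7, 3))
d1[0:3, 0], d1[3:6, 1], d1[6, 2] = 0.9, 0.8, 0.3   # r = 2 plus one spurious column
d2 = np.zeros((7, 3))
d2[0:3, 0], d2[3:5, 1] = 0.9, 0.8                  # second column has only 2 loadings
print(posterior_factor_draws([(d1, np.ones(7)), (d2, np.ones(7))]))  # [2]
```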

An illustrative simulation study
For illustration, we perform a simulation study and consider three different data scenarios with m = 30 and T = 150. In all three scenarios, $r_{true} = 5$ factors are assumed; however, the zero/non-zero pattern is quite different. The first setting is a dedicated factor model, where the first 6 variables load on factor 1, the next 6 variables load on factor 2, and so forth, and the final 6 variables load on factor 5. A dedicated factor model has a GLT structure by definition. The second scenario is a block factor model, where the first 15 variables load only on factors 1 and 2, while the remaining 15 variables load only on factors 3, 4 and 5, so that the covariance matrix has a block-diagonal structure. All loadings within a block are non-zero. The third scenario is a dense factor loading matrix without any zero loadings, and the corresponding GLT representation has a PLT structure. For all three scenarios, non-zero factor loadings are drawn as $\lambda_{ij} = (-1)^{b_{ij}}(1 + 0.1\,N(0, 1))$, where the exponent $b_{ij}$ is a binary variable with $\Pr(b_{ij} = 1) = 0.2$. In all three scenarios, $\Sigma_0 = I$. Under each of these three scenarios, 21 data sets are sampled from the Gaussian factor model (1).
Table 1: Sparse Bayesian factor analysis under GLT and unconstrained structures (EFA) under a 1PB prior (α ∼ G(6, 2)) and a 2PB prior (α ∼ G(6, 2), γ ∼ G(6, 6)). GLT and EFA-V use only the variance identified draws ($M_V$ is the percentage of variance identified draws), EFA uses all posterior draws. Med is the median and QR are the 5% and the 95% quantiles of the various statistics over the 21 simulated data sets.
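The dedicated scenario can be simulated along the following lines. This is a sketch under our own assumptions (standard normal factors and errors, as in a basic factor model with $\Sigma_0 = I$); the function names and seed are hypothetical, and the block and dense scenarios are not covered.

```python
import numpy as np

def dedicated_loadings(m=30, r=5, p_neg=0.2, rng=None):
    """Dedicated scenario: consecutive blocks of m // r variables load only on
    one factor; non-zero loadings are (-1)^b (1 + 0.1 N(0, 1)), Pr(b = 1) = p_neg."""
    rng = np.random.default_rng() if rng is None else rng
    per = m // r
    Lam = np.zeros((m, r))
    for j in range(r):
        sign = np.where(rng.random(per) < p_neg, -1.0, 1.0)
        Lam[j * per:(j + 1) * per, j] = sign * (1 + 0.1 * rng.standard_normal(per))
    return Lam

def simulate_data(Lam, T=150, rng=None):
    """Draw T observations y_t = Lam f_t + eps_t, f_t ~ N(0, I_r), eps_t ~ N(0, I_m)."""
    rng = np.random.default_rng() if rng is None else rng
    m, r = Lam.shape
    f = rng.standard_normal((T, r))
    eps = rng.standard_normal((T, m))            # Sigma_0 = I
    return f @ Lam.T + eps

rng = np.random.default_rng(2024)
Lam = dedicated_loadings(rng=rng)
y = simulate_data(Lam, rng=rng)
print(Lam.shape, y.shape)   # (30, 5) (150, 30)
```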
A sparse overfitting factor model is fitted to each simulated data set, with the maximum number of factors k = 14 equal to the upper bound ⌊(m − 1)/2⌋ for m = 30. Regarding the structure, we compare a model where the non-zero columns of $\beta_k$ are left unconstrained with a model where a GLT structure is imposed. Inference is based on the Bayesian approach described in Section 6.1 with two different shrinkage priors on the sparsity matrix $\delta_k$: the 1PB prior (27) with random hyperparameter α ∼ G(6, 2) and the 2PB prior (26) with random hyperparameters α ∼ G(6, 2) and γ ∼ G(6, 6). MCMC estimation is run for 3000 iterations after a burn-in of 2000 iterations, using the RJMCMC algorithm of Frühwirth-Schnatter et al. (2022).
For each of the 21 simulated data sets, we evaluate all 12 combinations of data scenarios, structural constraints (GLT versus unconstrained) and priors on the sparsity matrix (1PB versus 2PB) through Monte Carlo estimates of the following statistics. To assess the performance in estimating the true number $r_{true}$ of factors, we consider the mode $\hat r$ of the posterior distribution $p(r \mid y)$ and the magnitude of the posterior ordinate $p(r = r_{true} \mid y)$. To assess the accuracy in estimating the covariance matrix Ω of the data, we consider the mean squared error (MSE), defined by
$$\mathrm{MSE}_\Omega = \frac{\sum_{i}\sum_{\ell \le i} E\big((\Omega_{r,i\ell} - \Omega_{i\ell})^2 \mid y\big)}{m(m+1)/2},$$
which accounts both for posterior variance and bias of the estimated covariance matrix $\Omega_r = \beta_r\beta_r^\top + \Sigma_r$ in comparison to the true matrix. Table 1 reports, for all 12 combinations, the median, the 5% and the 95% quantiles of these statistics across all simulated data sets. For inference under GLT structures, posterior draws which are not variance identified have been removed. The fraction of variance identified draws is also reported in the table and is generally high. As is common in sparse Bayesian factor analysis with unstructured loading matrices, the posterior draws of the unconstrained model (EFA) are not screened for variance identification, and inference is based on all draws.
Some interesting conclusions can be drawn from Table 1. First of all, sparse Bayesian factor analysis under the GLT constraint successfully recovers the true number of factors in all three scenarios. For most of the simulated data sets, the posterior ordinate $p(r = r_{true} \mid y)$ is larger than 0.9. Sparse Bayesian factor analysis with unstructured loading matrices is also quite successful in recovering $r_{true}$, but with less confidence: both over- and underfitting can be observed, and the posterior ordinate $p(r = r_{true} \mid y)$ is much smaller than under a GLT structure. For both structures, the 2PB prior yields higher posterior ordinates than the 1PB prior.
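Given posterior draws $\Omega_r = \beta_r\beta_r^\top + \Sigma_r$, the MSE can be estimated by averaging squared errors over the draws and over the m(m + 1)/2 lower-triangular elements. The sketch assumes the draws are stacked in an array of shape (number of draws, m, m); the function name and toy draws are illustrative.

```python
import numpy as np

def mse_omega(omega_draws, omega_true):
    """Monte Carlo estimate of MSE_Omega: posterior mean squared error averaged
    over the m(m+1)/2 lower-triangular elements (diagonal included)."""
    omega_draws = np.asarray(omega_draws)        # shape (n_draws, m, m)
    m = omega_true.shape[0]
    rows, cols = np.tril_indices(m)
    sq_err = (omega_draws[:, rows, cols] - omega_true[rows, cols]) ** 2
    return sq_err.mean(axis=0).sum() / (m * (m + 1) / 2)

omega_true = np.eye(3)
draws = [omega_true + 0.1, omega_true - 0.1]   # every element off by +/- 0.1
print(round(mse_omega(draws, omega_true), 6))  # 0.01
```

Because the squared error is averaged over draws before summing, the estimate combines posterior variance and bias, as noted in the definition above.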
Recently, Hosszejni and Frühwirth-Schnatter (2022) proved that the counting rule CR(r, 1) can also be applied to verify variance identification for unconstrained loading matrices. As is evident from Table 1, however, the fraction of variance identified draws is much smaller than under GLT structures. Nevertheless, inference with respect to the number of factors can also be improved for an unconstrained EFA model by rejecting all draws that do not obey the counting rule CR(r, 1) (EFA-V in Table 1).
It should be emphasized that the ability of Bayesian factor analysis to recover the number of factors from an overfitting model is closely tied to choosing a suitable shrinkage prior on the sparsity matrix $\delta_k$. For illustration, we also consider a uniform prior for $\tau_j$ and report the corresponding statistics in Table 2. As expected from the considerations in Section 6.1, considerable overfitting is observed for all simulated data sets, regardless of the chosen structure.

Concluding remarks
We have given a full and comprehensive mathematical treatment of generalized lower triangular (GLT) structures, a new identification strategy that improves on the popular positive lower triangular (PLT) assumption for factor loading matrices. We have proven that GLT retains the good properties of PLT: it is unique and it resolves rotational invariance. At the same time, and unlike PLT, a GLT representation exists for any factor loading matrix; i.e. it is not a restrictive assumption. Furthermore, we have shown that verifying variance identification under GLT structures is simple and is based purely on the zero/non-zero pattern of the factor loading matrix. Additionally, we have embedded the GLT model class into exploratory factor analysis with unknown factor dimension and discussed how easily spurious factors and irrelevant variables are recognized in that setup. Finally, we demonstrated the power of the framework in a simulation study.

Table 2: Bayesian factor analysis under GLT and unconstrained structures (EFA) under a uniform prior on $\tau_j$. GLT and EFA-V use only the variance identified draws ($M_V$ is the percentage of variance identified draws), EFA uses all posterior draws. Med is the median and QR are the 5% and the 95% quantiles of the various statistics over the 21 simulated data sets.