1. Introduction
Asymmetric square matrices naturally arise in numerous scientific domains, including sociology (e.g., social mobility matrices), marketing (e.g., brand-switching data), and psychology (e.g., stimulus identification experiments) [1,2]. To facilitate the analysis of such data, Harshman [3] introduced the DEcomposition into DIrectional COMponents (DEDICOM) model, which approximates an asymmetric matrix $X \in \mathbb{R}^{n \times n}$ as
$$X = A R A^\top + E, \qquad (1)$$
where $A \in \mathbb{R}^{n \times r}$ (with $r \le n$) is typically assumed to be column-orthonormal and encodes object coordinates in a latent space, $R \in \mathbb{R}^{r \times r}$ represents the (possibly asymmetric) inter-dimensional relationships, and $E$ denotes the residual matrix. While DEDICOM offers interpretability in modeling asymmetric relationships, it lacks capabilities for effective visualization. To address this limitation, Chino [4] introduced the GIPSCAL (Generalized Inner Product SCALing) model, which not only handles asymmetric data but also provides a framework for graphical representation. However, the original GIPSCAL formulation may exhibit inadequate fit to empirical data. A generalized variant, introduced by Kiers and Takane [5], improves model flexibility via the decomposition
$$X = A\left(I_r + K\right)A^\top + E, \qquad (2)$$
where $A \in \mathbb{R}^{n \times r}$ is a weight matrix, $I_r$ is the $r \times r$ identity matrix, $K \in \mathbb{R}^{r \times r}$ is skew-symmetric, and $E$ denotes the error matrix. The symmetric part $AA^\top$ may be visualized using Classical Multidimensional Scaling (MDS) techniques [6,7], while the asymmetric component $AKA^\top$ is treated using Gower's method [7,8].
The model parameters are typically estimated by minimizing the least-squares objective, as outlined in Kiers and Takane [5]. Subsequent developments by Trendafilov and Gallo [9] and Trendafilov [10] recast the GIPSCAL framework into the following constrained optimization problem:
$$\min_{Q \in \mathcal{O}(n,r),\ D \in \mathcal{D}_+,\ K \in \mathcal{K}} \ \|X - Q(D + K)Q^\top\|_F^2. \qquad (3)$$
Here, $\|\cdot\|_F$ denotes the Frobenius norm, $\mathcal{O}(n,r) = \{Q \in \mathbb{R}^{n \times r} : Q^\top Q = I_r\}$ is the Stiefel manifold of orthonormal matrices, $\mathcal{D}_+$ is the set of $r \times r$ diagonal matrices with nonnegative entries, and $\mathcal{K}$ denotes the set of skew-symmetric $r \times r$ matrices.
This reformulation clarifies the connection between GIPSCAL and the DEDICOM model in (1), revealing that GIPSCAL constitutes a special case of DEDICOM in which the symmetric part of $R$ is constrained to lie in the nonnegative diagonal cone. Additionally, this formulation can be interpreted as a natural asymmetric extension of the INdividual Differences SCALing (INDSCAL) model [7,11]. The GIPSCAL methodology has further been generalized to handle three-way data arrays comprising $N$ asymmetric slices $X_1, \dots, X_N \in \mathbb{R}^{n \times n}$, analogous to three-way extensions of INDSCAL. In this setting, each slice is modeled as [9,10,12,13]
$$X_i = Q(D_i + K_i)Q^\top + E_i, \qquad i = 1, \dots, N, \qquad (4)$$
where $Q \in \mathcal{O}(n,r)$ is a common loading matrix shared across all slices, $D_i \in \mathcal{D}_+$ are slice-specific nonnegative diagonal matrices, and $K_i \in \mathcal{K}$ are slice-specific skew-symmetric matrices. Consequently, three-way GIPSCAL seeks to determine $(Q, D_1, \dots, D_N, K_1, \dots, K_N)$ by fitting the model to the $N$ asymmetric data matrices $X_i$ in a least-squares sense [9,10,12,13,14]:
$$\min_{Q \in \mathcal{O}(n,r),\ (D_1,\dots,D_N) \in \mathcal{D}_+^N,\ (K_1,\dots,K_N) \in \mathcal{K}^N} \ \sum_{i=1}^{N} \|X_i - Q(D_i + K_i)Q^\top\|_F^2, \qquad (5)$$
where $\mathcal{D}_+^N$ and $\mathcal{K}^N$ denote $N$ independent copies of the nonnegative diagonal cone and the skew-symmetric matrix space, respectively.
In this work, we revisit the numerical challenge of fitting the three-way GIPSCAL model (5), a problem characterized by nonlinear coupling among variables and the need for scalable, efficient algorithms. Despite its relevance in multivariate data analysis, the literature on this topic remains relatively sparse. Early contributions by Trendafilov [10,13] reformulated problem (5) as a constrained gradient dynamical system and proposed a continuous-time projected gradient flow algorithm. This method guarantees global convergence and has shown strong empirical performance across various applications in multivariate analysis [9,15,16,17,18]. Nevertheless, its scalability may be limited in large-scale settings due to computational inefficiencies. To accelerate the inherently slow convergence of alternating least squares (ALS) methods, Loisel and Takane [19] proposed a minimal polynomial extrapolation (MPE) scheme, leveraging vector-sequence fixed-point iteration. Empirical results indicate that this approach can substantially speed up convergence. However, in practical implementations, selecting an appropriate backtracking step size often relies on heuristic tuning rather than principled criteria. More recently, Trendafilov and Gallo [9] explored optimization-based reformulations of multivariate models on matrix manifolds and demonstrated the effectiveness of the Manopt toolbox [20] in addressing such problems through Riemannian optimization techniques. Motivated by these developments, this paper builds upon the fixed-point acceleration strategy of Loisel and Takane [19], extending its application to problem (5). We propose a new algorithmic framework that interprets approximate ALS iterations as a matrix-valued fixed-point iteration. Furthermore, we develop acceleration schemes based on the vector ε-algorithm (VEA), the topological ε-algorithm (TEA), and its simplified variant (STEA) [21,22], incorporating recent advances in numerical extrapolation and available algorithmic toolboxes. Extensive numerical experiments show that, compared with the original matrix sequences generated by the fixed-point iteration, matrix-sequence extrapolation significantly improves convergence. Furthermore, when compared with existing solvers for problem (5), such as the continuous-time projected gradient flow algorithm and the Riemannian solvers of the Manopt toolbox, the ε-algorithm-accelerated fixed-point iterations achieve a notable reduction in iteration time.
The remainder of the paper is organized as follows. In Section 2, we present the fixed-point iteration framework for solving the three-way GIPSCAL problem in (5). Section 3 introduces the core acceleration principles and describes the implementation of the VEA, TEA, and STEA within this context. Section 4 reports a comprehensive set of numerical experiments, benchmarking the proposed acceleration schemes against the continuous-time projected gradient flow method and several first- and second-order Riemannian optimization algorithms implemented in Manopt. Finally, Section 5 concludes the paper.
2. Fixed-Point Iteration Framework for Problem (5)
Building on the conditional minimization strategy introduced by Loisel and Takane [19], the three-way GIPSCAL problem in (5) can be reformulated as a fixed-point iteration scheme. Their original framework also incorporates minimal polynomial extrapolation (MPE) to accelerate convergence. For the sake of completeness, we revisit and extend this approach by developing an alternating least squares (ALS) framework, which naturally leads to a fixed-point iteration formulation.
Let $\mathcal{S}$ denote the space of symmetric $r \times r$ matrices. For a point $Z = (Q, D_1, \dots, D_N, K_1, \dots, K_N)$ in the product space $\mathcal{O}(n,r) \times \mathcal{D}_+^N \times \mathcal{K}^N$, we define the residual mapping
$$E_i(Z) = X_i - Q(D_i + K_i)Q^\top, \qquad i = 1, \dots, N,$$
which allows us to express the objective function of problem (5) as
$$f(Z) = \sum_{i=1}^{N} \|E_i(Z)\|_F^2.$$
Since the variables $D_i$ and $K_i$ are independent for each slice $i$, a straightforward algebraic derivation decomposes the Euclidean gradient of $f$ component-wise as follows:
$$\nabla_Q f = -2\sum_{i=1}^{N}\left[E_i(Z)\,Q(D_i + K_i)^\top + E_i(Z)^\top Q(D_i + K_i)\right], \quad \nabla_{D_i} f = -2\,\mathrm{Diag}\!\left(Q^\top E_i(Z)\,Q\right), \quad \nabla_{K_i} f = -2\,\mathrm{skew}\!\left(Q^\top E_i(Z)\,Q\right).$$
Here, $\mathrm{Diag}(A)$ retains the diagonal part of $A$, while $\mathrm{sym}(A) = (A + A^\top)/2$ and $\mathrm{skew}(A) = (A - A^\top)/2$ denote the symmetric and skew-symmetric parts of a matrix $A$, respectively.
Given a current iterate $Z^{(s)} = (Q^{(s)}, D_i^{(s)}, K_i^{(s)})$, the ALS-based iterative scheme for solving (5) is defined by the following update rules:
$$D_i^{(s+1)} = \arg\min_{D_i \in \mathcal{D}_+} f\big(Q^{(s)}, D_i, K_i^{(s)}\big), \qquad K_i^{(s+1)} = \arg\min_{K_i \in \mathcal{K}} f\big(Q^{(s)}, D_i^{(s+1)}, K_i\big), \qquad Q^{(s+1)} = \arg\min_{Q \in \mathcal{O}(n,r)} f\big(Q, D_i^{(s+1)}, K_i^{(s+1)}\big). \qquad (8)$$
The update for $D_i$ involves solving a convex constrained matrix optimization problem:
$$\min_{D_i \in \mathcal{D}_+} \ \|X_i - Q(D_i + K_i)Q^\top\|_F^2. \qquad (9)$$
By deriving the first-order optimality condition, we obtain the variational inequality
$$\left\langle \nabla_{D_i} f(D_i^\star),\ D_i - D_i^\star \right\rangle \ \ge \ 0 \qquad \text{for all } D_i \in \mathcal{D}_+. \qquad (10)$$
According to Theorem 3.1.1 of Hiriart-Urruty and Lemaréchal [23], this is equivalent to solving the implicit projection equation:
$$D_i^\star = P_{\mathcal{D}_+}\!\left(D_i^\star - \nabla_{D_i} f(D_i^\star)\right). \qquad (11)$$
Here, the projection $P_{\mathcal{D}_+}$ is defined element-wise via
$$P_{\mathcal{D}_+}(A) = \max\!\left(A \odot I_r,\ 0\right), \qquad (12)$$
where ⊙ denotes the Hadamard (element-wise) product. The closed-form solution is thus
$$D_i^{(s+1)} = \max\!\left(\mathrm{Diag}\!\left(\mathrm{sym}\!\left(Q^\top X_i Q\right)\right),\ 0\right). \qquad (13)$$
Since $\mathcal{K}$ is a linear subspace, the optimal $K_i$ satisfies the projected gradient condition
$$P_{\mathcal{K}}\!\left(\nabla_{K_i} f(K_i^\star)\right) = 0,$$
where $P_{\mathcal{K}}(A) = \mathrm{skew}(A)$. Hence, the solution is explicitly given by
$$K_i^{(s+1)} = \mathrm{skew}\!\left(Q^\top X_i Q\right). \qquad (14)$$
The update for $Q$ entails solving the following orthogonally constrained, nonconvex optimization problem:
$$\min_{Q \in \mathcal{O}(n,r)} \ \sum_{i=1}^{N} \|X_i - Q(D_i + K_i)Q^\top\|_F^2. \qquad (15)$$
The associated Lagrangian, incorporating the constraint $Q^\top Q = I_r$, is given by
$$\mathcal{L}(Q, \Lambda) = \sum_{i=1}^{N} \|X_i - Q(D_i + K_i)Q^\top\|_F^2 + \left\langle \Lambda,\ Q^\top Q - I_r \right\rangle,$$
where $\Lambda \in \mathcal{S}$ is a symmetric Lagrange multiplier matrix. The first-order optimality condition yields
$$\nabla_Q f + 2\,Q\Lambda = 0. \qquad (16)$$
Although solving (16) analytically is challenging, we follow the strategy of Loisel and Takane [19] and approximate the solution as follows: compute the thin singular value decomposition
$$\sum_{i=1}^{N}\left[X_i Q (D_i + K_i)^\top + X_i^\top Q (D_i + K_i)\right] = U \Sigma V^\top,$$
and set
$$Q^{(s+1)} = U V^\top. \qquad (18)$$
By integrating the subproblem updates (13), (14), and (18) within the ALS scheme from (8), we obtain the iterative sequence
$$Z^{(s)} = \left(Q^{(s)}, D_1^{(s)}, \dots, D_N^{(s)}, K_1^{(s)}, \dots, K_N^{(s)}\right), \qquad s = 0, 1, 2, \dots$$
This iterative process naturally defines a fixed-point iteration:
$$Z^{(s+1)} = \Phi\!\left(Z^{(s)}\right). \qquad (19)$$
Here, $\Phi$ denotes the nonlinear mapping associated with one complete ALS update.
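For concreteness, one complete ALS sweep, i.e., the map $\Phi$ in (19), can be sketched in MATLAB as follows. This is a minimal illustration, assuming that X is an n-by-n-by-N array stacking the slices $X_i$ and that D and K are r-by-r-by-N arrays; the function name phi_map and the variable layout are ours, not part of a published implementation.

function [Q, D, K] = phi_map(X, Q, D, K)
    % One ALS sweep: applies updates (13), (14), and (18).
    [n, r] = size(Q);  N = size(X, 3);
    M = zeros(n, r);                       % accumulates the matrix whose thin SVD yields (18)
    for i = 1:N
        C = Q' * X(:,:,i) * Q;             % r-by-r compression of the i-th slice
        D(:,:,i) = max(diag(diag((C + C')/2)), 0);   % update (13): nonnegative diagonal part
        K(:,:,i) = (C - C')/2;                       % update (14): skew-symmetric part
        A = D(:,:,i) + K(:,:,i);
        M = M + X(:,:,i) * Q * A' + X(:,:,i)' * Q * A;
    end
    [U, ~, V] = svd(M, 'econ');            % thin ("economy-size") SVD
    Q = U * V';                            % update (18)
end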
To further analyze the convergence behavior of problem (5) and iteration (19), we present the following theoretical results:
Theorem 1.
Problem (5) has a global optimal solution.
Proof. By the closed-form updates (13) and (14), the minimization may be restricted to the subset of the feasible set $\mathcal{O}(n,r) \times \mathcal{D}_+^N \times \mathcal{K}^N$ on which $\|D_i\|_F$ and $\|K_i\|_F$ are bounded by $\max_i \|X_i\|_F$; this subset is compact under the Frobenius norm topology. The objective function $f$ is continuous. Therefore, by the Weierstrass extreme value theorem, $f$ achieves its minimum value on this set. □
Theorem 2.
Problem (5) does not have a closed-form analytical solution. This conclusion holds even in special cases such as $N = 1$.
Proof. (i) The nonconvex feasible set induced by the orthogonality constraint, together with the quartic dependence of the objective function on $Q$, results in an indefinite Hessian matrix at the critical points.
(ii) When $D_i$ and $K_i$ are fixed, the $Q$-subproblem degenerates into a generalized orthogonal Procrustes problem,
$$\min_{Q \in \mathcal{O}(n,r)} \ \sum_{i=1}^{N} \|X_i - Q(D_i + K_i)Q^\top\|_F^2,$$
where the asymmetry of $D_i + K_i$ (due to $K_i \neq 0$) leaves this subproblem without an analytical solution (unlike the case of a symmetric target, which can be solved by eigenvalue decomposition).
(iii) The slices are coupled through the summation in the objective function.
Together, (i)–(iii) exclude the possibility of a closed-form solution, so numerical iterative methods must be used to approximate it. □
Theorem 3.
The iterative sequence $(Z^{(s)})$, where $Z^{(s)} = (Q^{(s)}, D_1^{(s)}, \dots, D_N^{(s)}, K_1^{(s)}, \dots, K_N^{(s)})$, generated by alternating least squares for problem (5), has an objective function value sequence $(f(Z^{(s)}))$ that converges to a non-negative limit $L$.
Proof. The objective function is a sum of squared Frobenius norms, so its value is always non-negative; thus, the sequence $(f(Z^{(s)}))$ is bounded below by 0. In the alternating update process:
Fixing $(Q^{(s)}, K_i^{(s)})$ and updating $D_i$, the global optimality of the convex subproblem solved by (13) ensures that $f(Q^{(s)}, D_i^{(s+1)}, K_i^{(s)}) \le f(Z^{(s)})$.
Fixing $(Q^{(s)}, D_i^{(s+1)})$ and updating $K_i$, the closed-form solution in (14) guarantees that $f(Q^{(s)}, D_i^{(s+1)}, K_i^{(s+1)}) \le f(Q^{(s)}, D_i^{(s+1)}, K_i^{(s)})$.
Fixing $(D_i^{(s+1)}, K_i^{(s+1)})$ and updating $Q$, the orthogonal Procrustes-type projection in (18) ensures that $f(Z^{(s+1)}) \le f(Q^{(s)}, D_i^{(s+1)}, K_i^{(s+1)})$.
Thus, $f(Z^{(s+1)}) \le f(Z^{(s)})$ for all $s$. By the monotone convergence theorem for bounded monotone sequences, $(f(Z^{(s)}))$ converges to a limit $L \ge 0$. □
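The monotonicity established in Theorem 3 is easy to verify numerically. The following MATLAB fragment, using the phi_map sketch above with X, Q, D, K initialized as in Section 4, checks that the objective values are nonincreasing:

fval = @(X, Q, D, K) sum(arrayfun(@(i) ...
    norm(X(:,:,i) - Q*(D(:,:,i) + K(:,:,i))*Q', 'fro')^2, 1:size(X,3)));
fs = zeros(1, 100);
for s = 1:100
    [Q, D, K] = phi_map(X, Q, D, K);
    fs(s) = fval(X, Q, D, K);
end
assert(all(diff(fs) <= 1e-12))   % nonincreasing, up to rounding errors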
3. ε-Algorithms Acceleration for the Fixed-Point Problem (19)
In numerical analysis and applied mathematics, sequences arise naturally across a broad range of computational problems. When a sequence converges slowly, acceleration techniques are often employed to improve the convergence rate. A common strategy involves transforming the original sequence into another that converges more rapidly to the same limit, assuming appropriate regularity conditions. One of the most influential transformations in this context is the Shanks transformation [24], originally derived by Schmidt [25] for iterative solutions of linear systems. It was later implemented algorithmically through the scalar ε-algorithm introduced by Wynn [26]. In a subsequent extension, Wynn [27] generalized the scalar ε-algorithm to handle vector-valued sequences. However, the algebraic structure underlying the vector version does not follow directly from the scalar case. To address this gap, Brezinski [28] proposed two distinct generalizations of the Shanks transformation and its associated algorithms to sequences in vector spaces. This led to the formulation of the topological Shanks transformation and the development of two corresponding topological ε-algorithms (TEA1 and TEA2). These algorithms operate using elements from both a vector space $E$ and its dual space $E^*$, enabling a rigorous extension to infinite-dimensional settings. Recognizing the computational overhead associated with dual-space operations, Brezinski [29] introduced the simplified topological ε-algorithms (STEA1 and STEA2). These variants avoid direct manipulation of dual-space elements by substituting scalar ε-algorithm outputs, thereby reducing memory usage and enhancing numerical stability. In parallel, the Shanks transformation has inspired a family of vector extrapolation methods, including minimal polynomial extrapolation (MPE), modified minimal polynomial extrapolation (MMPE), and reduced-rank extrapolation (RRE). These techniques, collectively referred to as vector extrapolation methods, share the advantages of a simple iterative structure and the avoidance of explicit matrix decompositions. Owing to their general applicability and efficiency, ε-type and vector extrapolation algorithms have found widespread use in the numerical solution of linear and nonlinear systems, eigenvalue computations, Padé-type approximations, matrix functions, matrix equations, and Krylov subspace methods such as Lanczos iterations [28,30,31,32,33].
In this section, we introduce the principles and implementation details of three algorithms: VEA, TEA, and STEA. We then explain how each of these methods can be systematically integrated into the fixed-point iteration scheme of the previous section to accelerate convergence when solving the three-way GIPSCAL problem.
3.1. Scalar Shanks Transformation and Scalar ε-Algorithm
Let $(S_n)$ be a sequence of scalars in the field $\mathbb{K}$, where $\mathbb{K}$ is either $\mathbb{R}$ or $\mathbb{C}$. If $\lim_{n \to \infty} S_n = S$, then, under certain conditions, the sequence $(S_n)$ can be transformed into a new sequence $(T_n)$ that converges to the same limit more efficiently, as described by
$$\lim_{n \to \infty} \frac{T_n - S}{S_n - S} = 0. \qquad (20)$$
Shanks [24] introduced a transformation technique in his 1955 study, which can be used to determine the anticipated limiting value of a sequence from a finite number of its terms. The Shanks transformation assumes that the sequence satisfies the following relation:
$$\sum_{j=0}^{k} a_j \left(S_{n+j} - S\right) = 0 \qquad \text{for all } n. \qquad (21)$$
Here, each coefficient $a_j$ is an arbitrary constant independent of $n$, and the coefficients satisfy the condition $a_0 a_k \neq 0$ with $\sum_{j=0}^{k} a_j \neq 0$. Assuming that Equation (21) holds for all $n$, we can expand and rearrange it to derive the difference form
$$\sum_{j=0}^{k} a_j \,\Delta S_{n+j} = 0, \qquad (22)$$
where the forward difference operator $\Delta$ is defined as $\Delta S_n = S_{n+1} - S_n$. To determine the $k+1$ coefficients $a_0, \dots, a_k$, we further require that $\sum_{j=0}^{k} a_j = 1$, leading to a linear system consisting of this normalization and $k$ scalar equations derived from (22):
$$\sum_{j=0}^{k} a_j = 1, \qquad \sum_{j=0}^{k} a_j \,\Delta S_{n+i+j} = 0, \quad i = 0, \dots, k-1. \qquad (23)$$
The terms $S_{n+j}$ can then be linearly combined with the coefficients $a_j$ to obtain $S$:
$$S = \sum_{j=0}^{k} a_j S_{n+j}. \qquad (24)$$
Even if the sequence $(S_n)$ does not satisfy the relation in Equation (21), the coefficients $a_j$ can still be determined by solving the linear system in Equation (23) using a similar approach. Consequently, an approximate limit of the sequence can be obtained via the linear combination outlined in Equation (24). It is important to note that these coefficients and the approximate solution now depend on the current starting index $n$ and the depth $k$ of the acceleration window. These dependencies are denoted as $a_j^{(n,k)}$ and $e_k(S_n)$, respectively. Therefore, we have
$$e_k(S_n) = \sum_{j=0}^{k} a_j^{(n,k)} S_{n+j}, \qquad (25)$$
where the $a_j^{(n,k)}$ satisfy the linear system (23) with the corresponding starting index $n$. The original sequence $(S_n)$ is transformed into a new sequence $(e_k(S_n))$, and this transformation, $e_k : (S_n) \mapsto (e_k(S_n))$, is known as the Shanks transformation. By applying Cramer's rule for solving systems of linear equations, $e_k(S_n)$ can be written as the ratio of determinants, as shown below:
$$e_k(S_n) = \frac{\begin{vmatrix} S_n & S_{n+1} & \cdots & S_{n+k} \\ \Delta S_n & \Delta S_{n+1} & \cdots & \Delta S_{n+k} \\ \vdots & \vdots & & \vdots \\ \Delta S_{n+k-1} & \Delta S_{n+k} & \cdots & \Delta S_{n+2k-1} \end{vmatrix}}{\begin{vmatrix} 1 & 1 & \cdots & 1 \\ \Delta S_n & \Delta S_{n+1} & \cdots & \Delta S_{n+k} \\ \vdots & \vdots & & \vdots \\ \Delta S_{n+k-1} & \Delta S_{n+k} & \cdots & \Delta S_{n+2k-1} \end{vmatrix}}. \qquad (26)$$
Since directly computing the determinants in Equation (26) is relatively complex, Wynn [26] proposed the scalar ε-algorithm (SEA), which implements the Shanks transformation using a straightforward recursive procedure. The computational rules of the SEA are given by
$$\varepsilon_{-1}^{(n)} = 0, \qquad \varepsilon_{0}^{(n)} = S_n, \qquad \varepsilon_{k+1}^{(n)} = \varepsilon_{k-1}^{(n+1)} + \left(\varepsilon_{k}^{(n+1)} - \varepsilon_{k}^{(n)}\right)^{-1}, \qquad k, n = 0, 1, \dots \qquad (27)$$
These elements are typically organized into a two-dimensional array, as shown in Figure 1, which is referred to as the ε-table. Rule (27) establishes a connection between the four vertices of a diamond in the ε-table, where the column index $k$ remains constant along each column and the row index $n$ remains constant along the descending diagonals. Furthermore, by applying Sylvester's and Schweins' determinant identities, Wynn [26] demonstrated the following relationship between the Shanks transformation and the ε-algorithm:
$$\varepsilon_{2k}^{(n)} = e_k(S_n), \qquad \varepsilon_{2k+1}^{(n)} = \frac{1}{e_k(\Delta S_n)}. \qquad (28)$$
3.2. Topological Shanks Transformation and Topological ε-Algorithm
Assume now that $(S_n)$ is a sequence of vectors in a vector space $E$. The Samelson inverse of a nonzero vector $y \in E$ is defined by the following formula:
$$y^{-1} = \frac{y}{\langle y, y \rangle}, \qquad (29)$$
where $\langle \cdot, \cdot \rangle$ denotes the standard inner product on the vector space $E$. Wynn [27] proposed the vector ε-algorithm (VEA) by "vectorizing" the scalar ε-algorithm: in the recursion (27), the scalar reciprocal is replaced by the Samelson inverse (29).
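In MATLAB, the Samelson inverse (29) is a one-liner, and it is the only change needed to turn the scalar recursion (27) into the VEA (the ascending-diagonal SEA routine sketched in Section 3.4 becomes the VEA by this substitution):

samelson = @(y) y / (y(:)' * y(:));    % y^{-1} = y / <y, y>, for real vectors or matrices
% In the rhombus rule (27), replace 1/(W - t) by samelson(W - t)
% to accelerate vector (or vectorized matrix) sequences.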
However, the Shanks transformation itself cannot be directly extended to vector spaces. To address this, Brezinski et al. [34] introduced the algebraic dual space $E^*$ of the vector space $E$, along with an auxiliary linear functional $y \in E^*$, and used the duality product $\langle \cdot, \cdot \rangle : E^* \times E \to \mathbb{K}$ to derive two types of topological Shanks transformations. The first topological Shanks transformation is defined by the formula
$$\hat{e}_k(S_n) = \sum_{j=0}^{k} a_j^{(n,k)} S_{n+j}, \qquad (30)$$
where the coefficients $a_j^{(n,k)}$ satisfy the following system of linear equations:
$$\sum_{j=0}^{k} a_j^{(n,k)} = 1, \qquad \sum_{j=0}^{k} a_j^{(n,k)} \left\langle y,\ \Delta S_{n+i+j} \right\rangle = 0, \quad i = 0, \dots, k-1. \qquad (31)$$
Similar to the scalar Shanks transformation, $\hat{e}_k(S_n)$ can be expressed as a ratio of determinants as follows:
$$\hat{e}_k(S_n) = \frac{\begin{vmatrix} S_n & \cdots & S_{n+k} \\ \langle y, \Delta S_n \rangle & \cdots & \langle y, \Delta S_{n+k} \rangle \\ \vdots & & \vdots \\ \langle y, \Delta S_{n+k-1} \rangle & \cdots & \langle y, \Delta S_{n+2k-1} \rangle \end{vmatrix}}{\begin{vmatrix} 1 & \cdots & 1 \\ \langle y, \Delta S_n \rangle & \cdots & \langle y, \Delta S_{n+k} \rangle \\ \vdots & & \vdots \\ \langle y, \Delta S_{n+k-1} \rangle & \cdots & \langle y, \Delta S_{n+2k-1} \rangle \end{vmatrix}}. \qquad (32)$$
Here, the determinant in the numerator is understood as a formal expansion with respect to its first row. Furthermore, the second topological Shanks transformation, $\tilde{e}_k(S_n)$, can be obtained by replacing $S_{n+j}$ with $S_{n+k+j}$ in (30).
We then introduce an ordered vector pair $(u, v) \in E \times E^*$ and define the inverse of the ordered vector pair as follows:
$$u^{-1} = \frac{v}{\langle v, u \rangle} \in E^*, \qquad v^{-1} = \frac{u}{\langle v, u \rangle} \in E. \qquad (33)$$
Based on this definition, Brezinski and Redivo-Zaglia [21] derived two distinct forms of topological ε-algorithms, denoted as TEA1 and TEA2. The recursive rule of the first topological ε-algorithm (TEA1) for computing $\hat{e}_k(S_n)$ is given by
$$\varepsilon_{-1}^{(n)} = 0, \qquad \varepsilon_{0}^{(n)} = S_n, \qquad \varepsilon_{k+1}^{(n)} = \varepsilon_{k-1}^{(n+1)} + \left(\varepsilon_{k}^{(n+1)} - \varepsilon_{k}^{(n)}\right)^{-1}, \qquad (34)$$
with the inverses taken in the sense of (33). Due to the different inversion rules for elements in $E$ and in its algebraic dual space $E^*$, as stated in Equation (33), and the dependence on the ordered vector pair, the calculation rules for the odd-indexed and even-indexed sequences in TEA1 differ from those in SEA and VEA. Specifically, for the odd-indexed sequence in TEA1, corresponding to the calculation rule (27) in SEA, we have
$$\varepsilon_{2k+1}^{(n)} = \varepsilon_{2k-1}^{(n+1)} + \frac{y}{\left\langle y,\ \varepsilon_{2k}^{(n+1)} - \varepsilon_{2k}^{(n)} \right\rangle}, \qquad (35)$$
where $\varepsilon_{2k}^{(n)} \in E$ and $\varepsilon_{2k+1}^{(n)} \in E^*$, and the inversion corresponds to the ordered vector pair $\left(\varepsilon_{2k}^{(n+1)} - \varepsilon_{2k}^{(n)},\ y\right)$. Similarly, corresponding to (27), for the even-indexed sequence in TEA1 we have
$$\varepsilon_{2k+2}^{(n)} = \varepsilon_{2k}^{(n+1)} + \frac{\varepsilon_{2k}^{(n+1)} - \varepsilon_{2k}^{(n)}}{\left\langle \varepsilon_{2k+1}^{(n+1)} - \varepsilon_{2k+1}^{(n)},\ \varepsilon_{2k}^{(n+1)} - \varepsilon_{2k}^{(n)} \right\rangle}, \qquad (36)$$
and the inversion corresponds to the ordered vector pair $\left(\varepsilon_{2k}^{(n+1)} - \varepsilon_{2k}^{(n)},\ \varepsilon_{2k+1}^{(n+1)} - \varepsilon_{2k+1}^{(n)}\right)$.
The recursive rule for computing $\tilde{e}_k(S_n)$ using the second topological ε-algorithm (TEA2) is given by
$$\varepsilon_{-1}^{(n)} = 0, \qquad \varepsilon_{0}^{(n)} = S_n, \qquad \varepsilon_{k+1}^{(n)} = \varepsilon_{k-1}^{(n+1)} + \left(\varepsilon_{k}^{(n+1)} - \varepsilon_{k}^{(n)}\right)^{-1},$$
with the inverses again taken in the sense of (33). The computation of the odd-indexed sequence in TEA2 follows the same rule (35) as in TEA1. For the even-indexed sequence, corresponding to (27), we have
$$\varepsilon_{2k+2}^{(n)} = \varepsilon_{2k}^{(n+1)} + \frac{\varepsilon_{2k}^{(n+2)} - \varepsilon_{2k}^{(n+1)}}{\left\langle \varepsilon_{2k+1}^{(n+1)} - \varepsilon_{2k+1}^{(n)},\ \varepsilon_{2k}^{(n+2)} - \varepsilon_{2k}^{(n+1)} \right\rangle},$$
and, unlike TEA1, the inversion in TEA2 for $\left(\varepsilon_{2k+1}^{(n+1)} - \varepsilon_{2k+1}^{(n)}\right)^{-1}$ corresponds to the ordered vector pair $\left(\varepsilon_{2k}^{(n+2)} - \varepsilon_{2k}^{(n+1)},\ \varepsilon_{2k+1}^{(n+1)} - \varepsilon_{2k+1}^{(n)}\right)$. Note the connection between the topological Shanks transformations and the topological ε-algorithms. For the first topological Shanks transformation $\hat{e}_k$ and TEA1, the following holds:
$$\varepsilon_{2k}^{(n)} = \hat{e}_k(S_n), \qquad \varepsilon_{2k+1}^{(n)} = \frac{y}{\left\langle y,\ \hat{e}_k(\Delta S_n) \right\rangle};$$
for the second topological Shanks transformation $\tilde{e}_k$ and TEA2, the following holds:
$$\varepsilon_{2k}^{(n)} = \tilde{e}_k(S_n), \qquad \varepsilon_{2k+1}^{(n)} = \frac{y}{\left\langle y,\ \tilde{e}_k(\Delta S_n) \right\rangle}.$$
The computational rules for the odd-indexed and even-indexed sequences in TEA1 and TEA2 are summarized in Figure 2. For odd indices, the computational rules of the two topological ε-algorithms are identical and consistent with those of the scalar ε-algorithm (SEA): the computation of $\varepsilon_{2k+1}^{(n)}$ requires only the three elements located at the vertices of the diamond structure, namely $\varepsilon_{2k-1}^{(n+1)}$, $\varepsilon_{2k}^{(n)}$, and $\varepsilon_{2k}^{(n+1)}$. However, for even-indexed sequences, both TEA1 and TEA2 require additional elements beyond these three. Moreover, during the recursion process, both topological ε-algorithms must simultaneously store information about the even-indexed sequence in $E$ and the odd-indexed sequence in the algebraic dual space $E^*$, thereby significantly increasing the storage burden in large-scale computations. Furthermore, both TEA1 and TEA2 require duality product calculations during the recursion process, which further increases the computational complexity.
3.3. Simplified Topological ε-Algorithm
To streamline the computational rules of the two topological ε-algorithms, Brezinski and Redivo-Zaglia [21] proposed simplified versions of these algorithms (STEA1 and STEA2) for TEA1 and TEA2. These algorithms combine the odd and even recursive rules of TEA into a single unified rule, thereby requiring the storage of only the even-indexed column vectors. This streamlined computational process not only reduces storage requirements and algorithmic complexity but also enhances overall efficiency, leading to improved performance in large-scale data processing.
For the scalar sequence $s_n = \langle y, S_n \rangle$, based on the relationship in Equation (28) between the scalar Shanks transformation and the SEA, the following holds:
$$\hat{\varepsilon}_{2k}^{(n)} = e_k(s_n), \qquad \hat{\varepsilon}_{2k+1}^{(n)} = \frac{1}{e_k(\Delta s_n)}, \qquad (37)$$
where $\hat{\varepsilon}_{j}^{(n)}$ denotes the scalar ε-table generated from $(s_n)$. From Equations (35) and (37), the following holds:
$$\varepsilon_{2k+1}^{(n)} = \hat{\varepsilon}_{2k+1}^{(n)}\, y.$$
Therefore, the recursive rule for the even-indexed sequence in TEA1 can be rewritten as
$$\varepsilon_{2k+2}^{(n)} = \varepsilon_{2k}^{(n+1)} + \frac{\varepsilon_{2k}^{(n+1)} - \varepsilon_{2k}^{(n)}}{\left(\hat{\varepsilon}_{2k+1}^{(n+1)} - \hat{\varepsilon}_{2k+1}^{(n)}\right)\left\langle y,\ \varepsilon_{2k}^{(n+1)} - \varepsilon_{2k}^{(n)} \right\rangle}.$$
Then, by combining the recursive rules in Equations (27) and (34), the following four equivalent forms for the even indices of the STEA1 algorithm can be derived:
$$\varepsilon_{2k+2}^{(n)} = \varepsilon_{2k}^{(n+1)} + \omega_k^{(n)}\left(\varepsilon_{2k}^{(n+1)} - \varepsilon_{2k}^{(n)}\right), \qquad \text{e.g.,} \quad \omega_k^{(n)} = \frac{\hat{\varepsilon}_{2k+2}^{(n)} - \hat{\varepsilon}_{2k}^{(n+1)}}{\hat{\varepsilon}_{2k}^{(n+1)} - \hat{\varepsilon}_{2k}^{(n)}},$$
where the three remaining expressions for the scalar factor $\omega_k^{(n)}$ follow from the rhombus rule (27) applied to the scalar table. Similarly, four mutually equivalent recursive formulas can be derived for the even-indexed sequences in STEA2, with the shifted difference $\varepsilon_{2k}^{(n+2)} - \varepsilon_{2k}^{(n+1)}$ in place of $\varepsilon_{2k}^{(n+1)} - \varepsilon_{2k}^{(n)}$.
From the above derivation, it is evident that the generation of $\varepsilon_{2k+2}^{(n)}$ in both STEA1 and STEA2 depends solely on the even-indexed sequence, with the odd-indexed sequence serving only as an auxiliary sequence. As a result, during the recursive processes of both simplified topological ε-algorithms, only the information of the even-indexed sequences needs to be stored, eliminating the need to store the odd-indexed sequences. Additionally, in STEA1 and STEA2, the number of pairwise product operations is reduced during recursion. The linear functional $y$ in $E^*$ is involved solely in the duality product operation with the initial sequence $(S_n)$ in $E$, generating a scalar sequence $s_n = \langle y, S_n \rangle$. Furthermore, the recursion for this scalar sequence can be performed using the SEA, as shown in Equation (27).
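In other words, the only dual-space computation left in STEA is forming the scalar sequence $s_n = \langle y, S_n \rangle$ that feeds the SEA. For a matrix-valued iterate with the self-dual pairing of Remark 2, this costs one weighted sum per term, e.g. in MATLAB (y taken as the all-ones matrix, an assumption consistent with Remark 2; Sseq is a cell array holding the sequence):

y = ones(size(Sseq{1}));
dual = @(Y, S) sum(sum(Y .* S));            % <Y, S> = trace(Y' * S)
s = cellfun(@(S) dual(y, S), Sseq);         % scalar sequence passed to the SEA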
3.4. Implementation of the ε-Algorithms
The core principle of the ε-algorithms is founded on the Shanks transformation, which exploits the linear difference structure of the sequence by forming specific linear combinations that eliminate the dominant error terms during convergence. The algorithms evaluate these linear combinations through explicit recursive rules, thereby accelerating the convergence of the sequence.
The implementation of the ε-algorithms (SEA, VEA, TEA, and STEA) and the construction of the associated ε-table are most directly achieved by storing all elements corresponding to each pair of indices $k$ and $n$. This process begins with a prescribed number of terms in the first two columns and proceeds recursively, computing subsequent entries column by column. As each successive column contains one fewer element than the previous one, the resulting structure forms the lower triangular part of the ε-table. However, this approach requires retaining all elements within the triangular region, which can lead to considerable memory overhead, especially in the case of vector- or matrix-valued sequences.
To address this limitation, a more memory-efficient strategy computes the terms of the original sequence incrementally and constructs the ε-table along ascending diagonals [21,22]. Specifically, after generating an initial triangular portion of the table and retaining only its last ascending diagonal, a new element of the original sequence $(S_n)$ is introduced, and the next ascending diagonal is computed iteratively. This approach requires storing only a single ascending diagonal and three auxiliary temporary variables, thereby significantly reducing the storage demands.
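The following MATLAB function makes this diagonal-by-diagonal scheme concrete for the SEA. It is an illustrative implementation of rule (27) (the function and variable names are ours): a single ascending diagonal e and two temporaries are stored, and after feeding 2k+1 terms, W holds the accelerated even-column value $\varepsilon_{2k}$ computed from the start of the sequence.

function [W, e] = sea_diagonal(S, k)
    % S: vector with at least 2k+1 terms of the sequence; e: current ascending diagonal.
    m = 2*k + 1;
    e = zeros(1, m);  e(1) = S(1);
    for i = 2:m
        W = S(i);                       % eps_0 on the new ascending diagonal
        v = 0;                          % plays the role of eps_{-1} = 0
        for j = 1:i-1
            t = e(j);                   % entry on the previous diagonal
            u = v + 1/(W - t);          % rhombus rule (27)
            v = t;  e(j) = W;  W = u;
        end
        e(i) = W;
    end
end

Replacing the reciprocal 1/(W - t) by the Samelson inverse of Section 3.2 turns the same loop into the VEA.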
Implementations of the SEA, VEA, TEA, and STEA algorithms were previously provided in the MATLAB toolbox EPSfun [22]. In this paper, we present a more intuitive and transparent implementation of these algorithms. Specifically, Algorithms 1 and 2 illustrate the procedures for SEA and VEA, respectively. Although both algorithms share the same fundamental computational structure, they differ in how the inverse operation is treated; in particular, the inverse operation in VEA is the Samelson inverse defined in Equation (29). For a fixed acceleration window of width $k$, both SEA and VEA compute new elements incrementally along the ascending diagonals until the column of index $2k$ is completed. In Algorithm 2, the inner product on $E$ is the standard Euclidean inner product, namely $\langle u, v \rangle = u^\top v$ for all $u, v \in E$.
Algorithm 1 Scalar ε-algorithm (SEA)
Require: 2k+1 elements of the scalar sequence $(S_n)$: $S_1, \dots, S_{2k+1}$, where $k$ is the acceleration window width.
Ensure: Return the values $W$ and $e$.
1: $e_1 \leftarrow S_1$
2: for $i = 2, \dots, 2k+1$ do
3:  $W \leftarrow S_i$; $v \leftarrow 0$
4:  for $j = 1, \dots, i-1$ do
5:   $t \leftarrow e_j$; $u \leftarrow v + 1/(W - t)$
6:   $v \leftarrow t$; $e_j \leftarrow W$; $W \leftarrow u$
7:  end for
8:  $e_i \leftarrow W$
9: end for
Algorithm 2 Vector ε-algorithm (VEA)
Require: 2k+1 elements of the vector sequence $(S_n)$: $S_1, \dots, S_{2k+1}$, where $k$ is the acceleration window width.
Ensure: Return the values $W$ and $e$.
1: $e_1 \leftarrow S_1$
2: for $i = 2, \dots, 2k+1$ do
3:  $W \leftarrow S_i$; $v \leftarrow 0$ (a zero vector of the same size as $S_i$)
4:  for $j = 1, \dots, i-1$ do
5:   $t \leftarrow e_j$; $u \leftarrow v + (W - t)/\langle W - t,\, W - t\rangle$
6:   $v \leftarrow t$; $e_j \leftarrow W$; $W \leftarrow u$
7:  end for
8:  $e_i \leftarrow W$
9: end for
Algorithm 3 presents the implementation of the TEA. It is important to note that TEA applies distinct computational rules to the subsequences with odd and even indices. Specifically, the computational rules for odd-indexed terms are identical to those used in SEA and VEA, whereas the computation of even-indexed terms requires the introduction of additional elements. Since the construction of the ε-table proceeds in a top-down manner, it is sufficient to store only the elements along a single ascending diagonal. In TEA2, the additional elements required for computing even-indexed terms correspond to the newly introduced elements from the initial sequence, and therefore, no extra storage is required. In contrast, in TEA1 the additional elements required for computing the even-indexed terms are not located on the current ascending diagonal; TEA1 must therefore additionally store the even-indexed elements from the previous ascending diagonal.
Algorithm 3 Topological ε-algorithm (TEA)
Require: 2k+1 elements of the sequence $(S_n)$: $S_1, \dots, S_{2k+1}$, where $k$ is the acceleration window width.
Ensure: Return the values $W$ and $e$.
1: Select the appropriate duality pairing between the linear functional $y \in E^*$ and the elements of $E$, where $E^*$ denotes the algebraic dual space of $E$.
2: $e_1 \leftarrow S_1$
3: for $i = 2, \dots, 2k+1$ do
4:  $W \leftarrow S_i$; $v \leftarrow 0$; counter $\leftarrow 1$
5:  for $j = 1, \dots, i-1$ do
6:   $t \leftarrow e_j$
7:   if mod(counter, 2) == 1 then
8:    $u \leftarrow v + y/\langle y,\, W - t\rangle$ (odd rule (35))
9:   else
10:   compute $u$ by the even rule of TEA1 or TEA2, taking the extra operands from the previous ascending diagonal (TEA1) or from the current one (TEA2)
11:  end if
12:  $v \leftarrow t$; $e_j \leftarrow W$; $W \leftarrow u$; counter $\leftarrow$ counter + 1
13: end for
14: $e_i \leftarrow W$
15: end for
Algorithm 4 presents the implementation of the STEA method. Notably, STEA consists of two components: a scalar part and a vector part. The scalar part retains the diamond-shaped structure, while the computational rules of the vector part are adapted to a triangular form. In the scalar part, each new scalar value is obtained by computing the duality product between the newly introduced element of the sequence $(S_n)$ and the functional $y$, while the entire scalar ascending diagonal is preserved during computation. In contrast, the vector part retains only the even-indexed elements along the ascending diagonal. As a result, the algorithm requires storing only $k$ vectors. The distinction between STEA1 and STEA2 is analogous to that between TEA1 and TEA2: STEA1 requires storing additional even-indexed elements from the previous ascending diagonal, while STEA2 avoids this, making it generally more memory-efficient. In the implementation provided in the literature [22], the scalar component of STEA is computed first using SEA, and the resulting scalar values are then incorporated into the STEA recursion. In this paper, we integrate these two procedures into a unified approach. Algorithm 4 provides the detailed implementation of STEA2-3, while the other three equivalent variants can be obtained by modifying the index variable $j$ within the algorithm.
Algorithm 4 The simplified topological ε-algorithms (STEA)
Require: 2k+1 elements of the sequence $(S_n)$: $S_1, \dots, S_{2k+1}$, where $k$ is the acceleration window width.
Ensure: Return the values $W$ and $e$.
1: Select the appropriate duality pairing between the linear functional $y \in E^*$ and the elements of $E$, where $E^*$ denotes the algebraic dual space of $E$, and form the scalar terms $s_i = \langle y, S_i \rangle$ as they arrive.
2: Initialize the scalar diagonal with $s_1$ and the vector storage with $e_1 \leftarrow S_1$.
3: for $i = 2, \dots, 2k+1$ do
4:  if mod(i, 2) == 0 then
5:   shift the stored even-indexed vector entries forward (this deletes the first element of the table and shifts the subsequent elements forward)
6:  end if
7:  advance the scalar ε-table by one ascending diagonal with the new term $s_i$, using the SEA rule (27)
8:  for each even-indexed vector entry of the new diagonal do
9:   update it from the entry two columns back, the corresponding vector difference, and the scalar factor of the STEA2-3 form taken from the scalar ε-table
10: end for
11: end for
Remark 1.
The ε-acceleration algorithms (SEA, VEA, TEA, and STEA) do not require matrix decomposition or subproblem solving, which gives them a significant advantage over polynomial extrapolation methods (such as MPE, MMPE, and RRE) and Anderson acceleration.
Remark 2. Common selections of the linear functional in the algebraic dual space of $E$ and of the corresponding dual product are detailed in the literature [21,22]. If the original sequence is a vector sequence $(S_n) \subset \mathbb{R}^m$ or a matrix sequence $(S_n) \subset \mathbb{R}^{m \times q}$, and considering that the vector space $\mathbb{R}^m$ and the matrix space $\mathbb{R}^{m \times q}$ are algebraically self-dual, the following selections can be made:
- $y \in \mathbb{R}^m$, typically $y = (1, \dots, 1)^\top$, i.e., the $m$-dimensional vector with all elements equal to 1, and the dual product is defined as $\langle y, u \rangle = y^\top u$;
- $Y \in \mathbb{R}^{m \times q}$, typically the all-ones matrix, and the dual product is defined as $\langle Y, U \rangle = \mathrm{trace}(Y^\top U)$;
- equivalently, identifying $\mathbb{R}^{m \times q}$ with $\mathbb{R}^{mq}$ through vectorization, the dual product is defined as $\langle Y, U \rangle = \mathrm{vec}(Y)^\top \mathrm{vec}(U)$.
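In MATLAB, the pairings listed in Remark 2 amount to one-line anonymous functions (illustrative):

dp_vec  = @(y, u) y' * u;                 % vectors:  <y, u> = y' * u,  e.g. y = ones(m, 1)
dp_mat  = @(Y, U) trace(Y' * U);          % matrices: <Y, U> = trace(Y' * U)
dp_mat2 = @(Y, U) sum(Y(:) .* U(:));      % same value via vectorization, cheaper to evaluate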
3.5. Combining ε-Algorithms with Fixed-Point Iterations to Solve Problem (5)
Given that the sequence generated by the fixed-point iteration in Equation (19) for solving the GIPSCAL problem is a matrix sequence, and that the operator $\Phi$ is a nonlinear mapping, we adopt a restart-based acceleration strategy for applying the ε-algorithms. This approach follows the vector-sequence polynomial extrapolation acceleration framework proposed in [35].
In practical implementations of the acceleration algorithm, a delayed-start strategy is commonly employed to prevent premature acceleration and improve overall performance. Specifically, the basic fixed-point iteration (19) is first executed for a fixed number of steps, or until the matrix sequence reaches a specified level of initial accuracy. Only after this preliminary phase is the acceleration algorithm applied. The detailed implementation steps of the iterative acceleration algorithms based on the ε-algorithms for solving the three-way GIPSCAL problem (5) are presented in Algorithm 5.
Algorithm 5 VEA, TEA, and STEA accelerated fixed-point iterations for solving the three-way GIPSCAL problem (5)
Require: $N$ asymmetric $n$-th-order matrices $X_i$, the initial iterate $Q^{(0)}$, and the window width parameter $k$.
1: Basic iteration: Initialize the iteration matrix to $Q^{(0)}$ and perform the delayed-start iterations through the nonlinear map $\Phi$ to obtain an intermediate iterate $Q$.
2: Perform extrapolation acceleration: Use $Q$ as the initial value of the acceleration, carry out $2k+1$ iterations to obtain the sequence of history iterates required for the extrapolation. Apply the VEA, TEA, or STEA algorithm to generate the extrapolated matrix, denoted $\bar{Q}$ after re-orthogonalization by the "economy-size" SVD decomposition.
3: Convergence judgment and iterative update: Calculate $D_i$ and $K_i$ according to Equations (13) and (14), check whether the iterate satisfies the termination condition, and if it does not, update $Q \leftarrow \bar{Q}$ and return to Step 1 to continue iterating until convergence.
4: return $Q$, $D_i$, $K_i$.
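A condensed MATLAB sketch of the restart loop in Algorithm 5 follows. Here extrapolate stands for any of the VEA/TEA/STEA routines applied to the stored Q-sequence and res39 evaluates the optimality residual (39); both names are placeholders for our implementations, and phi_map is the ALS sweep sketched in Section 2.

while true
    Qs = cell(1, 2*k + 2);  Qs{1} = Q;
    for j = 2:2*k + 2                           % 2k+1 basic steps form the history
        [Q, D, K] = phi_map(X, Q, D, K);  Qs{j} = Q;
    end
    Qe = extrapolate(Qs, k);                    % VEA, TEA, or STEA
    [U, ~, V] = svd(Qe, 'econ');  Q = U * V';   % re-orthogonalization, as in Step 2
    for i = 1:size(X, 3)                        % Step 3: updates (13) and (14)
        C = Q' * X(:,:,i) * Q;
        D(:,:,i) = max(diag(diag((C + C')/2)), 0);
        K(:,:,i) = (C - C')/2;
    end
    if res39(X, Q, D, K) < tol, break, end
end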
The termination criterion of Algorithm 5 is based on the first-order optimality conditions of problem (5) as presented in [10]. Note also that the Stiefel manifold $\mathcal{O}(n,r)$ is an embedded submanifold of the Euclidean space $\mathbb{R}^{n \times r}$, as discussed in [36], while $\mathcal{D}_+$ is a convex set and $\mathcal{K}$ is a linear subspace. The termination criterion can be formulated as follows:
$$\mathrm{Res}(Z) := \left\|P_{T_Q \mathcal{O}(n,r)}\!\left(\nabla_Q f\right)\right\|_F + \sum_{i=1}^{N}\left\|D_i - P_{\mathcal{D}_+}\!\left(D_i - \nabla_{D_i} f\right)\right\|_F + \sum_{i=1}^{N}\left\|\mathrm{skew}\!\left(\nabla_{K_i} f\right)\right\|_F \ \le \ \epsilon. \qquad (39)$$
Here, $\epsilon > 0$ is a predefined accuracy threshold, and $P_{T_Q \mathcal{O}(n,r)}$ denotes the orthogonal projection onto the tangent space $T_Q \mathcal{O}(n,r)$ at the point $Q$. As shown in [9,36], for any matrix $G \in \mathbb{R}^{n \times r}$ and any $Q \in \mathcal{O}(n,r)$, the projection is given by
$$P_{T_Q \mathcal{O}(n,r)}(G) = G - Q\,\mathrm{sym}\!\left(Q^\top G\right), \qquad (40)$$
where $\mathrm{sym}(A) = (A + A^\top)/2$ denotes the symmetric part of the matrix $A$. For the implementation of Algorithm 5, the residual $\mathrm{Res}(Z)$ defined in (39) is the quantity monitored at Step 3.
Remark 3.
The selection of the linear generalized functional and the corresponding duality product in the dual space should follow the approach outlined in Remark 2.
Remark 4.
The TEA and STEA methods presented in Algorithm 5 can be applied directly to the matrix sequence generated by the fixed-point iteration in Equation (19). However, if Algorithm 5 employs VEA for sequence acceleration, a combination of matrix vectorization (straightening) and inverse vectorization (inverse straightening) operators is needed. Specifically, the straightening operator is applied to the matrices involved in each single-step acceleration loop to obtain the corresponding vectors; VEA is then applied to these vectors to compute the accelerated vector, which is subsequently converted back to matrix form via the inverse straightening operator. An "economy-size" singular value decomposition (SVD) is performed on the resulting matrix to reorthogonalize it, producing the updated iterate. Finally, Step 3 of Algorithm 5 is executed to proceed with the iterative process.
4. Numerical Experiments
In this section, we present a comprehensive numerical evaluation of the proposed ε-algorithm-accelerated fixed-point iterations for solving the three-way GIPSCAL problem in Equation (5). We begin by comparing the original fixed-point iteration with its accelerated variants. Additionally, we benchmark these methods against the continuous-time projected gradient flow algorithm introduced by Trendafilov [10,13] as well as several state-of-the-art first- and second-order Riemannian optimization algorithms from the MATLAB toolbox Manopt [9,20]. All experiments were conducted on a standard desktop computer equipped with an Intel(R) Core(TM) i7-13620H CPU (2.40 GHz) and 16.00 GB of RAM, running MATLAB R2022b.
To enable controlled and diverse benchmarking, we generated a collection of $N$ square, asymmetric data matrices $X_i \in \mathbb{R}^{n \times n}$ using a factorial design approach inspired by Takane et al. [37], originally developed for orthogonal INDSCAL problems [9,17]. This setup includes three types of datasets: one purely random and two structured variants. In the random setting, each entry of $X_i$ is drawn independently from a standard normal distribution, i.e., $(X_i)_{st} \sim \mathcal{N}(0,1)$. Such unstructured randomness often leads to large fit errors in the objective function of problem (5). For the structured datasets, we construct each slice as
$$X_i = Q(D_i + K_i)Q^\top + E_i, \qquad i = 1, \dots, N, \qquad (41)$$
where $Q$, $D_i$, and $K_i$ are randomly generated, and $E_i$ denotes an additive noise matrix. The matrix $Q$ is populated with uniformly distributed random entries via rand(n, r) and column-orthogonalized using singular value decomposition (SVD). The diagonal entries of $D_i$ are sampled from a standard normal distribution. Each $K_i$ is obtained by drawing a matrix $M_i$ from the uniform distribution and skew-symmetrizing it as
$$K_i = \frac{M_i - M_i^\top}{2}.$$
To evaluate robustness under varying structural assumptions, we consider two structured variants: one allowing potentially negative diagonal elements in $D_i$ (indefinite case), and the other enforcing nonnegativity by taking element-wise absolute values (nonnegative definite or nnd case). The disturbance terms $E_i$ are sampled from a normal distribution with zero mean and variance $\sigma^2$, where $\sigma$ is set to 10% of the standard deviation of the structural term $Q(D_i + K_i)Q^\top$. To better reflect practical scenarios, the number of "subjects" $N$ varies between 20 and 50, while the number of "stimuli" $n$ is capped at 200. For effective visualization in multidimensional scaling (MDS), the target dimensionality $r$ is set to three representative levels, the largest being $r = 5$.
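A MATLAB sketch of this generator is given below; the interval of the pre-skew-symmetrized entries and the noise convention follow the description above, and the sizes shown are one illustrative configuration.

n = 100;  r = 5;  N = 20;
[U, ~, V] = svd(rand(n, r), 'econ');  Qt = U * V';   % column-orthonormal Q
X = zeros(n, n, N);
for i = 1:N
    Dt = diag(randn(r, 1));            % IND case; use diag(abs(randn(r, 1))) for NND
    M  = rand(r);  Kt = (M - M')/2;    % skew-symmetrized uniform matrix
    T  = Qt * (Dt + Kt) * Qt';         % structural term
    sigma = 0.1 * std(T(:));           % noise sd: 10% of the structural term's sd
    X(:,:,i) = T + sigma * randn(n);
end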
The initial iterate $Q^{(0)}$, used in both the original and accelerated fixed-point iterations, is computed using a structured initialization strategy consistent with the recommendations of [10,17]. Specifically, we perform an eigenvalue decomposition (EVD) on the symmetric component of the aggregated data,
$$\sum_{i=1}^{N} \mathrm{sym}(X_i) = V \Lambda V^\top,$$
where the eigenvalues in $\Lambda$ are sorted in descending order. The first $r$ eigenvectors, denoted $V_r$, are used to initialize $Q^{(0)} = V_r$. The corresponding initial values of $D_i^{(0)}$ and $K_i^{(0)}$, required by both the Riemannian optimization framework and the projected gradient flow method used to solve the equivalent product-manifold-constrained optimization problem in Equation (42), are given by the closed-form updates (13) and (14) evaluated at $Q^{(0)}$.
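In MATLAB, this initialization reads as follows (a sketch; the closed-form start for $D_i$ and $K_i$ is our reading of the strategy above, reusing the dimensions n, r, N and the array X from the preceding snippet):

S = zeros(n);
for i = 1:N, S = S + (X(:,:,i) + X(:,:,i)')/2; end
[V, L] = eig(S);
[~, idx] = sort(diag(L), 'descend');
Q0 = V(:, idx(1:r));                   % first r eigenvectors of the symmetrized sum
D0 = zeros(r, r, N);  K0 = zeros(r, r, N);
for i = 1:N
    C = Q0' * X(:,:,i) * Q0;
    D0(:,:,i) = max(diag(diag((C + C')/2)), 0);   % cf. (13)
    K0(:,:,i) = (C - C')/2;                       % cf. (14)
end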
Due to the matrix-valued nature of the fixed-point sequence, storage constraints require relatively small window sizes $k$ for the ε-algorithm acceleration. To evaluate the impact of the window width on acceleration performance, we experiment with several values of $k$ up to 6, selecting the value that produces the best empirical performance. To further enhance the reliability of the acceleration, particularly in cases where the base fixed-point iteration converges slowly, we adopt a delayed-start strategy: the fixed-point algorithm is executed until an intermediate solution satisfies a specified error threshold, and only then is the ε-algorithm-based acceleration activated. In our experiments, the core implementations of the various ε-algorithms (including SEA, VEA, TEA, and STEA) are provided by the MATLAB toolbox EPSfun, as described in [22]; the code can be downloaded from http://www.netlib.org/numeralgo/. The topological ε-algorithm uses the TEA2 version, while the simplified topological ε-algorithm uses the STEA2-3 version. For both the original fixed-point iteration and each of the acceleration algorithms, different termination thresholds are applied based on the data generation method for $X_i$: in the case of RAND-generated data, convergence is relatively slow, so a looser termination tolerance is used; for the NND and IND cases, which typically converge faster, a tighter tolerance is used. The computation of the Error quantity follows the definition given in Equation (39), and in all subsequent numerical results the reported Error values are consistently calculated using this formula.
4.1. Numerical Comparison of Fixed-Point Acceleration Methods
This section presents a comprehensive numerical comparison between the original fixed-point iteration (denoted FPI) and several accelerated variants. These include the ε-algorithm-based methods (FPI-VEA, FPI-TEA, and FPI-STEA), polynomial extrapolation techniques (FPI-MPE, FPI-RRE, and FPI-MMPE), and Anderson acceleration (FPI-Anderson). The experimental results, summarized in Table 1, span a range of configurations including different data generation strategies for the matrices $X_i$, system dimensions $(n, r)$, numbers of slices $N$, and acceleration window widths $k$. In the table, IT denotes the total number of iterations to reach convergence, CPU refers to the total computation time in seconds, and Error represents the final residual error. It is important to note that for all acceleration schemes, including FPI-VEA, FPI-TEA, FPI-STEA, FPI-MPE, FPI-RRE, FPI-MMPE, and FPI-Anderson, the reported iteration count (IT) includes the initial delayed-start iterations, the base iterations required to form the extrapolation sequence, and the subsequent accelerated iterations.
From Table 1, we observe that, to achieve equivalent termination accuracy, the polynomial extrapolation method FPI-MPE generally outperforms the ε-based methods in terms of iteration count and runtime. This performance gap is attributed to the number of sequence elements each method requires. Specifically, for a given window size $k$, polynomial extrapolation methods utilize $k+2$ vectors from the underlying sequence, while the ε-algorithms require $2k+1$ vectors [35]. Nonetheless, the ε-based methods (FPI-VEA, FPI-TEA, and FPI-STEA) demonstrate consistent acceleration across different system sizes and window widths, exhibiting good performance scalability.
Theorem 6.8 in Section 6 of [21] provides a theoretical analysis of the convergence rate of the STEA-accelerated sequence in terms of the window width $k$: as $k$ increases, the order of convergence after acceleration improves. The result reads as follows:
Theorem 4 ([21]).
We consider sequences of the form
$$S_n = S + \sum_{i \ge 1} a_i \lambda_i^n, \qquad a_i \in E, \quad \lambda_i \in \mathbb{K},$$
where $1 > |\lambda_1| > |\lambda_2| > \cdots > 0$ and $\langle y, a_i \rangle \neq 0$ for all $i$. Then, when $k$ is fixed and $n$ tends to infinity,
$$\varepsilon_{2k}^{(n)} - S = O\!\left(\lambda_{k+1}^{\,n}\right).$$
Table 1 demonstrates that the choice of the window width $k$ significantly influences the convergence behavior. While increasing $k$ generally improves the acceleration by leveraging more sequence information, it also raises the per-iteration computational cost. To strike a balance between efficiency and overhead, the acceleration window is fixed at a moderate value for the subsequent experiments.
Figure 3 illustrates the evolution of the error norm Error as a function of iteration time across different system configurations. The results are presented for three data-generation scenarios: nonnegative definite (NND), indefinite (IND), and fully random (RAND), with the window width held fixed throughout. For data generated via the NND and IND schemes, the underlying fixed-point sequences exhibit relatively fast convergence; in these cases, extrapolation is applied immediately, without delay. As seen in the top four subplots of Figure 3 (with the top two corresponding to NND and the next two to IND), the application of ε-acceleration after the initial $2k+1$ base iterations leads to a substantial and rapid drop in the residual error. In contrast, for RAND-generated data, where the underlying sequence converges more slowly, a delayed-start strategy is employed. The final six subplots of Figure 3 show the evolution of the error for RAND matrices across varying system sizes. These results confirm that, even under more challenging conditions, the accelerated methods remain effective and yield significant convergence improvements.
The numerical experiments collectively demonstrate that the ε-based acceleration methods achieve convergence performance comparable to polynomial extrapolation and Anderson acceleration when applied to the three-way GIPSCAL problem (5). A noteworthy advantage of the ε-algorithms is their ability to operate directly on matrix sequences without requiring vectorization ("straightening") and inverse reshaping procedures, transformations that are typically needed for the polynomial and Anderson-based methods. In addition, the ε-methods avoid costly matrix factorizations and auxiliary subproblem solves, resulting in simpler implementation and reduced computational overhead.
4.2. Comparison with Riemannian Optimization Methods in Manopt
In 2021, Trendafilov and Gallo [9] proposed a unified framework for classical multivariate data analysis models by leveraging the geometric structure of matrix manifolds. They further introduced a computational approach for the three-way GIPSCAL problem based on the Riemannian optimization toolbox Manopt [20]. In this section, we present a numerical comparison between the proposed fixed-point iteration acceleration methods (VEA, TEA, and STEA) and the existing optimization algorithms implemented in the Manopt toolbox. To enable the application of Riemannian optimization techniques, the original three-way GIPSCAL problem (5) is reformulated as a constrained optimization problem defined on a product manifold, as follows:
$$\min_{Z = (Q, D_1, \dots, D_N, K_1, \dots, K_N) \in \mathcal{M}} \ g(Z) := \sum_{i=1}^{N} \left\|X_i - Q\left(D_i^2 + K_i\right)Q^\top\right\|_F^2, \qquad \mathcal{M} := \mathcal{O}(n,r) \times \mathcal{D}^N \times \mathcal{K}^N. \qquad (42)$$
Here, $\mathcal{D}$ denotes the linear subspace of diagonal $r \times r$ matrices, and the squaring $D_i^2$ renders the nonnegativity constraint of (5) implicit. Through straightforward algebraic derivation, the Euclidean gradient of the objective function $g$ in (42) with respect to $(Q, D_i, K_i)$ can be expressed componentwise as
$$\nabla_Q g = -2\sum_{i=1}^{N}\left[\hat{E}_i Q \left(D_i^2 + K_i\right)^{\!\top} + \hat{E}_i^\top Q \left(D_i^2 + K_i\right)\right], \qquad \nabla_{D_i} g = -4\, D_i\, \mathrm{Diag}\!\left(Q^\top \hat{E}_i Q\right), \qquad \nabla_{K_i} g = -2\,\mathrm{skew}\!\left(Q^\top \hat{E}_i Q\right), \qquad (43)$$
where $\hat{E}_i = X_i - Q(D_i^2 + K_i)Q^\top$. Since $\mathcal{M}$ is an embedded submanifold of the Euclidean space, the Riemannian gradient is obtained by orthogonally projecting the Euclidean gradient onto the tangent space $T_Z \mathcal{M}$ [36]. Using the known projection operator (40) onto the tangent space of the Stiefel manifold, and recognizing that $\mathcal{D}$ and $\mathcal{K}$ are linear subspaces of $\mathbb{R}^{r \times r}$ (so that the corresponding projections are the diagonal and skew-symmetric parts, respectively), the Riemannian gradient of $g$ in (42) at $Z$ is given by
$$\mathrm{grad}\, g(Z) = \left(P_{T_Q \mathcal{O}(n,r)}\!\left(\nabla_Q g\right),\ \nabla_{D_1} g, \dots, \nabla_{D_N} g,\ \nabla_{K_1} g, \dots, \nabla_{K_N} g\right).$$
The generic update step of a Riemannian optimization algorithm over $\mathcal{M}$ is given by
$$Z^{(s+1)} = R_{Z^{(s)}}\!\left(\alpha_s \xi_s\right),$$
where $\alpha_s$ is the step size, $\xi_s \in T_{Z^{(s)}}\mathcal{M}$ is the search direction, and $R_Z$ is a retraction operator that maps tangent vectors back to the manifold. Specifically, the retraction is defined componentwise as
$$R_Z(\xi) = \left(R^{St}_Q(\xi_Q),\ D_i + \xi_{D_i},\ K_i + \xi_{K_i}\right),$$
where $R^{St}$ is a retraction on the Stiefel manifold $\mathcal{O}(n,r)$. For conjugate gradient-type methods, vector transport operations are required as well. Given two tangent vectors $\xi, \eta \in T_Z \mathcal{M}$, their transport is defined by
$$\mathcal{T}_{\eta}(\xi) = \left(\mathcal{T}^{St}_{\eta_Q}(\xi_Q),\ \xi_{D_i},\ \xi_{K_i}\right),$$
where $\mathcal{T}^{St}$ denotes a vector transport on the Stiefel manifold $\mathcal{O}(n,r)$. Our experiments use Manopt's default retraction and transport implementations. For second-order algorithms, the Riemannian Hessian is required as well; for brevity, derivations are omitted, and detailed formulas can be found in Absil et al. [38]. For further literature on applying Riemannian optimization techniques to various matrix optimization models arising in multidimensional scaling, see also [14,39,40]. Optimization terminates when the following first-order optimality condition is satisfied:
$$\left\|\mathrm{grad}\, g(Z)\right\| \le \epsilon. \qquad (44)$$
We benchmarked our accelerated fixed-point methods against the following Manopt solvers: Riemannian steepest descent (Manopt-RSD), conjugate gradient (Manopt-RCG), Barzilai-Borwein (Manopt-RBB), trust-region (Manopt-RTR), limited-memory BFGS (Manopt-RLBFGS), and adaptive regularization by cubics (Manopt-ARC). All solvers were executed using default parameter settings with a maximum iteration cap of 10,000. To ensure consistency, all methods were initialized from the same starting point, and both the Riemannian gradient and (when required) Hessian were supplied explicitly. Notably, Manopt defaults to finite-difference approximations when the Hessian is not provided. In contrast, our implementation consistently utilized exact second-order information.
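For orientation, a hedged Manopt sketch of the reformulation (42) is shown below. We place Q on the Stiefel factory, keep the diagonal parameters d(:,i) (so that $D_i = \mathrm{diag}(d(:,i))$) and the skew parameters as Euclidean variables, and let Manopt fall back to its finite-difference gradient approximation; supplying the exact gradient (43) through problem.egrad, as done in our experiments, proceeds analogously. The factory and solver names below are those of the standard Manopt distribution; the cost-function name is ours.

elems.Q = stiefelfactory(n, r);
elems.d = euclideanfactory(r, N);            % d(:, i) parametrizes D_i
elems.K = euclideanfactory(r, r*N);          % columns hold the N skew blocks
problem.M = productmanifold(elems);
problem.cost = @(Z) gipscost(Z, X, r, N);
opts.maxiter = 10000;
Zopt = trustregions(problem, [], opts);      % Manopt-RTR; conjugategradient etc. used likewise

function g = gipscost(Z, X, r, N)
    g = 0;
    for i = 1:N
        Ki = Z.K(:, (i-1)*r+1 : i*r);  Ki = (Ki - Ki')/2;   % enforce skew-symmetry
        Ai = diag(Z.d(:, i).^2) + Ki;                       % D_i^2 + K_i, cf. (42)
        g  = g + norm(X(:,:,i) - Z.Q*Ai*Z.Q', 'fro')^2;
    end
end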
Table 2 summarizes performance across multiple problem sizes and input data types: completely random (RAND) and two structured categories (IND and NND). The performance metrics follow those in Table 1, with CPU denoting total runtime, IT indicating the number of iterations, and Fvalue representing the final objective value $f$ or $g$. The accelerated fixed-point methods are applied to the original three-way GIPSCAL problem (5), while the Riemannian optimization algorithms solve the equivalent reformulation (42). The Error column records the norm associated with the respective stopping criterion: (39) for the accelerated fixed-point methods and (44) for the Riemannian solvers. Convergence trajectories are visualized in Figure 4, where the horizontal axis denotes runtime and the vertical axis plots the objective value. The results clearly demonstrate that the proposed accelerated fixed-point methods outperform several Manopt solvers in terms of both convergence rate and computational efficiency. Notably, as shown by the solid line with pink dots and the dashed line with black dots in Figure 4, both Manopt-RBB and Manopt-RSD exhibit pronounced slowdowns in convergence near stationary points. Furthermore, Table 2 highlights the elevated computational cost of Manopt-RLBFGS, which aligns with the internal implementation details of Manopt: its default memory size of 30 necessitates up to 30 projection-based vector transport operations per iteration, contributing to significant per-iteration overhead.
4.3. Comparison with the Projected Gradient Flow Method
Trendafilov [10,13] reformulated the equivalent three-way GIPSCAL problem (42) as a constrained dynamical system and proposed a continuous-time projected gradient flow algorithm. This method is globally convergent, conceptually simple, and broadly applicable to matrix optimization problems arising in multidimensional data analysis [9,15,16,17,18]. In this subsection, we conduct a numerical comparison between the proposed accelerated fixed-point algorithms (VEA, TEA, and STEA) and the projected gradient flow algorithm.
For a general constrained optimization problem of the form $\min_{x \in \mathcal{C}} \phi(x)$, the projected gradient method generates a sequence of iterates via
$$x^{(s+1)} = P_{\mathcal{C}}\!\left(x^{(s)} - \alpha_s \nabla \phi\!\left(x^{(s)}\right)\right),$$
where $\alpha_s > 0$ is the step size and $P_{\mathcal{C}}$ denotes the orthogonal projection onto the feasible set $\mathcal{C}$. The associated continuous-time version, known as the projected gradient flow, evolves along the negative projected gradient and is governed by the following differential equation:
$$\dot{x}(t) = -P_{T_{x(t)}}\!\left(\nabla \phi\!\left(x(t)\right)\right).$$
For the equivalent reformulated three-way GIPSCAL problem (42), this yields the following system of ordinary differential equations:
$$\dot{Q} = -P_{T_Q \mathcal{O}(n,r)}\!\left(\nabla_Q g\right), \qquad \dot{D}_i = -\nabla_{D_i} g, \qquad \dot{K}_i = -\nabla_{K_i} g, \qquad i = 1, \dots, N.$$
We employ MATLAB's ode15s solver [41,42] to integrate this system numerically. The ode15s routine is a variable-step, variable-order implicit solver designed for stiff differential equations and belongs to the Klopfenstein–Shampine family of solvers. We set both the absolute and relative error tolerances to stringent values to ensure highly accurate tracking of the flow dynamics. While such precision may exceed practical requirements in typical data analysis tasks, it facilitates a fair and rigorous comparison of algorithmic performance. Although looser tolerance settings could reduce runtime, their effect is marginal in this context. During integration, solution states are recorded at regular intervals of 10 time units. The integration process is terminated automatically when the relative decrease of the objective function between successive outputs falls below a prescribed threshold, indicating proximity to a local minimum. This termination criterion, adopted from Loisel and Takane [19], is significantly more lenient than the convergence threshold used in the proposed accelerated fixed-point algorithms (VEA, TEA, and STEA).
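Schematically, the integration can be set up as follows. Here pack, unpack, and projgrad are illustrative helpers (not library functions) that reshape the variables into a single state vector and evaluate the projected gradient of (42); the tolerances shown are placeholders for the stringent values described above.

z0   = pack(Q0, D0, K0);                       % initial state as one column vector
opts = odeset('RelTol', 1e-10, 'AbsTol', 1e-10);
rhs  = @(t, z) -pack(projgrad(unpack(z), X));  % right-hand side of the gradient flow
sol  = ode15s(rhs, [0, 1e4], z0, opts);
Zs   = deval(sol, 0:10:1e4);                   % states sampled every 10 time units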
Although the projected gradient flow algorithm exhibits global convergence and features a simple, intuitive structure that facilitates implementation, its efficiency tends to degrade when applied to large-scale problems. Table 3 reports a detailed numerical comparison between the accelerated fixed-point iteration methods (namely, FPI-VEA, FPI-TEA, and FPI-STEA) and the projected gradient flow approach (denoted PG-ODE), under a fixed acceleration window width. The experiments are conducted across varying coefficient dimensions and under three distinct original data generation schemes: NND, IND, and RAND. The definitions of CPU, IT, Obj, and Error are consistent with those provided in Table 1. The results in Table 3 demonstrate that, in terms of iteration efficiency, the ε-accelerated fixed-point iteration algorithms significantly outperform the projected gradient flow method.
5. Conclusions
This paper examines the Generalized Inner Product SCALing (GIPSCAL) model in multidimensional scaling from a numerical perspective, with a focus on individual differences among the observed objects. The model can be formulated as a multivariate constrained matrix optimization problem with column-orthogonality and nonnegative-diagonal constraints (problem (5)). Using the alternating least squares iterative approach, the original model is first transformed into a matrix-based fixed-point iteration problem. Furthermore, by incorporating the ε-acceleration principle from vector sequence acceleration, we design fixed-point iteration acceleration algorithms based on the vector ε-algorithm, the topological ε-algorithm, and the simplified topological ε-algorithm (denoted FPI-VEA, FPI-TEA, and FPI-STEA). Extensive numerical experiments show that, when solving the GIPSCAL problem in (5), the fixed-point iteration algorithms combined with the ε-algorithms converge considerably faster than the original fixed-point iteration. Additionally, compared with existing algorithms for solving matrix optimization models in multidimensional data analysis, such as the Riemannian optimization-based Manopt toolbox solvers and the classical projected gradient flow algorithm, the accelerated fixed-point iterations demonstrate a clear advantage in iteration time. Improving the convergence speed of scalar, vector, matrix, or tensor sequences generated by iterative methods is highly significant in fields such as scientific and engineering computing and machine learning. It is worth noting that, unlike traditional vector-sequence polynomial extrapolation acceleration methods, the implementations of FPI-TEA and FPI-STEA operate directly on matrix sequences without matrix flattening and unflattening operators (FPI-VEA requires them, as noted in Remark 4), and none of the proposed methods involves matrix decompositions for solving related subproblems.