Article

Kaczmarz-Type Methods for Solving Matrix Equation AXB = C

1 School of Mathematics and Computer Science, Chuxiong Normal University, Chuxiong 675099, China
2 College of Science, China University of Petroleum, Qingdao 266580, China
* Author to whom correspondence should be addressed.
Axioms 2025, 14(5), 367; https://doi.org/10.3390/axioms14050367
Submission received: 23 March 2025 / Revised: 30 April 2025 / Accepted: 10 May 2025 / Published: 13 May 2025

Abstract

This paper proposes a class of randomized Kaczmarz and Gauss–Seidel-type methods for solving the matrix equation A X B = C , where the matrices A and B may be either full-rank or rank deficient and the system may be consistent or inconsistent. These iterative methods offer high computational efficiency and low memory requirements, as they avoid costly matrix–matrix multiplications. We rigorously establish theoretical convergence guarantees, proving that the generated sequences converge to the minimal Frobenius-norm solution (for consistent systems) or the minimal Frobenius-norm least squares solution (for inconsistent systems). Numerical experiments demonstrate the superiority of these methods over conventional matrix multiplication-based iterative approaches, particularly for high-dimensional problems.
MSC:
65F10; 65F45; 65H10

1. Introduction

The study of matrix equations has emerged as a prominent research topic in numerical linear algebra. In this paper, we investigate the solution of the matrix equation of the form
$AXB = C$,    (1)
where $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, and $C \in \mathbb{R}^{m \times n}$. This typical linear matrix equation has attracted considerable attention in both matrix theory and numerical analysis due to its important applications in fields such as systems theory, control theory, image restoration, and more [1,2].
The matrix equation $AXB = C$ has been studied extensively. For instance, Chu provided necessary and sufficient conditions for the existence of symmetric solutions through singular value decomposition and generalized eigenvalue decomposition [3]. Most of these approaches are direct methods, applying the generalized inverse, singular value decomposition, QR factorization, and canonical correlation decomposition to obtain necessary and sufficient conditions and analytical expressions for the solution and the least squares solution of matrix equations. Due to limitations in computer storage and computing speed, direct methods may be inefficient for large matrices. Therefore, iterative methods for solving linear matrix equations have attracted great interest, such as in [4,5,6,7]. Many of these methods rely heavily on matrix–matrix multiplication, which consumes a great deal of computing time.
By using the Kronecker product symbol $\otimes$, the matrix equation $AXB = C$ can be written in the following equivalent matrix–vector form:
$(B^T \otimes A)\,\mathrm{vec}(X) = \mathrm{vec}(C)$,    (2)
where the Kronecker product $B^T \otimes A \in \mathbb{R}^{mn \times pq}$, the right-hand-side vector $\mathrm{vec}(C) \in \mathbb{R}^{mn \times 1}$, and the unknown vector $\mathrm{vec}(X) \in \mathbb{R}^{pq \times 1}$. With the application of Kronecker products, many algorithms have been proposed to solve the matrix Equation (1) (see, e.g., [8,9,10,11,12]). However, when the dimensions of matrices A and B are large, the dimension of the linear system in Equation (2) increases dramatically, which increases the memory usage and computational cost of numerical algorithms for finding an approximate solution.
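For readers who wish to verify the equivalence numerically, the following MATLAB snippet (a minimal sketch with illustrative, hypothetical sizes; not part of the proposed algorithms) compares the residual of Equation (1) with that of the vectorized system in Equation (2):

% Minimal check of the Kronecker-product reformulation (illustrative sizes).
m = 30; p = 10; q = 8; n = 20;
A = randn(m, p); B = randn(q, n); X = randn(p, q);
C = A*X*B;                                   % consistent right-hand side
K = kron(B', A);                             % (B^T kron A) is (mn) x (pq)
r1 = norm(A*X*B - C, 'fro');                 % residual of AXB = C
r2 = norm(K*X(:) - C(:));                    % residual of the vectorized system
fprintf('residuals: %.2e  %.2e\n', r1, r2);  % both are (numerically) zero

The snippet also makes the memory issue concrete: the Kronecker matrix has mn x pq entries, which is exactly what the proposed methods avoid forming.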
Recently, the Kaczmarz method for solving the linear system $Ax = b$ has been extended to matrix equations. In [13], Shafiei and Hajarian proposed a hierarchical Kaczmarz method for solving the Sylvester matrix equation $AX + XB = F$. Du et al. proposed the randomized block coordinate descent (RBCD) method for solving the matrix least squares problem $\min_{X \in \mathbb{R}^{p \times q}} \|C - AXB\|_F^2$ in [14]. For the consistent matrix equation $AXB = C$, Wu et al. proposed the relaxed greedy randomized Kaczmarz (ME-RGRK) and maximal weighted residual Kaczmarz (ME-MWRK) methods [15]. Niu and Zheng proposed two randomized block Kaczmarz algorithms and studied block partitioning techniques [16]. Xing et al. proposed a randomized block Kaczmarz method along with an extended version to solve the matrix equation $AXB = C$ (consistent or inconsistent) [17]. In [18], Li et al. systematically derived Kaczmarz and coordinate descent algorithms for solving the matrix equations $AX = B$ and $XA = C$. Based on this work, we consider the use of Kaczmarz-type methods (row iteration) and Gauss–Seidel-type methods (column iteration) to solve (1) (which may be consistent or inconsistent) without matrix–matrix multiplication.
The notations used throughout this paper are explained as follows. For a matrix A, its transpose, Moore–Penrose generalized inverse, rank, column space, and Frobenius norm are represented as $A^T$, $A^+$, $r(A)$, $R(A)$, and $\|A\|_F = \sqrt{\mathrm{trace}(A^T A)}$, respectively. We use $\langle A, B \rangle_F = \mathrm{trace}(A^T B)$ to denote the inner product of two matrices A and B. Let I denote the identity matrix, where the order is clear from the context. In addition, for a given matrix $G = (g_{ij}) \in \mathbb{R}^{m \times n}$, we use $G_{i,:}$, $G_{:,j}$, and $\sigma_{\min}(G)$ to denote its ith row, jth column, and smallest nonzero singular value, respectively. In addition, we use $\mathbb{E}$ to denote the expectation and $\mathbb{E}_k$ to denote the conditional expectation given the first k iterations, that is,
$\mathbb{E}_k[\cdot] = \mathbb{E}[\,\cdot \mid i_0, j_0, i_1, j_1, \ldots, i_{k-1}, j_{k-1}\,],$
where $i_t$ and $j_t$ ($t = 0, 1, \ldots, k-1$) are the row and column indices selected at the tth step. Let $\mathbb{E}_k^i[\cdot] = \mathbb{E}[\,\cdot \mid i_0, j_0, i_1, j_1, \ldots, i_{k-1}, j_{k-1}, j_k\,]$ and $\mathbb{E}_k^j[\cdot] = \mathbb{E}[\,\cdot \mid i_0, j_0, i_1, j_1, \ldots, i_{k-1}, j_{k-1}, i_k\,]$ represent the conditional expectations with respect to the random row index and the random column index, respectively. Then, it is obvious that $\mathbb{E}_k[\cdot] = \mathbb{E}_k^i[\mathbb{E}_k^j[\cdot]]$.
The rest of this paper is organized as follows. In the next two sections, we discuss the randomized Kaczmarz method for solving the consistent matrix equation A X B = C and the randomized Gauss–Seidel (coordinate descent) method for solving the inconsistent matrix equation A X B = C . In Section 4, the extended Kaczmarz method and extended Gauss–Seidel method are derived for finding the minimal F-norm least squares solution of the matrix equation in (1). Section 5 presents some numerical experiments of the proposed methods, then conclusions are provided in the final section. This paper makes several significant advances in solving the matrix equation A X B = C :
  • Algorithm Development: We propose novel randomized Kaczmarz (RK) and Gauss–Seidel-type methods that efficiently handle both full-rank and rank-deficient cases for matrices A and B, addressing both consistent and inconsistent systems.
  • Theoretical Guarantees: We rigorously prove that all proposed algorithms converge linearly to either the minimal Frobenius-norm solution (for consistent systems) or the minimal Frobenius-norm least squares solution A + C B + (for inconsistent systems). We summarize the convergence of the proposed methods in expectation to the minimal F-norm solution X = A + C B + for all types of matrix equations in Table 1.
  • Computational Efficiency: All proposed methods avoid matrix–matrix multiplications, allowing them to achieve superior performance with low per-iteration cost and minimal storage requirements. Extensive experiments demonstrate the advantages of our approaches over conventional methods, particularly in high-dimensional scenarios.

2. Kaczmarz Method for Consistent Case

The matrix equation in (1) is consistent if and only if $AA^+CB^+B = C$; in this case, $A^+CB^+$ is a solution of (1). In particular, if A has full row rank ($m \le p$) and B has full column rank ($q \ge n$), then the matrix equation in (1) is consistent because $X = A^T(AA^T)^{-1}C(B^TB)^{-1}B^T$ is one solution of this equation. In general, the matrix equation in (1) has multiple solutions. Next, we try to find its minimal F-norm solution $X_\star = A^+CB^+$ via the Kaczmarz method.
Assume that A has no row that is all zeros and B has no column that is all zeros. Then, the matrix equation in (1) can be rewritten as the following system of matrix equations:
$AY = C, \qquad B^T X^T = Y^T,$
where $Y \in \mathbb{R}^{p \times n}$. The classical Kaczmarz method, introduced in 1937 [19], is an iterative row projection algorithm for solving a consistent system $Ax = b$ with $A \in \mathbb{R}^{m \times p}$, $b \in \mathbb{R}^m$, and $x \in \mathbb{R}^p$. The method uses only a single equation per iteration and converges to the least-norm solution $A^+b$ of $Ax = b$ when the initial iterate satisfies $x^{(0)} \in R(A^T)$:
$x^{(k+1)} = x^{(k)} + \dfrac{b_i - A_{i,:}x^{(k)}}{\|A_{i,:}\|_2^2}\,A_{i,:}^T, \quad k \ge 0,$
where $i = (k \bmod m) + 1$. If we iterate the systems of linear equations $AY_{:,j} = C_{:,j}$, $j = 1, \ldots, n$, simultaneously and denote $Y^{(k)} = [\,Y^{(k)}_{:,1}, Y^{(k)}_{:,2}, \ldots, Y^{(k)}_{:,n}\,]$, we obtain
$Y^{(k+1)} = Y^{(k)} + \dfrac{A_{i,:}^T}{\|A_{i,:}\|_2^2}\,(C_{i,:} - A_{i,:}Y^{(k)}), \quad k \ge 0,$    (5)
where $i = (k \bmod m) + 1$. Then, $A_{i,:}Y^{(k+1)} = C_{i,:}$ holds; that is, $Y^{(k+1)}$ is the projection of $Y^{(k)}$ onto the subspace $H_i = \{\,Y \in \mathbb{R}^{p \times n} : A_{i,:}Y = C_{i,:}\,\}$. Thus, we obtain an orthogonal row projection method that can be used to solve the matrix equation $AY = C$.
Similarly, we can obtain the following orthogonal column projection method to solve the equation $B^T X^T = (Y^{(k+1)})^T$:
$X^{(k+1)} = X^{(k)} + \dfrac{Y^{(k+1)}_{:,j} - X^{(k)}B_{:,j}}{\|B_{:,j}\|_2^2}\,B_{:,j}^T, \quad k \ge 0,$    (6)
where $j = (k \bmod n) + 1$. Then, $X^{(k+1)}B_{:,j} = Y^{(k+1)}_{:,j}$ holds; that is, $X^{(k+1)}$ is the projection of $X^{(k)}$ onto the subspace $\hat{H}_j = \{\,X \in \mathbb{R}^{p \times q} : XB_{:,j} = Y^{(k+1)}_{:,j}\,\}$.
If i in (5) and j in (6) are selected randomly with probabilities proportional to the squared row norms of A and the squared column norms of B, respectively, we obtain a randomized Kaczmarz-type algorithm, which we call the CME-RK algorithm (see Algorithm 1).
Algorithm 1 RK method for consistent matrix equation A X B = C (CME-RK)
Input: $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, $C \in \mathbb{R}^{m \times n}$, $X^{(0)} \in \mathbb{R}^{p \times q}$ with $(X^{(0)}_{i,:})^T \in R(B)$, $i = 1, \ldots, p$, $Y^{(0)} \in \mathbb{R}^{p \times n}$ with $Y^{(0)}_{:,j} \in R(A^T)$, $j = 1, \ldots, n$, K
1: For $i = 1 : m$, $M(i) = \|A_{i,:}\|_2^2$
2: For $j = 1 : n$, $N(j) = \|B_{:,j}\|_2^2$
3: for $k = 0, 1, 2, \ldots, K-1$ do
4:   Pick i with probability $p_i(A) = \|A_{i,:}\|_2^2 / \|A\|_F^2$ and j with probability $\hat{p}_j(B) = \|B_{:,j}\|_2^2 / \|B\|_F^2$
5:   Compute $Y^{(k+1)} = Y^{(k)} + \dfrac{A_{i,:}^T}{M(i)}\,(C_{i,:} - A_{i,:}Y^{(k)})$
6:   Compute $X^{(k+1)} = X^{(k)} + \dfrac{Y^{(k+1)}_{:,j} - X^{(k)}B_{:,j}}{N(j)}\,B_{:,j}^T$
7: end for
8: Output $X^{(K)}$
If the squared row norms of A and the squared column norms of B are precomputed in advance, then the cost of each iteration of this method is $4p(n+q) + 2p$ flops ($4np + p$ for step 5 and $4pq + p$ for step 6). In the following theorem, we use the idea of the RK method [20] to prove that $X^{(k)}$ generated by Algorithm 1 converges to the minimal F-norm solution of $AXB = C$ when i and j are selected randomly at each iteration.
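The per-iteration updates of Algorithm 1 translate directly into MATLAB. The sketch below is a minimal illustration (the function name cme_rk, the weighted-sampling implementation via cumulative sums, and the fixed iteration count K are our own illustrative choices, not part of the paper):

function X = cme_rk(A, B, C, K)
% Sketch of the CME-RK iteration: randomized Kaczmarz for consistent A*X*B = C.
% Zero initial guesses satisfy the range conditions required by Theorem 1.
[m, p] = size(A); [q, n] = size(B);
X = zeros(p, q); Y = zeros(p, n);
rowA = sum(A.^2, 2);                 % squared row norms of A
colB = sum(B.^2, 1);                 % squared column norms of B
pr_i = rowA / sum(rowA);             % row sampling probabilities
pr_j = colB / sum(colB);             % column sampling probabilities
for k = 1:K
    i = find(cumsum(pr_i) >= rand, 1);                   % weighted row pick
    j = find(cumsum(pr_j) >= rand, 1);                   % weighted column pick
    Y = Y + A(i,:)' * ((C(i,:) - A(i,:)*Y) / rowA(i));    % update (5)
    X = X + ((Y(:,j) - X*B(:,j)) / colB(j)) * B(:,j)';    % update (6)
end
end

Note that no matrix–matrix product with A or B is formed; each step touches only one row of A and one column of B.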
Before proving the convergence result of Algorithm 1, we analyze the convergence of $Y^{(k)}$ and provide the following lemmas, which can be considered as an extension of the theoretical results for solving linear systems of equations in [20]. Let $Y_\star = A^+C$. The sequence $\{Y^{(k)}\}$ is generated by (5) starting from the initial matrix $Y^{(0)} \in \mathbb{R}^{p \times n}$. The following Lemmas 1–3 and their proofs can be found in [18].
Lemma 1
([18]). If the sequence $\{Y^{(k)}\}$ is convergent, it must converge to $Y_\star = A^+C$, provided that $Y^{(0)}_{:,j} \in R(A^T)$, $j = 1, \ldots, n$.
Lemma 2
([18]). Let $A \in \mathbb{R}^{m \times p}$ be any nonzero matrix. For $Y_{:,j} \in R(A^T)$, $j = 1, \ldots, n$, it holds that
$\|AY\|_F^2 \ge \sigma_{\min}^2(A)\,\|Y\|_F^2.$
Lemma 3
([18]). The sequence $\{Y^{(k)}\}$ generated by (5) starting from an initial matrix $Y^{(0)} \in \mathbb{R}^{p \times n}$ with $Y^{(0)}_{:,j} \in R(A^T)$, $j = 1, \ldots, n$, converges linearly to $A^+C$ in mean square form. Moreover, the solution error in expectation for the iteration sequence $Y^{(k)}$ obeys
$\mathbb{E}\,\|Y^{(k)} - A^+C\|_F^2 \le \rho_1^k\,\|Y^{(0)} - A^+C\|_F^2,$
where the ith row of A is selected with probability $p_i(A) = \|A_{i,:}\|_2^2 / \|A\|_F^2$ and $\rho_1 = 1 - \dfrac{\sigma_{\min}^2(A)}{\|A\|_F^2}$.
Similarly, we can obtain the following convergence result of the RK method for the matrix equation B T X T = ( Y ) T .
Lemma 4.
Let $X_\star = A^+CB^+$ and let $\tilde{X}$ be generated by running a one-step RK update for solving the matrix equation $B^TX^T = (Y_\star)^T$ starting from any matrix $\hat{X} \in \mathbb{R}^{p \times q}$ with $(\hat{X}_{i,:})^T \in R(B)$, $i = 1, \ldots, p$. Then, it holds that
$\mathbb{E}\,\|\tilde{X} - A^+CB^+\|_F^2 \le \rho_2\,\|\hat{X} - A^+CB^+\|_F^2,$
where the jth column of B is selected with probability $\hat{p}_j(B) = \|B_{:,j}\|_2^2 / \|B\|_F^2$ and $\rho_2 = 1 - \dfrac{\sigma_{\min}^2(B)}{\|B\|_F^2}$.
Lemma 5.
Let $\tilde{H}_j = \{\,X \in \mathbb{R}^{p \times q} : XB_{:,j} = (Y_\star)_{:,j}\,\}$ be the subspaces consisting of the solutions to the unperturbed equations and let $\hat{H}_j = \{\,X \in \mathbb{R}^{p \times q} : XB_{:,j} = Y^{(k+1)}_{:,j}\,\}$ be the solution spaces of the noisy equations. Then, $\hat{H}_j = \{\,W + \alpha_j^{(k+1)}B_{:,j}^T : W \in \tilde{H}_j\,\}$, where $\alpha_j^{(k+1)} = \dfrac{Y^{(k+1)}_{:,j} - (Y_\star)_{:,j}}{\|B_{:,j}\|_2^2}$.
Proof. 
First, if $W \in \tilde{H}_j$, then
$(W + \alpha_j^{(k+1)}B_{:,j}^T)B_{:,j} = WB_{:,j} + \alpha_j^{(k+1)}B_{:,j}^TB_{:,j} = (Y_\star)_{:,j} + Y^{(k+1)}_{:,j} - (Y_\star)_{:,j} = Y^{(k+1)}_{:,j},$
so $W + \alpha_j^{(k+1)}B_{:,j}^T \in \hat{H}_j$.
Next, let $V \in \hat{H}_j$. Set $W = V - \alpha_j^{(k+1)}B_{:,j}^T$; then,
$WB_{:,j} = (V - \alpha_j^{(k+1)}B_{:,j}^T)B_{:,j} = VB_{:,j} - \alpha_j^{(k+1)}\|B_{:,j}\|_2^2 = Y^{(k+1)}_{:,j} - (Y^{(k+1)}_{:,j} - (Y_\star)_{:,j}) = (Y_\star)_{:,j},$
meaning that $W \in \tilde{H}_j$. This completes the proof.    □
We present the convergence result of Algorithm 1 in the following theorem.
Theorem 1.
The sequence $\{X^{(k)}\}$ generated by Algorithm 1 starting from an initial matrix $X^{(0)} \in \mathbb{R}^{p \times q}$ converges linearly to the solution $X_\star = A^+CB^+$ of the consistent matrix equation in (1) in mean square form, provided that $(X^{(0)}_{i,:})^T \in R(B)$, $i = 1, \ldots, p$, and $Y^{(0)}_{:,j} \in R(A^T)$, $j = 1, \ldots, n$. Moreover, the following relationship holds:
$\mathbb{E}\,\|X^{(k)} - A^+CB^+\|_F^2 \le \left( \|X^{(0)} - A^+CB^+\|_F^2 + \dfrac{\eta}{\|B\|_F^2}\,\|Y^{(0)} - A^+C\|_F^2 \right)\rho^k,$
where $\rho = \max\{\rho_1, \rho_2\}$ and $\eta = \begin{cases} \dfrac{\rho_1}{|\rho_1 - \rho_2|}, & \text{if } \rho_1 \ne \rho_2, \\ k, & \text{if } \rho_1 = \rho_2. \end{cases}$
Proof. 
Let X ( k ) denote the kth iteration of the randomized Kaczmarz method (6) and let H ^ j be the solution space chosen in the ( k + 1 ) th iteration. Then, X ( k + 1 ) is the orthogonal projection of X ( k ) onto H ^ j . Let X ˜ ( k + 1 ) denote the orthogonal projection of X ( k ) onto H ˜ j . Using (6) and Lemma 5, we have
X ( k + 1 ) = X ( k ) + Y : , j ( k + 1 ) X ( k ) B : , j B : , j 2 2 B : , j T = X ( k ) + Y : , j X ( k ) B : , j B : , j 2 2 B : , j T + Y : , j ( k + 1 ) Y : , j B : , j 2 2 B : , j T = X ˜ ( k + 1 ) + Y : , j ( k + 1 ) Y : , j B : , j 2 2 B : , j T .
Then,
X ( k + 1 ) X ˜ ( k + 1 ) , X ˜ ( k + 1 ) X F = Y : , j ( k + 1 ) Y : , j B : , j 2 2 B : , j T , X ˜ ( k + 1 ) X F = trace B : , j ( Y : , j ( k + 1 ) Y : , j ) T B : , j 2 2 ( X ˜ ( k + 1 ) X ) = trace ( X ˜ ( k + 1 ) X ) B : , j ( Y : , j ( k + 1 ) Y : , j ) T B : , j 2 2 ( by trace ( M N ) = trace ( N M ) for any matrices M , N ) = 0 ( by X ˜ ( k + 1 ) B : , j = Y : , j , X B : , j = Y : , j )
and
X ( k + 1 ) X ˜ ( k + 1 ) F 2 = Y : , j ( k + 1 ) Y : , j B : , j 2 2 B : , j T F 2 = 1 B : , j 2 4 trace ( B : , j ( Y : , j ( k + 1 ) Y : , j ) T ( Y : , j ( k + 1 ) Y : , j ) B : , j T ) = Y : , j ( k + 1 ) Y : , j 2 2 B : , j 2 4 trace ( B : , j B : , j T ) = Y : , j ( k + 1 ) Y : , j 2 2 B : , j 2 2 .
Therefore,
X ( k + 1 ) X F 2 = X ( k + 1 ) X ˜ ( k + 1 ) F 2 + X ˜ ( k + 1 ) X F 2 = X ˜ ( k + 1 ) X F 2 + Y : , j ( k + 1 ) Y : , j 2 2 B : , j 2 2 .
By taking the conditional expectation on both side of this equality, we can obtain
E k X ( k + 1 ) X F 2 = E k X ˜ ( k + 1 ) X F 2 + E k Y : , j ( k + 1 ) Y : , j 2 2 B : , j 2 2 .
Next, we provide estimates for the first and second parts of the right-hand side of the equality in (9). If $(X^{(0)}_{i,:})^T \in R(B)$, $i = 1, \ldots, p$, then $((X^{(0)} - A^+CB^+)_{i,:})^T \in R(B)$, $i = 1, \ldots, p$. It is easy to show by induction on (6) that $((X^{(k)} - A^+CB^+)_{i,:})^T \in R(B)$, $i = 1, \ldots, p$. Then, per Lemma 4, we have
E k X ˜ ( k + 1 ) A + C B + F 2 ρ 2 X ( k ) A + C B + F 2 .
For the second part of the right-hand side of (9), we have
E k Y : , j ( k + 1 ) Y : , j 2 2 B : , j 2 2 = E k i E k j Y : , j ( k + 1 ) Y : , j 2 2 B : , j 2 2 = E k i j = 1 n 1 B F 2 Y : , j ( k + 1 ) Y : , j 2 2 = 1 B F 2 E k i Y ( k + 1 ) Y F 2 = 1 B F 2 E k Y ( k + 1 ) Y F 2 .
Substituting (10) and (11) into (9), we can obtain
E k X ( k + 1 ) A + C B + F 2 ρ 2 X ( k ) A + C B + F 2 + 1 B F 2 E k Y ( k + 1 ) A + C F 2 ,
Then, applying this recursive relation iteratively and taking the full expectation, we have
E X ( k + 1 ) A + C B + F 2 ρ 2 E X ( k ) A + C B + F 2 + 1 B F 2 E Y ( k + 1 ) A + C F 2 ρ 2 ρ 2 E X ( k 1 ) A + C B + F 2 + 1 B F 2 ρ 1 k Y ( 0 ) A + C F 2 + 1 B F 2 ρ 1 k + 1 Y ( 0 ) A + C F 2 ( by Lemma   3 ) ρ 2 k + 1 X ( 0 ) A + C B + F 2 + j = 0 k ρ 1 j + 1 ρ 2 k j B F 2 Y ( 0 ) A + C F 2 .
If $\rho_1 < \rho_2$, then
$\sum_{j=0}^{k} \rho_1^{j+1}\rho_2^{k-j} = \rho_2^{k+1}\sum_{j=0}^{k}\left(\dfrac{\rho_1}{\rho_2}\right)^{j+1} \le \dfrac{\rho_1}{\rho_2 - \rho_1}\,\rho_2^{k+1}.$
If $\rho_1 > \rho_2$, then
$\sum_{j=0}^{k} \rho_1^{j+1}\rho_2^{k-j} = \rho_1^{k+1}\sum_{i=0}^{k}\left(\dfrac{\rho_2}{\rho_1}\right)^{i} \le \dfrac{\rho_1}{\rho_1 - \rho_2}\,\rho_1^{k+1}.$
Setting $\rho = \max\{\rho_1, \rho_2\}$, (12) then becomes
$\mathbb{E}\,\|X^{(k+1)} - A^+CB^+\|_F^2 \le \left( \|X^{(0)} - A^+CB^+\|_F^2 + \dfrac{\rho_1}{|\rho_1 - \rho_2|}\,\dfrac{\|Y^{(0)} - A^+C\|_F^2}{\|B\|_F^2} \right)\rho^{k+1}.$
If $\rho_1 = \rho_2$, then $\sum_{j=0}^{k} \rho_1^{j+1}\rho_2^{k-j} = (k+1)\rho^{k+1}$. Therefore, (12) becomes
$\mathbb{E}\,\|X^{(k+1)} - A^+CB^+\|_F^2 \le \left( \|X^{(0)} - A^+CB^+\|_F^2 + (k+1)\,\dfrac{\|Y^{(0)} - A^+C\|_F^2}{\|B\|_F^2} \right)\rho^{k+1}.$
This completes the proof.    □
Remark 1.
Algorithm 1 has the advantage that X ( k ) and Y ( k ) can be iteratively solved alternately at each step; alternatively, the approximate value of Y = A + C can first be obtained iteratively, then the approximate value of X = A + C B + can be solved iteratively.
Generally, if we take $X^{(0)} = 0 \in \mathbb{R}^{p \times q}$ and $Y^{(0)} = 0 \in \mathbb{R}^{p \times n}$, then the initial conditions are all satisfied ($0 \in R(A^T)$, $0 \in R(B)$).

3. Gauss–Seidel Method for Inconsistent Case

If the matrix equation in (1) is inconsistent, then there is no solution to the equation. Considering the least squares solution of the matrix equation in (1), it is obvious that $X_\star = A^+CB^+$ is the unique minimal F-norm least squares solution of the matrix equation in (1); that is,
$X_\star = A^+CB^+ = \arg\min\{\,\|X\|_F : X \in \arg\min_{X \in \mathbb{R}^{p \times q}} \|AXB - C\|_F\,\}.$
If A possesses full column rank and B possesses full row rank, then the matrix equation in (1) has a unique least squares solution $X_\star = (A^TA)^{-1}A^TCB^T(BB^T)^{-1}$. In general, the matrix equation in (1) has multiple least squares solutions. Here, we assume that A has no column that is all zeros and B has no row that is all zeros. Next, we find $X_\star$ with the Gauss–Seidel (or coordinate descent) method.
When a linear system of equations $Ax = b$ with $A \in \mathbb{R}^{m \times p}$ and $r(A) = p$ ($p \le m$) is inconsistent, the RGS (RCD) method [21] below is a very effective method for computing its least squares solution:
$\alpha_k = \dfrac{A_{:,j}^T r^{(k)}}{\|A_{:,j}\|_2^2}, \quad x_j^{(k+1)} = x_j^{(k)} + \alpha_k, \quad r^{(k+1)} = r^{(k)} - \alpha_k A_{:,j}, \quad \hat{p}_j(A) = \dfrac{\|A_{:,j}\|_2^2}{\|A\|_F^2},$    (13)
where $x^{(0)} \in \mathbb{R}^p$ is arbitrary and $r^{(0)} = b - Ax^{(0)} \in \mathbb{R}^m$. Applying n copies of this formula simultaneously to solve $AY_{:,l} = C_{:,l}$, $l = 1, \ldots, n$, we obtain
$W^{(k)} = \dfrac{A_{:,j}^T R^{(k)}}{\|A_{:,j}\|_2^2}, \quad Y_{j,:}^{(k+1)} = Y_{j,:}^{(k)} + W^{(k)}, \quad R^{(k+1)} = R^{(k)} - A_{:,j}W^{(k)}, \quad \hat{p}_j(A) = \dfrac{\|A_{:,j}\|_2^2}{\|A\|_F^2},$    (14)
where $Y^{(0)} \in \mathbb{R}^{p \times n}$ and $R^{(0)} = C - AY^{(0)}$. This is a column projection method for finding the least squares solution of $AY = C$; the cost of each iteration is $4mn + n$ flops if the squared column norms of A are precomputed in advance.
Similarly, we can find the least squares solution of $B^TX^T = (Y^{(k+1)})^T$ using the RGS method:
$U^{(k)} = \dfrac{E^{(k)}B_{i,:}^T}{\|B_{i,:}\|_2^2}, \quad X_{:,i}^{(k+1)} = X_{:,i}^{(k)} + U^{(k)}, \quad E^{(k+1)} = E^{(k)} - U^{(k)}B_{i,:} + I_{:,i}W^{(k)}, \quad p_i(B) = \dfrac{\|B_{i,:}\|_2^2}{\|B\|_F^2},$    (15)
where $X^{(0)} \in \mathbb{R}^{p \times q}$ and $E^{(0)} = Y^{(1)} - X^{(0)}B$. This is a row projection method; the cost of each iteration is $4np + n + p$ flops if the squared row norms of B are precomputed in advance.
With (14) and (15), we can obtain an RGS method for solving (1) as follows, which is called the IME-RGS algorithm (see Algorithm 2).
Algorithm 2 RGS method for inconsistent matrix equation A X B = C (IME-RGS)
Input: $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, $C \in \mathbb{R}^{m \times n}$, $X^{(0)} \in \mathbb{R}^{p \times q}$, $Y^{(0)} \in \mathbb{R}^{p \times n}$, $R^{(0)} = C - AY^{(0)}$, $E^{(-1)} = 0 \in \mathbb{R}^{p \times n}$, K
1: For $j = 1 : p$, $M(j) = \|A_{:,j}\|_2^2$
2: For $i = 1 : q$, $N(i) = \|B_{i,:}\|_2^2$
3: for $k = 0, 1, 2, \ldots, K-1$ do
4:   Pick j with probability $\hat{p}_j(A) = \|A_{:,j}\|_2^2 / \|A\|_F^2$ and i with probability $p_i(B) = \|B_{i,:}\|_2^2 / \|B\|_F^2$
5:   Compute $W^{(k)} = \dfrac{A_{:,j}^TR^{(k)}}{M(j)}$, $Y_{j,:}^{(k+1)} = Y_{j,:}^{(k)} + W^{(k)}$, $R^{(k+1)} = R^{(k)} - A_{:,j}W^{(k)}$, $E_{j,:}^{(k)} = E_{j,:}^{(k-1)} + W^{(k)}$
6:   Compute $U^{(k)} = \dfrac{E^{(k)}B_{i,:}^T}{N(i)}$, $X_{:,i}^{(k+1)} = X_{:,i}^{(k)} + U^{(k)}$, $E^{(k+1)} = E^{(k)} - U^{(k)}B_{i,:}$
7: end for
8: Output $X^{(K)}$
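The steps of Algorithm 2 can be sketched in MATLAB as follows. This is only an illustration (the function name ime_rgs and the cumulative-sum sampler are our own illustrative choices); the residual matrices R and E play the roles of R^(k) and E^(k) in the algorithm:

function X = ime_rgs(A, B, C, K)
% Sketch of the IME-RGS iteration: randomized Gauss-Seidel for inconsistent A*X*B = C.
[m, p] = size(A); [q, n] = size(B);
X = zeros(p, q); Y = zeros(p, n);
R = C - A*Y;                         % residual of A*Y = C
E = zeros(p, n);                     % tracks Y - X*B
colA = sum(A.^2, 1); rowB = sum(B.^2, 2);
pr_j = colA / sum(colA); pr_i = rowB / sum(rowB);
for k = 1:K
    j = find(cumsum(pr_j) >= rand, 1);
    i = find(cumsum(pr_i) >= rand, 1);
    W = A(:,j)' * R / colA(j);       % step 5: coordinate update for A*Y = C
    Y(j,:) = Y(j,:) + W;
    R = R - A(:,j) * W;
    E(j,:) = E(j,:) + W;             % keep E = Y - X*B up to date
    U = E * B(i,:)' / rowB(i);       % step 6: coordinate update for X*B = Y
    X(:,i) = X(:,i) + U;
    E = E - U * B(i,:);
end
end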
In order to prove the convergence of Algorithm 2, we provide the following Lemma 6, the proof of which can be found in [18].
Lemma 6
([18]). Let $Y_\star = A^+C$ and let the sequence $\{Y^{(k)}\}$ be generated by (14) starting from an initial matrix $Y^{(0)} \in \mathbb{R}^{p \times n}$; then, the following holds:
$\mathbb{E}\,\|AY^{(k)} - AA^+C\|_F^2 \le \rho_1^k\,\|AY^{(0)} - AA^+C\|_F^2,$
where the jth column of A is selected with probability $\hat{p}_j(A) = \|A_{:,j}\|_2^2 / \|A\|_F^2$.
Lemma 7.
Let $X_\star = A^+CB^+$ and let $\tilde{X}$ be generated by running a one-step RGS update for solving the matrix equation $B^TX^T = (Y_\star)^T$ starting from any matrix $\hat{X} \in \mathbb{R}^{p \times q}$. Then, it holds that
$\mathbb{E}\,[\|A(\tilde{X} - X_\star)B\|_F^2] \le \rho_2\,\|A(\hat{X} - X_\star)B\|_F^2,$
where the ith row of B is selected with probability $p_i(B) = \|B_{i,:}\|_2^2 / \|B\|_F^2$.
Proof. 
From the definition of the coordinate descent update for $B^TX^T = (Y_\star)^T$, we have
$\tilde{X} = \hat{X} + \dfrac{(Y_\star - \hat{X}B)B_{i,:}^T}{\|B_{i,:}\|_2^2}\,I_{i,:},$
which yields $A\tilde{X}B = A\hat{X}B + \dfrac{1}{\|B_{i,:}\|_2^2}\,A(Y_\star - \hat{X}B)B_{i,:}^TB_{i,:}$. Using the projection relation satisfied by coordinate descent, $B_{i,:}B^T(\tilde{X})^T = B_{i,:}(Y_\star)^T$, and the properties of the MP generalized inverse, we have
$\tilde{X}BB_{i,:}^T = Y_\star B_{i,:}^T, \qquad X_\star BB^T = Y_\star B^T, \qquad A^TAX_\star BB^T = A^TCB^T.$
Then,
A ( X ˜ X ^ ) B , A ( X ˜ X ) B F = 1 B i , : 2 2 A ( Y X ^ B ) B i , : T B i , : , A ( X ˜ X ) B F = 1 B i , : 2 2 trace ( B i , : T B i , : ( Y X ^ B ) T A T A ( X ˜ X ) B ) = 1 B i , : 2 2 trace ( A ( X ˜ X ) B B i , : T B i , : ( Y X ^ B ) T A T ) = 0 ( by X ˜ B B i , : T = X B B i , : T )
and
A ( X ˜ X ^ ) B F 2 = 1 B i , : 2 4 A ( Y X ^ B ) B i , : T B i , : F 2 = 1 B i , : 2 4 trace ( B i , : T B i , : ( Y X ^ B ) T A T A ( Y X ^ B ) B i , : T B i , : ) = 1 B i , : 2 4 trace ( A ( Y X ^ B ) B i , : T B i , : B i , : T B i , : ( Y X ^ B ) T A T ) = 1 B i , : 2 2 trace ( A ( Y X ^ B ) B i , : T B i , : ( Y X ^ B ) T A T ) = A ( Y X ^ B ) B i , : T 2 2 B i , : 2 2 = A ( X X ^ ) B B i , : T 2 2 B i , : 2 2 .
Therefore,
$\|A(\tilde{X} - X_\star)B\|_F^2 = \|A(\hat{X} - X_\star)B\|_F^2 - \|A(\tilde{X} - \hat{X})B\|_F^2 = \|A(\hat{X} - X_\star)B\|_F^2 - \dfrac{\|A(X_\star - \hat{X})BB_{i,:}^T\|_2^2}{\|B_{i,:}\|_2^2}.$    (18)
By taking the expectation on both sides of (18), we can obtain
E [ A ( X ˜ X ) B F 2 ] = E A ( X ^ X ) B F 2 A ( X X ^ ) B B i , : T 2 2 B i , : 2 2 = A ( X ^ X ) B F 2 i = 1 q B i , : 2 2 B F 2 A ( X X ^ ) B B i , : T 2 2 B i , : 2 2 = A ( X ^ X ) B F 2 A ( X X ^ ) B B T F 2 B F 2 = A ( X ^ X ) B F 2 B ( A ( X X ^ ) B ) T F 2 B F 2 A ( X ^ X ) B F 2 σ min 2 ( B ) B F 2 ( A ( X X ^ ) B ) T F 2 = 1 σ min 2 ( B ) B F 2 A ( X X ^ ) B F 2 .
The inequality follows from Lemma 2 because all columns of $(A(X_\star - \hat{X})B)^T$ lie in the range of $B^T$. This completes the proof.    □
Theorem 2.
Let $\{X^{(k)}\}$ denote the sequence generated by Algorithm 2 for the inconsistent matrix equation in (1) starting from any initial matrices $X^{(0)} \in \mathbb{R}^{p \times q}$ and $Y^{(0)} \in \mathbb{R}^{p \times n}$. In exact arithmetic, it holds that
$\mathbb{E}\,[\|AX^{(k)}B - AA^+CB^+B\|_F^2] \le \left( \|AX^{(0)}B - AA^+CB^+B\|_F^2 + \eta\,\|AY^{(0)} - AA^+C\|_F^2 \right)\rho^k.$
Proof. 
Let $X^{(k)}$ denote the kth iterate of the RGS method in (15) applied to $B^TX^T = (Y^{(k+1)})^T$ and let $\tilde{X}^{(k+1)}$ be the one-step RGS iterate for $B^TX^T = (Y_\star)^T$ starting from $X^{(k)}$; then, we have
$X^{(k+1)} = X^{(k)} + \dfrac{(Y^{(k+1)} - X^{(k)}B)B_{i,:}^T}{\|B_{i,:}\|_2^2}\,I_{i,:}, \qquad \tilde{X}^{(k+1)} = X^{(k)} + \dfrac{(Y_\star - X^{(k)}B)B_{i,:}^T}{\|B_{i,:}\|_2^2}\,I_{i,:}.$
Then,
A ( X ( k + 1 ) X ˜ ( k + 1 ) ) B , A ( X ˜ ( k + 1 ) X ) B F = 1 B i , : 2 2 A ( Y ( k + 1 ) Y ) B i , : T B i , : , A ( X ˜ ( k + 1 ) X ) B F = 1 B i , : 2 2 trace ( B i , : T B i , : ( Y ( k + 1 ) Y ) T A T A ( X ˜ ( k + 1 ) X ) B ) = 1 B i , : 2 2 trace ( A ( X ˜ ( k + 1 ) X ) B B i , : T B i , : ( Y ( k + 1 ) Y ) T A T ) = 0 ( by X ˜ ( k + 1 ) B B i , : T = Y B i , : T , X B B i , : T = Y B i , : T )
and
A ( X ( k + 1 ) X ˜ ( k + 1 ) ) B F 2 = A ( Y ( k + 1 ) Y ) B i , : T B i , : B i , : 2 2 F 2 = 1 B i , : 2 4 trace ( B i , : T B i , : ( Y ( k + 1 ) Y ) T A T A ( Y ( k + 1 ) Y ) B i , : T B i , : ) = 1 B i , : 2 4 trace ( A ( Y ( k + 1 ) Y ) B i , : T B i , : B i , : T B i , : ( Y ( k + 1 ) Y ) T A T ) = A ( Y ( k + 1 ) Y ) B i , : T 2 2 B i , : 2 2 A ( Y ( k + 1 ) Y ) F 2 .
Therefore,
$\|A(X^{(k+1)} - X_\star)B\|_F^2 = \|A(X^{(k+1)} - \tilde{X}^{(k+1)})B\|_F^2 + \|A(\tilde{X}^{(k+1)} - X_\star)B\|_F^2 \le \|A(Y^{(k+1)} - Y_\star)\|_F^2 + \|A(\tilde{X}^{(k+1)} - X_\star)B\|_F^2.$    (19)
By taking the conditional expectation on both sides of (19), we can obtain
E k A ( X ( k + 1 ) X ) B F 2 E k A ( X ˜ k + 1 X ) B F 2 + E k A ( Y ( k + 1 ) Y ) F 2 ρ 2 A ( X ( k ) X ) B F 2 + ρ 1 A ( Y ( k ) Y ) F 2 .
The last inequality is obtained by Lemmas 6 and 7. Applying this recursive relation iteratively, we have
E A ( X ( k + 1 ) X ) B F 2 ρ 2 E A ( X ( k ) X ) B F 2 + ρ 1 E A Y ( k ) A Y F 2 ρ 2 2 E A ( X ( k 1 ) X ) B F 2 + ρ 1 ( ρ 1 k 1 ρ 2 + ρ 1 k ) A Y ( 0 ) A Y F 2 ρ 2 k + 1 A ( X ( 0 ) X ) B F 2 + i = 0 k ρ 1 i + 1 ρ 2 k i A Y ( 0 ) A Y F 2 ( A ( X ( 0 ) X ) B F 2 + η A Y ( 0 ) A Y F 2 ) ρ k + 1 ,
where η and ρ are defined in Theorem 1. This completes the proof.    □
Remark 2.
If A has full column rank and B has full row rank, Theorem 2 implies that $X^{(k)}$ converges linearly in expectation to $A^+CB^+$. If A does not have full column rank or B does not have full row rank, Algorithm 2 fails to converge (see Section 3.3 of Ma et al. [25]).
Remark 3.
If $Y^{(0)} = X^{(0)}B$, then
$\|AY^{(0)} - AY_\star\|_F^2 = \|AX^{(0)}B - AA^+CB^+B - AA^+C(I - B^+B)\|_F^2 = \|A(X^{(0)} - A^+CB^+)B\|_F^2 + \|AA^+C - AA^+CB^+B\|_F^2 - 2\,\langle A(X^{(0)} - A^+CB^+)B,\; AA^+C - AA^+CB^+B \rangle_F.$
It follows from
A ( X ( 0 ) A + C B + ) B , A A + C A A + C B + B F = trace ( B T ( X ( 0 ) A + C B + ) T A T ( A A + C A A + C B + B ) ) = trace ( A T ( A A + C A A + C B + B ) B T ( X ( 0 ) A + C B + ) T ) = trace ( A T A A + C B T A T A A + C B + B B T ) ( X ( 0 ) A + C B + ) T ) = trace ( A T C B T A T C B T ) ( X ( 0 ) A + C B + ) T ) = 0
that
$\|AY^{(0)} - AY_\star\|_F^2 = \|A(X^{(0)} - A^+CB^+)B\|_F^2 + \|AA^+C - AA^+CB^+B\|_F^2.$
Substituting this equality into (20), we have
$\mathbb{E}\,\|A(X^{(k+1)} - X_\star)B\|_F^2 \le \left( (1+\eta)\,\|AX^{(0)}B - AA^+CB^+B\|_F^2 + \eta\,\|AA^+C - AA^+CB^+B\|_F^2 \right)\rho^{k+1}.$    (21)
Remark 4.
If the matrix equation in (1) is consistent, then $AA^+C = AA^+CB^+B = C$; therefore, (21) becomes
$\mathbb{E}\,[\|AX^{(k)}B - C\|_F^2] \le (1+\eta)\,\rho^k\,\|AX^{(0)}B - C\|_F^2,$
that is, $AX^{(k)}B$ converges to C in expectation (although $X^{(k)}$ itself does not necessarily converge).
Remark 5.
In Algorithm 2, $X^{(k)}$ and $Y^{(k)}$ can be solved alternately at each step; alternatively, the approximate value of $Y_\star = A^+C$ can first be obtained iteratively, and then the approximate value of $X_\star = A^+CB^+$ can be computed iteratively. Using Lemmas 6 and 7, we can obtain similar convergence results; we omit the proof for the sake of conciseness.

4. Extended Kaczmarz Method and Extended Gauss–Seidel Method for AXB = C

When the matrix equation in (1) is inconsistent and either matrix A or matrix B is not of full rank, we can consider the REK or REGS method for solving the matrix equation in (1) using the ideas of [23,24,25].

4.1. AXB = C Inconsistent, A Not Full Rank, and B Full Column Rank (q ≥ n)

The matrix equation $AY = C$ is solved by the REK method [23,24], while the matrix equation $XB = Y$ is solved by the RK method [20], because $XB = Y$ always has a solution ($B^T$ has full row rank). For this case, we use the REK-RK method to solve $AXB = C$, which is called the IME-REKRK algorithm (see Algorithm 3). The cost of each iteration of this method is $4mn + 4np + 4pq + m + 2p$ flops ($4mn - n + m$ for step 6, $4np + p + n$ for step 7, and $4pq + p$ for step 8) if the squared row and column norms of A and the squared column norms of B are precomputed in advance.
Algorithm 3 REK-RK method for inconsistent matrix equation A X B = C (IME-REKRK)
Input: $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, $C \in \mathbb{R}^{m \times n}$, $X^{(0)} = 0 \in \mathbb{R}^{p \times q}$, $Y^{(0)} = 0 \in \mathbb{R}^{p \times n}$, $Z^{(0)} = C$, K
1: For $i = 1 : m$, $M(i) = \|A_{i,:}\|_2^2$
2: For $j = 1 : p$, $N(j) = \|A_{:,j}\|_2^2$
3: For $l = 1 : n$, $T(l) = \|B_{:,l}\|_2^2$
4: for $k = 0, 1, \ldots, K-1$ do
5:   Pick i with probability $p_i(A) = \|A_{i,:}\|_2^2 / \|A\|_F^2$, j with probability $\hat{p}_j(A) = \|A_{:,j}\|_2^2 / \|A\|_F^2$, and l with probability $\hat{p}_l(B) = \|B_{:,l}\|_2^2 / \|B\|_F^2$
6:   Compute $Z^{(k+1)} = Z^{(k)} - \dfrac{A_{:,j}}{N(j)}\,A_{:,j}^TZ^{(k)}$
7:   Compute $Y^{(k+1)} = Y^{(k)} + \dfrac{A_{i,:}^T}{M(i)}\,(C_{i,:} - Z_{i,:}^{(k+1)} - A_{i,:}Y^{(k)})$
8:   Compute $X^{(k+1)} = X^{(k)} + \dfrac{Y_{:,l}^{(k+1)} - X^{(k)}B_{:,l}}{T(l)}\,B_{:,l}^T$
9: end for
10: Output $X^{(K)}$
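For illustration, the three updates of Algorithm 3 can be written in MATLAB as in the sketch below (the function name ime_rekrk and the sampling implementation are our own illustrative choices; Z is initialized to C as in the algorithm):

function X = ime_rekrk(A, B, C, K)
% Sketch of IME-REKRK: REK applied to A*Y = C combined with RK applied to X*B = Y.
[m, p] = size(A); [q, n] = size(B);
X = zeros(p, q); Y = zeros(p, n); Z = C;
rowA = sum(A.^2, 2); colA = sum(A.^2, 1); colB = sum(B.^2, 1);
pr_i = rowA / sum(rowA); pr_j = colA / sum(colA); pr_l = colB / sum(colB);
for k = 1:K
    i = find(cumsum(pr_i) >= rand, 1);
    j = find(cumsum(pr_j) >= rand, 1);
    l = find(cumsum(pr_l) >= rand, 1);
    Z = Z - A(:,j) * (A(:,j)' * Z) / colA(j);                    % step 6
    Y = Y + A(i,:)' * ((C(i,:) - Z(i,:) - A(i,:)*Y) / rowA(i));  % step 7
    X = X + ((Y(:,l) - X*B(:,l)) / colB(l)) * B(:,l)';           % step 8
end
end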
Lemma 8.
Let $A \in \mathbb{R}^{m \times p}$ and $Y \in \mathbb{R}^{p \times n}$; then, it holds that
$\sum_{i=1}^{m} \dfrac{\|A_{i,:}\|_2^2}{\|A\|_F^2}\,\left\| \left(I - \dfrac{A_{i,:}^TA_{i,:}}{\|A_{i,:}\|_2^2}\right)Y \right\|_F^2 = \|Y\|_F^2 - \dfrac{\|AY\|_F^2}{\|A\|_F^2}.$
Proof. 
Because $\|A\|_F^2 = \sum_{i=1}^{m}\|A_{i,:}\|_2^2 = \sum_{j=1}^{p}\|A_{:,j}\|_2^2$ and $\left(I - \dfrac{A_{i,:}^TA_{i,:}}{\|A_{i,:}\|_2^2}\right)^2 = I - \dfrac{A_{i,:}^TA_{i,:}}{\|A_{i,:}\|_2^2}$, we have
i = 1 m A i , : 2 2 A F 2 I A i , : T A i , : A i , : 2 2 Y F 2 = i = 1 m A i , : 2 2 A F 2 j = 1 n I A i , : T A i , : A i , : 2 2 Y : , j 2 2 = i = 1 m A i , : 2 2 A F 2 j = 1 n Y : , j T I A i , : T A i , : A i , : 2 2 2 Y : , j = j = 1 n Y : , j T i = 1 m A i , : 2 2 A F 2 I A i , : T A i , : A i , : 2 2 Y : , j = j = 1 n Y : , j T I A T A A F 2 Y : , j = Y F 2 A Y F 2 A F 2 .
This completes the proof.    □
Similar to the proof of Lemma 3, we can prove the following Lemma 9.
Lemma 9.
Let $Z_\star = (I - AA^+)C$, and let $\{Z^{(k)}\}$ denote the kth iterate of RK applied to $A^TZ = 0$ with the initial guess $Z^{(0)} \in \mathbb{R}^{m \times n}$. If $Z^{(0)}_{:,j} \in C_{:,j} + R(A)$, $j = 1, \ldots, n$, then $Z^{(k)}$ converges linearly to $(I - AA^+)C$ in mean square form. Moreover, the solution error in expectation for the iteration sequence $Z^{(k)}$ obeys
$\mathbb{E}\,[\|Z^{(k)} - (I - AA^+)C\|_F^2] \le \rho_1^k\,\|Z^{(0)} - (I - AA^+)C\|_F^2,$
where the jth column of A is selected with probability $\hat{p}_j(A) = \|A_{:,j}\|_2^2 / \|A\|_F^2$.
Lemma 10.
The sequence $\{Y^{(k)}\}$ is generated by the REK method for $AY = C$ starting from the initial matrix $Y^{(0)} \in \mathbb{R}^{p \times n}$ with $Y^{(0)}_{:,j} \in R(A^T)$, $j = 1, \ldots, n$, and the initial guess $Z^{(0)} \in \mathbb{R}^{m \times n}$ with $Z^{(0)}_{:,j} \in C_{:,j} + R(A)$, $j = 1, \ldots, n$. In exact arithmetic, it holds that
$\mathbb{E}\,[\|Y^{(k)} - A^+C\|_F^2] \le \dfrac{k\,\rho_1^k}{\|A\|_F^2}\,\|Z^{(0)} - (I - AA^+)C\|_F^2 + \rho_1^k\,\|Y^{(0)} - A^+C\|_F^2,$
where the ith row of A is selected with probability $p_i(A) = \|A_{i,:}\|_2^2 / \|A\|_F^2$ and the jth column of A is selected with probability $\hat{p}_j(A) = \|A_{:,j}\|_2^2 / \|A\|_F^2$.
Proof. 
Let $Y^{(k)}$ denote the kth iterate of the REK method for $AY = C$ and let $\tilde{Y}^{(k+1)}$ be the one-step Kaczmarz update for the matrix equation $AY = AA^+C$ from $Y^{(k)}$, i.e.,
$\tilde{Y}^{(k+1)} = Y^{(k)} + \dfrac{A_{i,:}^T}{\|A_{i,:}\|_2^2}\,(A_{i,:}A^+C - A_{i,:}Y^{(k)}).$
We have
$\tilde{Y}^{(k+1)} - A^+C = Y^{(k)} - A^+C + \dfrac{A_{i,:}^TA_{i,:}}{\|A_{i,:}\|_2^2}\,(A^+C - Y^{(k)}) = \left(I - \dfrac{A_{i,:}^TA_{i,:}}{\|A_{i,:}\|_2^2}\right)(Y^{(k)} - A^+C)$
and
$Y^{(k+1)} - \tilde{Y}^{(k+1)} = \dfrac{A_{i,:}^T}{\|A_{i,:}\|_2^2}\,(C_{i,:} - Z_{i,:}^{(k+1)} - A_{i,:}A^+C).$
It follows from
Y ˜ ( k + 1 ) A + C , Y ( k + 1 ) Y ˜ ( k + 1 ) F = I A i , : T A i , : A i , : 2 2 ( Y ( k ) A + C ) , A i , : T A i , : 2 2 ( C i , : Z i , : ( k + 1 ) A i , : A + C ) F = trace ( Y ( k ) A + C ) T I A i , : T A i , : A i , : 2 2 A i , : T A i , : 2 2 ( C i , : Z i , : ( k + 1 ) A i , : A + C ) = 0 ( by I A i , : T A i , : A i , : 2 2 A i , : T A i , : 2 2 = 0 )
and
Y ( k + 1 ) Y ˜ ( k + 1 ) F 2 = A i , : T A i , : 2 2 ( C i , : Z i , : ( k + 1 ) A i , : A + C ) F 2 = trace ( C i , : Z i , : ( k + 1 ) A i , : A + C ) T A i , : A i , : 2 2 A i , : T A i , : 2 2 ( C i , : Z i , : ( k + 1 ) A i , : A + C ) = C i , : Z i , : ( k + 1 ) A i , : A + C 2 2 A i , : 2 2
that
Y ( k + 1 ) A + C F 2 = Y ( k + 1 ) Y ˜ ( k + 1 ) F 2 + Y ˜ ( k + 1 ) A + C F 2 = C i , : Z i , : ( k + 1 ) A i , : A + C 2 2 A i , : 2 2 + Y ˜ ( k + 1 ) A + C F 2 .
By taking the conditional expectation on the both sides of this equality, we have
E k Y ( k + 1 ) A + C F 2 = E k C i , : Z i , : ( k + 1 ) A i , : A + C 2 2 A i , : 2 2 + E k Y ˜ ( k + 1 ) A + C F 2 .
Next, we provide the estimates for the two parts of the right-hand side of (24). It follows from
E k C i , : Z i , : ( k + 1 ) A i , : A + C 2 2 A i , : 2 2 = E k j E k i C i , : Z i , : ( k + 1 ) A i , : A + C 2 2 A i , : 2 2 = E k j 1 A F 2 i = 1 m C i , : Z i , : ( k + 1 ) A i , : A + C 2 2 = 1 A F 2 E k j C Z ( k + 1 ) A A + C F 2 = 1 A F 2 E k Z ( k + 1 ) ( I A A + ) C F 2
that
E C i , : Z i , : ( k + 1 ) A i , : A + C 2 2 A i , : 2 2 = 1 A F 2 E Z ( k + 1 ) ( I A A + ) C F 2 ρ 1 k + 1 A F 2 Z ( 0 ) ( I A A + ) C F 2 ( by Lemma   9 ) .
By Y : , j ( 0 ) R ( A T ) and ( A + C ) : , j R ( A T ) , j = 1 , , n , we have ( Y ( 0 ) A + C ) : , j R ( A T ) , j = 1 , , n . Then, by Z : , j ( 0 ) C : , j + R ( A ) , it is easy to show that Z : , j ( k ) C : , j + R ( A ) and ( Y ( k ) A + C ) : , j R ( A T ) , j = 1 , , n by induction. It follows from
E k [ Y ˜ ( k + 1 ) A + C F 2 ] = E k i I A i , : T A i , : A i , : 2 2 ( Y ( k ) A + C ) F 2 = i = 1 m A i , : 2 2 A F 2 I A i , : T A i , : A i , : 2 2 ( Y ( k ) A + C ) F 2 = Y ( k ) A + C F 2 A ( Y ( k ) A + C ) F 2 A F 2 ( by Lemma   8 ) Y ( k ) A + C F 2 σ min 2 ( A ) A F 2 Y ( k ) A + C F 2 ( by Lemma   2 ) = ρ 1 Y ( k ) A + C F 2
that
E Y ˜ ( k + 1 ) A + C F 2 ρ 1 E Y ( k ) A + C F 2 .
Combining (24), (25), and (26) yields
E Y ( k + 1 ) A + C F 2 = E C i , : Z i , : ( k + 1 ) A i , : A + C 2 2 A i , : 2 2 + E [ Y ˜ ( k + 1 ) A + C F 2 ] ρ 1 k + 1 A F 2 Z ( 0 ) ( I A A + ) C F 2 + ρ 1 E Y ( k ) A + C F 2 2 ρ 1 k + 1 A F 2 Z ( 0 ) ( I A A + ) C F 2 + ρ 1 2 E Y ( k 1 ) A + C F 2 ( k + 1 ) ρ 1 k + 1 A F 2 Z ( 0 ) ( I A A + ) C F 2 + ρ 1 k + 1 Y ( 0 ) A + C F 2 .
This completes the proof.    □
With these preparations, the convergence proof of Algorithm 3 is provided below.
Theorem 3.
Let $\{X^{(k)}\}$ denote the sequence generated by Algorithm 3 (B has full column rank) with the initial guess $X^{(0)} \in \mathbb{R}^{p \times q}$ satisfying $(X^{(0)}_{i,:})^T \in R(B)$, $i = 1, \ldots, p$. The sequence $\{Y^{(k)}\}$ is generated by the REK method for $AY = C$ starting from the initial matrix $Y^{(0)} \in \mathbb{R}^{p \times n}$ with $Y^{(0)}_{:,j} \in R(A^T)$ and $Z^{(0)} \in \mathbb{R}^{m \times n}$ with $Z^{(0)}_{:,j} \in C_{:,j} + R(A)$, $j = 1, \ldots, n$. In exact arithmetic, it holds that
$\mathbb{E}\,\|X^{(k)} - A^+CB^+\|_F^2 \le \left( \|X^{(0)} - A^+CB^+\|_F^2 + \dfrac{\eta}{\|B\|_F^2}\,\|Y^{(0)} - A^+C\|_F^2 + \dfrac{\gamma}{\|A\|_F^2\,\|B\|_F^2}\,\|Z^{(0)} - (I - AA^+)C\|_F^2 \right)\rho^k,$
where
$\gamma = \begin{cases} \dfrac{\rho_1\rho_2}{(\rho_2 - \rho_1)^2}, & \text{if } \rho_1 < \rho_2, \\ \dfrac{k\,\rho_1}{\rho_1 - \rho_2}, & \text{if } \rho_1 > \rho_2, \\ \dfrac{k(k+1)}{2}, & \text{if } \rho_1 = \rho_2. \end{cases}$
Proof. 
Similar to the proof of Theorem 1 using Lemma 10 and Lemma 4, we can obtain
E X ( k + 1 ) A + C B + F 2 ρ 2 E X ( k ) A + C B + F 2 + 1 B F 2 E Y ( k + 1 ) A + C F 2 ρ 2 E X ( k ) A + C B + F 2 + ( k + 1 ) ρ 1 k + 1 A F 2 B F 2 Z ( 0 ) ( I A A + ) C F 2 + ρ 1 k + 1 B F 2 Y ( 0 ) A + C F 2 ρ 2 2 E X ( k 1 ) A + C B + F 2 + ρ 1 k + 1 + ρ 1 k ρ 2 B F 2 Y ( 0 ) A + C F 2 + ( k + 1 ) ρ 1 k + 1 + k ρ 1 k ρ 2 A F 2 B F 2 Z ( 0 ) ( I A A + ) C F 2 ρ 2 k + 1 X ( 0 ) A + C B + F 2 + j = 0 k ρ 1 j + 1 ρ 2 k j B F 2 Y ( 0 ) A + C F 2 + j = 0 k ( j + 1 ) ρ 1 j + 1 ρ 2 k j A F 2 B F 2 Z ( 0 ) ( I A A + ) C F 2 .
If $\rho_1 < \rho_2$, then from the properties of geometric series we can obtain
$\sum_{j=0}^{k} (j+1)\,\rho_1^{j+1}\rho_2^{k-j} = \rho_2^{k+1}\sum_{j=0}^{k}(j+1)\left(\dfrac{\rho_1}{\rho_2}\right)^{j+1} \le \dfrac{\rho_1\rho_2}{(\rho_2 - \rho_1)^2}\,\rho_2^{k+1}.$
If $\rho_1 > \rho_2$, then
$\sum_{j=0}^{k} (j+1)\,\rho_1^{j+1}\rho_2^{k-j} = \rho_1^{k+1}\sum_{i=0}^{k}(k+1-i)\left(\dfrac{\rho_2}{\rho_1}\right)^{i} \le \dfrac{(k+1)\,\rho_1}{\rho_1 - \rho_2}\,\rho_1^{k+1}.$
If $\rho_1 = \rho_2$, then
$\sum_{j=0}^{k} (j+1)\,\rho_1^{j+1}\rho_2^{k-j} = \dfrac{(k+1)(k+2)}{2}\,\rho^{k+1}.$
Substituting the above inequalities into (27) completes the proof.    □

4.2. AXB = C Inconsistent, A Not Full Rank, and B Full Row Rank (q ≤ n)

The matrix equation $AY = C$ is solved by the REK method, while the matrix equation $XB = Y$ is solved by the RGS method, because $XB = Y$ has a unique least squares solution ($B^T$ has full column rank). For this case, we refer to the algorithm for solving $AXB = C$ as IME-REKRGS (see Algorithm 4). The cost of each iteration of this method is $4mn + 8np + m + n + 2p$ flops ($4mn - n + m$ for step 6, $4np + p + n$ for step 7, and $4np + n + p$ for step 8) if the squared row and column norms of A and the squared row norms of B are precomputed in advance.
Algorithm 4 REK-RGS method for inconsistent matrix equation A X B = C (IME-REKRGS)
Input: $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, $C \in \mathbb{R}^{m \times n}$, $X^{(0)} = 0 \in \mathbb{R}^{p \times q}$, $Y^{(0)} = 0 \in \mathbb{R}^{p \times n}$, $Z^{(0)} = C$, $E^{(0)} = 0 \in \mathbb{R}^{p \times n}$, K
1: For $i = 1 : m$, $M(i) = \|A_{i,:}\|_2^2$
2: For $j = 1 : p$, $N(j) = \|A_{:,j}\|_2^2$
3: For $l = 1 : q$, $T(l) = \|B_{l,:}\|_2^2$
4: for $k = 0, 1, \ldots, K-1$ do
5:   Pick i with probability $p_i(A) = \|A_{i,:}\|_2^2 / \|A\|_F^2$, j with probability $\hat{p}_j(A) = \|A_{:,j}\|_2^2 / \|A\|_F^2$, and l with probability $p_l(B) = \|B_{l,:}\|_2^2 / \|B\|_F^2$
6:   Compute $Z^{(k+1)} = Z^{(k)} - \dfrac{A_{:,j}}{N(j)}\,A_{:,j}^TZ^{(k)}$
7:   Compute $Y^{(k+1)} = Y^{(k)} + \dfrac{A_{i,:}^T}{M(i)}\,(C_{i,:} - Z_{i,:}^{(k+1)} - A_{i,:}Y^{(k)})$
8:   Compute $U^{(k)} = \dfrac{E^{(k)}B_{l,:}^T}{T(l)}$, $X_{:,l}^{(k+1)} = X_{:,l}^{(k)} + U^{(k)}$, $E^{(k+1)} = E^{(k)} - U^{(k)}B_{l,:}$
9: end for
10: Output $X^{(K)}$
Similarly, letting $Y = AX$, we can transform the equation $AXB = C$ into the system of equations composed of
$B^TY^T = C^T, \qquad AX = Y.$
The matrix equation $B^TY^T = C^T$ is solved by the RGS method because it has a unique least squares solution ($B^T$ has full column rank), while the matrix equation $AX = Y$ is solved by the REK method. For this case, we refer to the algorithm for solving $AXB = C$ as IME-RGSREK.
The above two methods can be seen as combinations of two separate algorithms; thus, we do not discuss them in detail. We only provide the convergence result for the IME-REKRGS method and omit the proof.
Theorem 4.
Let $\{X^{(k)}\}$ denote the sequence generated by the IME-REKRGS method (B has full row rank) with initial guess $X^{(0)} \in \mathbb{R}^{p \times q}$. The sequence $\{Y^{(k)}\}$ is generated by the REK method for $AY = C$ starting from the initial matrix $Y^{(0)} \in \mathbb{R}^{p \times n}$ with $Y^{(0)}_{:,j} \in R(A^T)$ and the initial guess $Z^{(0)} \in \mathbb{R}^{m \times n}$ with $Z^{(0)}_{:,j} \in C_{:,j} + R(A)$, $j = 1, \ldots, n$. In exact arithmetic, it holds that
$\mathbb{E}\,[\|(X^{(k)} - A^+CB^+)B\|_F^2] \le \left( \|(X^{(0)} - A^+CB^+)B\|_F^2 + \eta\,\|Y^{(0)} - A^+C\|_F^2 + \dfrac{\gamma}{\|A\|_F^2}\,\|Z^{(0)} - (I - AA^+)C\|_F^2 \right)\rho^k.$

4.3. Double Extended Kaczmarz Method for Solving General Matrix Equation A X B = C

In general, the matrix equation in (1) may be inconsistent, and A and B may not have full rank; thus, we consider solving both matrix equations $AY = C$ and $XB = Y$ by the REK method. The algorithm is described below (see Algorithm 5).
Algorithm 5 Double REK method for general A X B = C (DREK)
Input: $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, $C \in \mathbb{R}^{m \times n}$, $X^{(0)} = 0 \in \mathbb{R}^{p \times q}$, $Y^{(0)} = 0 \in \mathbb{R}^{p \times n}$, $Z^{(0)} = C$, $K_1$, $K_2$
1: For $i = 1 : m$, $M_i(i) = \|A_{i,:}\|_2^2$; for $j = 1 : p$, $M_j(j) = \|A_{:,j}\|_2^2$
2: For $i = 1 : q$, $N_i(i) = \|B_{i,:}\|_2^2$; for $j = 1 : n$, $N_j(j) = \|B_{:,j}\|_2^2$
3: for $k = 0, 1, 2, \ldots, K_1 - 1$ do
4:   Pick i with probability $p_i(A) = \|A_{i,:}\|_2^2 / \|A\|_F^2$ and j with probability $\hat{p}_j(A) = \|A_{:,j}\|_2^2 / \|A\|_F^2$
5:   Compute $Z^{(k+1)} = Z^{(k)} - \dfrac{A_{:,j}}{M_j(j)}\,(A_{:,j}^TZ^{(k)})$
6:   Compute $Y^{(k+1)} = Y^{(k)} + \dfrac{A_{i,:}^T}{M_i(i)}\,(C_{i,:} - Z_{i,:}^{(k+1)} - A_{i,:}Y^{(k)})$
7: end for
8: Set $W^{(0)} = (Y^{(K_1)})^T$
9: for $k = 0, 1, 2, \ldots, K_2 - 1$ do
10:   Pick s with probability $p_s(B) = \|B_{s,:}\|_2^2 / \|B\|_F^2$ and t with probability $\hat{p}_t(B) = \|B_{:,t}\|_2^2 / \|B\|_F^2$
11:   Compute $W^{(k+1)} = W^{(k)} - \dfrac{B_{s,:}^T}{N_i(s)}\,B_{s,:}W^{(k)}$
12:   Compute $X^{(k+1)} = X^{(k)} + \dfrac{\left(Y_{:,t}^{(K_1)} - (W_{t,:}^{(k+1)})^T - X^{(k)}B_{:,t}\right)B_{:,t}^T}{N_j(t)}$
13: end for
14: Output $X^{(K_2)}$
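A minimal MATLAB sketch of the DREK method (Algorithm 5) is given below; it simply applies REK twice, first to A*Y = C and then to X*B = Y^(K1), passing the intermediate iterate to the second stage. The function name drek and the sampling implementation are our own illustrative choices:

function X = drek(A, B, C, K1, K2)
% Sketch of DREK: two successive REK solves, A*Y = C followed by X*B = Y.
[m, p] = size(A); [q, n] = size(B);
rowA = sum(A.^2, 2); colA = sum(A.^2, 1);
rowB = sum(B.^2, 2); colB = sum(B.^2, 1);
Y = zeros(p, n); Z = C;                       % stage 1: REK for A*Y = C
for k = 1:K1
    i = find(cumsum(rowA/sum(rowA)) >= rand, 1);
    j = find(cumsum(colA/sum(colA)) >= rand, 1);
    Z = Z - A(:,j) * (A(:,j)' * Z) / colA(j);
    Y = Y + A(i,:)' * ((C(i,:) - Z(i,:) - A(i,:)*Y) / rowA(i));
end
X = zeros(p, q); W = Y';                      % stage 2: REK for B'*X' = Y'
for k = 1:K2
    s = find(cumsum(rowB/sum(rowB)) >= rand, 1);
    t = find(cumsum(colB/sum(colB)) >= rand, 1);
    W = W - B(s,:)' * (B(s,:) * W) / rowB(s);
    X = X + ((Y(:,t) - W(t,:)' - X*B(:,t)) / colB(t)) * B(:,t)';
end
end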
The convergence result is the superposition of the corresponding convergence results of the two REK methods, and the proof is omitted.
Similar to the DREK method, we can employ the double REGS (DREGS) method to solve the general matrix equation A X B = C (see Algorithm 6). Because the convergence results and proof methods are very similar to the previous ones, we omit them here.
Algorithm 6 Double REGS method for general A X B = C (DREGS)
Input: $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, $C \in \mathbb{R}^{m \times n}$, $X^{(0)} = 0 \in \mathbb{R}^{p \times q}$, $Y^{(0)} = 0 \in \mathbb{R}^{p \times n}$, $F^{(0)} \in \mathbb{R}^{p \times n}$, $U^{(0)} \in \mathbb{R}^{p \times q}$, $R^{(0)} = C$, $K_1$, $K_2$
1: For $i = 1 : m$, $M_i(i) = \|A_{i,:}\|_2^2$; for $j = 1 : p$, $M_j(j) = \|A_{:,j}\|_2^2$
2: For $i = 1 : q$, $N_i(i) = \|B_{i,:}\|_2^2$; for $j = 1 : n$, $N_j(j) = \|B_{:,j}\|_2^2$
3: for $k = 0, 1, 2, \ldots, K_1 - 1$ do
4:   Pick i with probability $p_i(A) = \|A_{i,:}\|_2^2 / \|A\|_F^2$ and j with probability $\hat{p}_j(A) = \|A_{:,j}\|_2^2 / \|A\|_F^2$
5:   Compute $W^{(k)} = \dfrac{A_{:,j}^TR^{(k)}}{M_j(j)}$, $F_{j,:}^{(k+1)} = F_{j,:}^{(k)} + W^{(k)}$, $R^{(k+1)} = R^{(k)} - A_{:,j}W^{(k)}$
6:   Compute $Y^{(k+1)} = Y^{(k)} - \dfrac{A_{i,:}^TA_{i,:}\,(Y^{(k)} - F^{(k+1)})}{M_i(i)}$
7: end for
8: Set $E^{(0)} = Y^{(K_1)}$
9: for $k = 0, 1, 2, \ldots, K_2 - 1$ do
10:   Pick s with probability $p_s(B) = \|B_{s,:}\|_2^2 / \|B\|_F^2$ and t with probability $\hat{p}_t(B) = \|B_{:,t}\|_2^2 / \|B\|_F^2$
11:   Compute $V^{(k)} = \dfrac{E^{(k)}B_{s,:}^T}{N_i(s)}$, $U_{:,s}^{(k+1)} = U_{:,s}^{(k)} + V^{(k)}$, $E^{(k+1)} = E^{(k)} - V^{(k)}B_{s,:}$
12:   Compute $X^{(k+1)} = X^{(k)} - \dfrac{(X^{(k)} - U^{(k+1)})B_{:,t}}{N_j(t)}\,B_{:,t}^T$
13: end for
14: Output $X^{(K_2)}$

5. Numerical Experiments

In this section, we present some experimental results of the proposed algorithms for solving various matrix equations and compare them with ME-RGRK and ME-MWRK from [15] for consistent matrix equations and RBCD from [14] for inconsistent matrix equations. All experiments were carried out in MATLAB (version R2020a) on a DESKTOP-8CBRR86 with an Intel(R) Core(TM) i7-4712MQ CPU@2.30GHz, 8GB RAM, and Windows 10.
All computations were started from the initial guesses $X^{(0)} = 0$ and $Y^{(0)} = 0$ and were terminated when the relative error (RE) of the solution, defined by
$\mathrm{RE} = \dfrac{\|X^{(k)} - X_\star\|_F^2}{\|X_\star\|_F^2}$
at the current iterate $X^{(k)}$, satisfied $\mathrm{RE} < 10^{-6}$ or the number of iterations exceeded the maximum K = 50,000, where $X_\star = A^+CB^+$. We report the average number of iterations (denoted as “IT”) and the average computing time in seconds (denoted as “CPU”) over 20 repeated trial runs of the corresponding method. We considered the following methods:
  • CME-RK (Algorithm 1) compared with ME-RGRK and ME-MWRK in [15] for consistent matrix equations. We used θ = 0.5 in the ME-RGRK method, which is the same as in [15].
  • IME-RGS (Algorithm 2) and IME-REKRGS (Algorithm 4) compared with RBCD in [14] for inconsistent matrix equations. We used $\alpha = 1.5/\|A\|_2^2$ in the RBCD method, which is the same as in [14].
  • IME-REKRK (Algorithm 3), DREK (Algorithm 5), and DREGS (Algorithm 6) for inconsistent matrix equations. The last two methods have no requirements regarding whether matrix A and matrix B have full row rank or full column rank.
We tested the performance of various methods with synthetic dense data and real-world sparse data. Synthetic data were generated as follows:
  • Type I: For given m, p, q, n, the entries of A and B were generated from a standard normal distribution, i.e., A = randn(m, p), B = randn(q, n). We also constructed rank-deficient matrices by A = randn(m, p/2), A = [A, A] or B = randn(q/2, n), B = [B; B], etc.
  • Type II: As in [16], for given m, p, and $r_1 = \mathrm{rank}(A)$, we constructed a matrix A by $A = U_1D_1V_1^T$, where $U_1 \in \mathbb{R}^{m \times r_1}$ and $V_1 \in \mathbb{R}^{p \times r_1}$ are matrices with orthonormal columns and $D_1 \in \mathbb{R}^{r_1 \times r_1}$ is a diagonal matrix whose first $r_1 - 2$ diagonal entries are uniformly distributed numbers in $[\sigma_{\min}(A), \sigma_{\max}(A)]$ and whose last two diagonal entries are $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$. Similarly, for given q, n, and $r_2 = \mathrm{rank}(B)$, we constructed a matrix B by $B = U_2D_2V_2^T$, where $U_2 \in \mathbb{R}^{q \times r_2}$ and $V_2 \in \mathbb{R}^{n \times r_2}$ are matrices with orthonormal columns and $D_2 \in \mathbb{R}^{r_2 \times r_2}$ is a diagonal matrix whose first $r_2 - 2$ diagonal entries are uniformly distributed numbers in $[\sigma_{\min}(B), \sigma_{\max}(B)]$ and whose last two diagonal entries are $\sigma_{\max}(B)$ and $\sigma_{\min}(B)$ (a MATLAB sketch of this construction is given after this list).
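The following MATLAB lines sketch our reading of the Type II construction (the parameter values are illustrative assumptions; orth is used to obtain matrices with orthonormal columns, and the prescribed extreme singular values are placed in the last two diagonal positions):

% Sketch: build A of size m x p with rank r1 and prescribed sigma_min, sigma_max.
m = 500; p = 100; r1 = 50; smin = 1; smax = 2;       % illustrative parameters
U1 = orth(randn(m, r1));                             % orthonormal columns
V1 = orth(randn(p, r1));
d  = [smin + (smax - smin)*rand(r1-2, 1); smax; smin];
A  = U1 * diag(d) * V1';                             % rank(A) = r1
% B is built in the same way from q, n, r2, sigma_min(B), sigma_max(B).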
The real-world sparse data were taken from the Florida sparse matrix collection [26]. Table 2 lists the features of these sparse matrices.

5.1. Consistent Matrix Equation

First, we compared the performance of the ME-RGRK, ME-MWRK, and CME-RK methods for the consistent matrix equation $AXB = C$. To construct a consistent matrix equation, we set C = AXB, where X is a random matrix generated by X = randn(p, q).
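For completeness, the lines below show how such a consistent test problem can be formed and how the relative error against the minimal F-norm solution $X_\star = A^+CB^+$ is measured; pinv is used only to build the reference solution, and cme_rk is the hypothetical helper from the sketch after Algorithm 1 (both are illustrative, not the paper's test driver):

% Build a consistent instance of A*X*B = C and measure the relative error.
m = 100; p = 40; q = 40; n = 100;          % illustrative Type I sizes
A = randn(m, p); B = randn(q, n);
C = A * randn(p, q) * B;                   % consistent by construction
Xstar = pinv(A) * C * pinv(B);             % minimal F-norm solution A^+ C B^+
Xk = cme_rk(A, B, C, 5000);                % approximate solution (sketch above)
RE = norm(Xk - Xstar, 'fro')^2 / norm(Xstar, 'fro')^2;
fprintf('relative error: %.2e\n', RE);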
Example 1.
The ME-RGRK, ME-MWRK, and CME-RK methods with synthetic dense data.
Table 3 and Table 4 report the average IT and CPU of ME-RGRK, ME-MWRK, and CME-RK for solving consistent matrix equations with Type I and Type II matrices. In the following tables, the item “>” indicates that the number of iteration steps exceeded the maximum of 50,000, while the item “-” indicates that the method did not converge. From Table 3, it can be seen that the CME-RK method vastly outperforms the ME-RGRK and ME-MWRK methods in terms of both IT and CPU times. The CME-RK method requires the fewest iteration steps and the least running time regardless of whether or not matrices A and B have full column/row rank. We observe that when the linear system is consistent, the speed-up is at least 2.00, and the largest speed-up reaches 3.75. As the matrix dimension increases, the CPU time of the CME-RK method increases only slowly, while the running times of ME-RGRK and ME-MWRK increase dramatically. The numerical advantages of CME-RK for large consistent matrix equations are more obvious in Table 4. Moreover, when $\sigma_{\max}(A)/\sigma_{\min}(A)$ and $\sigma_{\max}(B)/\sigma_{\min}(B)$ are large (e.g., $\sigma_{\max}/\sigma_{\min} = 5$), the convergence of ME-RGRK and ME-MWRK is very slow, which is because the convergence rate of these two methods depends on $1 - \dfrac{\sigma_{\min}^2(A)\,\sigma_{\min}^2(B)}{\|A\|_F^2\,\|B\|_F^2}$.
Figure 1 shows the plots of the relative error (RE) in base-10 logarithm versus IT and CPU for the different methods with Type I (A = randn(500, 50), A = [A, A], B = randn(150, 600)) and Type II ($m = 500$, $p = 100$, $r_1 = 50$, $\sigma_{\max}(A)/\sigma_{\min}(A) = 2$, $q = 150$, $n = 600$, $r_2 = 50$, $\sigma_{\max}(B)/\sigma_{\min}(B) = 2$) matrices. Again, it can be seen that the relative error of CME-RK decreases rapidly with the number of iteration steps and the computing time.
Example 2.
The ME-RGRK, ME-MWRK, and CME-RK methods with real-world sparse data.
For the sparse matrices from [26], the numbers of iteration steps and the computing times of the ME-RGRK, ME-MWRK, and CME-RK methods are listed in Table 5. It can be observed that the CME-RK method successfully computes an approximate solution of the consistent matrix equation for various A and B. For the first three cases in Table 5, the ME-RGRK, ME-MWRK, and CME-RK methods all converge to the solution; however, the CME-RK method is significantly better than the ME-RGRK and ME-MWRK methods in terms of both iteration steps and running time. For the last three cases, the ME-RGRK and ME-MWRK methods fail to converge to the solution because the iteration steps exceed the maximum of 50,000.

5.2. Inconsistent Matrix Equation

Next, we compare the performance of the RBCD, IME-RGS, and IME-REKRGS methods for the inconsistent matrix equation $AXB = C$, where B has full row rank. To construct an inconsistent matrix equation, we set $C = AXB + R$, where X and R are random matrices generated by X = randn(p, q) and R = δ·randn(m, n) with $\delta \in (0, 1)$. In addition, we show the experimental results of the IME-REKRK, DREK, and DREGS methods, which do not require B to have full row rank.
Example 3.
The RBCD, IME-RGS, and IME-REKRGS methods with synthetic dense data.
Table 6 and Table 7 report the average IT and CPU of the RBCD, IME-RGS, and IME-REKRGS methods for solving inconsistent matrix equations with Type I and Type II matrices. Figure 2 shows the plots of the relative error (RE) in base-10 logarithm versus IT and CPU for the different methods with Type I (A = randn(500, 100), B = randn(150, 600)) and Type II ($m = 500$, $p = 100$, $r_1 = 100$, $\sigma_{\max}(A)/\sigma_{\min}(A) = 2$, $q = 150$, $n = 600$, $r_2 = 150$, $\sigma_{\max}(B)/\sigma_{\min}(B) = 2$) matrices. From these tables, it can be seen that the IME-RGS and IME-REKRGS methods are better than the RBCD method in terms of IT and CPU time, especially when the matrix dimension is large (see the last two cases in Table 6) or when $\sigma_{\max}/\sigma_{\min}$ is large (see the last three cases in Table 7). From Figure 2, it can be seen that the IME-RGS and IME-REKRGS methods converge faster than the RBCD method, although the relative error of RBCD decreases faster in the initial iterations.
Example 4.
The RBCD, IME-RGS, and IME-REKRGS methods with real-world sparse data.
Table 8 lists the average IT and CPU of the RBCD, IME-RGS, and IME-REKRGS methods for solving inconsistent matrix equations with sparse matrices. It can be observed that the IME-RGS and IME-REKRGS methods require less CPU time than the RBCD method in all cases and fewer iterations in all cases except for A = ash958, B = ash219^T.
Example 5.
The IME-REKRK, DREK, and DREGS methods.
Finally, we tested the effectiveness of the IME-REKRK, DREK, and DREGS methods for inconsistent matrix equations with both synthetic dense data and real-world sparse data. The features of A and B are provided in Table 9, and the experimental results are listed in Table 10. For the DREK and DREGS methods, the iteration steps and running times for calculating $Y^{(k)}$ and $X^{(k)}$ are reported separately and joined by “+”. From Table 10, it can be observed that the IME-REKRK method is able to compute an approximate solution to the linear least squares problem when B has full column rank. The DREK and DREGS methods successfully compute the linear least squares solution in all cases.

6. Conclusions

In this paper, we propose a Kaczmarz-type algorithm for the consistent matrix equation A X B = C . We develop a Gauss–Seidel-type algorithm to address the inconsistent case when A is full column rank and B is full row rank. In addition, we introduce extended Kaczmarz and extended Gauss–Seidel algorithms for inconsistent systems where either A or B lacks full rank. Theoretical analyses establish the linear convergence of these methods to the unique minimal Frobenius-norm solution or the least squares solution (i.e., A + C B + ). Numerical experiments demonstrate the efficiency and robustness of the proposed algorithms.
In future work, we aim to extend the proposed algorithms to tensor equations and more general multilinear systems as well as to further explore their use in large-scale optimization and machine learning applications. Additionally, we will investigate accelerated variants of the proposed algorithms to enhance their convergence rates by incorporating strategies such as greedy selection rules, block iterative processing, and adaptive step size techniques. These improvements are expected to significantly boost computational efficiency, making the proposed methods more suitable for high-dimensional and real-world problems.

Author Contributions

Conceptualization, W.Z. and L.X.; methodology, L.X. and W.B.; validation, W.Z. and W.B.; writing—original draft preparation, W.Z.; writing—review and editing, L.X. and W.L.; software, W.Z. and L.X.; visualization, W.Z. and L.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability Statement

The datasets that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors are thankful to the referees for their constructive comments and valuable suggestions, which have greatly improved the original manuscript of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bouhamidi, A.; Jbilou, K. A note on the numerical approximate solutions for generalized Sylvester matrix equations with applications. Appl. Math. Comput. 2008, 206, 687–694. [Google Scholar] [CrossRef]
  2. Zhou, B.; Duan, G. On the generalized Sylvester mapping and matrix equations. Syst. Control Lett. 2008, 57, 200–208. [Google Scholar] [CrossRef]
  3. Chu, K. Symmetric solutions of linear matrix equations by matrix decompositions. Linear Algebra Appl. 1989, 119, 35–50. [Google Scholar] [CrossRef]
  4. Ding, F.; Chen, T.W. Iterative least-squares solutions of coupled sylvester matrix equations. Syst. Control Lett. 2005, 54, 95–107. [Google Scholar] [CrossRef]
  5. Ding, F.; Liu, P.; Ding, J. Iterative solutions of the generalized Sylvester matrix equations by using the hierarchical identification principle. Appl. Math. Comput. 2008, 197, 41–50. [Google Scholar] [CrossRef]
  6. Wang, X.; Li, Y.; Dai, L. On Hermitian and skew-Hermitian splitting iteration methods for the linear matrix equation AXB = C. Comput. Math. Appl. 2013, 65, 657–664. [Google Scholar] [CrossRef]
  7. Tian, Z.; Tian, M.; Liu, Z.; Xu, T. The Jacobi and Gauss-Seidel-type iteration methods for the matrix equation AXB = C. Appl. Math. Comput. 2017, 292, 63–75. [Google Scholar] [CrossRef]
  8. Fausett, D.; Fulton, C. Large least squares problems involving Kronecker products. SIAM J. Matrix Anal. Appl. 1994, 15, 219–227. [Google Scholar] [CrossRef]
  9. Zha, H. Comments on large least squares problems involving Kronecker products. SIAM J. Matrix Anal. Appl. 1995, 16, 1172. [Google Scholar] [CrossRef]
  10. Zhang, F.; Li, Y.; Guo, W.; Zhao, J. Least squares solutions with special structure to the linear matrix equation AXB = C. Appl. Math. Comput. 2011, 217, 10049–10057. [Google Scholar] [CrossRef]
  11. Cvetkovic, D. Re-nnd solutions of the matrix equation AXB = C. J. Aust. Math. Soc. 2008, 84, 63–72. [Google Scholar] [CrossRef]
  12. Peng, Z. A matrix LSQR iterative method to solve matrix equation AXB = C. Int. J. Comput. Math. 2010, 87, 1820–1830. [Google Scholar] [CrossRef]
  13. Shafiei, S.; Hajarian, M. Developing Kaczmarz method for solving Sylvester matrix equations. J. Frankl. Inst. 2022, 359, 8991–9005. [Google Scholar] [CrossRef]
  14. Du, K.; Ruan, C.; Sun, X. On the convergence of a randomized block coordinate descent algorithm for a matrix least squares problem. Appl. Math. Lett. 2022, 124, 107689. [Google Scholar] [CrossRef]
  15. Wu, N.; Liu, C.; Zuo, Q. On the Kaczmarz methods based on relaxed greedy selection for solving matrix equation AXB = C. J. Comput. Appl. Math. 2022, 413, 114374. [Google Scholar] [CrossRef]
  16. Niu, Y.; Zheng, B. On global randomized block Kaczmarz algorithm for solving large-scale matrix equations. arXiv 2022, arXiv:2204.13920. [Google Scholar]
  17. Xing, L.; Bao, W.; Li, W. On the convergence of the randomized block Kaczmarz algorithm for solving a matrix equation. Mathematics 2023, 11, 4554. [Google Scholar] [CrossRef]
  18. Li, W.; Bao, W.; Xing, L. Kaczmarz-type methods for solving matrix equations. Int. J. Comput. Math. 2024, 101, 708–731. [Google Scholar] [CrossRef]
  19. Kaczmarz, S. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Internat. Acad. Polon. Sci. Lett. A 1937, 32, 335–357. [Google Scholar]
  20. Strohmer, T.; Vershynin, R. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 2009, 15, 262–278. [Google Scholar] [CrossRef]
  21. Leventhal, D.; Lewis, A. Randomized methods for linear constraints: Convergence rates and conditioning. Math. Oper. Res. 2010, 35, 641–654. [Google Scholar] [CrossRef]
  22. Needell, D. Randomized Kaczmarz solver for noisy linear systems. BIT Numer. Math. 2010, 50, 395–403. [Google Scholar] [CrossRef]
  23. Zouzias, A.; Freris, N. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl. 2013, 34, 773–793. [Google Scholar] [CrossRef]
  24. Du, K. Tight upper bounds for the convergence of the randomized extended Kaczmarz and Gauss-Seidel algorithms. Numer. Linear Algebra Appl. 2019, 26, e2233. [Google Scholar] [CrossRef]
  25. Ma, A.; Needell, D.; Ramdas, A. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl. 2015, 36, 1590–1604. [Google Scholar] [CrossRef]
  26. Davis, T.; Hu, Y. The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 2011, 38, 1–25. [Google Scholar] [CrossRef]
Figure 1. IT (left) and CPU (right) of different methods for consistent matrix equations with Type I (top) and Type II (bottom).
Figure 2. IT (left) and CPU (right) of different methods for inconsistent matrix equations with Type I (top) and Type II (bottom).
Table 1. Summary of the convergence of ME-RGRK [15], ME-MWRK [15], RBCD [14], CME-RK (Theorem 1), IME-RGS (Theorem 2), IME-REKRK (Theorem 3), IME-REKRGS (Theorem 4), DREK, and DREGS in expectation to the minimal F-norm solution X* = A^+ C B^+ for all types of matrix equations. (Note: Y means that the algorithm is convergent and N means that it is not.)
| Matrix equation | r(A) | r(B) | ME-RGRK | ME-MWRK | RBCD | CME-RK | IME-RGS | IME-REKRK | IME-REKRGS | DREK | DREGS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| consistent | = p | = q | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| consistent | = p | < q | Y | Y | N | Y | N | Y | Y (r(B) = n) | Y | Y |
| consistent | < p | = q | Y | Y | Y | Y | N | N | Y | Y | Y |
| consistent | < p | < q | Y | Y | N | Y | N | Y (r(B) = n) | N | Y | Y |
| inconsistent | = p | = q | N | N | Y | N | Y | Y | Y | Y | Y |
| inconsistent | = p | < q | N | N | N | N | N | Y (r(B) = n) | N | Y | Y |
| inconsistent | < p | = q | N | N | Y | N | N | N | Y | Y | Y |
| inconsistent | < p | < q | N | N | N | N | N | Y (r(B) = n) | N | Y | Y |
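To make the convergence target in Table 1 concrete: every method listed there is compared against the minimal Frobenius-norm (least-squares) solution X* = A^+ C B^+. The NumPy sketch below is illustrative only and is not taken from the paper's experiments; the sizes, random seed, and helper name `minimal_fnorm_solution` are chosen here for the example. It forms X* with pseudoinverses and checks it on a consistent, rank-deficient instance.

```python
import numpy as np

def minimal_fnorm_solution(A, C, B):
    # X* = A^+ C B^+ : the minimal Frobenius-norm (least-squares) solution of A X B = C.
    return np.linalg.pinv(A) @ C @ np.linalg.pinv(B)

rng = np.random.default_rng(0)
m, p, r1, q, n = 100, 40, 20, 40, 100                              # illustrative sizes, r(A) = r1 < p
A = rng.standard_normal((m, r1)) @ rng.standard_normal((r1, p))    # rank-deficient A
B = rng.standard_normal((q, n))                                    # full-rank B, r(B) = q
C = A @ rng.standard_normal((p, q)) @ B                            # consistent right-hand side
X_star = minimal_fnorm_solution(A, C, B)
print(np.linalg.norm(A @ X_star @ B - C))                          # ~1e-10: X* solves the consistent equation
```

In practice, an iterate X_k produced by any of the listed methods can then be monitored through the error norm ||X_k - X*||_F.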
Table 2. Detailed features of the sparse matrices from [26].
| Name | Size | Rank | Sparsity |
|---|---|---|---|
| ash219 | 219 × 85 | 85 | 97.65% |
| ash958 | 958 × 292 | 292 | 99.32% |
| divorce | 50 × 9 | 9 | 50% |
| Worldcities | 315 × 100 | 100 | 76.13% |
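For reference, the "Sparsity" column in Table 2 is read here as the percentage of zero entries. The short sketch below shows how such a summary could be computed; it assumes the matrices have been downloaded locally from the collection in [26] in Matrix Market (.mtx) format, and the helper name and file path are placeholders.

```python
import numpy as np
from scipy.io import mmread  # Matrix Market reader; the collection in [26] distributes .mtx files

def matrix_features(path):
    # Size, rank, and sparsity (percentage of zero entries) in the style of Table 2.
    M = mmread(path).tocsr()
    m, n = M.shape
    sparsity = 100.0 * (1.0 - M.nnz / (m * n))
    rank = np.linalg.matrix_rank(M.toarray())   # dense rank computation is fine at these sizes
    return (m, n), rank, sparsity

# Example with a hypothetical local path:
# print(matrix_features("ash219.mtx"))
```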
Table 3. IT and CPU of ME-RGRK, ME-MWRK, and CME-RK for consistent matrix equations with Type I.
| m | p | r1 | q | n | r2 |  | ME-RGRK | ME-MWRK | CME-RK |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 40 | 40 | 40 | 100 | 40 | IT | 49,707 | 27,579 | 1600.9 |
|  |  |  |  |  |  | CPU | 0.71 | 2.01 | 0.06 |
| 100 | 40 | 20 | 40 | 100 | 20 | IT | 2979.6 | 1064 | 454.2 |
|  |  |  |  |  |  | CPU | 0.04 | 0.09 | 0.02 |
| 40 | 100 | 40 | 100 | 40 | 40 | IT | > | 49,332.7 | 1807.2 |
|  |  |  |  |  |  | CPU | > | 2.15 | 0.13 |
| 40 | 100 | 20 | 100 | 40 | 20 | IT | 14,788 | 3484 | 441.1 |
|  |  |  |  |  |  | CPU | 0.19 | 0.20 | 0.03 |
| 500 | 100 | 100 | 100 | 500 | 100 | IT | > | 32,109 | 2250.4 |
|  |  |  |  |  |  | CPU | > | 57.61 | 0.33 |
| 500 | 100 | 50 | 100 | 500 | 50 | IT | 9193.9 | 3158.6 | 935.3 |
|  |  |  |  |  |  | CPU | 6.07 | 5.66 | 0.13 |
| 1000 | 200 | 100 | 100 | 1000 | 50 | IT | 15,206.4 | 5076.7 | 1655.5 |
|  |  |  |  |  |  | CPU | 58.43 | 52.45 | 1.23 |
| 1000 | 200 | 200 | 100 | 1000 | 100 | IT | > | 39,848 | 3906.7 |
|  |  |  |  |  |  | CPU | > | 402.15 | 2.51 |
Table 4. IT and CPU of ME-RGRK, ME-MWRK, and CME-RK for consistent matrix equations with Type II.
| m | p | r1 | σmax(A)/σmin(A) | q | n | r2 | σmax(B)/σmin(B) |  | ME-RGRK | ME-MWRK | CME-RK |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 40 | 40 | 2 | 40 | 100 | 40 | 2 | IT | 10,865.1 | 5617 | 842.3 |
|  |  |  |  |  |  |  |  | CPU | 0.15 | 0.50 | 0.03 |
| 100 | 40 | 20 | 2 | 40 | 100 | 20 | 2 | IT | 2409 | 836 | 422 |
|  |  |  |  |  |  |  |  | CPU | 0.03 | 0.07 | 0.02 |
| 100 | 40 | 20 | 5 | 40 | 100 | 20 | 5 | IT | 22,423.2 | 6439.8 | 1145.2 |
|  |  |  |  |  |  |  |  | CPU | 0.33 | 0.57 | 0.04 |
| 500 | 100 | 100 | 2 | 100 | 500 | 100 | 2 | IT | 40,768 | 20,507.2 | 1992.3 |
|  |  |  |  |  |  |  |  | CPU | 21.03 | 34.02 | 0.29 |
| 500 | 100 | 50 | 5 | 500 | 100 | 50 | 5 | IT | > | 35,159 | 2893.6 |
|  |  |  |  |  |  |  |  | CPU | > | 59.02 | 0.39 |
| 500 | 100 | 50 | 10 | 500 | 100 | 50 | 10 | IT | > | > | 10,693.4 |
|  |  |  |  |  |  |  |  | CPU | > | > | 1.68 |
| 1000 | 200 | 100 | 2 | 100 | 1000 | 50 | 2 | IT | 19,679.2 | 6974.9 | 1722.8 |
|  |  |  |  |  |  |  |  | CPU | 73.45 | 70.56 | 1.20 |
| 1000 | 200 | 100 | 5 | 100 | 1000 | 50 | 5 | IT | > | > | 6037.4 |
|  |  |  |  |  |  |  |  | CPU | > | > | 4.23 |
Table 5. IT and CPU of ME-RGRK, ME-MWRK, and CME-RK for the consistent matrix equations with sparse matrices from [26], where T represents transposition.
| A | B |  | ME-RGRK | ME-MWRK | CME-RK |
|---|---|---|---|---|---|
| ash219 | divorce^T | IT | 49,871.1 | 15,423.5 | 3522.4 |
|  |  | CPU | 0.78 | 1.26 | 0.17 |
| divorce | ash219^T | IT | 43,927.8 | 14,164.4 | 3521.3 |
|  |  | CPU | 1.15 | 1.35 | 0.17 |
| divorce | ash219 | IT | 40,198.7 | 17,251.4 | 3238.9 |
|  |  | CPU | 0.63 | 0.80 | 0.14 |
| ash958 | ash219^T | IT | > | > | 6706.2 |
|  |  | CPU | > | > | 1.23 |
| ash219 | ash958^T | IT | > | > | 5762.6 |
|  |  | CPU | > | > | 1.18 |
| ash958 | Worldcities^T | IT | - | > | 38,088.5 |
|  |  | CPU | - | > | 7.94 |
Table 6. IT and CPU of RBCD, IME-RGS, and IME-REKRGS for inconsistent matrix equations with Type I.
| m | p | r1 | q | n | r2 |  | RBCD | IME-RGS | IME-REKRGS |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 40 | 40 | 40 | 100 | 40 | IT | 11,116 | 1883.7 | 2449.2 |
|  |  |  |  |  |  | CPU | 0.43 | 0.09 | 0.18 |
| 100 | 40 | 20 | 40 | 100 | 40 | IT | 12,416 | - | 1725.8 |
|  |  |  |  |  |  | CPU | 0.49 | - | 0.13 |
| 500 | 100 | 100 | 50 | 200 | 50 | IT | 2820.1 | 2011.5 | 2603.7 |
|  |  |  |  |  |  | CPU | 0.56 | 0.31 | 0.62 |
| 500 | 100 | 100 | 100 | 500 | 100 | IT | 5067 | 2314.7 | 2782.2 |
|  |  |  |  |  |  | CPU | 4.06 | 1.76 | 2.55 |
| 1000 | 100 | 100 | 200 | 1000 | 200 | IT | 5969.7 | 3833.4 | 3867.9 |
|  |  |  |  |  |  | CPU | 27.02 | 18.18 | 19.47 |
| 1000 | 200 | 200 | 200 | 1000 | 200 | IT | 9738.4 | 4485.2 | 5442 |
|  |  |  |  |  |  | CPU | 50.43 | 24.27 | 39.64 |
Table 7. IT and CPU of RBCD, IME-RGS, and IME-REKRGS for inconsistent matrix equations with Type II.
| m | p | r1 | σmax(A)/σmin(A) | q | n | r2 | σmax(B)/σmin(B) |  | RBCD | IME-RGS | IME-REKRGS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 40 | 40 | 2 | 40 | 100 | 40 | 2 | IT | 1176.5 | 716.2 | 949.6 |
|  |  |  |  |  |  |  |  | CPU | 0.04 | 0.03 | 0.09 |
| 100 | 40 | 40 | 5 | 40 | 100 | 40 | 5 | IT | 27,631 | 2974.3 | 3773.5 |
|  |  |  |  |  |  |  |  | CPU | 1.10 | 0.16 | 0.27 |
| 500 | 100 | 100 | 2 | 100 | 500 | 100 | 2 | IT | 2953.4 | 2101.2 | 2307.8 |
|  |  |  |  |  |  |  |  | CPU | 2.52 | 1.78 | 2.25 |
| 500 | 100 | 100 | 5 | 100 | 500 | 100 | 5 | IT | > | 6432.5 | 8101.3 |
|  |  |  |  |  |  |  |  | CPU | > | 5.29 | 7.96 |
| 1000 | 100 | 100 | 2 | 200 | 1000 | 200 | 2 | IT | 5577.3 | 3242.6 | 3380.7 |
|  |  |  |  |  |  |  |  | CPU | 25.68 | 15.33 | 17.13 |
| 1000 | 100 | 100 | 5 | 200 | 1000 | 200 | 5 | IT | > | 10,672.5 | 11,006.4 |
|  |  |  |  |  |  |  |  | CPU | > | 49.49 | 56.54 |
Table 8. IT and CPU of RBCD, IME-RGS, and IME-REKRGS for inconsistent matrix equations with sparse matrices from [26].
| A | B |  | RBCD | IME-RGS | IME-REKRGS |
|---|---|---|---|---|---|
| ash219 | divorce^T | IT | 13,115.2 | 3543.7 | 3653.6 |
|  |  | CPU | 1.15 | 0.20 | 0.28 |
| divorce | ash219^T | IT | > | 3371.8 | 4150.9 |
|  |  | CPU | > | 0.26 | 0.48 |
| ash958 | ash219^T | IT | 7200 | 7536.4 | 7808.6 |
|  |  | CPU | 10.44 | 5.06 | 6.04 |
| ash219 | ash958^T | IT | 21,118.5 | 6749.2 | 5675.7 |
|  |  | CPU | 14.93 | 5.37 | 6.36 |
| ash958 | Worldcities^T | IT | > | 39,183.6 | 38,538.1 |
|  |  | CPU | > | 42.34 | 53.65 |
Table 9. Detailed features of A and B for Example 5.
| Type | A | r(A) | B | r(B) |
|---|---|---|---|---|
| Type a | A = randn(500, 100), A = [A, A; A, A] | 100 | B = randn(1000, 100) | 100 |
| Type b | A = randn(500, 100), A = [A, A; A, A] | 100 | B = randn(50, 500), B = [B, B; B, B] | 50 |
| Type c | A = U1 D1 V1^T, m = 1000, p = 100, σmax(A)/σmin(A) = 2 | 50 | B = U2 D2 V2^T, q = 200, n = 1000, σmax(B)/σmin(B) = 2 | 40 |
| Type d | A = U1 D1 V1^T, m = 1000, p = 100, σmax(A)/σmin(A) = 5 | 100 | B = U2 D2 V2^T, q = 1000, n = 100, σmax(B)/σmin(B) = 5 | 100 |
| Type e | A = ash219 | 85 | B = Worldcities | 100 |
| Type f | A = ash958 | 292 | B = ash219 | 85 |
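Table 9 states the constructions in MATLAB notation. The NumPy sketch below mirrors those recipes for Types a–d; the helper names and the linearly spaced singular values inside `svd_factor` are assumptions made here for illustration, since the table prescribes only the shapes, ranks, and condition numbers (the paper's experiments themselves are not reproduced). Types e and f simply load the sparse matrices ash219, ash958, and Worldcities from [26].

```python
import numpy as np

rng = np.random.default_rng(0)

def duplicated_block_factor(rows, cols):
    # Type a/b recipe: stacking [M, M; M, M] doubles the size but keeps rank(M) = min(rows, cols).
    M = rng.standard_normal((rows, cols))
    return np.block([[M, M], [M, M]])          # shape (2*rows, 2*cols)

def svd_factor(rows, cols, rank, cond):
    # Type c/d-style factor U D V^T with prescribed rank and condition number sigma_max/sigma_min.
    # Linearly spaced singular values are an assumption here; the table fixes only the ratio.
    U, _ = np.linalg.qr(rng.standard_normal((rows, rank)))
    V, _ = np.linalg.qr(rng.standard_normal((cols, rank)))
    return U @ np.diag(np.linspace(cond, 1.0, rank)) @ V.T

# Type a: A is 1000 x 200 with r(A) = 100, B is 1000 x 100 with r(B) = 100.
A_a = duplicated_block_factor(500, 100)
B_a = rng.standard_normal((1000, 100))

# Type c: r(A) = 50 and r(B) = 40, each with sigma_max/sigma_min = 2.
A_c = svd_factor(1000, 100, 50, 2.0)
B_c = svd_factor(200, 1000, 40, 2.0)
```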
Table 10. IT and CPU of the IME-REKRK, DREK, and DREGS methods for the inconsistent matrix equations.
| Method |  | Type a | Type b | Type c | Type d | Type e | Type f |
|---|---|---|---|---|---|---|---|
| IME-REKRK | IT | 2692.3 | 2569.4 | - | 8087.4 | 41,373.1 | 8733.5 |
|  | CPU | 3.81 | 14.52 | - | 2.76 | 8.36 | 2.60 |
| DREK | IT | 1785.2 + 1981.7 | 2110.5 + 1067.2 | 1171.6 + 942.1 | 8023.7 + 6153.4 | 1831.3 + 596.6 | 7780.7 + 3236.6 |
|  | CPU | 0.60 + 1.98 | 10.87 + 1.51 | 5.88 + 0.97 | 1.98 + 1.32 | 0.33 + 0.16 | 1.56 + 0.62 |
| DREGS | IT | 2087.3 + 1792.6 | 2358.2 + 1199.5 | 1327.6 + 1194.8 | 9069.2 + 7623.4 | 2498.2 + 703.9 | 7998.1 + 3568.5 |
|  | CPU | 0.74 + 3.26 | 12.36 + 1.54 | 6.97 + 1.22 | 2.45 + 1.47 | 0.30 + 0.20 | 1.69 + 0.71 |