Article

Block Kaczmarz–Motzkin Method via Mean Shift Clustering

Yimou Liao, Tianxiu Lu and Feng Yin
1 College of Mathematics and Statistics, Sichuan University of Science and Engineering, Zigong 643000, China
2 Key Laboratory of Higher Education of Sichuan Province for Enterprise Informationalization and Internet of Things, Zigong 643000, China
3 College of Mathematics and Physics, Chengdu University of Technology, Chengdu 643000, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(14), 2408; https://doi.org/10.3390/math10142408
Submission received: 7 June 2022 / Revised: 4 July 2022 / Accepted: 7 July 2022 / Published: 9 July 2022
(This article belongs to the Special Issue Matrix Equations and Their Algorithms Analysis)

Abstract

Solving systems of linear equations is a fundamental problem in mathematics. Combining mean shift clustering (MS) with a greedy strategy, this paper proposes a novel block version of the Kaczmarz–Motzkin method (BKMS), in which the blocks are predetermined by MS clustering. The greedy strategy, which collects the row indices of the selected subsystem whose distances (residuals) are almost maximal at each iteration, can be regarded as an efficient extension of the sampling Kaczmarz–Motzkin algorithm (SKM). The new method converges linearly to the least-norm solution when the system is consistent. Several examples show that the BKMS algorithm is more efficient than other methods (for example, RK, Motzkin, GRK, SKM, RBK, and GRBK).

1. Introduction

This paper is concerned with the approximate solution of a large-scale linear system of equations of the form
$A x = b, \quad A \in \mathbb{R}^{m \times n}, \ b \in \mathbb{R}^{m}, \qquad (1)$
where $x \in \mathbb{R}^{n}$ needs to be determined. If $m > n$, the linear system (1) with such a thin (or tall) coefficient matrix $A$ is overdetermined and often inconsistent. In this case, we are interested in computing the least-squares solution of (1),
$x_{LS} := \arg\min_{x} \|A x - b\|_2^2, \qquad (2)$
where $\|\cdot\|_2$ denotes the Euclidean norm. If $m < n$, the linear system (1) with such a fat (or flat) matrix $A$ is underdetermined. If the linear system is consistent and has many solutions, the least Euclidean-norm solution
$x_{LN} := \arg\min_{x} \{\|x\|_2 : A x = b\} \qquad (3)$
is often considered. In this paper, we focus on approximating the least-norm solution when (1) is consistent.
The Kaczmarz method [1], also known as the algebraic reconstruction technique (ART), is a popular algorithm for solving the linear system (1) and has a large range of applications, such as image reconstruction in computerized tomography [2,3], signal processing [4], and distributed computing [5,6]. In the classical Kaczmarz method [1], the rows of the matrix $A$ are swept cyclically in the given order, and the current iterate is orthogonally projected onto the hyperplane defined by the selected row. For a given initial solution $x_0$, the iteration scheme of the Kaczmarz method can be written as
$x_{k+1} = x_k + \dfrac{b_{i_k} - A_{i_k} x_k}{\|A_{i_k}\|_2^2}\,(A_{i_k})^T, \qquad (4)$
where $i_k = \mathrm{mod}(k, m) + 1$, $k = 0, 1, \ldots$, $A_i$ denotes the $i$th row of $A$, $b_i$ denotes the $i$th entry of $b$, and $A^T$ denotes the transpose of $A$.
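To make the projection step concrete, the following minimal NumPy sketch implements one cyclic sweep of update (4). The function and variable names are illustrative only and are not taken from the paper (whose experiments use MATLAB).

```python
import numpy as np

def kaczmarz(A, b, x0, num_iters):
    """Classical (cyclic) Kaczmarz iteration, cf. (4)."""
    m = A.shape[0]
    x = np.array(x0, dtype=float)
    for k in range(num_iters):
        i = k % m                              # 0-based analogue of i_k = mod(k, m) + 1
        a_i = A[i, :]
        # Orthogonal projection of x onto the hyperplane A_i x = b_i.
        x += (b[i] - a_i @ x) / (a_i @ a_i) * a_i
    return x
```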
For the consistent linear system (1), Strohmer and Vershynin [7] presented in 2009 a randomized Kaczmarz algorithm (RK) with expected exponential convergence. At each iteration, RK selects the working row $i_k$ with probability $\|A_{i_k}\|_2^2 / \|A\|_F^2$. A theoretical analysis showed that, compared with sweeping the rows in their natural order, randomly selecting the working row can greatly improve the convergence speed. In [8], Bai and Wu designed a greedy randomized Kaczmarz (GRK) algorithm. GRK uses a greedy probability criterion, derived from the maximum-distance rule, to capture the larger entries of the residual vector, so that a working row with a larger residual is selected. The theoretical analysis shows that GRK converges to the least-norm solution at an expected exponential rate, and numerical experiments illustrate that GRK converges much faster than RK. In [9], De Loera et al. proposed the sampling Kaczmarz–Motzkin method (SKM) for linear feasibility problems by combining the RK method with the Motzkin method [10]. This method consists of two stages: in the first stage, the row index set $\{1, 2, \ldots, m\}$ of the matrix $A$ is divided into subsets $T = \{\tau_1, \tau_2, \ldots, \tau_p\}$; in the second stage, a block $\tau_k$ ($k \in [p]$) is drawn uniformly at random at each iteration, the working row is determined by $i_k = \arg\max_{i \in \tau_k} |b_i - A_i x_k|^2$, and $x_k$ is then updated by (4). The SKM method overcomes weaknesses of both the RK and Motzkin methods: RK may converge slowly because it does not use a greedy strategy, while computing the full residual vector at every iteration is expensive for the Motzkin method. However, the convergence of SKM may still be slow, since it enforces only one constraint per iteration.
The block Kaczmarz methods, which enforce multiple row constraints per iteration, have received extensive attention because of their fast convergence. Block iterative methods were first presented by Elfving [11] and Eggermont et al. [12] for solving (1). Needell and Tropp [13] then proposed a randomized block Kaczmarz (RBK) algorithm that solves the least-squares problem by selecting a subsystem at random from a predetermined partition; it converges linearly in expectation to the least-norm solution. The randomized block Kaczmarz iteration has the form
$x_{k+1} = x_k + A_{\tau_{i_k}}^{\dagger}\,(b_{\tau_{i_k}} - A_{\tau_{i_k}} x_k), \qquad (5)$
where $A_{\tau_{i_k}}$ and $b_{\tau_{i_k}}$ are the row submatrix of $A$ and the subvector of $b$ indexed by $\tau_{i_k} \subseteq \{1, 2, \ldots, m\}$, and $A_{\tau_{i_k}}^{\dagger}$ denotes the Moore–Penrose pseudoinverse of the subsystem matrix $A_{\tau_{i_k}}$. Although the existence of such a "good" paving of the rows is theoretically guaranteed, it is not always easy to find. Liu and Gu [14] designed a greedy randomized block Kaczmarz (GRBK) method that chooses a "good" target block by embedding a greedy strategy, derived from the probability criterion of Bai and Wu [8], into the predetermined subsystems. The convergence theory of GRBK shows that the iterates converge linearly in expectation to the least-norm solution. However, the underlying partitioning may not be a good one, since it only splits the row index set into blocks of equal size. Many variants of the block Kaczmarz method have recently received extensive attention; see, for example, [15,16,17,18,19,20,21] and the references therein.
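As an illustration, a single block step of the form (5) can be sketched in NumPy as follows; np.linalg.lstsq applies the action of the Moore–Penrose pseudoinverse to the block residual, so no explicit pseudoinverse is formed. The names below are assumptions made for this sketch, not the authors' code.

```python
import numpy as np

def rbk_step(A, b, x, partition, rng):
    """One randomized block Kaczmarz step, cf. (5).

    `partition` is a list of row-index arrays (the predetermined paving T).
    """
    tau = partition[rng.integers(len(partition))]    # sample a block uniformly at random
    A_tau = A[tau, :]
    r_tau = b[tau] - A_tau @ x
    # x <- x + pinv(A_tau) @ r_tau, computed as a least-squares solve.
    update, *_ = np.linalg.lstsq(A_tau, r_tau, rcond=None)
    return x + update

# Example usage (hypothetical data):
# rng = np.random.default_rng(0)
# x = rbk_step(A, b, x, partition, rng)
```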
By exploiting the structural properties of the coefficient matrix $A$, several criteria can be used to determine the row partitions (see [13,14,16,22]). In the context of data science, mean shift clustering (MS) is an iterative algorithm based on kernel density estimation [23] that is widely used for data clustering, so combining a clustering algorithm with a greedy technique is very natural. Inspired by this, and in order to improve the efficiency of the block Kaczmarz method, we construct a novel block Kaczmarz–Motzkin method (BKMS($\delta$, $\eta$)): the block with the largest residual is determined first, and an almost-maximum distance criterion is then applied within that block to collect, as the iteration index set, the row indices whose residual entries are close in absolute value to the largest one. The row partition is obtained by MS clustering, with the row correlation coefficients of the system matrix $A$ as the clustering feature. Moreover, unlike k-means clustering [16,24], the method only needs an initial bandwidth $\delta \in (0, 1]$ to partition the rows automatically, instead of pre-specifying the number of partitions $\kappa$, which depends on the initial cluster centers.
This paper is organized as follows. The RBK method in [13] and the GRBK method in [14] are summarized in Section 2. Section 3 introduces the MS clustering briefly, and then the BKMS method is proposed and its convergence theory is constructed when (1) is consistent. Section 4 reports several examples and applications. We complete this work with some conclusions and discuss future work in Section 5.

2. Block Kaczmarz Algorithm and Its Variants

For a matrix $Q = (q_{ij}) \in \mathbb{R}^{m \times n}$, we denote by $Q_i$, $Q_{\tau_k}$, $\|Q\|_2 := \max_{x \neq 0} \|Qx\|_2 / \|x\|_2$, $\|Q\|_F := \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |q_{ij}|^2}$, and $Q^T$ the $i$th row, a row submatrix ($\tau_k$ is a row index block), the spectral norm, the Frobenius norm, and the transpose of $Q$, respectively. If any two rows of $Q$ are independently and identically distributed (IID), the correlation coefficient of $Q$ is given by
$\mu_k = \dfrac{|Q_l (Q_p)^T|}{\|Q_l\|_2 \, \|Q_p\|_2}, \quad l \neq p \in \{1, 2, \ldots, m\}.$
In addition, $\sigma_{\max}(Q)$ and $\sigma_{\min}(Q)$ are the largest and smallest positive singular values of $Q$, respectively, and $p_i$ denotes the $i$th component of a vector $p = (p_i) \in \mathbb{R}^m$. For a constant $c \in \mathbb{R}$, $\lfloor c \rfloor$ denotes the largest integer not exceeding $c$.

2.1. The RBK Algorithm

The RBK method of Needell and Tropp [13] for solving the overdetermined least-squares problem (2) is summarized in Algorithm 1.
Algorithm 1 The randomized block Kaczmarz method (RBK)
  • Input: $A \in \mathbb{R}^{m \times n}$ standardized (i.e., $\|A_i\|_2 = 1$, $i \in [m]$), $b \in \mathbb{R}^m$, $x_0 \in \mathbb{R}^n$, $l \in \mathbb{N}_+$, and a partition $T = \{\tau_1, \tau_2, \ldots, \tau_p\}$ with $p \leq m$
  • Output: $x_l$
    • for $k = 0, 1, \ldots, l-1$ do
    •     Select a block $\tau_{i_k}$ uniformly at random from $T$
    •     Update $x_{k+1}$ by (5)
    • end for
The following restatement of a result from [13] guarantees the linear convergence of the RBK method.
Lemma 1.
Assume that the overdetermined linear system (1) has a full-rank, row-normalized coefficient matrix $A$ (i.e., $\|A_i\|_2 = 1$, $i \in [m]$). Let $\alpha \leq \sigma_{\min}^2(A_\tau)$ and $\beta \geq \sigma_{\max}^2(A_\tau)$ for each $\tau \in T$. Given an initial vector $x_0 \in \mathrm{range}(A^T)$, the sequence $\{x_k\}_{k \geq 0}$ generated by the RBK method converges exponentially in expectation to the least-squares solution $x_\star = A^{\dagger} b$ of (1). Furthermore,
$\mathbb{E}\|x_k - x_\star\|_2^2 \leq \left(1 - \dfrac{\sigma_{\min}^2(A)}{\beta m}\right)^{k} \|x_0 - x_\star\|_2^2 + \dfrac{\beta}{\alpha} \cdot \dfrac{\|A x_\star - b\|_2^2}{\sigma_{\min}^2(A)}.$
According to Lemma 1, the convergence rate of RBK is limited by $\sigma_{\min}^2(A)$. If $A$ is ill-conditioned, the upper bound on the convergence error of RBK is large. In addition, if the linear system is consistent, the second term on the right-hand side of the inequality vanishes. We therefore record the convergence factor of RBK as
$\rho_{\mathrm{RBK}} = 1 - \dfrac{\sigma_{\min}^2(A)}{\beta m}.$

2.2. The GRBK Algorithm

Motivated by the GRK method, Liu and Gu [14] replaced the uniform sampling in the RBK method with greedy sampling and proposed the GRBK method. Algorithm 2 summarizes the GRBK method for solving (1).
Algorithm 2 The greedy randomized block Kaczmarz method (GRBK)
  • Input: $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x_0 \in \mathbb{R}^n$, $l \in \mathbb{N}_+$, and a partition $T = \{\tau_1, \tau_2, \ldots, \tau_p\}$.
  • Output: $x_l$
    • for $k = 0, 1, \ldots, l-1$ do
    •     Compute
      $\epsilon_k = \dfrac{1}{2}\left(\dfrac{1}{\|b - A x_k\|_2^2} \max_{\tau_{i_k} \in T} \dfrac{\|b_{\tau_{i_k}} - A_{\tau_{i_k}} x_k\|_2^2}{\|A_{\tau_{i_k}}\|_F^2} + \dfrac{1}{\|A\|_F^2}\right)$
    •     Determine the set of candidate blocks
      $U_k = \left\{\tau_{i_k} \;\middle|\; \|b_{\tau_{i_k}} - A_{\tau_{i_k}} x_k\|_2^2 \geq \epsilon_k \|b - A x_k\|_2^2 \, \|A_{\tau_{i_k}}\|_F^2\right\}$
    •     Compute the blocks $\tilde{r}_k^{\tau_i}$ of the vector $\tilde{r}_k$ according to
      $\tilde{r}_k^{\tau_i} = \begin{cases} b_{\tau_i} - A_{\tau_i} x_k, & \text{if } \tau_i \in U_k, \\ 0, & \text{otherwise} \end{cases}$
    •     Select $\tau_{i_k} \in U_k$ with probability $\Pr(\text{block} = \tau_{i_k}) = \|\tilde{r}_k^{\tau_{i_k}}\|_2^2 / \|\tilde{r}_k\|_2^2$
    •     Update $x_{k+1}$ by (5)
    • end for
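The greedy block-selection rule of Algorithm 2 can be sketched in NumPy as follows. This is an illustration under the notation above, not the implementation of [14]; the Frobenius-norm scaling of the blocks follows the restatement of the algorithm given here.

```python
import numpy as np

def grbk_step(A, b, x, partition, rng):
    """One GRBK iteration following Algorithm 2 (illustrative sketch)."""
    r = b - A @ x
    r_blk = np.array([np.linalg.norm(r[tau]) ** 2 for tau in partition])
    A_blk = np.array([np.linalg.norm(A[tau, :], 'fro') ** 2 for tau in partition])
    eps_k = 0.5 * (np.max(r_blk / A_blk) / (r @ r) + 1.0 / np.linalg.norm(A, 'fro') ** 2)
    U = np.where(r_blk >= eps_k * (r @ r) * A_blk)[0]      # candidate blocks U_k
    probs = r_blk[U] / r_blk[U].sum()                      # Pr(block) proportional to residual norm
    tau = partition[rng.choice(U, p=probs)]
    update, *_ = np.linalg.lstsq(A[tau, :], r[tau], rcond=None)
    return x + update
```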
According to Algorithm 2, if each block contains only one row, then GRK [8] is a special case of the GRBK method. Liu and Gu proved that the GRBK algorithm has a smaller upper bound on the convergence factor than RBK and demonstrated the effectiveness of GRBK with several numerical experiments. The result in [14] is restated as follows.
Lemma 2.
For a fixed partition $T = \{\tau_1, \tau_2, \ldots, \tau_p\}$ of the row indices $\{1, 2, \ldots, m\}$, assume that $\xi = \min_{\tau \in T} \|A_\tau\|_F^2$ and $\beta \geq \sigma_{\max}^2(A_\tau)$ for each $\tau \in T$. Given an initial vector $x_0 \in \mathrm{range}(A^T)$, the sequence $\{x_k\}_{k \geq 0}$ generated by the GRBK method converges exponentially in expectation to the least-norm solution $x_\star = A^{\dagger} b$ when (1) is consistent. Moreover, it satisfies
$\mathbb{E}\|x_1 - x_\star\|_2^2 \leq \left(1 - \dfrac{\xi \, \sigma_{\min}^2(A)}{\beta \|A\|_F^2}\right) \|x_0 - x_\star\|_2^2$
and
$\mathbb{E}\|x_{k+1} - x_\star\|_2^2 \leq \left(1 - \dfrac{\gamma \, \xi \, \sigma_{\min}^2(A)}{\beta \|A\|_F^2}\right)^{k-1} \left(1 - \dfrac{\xi \, \sigma_{\min}^2(A)}{\beta \|A\|_F^2}\right) \|x_0 - x_\star\|_2^2, \quad k = 1, 2, \ldots,$
where $\gamma = \dfrac{1}{2}\left(\dfrac{\|A\|_F^2}{\|A\|_F^2 - \xi} + 1\right)$.
By Lemma 2, the convergence factor of the GRBK method is bounded by
$\rho_{\mathrm{GRBK}} = 1 - \dfrac{\xi}{2}\left(\dfrac{\|A\|_F^2}{\|A\|_F^2 - \xi} + 1\right) \dfrac{\sigma_{\min}^2(A)}{\beta \|A\|_F^2},$
which is smaller than $\rho_{\mathrm{RBK}}$ when $A$ is row-normalized.

3. Block Kaczmarz–Motzkin Methods with Mean Shift Clustering

The greedy strategy in the SKM method ensures that the entry with the largest residual is prioritized. We would like to exploit the correlations between the rows of the coefficient matrix $A$ to determine its row partition in an efficient way. In the framework of big data, the mean shift method (MS) [25] is a clustering algorithm that divides data into different categories. Therefore, combining MS clustering with the Kaczmarz method and greedy techniques comes naturally. Inspired by this, a novel block Kaczmarz–Motzkin method based on MS clustering is proposed. In this section, we first briefly review MS clustering, then design the new block Kaczmarz–Motzkin method and establish its convergence theory.

3.1. The Mean Shift Clustering

MS clustering is an iterative algorithm based on kernel density estimation. It mainly consists of three steps. The first step is to select a point $\tilde{x}_i$, $i \in \{1, 2, \ldots, m\}$, at random from the unclassified data items $X = \{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m\}$ as the center point $y_j$. The second step is to find all data points inside the ball centered at $y_j$ with radius $\delta/2$, where $\delta$ is the bandwidth, i.e.,
$S_\delta(y_j) = \{\tilde{x}_i \mid \|y_j - \tilde{x}_i\|_2^2 < \delta/2\}, \quad j = 1.$
Regarding these points as belonging to the cluster $C_j$, compute the vector from the center point $y_j$ to each element $\tilde{x}_i$ in the set $S_\delta(y_j)$ and combine these vectors to obtain the mean shift vector $m(y_j)$, as follows:
$m(y_j) = \dfrac{\sum_{i=1}^{n} \tilde{x}_i \, g\!\left(\left\|\frac{y_j - \tilde{x}_i}{\delta}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{y_j - \tilde{x}_i}{\delta}\right\|^2\right)} - y_j,$
where $\tilde{x}_i \in S_\delta(\tilde{x})$ and the Gaussian kernel function $g(\cdot)$ assigns a different weight to each data point. The center point is then moved in the direction of the mean shift vector $m(y_j)$ by $|m(y_j)|$: the current center point is updated by $y_{j+1} = m(y_j) + y_j$ ($j = 1, 2, \ldots$), and this process is repeated until the iterative sequence $\{y_j\}_{j \geq 1}$ satisfies $\|y_{j+1} - y_j\|_2^2 < \epsilon$, where $\epsilon$ is a user-preset threshold. The last step is to repeat the above two steps until all points are classified. Algorithm 3 summarizes the MS algorithm of [26] for data clustering.
Algorithm 3 The mean shift method for clustering (MS)
  • Input: A dataset $X = \{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m\}$ containing $m$ data items, the threshold $\epsilon$, and the bandwidth $\delta > 0$.
  • Output: Estimated cluster centers $y_1, y_2, \ldots, y_k$, where $k$ is the number of clusters.
    • for $i = 1, 2, \ldots, n$ do
    •     Select an initial data point $\tilde{x}_j$ at random from the dataset $X$; let $y_j = \tilde{x}_j$ and $j \leftarrow 1$.
    •     repeat
    •         Estimate the mean shift vector by
      $m(y_j) = \dfrac{\sum_{i=1}^{n} \tilde{x}_i \, g(\|(y_j - \tilde{x}_i)/\delta\|^2)}{\sum_{i=1}^{n} g(\|(y_j - \tilde{x}_i)/\delta\|^2)} - y_j$
    •         Update the cluster center $y_{j+1} = m(y_j) + y_j$
    •     until $\|y_{j+1} - y_j\|_2^2 \leq \epsilon$
    •     Record the converged center $y_j$
    • end for
    • Merge the cluster centers whose pairwise distances are smaller than $\delta$
According to Algorithm 3, one knows that, unlike the k-means algorithm in [16] and the random partition method in [13], the MS clustering algorithm does not require us to predetermine the number of clusters. For details, we refer to [25,26].
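As a concrete illustration of Algorithm 3, the following self-contained NumPy sketch runs mean shift with a Gaussian kernel and merges nearby modes. The kernel choice, tolerance, and merging threshold are assumptions made for this sketch and may differ in detail from the MS($\delta$) routine used in the experiments.

```python
import numpy as np

def mean_shift(X, bandwidth, tol=1e-5, max_iter=300):
    """Mean shift clustering sketch: returns an integer cluster label per row of X."""
    centers = X.astype(float).copy()              # start one seed at every data point
    for _ in range(max_iter):
        shifted = np.empty_like(centers)
        for j, y in enumerate(centers):
            w = np.exp(-np.sum((X - y) ** 2, axis=1) / (2 * bandwidth ** 2))  # Gaussian weights
            shifted[j] = (w[:, None] * X).sum(axis=0) / w.sum()               # weighted mean
        done = np.max(np.linalg.norm(shifted - centers, axis=1)) < tol
        centers = shifted
        if done:
            break
    # Merge converged centers that lie within one bandwidth of each other.
    modes, labels = [], np.empty(len(X), dtype=int)
    for i, c in enumerate(centers):
        for k, mode in enumerate(modes):
            if np.linalg.norm(c - mode) < bandwidth:
                labels[i] = k
                break
        else:
            modes.append(c)
            labels[i] = len(modes) - 1
    return labels
```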

3.2. Block Kaczmarz–Motzkin Method with Mean Shift

The following describes in detail how to combine MS clustering with the Kaczmarz–Motzkin method. Considering the row partition of $A$, we first construct the dataset $X = (F(\mathrm{row}), \mu_k) \in \mathbb{R}^{m \times 2}$. The correlation coefficient $\mu_k$ is calculated by
$\mu_k = \dfrac{|A_i (A_I)^T|}{\|A_i\|_2 \, \|A_I\|_2},$
where $i \in \{1, 2, \ldots, m\}$ and the index $I$ is selected at random from the set of row indices attaining the maximum row norm of $A$ (if there are multiple rows with the maximum norm). $F(\mathrm{row})$ denotes a linear mapping of the row index set of $A$ (see Remark 1). By applying the MS algorithm, we obtain a partition $\hat{T} = \{\hat{\tau}_1, \hat{\tau}_2, \ldots, \hat{\tau}_p\}$ of $\{1, 2, \ldots, m\}$, i.e.,
$A = [A_{\hat{\tau}_1}; A_{\hat{\tau}_2}; \ldots; A_{\hat{\tau}_p}] \quad \text{and} \quad b = [b_{\hat{\tau}_1}; b_{\hat{\tau}_2}; \ldots; b_{\hat{\tau}_p}].$
Similar to SKM, at the $k$th iteration, if $\|b_{\hat{\tau}_s} - A_{\hat{\tau}_s} x_k\|_2^2 > \|b_{\hat{\tau}_i} - A_{\hat{\tau}_i} x_k\|_2^2$, then the block $\hat{\tau}_s$ is sampled rather than $\hat{\tau}_i$, for all $i \neq s \in [p]$. Since a selected block may be very large, we introduce an almost-maximum distance criterion within each iteration to control the iteration index set. Following this idea, the block Kaczmarz–Motzkin method with MS clustering is summarized in Algorithm 4.
Algorithm 4 Block Kaczmarz–Motzkin method with mean shift (BKMS)
  • Input: $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x_0 \in \mathbb{R}^n$, $l \in \mathbb{N}_+$, the bandwidth $\delta \in (0, 1]$, and the parameter $\eta \in (0, 1]$.
  • Output: $x_l$
    • Obtain the dataset $X$ by data preprocessing on $A$
    • Apply the MS($\delta$) method to $X$ to obtain a partition $\hat{T} = \{\hat{\tau}_1, \hat{\tau}_2, \ldots, \hat{\tau}_p\}$, $p \geq 1$, of the row index set $\{1, 2, \ldots, m\}$
    • for $k = 0, 1, \ldots, l-1$ do
    •     Determine the block with the largest residual,
      $R_k = \left\{\tau_{i_k} \;\middle|\; \tau_{i_k} = \arg\max_{1 \leq j \leq p} \|r_k^{\hat{\tau}_j}\|_2^2\right\}$
    •     Compute $\epsilon_k = \eta \max_{j \in \tau_{i_k}} \dfrac{|b_j - A_j x_k|^2}{\|A_j\|_2^2}$
    •     Choose $\tau_k = \left\{i_k \;\middle|\; |b_{i_k} - A_{i_k} x_k|^2 \geq \epsilon_k \|A_{i_k}\|_2^2\right\}$
    •     Update $x_{k+1} = x_k + A_{\tau_k}^{\dagger}(b_{\tau_k} - A_{\tau_k} x_k)$
    • end for
Remark 1.
We compress the row index set of $A$ onto the interval of $\mu_k$, i.e., $[1, m] \xrightarrow{F} [\mu_k^{\min}, \mu_k^{\max}]$, and note that $F$ is a linear map. In fact, if $x, y \in \{1, 2, \ldots, m\}$ and $a, b \in \mathbb{R}$, the following condition holds:
$F(a x + b y) = a F(x) + b F(y).$
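Before turning to the convergence analysis, the pieces of Algorithm 4 can be assembled into the following NumPy sketch. It reuses the mean_shift sketch from Section 3.1, assumes the rows of $A$ are nonzero, and replaces the RSE-based stopping rule (which needs the exact solution) with a residual test; the feature construction follows the preprocessing described above. It is an illustration of the method, not the authors' MATLAB implementation.

```python
import numpy as np

def bkms(A, b, x0, delta, eta, tol=1e-8, max_iter=5000):
    """Block Kaczmarz-Motzkin sketch with a mean-shift row partition (Algorithm 4)."""
    m = A.shape[0]
    row_norm2 = np.sum(A ** 2, axis=1)
    ref = int(np.argmax(row_norm2))                       # a row index of maximal norm (index I)
    mu = np.abs(A @ A[ref, :]) / np.sqrt(row_norm2 * row_norm2[ref])   # correlation coefficients
    F = np.linspace(mu.min(), mu.max(), m)                # linear map of the row indices
    labels = mean_shift(np.column_stack([F, mu]), bandwidth=delta)
    partition = [np.where(labels == k)[0] for k in range(labels.max() + 1)]

    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):  # residual-based stopping test
            break
        tau = max(partition, key=lambda t: np.linalg.norm(r[t]))   # largest-residual block
        eps_k = eta * np.max(r[tau] ** 2 / row_norm2[tau])         # almost-maximum distance rule
        keep = tau[r[tau] ** 2 >= eps_k * row_norm2[tau]]          # working index set tau_k
        update, *_ = np.linalg.lstsq(A[keep, :], r[keep], rcond=None)
        x = x + update
    return x
```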
Now the convergence of the BKMS algorithm is analyzed. The following fact, guaranteed in [2], is required.
Lemma 3.
If $A \in \mathbb{R}^{m \times n}$ is a nonzero real matrix, then
$\sigma_{\min}^2(A) \|u\|_2^2 \leq \|A u\|_2^2 \leq \sigma_{\max}^2(A) \|u\|_2^2$
for any $u \in \mathrm{range}(A^T)$.
The following theorem gives the convergence rate of the BKMS Algorithm 4 when (1) is consistent.
Theorem 1.
Let $\{\hat{\tau}_p\}_{p \geq 1}$ be the target blocks generated by Algorithm 3. Assume that $\zeta = \max_{\tau} \|A_\tau\|_F^2$ and $\alpha = \max_{\tau} |\tau|$, where the maxima are taken over $\tau \in \{\hat{\tau}_p\}_{p \geq 1}$ and $|\tau|$ denotes the cardinality of $\tau$. Given an initial guess $x_0 \in \mathrm{range}(A^T)$, the sequence $\{x_k\}_{k \geq 0}$ generated by the BKMS Algorithm 4 converges linearly to the least-norm solution $x_\star = A^{\dagger} b$ of (1). Furthermore,
$\|x_1 - x_\star\|_2^2 \leq \left(1 - \dfrac{\eta \|A_{\tau_0}\|_F^2 \, \sigma_{\min}^2(A)}{\alpha \, p \, \zeta \, \sigma_{\max}^2(A_{\tau_0})}\right) \|x_0 - x_\star\|_2^2$
and
$\|x_{k+1} - x_\star\|_2^2 \leq \left(1 - \dfrac{\eta \|A_{\tau_k}\|_F^2 \, \sigma_{\min}^2(A)}{\alpha (m - N_{k-1}) \, \zeta \, \sigma_{\max}^2(A_{\tau_k})}\right) \|x_k - x_\star\|_2^2, \quad k = 1, 2, \ldots.$
Then,
$\|x_{k+1} - x_\star\|_2^2 \leq \left(1 - \dfrac{\eta \|A_{\tau_k}\|_F^2 \, \sigma_{\min}^2(A)}{\alpha (m - N_{k-1}) \, \zeta \, \sigma_{\max}^2(A_{\tau_k})}\right)^{k} \cdot \left(1 - \dfrac{\eta \|A_{\tau_0}\|_F^2 \, \sigma_{\min}^2(A)}{\alpha \, p \, \zeta \, \sigma_{\max}^2(A_{\tau_0})}\right) \|x_0 - x_\star\|_2^2,$
where $\sigma_{\max}^2(A_{\tau_k})$ is the largest nonzero singular value of $A_{\tau_k}$ (squared) and $N_{k-1}$ is the cardinality of the set $\tau_{k-1}$.
Proof. 
According to the update step of Algorithm 4 (which has the form (5)),
$x_{k+1} - x_k = A_{\tau_k}^{\dagger}(b_{\tau_k} - A_{\tau_k} x_k).$
Since $A_{\tau_k} x_\star = b_{\tau_k}$, one can see that
$x_{k+1} - x_k = A_{\tau_k}^{\dagger}(A_{\tau_k} x_\star - A_{\tau_k} x_k) = A_{\tau_k}^{\dagger} A_{\tau_k}(x_\star - x_k),$
which implies that $x_{k+1} - x_k$ belongs to the column space of the matrix $A_{\tau_k}^{\dagger} A_{\tau_k}$.
From the iteration scheme of BKMS,
$A_{\tau_k}(x_{k+1} - x_\star) = A_{\tau_k}\big[x_k + A_{\tau_k}^{\dagger}(b_{\tau_k} - A_{\tau_k} x_k)\big] - A_{\tau_k} x_\star = A_{\tau_k} x_k + A_{\tau_k} A_{\tau_k}^{\dagger} b_{\tau_k} - A_{\tau_k} A_{\tau_k}^{\dagger} A_{\tau_k} x_k - A_{\tau_k} x_\star = A_{\tau_k} x_k + A_{\tau_k} x_\star - A_{\tau_k} x_k - A_{\tau_k} x_\star = 0.$
It follows that $A_{\tau_k}^{\dagger} A_{\tau_k}(x_{k+1} - x_\star) = 0$, which shows that $x_{k+1} - x_\star$ is orthogonal to the column space of $A_{\tau_k}^{\dagger} A_{\tau_k}$. Hence $(x_{k+1} - x_k)^T(x_{k+1} - x_\star) = 0$.
Thus, the following identity is obtained:
$\|x_{k+1} - x_\star\|_2^2 = \|x_k - x_\star\|_2^2 - \|x_{k+1} - x_k\|_2^2 = \|x_k - x_\star\|_2^2 - \|A_{\tau_k}^{\dagger} A_{\tau_k}(x_k - x_\star)\|_2^2.$
On the other hand, according to the algorithm, we know that
$\|r_k^{\tau_{i_k}}\|_2^2 = \max_{i \in [p]} \|r_k^{\tau_i}\|_2^2 \quad \text{and} \quad \dfrac{\|r_k^{\tau_{i_k}}\|_2^2}{\|A_{\tau_{i_k}}\|_F^2} = \max_{\tau_i \in R_k} \dfrac{\|r_k^{\tau_i}\|_2^2}{\|A_{\tau_i}\|_F^2}.$
So, by Lemma 3,
$\|A_{\tau_k}^{\dagger} A_{\tau_k}(x_k - x_\star)\|_2^2 \geq \sigma_{\min}^2(A_{\tau_k}^{\dagger}) \|A_{\tau_k}(x_k - x_\star)\|_2^2 = \dfrac{1}{\sigma_{\max}^2(A_{\tau_k})} \sum_{i \in \tau_k} |A_i(x_k - x_\star)|^2 = \dfrac{1}{\sigma_{\max}^2(A_{\tau_k})} \sum_{i \in \tau_k} |r_k^i|^2 \geq \dfrac{1}{\sigma_{\max}^2(A_{\tau_k})} \sum_{i \in \tau_k} \eta \max_{j \in \tau_{i_k}} \dfrac{|r_k^j|^2}{\|A_j\|_2^2} \, \|A_i\|_2^2 \geq \dfrac{\eta \|A_{\tau_k}\|_F^2}{\sigma_{\max}^2(A_{\tau_k})} \cdot \dfrac{\max_{j \in \tau_{i_k}} |r_k^j|^2}{\|A_{\tau_{i_k}}\|_F^2}.$
Let $\alpha = \max_{i \in [p]} |\tau_i|$, where $|\tau_i|$ is the cardinality of $\tau_i$; then
$\max_{j \in \tau_{i_k}} |r_k^j|^2 = \dfrac{\max_{j \in \tau_{i_k}} |r_k^j|^2}{\sum_{j \in \tau_{i_k}} |r_k^j|^2} \, \|r_k^{\tau_{i_k}}\|_2^2 \geq \dfrac{1}{|\tau_{i_k}|} \|r_k^{\tau_{i_k}}\|_2^2 \geq \dfrac{1}{\alpha} \|r_k^{\tau_{i_k}}\|_2^2.$
So, one can obtain
$\|A_{\tau_k}^{\dagger} A_{\tau_k}(x_k - x_\star)\|_2^2 \geq \dfrac{\eta \|A_{\tau_k}\|_F^2}{\alpha \, \sigma_{\max}^2(A_{\tau_k}) \, \|A_{\tau_{i_k}}\|_F^2} \max_{i \in [p]} \|r_k^{\tau_i}\|_2^2.$
For $k = 0$, using $\|r_0\|_2^2 = \|A(x_0 - x_\star)\|_2^2 \geq \sigma_{\min}^2(A) \|x_0 - x_\star\|_2^2$ (Lemma 3),
$\max_{i \in [p]} \|r_0^{\tau_i}\|_2^2 = \dfrac{\max_{i \in [p]} \|r_0^{\tau_i}\|_2^2}{\sum_{i \in [p]} \|r_0^{\tau_i}\|_2^2} \cdot \|r_0\|_2^2 \geq \dfrac{1}{p} \|r_0\|_2^2 \geq \dfrac{\sigma_{\min}^2(A)}{p} \|x_0 - x_\star\|_2^2,$
then
$\|A_{\tau_0}^{\dagger} A_{\tau_0}(x_0 - x_\star)\|_2^2 \geq \dfrac{\eta \|A_{\tau_0}\|_F^2 \, \sigma_{\min}^2(A)}{\alpha \, p \, \sigma_{\max}^2(A_{\tau_0}) \, \|A_{\tau_{i_0}}\|_F^2} \|x_0 - x_\star\|_2^2.$
Since $\zeta = \max_{\tau} \|A_\tau\|_F^2 \geq \|A_{\tau_{i_0}}\|_F^2$, substituting this lower bound into the identity above (with $k = 0$) gives
$\|x_1 - x_\star\|_2^2 \leq \left(1 - \dfrac{\eta \|A_{\tau_0}\|_F^2 \, \sigma_{\min}^2(A)}{\alpha \, p \, \zeta \, \sigma_{\max}^2(A_{\tau_0})}\right) \|x_0 - x_\star\|_2^2.$
For $k = 1, 2, \ldots$,
$\max_{i \in [p]} \|r_k^{\tau_i}\|_2^2 = \dfrac{\max_{i \in [p]} \|r_k^{\tau_i}\|_2^2}{\sum_{i \in [p]} \|r_k^{\tau_i}\|_2^2} \, \|r_k\|_2^2 = \dfrac{\max_{i \in [p]} \|r_k^{\tau_i}\|_2^2}{\sum_{i \in [m] \setminus \tau_{k-1}} |r_k^i|^2} \, \|r_k\|_2^2 \geq \dfrac{1}{m - N_{k-1}} \|r_k\|_2^2,$
where the second equality holds because the entries of $r_k$ indexed by $\tau_{k-1}$ vanish, as guaranteed by the orthogonality relation $A_{\tau_{k-1}}(x_k - x_\star) = 0$ established above. Combining this with the previous bound, $\|r_k\|_2^2 \geq \sigma_{\min}^2(A) \|x_k - x_\star\|_2^2$ (Lemma 3), and $\zeta \geq \|A_{\tau_{i_k}}\|_F^2$ gives
$\|A_{\tau_k}^{\dagger} A_{\tau_k}(x_k - x_\star)\|_2^2 \geq \dfrac{\eta \|A_{\tau_k}\|_F^2 \, \sigma_{\min}^2(A)}{\alpha (m - N_{k-1}) \, \zeta \, \sigma_{\max}^2(A_{\tau_k})} \|x_k - x_\star\|_2^2.$
Substituting this estimate into the identity above, one has
$\|x_{k+1} - x_\star\|_2^2 \leq \left(1 - \dfrac{\eta \|A_{\tau_k}\|_F^2 \, \sigma_{\min}^2(A)}{\alpha (m - N_{k-1}) \, \zeta \, \sigma_{\max}^2(A_{\tau_k})}\right) \|x_k - x_\star\|_2^2.$
Applying the last inequality recursively and combining it with the bound for the first step yields the statement of the theorem.  □
Remark 2.
The per-step contraction factor of BKMS, denoted by $\rho_{BKMS} = \|x_{k+1} - x_\star\|_2^2 / \|x_k - x_\star\|_2^2$, satisfies $0 < \rho_{BKMS} < 1$ under the conditions of Theorem 1. In fact,
$1 - \dfrac{\eta \|A_{\tau_k}\|_F^2 \, \sigma_{\min}^2(A)}{\alpha (m - N_{k-1}) \, \zeta \, \sigma_{\max}^2(A_{\tau_k})} \leq 1 - \dfrac{\eta}{\alpha (m - N_{k-1})} \cdot \dfrac{\|A_{\tau_k}\|_F^2}{\sigma_{\max}^2(A_{\tau_k})} \cdot \dfrac{\sigma_{\min}^2(A)}{\max_{i \in [p]} \|A_{\tau_i}\|_F^2} \leq 1 - \dfrac{\eta}{\alpha (m - N_{k-1})} \cdot \dfrac{\sigma_{\min}^2(A)}{\max_{i \in [p]} \|A_{\tau_i}\|_F^2} \leq 1 - \dfrac{\eta}{\alpha (m - N_{k-1})} \cdot \dfrac{\sigma_{\min}^2(A)}{\|A\|_F^2}.$
The second inequality holds because $\|A_{\tau_k}\|_F^2 / \sigma_{\max}^2(A_{\tau_k}) \geq 1$. Moreover, $0 < \eta \leq 1$, $0 < N_{k-1}$, $\alpha \leq m$, and $0 < \sigma_{\min}^2(A)/\|A\|_F^2 \leq 1$. Then it holds that
$0 < \dfrac{\eta}{\alpha (m - N_{k-1})} \cdot \dfrac{\sigma_{\min}^2(A)}{\|A\|_F^2} < 1.$
Thus $0 < \rho_{BKMS} < 1$, which implies that BKMS converges linearly.

4. Numerical Examples

In this section, the numerical performances of the RK, Motzkin, GRK, RBK, SKM, GRBK, and BKMS algorithms are compared in different settings of the linear system (1). All experiments were run in MATLAB 2020b on a personal computer with an AMD Ryzen 5 4600H CPU and 16 GB of memory. The relative solution error is defined as
$\mathrm{RSE} := \dfrac{\|x_k - x_\star\|_2^2}{\|x_\star\|_2^2},$
where $x_\star$ denotes the exact solution, and is used to measure the accuracy of all methods.
The initial vector is set to $x_0 = (0, 0, \ldots, 0)^T$ for BKMS and all other methods, and the iteration process of each method is terminated once $\mathrm{RSE} \leq 10^{-6}$. The random partition sampling of [13,14] samples $\tau_k$ at random from the row partition $T = \{\tau_1, \tau_2, \ldots, \tau_p\}$ at the $k$th iteration, where each subset $\tau_i$ is given by
$\tau_i = \left\{(i-1)\left\lfloor \dfrac{m}{p} \right\rfloor + 1,\; (i-1)\left\lfloor \dfrac{m}{p} \right\rfloor + 2,\; \ldots,\; i\left\lfloor \dfrac{m}{p} \right\rfloor\right\}, \quad i = 1, 2, \ldots, p.$
The iterative least-squares solver CGLS in [14] is used to solve the underdetermined linear subsystem at each iteration, since it requires fewer floating-point operations in the matrix–vector products than the MATLAB function "pinv".
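For completeness, the uniform row partition above and the inner block solve can be sketched as follows. SciPy's lsqr is used here as a stand-in for a CGLS-type solver (the paper uses the CGLS routine of [14] in MATLAB), and, as an assumption of this sketch, any rows left over when m is not divisible by p are appended to the last block.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

def uniform_partition(m, p):
    """p contiguous blocks of floor(m/p) rows each (0-based); leftovers go to the last block."""
    s = m // p
    blocks = [np.arange(i * s, (i + 1) * s) for i in range(p)]
    if p * s < m:
        blocks[-1] = np.arange((p - 1) * s, m)
    return blocks

def block_solve(A_tau, r_tau):
    """CGLS-type least-squares solve of the block subsystem (lsqr as a stand-in for pinv)."""
    return lsqr(A_tau, r_tau)[0]
```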
This section consists of three parts. In the first part (Examples 1–3), we implement MS clustering to visualize the row partition of the coefficient matrix A and verify that BKMS is more competitive than other methods to solve consistent linear systems (1) with different A. In the second part (Example 4), we consider the effects of parameters δ and η on the performance of BKMS. In the last part (Examples 5 and 6), we provide two practical applications of the BKMS method.
Example 1.
This example applies the MS Algorithm 3 to partition different system matrices. It is known that an ill-conditioned matrix exhibits high linear correlation between its rows [16]. In the left graph of Figure 1a, we test a dense matrix $A \in \mathbb{R}^{1000 \times 500}$ with condition number cond(A) = 2.60; in the middle graph of Figure 1a, we test a sparse matrix named "Cities" with cond(A) = 207.15; and in the right graph of Figure 1a, we test an ill-conditioned sparse matrix named "relat6" with cond(A) = 4.03 × 10^16. Figure 1b plots the number of rows contained in each block of these matrices.
As shown in Figure 1a, rows with similar correlation coefficients $\mu_k$ are likely to be allocated to the same block. This can make the resulting index set $\tau_k$ in the BKMS method very large, which is verified in Figure 1b. To alleviate this problem, we further partition the current block, so it is reasonable to introduce an almost-maximum distance rule in BKMS.
Example 2.
This example considers a linear system (1) with a tall (or fat) coefficient matrix $A \in \mathbb{R}^{m \times n}$, i.e., $m > n$ (or $m < n$). Its entries are pseudorandom numbers drawn from the standard normal distribution by the MATLAB function randn. Table 1 reports the iteration counts (IT) and the CPU time in seconds (CPU(s)) of RK, Motzkin, GRK, SKM, RBK(p), and GRBK(p) with the same number of blocks p = 8, and of BKMS($\delta_{\mathrm{opt}}$, $\eta$). Here $\delta_{\mathrm{opt}}$ refers to the numerically optimal parameter rather than a theoretical one; $\delta_{\mathrm{opt}}$ is discussed further in Example 4. Figure 2 shows the numerical performance of all methods; in particular, RBK(p), GRBK(p), and BKMS(0.5, 0.1) are controlled to have the same number of blocks p = 5. Figure 2a shows the number of rows contained in each subblock of BKMS, and Figure 2b,c depict the curves of RSE versus IT and RSE versus CPU(s) for the methods applied to solve (1) with the coefficient matrix $A \in \mathbb{R}^{1000 \times 500}$. In addition, we use $\eta = 0.01$ by default for BKMS($\delta$) in all examples unless otherwise specified.
It is observed from Table 1 that RBK(p), GRBK(p), and BKMS($\delta$) are almost always superior to RK, Motzkin, GRK, and SKM in terms of iteration steps and CPU time, thanks to the fact that each block iteration enforces multiple constraints. In addition, among the block Kaczmarz methods, BKMS($\delta$) is more competitive than RBK(p) and GRBK(p). For example, the CPU-time speed-up of BKMS($\delta$) over GRBK(p) ranges from 1.206 to 1.610; the value 1.610 is the ratio of 2.790 (the CPU time consumed by GRBK(5) when the size of A is 30,000 × 5000) to 1.733 (the CPU time used by BKMS(0.45, 0.1) for the same A). Furthermore, Figure 2 shows that BKMS(0.5) converges in fewer iterations and less CPU time than the other methods.
Example 3.
This example applies the RK, Motzkin, GRK, SKM, RBK, GRBK, and BKMS methods to sparse linear systems (1). Table 2 summarizes sparse matrices with different properties, taken from the University of Florida sparse matrix collection [27]; the density of a matrix A is the percentage of nonzero elements of A. The IT and CPU times of all methods are reported in Table 2. Similar to Example 2, Figure 3a shows the number of rows contained in each subblock of BKMS(0.5), and Figure 3b,c display the curves of RSE versus IT and RSE versus CPU time of all methods applied to the sparse linear system (1) named "WorldCities".
From Table 2, RBK(p), GRBK(p), and BKMS($\delta$) require fewer iteration steps and less CPU time. Among the block Kaczmarz methods, BKMS again performs best; for example, the CPU-time speed-up of BKMS($\delta$) over GRBK(p) ranges from 1.692 to 3.493 over the sparse matrices A listed in Table 2. From Figure 3, BKMS converges faster than the other methods. Comparing Figure 3c carefully, it is not difficult to find that SKM and RBK have similar time efficiency; the reason is that computing the Moore–Penrose pseudoinverse of large blocks reduces the efficiency of RBK. Hence, the number of coefficient-matrix partitions affects the convergence speed of RBK, GRBK, and BKMS. Based on this, Example 4 analyzes the sensitivity of the parameters ($\delta$, $\eta$) in BKMS.
Example 4.
This example investigates the effects of the parameters $\delta$ and $\eta$ on the BKMS method. We use BKMS to solve system (1) with $A \in \mathbb{R}^{5000 \times 1000}$ and the sparse system (1) named "abtaha1", respectively. Table 3 and Table 4 list the iteration steps and CPU time of BKMS($\delta$, $\eta$) once RSE < 10^{-6}. The entry "- -" means that the almost-maximum distance rule is not used (i.e., the computation of $\epsilon_k$ and the selection of $\tau_k$ in Algorithm 4 are removed, and the whole block with the largest residual is used). In practice, we usually care more about the CPU time, so we select the optimal parameters $\delta$ and $\eta$ by enumeration. This tuning method is also called grid search in machine learning: the parameter values are scanned and the combination attaining the best value of the target quantity is locked in. For example, we select the initialization seeds $\delta$ = 0.7 and $\eta$ = 0.1 for BKMS($\delta$, $\eta$) in Table 3.
As Table 3 shows, for fixed $\delta$ = 0.3 and increasing $\eta$, the number of iteration steps increases while the CPU time first falls and then rises; each column follows roughly the same pattern. Finding a suitable $\delta$ is crucial. When $\delta$ becomes larger, the number of blocks becomes smaller, and computing the Moore–Penrose pseudoinverse of a large block is time-consuming; on the contrary, when $\delta$ becomes smaller, the number of blocks increases. In the extreme case in which A is row-normalized and each block contains only one row index, BKMS is mathematically equivalent to the Motzkin method. A similar behavior occurs in Table 4, for which the initialization seeds $\delta$ = 0.3 and $\eta$ = 0.1 can be selected.
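The parameter tuning described in Example 4 amounts to the following simple grid search, sketched here with the CPU time as the selection criterion; bkms refers to the illustrative sketch given after Remark 1, not to the code used for the reported experiments.

```python
import itertools
import time
import numpy as np

def grid_search(A, b, deltas, etas):
    """Return (cpu_time, delta, eta) of the fastest BKMS run over the parameter grid."""
    best = None
    for delta, eta in itertools.product(deltas, etas):
        x0 = np.zeros(A.shape[1])
        t0 = time.perf_counter()
        bkms(A, b, x0, delta=delta, eta=eta)
        elapsed = time.perf_counter() - t0
        if best is None or elapsed < best[0]:
            best = (elapsed, delta, eta)
    return best

# Example grid, mirroring Tables 3 and 4:
# grid_search(A, b, deltas=[0.1, 0.3, 0.5, 0.7, 0.9], etas=[0.1, 0.3, 0.5, 0.7, 0.9])
```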
Example 5
(Discrete ill-posed problem). This example applies the BKMS method to the solution of the Phillips ill-posed problem [28] and compares it with the RK, Motzkin, GRK, SKM, RBK, and GRBK methods. The linear system (1) is obtained by discretizing the Fredholm integral equation of the first kind,
$\int_{-6}^{6} K(s, t)\, \phi(t)\, dt = f(s)$
on the square $[-6, 6] \times [-6, 6]$, where the kernel function is $K(s, t) = \phi(s - t)$ with
$\phi(x) = \begin{cases} 1 + \cos(\pi x / 3), & |x| < 3, \\ 0, & |x| \geq 3, \end{cases}$
and the right-hand side is
$f(s) = (6 - |s|)\left(1 + \tfrac{1}{2}\cos\!\left(\tfrac{s\pi}{3}\right)\right) + \tfrac{9}{2\pi}\sin\!\left(\tfrac{|s|\pi}{3}\right).$
The resulting system is often severely ill-conditioned, and the ordinary least-squares problem (2) may then yield a poor approximate solution. One effective way to overcome this issue is ridge regression, also called Tikhonov regularization, namely
$\min_{x \in \mathbb{R}^n} \left\{\|A x - b\|_2^2 + \lambda \|x\|_2^2\right\},$
which can be restated as
$\min_{x \in \mathbb{R}^n} \left\{\|A x - b\|_2^2 + \lambda \|x\|_2^2\right\} = \min_{x \in \mathbb{R}^n} \left\|\begin{pmatrix} A \\ \sqrt{\lambda}\, I_n \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix}\right\|_2^2 = \min_{x \in \mathbb{R}^n} \|\tilde{A} x - \tilde{b}\|_2^2,$
where $\tilde{A} \in \mathbb{R}^{2n \times n}$ and $\lambda > 0$ is a regularization parameter. The measurement matrix $A \in \mathbb{R}^{n \times n}$, the exact solution $x_\star \in \mathbb{R}^n$, and $b \in \mathbb{R}^n$ are generated by the MATLAB function phillips(n) from [28].
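In code, forming the augmented system of the Tikhonov problem is a simple stacking operation; the $\sqrt{\lambda}$ scaling is what makes the regularized objective and the stacked least-squares problem coincide. A minimal NumPy sketch (illustrative names, dense matrices assumed):

```python
import numpy as np

def tikhonov_augment(A, b, lam):
    """Return (A_tilde, b_tilde) with A_tilde = [A; sqrt(lam) I_n] and b_tilde = [b; 0]."""
    n = A.shape[1]
    A_tilde = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    b_tilde = np.concatenate([b, np.zeros(n)])
    return A_tilde, b_tilde
```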
Set n = 1024 and λ = 0.01. The size of the square matrix A is then 1024 × 1024; the condition number of A is 2.9043 × 10^10, while that of $\tilde{A}$ is 58.04. Figure 4a reports the approximate solutions computed by RK, Motzkin, GRK, SKM, RBK(4), GRBK(4), and BKMS(0.4, 0.8), together with the exact solution of the phillips test problem, when the maximum number of iteration steps is 100. Figure 4b shows the partition produced by BKMS(0.4, 0.8). Table 5 lists the RSE, CPU time, and iteration counts of BKMS(0.4, 0.8) compared with RK, Motzkin, GRK, SKM, RBK(4), and GRBK(4) for the "phillips" problem when RSE ≤ 10^{-10}.
Figure 4 shows that the solutions recovered by RBK(4), GRBK(4), and BKMS(0.4, 0.8) are much closer to the exact solution than those of RK, Motzkin, GRK, and SKM. It can be seen from Table 5 that BKMS attains a smaller RSE than the other methods; in addition, BKMS needs less CPU time and fewer iterations to reach the stopping rule.
Example 6
(CT reconstruction problem). This example implements the reconstruction of a computed tomography (CT) image. The test problem is generated by the MATLAB function paralleltomo(N, θ, p) in the ART Tools package [29]. We set N = 50, θ = 0:1:178, and p = 60, resulting in a matrix A of size 10,740 × 2500; the exact solution $x_\star$ is shown in Figure 5a, and b is obtained by $b = A x_\star$. BKMS(0.25, 0.3) is applied to reconstruct $x_\star$ from b and is compared with the RK, SKM, RBK(7), and GRBK(7) methods.
Figure 5 shows the images recovered by RK, SKM, RBK(7), GRBK(7), and BKMS(0.25, 0.3), together with the original image. Table 6 reports the RSE, CPU time, and iteration steps of BKMS(0.25, 0.3) compared with RK, SKM, RBK(7), and GRBK(7) once RSE ≤ 10^{-6}; the maximum number of iterations (IT_max) is set to 8000.
Figure 5 shows that all methods produce well-restored images. It can be seen from Table 6 that BKMS(0.25, 0.3) needs less CPU time and fewer iteration steps than the other methods to restore the image.

5. Conclusions

This work proposes a block Kaczmarz–Motzkin method with mean shift clustering (BKMS) as an efficient extension of the Kaczmarz algorithm for solving large consistent linear systems. The row correlation coefficient of the system matrix is used as the clustering criterion, and the method does not need the number of partitions to be specified in advance. The convergence theory of the BKMS method is established when the linear system is consistent, and several numerical examples demonstrate the effectiveness of the proposed method.
Although the effects of the clustering bandwidth δ and the parameter η are investigated numerically, a theoretical analysis of how δ affects the performance of BKMS requires further study. Moreover, choosing a clustering criterion better matched to the properties of the linear system may improve the algorithm's performance. We leave these problems for future work.

Author Contributions

Y.L.: software, validation, formal analysis, investigation, writing—original draft. T.L.: conceptualization, validation, formal analysis, writing—review and editing, supervision, funding acquisition. F.Y.: validation, formal analysis, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Project of the Department of Science and Technology of Sichuan Province (No. 2021ZYD0005), the Opening Project of the Key Laboratory of Higher Education of Sichuan Province (No. 2020WZJ01), the Scientific Project of SUSE (No. 2020RC24), and the Postgraduate Research Fund (No. y2021101).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors have no financial or proprietary interests in any material discussed in this article.

References

  1. Kaczmarz, S. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci. Lett. 1937, 35, 355–357.
  2. Byrne, C. A unified treatment of some iterative algorithms in signal processing and image reconstruction. Inverse Probl. 2004, 20, 103–120.
  3. Censor, Y. Parallel application of block-iterative methods in medical imaging and radiation therapy. Math. Program. 1988, 42, 307–325.
  4. Marijan, M.; Ignjatovic, Z. Nonlinear reconstruction of delta-sigma modulated signals: Randomized surrogate constraint decoding algorithm. IEEE Trans. Signal Process. 2013, 61, 5361–5373.
  5. Elble, J.M.; Sahinidis, N.V.; Vouzis, P. GPU computing with Kaczmarz and other iterative algorithms for linear systems. Parallel Comput. 2010, 36, 215–231.
  6. Pasqualetti, F.; Carli, R.; Bullo, F. Distributed estimation via iterative projections with application to power network monitoring. Automatica 2012, 48, 747–758.
  7. Strohmer, T.; Vershynin, R. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 2009, 15, 262–278.
  8. Bai, Z.Z.; Wu, W.T. On greedy randomized Kaczmarz method for solving large sparse linear systems. SIAM J. Sci. Comput. 2018, 40, A592–A606.
  9. De Loera, J.A.; Haddock, J.; Needell, D. A sampling Kaczmarz–Motzkin algorithm for linear feasibility. SIAM J. Sci. Comput. 2017, 39, S66–S87.
  10. Motzkin, T.S.; Schoenberg, I.J. The relaxation method for linear inequalities. Canad. J. Math. 1954, 6, 393–404.
  11. Elfving, T. Block-iterative methods for consistent and inconsistent linear equations. Numer. Math. 1980, 35, 1–12.
  12. Eggermont, P.; Herman, G.; Lent, A. Iterative algorithms for large partitioned linear systems with applications to image reconstruction. Linear Algebra Appl. 1981, 40, 37–67.
  13. Needell, D.; Tropp, J.A. Paved with good intentions: Analysis of a randomized block Kaczmarz method. Linear Algebra Appl. 2014, 441, 199–221.
  14. Liu, Y.; Gu, C.Q. On greedy randomized block Kaczmarz method for consistent linear systems. Linear Algebra Appl. 2021, 616, 178–200.
  15. Chen, J.Q.; Huang, Z.D. On a fast deterministic block Kaczmarz method for solving large-scale linear systems. Numer. Algorithms 2022, 89, 1007–1029.
  16. Jiang, X.L.; Zhang, K.; Yin, J.F. Randomized block Kaczmarz methods with k-means clustering for solving large linear systems. Comput. Appl. Math. 2022, 413, 113828.
  17. Ma, A.; Needell, D.; Ramdas, A. Convergence properties of the randomized extended Gauss–Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl. 2015, 36, 1590–1604.
  18. Necoara, I. Faster randomized block Kaczmarz algorithms. SIAM J. Matrix Anal. Appl. 2019, 40, 1425–1452.
  19. Needell, D.; Zhao, R.; Zouzias, A. Randomized block Kaczmarz method with projection for solving least squares. Linear Algebra Appl. 2015, 484, 322–343.
  20. Wen, L.; Yin, F.; Liao, Y.M.; Huang, G.X. A greedy average block Kaczmarz method for the large scaled consistent system of linear equations. AIMS Math. 2022, 7, 6792–6806.
  21. Zhang, Y.J.; Li, H.Y. Block sampling Kaczmarz–Motzkin methods for consistent linear systems. Calcolo 2021, 58, 39.
  22. Needell, D.; Ward, R. Two-subspace projection method for coherent overdetermined systems. J. Fourier Anal. Appl. 2013, 19, 256–269.
  23. Izenman, A.J. Density estimation for statistics and data analysis. J. Am. Stat. Assoc. 1988, 83, 269.
  24. Jain, A.K. Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 2010, 31, 651–666.
  25. Fukunaga, K.; Hostetler, L. The estimation of the gradient of a density function with applications in pattern recognition. IEEE Trans. Inform. Theory 1975, 21, 32–40.
  26. Ghassabeh, Y.A.; Rudzicz, F. Modified mean shift algorithm. IET Image Process. 2018, 12, 2172–2177.
  27. Davis, T.A.; Hu, Y. The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 2011, 38, 1–25.
  28. Hansen, P.C. Regularization Tools: A Matlab package for analysis and solution of discrete ill-posed problems. Numer. Algorithms 1994, 6, 1–35.
  29. Gordon, R.; Bender, R.; Herman, G. Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. J. Theor. Biol. 1970, 29, 471–481.
Figure 1. Clustering performance of the MS($\delta$) Algorithm 3 on matrices $A \in \mathbb{R}^{1000 \times 500}$ with $\delta = 0.45$ (left), "Cities" with $\delta = 0.45$ (middle), and "relat6" with $\delta = 0.3$ (right). (a) The cluster visualizations of the system matrices $A \in \mathbb{R}^{1000 \times 500}$ (left), "Cities" (middle), and "relat6" (right). (b) The number of rows contained in each subblock of the matrices $A \in \mathbb{R}^{1000 \times 500}$ (left), "Cities" (middle), and "relat6" (right).
Figure 2. (a) The number of rows contained in each partition in BKMS with $\delta = 0.5$. The RSE versus IT (b) and RSE versus CPU (c) of BKMS compared with other methods for a consistent linear system with $A \in \mathbb{R}^{1000 \times 500}$.
Figure 3. (a) The number of rows contained in each partition in BKMS with $\delta = 0.5$. The RSE versus IT (b) and RSE versus CPU (c) of BKMS compared with other methods for the consistent sparse linear system named "WorldCities".
Figure 4. (a) The approximation solutions of all methods for the Phillips test problem compared with the exact solution. (b) The number of rows contained in each partition of BKMS(0.4, 0.8).
Figure 5. The original "phantom" image (a), and the recovered images by RK (b), Motzkin (c), SKM (d), RBK (e), GRBK (f), and BKMS (g). The visualization of row partitions by MS clustering (h).
Table 1. IT and CPU of all methods for the m-by-n consistent linear system.

| Method | | 10,000 × 5000 | 15,000 × 5000 | 20,000 × 5000 | 25,000 × 5000 | 30,000 × 5000 |
|---|---|---|---|---|---|---|
| | δ | 0.45 | 0.40 | 0.50 | 0.55 | 0.45 |
| RK | IT | > 2 × 10^5 | > 2 × 10^5 | > 2 × 10^5 | > 2 × 10^5 | > 2 × 10^5 |
| | CPU(s) | - | - | - | - | - |
| Motzkin | IT | 54,496 | 25,199 | 17,208 | 13,473 | 11,566 |
| | CPU(s) | 670.145 | 432.064 | 414.186 | 448.784 | 407.672 |
| GRK | IT | 58,073 | 25,151 | 17,593 | 13,887 | 11,934 |
| | CPU(s) | 710.741 | 462.309 | 431.177 | 421.825 | 427.727 |
| SKM | IT | 55,246 | 25,083 | 17,134 | 13,308 | 11,587 |
| | CPU(s) | 665.832 | 418.554 | 408.865 | 440.814 | 406.462 |
| RBK | IT | 271 | 107 | 64 | 49 | 39 |
| | CPU(s) | 9.491 | 4.522 | 3.492 | 3.341 | 3.234 |
| GRBK | IT | 180 | 76 | 50 | 39 | 32 |
| | CPU(s) | 6.527 | 3.486 | 2.977 | 2.883 | 2.790 |
| BKMS(δ) | IT | 55 | 52 | 28 | 22 | 18 |
| | CPU(s) | 4.589 | 2.698 | 2.018 | 1.886 | 1.733 |

| Method | | 5000 × 10,000 | 5000 × 20,000 | 5000 × 30,000 | 5000 × 40,000 | 5000 × 50,000 |
|---|---|---|---|---|---|---|
| | δ | 0.65 | 0.70 | 0.80 | 0.75 | 0.85 |
| RK | IT | > 2 × 10^5 | > 2 × 10^5 | > 2 × 10^5 | > 2 × 10^5 | > 2 × 10^5 |
| | CPU(s) | - | - | - | - | - |
| Motzkin | IT | 61,796 | 25,685 | 19,437 | 17,011 | 15,248 |
| | CPU(s) | 734.014 | 609.400 | 696.297 | 798.004 | 910.382 |
| GRK | IT | 60,858 | 26,458 | 19,697 | 17,048 | 15,393 |
| | CPU(s) | 726.831 | 633.951 | 773.126 | 918.833 | 954.293 |
| SKM | IT | 61,924 | 25,907 | 19,327 | 16,918 | 15,299 |
| | CPU(s) | 725.727 | 598.987 | 684.460 | 785.677 | 906.202 |
| RBK | IT | 357 | 174 | 122 | 132 | 90 |
| | CPU(s) | 14.832 | 14.124 | 14.812 | 23.130 | 19.744 |
| GRBK | IT | 173 | 66 | 52 | 44 | 41 |
| | CPU(s) | 5.490 | 4.336 | 5.062 | 6.261 | 7.265 |
| BKMS(δ) | IT | 98 | 33 | 25 | 26 | 14 |
| | CPU(s) | 4.552 | 3.550 | 4.154 | 4.434 | 4.524 |
Table 2. IT and CPU of all methods for the sparse consistent linear system.

| Name | | abtaha1 | relat6 | model1 | mk11-b2 | Cities | WorldCities |
|---|---|---|---|---|---|---|---|
| m × n | | 14,596 × 209 | 2340 × 157 | 362 × 798 | 6930 × 990 | 55 × 46 | 315 × 100 |
| density | | 1.68% | 2.21% | 1.05% | 6.52% | 53.04% | 23.87% |
| cond(A) | | 12.23 | 4.03 × 10^16 | 17.57 | 1.05 × 10^16 | 207.15 | 66.0 |
| full rank | | Yes | No | Yes | No | Yes | Yes |
| p | | 4 | 3 | 4 | 10 | 3 | 7 |
| δ | | 0.5 | 0.6 | 0.4 | 0.45 | 0.8 | 0.5 |
| RK | IT | 30,654 | 9717 | 22,196 | 13,093 | 50,815 | 38,031 |
| | CPU(s) | 61.699 | 1.245 | 1.622 | 42.695 | 0.662 | 0.878 |
| Motzkin | IT | 704 | 1501 | 3908 | 1531 | 39,838 | 9006 |
| | CPU(s) | 1.232 | 0.134 | 0.233 | 4.776 | 0.145 | 0.090 |
| GRK | IT | 608 | 1467 | 3492 | 1675 | 13,590 | 4834 |
| | CPU(s) | 0.585 | 0.114 | 0.172 | 2.822 | 0.213 | 0.106 |
| SKM | IT | 2074 | 1389 | 5433 | 1599 | 20,936 | 9733 |
| | CPU(s) | 1.795 | 0.072 | 0.181 | 2.553 | 0.084 | 0.085 |
| RBK(p) | IT | 167 | 205 | 84 | 972 | 1700 | 85 |
| | CPU(s) | 0.681 | 0.095 | 0.21 | 0.387 | 0.265 | 0.082 |
| GRBK(p) | IT | 70 | 142 | 77 | 246 | 1547 | 45 |
| | CPU(s) | 0.248 | 0.045 | 0.133 | 0.251 | 0.153 | 0.044 |
| BKMS(δ) | IT | 36 | 127 | 71 | 115 | 162 | 23 |
| | CPU(s) | 0.071 | 0.024 | 0.076 | 0.144 | 0.052 | 0.026 |
Table 3. Sensitivity analysis of BKMS(δ, η) for system (1) with A ∈ R^(5000 × 1000).

| η \ δ | | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|---|
| - - | IT | 199 | 10 | 2 | 2 | 2 |
| | CPU(s) | 2.196 | 1.227 | 0.323 | 0.453 | 0.469 |
| 0.1 | IT | 255 | 43 | 9 | 2 | 2 |
| | CPU(s) | 1.716 | 0.808 | 0.646 | 0.207 | 0.226 |
| 0.3 | IT | 337 | 72 | 21 | 14 | 12 |
| | CPU(s) | 1.673 | 0.741 | 0.440 | 0.555 | 0.552 |
| 0.5 | IT | 900 | 256 | 87 | 56 | 57 |
| | CPU(s) | 2.542 | 1.126 | 0.388 | 0.267 | 0.253 |
| 0.7 | IT | 1742 | 636 | 246 | 158 | 144 |
| | CPU(s) | 3.796 | 1.820 | 0.564 | 0.379 | 0.346 |
| 0.9 | IT | 4141 | 2276 | 933 | 694 | 634 |
| | CPU(s) | 8.221 | 4.353 | 1.540 | 1.182 | 1.075 |
| Block size | | 148 | 14 | 5 | 3 | 1 |
Table 4. Sensitivity analysis of BKMS(δ, η) for the sparse system (1) with the matrix A named "abtaha1".

| η \ δ | | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|---|
| - - | IT | 66 | 65 | 160 | 170 | 134 |
| | CPU(s) | 0.1440 | 0.2360 | 0.8680 | 0.8730 | 1.2940 |
| 0.1 | IT | 67 | 54 | 58 | 62 | 57 |
| | CPU(s) | 0.0900 | 0.0850 | 0.0890 | 0.0990 | 0.0990 |
| 0.3 | IT | 87 | 88 | 70 | 87 | 76 |
| | CPU(s) | 0.1020 | 0.1080 | 0.0850 | 0.1160 | 0.1020 |
| 0.5 | IT | 103 | 109 | 111 | 108 | 91 |
| | CPU(s) | 0.1170 | 0.1220 | 0.1260 | 0.1320 | 0.1140 |
| 0.7 | IT | 145 | 184 | 149 | 172 | 98 |
| | CPU(s) | 0.1560 | 0.2040 | 0.1680 | 0.2020 | 0.1200 |
| 0.9 | IT | 343 | 391 | 270 | 519 | 195 |
| | CPU(s) | 0.3720 | 0.4140 | 0.2920 | 0.5980 | 0.2380 |
| Block size | | 26 | 9 | 5 | 2 | 1 |
Table 5. Numerical results for the "phillips" test problem.

| Method | RSE | CPU time | Iterations |
|---|---|---|---|
| RK | 5.360 × 10^-3 | 1.220 × 10^-1 | 100 |
| Motzkin | 1.934 × 10^-7 | 6.385 × 10^-2 | 100 |
| GRK | 3.382 × 10^-5 | 6.473 × 10^-2 | 100 |
| SKM | 1.826 × 10^-7 | 5.904 × 10^-2 | 100 |
| RBK(4) | 3.880 × 10^-13 | 7.183 × 10^-2 | 2 |
| GRBK(4) | 3.821 × 10^-13 | 6.999 × 10^-2 | 2 |
| BKMS(0.4, 0.8) | 3.447 × 10^-15 | 5.320 × 10^-3 | 2 |
Table 6. Numerical results for restoring the "phantom" image.

| Method | RSE | CPU time | Iterations |
|---|---|---|---|
| RK | 6.994 × 10^-2 | 104.653 | 8000 |
| Motzkin | 2.681 × 10^-3 | 87.496 | 8000 |
| SKM | 2.645 × 10^-3 | 82.193 | 8000 |
| RBK(7) | 9.935 × 10^-7 | 136.756 | 221 |
| GRBK(7) | 9.818 × 10^-7 | 72.386 | 118 |
| BKMS(0.25, 0.3) | 9.740 × 10^-7 | 32.361 | 104 |

