On Iterative Algorithms with Different Mapping in Each Iteration

Tay, David B.

doi:10.3390/a19060470


by

David B. Tay

School of Information Technology, Deakin University, 75 Pigdons Road, Waurn Ponds, VIC 3216, Australia

Algorithms 2026, 19(6), 470; https://doi.org/10.3390/a19060470 (registering DOI)

Submission received: 31 March 2026 / Revised: 2 June 2026 / Accepted: 5 June 2026 / Published: 9 June 2026


Abstract

Algorithm unrolling (unfolding) is a process where an existing iterative algorithm is converted into another iterative algorithm, but the mapping in each iteration of the new algorithm can potentially be different. An abstraction is to consider a sequence of mappings

T_{m}

, where each mapping potentially acts on a different metric space

(X_{m}, d_{m})

. We study the iterates from the sequence of mappings and derive conditions for convergence. The first result is when both the mapping and metric space are different in each iteration. The second result is when all metric spaces are the same, but the mapping is different in each iteration. The second result can be considered as a generalization of the Banach Fixed Point theorem. A concrete practical example is the unrolling of the Iterative Shrinkage–Thresholding Algorithm, which has applications in statistics, machine learning and signal processing. The convergence of this example will be analyzed with the aid of the result established through this work.

Keywords:

algorithm unrolling; metric spaces; contraction mapping; convergence; Fixed Point theorem

1. Introduction

Recently, there has been increasing interest in developing AI (artificial intelligence) algorithms using the technique of algorithm unrolling [1]. The motivation behind algorithm unrolling is to achieve some interpretability in the AI algorithm, unlike many other AI algorithms which are usually considered ‘black boxes’. The main idea of algorithm unrolling is the conversion of an iterative algorithm, e.g., the Iterative Shrinkage–Thresholding Algorithm (ISTA) [2,3,4,5], into another iterative algorithm with a different mapping in each iteration. Algorithm unrolling has been successfully applied in applications such as compressed sensing [6], phase retrieval [7], power systems [8], image fusion [9] and microscopy [10]. The original iterative algorithm is usually derived from theoretical and/or physical considerations of the problem at hand. The unrolled algorithm inherits some of the theoretical and/or physical features of the original algorithm, and therefore has some aspects of interpretability. For example, with ISTA, the theoretical consideration is to achieve a sparse representation of a signal vector given a dictionary of atom (constituent) vectors. An abstraction of algorithm unrolling is to consider a sequence of mappings between metric spaces, where the application of a mapping can be considered as one iteration of the algorithm. This abstraction is also relevant in other areas like non-autonomous contraction mappings [11].

Consider a sequence of metric spaces

(X_{m}, d_{m})

,

m = 0, 1, 2, \dots

, where

X_{m}

denotes the set of elements (points) and

d_{m} (•, •)

denotes the corresponding distance function. The sequence of mappings

T_{m}

is between the metric spaces:

T_{m} : X_{m - 1} \to X_{m}

for

m = 1, \dots

, as illustrated in Figure 1. Starting with an initial point

x^{(0)} \in X_{0}

, we consider the new point after the application of the sequence of mappings:

x^{(M)} = T_{M} \circ \dots \circ T_{1} (x^{(0)}) \in X_{M},

(1)

where the symbol ‘∘’ denotes the composition of mappings. The superscript ‘

(m)

’ in

x^{(m)}

denotes the index of the space the point belongs to, i.e.,

x^{(m)} \in X_{m}

. However, if the point is obtained using (1), m also represents the index of the term in the sequence. In other words,

x^{(m)}

is the

m^{th}

term in the sequence

x^{(0)}, x^{(1)}, \dots

, where every term in the sequence, in general, belongs to a different space.

When all metric spaces are the same, i.e.,

X_{m} = X

,

\forall m

, and all mappings are the same, i.e.,

T_{m} = T

,

\forall m

, (1) represents the classical iteration procedure found in many branches of applied mathematics and data science, e.g., Jacobi iterations and Gauss–Seidel iterations [12]. Furthermore, when the mapping T is a contraction, the Banach Fixed Point theorem applies, and this also has applications in the study of differential and integral equations [12].

In practice, with algorithm unrolling, the different mappings are obtained through parametrization; i.e.,

T_{m} (•) = T (•; κ_{m})

where

κ_{m}

is the parameter vector that defines the mapping in each iteration.
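The composition in (1) with parametrized mappings can be sketched in a few lines. This is a minimal illustration only: the affine family T(x; κ) = κx + 1 on the real line and the parameter values are hypothetical choices, not taken from any particular unrolled algorithm.

```python
from typing import Callable, Sequence


def make_map(kappa: float) -> Callable[[float], float]:
    """Hypothetical parametrized map T(x; kappa) = kappa * x + 1 on the real line."""
    return lambda x: kappa * x + 1.0


def iterate(x0: float, kappas: Sequence[float]) -> float:
    """Apply T_1, then T_2, ..., then T_M, i.e. x^(M) = T_M o ... o T_1 (x^(0))."""
    x = x0
    for kappa in kappas:  # kappas[m - 1] parametrizes the mapping T_m
        x = make_map(kappa)(x)
    return x


# Three iterations, each with a different (illustrative) parameter.
kappas = [0.5, 0.25, 0.125]
x_final = iterate(2.0, kappas)
```

Each entry of `kappas` plays the role of one parameter vector κ_m, so changing the list changes the mapping used in the corresponding iteration.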

Sequences of contraction mappings have previously been considered in [13,14,15,16]. However, the context in those works differs from that considered here: they are mainly concerned with the convergence of the sequence of fixed points of the contraction mappings

T_{1}, T_{2}, \dots

, i.e., the convergence of

x_{n}^{*}

where

T_{n} (x_{n}^{*}) = x_{n}^{*}

. These previous works do not consider the tandem application of the sequence of mappings as shown in (1). In this work, we study the behavior of (1) as

M \to \infty

and prove some convergence results. More detailed comparisons are deferred to Section 5. Application to the unrolled ISTA will also be considered. To the best of our knowledge, similar results are not found elsewhere.

Organization of paper: In Section 2, we review some definitions and concepts in metric spaces that are relevant to this work. New definitions which are generalizations of classical notions will also be presented here. The convergence in the general case, with different mappings and different metric spaces, is analyzed in Section 3. In Section 4, the convergence with different mappings on the same metric space is analyzed. Detailed comparisons with previous works, which considered sequences of mappings, are found in Section 5. The general convergence result is then used to analyze the convergence of the unrolled ISTA in Section 6. Concluding remarks are found in Section 7.

2. Preliminaries and Definitions

Firstly, we provide some comments about the notation used in the paper. As mentioned earlier, the superscript ‘

(m)

’ in

x^{(m)}

denotes the index of the space the point belongs to, i.e.,

x^{(m)} \in X_{m}

. In general, the spaces are different; i.e.,

X_{m_{1}} \neq X_{m_{2}}

for

m_{1} \neq m_{2}

. Since we are concerned with a sequence of mappings from one space to another space, m also represents the index of the sequence; i.e.,

(x^{(0)}, x^{(1)}, \dots)

represents a sequence of points in different spaces (in general). In the special case when all the spaces are the same, we use a subscript to denote the index of the sequence, i.e.,

x_{0}, x_{1}, \dots

, which is the common convention for sequences.

We first recall some basic definitions for metric spaces [12] which are relevant to the developments that follow. A metric space is a set of points (elements) X and an endowed distance function

d (•, •)

that satisfies the following axioms. For all

x, y, z \in X

, we have

A1: $d (x, y)$ is a non-negative finite-valued function.
A2: $d (x, y) = 0$ if and only if $x = y$ .
A3: $d (x, y) = d (y, x)$ ; i.e., the distance function is symmetric.
A4: $d (x, y) \leq d (x, z) + d (z, y)$ which is known as the triangle inequality.

Using induction on axiom A4, we have the following.

Definition 1 (Generalized triangle inequality).

For all

x_{1}, x_{2}, \dots, x_{K} \in X

,

d (x_{1}, x_{K}) \leq \sum_{k = 1}^{K - 1} d (x_{k}, x_{k + 1})

(2)

Definition 2 (Cauchy sequence and completeness).

A sequence

x_{k} \in X

(

k = 1, \dots

) is a Cauchy sequence if for every

ϵ > 0

, there exists

K = K (ϵ)

such that

d (x_{k_{1}}, x_{k_{2}}) < ϵ for every k_{1}, k_{2} > K (ϵ) .

If every Cauchy sequence

x_{k}

converges to a limit point, i.e.,

\lim_{k \to \infty} x_{k} = x_{limit},

then the space X is complete.

Consider a mapping

T : X ⟶ X

which maps X into itself. The mapping T is a contraction if there exists a constant

0 < \hat{α} < 1

such that for all

x, y \in X

d (T (x), T (y)) \leq \hat{α} d (x, y)

With a contraction mapping, we have the well-known Banach Fixed Point Theorem (BFPT).

Theorem 1 (BFPT).

Consider a complete metric space

(X, d)

and a contraction mapping T:

X ⟶ X

. We have the following:

1.: There exists a point $x^{*} \in X$ such that

$T (x^{*}) = x^{*} .$

The point $x^{*}$ is unique and is known as the fixed point of T.
2.: Given any initial point $x_{0} \in X$ , the sequence of points $x_{1}, x_{2}, \dots$ generated from the iterations

$x_{k} = T (x_{k - 1})$ for $k \geq 1$

converges to the fixed point; i.e.,

$\lim_{k \to \infty} x_{k} = x^{*}$

The discussions above are for a single metric space and a single mapping. However, in this work we are considering the general case of multiple metric spaces and multiple mappings as described in (1). Some of the results above are still relevant, e.g., axioms A1–A4 and Definitions 1 and 2, when we are considering each metric space in the sequence in isolation. The relationships above will have superscript ^(m) for the elements of the metric space

X_{m}

with the corresponding distance function

d_{m}

. However, new definitions are needed, when multiple metric spaces and mappings are involved.

Definition 3 (Lipschitz coefficient).

For the map

T_{m} : X_{m - 1} \to X_{m}

, let

S_{m} \equiv \{(x_{A}^{(m - 1)}, x_{B}^{(m - 1)}) : x_{A}^{(m - 1)} \neq x_{B}^{(m - 1)}, x_{A}^{(m - 1)}, x_{B}^{(m - 1)} \in X_{m - 1}\} .

The coefficient

α_{m}

of the map is defined as

α_{m} \equiv \sup_{S_{m}} \frac{d_{m} (T_{m} (x_{A}^{(m - 1)}), T_{m} (x_{B}^{(m - 1)}))}{d_{m - 1} (x_{A}^{(m - 1)}, x_{B}^{(m - 1)})}

Note that since

x_{A}^{(m - 1)} \neq x_{B}^{(m - 1)}

, by axiom A2, the denominator is non-zero. If

T_{m}

is a trivial constant map, then the numerator and

α_{m}

are equal to zero. We only consider non-trivial maps such that $α_{m}$ is positive. If $α_{m}$ is finite-valued, we have the following inequality:

d_{m} (T_{m} (x_{A}^{(m - 1)}), T_{m} (x_{B}^{(m - 1)})) \leq α_{m} d_{m - 1} (x_{A}^{(m - 1)}, x_{B}^{(m - 1)})

(3)

Note that when

x_{A}^{(m - 1)} = x_{B}^{(m - 1)}

,

T_{m} (x_{A}^{(m - 1)}) = T_{m} (x_{B}^{(m - 1)})

, and both sides of (3) are equal to zero; i.e., (3) is still valid.

Definition 4 (Contraction mapping).

The (non-trivial) mapping

T_{m}

is a contraction if

0 < α_{m} < 1

and we have

\begin{matrix} d_{m} (T_{m} (x_{A}^{(m - 1)}), T_{m} (x_{B}^{(m - 1)})) & \leq & α_{m} d_{m - 1} (x_{A}^{(m - 1)}, x_{B}^{(m - 1)}) \\ & \leq & d_{m - 1} (x_{A}^{(m - 1)}, x_{B}^{(m - 1)}) \end{matrix}

(4)

\forall x_{A}^{(m - 1)}, x_{B}^{(m - 1)} \in X_{m - 1}

.

Definitions 3 and 4 are generalizations of the classical notions in a single metric space and mapping to sequences of metric spaces and mappings. Note that inequality (3) applies to any mapping

T_{m}

, but (4) applies only to mappings that are contractions.
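The supremum in Definition 3 is rarely available in closed form, but it can be probed numerically. The sketch below is an illustration under stated assumptions: the helper name `lipschitz_lower_bound`, the test map T(x) = 0.5 sin(x) and the sampling interval are all hypothetical. Because only finitely many pairs are sampled, the returned value is a lower bound on the coefficient, not the coefficient itself.

```python
import math
import random


def lipschitz_lower_bound(T, sample, dist_in, dist_out, trials=10_000, seed=0):
    """Largest observed ratio d_m(T(xA), T(xB)) / d_{m-1}(xA, xB) over random pairs.

    Sampling explores only finitely many pairs, so the result is a lower
    bound on the supremum that defines alpha_m.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        xa, xb = sample(rng), sample(rng)
        denom = dist_in(xa, xb)
        if denom == 0.0:  # pairs with xA == xB are excluded from the set S_m
            continue
        best = max(best, dist_out(T(xa), T(xb)) / denom)
    return best


def abs_dist(u, v):
    """Usual distance on the real line."""
    return abs(u - v)


def T(x):
    """Illustrative map; by the mean value theorem its coefficient is 0.5."""
    return 0.5 * math.sin(x)


alpha_est = lipschitz_lower_bound(T, lambda rng: rng.uniform(-3.0, 3.0), abs_dist, abs_dist)
```

For this map the estimate approaches 0.5 from below, which is consistent with the map being a contraction in the sense of Definition 4.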

3. Different Spaces and Different Mappings

Suppose we start the iterations in (1) with two different initial points

x_{A}^{(0)} \in X_{0}

and

x_{B}^{(0)} \in X_{0}

. We have the following result.

Theorem 2.

Suppose the mappings

T_{m}

(

m = 1, \dots

) satisfy the following conditions:

1.: The Lipschitz coefficients $α_{m}$ (for all m) are positive and finite; i.e., $0 < α_{m} < \infty$ .
2.: There exists a finite positive integer L such that

$\hat{α} \equiv \sup_{m \geq L} α_{m} \leq c < 1 .$

(5)

Then we have

\lim_{M \to \infty} d_{M} (x_{A}^{(M)}, x_{B}^{(M)}) = 0

(6)

for any

x_{A}^{(0)} \in X_{0}

and

x_{B}^{(0)} \in X_{0}

.

Proof.

If

x_{A}^{(0)} = x_{B}^{(0)}

, then

x_{A}^{(m)} = x_{B}^{(m)}

for all m, and (6) is satisfied. We then consider the general case when

x_{A}^{(0)} \neq x_{B}^{(0)}

. By definition

d_{M} (x_{A}^{(M)}, x_{B}^{(M)}) = d_{M} (T_{M} (x_{A}^{(M - 1)}), T_{M} (x_{B}^{(M - 1)})) .

Using inequality (3), we have

\begin{matrix} d_{M} (x_{A}^{(M)}, x_{B}^{(M)}) & \leq & α_{M} d_{M - 1} (x_{A}^{(M - 1)}, x_{B}^{(M - 1)}) \\ & = & α_{M} d_{M - 1} (T_{M - 1} (x_{A}^{(M - 2)}), T_{M - 1} (x_{B}^{(M - 2)})) \\ & \leq & α_{M} α_{M - 1} d_{M - 2} (x_{A}^{(M - 2)}, x_{B}^{(M - 2)}) \\ & \vdots & \\ & \leq & α_{M} α_{M - 1} \dots α_{1} d_{0} (x_{A}^{(0)}, x_{B}^{(0)}) \\ & = & (\prod_{k = L}^{M} α_{k}) (\prod_{k = 1}^{L - 1} α_{k}) d_{0} (x_{A}^{(0)}, x_{B}^{(0)}) \end{matrix}

(7)

Condition (5) implies that

0 < α_{m} \leq \hat{α} < 1 for m \geq L

(8)

Using (8) in (7), we have

\begin{matrix} d_{M} (x_{A}^{(M)}, x_{B}^{(M)}) & \leq & {\hat{α}}^{M - L + 1} (\prod_{k = 1}^{L - 1} α_{k}) d_{0} (x_{A}^{(0)}, x_{B}^{(0)}) \\ & = & {\hat{α}}^{M} \underbrace{{\hat{α}}^{- L + 1} (\prod_{k = 1}^{L - 1} α_{k}) d_{0} (x_{A}^{(0)}, x_{B}^{(0)})}_{\equiv K (L, x_{A}^{(0)}, x_{B}^{(0)})} \end{matrix}

∴ d_{M} (x_{A}^{(M)}, x_{B}^{(M)}) \leq {\hat{α}}^{M} K (L, x_{A}^{(0)}, x_{B}^{(0)})

(9)

Now

K (L, x_{A}^{(0)}, x_{B}^{(0)})

consists of a finite product of the terms involving

\hat{α}

,

α_{k}

and

d_{0} (x_{A}^{(0)}, x_{B}^{(0)})

. Since

\hat{α}

,

α_{k}

and

d_{0} (x_{A}^{(0)}, x_{B}^{(0)})

are all finite,

K (L, x_{A}^{(0)}, x_{B}^{(0)})

will also be finite. As a consequence of (8), the term

{\hat{α}}^{M}

can be made arbitrarily small by choosing a sufficiently large M; i.e.,

\lim_{M \to \infty} {\hat{α}}^{M} = 0 .

Therefore we have (6). □

Remark 1.

1.: Theorem 2 shows that not every mapping needs to be contractive for convergence. As long as the mappings become contractive after a finite number of iterations, convergence is achieved.
2.: It is important to note that, starting from an initial point $x_{A}^{(0)}$ , the iterates do not necessarily converge to a limit point. Rather, the initial point is immaterial, after a sufficiently large number of iterations, to the eventual sequence of iterates; i.e., $x_{A}^{(M)} \approx x_{B}^{(M)}$ for large M—trajectory convergence. This is useful, from a practical perspective, as one does not need to be too concerned with the initialization of the iterative algorithm, as long as there is a sufficient number of iterations.
3.: Note that, since the iterates $x^{(m)}$ belong to different metric spaces in general, the notion of a Cauchy sequence is not relevant here. In a Cauchy sequence (Definition 2), all terms belong to the same space. Even when all iterates belong to the same space, trajectory convergence does not imply convergence to a limit point—see the example below.
4.: Conditions for convergence to a limit point are considered in Section 4.

Example 1.

Consider a simple example, where the metric space is the Banach space of vectors in

X_{m} = R^{q}

(for all m), with the following mapping:

T_{m} (x) = K e^{- α m} x + m^{2} 1

where

K > 1

,

α > 0

and

1

is the vector with all ones. We then have

d (x_{m}^{A}, x_{m}^{B}) = \| K e^{- α m} (x_{m - 1}^{A} - x_{m - 1}^{B}) \| = K e^{- α m} \| x_{m - 1}^{A} - x_{m - 1}^{B} \| = K e^{- α m} d (x_{m - 1}^{A}, x_{m - 1}^{B})

For trajectory convergence, we require

K e^{- α m} < 1

, which is achieved when

m > \frac{1}{α} \ln K

However, when we consider the difference between two successive iterates, we have

x_{m} - x_{m - 1} = (K e^{- α m} - 1) x_{m - 1} + m^{2} 1

As

m \to \infty

,

\| x_{m} - x_{m - 1} \|

does not approach zero; i.e., the sequence is not Cauchy and does not converge to a limit point.
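The behaviour in Example 1 can be reproduced numerically. The sketch below assumes the scalar case q = 1 and the illustrative values K = 5 and α = 0.5; these numbers, and the two starting points, are assumptions made only for the demonstration.

```python
import math

K, ALPHA = 5.0, 0.5  # illustrative values with K > 1 and alpha > 0


def T(m, x):
    """Map of Example 1 for q = 1: T_m(x) = K * exp(-alpha * m) * x + m**2."""
    return K * math.exp(-ALPHA * m) * x + m ** 2


def run(x0, M):
    """Return the iterates [x_0, x_1, ..., x_M] generated by T_1, ..., T_M."""
    xs = [x0]
    for m in range(1, M + 1):
        xs.append(T(m, xs[-1]))
    return xs


xa, xb = run(0.0, 40), run(10.0, 40)

# Distance between the two trajectories at each iteration.
gaps = [abs(a - b) for a, b in zip(xa, xb)]

# Distance between successive iterates of one trajectory.
steps = [abs(xa[m] - xa[m - 1]) for m in range(1, len(xa))]

# Smallest m with K * exp(-alpha * m) < 1, i.e. m > ln(K) / alpha.
first_contractive = math.floor(math.log(K) / ALPHA) + 1
```

With these values the gap between the trajectories first grows, since the early mappings are expansive, and then shrinks towards zero, while the successive differences in `steps` keep growing roughly like 2m, so neither trajectory settles to a limit point.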

When all mappings are contractions (

L = 1

), a simple formula for estimating the number of iterations to achieve a prescribed error is given by the following.

Corollary 1.

If all conditions of Theorem 2 are satisfied, when

L = 1

, for a given

ε \in (0, d_{0} (x_{A}^{(0)}, x_{B}^{(0)}))

, the bound

d_{M} (x_{A}^{(M)}, x_{B}^{(M)}) \leq ε

is achieved if the number of iterations M satisfies

M \geq \frac{\log (ε / D_{0})}{\log \hat{α}},

(10)

where

D_{0} \equiv d_{0} (x_{A}^{(0)}, x_{B}^{(0)}) and \hat{α} \equiv \sup_{m} α_{m} < 1 .

Proof.

Using (9) with

L = 1

, we require

d_{M} (x_{A}^{(M)}, x_{B}^{(M)}) \leq {\hat{α}}^{M} D_{0} \leq ε

⟹ {\hat{α}}^{M} \leq ε / D_{0}

Taking the logarithm (of any base greater than one) of both sides, we have

M \log \hat{α} \leq \log (ε / D_{0})

Noting that

\log (ε / D_{0}) < 0

and

\log \hat{α} < 0

, dividing both sides by

\log \hat{α}

reverses the inequality. Inequality (10) is then obtained. □
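As a small worked illustration of (10), the helper below returns the smallest integer M that satisfies the bound. The function name and the numerical values are hypothetical and serve only to show how the formula is used.

```python
import math


def iterations_for_tolerance(eps, d0, alpha_hat):
    """Smallest integer M with M >= log(eps / d0) / log(alpha_hat), as in (10)."""
    if not 0.0 < alpha_hat < 1.0:
        raise ValueError("alpha_hat must lie in (0, 1)")
    if not 0.0 < eps < d0:
        raise ValueError("eps must lie in (0, d0)")
    return math.ceil(math.log(eps / d0) / math.log(alpha_hat))


# Illustrative numbers: initial distance 1, tolerance 1e-3, worst-case coefficient 0.5.
M = iterations_for_tolerance(eps=1e-3, d0=1.0, alpha_hat=0.5)
```

For these values the ratio of logarithms is approximately 9.97, so ten iterations suffice, and nine do not guarantee the tolerance under the bound (9).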

4. Different Mappings on the Same Metric Space

We now consider the case where all spaces are the same; i.e.,

X_{m} = X

, for all m and

T_{m} : X \to X

. Since all points are now in the same space, we denote the sequence

x^{(0)}, x^{(1)}, \dots

(from the iterations) as

x_{0}, x_{1}, \dots

, where

x_{m} = T_{m} (x_{m - 1}) \in X, for all m \geq 1 .

(11)

Two new definitions, pertaining to the properties of the mappings, are first given.

Definition 5 (Pairwise contraction).

A pair of mappings

(T_{p}, T_{q})

is a pairwise contraction if there exists a positive constant

{\tilde{γ}}_{p, q} < 1

such that, for all

x_{A}, x_{B} \in X

with

x_{A} \neq x_{B}

,

d (T_{p} (x_{A}), T_{q} (x_{B})) \leq {\tilde{γ}}_{p, q} d (x_{A}, x_{B}) < d (x_{A}, x_{B})

For a sequence of mappings

(T_{1}, T_{2}, \dots)

, we define the sequence of Lipschitz coefficients

γ_{m}

(

m = 1, \dots

) as

γ_{m} \equiv \sup_{\tilde{S}} \frac{d (T_{m + 1} (x_{A}), T_{m} (x_{B}))}{d (x_{A}, x_{B})}

where

\tilde{S} \equiv \{(x_{A}, x_{B}) : x_{A} \neq x_{B}, x_{A} \in X, x_{B} \in X\}

Definition 6 (Sequential contraction).

A sequence of mappings

T_{1}, T_{2}, \dots

is a sequential contraction if the following conditions hold:

1.: The Lipschitz coefficients $γ_{m}$ ( $m = 1, \dots$ ) are positive and finite-valued; i.e., $0 < γ_{m} < \infty$ .
2.: There exists a finite positive integer L such that $(T_{m + 1}, T_{m})$ is a pairwise contraction for all $m \geq L$ ; i.e.,

$d (T_{m + 1} (x_{A}), T_{m} (x_{B})) \leq γ_{m} d (x_{A}, x_{B}) < d (x_{A}, x_{B})$

(12)

for any $x_{A}, x_{B} \in X$ such that $x_{A} \neq x_{B}$ , and $0 < γ_{m} < 1$ .

Remark 2.

1.: The definition above reduces to the classical definition of a contraction mapping if all the mappings are the same; i.e., $T_{m} = T$ .
2.: If T is a contraction mapping (in the classical sense), then $(T, T)$ is a pairwise contraction and the sequence $T, T, \dots$ is a sequential contraction.

We then have the following result.

Theorem 3.

Consider the iterates from (11) on a complete metric space X. Suppose, for a given initial point

x_{0}

, the following conditions hold:

1.: The sequence $T_{1}, T_{2}, \dots$ is a sequential contraction.
2.: There exists a finite positive integer L such that

$\hat{γ} \equiv \sup_{m \geq L} γ_{m} \leq c < 1$

(13)
3.: $x_{m} \neq x_{m - 1}$ for $m \geq 1$ .

Then, for any initial point

x_{0} \in X

whose generated iterates satisfy

x_{m} \neq x_{m - 1}

, for all

m \geq 1

, there exists a limit point

x_{limit} \in X

such that

\lim_{M \to \infty} x_{M} = x_{limit} ⟺ \lim_{M \to \infty} d (x_{M}, x_{limit}) = 0

Proof.

Since

γ_{m}

is positive and finite, we have

d (T_{m + 1} (x_{A}), T_{m} (x_{B})) \leq γ_{m} d (x_{A}, x_{B})

(14)

for all

x_{A}, x_{B} \in X

and

x_{A} \neq x_{B}

. For

m \geq L

,

γ_{m} < 1

, and for

m < L

,

γ_{m}

can be any finite positive value. Since

x_{m} \neq x_{m - 1}

, using (14) repeatedly, we have

\begin{matrix} d (x_{m + 1}, x_{m}) & = & d (T_{m + 1} (x_{m}), T_{m} (x_{m - 1})) \\ & \leq & γ_{m} d (x_{m}, x_{m - 1}) \\ & \leq & γ_{m} γ_{m - 1} d (x_{m - 1}, x_{m - 2}) \\ & \vdots & \\ & \leq & (\prod_{k = 1}^{m} γ_{k}) d (x_{1}, x_{0}) \\ & = & (\prod_{k = L}^{m} γ_{k}) (\prod_{k = 1}^{L - 1} γ_{k}) d (x_{1}, x_{0}) \end{matrix}

Using (13), for

k \geq L

we have

0 < γ_{k} \leq \hat{γ} < 1

(15)

Therefore

\begin{matrix} d (x_{m + 1}, x_{m}) & \leq & {\hat{γ}}^{m - L + 1} (\prod_{k = 1}^{L - 1} γ_{k}) d (x_{1}, x_{0}) \\ & = & {\hat{γ}}^{m} \underbrace{{\hat{γ}}^{- L + 1} (\prod_{k = 1}^{L - 1} γ_{k}) d (x_{1}, x_{0})}_{\equiv \tilde{K} (L, x_{1}, x_{0})} \end{matrix}

∴ d (x_{m + 1}, x_{m}) \leq {\hat{γ}}^{m} \tilde{K} (L, x_{1}, x_{0})

(16)

Using the generalized triangle inequality (2), for

n > m \geq L

we have

d (x_{m}, x_{n}) \leq d (x_{m}, x_{m + 1}) + d (x_{m + 1}, x_{m + 2}) + \dots + d (x_{n - 1}, x_{n})

Applying inequality (16) to each term on the R.H.S. we have

\begin{matrix} d (x_{m}, x_{n}) & \leq & {\hat{γ}}^{m} \tilde{K} (L, x_{1}, x_{0}) + {\hat{γ}}^{m + 1} \tilde{K} (L, x_{1}, x_{0}) + \dots + {\hat{γ}}^{n - 1} \tilde{K} (L, x_{1}, x_{0}) \\ & = & ({\hat{γ}}^{m} + {\hat{γ}}^{m + 1} + \dots + {\hat{γ}}^{n - 1}) \tilde{K} (L, x_{1}, x_{0}) \end{matrix}

Applying the geometric series formula

\sum_{k = 0}^{N - 1} a r^{k} = \frac{a (1 - r^{N})}{1 - r}

to the summation in brackets, where

a = {\hat{γ}}^{m}

,

r = \hat{γ}

and

N = n - m

, we have

d (x_{m}, x_{n}) \leq {\hat{γ}}^{m} \frac{1 - {\hat{γ}}^{n - m}}{1 - \hat{γ}} \tilde{K} (L, x_{1}, x_{0})

Now (15) implies that

0 < {\hat{γ}}^{n - m} < 1

and therefore

0 < 1 - {\hat{γ}}^{n - m} < 1

. Using this inequality we have

d (x_{m}, x_{n}) < {\hat{γ}}^{m} \frac{1}{1 - \hat{γ}} \tilde{K} (L, x_{1}, x_{0})

(17)

Now

\tilde{K} (L, x_{1}, x_{0})

is a finite product of terms involving

\hat{γ}

,

γ_{k}

and

d (x_{1}, x_{0})

. Since the latter are all finite,

\tilde{K} (L, x_{1}, x_{0})

is also finite. The term

\frac{1}{1 - \hat{γ}}

is positive and finite as

0 < \hat{γ} < 1

. As a consequence of (15),

{\hat{γ}}^{m}

and therefore

d (x_{m}, x_{n})

can be made arbitrarily small by choosing a sufficiently large m. This means that for every

ε > 0

, there exists a K such that

d (x_{m}, x_{n}) < ε for every n > m > K,

meaning

x_{1}, x_{2}, \dots

is a Cauchy sequence. Since the space X is complete, by Definition 2, a limit point exists. □

Remark 3.

1.: Although a limit point $x_{limit}$ exists, in general, $x_{limit}$ is not a fixed point of any of the mappings.
2.: If all mappings are the same and are contractions, the conditions for the Banach Fixed Point theorem apply, and the limit point is also the fixed point of the mapping.
3.: The limit point, in general, will depend on the initial point $x_{0}$ .

A simple formula for estimating the number of required iterations, when

L = 1

, is given by the following.

Corollary 2.

If all conditions of Theorem 3 are satisfied, when

L = 1

, for a given

ε \in (0, d (x_{1}, x_{0}))

, the bound

d (x_{M}, x_{limit}) < ε

is achieved if the number of iterations M satisfies

M > \frac{\log ((1 - \hat{γ}) ε / {\tilde{D}}_{0})}{\log \hat{γ}}

(18)

where

{\tilde{D}}_{0} \equiv d (x_{1}, x_{0}) and \hat{γ} \equiv \sup_{m} γ_{m} < 1 .

Proof.

When

L = 1

,

\tilde{K} (L, x_{1}, x_{0}) = d (x_{1}, x_{0}) = {\tilde{D}}_{0} .

As

n \to \infty

,

x_{n} \to x_{limit}

. Therefore, letting n tend to infinity in inequality (17), where the strict inequality weakens to a non-strict one in the limit, we require

d (x_{M}, x_{limit}) \leq {\hat{γ}}^{M} \frac{1}{1 - \hat{γ}} {\tilde{D}}_{0} < ε

⟹ {\hat{γ}}^{M} < (1 - \hat{γ}) ε / {\tilde{D}}_{0}

Taking the logarithm (of any base greater than one) of both sides, we have

M \log \hat{γ} < \log ((1 - \hat{γ}) ε / {\tilde{D}}_{0})

Noting that

\log ((1 - \hat{γ}) ε / {\tilde{D}}_{0}) < 0

and

\log \hat{γ} < 0

, dividing both sides by

\log \hat{γ}

reverses the inequality. Inequality (18) is then obtained. □
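The bound (18) is strict, so the smallest admissible integer is one more than the floor of the right-hand side. The helper below is a hypothetical illustration of that calculation; the function name and the numerical values are not from the source.

```python
import math


def iterations_to_limit(eps, d10, gamma_hat):
    """Smallest integer M with M > log((1 - gamma_hat) * eps / d10) / log(gamma_hat), as in (18).

    d10 stands for d(x_1, x_0), the distance between the first two iterates.
    """
    if not 0.0 < gamma_hat < 1.0:
        raise ValueError("gamma_hat must lie in (0, 1)")
    if not 0.0 < eps < d10:
        raise ValueError("eps must lie in (0, d10)")
    bound = math.log((1.0 - gamma_hat) * eps / d10) / math.log(gamma_hat)
    return math.floor(bound) + 1  # strict inequality: step past the bound


# Illustrative numbers: d(x_1, x_0) = 1, tolerance 1e-3, worst-case coefficient 0.5.
M = iterations_to_limit(eps=1e-3, d10=1.0, gamma_hat=0.5)
```

Compared with Corollary 1, the extra factor (1 - γ̂) accounts for the geometric tail between the current iterate and the limit point, which is why one more iteration is needed here for the same illustrative numbers.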

5. Comparisons

Sequences of mappings have also been considered previously in [13,14,15,16]. We now make detailed comparisons with the previous works, and show that the previous results are fundamentally different to the results in this work.

The fundamental concept in previous works is to consider a sequence of self-mappings

T_{n}

on a complete metric space X (with distance function d). The mappings are assumed to be contractions but with the possibility of different Lipschitz coefficients:

d (T_{n} (x), T_{n} (y)) \leq {\hat{α}}_{n} d (x, y) for all x, y \in X

where

0 < {\hat{α}}_{n} < 1

for all n, and in general

{\hat{α}}_{n_{1}} \neq {\hat{α}}_{n_{2}}

(

n_{1} \neq n_{2}

). By the Banach Fixed Point Theorem, there exists a unique fixed point for each

T_{n}

; i.e.,

T_{n} (x_{n}^{*}) = x_{n}^{*} for all n

The condition imposed on the sequence of mappings is that there exists a limiting self-map T, which is a contraction, such that

\lim_{n \to \infty} T_{n} (x) = T (x) for all x \in X

(19)

The convergence can be pointwise or uniform over X. Then the sequence of fixed points

x_{n}^{*}

also converges to the fixed point of T; i.e.,

\lim_{n \to \infty} x_{n}^{*} = x^{*} where T (x^{*}) = x^{*}

(20)

The results above do not explicitly refer to any iteration process to obtain the fixed points. However, Theorem 1 implies that each fixed point

x_{n}^{*}

can be obtained as follows. Given any initial point

x_{n, 0} \in X

, we perform the following iterations:

x_{n, k} = T_{n} (x_{n, k - 1}) for k \geq 1

(21)

and

x_{n}^{*} = \lim_{k \to \infty} x_{n, k}

(22)

In (21) and (22), the subscript ‘k’ tracks the iteration number, and the subscript ‘n’ tracks the mapping that is used. In general, a different initial point

x_{n, 0}

can be used for each mapping

T_{n}

. The important observation is that the same mapping is used for the iterations in (21). This is fundamentally different to (11) where, in general, a different mapping is used in each iteration. The condition for convergence in (20) is the existence of a limiting map as shown in (19), but no such condition is needed for (11). Although the limit point exists in both cases, the limit point from (11) is, in general, not associated with any fixed point of the maps, unlike in (19) where the limit point is also the fixed point of the map. Furthermore, not all maps with (11) need to be contractions; only maps after a finite number of iterations need to be contractions. However, all maps in (19) are contractions.

From an application perspective, the scenario described above—where there are multiple iterative algorithms as implied in (21) (one for each n)—is not something typically found in practice. Equations (19) and (20) are therefore primarily of theoretical interest. The iterations in (1), however, are found in practice, e.g., algorithm unrolling, and we will next study a concrete practical example.

6. Unrolled ISTA Algorithm

We now analyze the convergence of a well-known iterative algorithm with the aid of the previous result. The relevant metric space is the Banach space of vectors in

R^{q}

with the

ℓ^{2}

distance function

d (x_{A}, x_{B}) = \| x_{A} - x_{B} \|_{2} \equiv {(\sum_{k = 1}^{q} {(x_{A} (k) - x_{B} (k))}^{2})}^{1 / 2} .

The conventional Iterative Shrinkage–Thresholding Algorithm (ISTA) [2,3] can be described by the following iterations:

x_{m} = S_{β} ((I - \frac{1}{μ} W^{T} W) x_{m - 1} + \frac{1}{μ} W^{T} y)

(23)

where

β, μ \in R^{+}

and

W \in R^{p \times q}

are parameters of the algorithm and

y \in R^{p}

is the given input. The soft thresholding function

S_{β}

is applied element-wise to vectors in

R^{q}

, and for a scalar

b \in R

, is defined as

S_{β} (b) \equiv \begin{cases} b - β, & b > β \\ 0, & - β \leq b \leq β \\ b + β, & b < - β \end{cases}

(24)

Starting with an initial point

x_{0} \in R^{q}

, a sequence of iterates

x_{1}, x_{2}, \dots \in R^{q}

is computed using (23). The algorithm arises in the context of LASSO (least absolute shrinkage and selection operator) in statistics and sparse coding in signal processing. In sparse coding, given an input

y

, the goal is to find a parsimonious representation of

y

using an overcomplete dictionary of vectors, which are columns of

W

. A common approach to achieve this is to solve the following convex optimization problem:

\min_{x} \{\frac{1}{2} \| y - W x \|_{2}^{2} + β \| x \|_{1}\}

(25)

where

β

is the regularization parameter that controls the level of sparsity. The solution to (25) can be achieved using the iterations in (23), where the parameter

μ

is the iteration step size.
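Iteration (23) can be sketched in plain Python as follows. The threshold is kept at β exactly as written in (23). The small diagonal dictionary, the input vector and the choice μ = 1 are illustrative assumptions, and the helper names are hypothetical.

```python
def soft_threshold(b, beta):
    """Scalar soft thresholding S_beta(b) from (24)."""
    if b > beta:
        return b - beta
    if b < -beta:
        return b + beta
    return 0.0


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def ista(W, y, beta, mu, x0, num_iters):
    """Run iteration (23): x <- S_beta((I - W^T W / mu) x + W^T y / mu)."""
    Wt = transpose(W)
    Wty = matvec(Wt, y)
    x = list(x0)
    for _ in range(num_iters):
        WtWx = matvec(Wt, matvec(W, x))
        z = [xi - g / mu + c / mu for xi, g, c in zip(x, WtWx, Wty)]
        x = [soft_threshold(zi, beta) for zi in z]
    return x


# Illustrative 2 x 2 problem with a diagonal dictionary.
W = [[1.0, 0.0], [0.0, 0.5]]
y = [3.0, 4.0]
x_hat = ista(W, y, beta=1.0, mu=1.0, x0=[0.0, 0.0], num_iters=200)
```

For this diagonal example each coordinate evolves independently, and the iterates approach (2, 4), which can be checked by hand to minimize (25) coordinate by coordinate.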

The unrolled ISTA [1], by generalizing (23), is given by

x_{m} = S_{β_{m}} (X_{m} x_{m - 1} + Y_{m} y) \equiv T_{m} (x_{m - 1}; β_{m}, X_{m}, Y_{m})

(26)

where the parameters of the mappings are

β_{m}

,

X_{m}

and

Y_{m}

. When the parameters

β_{m}

,

X_{m}

and

Y_{m}

are determined via a machine learning framework, i.e., data-driven, we have what is commonly known as Learned ISTA or LISTA.

We will establish conditions for the convergence of (26). We first prove a relevant property of the soft thresholding function.

Lemma 1.

The function

S_{β} (b)

is non-expansive for any

β \in R^{+}

; i.e.,

\| S_{β} (z_{1}) - S_{β} (z_{2}) \|_{2} \leq \| z_{1} - z_{2} \|_{2}

(27)

for all

z_{1}, z_{2} \in R^{q}

.

Proof.

We first establish the scalar form of (27). For any

b_{1}, b_{2} \in R

, define

Δ_{1} \equiv | b_{1} - b_{2} |; Δ_{2} \equiv | S_{β} (b_{1}) - S_{β} (b_{2}) | .

Due to the piece-wise nature of

S_{β} (b)

, as shown in (24), there are four cases to consider:

1.: For $- β \leq b_{1}, b_{2} \leq β$ :

$S_{β} (b_{1}) = S_{β} (b_{2}) = 0 .$

Therefore $Δ_{2} = 0$ and since $Δ_{1} \geq 0$ , we have $Δ_{2} \leq Δ_{1}$ .
2.: For $- β \leq b_{1} \leq β$ , $b_{2} > β$ :

$S_{β} (b_{1}) = 0$ and $S_{β} (b_{2}) = b_{2} - β > 0 .$

Therefore $Δ_{2} = b_{2} - β$ . Since $- β \leq b_{1} \leq β$ ,

$Δ_{2} = b_{2} - β \leq b_{2} - b_{1} = Δ_{1} .$

Due to symmetry, this case is similar to the $- β \leq b_{2} \leq β$ , $b_{1} > β$ case.
3.: For $b_{1}, b_{2} > β$ :

$S_{β} (b_{1}) = b_{1} - β > 0$ and $S_{β} (b_{2}) = b_{2} - β > 0 .$

Therefore

$Δ_{2} = | b_{1} - b_{2} | = Δ_{1} .$

Due to symmetry, this case is similar to the $b_{1}, b_{2} < - β$ case.
4.: For $b_{1} > β$ , $b_{2} < - β$ :

$S_{β} (b_{1}) = b_{1} - β > 0$ ; $S_{β} (b_{2}) = b_{2} + β < 0$ and $b_{1} - b_{2} > 0 .$

Therefore

$Δ_{2} = | b_{1} - b_{2} - 2 β | < | b_{1} - b_{2} | = Δ_{1} .$

Due to symmetry, this case is similar to the $b_{2} > β$ , $b_{1} < - β$ case.

Therefore, for all

b_{1}, b_{2} \in R

, we have

\begin{matrix} {(Δ_{2})}^{2} & = & {(S_{β} (b_{1}) - S_{β} (b_{2}))}^{2} \\ & \leq & {(b_{1} - b_{2})}^{2} = {(Δ_{1})}^{2} \end{matrix}

(28)

Now consider the square of the L.H.S. of (27). Using (28), we have

\begin{matrix} \| S_{β} (z_{1}) - S_{β} (z_{2}) \|_{2}^{2} & = & \sum_{k = 1}^{q} {(S_{β} (z_{1} (k)) - S_{β} (z_{2} (k)))}^{2} \\ & \leq & \sum_{k = 1}^{q} {(z_{1} (k) - z_{2} (k))}^{2} \\ & = & \| z_{1} - z_{2} \|_{2}^{2} \end{matrix}

Taking the square root yields the desired result. □
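Lemma 1 can be sanity-checked numerically by sampling random pairs of vectors and recording the largest observed ratio of the two sides of (27). This is only an empirical check under assumed values (β = 0.7, q = 5, entries drawn from [-3, 3]), not a substitute for the proof.

```python
import math
import random


def soft_threshold(b, beta):
    """Scalar soft thresholding S_beta(b) from (24)."""
    if b > beta:
        return b - beta
    if b < -beta:
        return b + beta
    return 0.0


def l2(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


rng = random.Random(0)
beta, q = 0.7, 5  # illustrative threshold and dimension
worst_ratio = 0.0
for _ in range(10_000):
    z1 = [rng.uniform(-3.0, 3.0) for _ in range(q)]
    z2 = [rng.uniform(-3.0, 3.0) for _ in range(q)]
    s1 = [soft_threshold(b, beta) for b in z1]
    s2 = [soft_threshold(b, beta) for b in z2]
    denom = l2(z1, z2)
    if denom > 0.0:
        # Ratio of the left-hand side of (27) to its right-hand side.
        worst_ratio = max(worst_ratio, l2(s1, s2) / denom)
```

The largest ratio stays at or below one, up to floating-point rounding, in agreement with the non-expansive property.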

The induced/operator norm is defined as

\| X_{m} \|_{2} \equiv \sup_{x \neq 0; x \in R^{q}} \frac{\| X_{m} x \|_{2}}{\| x \|_{2}},

which is also the spectral norm. The convergence of the unrolled ISTA is given by the following.

Theorem 4.

Consider two initial points

x_{0}^{A} \in R^{q}

and

x_{0}^{B} \in R^{q}

, and the corresponding iterates,

x_{m}^{A} \in R^{q}

and

x_{m}^{B} \in R^{q}

, respectively, using (26). Suppose the parameters of the mappings

T_{m}

satisfy the following conditions:

1.: $\| X_{m} \|_{2}$ is positive and finite-valued for all m.
2.: There exists a finite positive integer L such that

$\sup_{m \geq L} \| X_{m} \|_{2} \leq c < 1$

(29)

We then have

\lim_{m \to \infty} \| x_{m}^{A} - x_{m}^{B} \|_{2} = 0 \Rightarrow x_{m}^{A} \approx x_{m}^{B} for large m

(30)

Proof.

From (26), with the parameters of the mapping suppressed for brevity, we have

T_{m} (x) = S_{β_{m}} (X_{m} x + Y_{m} y)

. For any two arbitrary points

x_{a}, x_{b} \in R^{q}

, using (27), we have

\begin{matrix} \| T_{m} (x_{a}) - T_{m} (x_{b}) \|_{2} & = & \| S_{β_{m}} (X_{m} x_{a} + Y_{m} y) - S_{β_{m}} (X_{m} x_{b} + Y_{m} y) \|_{2} \\ & \leq & \| (X_{m} x_{a} + Y_{m} y) - (X_{m} x_{b} + Y_{m} y) \|_{2} \\ & = & \| X_{m} (x_{a} - x_{b}) \|_{2} \end{matrix}

Applying the definition of the operator norm to the last expression, we have

\| T_{m}(x_{a}) - T_{m}(x_{b}) \|_{2} \leq \| X_{m} \|_{2} \, \| x_{a} - x_{b} \|_{2}

The Lipschitz coefficients are therefore $α_{m} = \| X_{m} \|_{2}$ ($m = 1, 2, \dots$). Invoking Theorem 2 yields the required result. □
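The behaviour described by Theorem 4 can be illustrated with a small simulation. The Python sketch below is an assumption-laden illustration and not the author's experiment: it draws a fresh random $X_{m}$, $Y_{m}$ and $β_{m}$ in every iteration, and enforces $\| X_{m} \|_{2} \leq c$ by rescaling each $X_{m}$ to Frobenius norm $c$ (using the fact that the spectral norm never exceeds the Frobenius norm). This corresponds to $L = 1$ in condition 2. All function names, dimensions and sampling ranges are hypothetical.

```python
import math
import random


def soft_threshold(b, beta):
    """Scalar soft-threshold (standard definition, assumed here)."""
    if b > beta:
        return b - beta
    if b < -beta:
        return b + beta
    return 0.0


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def scale_to_frobenius(M, c):
    """Rescale M to Frobenius norm c, so that ||M||_2 <= ||M||_F = c."""
    fro = math.sqrt(sum(a * a for row in M for a in row))
    return [[c * a / fro for a in row] for row in M]


def unrolled_step(x, y, X, Y, beta):
    """One iteration T_m(x) = S_beta(X x + Y y)."""
    Xx = matvec(X, x)
    Yy = matvec(Y, y)
    return [soft_threshold(a + b, beta) for a, b in zip(Xx, Yy)]


def simulate(q=4, p=3, iters=200, c=0.9, seed=1):
    """Return the distances ||x_m^A - x_m^B||_2 for m = 0, ..., iters."""
    rng = random.Random(seed)
    y = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    xa = [rng.uniform(-10.0, 10.0) for _ in range(q)]
    xb = [rng.uniform(-10.0, 10.0) for _ in range(q)]
    dists = [math.dist(xa, xb)]
    for _ in range(iters):
        # A different mapping in each iteration, shared by both trajectories.
        X = scale_to_frobenius(random_matrix(rng, q, q), c)
        Y = random_matrix(rng, q, p)
        beta = rng.uniform(0.0, 0.5)
        xa = unrolled_step(xa, y, X, Y, beta)
        xb = unrolled_step(xb, y, X, Y, beta)
        dists.append(math.dist(xa, xb))
    return dists
```

Under these assumptions, each entry of the list returned by `simulate()` should be at most $c$ times the previous one, up to rounding, so the two trajectories approach each other regardless of how $Y_{m}$ and $β_{m}$ are chosen.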

A special case of unrolling is when, instead of the generic $X_{m}$ in each iteration, we use a predetermined dictionary $W$ and allow the step size $μ$ to vary between iterations. We then have the following corollary.

Corollary 3.

Let $σ_{\min}(W)$ and $σ_{\max}(W)$ denote, respectively, the smallest and largest singular values of $W$. Suppose

X_{m} = I - \frac{1}{μ_{m}} W^{T} W,

(31)

and the following conditions are satisfied:

1. The matrix $W$ has full column rank, so that $W^{T} W$ is positive definite, with singular values that satisfy

$0 < σ_{\min}(W) < σ_{\max}(W) < \infty .$

(32)
2. There exists a small positive $ϵ$ ($0 < ϵ \ll 1$) such that

$\frac{σ_{\max}^{2}(W)}{2 - ϵ} < μ_{m} < \frac{σ_{\min}^{2}(W)}{ϵ}$

(33)

for all $m$.

Convergence in (30) is then achieved.

Proof.

We show that conditions 1 and 2 of Theorem 4 are satisfied; the corollary then follows from the theorem. First, we have

X_{m}^{T} = I^{T} - \frac{1}{μ_{m}} (W^{T} W)^{T} = I - \frac{1}{μ_{m}} W^{T} W = X_{m}

This means that $X_{m}$ is symmetric and has real eigenvalues. Therefore, the singular value ($σ_{i}$) squared is equal to the eigenvalue ($λ_{i}$) squared; i.e., $σ_{i}^{2}(X_{m}) = λ_{i}^{2}(X_{m})$ and $σ_{i}(X_{m}) = | λ_{i}(X_{m}) |$. Using some fundamental identities of the eigenvalues of a matrix and the definition of singular values, we have

λ_{i} (X_{m}) = λ_{i} (I - \frac{1}{μ_{m}} W^{T} W) = 1 - \frac{1}{μ_{m}} λ_{i} (W^{T} W) = 1 - \frac{1}{μ_{m}} σ_{i}^{2} (W)

Since the operator norm is also equal to the spectral norm, we have

\| X_{m} \|_{2} = σ_{\max}(X_{m}) = \max_{i} | λ_{i}(X_{m}) | = \max_{i} \left| 1 - \frac{1}{μ_{m}} σ_{i}^{2}(W) \right|

(34)

Condition (33) ensures that $μ_{m}$ is positive. Condition (32) implies that $W$ has at least two distinct singular values. Since $1 - \frac{1}{μ_{m}} σ_{i}^{2}(W)$ can vanish for at most one value of $σ_{i}(W)$, the last expression in (34) cannot be zero. Furthermore, since condition (32) implies that all singular values $σ_{i}(W)$ are finite, the last expression in (34) must be finite. Therefore $\| X_{m} \|_{2}$ is finite and positive, which is condition 1 of Theorem 4.

Condition 2 of Theorem 4 (inequality (29), with $c = 1 - ϵ$ and $L = 1$) is satisfied if there exists a small positive $ϵ$ ($0 < ϵ \ll 1$) such that

\| X_{m} \|_{2} = \max_{i} \left| 1 - \frac{1}{μ_{m}} σ_{i}^{2}(W) \right| < 1 - ϵ

for all $m$. This is achieved if

\left| 1 - \frac{1}{μ_{m}} σ_{i}^{2}(W) \right| < 1 - ϵ

for all $i$. Since $μ_{m} > 0$, this is equivalent to

-1 + ϵ < 1 - \frac{1}{μ_{m}} σ_{i}^{2}(W) < 1 - ϵ

⟺ ϵ < \frac{σ_{i}^{2}(W)}{μ_{m}} < 2 - ϵ

⟺ \frac{σ_{i}^{2}(W)}{2 - ϵ} < μ_{m} < \frac{σ_{i}^{2}(W)}{ϵ}

This condition is ensured by (33) for every $i$, since $σ_{\min}(W) \leq σ_{i}(W) \leq σ_{\max}(W)$. Therefore condition 2 of Theorem 4 is satisfied. □
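The step-size condition (33) and the norm formula (34) are straightforward to evaluate once the singular values of $W$ are known. The following Python sketch is illustrative only: it takes the singular values as given inputs (how they are obtained is left open), and the function names `step_size_interval` and `contraction_factor` are hypothetical. It also makes explicit an observation that follows from (33): the interval for $μ_{m}$ is non-empty only when $ϵ$ is small enough relative to the spread of the singular values.

```python
def step_size_interval(sigma_min, sigma_max, eps):
    """Open interval (lo, hi) of admissible step sizes mu_m from (33)."""
    lo = sigma_max ** 2 / (2.0 - eps)
    hi = sigma_min ** 2 / eps
    if lo >= hi:
        # (33) cannot be satisfied for this eps; a smaller eps is needed.
        raise ValueError("empty step-size interval for this eps")
    return lo, hi


def contraction_factor(singular_values, mu):
    """||X_m||_2 as given by (34): max_i |1 - sigma_i(W)^2 / mu|."""
    return max(abs(1.0 - s * s / mu) for s in singular_values)
```

For example, with assumed singular values 1, 2 and 3 and $ϵ = 0.05$, `step_size_interval(1.0, 3.0, 0.05)` gives an interval from roughly 4.6 to 20, and any $μ_{m}$ inside it should give `contraction_factor([1.0, 2.0, 3.0], mu)` below $1 - ϵ = 0.95$.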

Remark 4.

1. Note that $\sup_{m \geq L} \| X_{m} \|_{2} \leq c < 1$ in (29) is a sufficient but not necessary condition.
2. In both Theorem 4 and Corollary 3, there is no restriction on either the soft-threshold parameters $β_{m}$ or the matrices $Y_{m}$.
3. In the machine learning paradigm, the expressivity of the algorithm generally increases with the number of parameters. With the special case of unrolling in (31), there is only one parameter, $μ_{m}$, in $X_{m}$. However, for convergence, it is not necessary to have

$Y_{m} = \frac{1}{μ_{m}} W^{T},$

even though that is the case with the original ISTA in (23). A general $Y_{m} \in R^{q \times p}$, which has $q \times p$ parameters, can be used and results in higher expressivity.
4. The term $Y_{m} y$ is a generalization of the term $\frac{1}{μ} W^{T} y$ in (23), which is related to the input $y$. In the original ISTA formulation, there is only one input, but in a data-driven machine learning framework, there are multiple inputs $y_{n}$. This generalization allows the algorithm to adapt to this situation.

7. Concluding Remarks

Iterative algorithms are at the heart of many areas in data science and mathematics. Traditionally, the iteration in these algorithms is fixed and can be represented mathematically by a single mapping on a metric space. Convergence of these algorithms is usually analyzed with the aid of the well-known Banach Fixed Point theorem from functional analysis. This work has extended the convergence analysis to iterations with different mappings and different metric spaces. The results are relevant in the analysis of algorithm unrolling, which is a new approach for developing interpretable machine learning algorithms.

Funding

This research received no external funding.

Data Availability Statement

Data sharing is not applicable.

Acknowledgments

The author would like to thank the reviewers for their constructive comments, which led to an improvement in the quality of the paper.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Monga, V.; Li, Y.; Eldar, Y.C. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. IEEE Signal Process. Mag. 2021, 38, 18–24.
2. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2009, 2, 183–202.
3. Li, B.; Shi, B.; Yuan, Y.X. Proximal subgradient norm minimization of ISTA and FISTA. Appl. Comput. Harmon. Anal. 2026, 82, 101848.
4. Ahmadi, S.; Hauffen, J.C.; Kästner, L.; Jung, P.; Caire, G.; Ziegler, M. Learned Block Iterative Shrinkage Thresholding Algorithm for Photothermal Super Resolution Imaging. Sensors 2022, 22, 5533.
5. Gan, H.; Wang, X.; He, L.; Liu, J. Learned Two-Step Iterative Shrinkage Thresholding Algorithm for Deep Compressive Sensing. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 3943–3956.
6. Kouni, V.; Panagakis, Y. Generalization analysis of an unfolding network for analysis-based compressed sensing. Appl. Comput. Harmon. Anal. 2025, 79, 101787.
7. Naimipour, N.; Khobahi, S.; Soltanalian, M. Unfolded Algorithms for Deep Phase Retrieval. Algorithms 2024, 17, 587.
8. Zhang, L.; Wang, G.; Giannakis, G.B. Real-time power system state estimation and forecasting via deep unrolled neural networks. IEEE Trans. Signal Process. 2019, 67, 4069–4077.
9. Lohit, S.; Liu, D.; Mansour, H.; Boufounos, P.T. Unrolled projected gradient descent for multi-spectral image fusion. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: Piscataway, NJ, USA, 2019; pp. 7725–7729.
10. Dardikman-Yoffe, G.; Eldar, Y.C. Learned SPARCOM: Unfolded deep super-resolution microscopy. Opt. Express 2020, 28, 27736–27763.
11. Kiriki, S.; Nakano, Y.; Soma, T. Historic behaviour for nonautonomous contraction mappings. Nonlinearity 2019, 32, 1111–1124.
12. Kreyszig, E. Introductory Functional Analysis with Applications; John Wiley and Sons: Hoboken, NJ, USA, 1978.
13. Akkouchi, M. On Sequences of Certain Contractive Mappings and Their Fixed Points. Montes Taurus J. Pure Appl. Math. 2021, 3, 70–77. Available online: https://mtjpamjournal.com/papers/article_id_mtjpam-d-20-00008/ (accessed on 30 March 2026).
14. Imdad, M.; Khan, M.S.; Sessa, S. On Sequences of Contractive Mappings and Their Fixed Points. Int. J. Math. Math. Sci. 1988, 3, 527–534.
15. Nadler, S., Jr. Sequences of Contractions and Fixed Points. Pac. J. Math. 1968, 27, 579–585.
16. Singh, S.B. On Sequences of Contractions Mappings. Riv. Mat. Univ. Parma 1970, 2, 227–231. Available online: https://www.rivmat.unipr.it/fulltext/1970-11/1970-11-227.pdf (accessed on 30 March 2026).

Figure 1. Sequence of mappings between metric spaces.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
