Article

On Iterative Algorithms with Different Mapping in Each Iteration

School of Information Technology, Deakin University, 75 Pigdons Road, Waurn Ponds, VIC 3216, Australia
Algorithms 2026, 19(6), 470; https://doi.org/10.3390/a19060470 (registering DOI)
Submission received: 31 March 2026 / Revised: 2 June 2026 / Accepted: 5 June 2026 / Published: 9 June 2026

Abstract

Algorithm unrolling (unfolding) is a process where an existing iterative algorithm is converted into another iterative algorithm, but the mapping in each iteration of the new algorithm can potentially be different. An abstraction is to consider a sequence of mappings $T_m$, where each mapping potentially acts on a different metric space $(X_m, d_m)$. We study the iterates from the sequence of mappings and derive conditions for convergence. The first result is when both the mapping and metric space are different in each iteration. The second result is when all metric spaces are the same, but the mapping is different in each iteration. The second result can be considered as a generalization of the Banach Fixed Point theorem. A concrete practical example is the unrolling of the Iterative Shrinkage–Thresholding Algorithm, which has applications in statistics, machine learning and signal processing. The convergence of this example will be analyzed with the aid of the result established through this work.

1. Introduction

Recently, there has been increasing interest in developing AI (artificial intelligence) algorithms using the technique of algorithm unrolling [1]. The motivation behind algorithm unrolling is to achieve some interpretability in the AI algorithm, unlike many other AI algorithms, which are usually considered ‘black boxes’. The main idea of algorithm unrolling is the conversion of an iterative algorithm, e.g., the Iterative Shrinkage–Thresholding Algorithm (ISTA) [2,3,4,5], into another iterative algorithm with a different mapping in each iteration. Algorithm unrolling has been successfully applied in areas such as compressed sensing [6], phase retrieval [7], power systems [8], image fusion [9] and microscopy [10]. The original iterative algorithm is usually derived from theoretical and/or physical considerations of the problem at hand. The unrolled algorithm inherits some of the theoretical and/or physical features of the original algorithm, and therefore retains some aspects of interpretability. For example, with ISTA, the theoretical consideration is to achieve a sparse representation of a signal vector given a dictionary of atom (constituent) vectors. An abstraction of algorithm unrolling is to consider a sequence of mappings between metric spaces, where the application of a mapping can be considered as one iteration of the algorithm. This abstraction is also relevant in other areas such as non-autonomous contraction mappings [11].
Consider a sequence of metric spaces $(X_m, d_m)$, $m = 0, 1, 2, \ldots$, where $X_m$ denotes the set of elements (points) and $d_m(\cdot, \cdot)$ denotes the corresponding distance function. The sequence of mappings $T_m$ is between the metric spaces:
$$T_m : X_{m-1} \to X_m$$
for $m = 1, 2, \ldots$, as illustrated in Figure 1. Starting with an initial point $x^{(0)} \in X_0$, we consider the new point after the application of the sequence of mappings:
$$x^{(M)} = T_M \circ \cdots \circ T_1 (x^{(0)}) \in X_M, \tag{1}$$
where the symbol ‘$\circ$’ denotes the composition of mappings. The superscript ‘$(m)$’ in $x^{(m)}$ denotes the index of the space the point belongs to, i.e., $x^{(m)} \in X_m$. However, if the point is obtained using (1), $m$ also represents the index of the term in the sequence. In other words, $x^{(m)}$ is the $m$th term in the sequence $x^{(0)}, x^{(1)}, \ldots$, where every term in the sequence, in general, belongs to a different space.
When all metric spaces are the same, i.e., $X_m = X$ for all $m$, and all mappings are the same, i.e., $T_m = T$ for all $m$, (1) represents the classical iteration procedure found in many branches of applied mathematics and data science, e.g., Jacobi iterations and Gauss–Seidel iterations [12]. Furthermore, when the mapping $T$ is a contraction, the Banach Fixed Point theorem applies, and this also has applications in the study of differential and integral equations [12].
In practice, with algorithm unrolling, the different mappings are obtained through parametrization; i.e., $T_m(\cdot) = T(\cdot\,; \kappa_m)$ where $\kappa_m$ is the parameter vector that defines the mapping in each iteration.
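As a rough illustration of (1) with parametrized mappings, the following Python sketch applies a different map in each iteration by looping over a list of parameter values. The names `unroll`, `step` and `kappas`, and the scalar affine map in the usage lines, are assumptions made for this illustration and are not taken from the paper.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")
P = TypeVar("P")


def unroll(x0: X, kappas: Sequence[P], step: Callable[[X, P], X]) -> X:
    """Apply T(.; kappa_1), ..., T(.; kappa_M) in order, as in (1)."""
    x = x0
    for kappa in kappas:      # one pass of the loop is one iteration m
        x = step(x, kappa)    # x^(m) = T(x^(m-1); kappa_m)
    return x


# Hypothetical usage: scalar affine maps T(x; (a, b)) = a*x + b,
# with a different pair (a, b) in each iteration.
result = unroll(1.0, [(0.5, 1.0), (0.25, 2.0), (0.1, 0.0)],
                lambda x, k: k[0] * x + k[1])
print(result)
```

In this sketch all iterates are real numbers; in the general setting of (1), each call to `step` may return a point in a different space.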
Sequences of contraction mappings have also previously been considered in [13,14,15,16]. However, the context considered in [13,14,15,16] is different from that considered here. The works in [13,14,15,16] are mainly interested in the convergence of the sequence of fixed points from the sequence of contraction mappings $T_1, T_2, \ldots$, i.e., the convergence of $x_n^*$ where $T_n(x_n^*) = x_n^*$. These previous works do not consider the tandem application of the sequence of mappings as shown in (1). In this work, we study the behavior of (1) as $M \to \infty$ and prove some convergence results. More detailed comparisons are deferred to Section 5. Application to the unrolled ISTA will also be considered. To the best of our knowledge, similar results are not found elsewhere.
Organization of paper: In Section 2, we review some definitions and concepts in metric spaces that are relevant to this work. New definitions which are generalizations of classical notions will also be presented here. The convergence in the general case, with different mappings and different metric spaces, is analyzed in Section 3. In Section 4, the convergence with different mappings on the same metric space is analyzed. Detailed comparisons with previous works, which considered sequences of mappings, are found in Section 5. The general convergence result is then used to analyze the convergence of the unrolled ISTA in Section 6. Concluding remarks are found in Section 7.

2. Preliminaries and Definitions

Firstly, we provide some comments about the notation used in the paper. As mentioned earlier, the superscript ‘$(m)$’ in $x^{(m)}$ denotes the index of the space the point belongs to, i.e., $x^{(m)} \in X_m$. In general, the spaces are different; i.e., $X_{m_1} \ne X_{m_2}$ for $m_1 \ne m_2$. Since we are concerned with a sequence of mappings from one space to another space, $m$ also represents the index of the sequence; i.e., $(x^{(0)}, x^{(1)}, \ldots)$ represents a sequence of points in different spaces (in general). In the special case when all the spaces are the same, we use a subscript to denote the index of the sequence, i.e., $x_0, x_1, \ldots$, which is the common convention for sequences.
We first recall some basic definitions for metric spaces [12] which are relevant to the developments that follow. A metric space is a set of points (elements) $X$ and an endowed distance function $d(\cdot, \cdot)$ that satisfies the following axioms. For all $x, y, z \in X$, we have
A1 
$d(x, y)$ is a non-negative finite-valued function.
A2 
$d(x, y) = 0$ if and only if $x = y$.
A3 
$d(x, y) = d(y, x)$; i.e., the distance function is symmetric.
A4 
$d(x, y) \le d(x, z) + d(z, y)$, which is known as the triangle inequality.
Using induction on axiom A4, we have the following.
Definition 1 (Generalized triangle inequality).
For all $x_1, x_2, \ldots, x_K \in X$,
$$d(x_1, x_K) \le \sum_{k=1}^{K-1} d(x_k, x_{k+1}) \tag{2}$$
Definition 2 (Cauchy sequence and completeness).
A sequence $x_k \in X$ ($k = 1, 2, \ldots$) is a Cauchy sequence if for every $\epsilon > 0$, there exists $K = K(\epsilon)$ such that
$$d(x_{k_1}, x_{k_2}) < \epsilon \quad \text{for every } k_1, k_2 > K(\epsilon).$$
If every Cauchy sequence $x_k$ converges to a limit point in $X$, i.e.,
$$\lim_{k \to \infty} x_k = x_{\text{limit}} \in X,$$
then the space $X$ is complete.
Consider a mapping $T : X \to X$ which maps points in $X$ into $X$ itself. The mapping $T$ is a contraction if there exists a constant $0 < \hat{\alpha} < 1$ such that for all $x, y \in X$
$$d(T(x), T(y)) \le \hat{\alpha}\, d(x, y).$$
With a contraction mapping, we have the well-known Banach Fixed Point Theorem (BFPT).
Theorem 1 (BFPT).
Consider a complete metric space $(X, d)$ and a contraction mapping $T : X \to X$. We have the following:
1. 
There exists a point $x^* \in X$ such that
$$T(x^*) = x^*.$$
The point $x^*$ is unique and is known as the fixed point of $T$.
2. 
Given any initial point $x_0 \in X$, the sequence of points $x_1, x_2, \ldots$ generated from the iterations
$$x_k = T(x_{k-1}) \quad \text{for } k \ge 1$$
converges to the fixed point; i.e.,
$$\lim_{k \to \infty} x_k = x^*.$$
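The classical iteration in Theorem 1 can be sketched in a few lines of Python. The function name `fixed_point_iterate`, the stopping tolerance and the scalar map $T(x) = 0.5x + 1$ are assumptions chosen for illustration; this map is a contraction on the real line with coefficient 0.5 and fixed point 2.

```python
def fixed_point_iterate(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x_k = T(x_{k-1}) until successive iterates differ by less than tol.

    Returns the last iterate and the number of iterations performed.
    """
    x = x0
    for k in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, k
        x = x_next
    return x, max_iter


# Illustrative contraction on the real line: T(x) = 0.5*x + 1, fixed point x* = 2.
x_star, iters = fixed_point_iterate(lambda x: 0.5 * x + 1.0, x0=10.0)
print(x_star, iters)
```

Starting from a different `x0` changes the number of iterations but not the limit, which is the behavior Theorem 1 guarantees for a single contraction.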
The discussions above are for a single metric space and a single mapping. However, in this work we are considering the general case of multiple metric spaces and multiple mappings as described in (1). Some of the results above are still relevant, e.g., axioms A1–A4 and Definitions 1 and 2, when we are considering each metric space in the sequence in isolation. The relationships above will have superscript $(m)$ for the elements of the metric space $X_m$ with the corresponding distance function $d_m$. However, new definitions are needed when multiple metric spaces and mappings are involved.
Definition 3 (Lipschitz coefficient).
For the map $T_m : X_{m-1} \to X_m$, let
$$S_m \triangleq \left\{ (x_A^{(m-1)}, x_B^{(m-1)}) : x_A^{(m-1)} \ne x_B^{(m-1)},\ x_A^{(m-1)}, x_B^{(m-1)} \in X_{m-1} \right\}.$$
The coefficient $\alpha_m$ of the map is defined as
$$\alpha_m \triangleq \sup_{S_m} \frac{d_m(T_m(x_A^{(m-1)}), T_m(x_B^{(m-1)}))}{d_{m-1}(x_A^{(m-1)}, x_B^{(m-1)})}$$
Note that since $x_A^{(m-1)} \ne x_B^{(m-1)}$, by axiom A2, the denominator is non-zero. If $T_m$ is a trivial constant map, then the numerator and $\alpha_m$ are equal to zero. We only consider non-trivial maps such that $\alpha_m$ is positive. If $\alpha_m$ is finite-valued, we have the following inequality:
$$d_m(T_m(x_A^{(m-1)}), T_m(x_B^{(m-1)})) \le \alpha_m\, d_{m-1}(x_A^{(m-1)}, x_B^{(m-1)}) \tag{3}$$
Note that when $x_A^{(m-1)} = x_B^{(m-1)}$, $T_m(x_A^{(m-1)}) = T_m(x_B^{(m-1)})$, and both sides of (3) are equal to zero; i.e., (3) is still valid.
Definition 4 (Contraction mapping).
The (non-trivial) mapping $T_m$ is a contraction if $0 < \alpha_m < 1$ and we have
$$d_m(T_m(x_A^{(m-1)}), T_m(x_B^{(m-1)})) \le \alpha_m\, d_{m-1}(x_A^{(m-1)}, x_B^{(m-1)}) \le d_{m-1}(x_A^{(m-1)}, x_B^{(m-1)}) \tag{4}$$
for all $x_A^{(m-1)}, x_B^{(m-1)} \in X_{m-1}$, where the last inequality is strict whenever $x_A^{(m-1)} \ne x_B^{(m-1)}$.
Definitions 3 and 4 are generalizations of the classical notions in a single metric space and mapping to sequences of metric spaces and mappings. Note that inequality (3) applies to any mapping T m , but (4) applies only to mappings that are contractions.

3. Different Spaces and Different Mappings

Suppose we start the iterations in (1) with two different initial points $x_A^{(0)} \in X_0$ and $x_B^{(0)} \in X_0$. We have the following result.
Theorem 2. 
Suppose the mappings $T_m$ ($m = 1, 2, \ldots$) satisfy the following conditions:
1. 
The Lipschitz coefficients $\alpha_m$ (for all $m$) are positive and finite; i.e., $0 < \alpha_m < \infty$.
2. 
There exists a finite positive integer $L$ such that
$$\hat{\alpha} \triangleq \sup_{m \ge L} \alpha_m \le c < 1. \tag{5}$$
Then we have
$$\lim_{M \to \infty} d_M(x_A^{(M)}, x_B^{(M)}) = 0 \tag{6}$$
for any $x_A^{(0)} \in X_0$ and $x_B^{(0)} \in X_0$.
Proof. 
If $x_A^{(0)} = x_B^{(0)}$, then $x_A^{(m)} = x_B^{(m)}$ for all $m$, and (6) is satisfied. We then consider the general case when $x_A^{(0)} \ne x_B^{(0)}$. By definition
$$d_M(x_A^{(M)}, x_B^{(M)}) = d_M(T_M(x_A^{(M-1)}), T_M(x_B^{(M-1)})).$$
Using inequality (3), we have
$$\begin{aligned}
d_M(x_A^{(M)}, x_B^{(M)}) &\le \alpha_M\, d_{M-1}(x_A^{(M-1)}, x_B^{(M-1)}) \\
&= \alpha_M\, d_{M-1}(T_{M-1}(x_A^{(M-2)}), T_{M-1}(x_B^{(M-2)})) \\
&\le \alpha_M \alpha_{M-1}\, d_{M-2}(x_A^{(M-2)}, x_B^{(M-2)}) \\
&\ \ \vdots \\
&\le \alpha_M \alpha_{M-1} \cdots \alpha_1\, d_0(x_A^{(0)}, x_B^{(0)}) \\
&= \left( \prod_{k=L}^{M} \alpha_k \right) \left( \prod_{k=1}^{L-1} \alpha_k \right) d_0(x_A^{(0)}, x_B^{(0)})
\end{aligned} \tag{7}$$
Condition (5) implies that
$$0 < \alpha_m \le \hat{\alpha} < 1 \quad \text{for } m \ge L \tag{8}$$
Using (8) in (7), we have
$$d_M(x_A^{(M)}, x_B^{(M)}) \le \hat{\alpha}^{M-L+1} \left( \prod_{k=1}^{L-1} \alpha_k \right) d_0(x_A^{(0)}, x_B^{(0)}) = \hat{\alpha}^{M} \underbrace{\hat{\alpha}^{-L+1} \left( \prod_{k=1}^{L-1} \alpha_k \right) d_0(x_A^{(0)}, x_B^{(0)})}_{K(L,\, x_A^{(0)},\, x_B^{(0)})}$$
$$d_M(x_A^{(M)}, x_B^{(M)}) \le \hat{\alpha}^{M} K(L, x_A^{(0)}, x_B^{(0)}) \tag{9}$$
Now $K(L, x_A^{(0)}, x_B^{(0)})$ consists of a finite product of the terms involving $\hat{\alpha}$, $\alpha_k$ and $d_0(x_A^{(0)}, x_B^{(0)})$. Since $\hat{\alpha}$, $\alpha_k$ and $d_0(x_A^{(0)}, x_B^{(0)})$ are all finite, $K(L, x_A^{(0)}, x_B^{(0)})$ will also be finite. As a consequence of (8), the term $\hat{\alpha}^M$ can be made arbitrarily small by choosing a sufficiently large $M$; i.e.,
$$\lim_{M \to \infty} \hat{\alpha}^M = 0.$$
Therefore we have (6). □
Remark 1. 
1. 
Theorem 2 shows that not every mapping needs to be contractive for convergence. As long as the mappings become contractive after a finite number of iterations, convergence is achieved.
2. 
It is important to note that, starting from an initial point $x_A^{(0)}$, the iterates do not necessarily converge to a limit point. Rather, the initial point is immaterial, after a sufficiently large number of iterations, to the eventual sequence of iterates; i.e., $x_A^{(M)} \approx x_B^{(M)}$ for large $M$, which we refer to as trajectory convergence. This is useful, from a practical perspective, as one does not need to be too concerned with the initialization of the iterative algorithm, as long as there is a sufficient number of iterations.
3. 
Note that, since the iterates $x^{(m)}$ belong to different spaces in general, the notion of a Cauchy sequence is not relevant here. In a Cauchy sequence (Definition 2), all terms belong to the same space. Even when all iterates belong to the same space, trajectory convergence does not imply convergence to a limit point (see the example below).
4. 
Conditions for convergence to a limit point are considered in Section 4.
Example 1.
Consider a simple example, where the metric space is the Banach space of vectors in $X_m = \mathbb{R}^q$ (for all $m$), with the following mapping:
$$T_m(\mathbf{x}) = K e^{-\alpha m} \mathbf{x} + m^2 \mathbf{1}$$
where $K > 1$, $\alpha > 0$ and $\mathbf{1}$ is the vector with all ones. We then have
$$d(\mathbf{x}_m^A, \mathbf{x}_m^B) = \| K e^{-\alpha m} (\mathbf{x}_{m-1}^A - \mathbf{x}_{m-1}^B) \| = K e^{-\alpha m} \| \mathbf{x}_{m-1}^A - \mathbf{x}_{m-1}^B \| = K e^{-\alpha m}\, d(\mathbf{x}_{m-1}^A, \mathbf{x}_{m-1}^B)$$
For trajectory convergence, we require $K e^{-\alpha m} < 1$, which is achieved when
$$m > \frac{1}{\alpha} \ln K$$
However, when we consider the difference between two successive iterates, we have
$$\mathbf{x}_m - \mathbf{x}_{m-1} = (K e^{-\alpha m} - 1)\, \mathbf{x}_{m-1} + m^2 \mathbf{1}$$
As $m \to \infty$, $\| \mathbf{x}_m - \mathbf{x}_{m-1} \|$ does not approach zero; i.e., the sequence is not Cauchy and does not converge to a limit point.
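The behavior described in Example 1 can be checked numerically with a short Python sketch. The values $K = 5$, $\alpha = 0.5$, $q = 2$, the two starting points and the helper names `T`, `dist` and `run` are assumptions picked for this illustration only.

```python
import math


def T(m, x, K=5.0, alpha=0.5):
    """Mapping of Example 1: T_m(x) = K*exp(-alpha*m)*x + m^2 * 1 (element-wise)."""
    scale = K * math.exp(-alpha * m)
    return [scale * xi + m * m for xi in x]


def dist(a, b):
    """Euclidean distance between two vectors stored as lists."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def run(x0, num_iters):
    """Return the trajectory [x_0, x_1, ..., x_M] with x_m = T_m(x_{m-1})."""
    traj = [list(x0)]
    for m in range(1, num_iters + 1):
        traj.append(T(m, traj[-1]))
    return traj


M = 40
traj_a = run([10.0, -3.0], M)
traj_b = run([-7.0, 8.0], M)

# Distance between the two trajectories at each iteration (trajectory convergence).
gap = [dist(a, b) for a, b in zip(traj_a, traj_b)]
# Distance between successive iterates of one trajectory (no limit point).
step = [dist(traj_a[m], traj_a[m - 1]) for m in range(1, M + 1)]

print(gap[0], gap[-1])
print(step[0], step[-1])
```

With these values the map becomes contractive for $m > (\ln 5)/0.5 \approx 3.2$, so the gap between the two trajectories first grows and then collapses, while the distance between successive iterates keeps increasing because of the $m^2$ term.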
When all mappings are contractions ($L = 1$), a simple formula for estimating the number of iterations to achieve a prescribed error is given by the following.
Corollary 1. 
If all conditions of Theorem 2 are satisfied, when $L = 1$, for a given $\varepsilon \in (0, d_0(x_A^{(0)}, x_B^{(0)}))$, the bound
$$d_M(x_A^{(M)}, x_B^{(M)}) \le \varepsilon$$
is achieved if the number of iterations $M$ satisfies
$$M \ge \frac{\log(\varepsilon / D_0)}{\log \hat{\alpha}}, \tag{10}$$
where
$$D_0 \triangleq d_0(x_A^{(0)}, x_B^{(0)}) \quad \text{and} \quad \hat{\alpha} \triangleq \sup_{m} \alpha_m < 1.$$
Proof. 
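Using (9) with $L = 1$, for which $K(L, x_A^{(0)}, x_B^{(0)}) = D_0$, we require
$$d_M(x_A^{(M)}, x_B^{(M)}) \le \hat{\alpha}^M D_0 \le \varepsilon$$
$$\hat{\alpha}^M \le \varepsilon / D_0$$
Taking the logarithm (of any base greater than one) of both sides, we have
$$M \log \hat{\alpha} \le \log(\varepsilon / D_0)$$
Both $\log(\varepsilon / D_0)$ and $\log \hat{\alpha}$ are negative, so dividing both sides by $\log \hat{\alpha}$ reverses the inequality. Inequality (10) is then obtained. □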
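The iteration count in Corollary 1 is straightforward to evaluate. The sketch below is a minimal Python helper for it; the function name `iterations_for_trajectory_gap` and the numerical values are assumptions used only for illustration.

```python
import math


def iterations_for_trajectory_gap(eps, d0, alpha_hat):
    """Smallest integer M with M >= log(eps/d0) / log(alpha_hat), as in (10) with L = 1."""
    if not (0.0 < alpha_hat < 1.0):
        raise ValueError("alpha_hat must lie in (0, 1)")
    if not (0.0 < eps < d0):
        raise ValueError("eps must lie in (0, d0)")
    return math.ceil(math.log(eps / d0) / math.log(alpha_hat))


# Illustrative values: initial distance 1, target gap 0.01, contraction bound 0.5.
M = iterations_for_trajectory_gap(eps=0.01, d0=1.0, alpha_hat=0.5)
print(M, 0.5 ** M)
```

For these values the ratio of logarithms is about 6.64, so seven iterations suffice, and the bound $\hat{\alpha}^M D_0$ after seven iterations is $0.5^7 \approx 0.0078$.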

4. Different Mappings on the Same Metric Space

We now consider the case where all spaces are the same; i.e., $X_m = X$ for all $m$ and $T_m : X \to X$. Since all points are now in the same space, we denote the sequence $x^{(0)}, x^{(1)}, \ldots$ (from the iterations) as $x_0, x_1, \ldots$, where
$$x_m = T_m(x_{m-1}) \in X, \quad \text{for all } m \ge 1. \tag{11}$$
Two new definitions, pertaining to the properties of the mappings, are first given.
Definition 5 (Pairwise contraction).
A pair of mappings $(T_p, T_q)$ is a pairwise contraction if there exists a positive constant $\tilde{\gamma}_{p,q} < 1$ such that, for any $x_A, x_B \in X$ with $x_A \ne x_B$,
$$d(T_p(x_A), T_q(x_B)) \le \tilde{\gamma}_{p,q}\, d(x_A, x_B) < d(x_A, x_B)$$
For a sequence of mappings $(T_1, T_2, \ldots)$, we define the sequence of Lipschitz coefficients $\gamma_m$ ($m = 1, 2, \ldots$) as
$$\gamma_m \triangleq \sup_{\tilde{S}} \frac{d(T_{m+1}(x_A), T_m(x_B))}{d(x_A, x_B)}$$
where
$$\tilde{S} \triangleq \{ (x_A, x_B) : x_A \ne x_B,\ x_A \in X,\ x_B \in X \}$$
Definition 6 (Sequential contraction).
A sequence of mappings $T_1, T_2, \ldots$ is a sequential contraction if the following conditions hold:
1. 
The Lipschitz coefficients $\gamma_m$ ($m = 1, 2, \ldots$) are positive and finite-valued; i.e., $0 < \gamma_m < \infty$.
2. 
There exists a finite positive integer $L$ such that $(T_{m+1}, T_m)$ is a pairwise contraction for all $m \ge L$; i.e.,
$$d(T_{m+1}(x_A), T_m(x_B)) \le \gamma_m\, d(x_A, x_B) < d(x_A, x_B)$$
for any $x_A, x_B \in X$ such that $x_A \ne x_B$, and $0 < \gamma_m < 1$.
Remark 2. 
1. 
The definition above reduces to the classical definition of a contraction mapping if all the mappings are the same; i.e., $T_m = T$.
2. 
If $T$ is a contraction mapping (in the classical sense), then $(T, T)$ is a pairwise contraction and the sequence $T, T, \ldots$ is a sequential contraction.
We then have the following result.
Theorem 3. 
Consider the iterates from (11) on a complete metric space $X$. Suppose, for a given initial point $x_0$, the following conditions hold:
1. 
The sequence $T_1, T_2, \ldots$ is a sequential contraction.
2. 
There exists a finite positive integer $L$ such that
$$\hat{\gamma} \triangleq \sup_{m \ge L} \gamma_m \le c < 1 \tag{13}$$
3. 
$x_m \ne x_{m-1}$ for $m \ge 1$.
Then, for any initial point $x_0 \in X$ whose generated iterates satisfy $x_m \ne x_{m-1}$ for all $m \ge 1$, there exists a limit point $x_{\text{limit}} \in X$ such that
$$\lim_{M \to \infty} x_M = x_{\text{limit}} \iff \lim_{M \to \infty} d(x_M, x_{\text{limit}}) = 0$$
Proof. 
Since $\gamma_m$ is positive and finite, we have
$$d(T_{m+1}(x_A), T_m(x_B)) \le \gamma_m\, d(x_A, x_B) \tag{14}$$
for all $x_A, x_B \in X$ with $x_A \ne x_B$. For $m \ge L$, $\gamma_m < 1$, and for $m < L$, $\gamma_m$ can be any finite positive value. Since $x_m \ne x_{m-1}$, using (14) repeatedly, we have
$$\begin{aligned}
d(x_{m+1}, x_m) &= d(T_{m+1}(x_m), T_m(x_{m-1})) \\
&\le \gamma_m\, d(x_m, x_{m-1}) \\
&\le \gamma_m \gamma_{m-1}\, d(x_{m-1}, x_{m-2}) \\
&\ \ \vdots \\
&\le \left( \prod_{k=1}^{m} \gamma_k \right) d(x_1, x_0) = \left( \prod_{k=L}^{m} \gamma_k \right) \left( \prod_{k=1}^{L-1} \gamma_k \right) d(x_1, x_0)
\end{aligned}$$
Using (13), for $k \ge L$ we have
$$0 < \gamma_k \le \hat{\gamma} < 1 \tag{15}$$
Therefore
$$d(x_{m+1}, x_m) \le \hat{\gamma}^{m-L+1} \left( \prod_{k=1}^{L-1} \gamma_k \right) d(x_1, x_0) = \hat{\gamma}^{m} \underbrace{\hat{\gamma}^{-L+1} \left( \prod_{k=1}^{L-1} \gamma_k \right) d(x_1, x_0)}_{\tilde{K}(L,\, x_1,\, x_0)}$$
$$d(x_{m+1}, x_m) \le \hat{\gamma}^{m} \tilde{K}(L, x_1, x_0) \tag{16}$$
Using the generalized triangle inequality (2), for $n > m \ge L$ we have
$$d(x_m, x_n) \le d(x_m, x_{m+1}) + d(x_{m+1}, x_{m+2}) + \cdots + d(x_{n-1}, x_n)$$
Applying inequality (16) to each term on the R.H.S. we have
$$d(x_m, x_n) \le \hat{\gamma}^{m} \tilde{K}(L, x_1, x_0) + \hat{\gamma}^{m+1} \tilde{K}(L, x_1, x_0) + \cdots + \hat{\gamma}^{n-1} \tilde{K}(L, x_1, x_0) = \left( \hat{\gamma}^{m} + \hat{\gamma}^{m+1} + \cdots + \hat{\gamma}^{n-1} \right) \tilde{K}(L, x_1, x_0)$$
Applying the geometric series formula
$$\sum_{k=0}^{N-1} a r^k = \frac{a (1 - r^N)}{1 - r}$$
to the summation in brackets, where $a = \hat{\gamma}^m$, $r = \hat{\gamma}$ and $N = n - m$, we have
$$d(x_m, x_n) \le \hat{\gamma}^{m}\, \frac{1 - \hat{\gamma}^{n-m}}{1 - \hat{\gamma}}\, \tilde{K}(L, x_1, x_0)$$
Now (15) implies that $0 < \hat{\gamma}^{n-m} < 1$ and therefore $0 < 1 - \hat{\gamma}^{n-m} < 1$. Using this inequality we have
$$d(x_m, x_n) < \hat{\gamma}^{m}\, \frac{1}{1 - \hat{\gamma}}\, \tilde{K}(L, x_1, x_0) \tag{17}$$
Now $\tilde{K}(L, x_1, x_0)$ is a finite product of terms involving $\hat{\gamma}$, $\gamma_k$ and $d(x_1, x_0)$. Since the latter are all finite, $\tilde{K}(L, x_1, x_0)$ is also finite. The term $\frac{1}{1 - \hat{\gamma}}$ is positive and finite as $0 < \hat{\gamma} < 1$. As a consequence of (15), $\hat{\gamma}^m$ and therefore $d(x_m, x_n)$ can be made arbitrarily small by choosing a sufficiently large $m$. This means that for every $\varepsilon > 0$, there exists a $K$ such that
$$d(x_m, x_n) < \varepsilon \quad \text{for every } n > m > K,$$
meaning $x_1, x_2, \ldots$ is a Cauchy sequence. Since the space $X$ is complete, by Definition 2, a limit point exists. □
Remark 3. 
1. 
Although a limit point $x_{\text{limit}}$ exists, in general, $x_{\text{limit}}$ is not a fixed point of any of the mappings.
2. 
If all mappings are the same and are contractions, the conditions for the Banach Fixed Point theorem apply, and the limit point is also the fixed point of the mapping.
3. 
The limit point, in general, will depend on the initial point $x_0$.
A simple formula for estimating the number of required iterations, when $L = 1$, is given by the following.
Corollary 2. 
If all conditions of Theorem 3 are satisfied, when $L = 1$, for a given $\varepsilon \in (0, d(x_1, x_0))$, the bound
$$d(x_M, x_{\text{limit}}) < \varepsilon$$
is achieved if the number of iterations $M$ satisfies
$$M > \frac{\log\left( (1 - \hat{\gamma})\, \varepsilon / \tilde{D}_0 \right)}{\log \hat{\gamma}} \tag{18}$$
where
$$\tilde{D}_0 \triangleq d(x_1, x_0) \quad \text{and} \quad \hat{\gamma} \triangleq \sup_{m} \gamma_m < 1.$$
Proof. 
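When $L = 1$,
$$\tilde{K}(L, x_1, x_0) = d(x_1, x_0) = \tilde{D}_0.$$
As $n \to \infty$, $x_n \to x_{\text{limit}}$. Therefore, taking the limit in inequality (17), where the strict inequality may become an equality, we require
$$d(x_M, x_{\text{limit}}) \le \hat{\gamma}^{M}\, \frac{1}{1 - \hat{\gamma}}\, \tilde{D}_0 < \varepsilon$$
$$\hat{\gamma}^{M} < (1 - \hat{\gamma})\, \varepsilon / \tilde{D}_0$$
Taking the logarithm (of any base greater than one) of both sides, we have
$$M \log \hat{\gamma} < \log\left( (1 - \hat{\gamma})\, \varepsilon / \tilde{D}_0 \right)$$
Both $\log\left( (1 - \hat{\gamma})\, \varepsilon / \tilde{D}_0 \right)$ and $\log \hat{\gamma}$ are negative, so dividing both sides by $\log \hat{\gamma}$ reverses the inequality. Inequality (18) is then obtained. □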
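As with Corollary 1, the bound in Corollary 2 can be evaluated directly. The following Python sketch is one way to do so; the function name `iterations_for_limit_error` and the numerical values are assumptions for illustration.

```python
import math


def iterations_for_limit_error(eps, d0_tilde, gamma_hat):
    """Smallest integer M with M > log((1 - gamma_hat)*eps/d0_tilde) / log(gamma_hat), as in (18)."""
    if not (0.0 < gamma_hat < 1.0):
        raise ValueError("gamma_hat must lie in (0, 1)")
    if not (0.0 < eps < d0_tilde):
        raise ValueError("eps must lie in (0, d0_tilde)")
    bound = math.log((1.0 - gamma_hat) * eps / d0_tilde) / math.log(gamma_hat)
    # The inequality in (18) is strict, so step past the bound even if it is an integer.
    return math.floor(bound) + 1


# Illustrative values: d(x_1, x_0) = 1, target error 0.01, contraction bound 0.5.
M = iterations_for_limit_error(eps=0.01, d0_tilde=1.0, gamma_hat=0.5)
print(M, 0.5 ** M / (1.0 - 0.5))
```

For the same $\varepsilon$ and contraction bound, this count is larger than the one from Corollary 1 because of the extra factor $1/(1 - \hat{\gamma})$ that arises from summing the geometric series.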

5. Comparisons

Sequences of mappings have also been considered previously in [13,14,15,16]. We now make detailed comparisons with the previous works, and show that the previous results are fundamentally different from the results in this work.
The fundamental concept in previous works is to consider a sequence of self-mappings $T_n$ on a complete metric space $X$ (with distance function $d$). The mappings are assumed to be contractions but with the possibility of different Lipschitz coefficients:
$$d(T_n(x), T_n(y)) \le \hat{\alpha}_n\, d(x, y) \quad \text{for all } x, y \in X$$
where $0 < \hat{\alpha}_n < 1$ for all $n$, and in general $\hat{\alpha}_{n_1} \ne \hat{\alpha}_{n_2}$ ($n_1 \ne n_2$). By the Banach Fixed Point Theorem, there exists a unique fixed point for each $T_n$; i.e.,
$$T_n(x_n^*) = x_n^* \quad \text{for all } n$$
The condition imposed on the sequence of mappings is that there exists a limiting self-map $T$, which is a contraction, such that
$$\lim_{n \to \infty} T_n(x) = T(x) \quad \text{for all } x \in X \tag{19}$$
The convergence can be pointwise or uniform over $X$. Then the sequence of fixed points $x_n^*$ also converges to the fixed point of $T$; i.e.,
$$\lim_{n \to \infty} x_n^* = x^* \quad \text{where } T(x^*) = x^* \tag{20}$$
The results above do not explicitly refer to any iteration process to obtain the fixed points. However, Theorem 1 implies that each fixed point $x_n^*$ can be obtained as follows. Given any initial point $x_{n,0} \in X$, we perform the following iterations:
$$x_{n,k} = T_n(x_{n,k-1}) \quad \text{for } k \ge 1 \tag{21}$$
and
$$x_n^* = \lim_{k \to \infty} x_{n,k} \tag{22}$$
In (21) and (22), the subscript ‘$k$’ tracks the iteration number, and the subscript ‘$n$’ tracks the mapping that is used. In general, a different initial point $x_{n,0}$ can be used for each mapping $T_n$. The important observation is that the same mapping is used for the iterations in (21). This is fundamentally different from (11) where, in general, a different mapping is used in each iteration. The condition for convergence in (20) is the existence of a limiting map as shown in (19), but no such condition is needed for (11). Although the limit point exists in both cases, the limit point from (11) is, in general, not associated with any fixed point of the maps, unlike in (19) where the limit point is also the fixed point of the map. Furthermore, not all maps with (11) need to be contractions; only maps after a finite number of iterations need to be contractions. However, all maps in (19) are contractions.
From an application perspective, the scenario described above, where there are multiple iterative algorithms as implied in (21) (one for each $n$), is not something typically found in practice. Equations (19) and (20) are therefore primarily of theoretical interest. The iterations in (1), however, are found in practice, e.g., algorithm unrolling, and we will next study a concrete practical example.

6. Unrolled ISTA Algorithm

We now analyze the convergence of a well-known iterative algorithm with the aid of the previous result. The relevant metric space is the Banach space of vectors in $\mathbb{R}^q$ with the $\ell_2$ distance function
$$d(\mathbf{x}_A, \mathbf{x}_B) = \| \mathbf{x}_A - \mathbf{x}_B \|_2 \triangleq \left( \sum_{k=1}^{q} \left( x_A(k) - x_B(k) \right)^2 \right)^{1/2}.$$
The conventional Iterative Shrinkage–Thresholding Algorithm (ISTA) [2,3] can be described by the following iterations:
$$\mathbf{x}_m = S_\beta\left( \left( \mathbf{I} - \frac{1}{\mu} \mathbf{W}^T \mathbf{W} \right) \mathbf{x}_{m-1} + \frac{1}{\mu} \mathbf{W}^T \mathbf{y} \right) \tag{23}$$
where $\beta, \mu \in \mathbb{R}^+$ and $\mathbf{W} \in \mathbb{R}^{p \times q}$ are parameters of the algorithm and $\mathbf{y} \in \mathbb{R}^p$ is the given input. The soft thresholding function $S_\beta$ is applied element-wise to vectors in $\mathbb{R}^q$, and for a scalar $b \in \mathbb{R}$, is defined as
$$S_\beta(b) \triangleq \begin{cases} b - \beta, & b > \beta \\ 0, & -\beta \le b \le \beta \\ b + \beta, & b < -\beta \end{cases} \tag{24}$$
Starting with an initial point $\mathbf{x}_0 \in \mathbb{R}^q$, a sequence of iterates $\mathbf{x}_1, \mathbf{x}_2, \ldots \in \mathbb{R}^q$ is computed using (23). The algorithm arises in the context of LASSO (least absolute shrinkage and selection operator) in statistics and sparse coding in signal processing. In sparse coding, given an input $\mathbf{y}$, the goal is to find a parsimonious representation of $\mathbf{y}$ using an overcomplete dictionary of vectors, which are columns of $\mathbf{W}$. A common approach to achieve this is to solve the following convex optimization problem:
$$\min_{\mathbf{x}}\ \frac{1}{2} \| \mathbf{y} - \mathbf{W} \mathbf{x} \|_2^2 + \beta \| \mathbf{x} \|_1 \tag{25}$$
where $\beta$ is the regularization parameter that controls the level of sparsity. The solution to (25) can be achieved using the iterations in (23), where the parameter $\mu$ sets the iteration step size (the step applied in (23) is $1/\mu$).
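The iteration in (23) and the soft thresholding function in (24) can be sketched in plain Python as follows. This is a minimal illustration that assumes small dense matrices stored as lists of rows; the helper names (`soft_threshold`, `matvec`, `transpose`, `ista`) and the toy values of `W`, `y`, `beta` and `mu` are assumptions and do not come from the paper.

```python
def soft_threshold(v, beta):
    """Element-wise soft thresholding S_beta, following (24)."""
    out = []
    for b in v:
        if b > beta:
            out.append(b - beta)
        elif b < -beta:
            out.append(b + beta)
        else:
            out.append(0.0)
    return out


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def ista(W, y, beta, mu, x0, num_iters):
    """Run x_m = S_beta((I - W^T W / mu) x_{m-1} + W^T y / mu), i.e., iteration (23)."""
    Wt = transpose(W)
    x = list(x0)
    for _ in range(num_iters):
        residual = [yi - ri for yi, ri in zip(y, matvec(W, x))]  # y - W x
        correction = matvec(Wt, residual)                        # W^T (y - W x)
        # x + W^T (y - W x) / mu equals (I - W^T W / mu) x + W^T y / mu
        z = [xi + ci / mu for xi, ci in zip(x, correction)]
        x = soft_threshold(z, beta)
    return x


# Toy problem (assumed values): diagonal dictionary, mu = 1.
W = [[1.0, 0.0], [0.0, 0.5]]
y = [3.0, 1.0]
x_hat = ista(W, y, beta=0.25, mu=1.0, x0=[0.0, 0.0], num_iters=200)
print(x_hat)
```

For this toy problem with $\mu = 1$, setting the subgradient of (25) to zero component by component gives the minimizer $(2.75, 1)$, which the iterates approach.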
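The unrolled ISTA [1], by generalizing (23), is given by
$$\mathbf{x}_m = S_{\beta_m}\left( \mathbf{X}_m \mathbf{x}_{m-1} + \mathbf{Y}_m \mathbf{y} \right) \triangleq T_m(\mathbf{x}_{m-1}; \beta_m, \mathbf{X}_m, \mathbf{Y}_m) \tag{26}$$
where the parameters of the mappings are $\beta_m$, $\mathbf{X}_m$ and $\mathbf{Y}_m$. When the parameters $\beta_m$, $\mathbf{X}_m$ and $\mathbf{Y}_m$ are determined via a machine learning framework, i.e., data-driven, we have what is commonly known as Learned ISTA or LISTA.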
We will establish conditions for the convergence of (26). We first prove a relevant property of the soft thresholding function.
Lemma 1. 
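The function $S_\beta$ is non-expansive for any $\beta \in \mathbb{R}^+$; i.e.,
$$\| S_\beta(\mathbf{z}_1) - S_\beta(\mathbf{z}_2) \|_2 \le \| \mathbf{z}_1 - \mathbf{z}_2 \|_2 \tag{27}$$
for all $\mathbf{z}_1, \mathbf{z}_2 \in \mathbb{R}^q$.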
Proof. 
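We first establish the scalar form of (27). For any $b_1, b_2 \in \mathbb{R}$, define
$$\Delta_1 \triangleq | b_1 - b_2 |; \quad \Delta_2 \triangleq | S_\beta(b_1) - S_\beta(b_2) |.$$
Due to the piece-wise nature of $S_\beta(b)$, as shown in (24), there are four cases to consider:
1.
For $-\beta \le b_1, b_2 \le \beta$:
$$S_\beta(b_1) = S_\beta(b_2) = 0.$$
Therefore $\Delta_2 = 0$ and since $\Delta_1 \ge 0$, we have $\Delta_2 \le \Delta_1$.
2.
For $-\beta \le b_1 \le \beta$, $b_2 > \beta$:
$$S_\beta(b_1) = 0 \quad \text{and} \quad S_\beta(b_2) = b_2 - \beta > 0.$$
Therefore $\Delta_2 = b_2 - \beta$. Since $-\beta \le b_1 \le \beta$,
$$\Delta_2 = b_2 - \beta \le b_2 - b_1 = \Delta_1.$$
Due to symmetry, this case is similar to the $-\beta \le b_2 \le \beta$, $b_1 > \beta$ case. The cases in which one argument lies in $[-\beta, \beta]$ and the other is less than $-\beta$ follow in the same way with the signs reversed.
3.
For $b_1, b_2 > \beta$:
$$S_\beta(b_1) = b_1 - \beta > 0 \quad \text{and} \quad S_\beta(b_2) = b_2 - \beta > 0.$$
Therefore
$$\Delta_2 = | b_1 - b_2 | = \Delta_1.$$
Due to symmetry, this case is similar to the $b_1, b_2 < -\beta$ case.
4.
For $b_1 > \beta$, $b_2 < -\beta$:
$$S_\beta(b_1) = b_1 - \beta > 0; \quad S_\beta(b_2) = b_2 + \beta < 0 \quad \text{and} \quad b_1 - b_2 > 0.$$
Therefore
$$\Delta_2 = | b_1 - b_2 - 2\beta | < | b_1 - b_2 | = \Delta_1.$$
Due to symmetry, this case is similar to the $b_2 > \beta$, $b_1 < -\beta$ case.
Therefore, for all $b_1, b_2 \in \mathbb{R}$, we have
$$(\Delta_2)^2 = \left( S_\beta(b_1) - S_\beta(b_2) \right)^2 \le (b_1 - b_2)^2 = (\Delta_1)^2 \tag{28}$$
Now consider the square of the L.H.S. of (27). Using (28), we have
$$\| S_\beta(\mathbf{z}_1) - S_\beta(\mathbf{z}_2) \|_2^2 = \sum_{k=1}^{q} \left( S_\beta(z_1(k)) - S_\beta(z_2(k)) \right)^2 \le \sum_{k=1}^{q} \left( z_1(k) - z_2(k) \right)^2 = \| \mathbf{z}_1 - \mathbf{z}_2 \|_2^2$$
Taking the square root yields the desired result. □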
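Lemma 1 can also be spot-checked numerically. The Python sketch below draws random vectors and thresholds and records the largest observed ratio between the two sides of (27); the sampling ranges, the dimension and the seed are arbitrary assumptions, and a random check of this kind illustrates the lemma but does not replace the proof.

```python
import math
import random


def soft_threshold(v, beta):
    """Element-wise soft thresholding S_beta, following (24)."""
    return [b - beta if b > beta else (b + beta if b < -beta else 0.0) for b in v]


def norm2(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(t * t for t in v))


rng = random.Random(0)  # fixed seed so the run is repeatable
worst_ratio = 0.0
for _ in range(10000):
    beta = rng.uniform(0.01, 3.0)
    z1 = [rng.uniform(-5.0, 5.0) for _ in range(4)]
    z2 = [rng.uniform(-5.0, 5.0) for _ in range(4)]
    s1, s2 = soft_threshold(z1, beta), soft_threshold(z2, beta)
    num = norm2([a - b for a, b in zip(s1, s2)])   # ||S(z1) - S(z2)||_2
    den = norm2([a - b for a, b in zip(z1, z2)])   # ||z1 - z2||_2
    if den > 0.0:
        worst_ratio = max(worst_ratio, num / den)

print(worst_ratio)
```

A ratio that never exceeds one (up to rounding) is what non-expansiveness predicts.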
The induced/operator norm is defined as
$$\| \mathbf{X}_m \|_2 \triangleq \sup_{\mathbf{x} \ne \mathbf{0};\ \mathbf{x} \in \mathbb{R}^q} \frac{\| \mathbf{X}_m \mathbf{x} \|_2}{\| \mathbf{x} \|_2},$$
which is also the spectral norm. The convergence of the unrolled ISTA is given by the following.
Theorem 4. 
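Consider two initial points $\mathbf{x}_0^A \in \mathbb{R}^q$ and $\mathbf{x}_0^B \in \mathbb{R}^q$, and the corresponding iterates, $\mathbf{x}_m^A \in \mathbb{R}^q$ and $\mathbf{x}_m^B \in \mathbb{R}^q$, respectively, using (26). Suppose the parameters of the mappings $T_m$ satisfy the following conditions:
1. 
$\| \mathbf{X}_m \|_2$ is positive and finite-valued for all $m$.
2. 
There exists a finite positive integer $L$ such that
$$\sup_{m \ge L} \| \mathbf{X}_m \|_2 \le c < 1 \tag{29}$$
We then have
$$\lim_{m \to \infty} \| \mathbf{x}_m^A - \mathbf{x}_m^B \|_2 = 0 \implies \mathbf{x}_m^A \approx \mathbf{x}_m^B \text{ for large } m \tag{30}$$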
Proof. 
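From (26), with the parameters of the mapping suppressed for brevity, we have $T_m(\mathbf{x}) = S_{\beta_m}(\mathbf{X}_m \mathbf{x} + \mathbf{Y}_m \mathbf{y})$. For any two arbitrary points $\mathbf{x}_a, \mathbf{x}_b \in \mathbb{R}^q$, using (27), we have
$$\| T_m(\mathbf{x}_a) - T_m(\mathbf{x}_b) \|_2 = \| S_{\beta_m}(\mathbf{X}_m \mathbf{x}_a + \mathbf{Y}_m \mathbf{y}) - S_{\beta_m}(\mathbf{X}_m \mathbf{x}_b + \mathbf{Y}_m \mathbf{y}) \|_2 \le \| (\mathbf{X}_m \mathbf{x}_a + \mathbf{Y}_m \mathbf{y}) - (\mathbf{X}_m \mathbf{x}_b + \mathbf{Y}_m \mathbf{y}) \|_2 = \| \mathbf{X}_m (\mathbf{x}_a - \mathbf{x}_b) \|_2$$
Using the definition of the operator norm on the last expression, we have
$$\| T_m(\mathbf{x}_a) - T_m(\mathbf{x}_b) \|_2 \le \| \mathbf{X}_m \|_2\, \| \mathbf{x}_a - \mathbf{x}_b \|_2$$
The Lipschitz coefficients are then $\alpha_m = \| \mathbf{X}_m \|_2$ ($m = 1, 2, \ldots$). By invoking Theorem 2, the required result is obtained. □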
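The trajectory convergence in Theorem 4 can be illustrated with a small simulation of (26). In the Python sketch below, the layer parameters are drawn at random rather than learned, and each $\mathbf{X}_m$ is rescaled so that its Frobenius norm equals 0.9; since the spectral norm never exceeds the Frobenius norm, this is a simple way (chosen here for convenience, not taken from the paper) to guarantee $\| \mathbf{X}_m \|_2 \le 0.9 < 1$ without an eigenvalue routine. All names, dimensions and numerical values are assumptions for illustration.

```python
import math
import random


def soft_threshold(v, beta):
    """Element-wise soft thresholding S_beta, following (24)."""
    return [b - beta if b > beta else (b + beta if b < -beta else 0.0) for b in v]


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def frobenius_norm(A):
    return math.sqrt(sum(a * a for row in A for a in row))


def random_layer(rng, q, p, target_norm):
    """Draw (beta_m, X_m, Y_m) with ||X_m||_F = target_norm, hence ||X_m||_2 <= target_norm."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(q)]
    scale = target_norm / frobenius_norm(X)
    X = [[scale * a for a in row] for row in X]
    Y = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(q)]
    beta = rng.uniform(0.05, 0.5)   # thresholds are unrestricted in Theorem 4
    return beta, X, Y


def unrolled_ista(x0, y, layers):
    """Apply x_m = S_{beta_m}(X_m x_{m-1} + Y_m y) for each layer in turn, as in (26)."""
    x = list(x0)
    for beta, X, Y in layers:
        z = [a + b for a, b in zip(matvec(X, x), matvec(Y, y))]
        x = soft_threshold(z, beta)
    return x


rng = random.Random(1)
q, p = 4, 3
layers = [random_layer(rng, q, p, target_norm=0.9) for _ in range(200)]
y = [rng.gauss(0.0, 1.0) for _ in range(p)]

xa = unrolled_ista([10.0, 10.0, 10.0, 10.0], y, layers)
xb = unrolled_ista([-10.0, 5.0, 0.0, 3.0], y, layers)
gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))
print(gap)
```

With 200 layers the bound from the proof gives a gap of at most $0.9^{200}$ times the initial distance, so the two outputs agree to many digits even though the starting points are far apart.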
A special case of unrolling is when, instead of the generic $\mathbf{X}_m$ in each iteration, we use a predetermined dictionary $\mathbf{W}$, and allow the step-size parameter $\mu$ to vary between iterations. We then have the following corollary.
Corollary 3. 
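Let $\sigma_{\min}(\mathbf{W})$ and $\sigma_{\max}(\mathbf{W})$ denote, respectively, the smallest and largest singular value of $\mathbf{W}$. Suppose
$$\mathbf{X}_m = \mathbf{I} - \frac{1}{\mu_m} \mathbf{W}^T \mathbf{W}, \tag{31}$$
and the following conditions are satisfied:
1. 
The matrix $\mathbf{W}$ has full column rank, so that $\mathbf{W}^T \mathbf{W}$ is positive definite, with singular values that satisfy
$$0 < \sigma_{\min}(\mathbf{W}) < \sigma_{\max}(\mathbf{W}) < \infty. \tag{32}$$
2. 
There exists a small positive $\epsilon$ ($0 < \epsilon \ll 1$) such that
$$\frac{\sigma_{\max}^2(\mathbf{W})}{2 - \epsilon} < \mu_m < \frac{\sigma_{\min}^2(\mathbf{W})}{\epsilon} \tag{33}$$
for all $m$.
Convergence in (30) is then achieved.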
Proof. 
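We will show that conditions 1 and 2 of Theorem 4 are satisfied. The theorem is then invoked to prove the corollary. Firstly, we have
$$\mathbf{X}_m^T = \mathbf{I}^T - \frac{1}{\mu_m} (\mathbf{W}^T \mathbf{W})^T = \mathbf{I} - \frac{1}{\mu_m} \mathbf{W}^T \mathbf{W} = \mathbf{X}_m$$
This means that $\mathbf{X}_m$ is symmetric and has real eigenvalues. Therefore, the singular value ($\sigma_i$) squared is equal to the eigenvalue ($\lambda_i$) squared; i.e., $\sigma_i^2(\mathbf{X}_m) = \lambda_i^2(\mathbf{X}_m)$ and $\sigma_i(\mathbf{X}_m) = | \lambda_i(\mathbf{X}_m) |$. Using some fundamental identities of the eigenvalues of a matrix and the definition of singular values, we have
$$\lambda_i(\mathbf{X}_m) = \lambda_i\left( \mathbf{I} - \frac{1}{\mu_m} \mathbf{W}^T \mathbf{W} \right) = 1 - \frac{1}{\mu_m} \lambda_i(\mathbf{W}^T \mathbf{W}) = 1 - \frac{1}{\mu_m} \sigma_i^2(\mathbf{W})$$
Since the operator norm is also equal to the spectral norm, we have
$$\| \mathbf{X}_m \|_2 = \sigma_{\max}(\mathbf{X}_m) = \max_i | \lambda_i(\mathbf{X}_m) | = \max_i \left| 1 - \frac{1}{\mu_m} \sigma_i^2(\mathbf{W}) \right| \tag{34}$$
Firstly, condition (33) ensures that $\mu_m$ is positive. Now condition (32) implies that there are at least two distinct singular values of $\mathbf{W}$. Therefore the last expression in (34) cannot be zero. Furthermore, since condition (32) implies that all singular values $\sigma_i(\mathbf{W})$ are finite, the last expression in (34) must be finite. Therefore $\| \mathbf{X}_m \|_2$ is finite and positive, which is condition 1 of Theorem 4.
Condition 2 of Theorem 4 (inequality (29)) is satisfied if there exists a small positive $\epsilon$ ($0 < \epsilon \ll 1$) such that
$$\| \mathbf{X}_m \|_2 = \max_i \left| 1 - \frac{1}{\mu_m} \sigma_i^2(\mathbf{W}) \right| < 1 - \epsilon$$
for all $m$. This is achieved if
$$\left| 1 - \frac{1}{\mu_m} \sigma_i^2(\mathbf{W}) \right| < 1 - \epsilon$$
for all $i$. This is equivalent to
$$-1 + \epsilon < 1 - \frac{1}{\mu_m} \sigma_i^2(\mathbf{W}) < 1 - \epsilon$$
$$\epsilon < \frac{\sigma_i^2(\mathbf{W})}{\mu_m} < 2 - \epsilon$$
$$\frac{\sigma_i^2(\mathbf{W})}{2 - \epsilon} < \mu_m < \frac{\sigma_i^2(\mathbf{W})}{\epsilon}$$
This condition is ensured by (33), since $\sigma_i^2(\mathbf{W}) / (2 - \epsilon) \le \sigma_{\max}^2(\mathbf{W}) / (2 - \epsilon)$ and $\sigma_{\min}^2(\mathbf{W}) / \epsilon \le \sigma_i^2(\mathbf{W}) / \epsilon$ for every $i$. Therefore condition 2 of Theorem 4 is satisfied. □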
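The step-size interval in (33) and the norm expression in (34) can be evaluated directly once the singular values of $\mathbf{W}$ are known. The Python sketch below assumes the singular values are supplied as a list (computing them is outside the scope of this illustration); the function names and the numerical values are hypothetical.

```python
def step_size_interval(singular_values, eps):
    """Open interval for mu_m from (33): (sigma_max^2 / (2 - eps), sigma_min^2 / eps)."""
    s_min, s_max = min(singular_values), max(singular_values)
    lower = s_max ** 2 / (2.0 - eps)
    upper = s_min ** 2 / eps
    if lower >= upper:
        raise ValueError("interval (33) is empty for these singular values and eps")
    return lower, upper


def spectral_norm_of_X(singular_values, mu):
    """||X_m||_2 = max_i |1 - sigma_i^2 / mu| for X_m = I - W^T W / mu, following (34)."""
    return max(abs(1.0 - s ** 2 / mu) for s in singular_values)


sigmas = [1.0, 1.5, 2.0]   # assumed singular values of W
eps = 0.1
lower, upper = step_size_interval(sigmas, eps)

# Any mu_m strictly inside the interval should give ||X_m||_2 < 1 - eps.
candidates = (2.2, 4.0, 9.5)
norms = [spectral_norm_of_X(sigmas, mu) for mu in candidates]
print(lower, upper, norms)
```

The explicit check for an empty interval is a design choice in this sketch: (33) can only be satisfied when $\sigma_{\max}^2(\mathbf{W}) / (2 - \epsilon) < \sigma_{\min}^2(\mathbf{W}) / \epsilon$, which limits how small $\sigma_{\min}(\mathbf{W})$ can be relative to $\sigma_{\max}(\mathbf{W})$ for a given $\epsilon$.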
Remark 4. 
1. 
Note that $\sup_{m \ge L} \| \mathbf{X}_m \|_2 \le c < 1$ in (29) is a sufficient condition but not necessary.
2. 
In both Theorem 4 and Corollary 3, there is no restriction on either the soft-threshold parameters $\beta_m$, or the matrix $\mathbf{Y}_m$.
3. 
In the machine learning paradigm, the expressivity of the algorithm generally increases with the number of parameters. With the special case of unrolling in (31), there is only one parameter $\mu_m$ in $\mathbf{X}_m$. However, for convergence, it is not necessary to have
$$\mathbf{Y}_m = \frac{1}{\mu_m} \mathbf{W}^T,$$
even though that is the case with the original ISTA algorithm in (23). A general $\mathbf{Y}_m \in \mathbb{R}^{q \times p}$, which has $q \times p$ parameters, can be used and results in higher expressivity.
4. 
The term $\mathbf{Y}_m \mathbf{y}$ is a generalization of the term $\frac{1}{\mu} \mathbf{W}^T \mathbf{y}$ in (23), which is related to the input $\mathbf{y}$. In the original ISTA formulation, there is only one input, but in a data-driven machine learning framework, there are multiple inputs $\mathbf{y}_n$. This generalization allows the algorithm to adapt to this situation.

7. Concluding Remarks

Iterative algorithms are at the heart of many areas in data science and mathematics. Traditionally in these algorithms, the iteration is fixed, and can be represented mathematically with a mapping on metric spaces. Convergence of these algorithms is usually analyzed with the aid of the well-known Banach Fixed Point theorem from functional analysis. This work has extended the convergence analysis to iterations with different mappings and different metric spaces. The results are relevant in the analysis of algorithm unrolling, which is a new approach for developing interpretable machine learning algorithms.

Funding

This research received no external funding.

Data Availability Statement

Data sharing is not applicable.

Acknowledgments

The author would like to thank the reviewers for their constructive comments, which led to an improvement in the quality of the paper.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Monga, V.; Li, Y.; Eldar, Y.C. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. IEEE Signal Process. Mag. 2021, 38, 18–24.
  2. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2009, 2, 183–202.
  3. Li, B.; Shi, B.; Yuan, Y.X. Proximal subgradient norm minimization of ISTA and FISTA. Appl. Comput. Harmon. Anal. 2026, 82, 101848.
  4. Ahmadi, S.; Hauffen, J.C.; Kästner, L.; Jung, P.; Caire, G.; Ziegler, M. Learned Block Iterative Shrinkage Thresholding Algorithm for Photothermal Super Resolution Imaging. Sensors 2022, 22, 5533.
  5. Gan, H.; Wang, X.; He, L.; Liu, J. Learned Two-Step Iterative Shrinkage Thresholding Algorithm for Deep Compressive Sensing. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 3943–3956.
  6. Kouni, V.; Panagakis, Y. Generalization analysis of an unfolding network for analysis-based compressed sensing. Appl. Comput. Harmon. Anal. 2025, 79, 101787.
  7. Naimipour, N.; Khobahi, S.; Soltanalian, M. Unfolded Algorithms for Deep Phase Retrieval. Algorithms 2024, 17, 587.
  8. Zhang, L.; Wang, G.; Giannakis, G.B. Real-time power system state estimation and forecasting via deep unrolled neural networks. IEEE Trans. Signal Process. 2019, 67, 4069–4077.
  9. Lohit, S.; Liu, D.; Mansour, H.; Boufounos, P.T. Unrolled projected gradient descent for multi-spectral image fusion. In ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: Piscataway, NJ, USA, 2019; pp. 7725–7729.
  10. Dardikman-Yoffe, G.; Eldar, Y.C. Learned SPARCOM: Unfolded deep super-resolution microscopy. Opt. Express 2020, 28, 27736–27763.
  11. Kiriki, S.; Nakano, Y.; Soma, T. Historic behaviour for nonautonomous contraction mappings. Nonlinearity 2019, 32, 1111–1124.
  12. Kreyszig, E. Introductory Functional Analysis with Applications; John Wiley and Sons: Hoboken, NJ, USA, 1978.
  13. Akkouchi, M. On Sequences of Certain Contractive Mappings and Their Fixed Points. Montes Taurus J. Pure Appl. Math. 2021, 3, 70–77. Available online: https://mtjpamjournal.com/papers/article_id_mtjpam-d-20-00008/ (accessed on 30 March 2026).
  14. Imdad, M.; Khan, M.S.; Sessa, S. On Sequences of Contractive Mappings and Their Fixed Points. Int. J. Math. Math. Sci. 1988, 3, 527–534.
  15. Nadler, S., Jr. Sequences of Contractions and Fixed Points. Pac. J. Math. 1968, 27, 579–585.
  16. Singh, S.B. On Sequences of Contractions Mappings. Riv. Mat. Univ. Parma 1970, 2, 227–231. Available online: https://www.rivmat.unipr.it/fulltext/1970-11/1970-11-227.pdf (accessed on 30 March 2026).
Figure 1. Sequence of mappings between metric spaces.

Share and Cite

MDPI and ACS Style

Tay, D.B. On Iterative Algorithms with Different Mapping in Each Iteration. Algorithms 2026, 19, 470. https://doi.org/10.3390/a19060470
