IPGM: Inertial Proximal Gradient Method for Convolutional Dictionary Learning

Inspired by the recent success of the proximal gradient method (PGM) and recent efforts to develop inertial algorithms, we propose an inertial PGM (IPGM) for convolutional dictionary learning (CDL) that jointly optimizes an ℓ2-norm data fidelity term and a sparsity term that enforces an ℓ1 penalty. In contrast to other CDL methods, in the proposed approach, the dictionary and needles are updated with an inertial force by the PGM. We obtain a novel derivative formula for the needles and the dictionary with respect to the data fidelity term. At the same time, a gradient descent step is designed to add an inertial term. The proximal operation uses a thresholding operation for the needles and projects the dictionary onto a unit-norm sphere. We prove the convergence of the proposed IPGM algorithm in the backtracking case. Simulation results show that the proposed IPGM achieves better performance than the PGM and slice-based methods that possess the same structure and are optimized using the alternating-direction method of multipliers (ADMM).


Introduction
Sparse representation is a popular a priori mathematical modeling approach in various signal and image processing applications. Inspired by deep learning-based convolutional operations, convolutional sparse representation is currently a hot topic. In the convolutional sparse representation model, the convolutional dictionary learning (CDL) process plays an important role, but the associated algorithm and convergence proof are difficult problems [1]. CDL involves training dictionaries and estimating codes from multiple signals. In the past decade, the convolutional sparse representation model has achieved impressive results in various applications, including image denoising and repair [2], image decomposition [3], image reconstruction [4], medical imaging [5,6], trajectory reconstruction [7], image segmentation [8], superresolution [9], audio processing [10], and others. CDL research is a typical interdisciplinary subject combining mathematics and artificial intelligence, and results in this field have important theoretical and practical value for a variety of signal and image processing applications. In this setting, a typical assumption is that a signal y ∈ R^N can be written as y = DΓ, a linear combination of columns, also known as atoms. The coefficient vector Γ ∈ R^M is sparse, and the matrix D ∈ R^{N×M} is called a dictionary. Given y and D, the task of finding the sparsest representation is equivalent to solving the following problem: min_Γ ‖Γ‖₀ s.t. ‖y − DΓ‖₂ ≤ ε, where ε represents the degree of model mismatch or the additive noise intensity. The solutions to such problems can be approximated by greedy algorithms (such as orthogonal matching pursuit (OMP) [11]) or convex formulations (such as basis pursuit (BP) [12]). The task of the developed learning model is to identify the dictionary D that can best represent a set of training signals, called dictionary learning.
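To make the greedy pursuit concrete, the following is a minimal NumPy sketch of an OMP-style solver for the sparse coding problem above. It is an illustration on a synthetic dictionary, not the reference OMP implementation of [11]; all names and sizes here are hypothetical.

```python
import numpy as np

def omp(y, D, k):
    """Greedy pursuit: select k atoms of D to approximate y, refitting by least squares."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Re-fit the coefficients on the selected support.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    gamma = np.zeros(D.shape[1])
    gamma[support] = coef
    return gamma

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
true_gamma = np.zeros(128)
true_gamma[[3, 40, 99]] = [1.5, -2.0, 0.7]  # a 3-sparse code
y = D @ true_gamma
est = omp(y, D, k=3)
print(np.linalg.norm(y - D @ est))
```

For an incoherent random dictionary and a very sparse code, this greedy selection typically recovers the support exactly, which is the regime in which OMP approximates the ℓ0 problem well.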
Several processing methods have been proposed, including K-singular value decomposition (K-SVD) [13], the method of optimal directions (MOD) [14], online dictionary learning [15], and trainlet learning [16]. Unfortunately, many real-world signals, such as images and audio, are high-dimensional, making sparse coding computationally challenging. To avoid the curse of dimensionality, high-dimensional signals are decomposed into low-dimensional overlapping patches, and sparse coding is performed independently on each patch [17]. Because of its simplicity and high performance, this approach has been widely and successfully utilized. However, it has a limitation: the patch-based process ignores the correlations between patches. An alternative that contrasts with this local paradigm is a global model [18] called the convolutional sparse representation model. The input signal is represented as the superposition of convolutions of local atoms with sparse feature maps. The problem is solved by imposing a specific structure on the global dictionary involved. In particular, the dictionary in this model is constrained to be a banded circulant matrix formed by atomic cascades, which is called the convolutional sparse representation.
The convolutional sparse representation model is as follows: y = DΓ, where y ∈ R^N is a signal, the banded global convolutional dictionary D ∈ R^{N×NM} is composed of all cyclic shifts of a local dictionary D_L ∈ R^{n×M} whose columns are the filters {d_m}_{m=1}^M ∈ R^n, and Γ ∈ R^{NM} is the coding coefficient. A theoretical analysis of this global convolutional sparse representation model guarantees a local sparsity measure. The convolutional dictionary structure directly provides translation invariance, compensating for the fact that patch-based models ignore the correlations between adjacent signal patches. Building on this dictionary structure, convolutional sparse coding (CSC) and CDL have arisen. For computational convenience, the ℓ1 norm of the coefficients is used instead of the ℓ0 norm, and a unit-norm sphere constraint is used for the dictionary. Several convex, relaxed CDL algorithms have been proposed [19-22]. They mainly utilize alternating-direction method of multipliers (ADMM) solvers based on the Fourier domain. However, the ADMM loses its connection to the patch-based processing paradigm that is widely used in many signal and image processing applications [23].
Papyan et al. proposed a new convex relaxation algorithm [23] via slice-based dictionary learning (SBDL), where the signal patches are formed as sums of slices. The developed algorithm is an ADMM solver operating in the signal domain. It adopts a local point of view and trains a convolutional dictionary using only local computations in the signal domain. It operates on local slices yet faithfully solves the global convolutional sparse coding problem. This local-global method and the resulting decomposition follow a recent work [18], which compared it with Fourier-based methods. The SBDL algorithm is easy to implement, is intuitive, achieves state-of-the-art performance, and converges faster than other approaches. Moreover, it provides a better model that naturally allows a different number of nonzero values at each spatial location according to the local signal complexity. However, the ADMM parameters strongly depend on the given problem. Although the corresponding pursuit algorithm can be proven to be convergent, when the convolutional dictionary is iteratively updated within the ADMM algorithm, it is difficult to prove the convergence of the overall scheme [24].
Convex relaxation and ADMM-based optimization algorithms produce nonsparse coding solutions, and it is challenging to prove their convergence [25]. In addition, in nonconvex optimization, greedy algorithms for CDL problems have a high computational cost and poor performance, and their convergence is difficult to prove [26]. Chun and Fessler [27] recently proposed an algorithm that achieves full convergence. However, the method involves approximate convex constraints, and its overall performance is only slightly better than that of the ADMM. The CDL problem is essentially a nonconvex and nonsmooth optimization problem of the following form, which makes it difficult to propose an optimization algorithm and prove its convergence: min_{D_L, α_{l,i}} (1/2) Σ_l ‖y_l − Σ_i P_i^T D_L α_{l,i}‖₂² + λ₁ Ω₁(α_{l,i}) + λ₂ Ω₂(D_L),
where λ_i, i = 1, 2 are the equilibrium coefficients and Ω_i, i = 1, 2 are the nonconvex constraints on the coefficients and the convolutional dictionary. For example, a typical nonconvex constraint is Ω₁(x) = ‖x‖₀, the ℓ0 norm. Peng [28] realized the joint and direct optimization of the CDL problem under a nonconvex and nonsmooth optimization scheme and proposed a forward-backward splitting algorithm based on the Fourier domain. The developed approach outperforms the ADMM algorithm. More precisely, the forward step estimates the smooth part of the objective function through a partial gradient, whereas the backward step accounts for the nonsmooth part of the objective function through its proximal operator. Peng proved the convergence of the solution sequence of the proposed algorithm by using the semialgebraic property of reference [29] and the Kurdyka-Lojasiewicz (KL) property. Peng [28] used the gradient descent algorithm for both the dictionary and the coding process. Although this dramatically reduces the computational complexity of the overall algorithm, in theory, gradient descent can only be guaranteed to reach a local minimum, not the global minimum. Complex functions contain many local minima, so in many cases gradient descent yields only a locally optimal solution rather than the globally optimal one. In addition, when the sample size of the given dataset is large, the convergence of gradient descent is slow.
Polyak [30] proposed the heavy-ball method, in which an inertia term is added to the standard gradient descent method. This method has a faster convergence rate than the standard gradient method while requiring essentially the same number of computations per iteration. Peter Ochs [31] applied this method to a convex optimization scheme and proposed the iPiano algorithm, which combines an inertia term with a forward-backward splitting framework to address minimization problems consisting of a differentiable (possibly nonconvex) function and a convex (possibly nondifferentiable) function. These problems were rigorously analyzed, and the global convergence of the function values and iterates was established. At the same time, the convergence efficiency was greatly improved. We apply this inertial algorithm to the CDL problem.
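The heavy-ball update is a one-line change to gradient descent: x^{t+1} = x^t − η∇f(x^t) + β(x^t − x^{t−1}). The following sketch applies it to an illustrative quadratic; the step size and inertia values here are hypothetical choices, not ones prescribed by [30].

```python
import numpy as np

def heavy_ball(grad, x0, eta=0.1, beta=0.7, iters=300):
    """Heavy-ball iteration: x_{t+1} = x_t - eta*grad(x_t) + beta*(x_t - x_{t-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = x - eta * grad(x) + beta * (x - x_prev)  # inertia term beta*(x - x_prev)
        x_prev, x = x, x_next
    return x

# Minimize f(x) = 0.5 * x^T A x - b^T x, whose minimizer solves A x = b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 2.0])
x_star = np.linalg.solve(A, b)
x_hb = heavy_ball(lambda x: A @ x - b, np.zeros(2))
print(np.linalg.norm(x_hb - x_star))
```

On strongly convex quadratics, a suitable (η, β) pair gives the accelerated linear rate that motivates adding inertia to the PGM in this paper.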
In this paper, an inertial forward-backward splitting algorithm is proposed for the CDL problem. The convergence of the algorithm is given and proven. The optimal convergence rate is derived. Finally, a large number of experiments show that the proposed algorithm has high efficiency and effectiveness.
The rest of this article is organized as follows. In Section 2, we introduce some related knowledge. In Section 3, the inertia forward and backward splitting algorithm is proposed to restate the CDL problem. In Section 4, the complexity of the proposed algorithm is analyzed. In Section 5, the convergence of the proposed algorithm is analyzed and proven. The convergence rate of the proposed algorithm is derived in Section 6. In Section 7, the performance of the proposed algorithm is evaluated through experiments and compared with other existing methods. Section 8 summarizes the full text.

Related Knowledge

Convex CDL in the Time Domain via Local Processing
The CSC model [19] assumes that a global signal y ∈ R^N can be decomposed as y = DΓ. The matrix D ∈ R^{N×Nm} is a banded convolutional dictionary consisting of all shifted versions of a local dictionary D_L ∈ R^{n×m} whose columns are atoms; L stands for "local". The global sparse vector Γ ∈ R^{Nm} can be decomposed into N non-overlapping, m-dimensional local sparse vectors {α_i}_{i=1}^N, where the α_i ∈ R^m are called needles. The operator P_i^T places D_L α_i at the ith position of the signal. With this decomposition, the sparse coefficients can be obtained via the basis pursuit problem min_Γ (1/2)‖y − DΓ‖₂² + λ‖Γ‖₁. Papyan et al. [23] proposed slice-based local processing and defined s_i = D_L α_i as the ith slice. Unlike other existing works in signal and image processing, which train a dictionary in the Fourier domain, they defined the learning problem in terms of the constructed slices, and CDL was carried out in the signal domain via local processing. Through local processing in the original signal domain, the global problem is solved exactly. The global signal can be rewritten as y = Σ_{i=1}^N P_i^T s_i. The ADMM algorithm is used to minimize the augmented Lagrangian of the problem min_{D_L, {α_i}, {s_i}} (1/2)‖y − Σ_i P_i^T s_i‖₂² + λ Σ_i ‖α_i‖₁ s.t. s_i = D_L α_i, where the dual variables satisfy the given constraint and ρ is the Lagrangian penalty coefficient. References [23,24] call this method the SBDL algorithm.
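The local-global relation y = Σ_i P_i^T D_L α_i can be made concrete in a few lines. The sketch below (a hypothetical 1-D instance with circular shifts and synthetic sizes, not the construction of [19,23]) builds the global signal purely from needles and a local dictionary.

```python
import numpy as np

def place(patch, i, N):
    """P_i^T: place an n-length patch at position i of an N-length signal (circularly)."""
    out = np.zeros(N)
    idx = (np.arange(len(patch)) + i) % N  # circular placement, as in the convolutional model
    out[idx] = patch
    return out

rng = np.random.default_rng(1)
N, n, m = 32, 5, 8
D_L = rng.standard_normal((n, m))
D_L /= np.linalg.norm(D_L, axis=0)       # unit-norm atoms
alphas = np.zeros((N, m))                 # one m-dimensional needle per position
alphas[4, 2] = 1.0                        # only two needles are nonzero
alphas[17, 5] = -0.5
# Global signal as the sum of slices s_i = D_L @ alpha_i placed at position i.
y = sum(place(D_L @ alphas[i], i, N) for i in range(N))
print(y.shape, np.count_nonzero(alphas))
```

Since most needles are zero, only the two active slices contribute, which is exactly the local sparsity that the slice-based view exploits.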
In the SBDL algorithm, the dictionary update of the CDL problem is transformed into a traditional dictionary learning problem solved by the K-SVD algorithm [13] or other dictionary learning algorithms. The sparse coding step is updated with the ADMM algorithm, and each coding update is treated as a least absolute shrinkage and selection operator (LASSO) problem solved by the least-angle regression and shrinkage (LARS) algorithm. Using the ADMM to solve the coding problem increases the number of auxiliary variables, computations, and redundant iterations. In addition, different solution methods are used to update the dictionary and the codes, which makes it difficult to prove the convergence of the algorithm.

Forward-Backward Splitting
Splitting algorithms for convex optimization problems usually originate from the proximal point algorithm [32]. The proximal point algorithm is very general, and the results regarding its convergence affect many other algorithms. In practice, however, a single iteration of the proximal point algorithm can be as difficult as solving the original problem. The strategy to address this difficulty is splitting, as in the Douglas-Rachford method, several primal-dual algorithms, and forward-backward splitting.
Forward-backward splitting schemes have been used to solve a variety of problems in recent years. For example, forward-backward splitting algorithms are used to solve normal problems [33], to find generalized Nash equilibrium [34], to solve linear constraint problems [35], or to analyze related function problems in Banach space [36,37]. In particular, it is appealing to generalize forward-backward splitting schemes to nonconvex problems. This is due to their simplicity and simpler formulations in some exceptional cases, such as the gradient projection method, where the backward step is a projection onto a set. The backward step of the forward-backward algorithm studied in [28] is the solution of a proximal term of a nonconvex function.
The goal of a forward-backward splitting framework is to solve optimization problems of the form argmin_x h(x) = f(x) + g(x), where f is smooth and g is possibly nonsmooth. However, when a large amount of data is processed, even if misestimation is not considered during processing, the result of the forward-backward splitting algorithm becomes inaccurate. In this paper, the original algorithm is improved to increase the accuracy and reduce the induced error.
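For the ℓ1-regularized least-squares case, the backward step is the soft-thresholding operator. The following is a minimal forward-backward (proximal gradient) sketch under an assumed constant step size 1/L; the problem sizes and λ are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def forward_backward(D, y, lam, iters=500):
    """Minimize 0.5*||y - D x||_2^2 + lam*||x||_1 by proximal gradient descent."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the smooth gradient
    x = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ x - y)               # forward (gradient) step
        x = soft_threshold(x - grad / L, lam / L)  # backward (proximal) step
    return x

rng = np.random.default_rng(2)
D = rng.standard_normal((30, 60))
y = rng.standard_normal(30)
x = forward_backward(D, y, lam=2.0)
print(np.count_nonzero(x), x.size)
```

With step size 1/L, each iteration is guaranteed not to increase the objective, and the ℓ1 proximal step is what produces exact zeros in the solution.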

Inertia Term
Polyak studied a multistep scheme for the accelerated gradient method in [30] and proposed the heavy-ball method for the first time. Unlike the usual gradient method, this approach adds an inertia term, which is calculated by the difference between the values obtained during the previous two iterations. Compared with the standard gradient method, this method can accelerate the convergence speed of the algorithm while keeping the cost of each iteration unchanged. In addition, this method can obtain the optimal convergence rate without additional knowledge.
The inertia term of the heavy-ball method was first applied to the minimization of differentiable convex functions, and then it was extended to subdifferential convex functions. Later, the heavy-ball method was used for forward-backward splitting [37,38]. Recently, it has been frequently applied to the minimization of nonconvex functions. In [39,40], the inertia term was introduced into the nonconvex optimization problem to improve the convergence speed. Refs. [31,41] used an inertia term to develop a nonconvex optimization CDL scheme. The authors of [31] aimed to minimize problems composed of differentiable (possibly nonconvex) and convex (possibly nondifferentiable) functions. The iPiano algorithm was proposed by combining forward-backward splitting with an inertia term. The global convergence of the function values and parameters was determined. In [41], an inertial version of the proximal alternating linearization minimization (PALM) algorithm was proposed, and its global convergence to the critical point of the input objective function was proven.

Proposed IPGM Algorithm
The CDL problem to be solved via local processing is given as follows: min_{D_L, α_{l,i}} (1/2) Σ_l ‖y_l − Σ_i P_i^T D_L α_{l,i}‖² + λ₁ Σ_{l,i} Ω₁(α_{l,i}) + λ₂ Ω₂(D_L). (8) Here, D_L is the local convolutional dictionary, which has n rows and m columns. α_{l,i}, which has m rows, is the sparse code of component i of sample l. P_i^T, which has N rows and n columns, is the operator that places D_L α_{l,i} at the ith position and pads the remaining entries with zeros. y_l is the observed signal. ‖·‖ denotes the ℓ2 norm when its argument is a vector and the Frobenius norm when its argument is a matrix. λ₁ and λ₂ are hyperparameters. Ω₁ is the sparsity constraint imposed on the column vectors, defined as the ℓ0 norm or the ℓ1 norm. Ω₂ is the indicator function of the unit-norm sphere.
To solve the above convex or nonconvex CDL optimization problem (8) via local processing, we use the inertial forward-backward splitting framework. An inertial forward-backward splitting algorithm via local processing, called the inertial proximal gradient method (IPGM), is proposed.
To form the objective function of the proposed CDL optimization problem via local processing within the developed framework, based on Equation (8), let x = (D_L, α_{l,i}). Then x^t can be generated by iteratively implementing the following equation: x^t = prox_{η_t g}(x^{t−1} − η_t ∇f(x^{t−1}) + ξ_t (x^{t−1} − x^{t−2})), (9) where prox refers to the proximal mapping operation. f and g are, respectively, defined as follows: f(x) = (1/2) Σ_l ‖y_l − Σ_i P_i^T D_L α_{l,i}‖², g(x) = λ₁ Σ_{l,i} Ω₁(α_{l,i}) + λ₂ Ω₂(D_L). To generate the sequence x^t = (D_L^t, α_{l,i}^t) using iterative Equation (9) based on inertial forward-backward splitting, we first need to derive the gradient of f and the proximal mapping of g.
Since f is a function of the composite variable x = (D_L, α_{l,i}), we define the gradient of f as ∇f = (∇_{D_L} f, ∇_{α_{l,i}} f). Writing the residual as r_l = y_l − Σ_i P_i^T D_L α_{l,i}, the components ∇_{D_L} f and ∇_{α_{l,i}} f can be computed as follows: ∇_{D_L} f = −Σ_l Σ_i (P_i r_l) α_{l,i}^T, ∇_{α_{l,i}} f = −D_L^T P_i r_l. To express the result of the descent step, we use an intermediate variable z^t = x^{t−1} − η_t ∇f(x^{t−1}) + ξ_t (x^{t−1} − x^{t−2}), where η_t is a step size (descent parameter) and ξ_t is an inertial parameter. We then compute the proximal mapping of g at z^t as x^t = prox_{η_t g}(z^t). The proximal mapping operation is defined as prox_{ηg}(z) = argmin_x g(x) + (1/(2η)) ‖x − z‖². Furthermore, in the inertial forward-backward splitting framework, to accelerate convergence, we propose using η_t to fit the local curvature of f in each iteration, capturing the local curvature by estimating a local Lipschitz constant of the CDL problem via local processing: L_t = ‖∇f(x^t) − ∇f(x^{t−1})‖ / ‖x^t − x^{t−1}‖. However, deriving η_t directly from L_t does not satisfy the convergence requirement. We therefore insert a backtracking scheme into the inertial forward-backward splitting framework to restore convergence.
We introduce a sequence {τ_t}, where τ_t > 1 holds for all t, to represent the adaptive parameter such that the sequence x^t satisfies the following descent condition: f(x^t) ≤ f(x^{t−1}) + ⟨∇f(x^{t−1}), x^t − x^{t−1}⟩ + (τ_t L_t / 2) ‖x^t − x^{t−1}‖². (20) We assume that each step size η_t is maintained by τ_t through the inequality 0 < η_t < 1/(τ_t L_t); inequality (20) can then be reformulated elementwise, where ⊙ stands for the Hadamard product. We present the IPGM algorithm that solves the proposed CDL problem via local processing using inertial forward-backward splitting in Algorithm 1.
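A single IPGM iteration — gradient step with inertia, then the proximal operation (soft-thresholding for the needles, projection of each atom onto the unit sphere) — can be sketched as follows. This is an illustrative toy with a plain matrix data term standing in for the slice-based data term (the operators P_i^T are omitted for brevity), and the step size, inertia, and shapes are hypothetical; the backtracking on τ_t is also omitted.

```python
import numpy as np

def project_unit_sphere(D):
    """Proximal step for the unit-norm sphere constraint: normalize each atom."""
    return D / np.maximum(np.linalg.norm(D, axis=0), 1e-12)

def soft_threshold(z, t):
    """Proximal step for the l1 penalty on the needles."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ipgm_step(x, x_prev, grad_f, eta, xi, lam):
    """One inertial proximal gradient update for (D_L, alpha)."""
    D, A = x
    D_prev, A_prev = x_prev
    gD, gA = grad_f(D, A)
    # Forward step with inertia term xi * (x^{t-1} - x^{t-2}).
    D_tmp = D - eta * gD + xi * (D - D_prev)
    A_tmp = A - eta * gA + xi * (A - A_prev)
    # Backward (proximal) step on each block.
    return project_unit_sphere(D_tmp), soft_threshold(A_tmp, eta * lam)

# Toy instance: f(D, A) = 0.5 * ||Y - D @ A||_F^2.
rng = np.random.default_rng(3)
n, m, k = 8, 12, 20
Y = rng.standard_normal((n, k))
D = project_unit_sphere(rng.standard_normal((n, m)))
A = np.zeros((m, k))
grad_f = lambda D, A: ((D @ A - Y) @ A.T, D.T @ (D @ A - Y))
x_prev = (D.copy(), A.copy())
for _ in range(50):
    D_new, A_new = ipgm_step((D, A), x_prev, grad_f, eta=0.05, xi=0.3, lam=0.1)
    x_prev, (D, A) = (D, A), (D_new, A_new)
print(np.linalg.norm(Y - D @ A), np.linalg.norm(Y))
```

Note that the dictionary atoms remain exactly unit-norm after every iteration, since the prox of the unit-sphere indicator is the per-atom normalization.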

Computational Complexity Analysis
The computational complexity of our algorithm is discussed as follows. We assume the following. N is the signal dimension. I is the number of signals. n is the patch size. m is the number of filters. k is the maximal number of nonzeros in a needle α_{l,i}, which is very sparse (k ≪ m). C is the number of backtracking loops, which is usually very small. The convolution is performed by the local processing operation. The operation P_i^T is merely an operator that places a column into an image.
The dominant computational complexity of the signal reconstruction performed by the local processing operation is approximately O(INnk). The computation of the signal residual requires O(IN) operations. Neglecting some minor factors, the computational complexities of the gradients of the needles and the dictionary performed by the local processing operation are effectively O(INnm) and O(INnk), respectively. In addition, the inertia term adds one addition and one multiplication per iteration, which is negligible.

Proof of the Convergence of Algorithm 1
Before describing the convergence theorem, let us analyze the relevant lemmas and hypotheses.
We fix two positive constants a > 0 and b > 0 and consider a proper lower semicontinuous function F : R^{2N} → R ∪ {∞}. Then, the conditions on the sequence (z^t)_{t∈N} are as follows: (H1) For each k ∈ N, it holds that F(z^{k+1}) + a‖z^{k+1} − z^k‖² ≤ F(z^k). (H2) For each k ∈ N, there exists a w^{k+1} ∈ ∂F(z^{k+1}) such that ‖w^{k+1}‖ ≤ b‖z^{k+1} − z^k‖. (H3) There exists a subsequence (z^{k_j})_{j∈N} such that z^{k_j} → ẑ and F(z^{k_j}) → F(ẑ) as j → ∞. When F is a KL function and H1, H2, and H3 are satisfied, F satisfies the convergence result.
Proof of Lemma 2. By the algorithmic requirements, the upper bounds for ξ_t and η_t are obtained by rearranging γ_t ≥ c₁. The last statement follows by incorporating the descent property of δ_t. Let δ_{−1} ≥ c₁ be chosen initially. Then, the descent property of (δ_t)_{t=0}^∞ requires one of the equivalent statements below to be true, from which an upper bound on η_t is obtained. Consider the condition for a nonnegative gap between the upper and lower bounds of η_t. Defining b := (2δ + τ_t L_t)/(2c₁ + τ_t L_t) ≥ 1, it is easily verified that there exists ξ_t ∈ [0, 1/2) satisfying the equivalent condition. As a consequence, the existence of a feasible η_t follows, and the descent property for δ_t holds.
Proposition 1 states that the sequence (H_{δ_t}(x^t, x^{t−1}))_{t=0}^∞ is monotonically decreasing and thus convergent. Proof of Proposition 1. (a) For a more convenient notation, we abbreviate h = f + g in the IPGM algorithm for the nonconvex nonsmooth CDL problem via local processing. The iteration itself does not guarantee that the value of h drops, so we construct the function H_δ(x, y) = h(x) + δ‖x − y‖², δ ∈ R. Note that for x = y, H_δ(x, y) = h(x).
We show below that H δ (x, y) satisfies the convergence requirement. In the inertial forward-backward splitting framework, the sequence x t is generated by iteratively implementing Equation (9).
Notably, in the IPGM algorithm, the function f satisfies the adaptive descent formula in (20).
The function g is a nonconvex, nonsmooth, and proper closed function. Combining the iterative mapping of the sequence x^t in the inertial forward-backward splitting framework with the definition of the proximal operator, we obtain an inequality that simplifies further. Now, summing (20) and (29), it follows that the cross term can be bounded using 2⟨A, B⟩ ≤ ‖A‖₂² + ‖B‖₂² for vectors A, B ∈ R^N. Then, a simple rearrangement of the terms establishes (27), as δ_t is monotonically decreasing. The sequence (H_{δ_t}(x^t, x^{t−1}))_{t=0}^∞ is monotonically decreasing if and only if γ_t ≥ 0, which is guaranteed by the algorithmic requirements. By assumption, h is bounded from below by some constant h̲ > −∞. Letting T tend to ∞, it can be seen from (32) that lim_{t→∞} γ_t Δ_t² = 0, and γ_t ≥ c₁ > 0 implies the above statement.
Proof of Proposition 2. (a) This follows from the squeeze theorem, since the stated bounds hold for all t ≥ 0 and Propositions 1(a) and 1(b) apply. (b) By Proposition 1(a) and the fact that H_{δ_0}(x^0, x^{−1}) = h(x^0), it is clear that the whole sequence (x^t)_{t=0}^∞ is contained in the level set {x ∈ R^N : h̲ ≤ h(x) ≤ h(x^0)}, which is bounded due to the coercivity of h and because h̲ = inf_{x∈R^N} h(x) > −∞. Using the Bolzano-Weierstrass theorem, we deduce the existence of a convergent subsequence (x^{t_k})_{k=0}^∞. (c) To show that each limit point x* = lim_{j→∞} x^{t_j} is a critical point of h(x), we recall that the subdifferential is closed. We define ς^j accordingly; then the sequence (x^{t_j}, ς^j) ∈ Graph(∂h) := {(x, ς) ∈ R^N × R^N : ς ∈ ∂h(x)}. Furthermore, it holds that x* = lim_{j→∞} x^{t_j}, and due to Proposition 1(b), ∇f is Lipschitz continuous, so lim_{j→∞} ς^j = 0. It remains to be shown that lim_{j→∞} h(x^{t_j}) = h(x*). By the closedness of the subdifferential ∂h, (x*, 0) ∈ Graph(∂h), which means that x* is a critical point of h.
According to the iterative mapping (Equation (9)) of the sequence x^t in the inertial forward-backward splitting framework, together with Proposition 1(b) and the boundedness of the sequence, invoking the lower semicontinuity of g yields lim_{j→∞} g(x^{t_j}) = g(x*). Moreover, f is differentiable and continuous; thus, lim_{j→∞} f(x^{t_j}) = f(x*). We conclude that lim_{j→∞} h(x^{t_j}) = h(x*). Now, using Lemma 1, we can verify the convergence of the sequence (x^t)_{t∈N} generated by Algorithm 1.
Theorem 1 (Convergence of the IPGM algorithm to a critical point). Let (x^t)_{t∈N} be generated by Algorithm 1, and let δ_t = δ for all t ∈ N. Then, the sequence (x^{t+1}, x^t)_{t∈N} satisfies H1, H2, and H3 for the function H_δ. Moreover, if the sequence possesses the KL property at a cluster point, then the sequence has a finite length and converges to a critical point of H_δ; hence, x* is a critical point of h.
Proof of Theorem 1. First, we prove that the function has the KL property. h is a semialgebraic function. Since ‖x − y‖² is a polynomial function, it is semialgebraic, and hence δ‖x − y‖² is a semialgebraic function. Therefore, H_δ(x, y) is semialgebraic and has the KL property.
Next, we verify that Assumptions H1, H2, and H3 are satisfied.
Condition H1 is proven in Proposition 1(a) with a = c₂ ≤ γ_t. To prove Condition H2, consider w^{t+1} := (w_x^{t+1}, w_y^{t+1}). The Lipschitz continuity of ∇f and the use of (9) to specify an element of ∂g(x^{t+1}) imply the required bound. In Proposition 2(c), it is proven that there exists a subsequence (x^{t_j+1})_{j∈N} of (x^t)_{t∈N} such that lim_{j→∞} h(x^{t_j+1}) = h(x*). The argument uses the fact that semialgebraic functions have the KL property. Proposition 1(b) shows that x^{t+1} − x^t → 0 as t → ∞; hence, lim_{j→∞} x^{t_j+1} = x*. As the term δ‖x − y‖² is continuous in x and y, we deduce the corresponding limit for H_δ. Therefore, Condition H3 is proven, which concludes the proof of Theorem 1.

Remark 1. The IPGM algorithm is convergent under nonconvex optimization. It is easy to prove its convergence under convex optimization constraints, in which case it is equivalent to a special case of the iPiano algorithm applied to the CDL problem.

Convergence Rate
We prove a global O(1/N) convergence rate for ‖x^{t+1} − x^t‖₂². We first define the error u_N as the smallest squared ℓ2 norm of successive iterates: u_N := min_{0≤t≤N} ‖x^{t+1} − x^t‖₂². Theorem 2. Algorithm 1 guarantees that for all N ≥ 0, u_N ≤ (H_δ(x^0, x^{−1}) − h̲)/(c₁(N + 1)).

Proof of Theorem 2. In view of Proposition 1(a) and the definition of γ_N in (21), summing both sides of (26) for n = 0, …, N and using the fact that δ_N > 0 from (21), we obtain a bound on Σ_{n=0}^N γ_n Δ_n². As γ_n > c₁, a simple rearrangement concludes the proof.

Experimental Results and Analysis
In this section, we compare the performance of the proposed IPGM with that of various existing methods with respect to solving CDL problems.

1. SBDL denotes the method proposed in [23] based on convex optimization with an ℓ1 norm-based sparsity-inducing function that uses slices via local processing.
2. Local block coordinate descent (LoBCoD) denotes the method proposed in [24] based on convex optimization with an ℓ1 norm-based sparsity-inducing function that utilizes needle-based representation via local processing.
3. The PGM denotes the method proposed in [1] based on convex optimization with an ℓ1 norm-based sparsity-inducing function that uses the fast Fourier transform (FFT) operation.
4. The IPGM denotes the method proposed in Algorithm 1 in this paper, which uses needle-based representation via local processing.

Parameter Settings
The parameter settings used for the comparison methods are described as follows:

1. In SBDL [23], the model parameter is λ, and its initial value is 1. The maximum number of nonzeros in the LARS algorithm is 5. The filter size of the dictionary is 11 × 11 × 100. In addition, special techniques are used in the first few iterations of the dictionary learning process.
2. In LoBCoD [24], the model parameter λ is initialized to 1. The maximum number of nonzeros in the LARS algorithm is 5. The filter size of the dictionary is 11 × 11 × 100. In addition, the dictionary learning process first carries out 40 iterations and then uses the proximal gradient descent algorithm.
3. In the PGM [1], the Lipschitz constant L is set to 1000 for the fruit dataset, and λ = 1.
4. In the IPGM, the model parameter L_t is set to 1000 for the fruit dataset, and λ = 1.

Implementation Details
The computations of the SBDL, LoBCoD, and PGM algorithms, as well as the IPGM algorithm, are performed using a PC with an Intel i5 CPU and 12 GB of memory.

Motivation and Evaluation Metrics
The efficiency and the stability of the objective function are the most important criteria for evaluating a numerical optimization algorithm. The sequence generated by an efficient dictionary learning algorithm converges quickly to the corresponding cluster point with a lower function value. The relevant evaluation indicators are listed below.

1. Final Value of f + g and the Algorithmic Convergence Properties: In the f + g minimization-based analysis, we use the function values of the generated sequence to evaluate optimization efficiency and convergence.
2. Computing Time: We compare the computing times of the SBDL, LoBCoD, PGM, and IPGM algorithms to compare their computational efficiency.

Training Data and Experimental Settings
A set of generic grayscale natural images is used as the training data in this experiment. The set includes 10 fruit images (100 × 100 pixels). Each image is decomposed into high- and low-frequency components, the means are subtracted, and the images are normalized to the range of 0 to 1. The dictionary includes 100 elements of size 11 × 11. One thousand iterations are carried out for each method. The dataset comes from SBDL [23] and LoBCoD [24], and the experimental procedure follows those references. As in [23,24], the experimental results exhibit a similar trend under different numbers of atoms and different atom sizes. We therefore report the final objective function value (representing the convergence properties) and the computing time (representing the convergence efficiency) only for the setting of 100 atoms of size 11 × 11 on the fruit grayscale natural images. Each reported result is the average of four trials using different initial dictionaries, from which the initial coefficients were derived.

Results
The experimental results are described as follows. The set of training data described above is used to derive the results. Figure 1 shows the functional values of all compared methods in each iteration of the dictionary learning procedure. In addition, Table 1 shows the experimental data yielded by all compared methods over 1000 iterations. Figures 2 and 3 correspond to the trained dictionaries and reconstructed images of all compared methods after 1000 iterations, respectively.

Discussion
Table 1 shows the experimental results produced by the four algorithms on the fruit dataset; the IPGM method yields the best overall performance. Among the four algorithms, IPGM attains the lowest objective function value, 9.449 × 10³. In terms of sparsity, apart from the poor sparsity of the PGM, there is little difference among the algorithms. The running time of our algorithm is 122.722 s, which is better than those of LoBCoD and the PGM but worse than that of SBDL; the SBDL implementation calls a C++ function from its MATLAB program, so it requires less time than the IPGM method. The peak signal-to-noise ratio (PSNR) of the IPGM algorithm, 29.438 dB, is the highest among the four algorithms. As Figure 3 shows, the IPGM algorithm produces the reconstructed image with the clearest texture details. Overall, the IPGM algorithm performs best among the compared algorithms.
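The PSNR values reported above follow the standard definition; for completeness, a minimal sketch of that computation (assuming images scaled to [0, 1], so the peak value is 1.0) is:

```python
import numpy as np

def psnr(ref, recon, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and its
    reconstruction; peak=1.0 for images normalized to [0, 1]."""
    mse = np.mean((ref - recon) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: a constant reconstruction error of 0.1 gives MSE = 0.01,
# i.e., a PSNR of 20 dB.
ref = np.zeros((10, 10))
recon = ref + 0.1
print(psnr(ref, recon))  # → 20.0 (up to floating-point rounding)
```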


Motivation and Evaluation Metrics
In the proposed IPGM algorithm, the inertia parameters are assigned directly as fixed values. Nevertheless, their magnitudes have a specific impact on the performance of the algorithm. Therefore, this section uses ablation experiments to analyze the performance of IPGM under different inertia parameters and under different optimization conditions.
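The role of the fixed inertia parameter can be seen in a minimal sketch of an inertial proximal gradient update. This is a toy, non-convolutional lasso problem standing in for the sparse-coding (needle) subproblem; it omits the dictionary update (which in the paper projects atoms onto the unit-norm sphere), and the function names, the step size 1/L, and the iteration count are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of the l1 norm (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ipgm_lasso(D, y, lam=0.1, beta=0.9, n_iter=300):
    """Inertial proximal gradient iteration on the toy problem
    min_x 0.5*||y - D x||^2 + lam*||x||_1, with a fixed inertia
    parameter beta as in the ablation study. Step size is 1/L with
    L = ||D||_2^2 the Lipschitz constant of the gradient."""
    L = np.linalg.norm(D, 2) ** 2
    t = 1.0 / L
    x_prev = x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x + beta * (x - x_prev)            # inertial (momentum) step
        grad = D.T @ (D @ z - y)               # gradient of the data-fidelity term
        x_prev, x = x, soft_threshold(z - t * grad, t * lam)  # proximal step
    return x

# Example: recover a sparse vector from noiseless measurements.
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 10))
x_true = np.zeros(10)
x_true[[1, 5]] = [1.0, -2.0]
x_hat = ipgm_lasso(D, D @ x_true)
```

Setting `beta = 0` recovers the plain PGM update; larger values of `beta` strengthen the inertial force, which is exactly the knob varied in the ablation experiments below.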

Training Data and Experimental Settings
The settings of the ablation experiments are consistent with those of the experiment in the previous section, and the fruit dataset is again used. The dataset comes from SBDL [23] and LoBCoD [24], and the experimental procedure follows that of those references. We evaluate the IPGM algorithm with different inertia parameters under both convex and nonconvex optimization constraints.

Results
Table 2 shows the objective function and PSNR values obtained with different inertia parameters under the nonconvex optimization constraint, and Table 3 shows the corresponding values under the convex optimization constraint. Under nonconvex optimization, the best result is obtained when the inertia parameter is 0.4, with an objective function value of 1.167 × 10⁴ and a PSNR of 32.734 dB. Under convex optimization, as the inertia parameter increases, the objective function value decreases and the PSNR increases; the best result is obtained when the inertia parameter is 0.9, with an objective function value of 9.449 × 10³ and a PSNR of 29.438 dB. The results show that, within the allowed range of each setting, a larger inertia parameter yields better IPGM performance under both convex and nonconvex constraints.

Discussion
The results in Tables 2 and 3 are obtained according to the ranges of inertia parameters allowed under the different constraints. Comparing the two tables shows that, as the inertia parameter increases, the IPGM algorithm under the convex optimization constraint attains a higher PSNR and a lower objective function value, i.e., its performance improves. Under the nonconvex optimization constraint, although the PSNR of the IPGM algorithm is high, the objective function value is also high, and the performance is not ideal. This result indicates that simply adding a fixed inertia term cannot achieve the ideal effect under the nonconvex constraint. In the future, we will continue to study the IPGM algorithm with dynamic inertia parameters under nonconvex optimization.

Conclusions
For the CDL problem, an IPGM algorithm based on a forward-backward splitting scheme with an inertial term is proposed. The complexity of the algorithm is analyzed, and its convergence is proven. Finally, the IPGM algorithm is compared with other algorithms experimentally, in terms of both performance and visual quality. The results show that the IPGM algorithm produces a lower objective function value, a sparser representation, and a higher reconstruction PSNR. In summary, the IPGM algorithm has good theoretical properties and is efficient and straightforward.
In addition, ablation experiments on the IPGM algorithm are carried out according to the allowable ranges of the inertia parameter under different constraints. The results show that the IPGM algorithm achieves better performance under convex optimization, whereas under the nonconvex optimization constraint its performance does not reach the ideal level, indicating that simply using a fixed inertia term cannot achieve the ideal effect under the nonconvex constraint. In the future, we will further study the IPGM algorithm with dynamic inertia parameters under nonconvex optimization.