Progressive Learning Hill Climbing Algorithm with Energy-Map-Based Initialization for Image Reconstruction

Image reconstruction is an interesting yet challenging optimization problem that has several potential applications. The task is to reconstruct an image using a fixed number of transparent polygons. Traditional gradient-based algorithms cannot be applied to the problem since the optimization objective has no explicit expression and cannot be represented by computational graphs. Metaheuristic search algorithms are powerful optimization techniques for solving complex optimization problems, especially in the context of incomplete information or limited computational capability. In this paper, we developed a novel metaheuristic search algorithm named progressive learning hill climbing (ProHC) for image reconstruction. Instead of placing all the polygons on a blank canvas at once, ProHC starts from one polygon and gradually adds new polygons to the canvas until reaching the number limit. Furthermore, an energy-map-based initialization operator was designed to facilitate the generation of new solutions. To assess the performance of the proposed algorithm, we constructed a benchmark problem set containing four different types of images. The experimental results demonstrated that ProHC was able to produce visually pleasing reconstructions of the benchmark images. Moreover, the time consumed by ProHC was much shorter than that of the existing approach.


Introduction
Image reconstruction refers to the task of reconstructing an image under the restriction of using specific geometric shapes, e.g., polygons or ellipses. It can be viewed as an optimization problem that minimizes the difference between the reconstructed image and the original image. Finding algorithms that can efficiently solve this problem may give rise to several important applications. One potential application is image compression. Instead of recording the pixel values, one can represent images with a relatively small number of polygons. Given the vertices of the polygons and their corresponding colors and transparencies, one can repaint the image. Another potential application would be generating computational artworks. Elaborate pictures can be created by using simple geometric shapes to approximate real-world figures.
The image reconstruction problem is very challenging since it comprises many decision variables, which are highly correlated. Moreover, there is no explicit expression for the objective function, making it infeasible to use gradient-based algorithms. To evaluate the fitness of a candidate solution, one needs to draw the polygons on a blank canvas and then calculate the element-wise differences between the reconstructed image and the source image. This evaluation process is very time-consuming. It is impractical for general computing devices to perform a large number of fitness evaluations. From this viewpoint, image reconstruction is a sort of expensive optimization problem [1][2][3][4][5].
Metaheuristic search algorithms are powerful optimization techniques that make no assumption about the nature of the problem and do not require any gradient information. Therefore, they can be applied to a variety of complex optimization problems, especially in the cases of imperfect information or limited computational capability. According to the number of candidate solutions maintained in the search process, metaheuristic algorithms can be divided into two categories, i.e., single-solution and population-based algorithms. Single-solution-based algorithms iteratively update a single candidate solution so as to push the solution toward a local or global optimum. Some prominent single-solution algorithms include hill climbing [6][7][8], tabu search [9,10], and simulated annealing [11,12]. In contrast to single-solution algorithms, population-based algorithms maintain a population of candidate solutions in the running process. New solutions are generated by extracting and combining the information of several parental solutions. This type of algorithm can be further divided into two subgroups, i.e., evolutionary algorithms (EAs) and swarm intelligence (SI) algorithms. The design of these algorithms was inspired by the theory of evolution (e.g., genetic algorithms (GAs) [13][14][15][16], genetic programming (GP) [17][18][19][20][21], differential evolution (DE) [22][23][24][25][26], and evolution strategies (ESs) [27][28][29][30]) or the collective behavior of social animals (e.g., particle swarm optimization (PSO) [31][32][33][34][35] and ant colony optimization (ACO) [36][37][38][39][40][41]).
Since no gradient information is available, metaheuristic search algorithms have become one of the major tools for image reconstruction. In addition, due to the fact that evaluating the fitness of candidate solutions requires a large amount of computation, the running time may be unaffordable if population-based search algorithms are adopted. Therefore, single-solution-based algorithms have been widely used in the literature.
Hill climbing [8] is a popular single-solution-based algorithm. It starts with a random candidate solution and tries to improve the solution by making incremental changes. If a change improves the solution quality, then the change is kept and another incremental change is made to the solution. This process is repeated until no better solutions can be found or the algorithm reaches the predefined limit of fitness evaluations. Each incremental change is defined by random modifications to one or several elements of the candidate solution. This strategy can produce decent results for image reconstruction. However, it does not fully resolve the challenges that arise in the optimization process. Note that each candidate solution is composed of multiple geometric shapes, and all the geometric shapes together form the reconstructed image. The decision variables that determine the position, color, and transparency of the geometric shapes highly correlate with each other. A small change in one geometric shape may influence the appearance of other shapes that overlap with it. On the other hand, the total number of decision variables is several times the number of geometric shapes used to approximate the source image. As the number of shapes increases, the number of decision variables increases rapidly. This causes the rapid growth of the search space, which significantly lowers the search efficiency of hill climbing.
To overcome the challenges posed by the image reconstruction problem, researchers have drawn inspiration from approaches in mathematical optimization and deep learning. In mathematical optimization, it is an effective strategy to divide a complex optimization problem into a sequence of simple problems. Optimizing a sequence of problems one after another provides an approximate solution to the original complex problem [41]. This strategy also applies to many tasks in deep learning. Karras et al. [42] developed a new training methodology for generative adversarial networks (GANs) to produce high-resolution images. The key idea is to grow both the generator and discriminator networks progressively. Starting from low-resolution images, the methodology adds new layers that introduce higher-resolution details as the training progresses. The detailed architecture of this methodology can be found in [42]. Its incremental learning nature allows GANs to first discover the large-scale structure of the image distribution and then shift attention to finer-scale structures, instead of learning all the structures simultaneously. The experimental results showed that the new methodology was able to speed up and stabilize the training process. Tatarchenko et al. [43] developed a deep convolutional decoder architecture called the octree generating network (OGN) that is able to generate volumetric 3D outputs. OGN represents its output as an octree. It gradually refines the initial low-resolution structure to higher resolutions as the network goes deeper. Only a sparse set of spatial locations is predicted at each level. More detailed information about the structure of the OGN can be found in [43]. The experimental results showed that OGN was able to produce high-resolution outputs in a computation- and memory-efficient manner.
From the above review, it can be noticed that progressively increasing the complexity of learning tasks is an effective strategy for various applications. In addition, this strategy is able to increase the learning speed. Motivated by this finding, we developed a progressive learning strategy for the hill climbing algorithm so that it could progressively find a way of combining basic geometric shapes to approximate different types of images. The basic idea was to reconstruct images from a single geometric shape. After optimizing the variables of the first shape, we stacked a new shape on top of the existing ones. The process was repeated until we reached the number limit. This process amounted to transforming the original complex high-dimensional problem into a sequence of simple lower-dimensional problems. In the problem sequence, the optimization outcome of the former problem served as the starting point of the latter problem. In this way, the challenges created by a high-dimensional search space and variable correlation could be addressed. In addition, an energy-map-based mutation operator was incorporated into our algorithm to facilitate the generation of promising new solutions, which further enhanced the search efficiency.
To evaluate the performance of the proposed algorithm, we constructed a benchmark problem set that contained four different types of images. Experiments were conducted on the benchmark problem set to check the efficacy of the proposed progressive learning strategy, as well as the energy-map-based initialization operator. The experimental results demonstrated that the new strategies were able to enhance the optimization performance and generate images of a higher quality. In addition, the running time was significantly reduced.
The remainder of this paper is organized as follows. Section 2 provides a formal description of the image reconstruction problem. Then, the existing hill climbing algorithm is introduced in detail. Section 3 introduces related studies in the literature. The proposed progressive learning strategy and energy-map-based initialization operator are described in Section 4, where we present the underlying principle of the new strategies. In Section 5, we describe experiments conducted on a set of benchmark images to evaluate the effectiveness of the proposed strategies, with a detailed analysis of the numerical results. Finally, conclusions are drawn in Section 6. Some promising future research directions are pointed out as well.

Background
In this section, a formal description of image reconstruction is first presented. Then, we review the classical single-solution algorithm to lay the groundwork for the proposed progressive learning strategy presented in Section 4. We also discuss some recent studies closely related to the topic of image reconstruction.

Image Reconstruction Problem
It is interesting to note that the image reconstruction problem was first posed in a blog published by Roger Johansson [44]. Johansson presented the idea of reconstructing one of the most famous paintings in the world, the Mona Lisa, using a number of transparent polygons. One would need to adjust the shape and position of the polygons, as well as their colors and transparency, so as to make the reconstructed image as close to the original image as possible. This task can be formulated as an optimization problem, in which the decision variables correspond to the parameters of the polygons. Suppose that the i-th polygon Pi has ni vertices; then, Pi can be represented by a list of parameters, i.e., [x1, y1, x2, y2, ..., xni, yni, r, g, b, a]. The tuple (xj, yj) denotes the coordinate of the j-th vertex. The last four elements encode the color and transparency of Pi. Suppose there are, overall, m polygons; a candidate solution is represented by a sequence of polygons, i.e., [P1, P2, ..., Pm]. If all the polygons have the same number of vertices, i.e., n, then the total number of decision variables would be (2n + 4) × m. Therefore, the number of decision variables grows linearly as the number of polygons increases. Figure 1 illustrates the solution encoding. In the figure, each polygon has three vertices. The parameters of the polygons are concatenated to form a long vector.
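For concreteness, the solution encoding can be sketched in a few lines of Python; the helper names below are illustrative rather than part of the original formulation.

```python
import random

def random_polygon(n, width, height):
    """Sample one polygon with n vertices: [x1, y1, ..., xn, yn, r, g, b, a]."""
    params = []
    for _ in range(n):
        params += [random.uniform(0, width), random.uniform(0, height)]
    params += [random.random() for _ in range(4)]  # r, g, b, a in [0, 1]
    return params

def random_solution(m, n, width, height):
    """A candidate solution: a sequence of m polygons, P1..Pm."""
    return [random_polygon(n, width, height) for _ in range(m)]

solution = random_solution(m=50, n=3, width=200, height=200)
# Each triangle has 2*3 + 4 = 10 parameters, so (2n + 4) * m = 500 in total.
assert sum(len(p) for p in solution) == (2 * 3 + 4) * 50
```

Concatenating the per-polygon parameter lists yields exactly the long vector of (2n + 4) × m decision variables described above.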
Given a candidate solution containing m polygons, the reconstructed image Y can be expressed as:

Y = f([P1, P2, ..., Pm]) (1)

where f is the reconstruction function. This is generally implemented using off-the-shelf graphics interfaces. One thing worth noting is that the reconstruction function f has no explicit expression. It is also very difficult to build a computational graph for this function.
The sum of the element-wise absolute differences between the source image X and the reconstructed image Y is used as the objective function, as illustrated in Figure 2. Suppose that the size of the source image X is W × H × C; the objective can be formulated as follows:

L(X, Y) = Σ(w=1..W) Σ(h=1..H) Σ(c=1..C) |X(w, h, c) − Y(w, h, c)| (2)

where W, H, and C denote the width, height, and number of channels of the image. A channel is a grayscale image extracted from a colored image, which is made up of only one of the primary colors that form the colored image. Primary colors are basic colors that can be mixed together to produce other colors. For RGB images, the primary colors are red, green, and blue. Each pixel of an RGB image is made up of three channels, with each channel representing a primary color. The goal is to minimize the objective function. A zero function value indicates that the polygons completely restore the source image X.
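Assuming the two images are already rendered to W × H × C arrays (the rendering itself would go through a graphics library), the objective reduces to a single array operation:

```python
import numpy as np

def reconstruction_loss(X, Y):
    """Sum of element-wise absolute differences over all W*H*C entries.

    Casting to int64 avoids the wrap-around that uint8 subtraction causes.
    """
    return int(np.abs(X.astype(np.int64) - Y.astype(np.int64)).sum())

# Toy check: every one of the 4*4*3 entries differs by 2, so the loss is 96.
X = np.zeros((4, 4, 3), dtype=np.uint8)
Y = np.full((4, 4, 3), 2, dtype=np.uint8)
print(reconstruction_loss(X, Y))  # 96
```

A loss of zero is reached only when the rendered polygons reproduce every pixel of the source image exactly.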
The image reconstruction problem has several potential applications. The most direct application is image compression. One can store an image as the parameters of the polygons instead of the pixel values. Less memory is required, since the number of parameters is much smaller than the number of pixel values. The second application is generating artistic images. Any complex figure can be represented by simple geometric shapes. The level of abstraction can be adjusted by controlling the number of polygons used.

Hill Climbing with Mutation Operators
The objective function calculates the difference between the source image X and the reconstructed image Y. However, to reconstruct images from the parameters of polygons, one needs to use the reconstruction function f, which has no explicit expression. Consequently, the objective function cannot be expressed as a function of the parameters of the polygons. This property limits the application of gradient-based methods [45]. In comparison, hill climbing [8] is an iterative optimization technique that does not require any gradient information. It can be used for solving any black-box optimization problem in which the objective function or the system being optimized is treated as a black box. The simplicity of hill climbing makes it popular among researchers. Hill climbing has been adopted to handle the image reconstruction problem. The pseudocode is presented in Algorithm 1 [44].

Algorithm 1 Hill climbing for image reconstruction
Input: Source image X, number of polygons m, maximum number of iterations T. Output: The best solution found.
1: Randomly initialize a candidate solution S0 and calculate its objective function value L0.
2: for t = 0 to T − 1 do:
3: St+1 ← St
4: i ← rand_int(m) // Randomly choose a polygon.
5: r1 = rand(0, 3)
6: if r1 < 1 then: // Mutate the color/transparency.
7: Mutate_color(St+1.Pi)
8: else if r1 < 2 then: // Mutate a vertex.
9: Mutate_vertex(St+1.Pi)
10: else: // Mutate the polygon sequence.
11: Mutate_sequence(St+1, i)
12: end if
13: Calculate the objective function value Lt+1 of the new solution St+1.
14: if Lt+1 < Lt then:
15: keep St+1
16: else:
17: St+1 ← St, Lt+1 ← Lt
18: end if
19: end for

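The accept-if-better loop at the core of hill climbing can be sketched generically; the function and parameter names here are illustrative, and a toy one-dimensional objective stands in for the image loss:

```python
import random

def hill_climb(initial, mutate, loss, max_iters):
    """Generic hill climbing: keep a mutated copy only if it lowers the loss.

    `mutate` must return a modified copy of its argument; `loss` maps a
    candidate solution to a number to be minimized.
    """
    best, best_loss = initial, loss(initial)
    for _ in range(max_iters):
        candidate = mutate(best)
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:  # accept only strict improvements
            best, best_loss = candidate, candidate_loss
    return best, best_loss

# Toy check: minimize (x - 7)^2 by random unit steps; only improving
# steps are accepted, so the search walks monotonically toward x = 7.
random.seed(0)
sol, val = hill_climb(
    initial=0.0,
    mutate=lambda x: x + random.choice([-1.0, 1.0]),
    loss=lambda x: (x - 7.0) ** 2,
    max_iters=500,
)
print(sol, val)  # 7.0 0.0
```

For image reconstruction, `initial` would be the random polygon solution, `mutate` one of the three operators below, and `loss` the rendered-image difference of Equation (2).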

Algorithm 2 Mutate_color
Input: A polygon Pi. Output: Mutated Pi.
1: r1 = rand(0, 1)
2: r2 = rand(0, 1)
3: if r1 < 1/4 then:
4: if r2 < 0.5 then:
5: Pi.a ← Pi.a + rand(0, 0.1)
6: else:
7: Pi.a ← rand(0, 1)
8: end if
9: else if r1 < 2/4 then:
10: if r2 < 0.5 then:
11: Pi.r ← Pi.r + rand(0, 0.1)
12: else:
13: Pi.r ← rand(0, 1)
14: end if
15: else if r1 < 3/4 then:
16: if r2 < 0.5 then:
17: Pi.g ← Pi.g + rand(0, 0.1)
18: else:
19: Pi.g ← rand(0, 1)
20: end if
21: else:
22: if r2 < 0.5 then:
23: Pi.b ← Pi.b + rand(0, 0.1)
24: else:
25: Pi.b ← rand(0, 1)
26: end if
27: end if

Hill climbing starts with a random candidate solution S0, in which all the parameters of the polygons are randomly sampled from their feasible regions. Then, it enters a loop that tries to iteratively improve the candidate solution. Suppose that the current candidate solution is St; a mutation operator that makes random changes to St is designed to produce new solutions. Specifically, a polygon from the candidate solution is randomly chosen, and one of its parameters is changed. Suppose that the i-th polygon Pi is chosen to be modified. A random real number r1 within [0.0, 3.0] is first generated. If r1 is less than one, then the color or transparency of the polygon is changed (Algorithm 2). The parameters a, r, g, and b have an equal chance of being modified. If r1 is in the range [1.0, 2.0], the position of a vertex is modified (Algorithm 3). The value of the x-coordinate or the y-coordinate is changed. In the above two cases, the selected parameter value is either increased by a small value or set to a random value within its feasible range. The decision is made by sampling another random value r2 within [0.0, 1.0]. If r1 is larger than two, another polygon Pj is randomly selected, and the stacking sequence of Pi and Pj is exchanged (Algorithm 4). A new solution St+1 is generated in this manner. Subsequently, the objective function value Lt+1 of the new solution is calculated. If Lt+1 is less than Lt, the new solution St+1 is kept and carried into the next iteration. Otherwise, St+1 is discarded, and St is retained.
The above process is repeated until the maximum number of iterations is reached. Figure 3 illustrates the mutation operator. For a randomly selected polygon P i , the mutation operator makes changes to one of three components, namely, the position of the vertices, the color/transparency, and the position in the stacking sequence.
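As an illustration, the color branch of the mutation operator might look as follows in Python; the +0.1 increment bound and the clipping back into [0, 1] are assumptions, since the excerpt does not specify how out-of-range values are handled:

```python
import random

def mutate_color(polygon):
    """Perturb or resample one of the r, g, b, a parameters of a polygon.

    `polygon` is a dict holding color/transparency values in [0, 1].
    Mirrors the two cases in the text: a small increment, or a fresh
    uniform sample from the feasible range.
    """
    channel = random.choice(['r', 'g', 'b', 'a'])  # equal chance for each
    if random.random() < 0.5:
        # small increment, clipped back into the feasible range (assumption)
        polygon[channel] = min(1.0, polygon[channel] + random.uniform(0.0, 0.1))
    else:
        polygon[channel] = random.random()  # resample uniformly
    return polygon

p = {'r': 0.5, 'g': 0.5, 'b': 0.5, 'a': 0.5}
mutate_color(p)
assert all(0.0 <= p[c] <= 1.0 for c in 'rgba')
```

The vertex and sequence mutations follow the same pattern: pick a component at random, then either nudge it or resample it.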

Algorithm 3 Mutate_vertex
Input: A polygon Pi. Output: Mutated Pi.
1: j ← rand_int(ni) // Randomly choose a vertex of Pi.
2: r1 = rand(0, 1)
3: r2 = rand(0, 1)
4: if r1 < 0.5 then:
5: if r2 < 0.5 then:
6: Pi.xj ← Pi.xj + rand(0, 0.1 × W)
7: else:
8: Pi.xj ← rand(0, W)
9: end if
10: else:
11: if r2 < 0.5 then:
12: Pi.yj ← Pi.yj + rand(0, 0.1 × H)
13: else:
14: Pi.yj ← rand(0, H)
15: end if
16: end if

Because a mutation operator is used in the algorithm to generate new solutions, its developer called the algorithm GP instead of hill climbing. However, since only one candidate solution is maintained in the search process, and there are no interactions between the generated solutions, it would be more appropriate to name it hill climbing.
Figure 3. Illustration of the mutation operator used in the hill climbing algorithm.


Related Studies
Before Johansson posed the image reconstruction problem, researchers had examined similar problems. Castro et al. [46] proposed a clonal selection algorithm (CLONALG) that mimicked the affinity maturation of the immune response. The principle was that only cells that recognized the antigens were selected to proliferate. The selected cells underwent a maturation process that improved their affinity to the antigens. The authors applied the algorithm to perform machine learning and pattern-recognition tasks. One of the tasks they tested was to restore grayscale images of numbers from random pixel values. As the search progressed, the candidate solutions in the memory cell gradually approached the original images. For a demonstration of the optimization process, interested readers can refer to [46]. This problem can be regarded as a special case of the image reconstruction problem, where the polygons are replaced by pixels. Since the goal is to approximate small-scale grayscale images, the number of parameters to be optimized is much smaller than that of the image reconstruction problem.
Tian and Ha [47] revisited the use of EAs for computational creativity and developed an evolution strategy (ES) for generating art images. The ES was used to fit both concrete images and abstract concepts. The task of fitting concrete images is the same as image reconstruction, in which the objective function is defined as the L2 loss between the generated image and the source image. Fitting abstract concepts is less straightforward than fitting concrete images. An abstract concept was given in the form of a text prompt. The degree of match between the generated image and the given text prompt was measured using a transferable visual model called CLIP. CLIP jointly trained a text encoder and an image encoder to predict the correct pairing of texts and images. The text prompt and the generated image were fed into the text encoder and the image encoder, respectively. The objective function was defined as the cosine distance between the output of the text encoder and the output of the image encoder. The ES was used to adjust the parameters of the polygons with the aim of minimizing the cosine distance. As the evolution progressed, the generated image gradually matched the abstract concept indicated by the given text prompt. Interested readers can refer to [47] for some interesting examples of abstract concept fitting.
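The cosine-distance objective used for abstract-concept fitting can be sketched on plain vectors; in the actual system the two inputs would be the CLIP text and image embeddings, which this toy example replaces with hand-written vectors:

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity: 0 for aligned vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Aligned embedding directions give distance 0; orthogonal ones give 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Minimizing this distance over the polygon parameters pushes the rendered image's embedding toward the text prompt's embedding.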

Progressive Learning Hill Climbing Algorithm with an Energy-Map-Based Initialization Operator
In this section, we first illustrate the idea behind the progressive learning strategy and explain how the strategy can be used to circumvent the challenges posed by the image reconstruction problem. We developed a new algorithm named ProHC by combining the progressive learning strategy with the hill climbing algorithm. Furthermore, to improve the search efficiency, an energy-map-based initialization operator was designed to better adjust the parameters of the polygons.

Progressive Learning Strategy
The image reconstruction problem is very challenging since it involves a large number of decision variables. In addition, the variables are highly correlated. Slight changes in the parameters of one polygon affect the appearance of other polygons stacked in the same position. Although hill climbing is able to generate quite appealing results for this problem, there is still large room for improvement. The search efficiency can be further enhanced if the challenges can be overcome. Motivated by the successful application of the progressive learning concept in mathematical optimization and deep learning areas [41][42][43], we developed a progressive learning strategy for solving the image reconstruction problem.
The basic idea of the progressive learning strategy is very simple. Instead of simultaneously optimizing the parameters of all the polygons, we adjusted the color, transparency, and vertices of the polygons sequentially. A newly generated polygon was stacked on top of the previous polygons after the parameters of the existing polygons were sufficiently optimized. This process amounted to transforming the original complex problem into a sequence of simpler problems. Starting from a single polygon, we gradually increased the number of decision variables by adding new polygons on top of the existing ones. The last problem in the problem sequence was the same as the original problem. This guaranteed that solutions to any problem in the problem sequence were partial solutions to the original problem. The progressive learning strategy is illustrated in Figure 4 from both genotype and phenotype viewpoints.

The pseudo-code of the hill climbing algorithm with the progressive learning strategy is presented in Algorithm 5. In contrast to traditional hill climbing, ProHC stacks the polygons layer by layer until the number of polygons reaches the predefined limit. The polygons in different layers have different probabilities of mutation; polygons in newer layers have higher probabilities. Specifically, the selection probabilities are assigned as follows. In the initial phase, there is only one polygon, and all the attention is focused on the optimization of the first polygon.
When the objective function value has not been improved for a number of successive trials, one should check whether the inclusion of the previously added polygon contributed to a reduction in the objective function value. If the condition is false, then the previously added polygon should be reinitialized. Otherwise, a new polygon is added to the canvas. In subsequent iterations, half of the mutation probability is assigned to the new polygon. The remaining probability is assigned to the preceding polygons according to a geometric sequence. Every time a new polygon is added, ProHC reassigns the probabilities in the same manner. Finally, after the number of polygons reaches the predefined number limit, all the polygons are assigned equal probabilities of mutation. In this phase, the problem becomes the same as the original problem. Through the progressive learning process, a high-quality initial solution is obtained. The remaining computational resources (objective function evaluations) are used to fine-tune the parameters of the polygons.
Algorithm 5 ProHC
Input: Source image X, polygon number limit m, maximum number of iterations T. Output: The best solution found.
1: k ← 1
2: Initialize a candidate solution S0 containing a single polygon and calculate its objective function value L0.
3: Assign the whole mutation probability to the first polygon, i.e., pr1 = 1.
4: for t = 0 to T − 1 do:
5: if the objective function value has not improved for a number of successive trials and k < m then: reinitialize the k-th polygon if it brought no improvement; otherwise, add a new polygon, set k ← k + 1, and reassign [prk, . . . , pr1]
6: end if
7: St+1 ← St
8: i ← rand_int(k, [prk, . . . , pr1]) // Sample a polygon index according to the mutation probabilities.
9: r1 = rand(0, 3)
10: if r1 < 1 then:
11: Mutate_color(St+1.Pi)
12: else if r1 < 2 then:
13: Mutate_vertex(St+1.Pi)
14: else:
15: Mutate_sequence(St+1, i)
16: end if
17: Calculate the objective function value Lt+1 of the new solution St+1; if Lt+1 < Lt, keep St+1, otherwise St+1 ← St, Lt+1 ← Lt
18: end for

Figure 5 illustrates the probability assignment process using an example with five stages. When solving the i-th problem, the parameters of the previous i − 1 polygons are inherited from the solution of the (i − 1)-th problem. These parameters have already been optimized for a relatively long time. In comparison, the parameters of the i-th polygon are new to the algorithm. Therefore, a larger amount of effort is spent on the new parameters by assigning a higher mutation probability to the i-th polygon. For the previous i − 1 polygons, it can be inferred that the smaller the polygon number, the longer the time already spent on its parameters. In order to make sure that all the parameters have similar chances of mutation, the probability assigned to each polygon is determined based on a geometric sequence. The probabilities in each stage are normalized so that they sum to one. In this way, the total mutation probability of each polygon across multiple stages is roughly the same. The progressive learning strategy has two advantages.
First, it can circumvent the challenges posed by the image reconstruction problem. In the problem sequence, solutions to the former problem can serve as partial solutions to later problems. Therefore, relatively good starting points can be obtained for the later problems by first optimizing the former problems. The former problems involve only a small number of polygons and have fewer control parameters. They are much easier to solve. Solving the problems one after another provides a high-quality initial solution for the original complex problem and makes the optimization process much easier. The second advantage is that the progressive learning strategy can reduce the evaluation time of the objective function. To calculate the objective function value of a candidate solution, one needs to draw the polygons on a blank canvas and then compute the element-wise differences between the generated image and the source image. Since the problems in the problem sequence encode fewer polygons than the original problem, it is less time-consuming to compute the reconstruction function f, and therefore the total running time of ProHC can be significantly reduced.
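The stage-wise probability assignment described above can be sketched as follows; the halving ratio between adjacent layers is an assumption, since the text only specifies "a geometric sequence":

```python
def mutation_probabilities(k, m, ratio=0.5):
    """Mutation probabilities for polygons P_1..P_k (P_k is the newest).

    While the canvas is still growing (k < m), the newest polygon receives
    half of the probability mass, and the remaining half is split among the
    preceding polygons following a geometric sequence (each older polygon
    receives `ratio` times the share of the next newer one), normalized to
    sum to one. Once k == m, all polygons become equally likely.
    """
    if k >= m:
        return [1.0 / m] * m
    if k == 1:
        return [1.0]
    probs = [0.0] * k
    probs[-1] = 0.5  # the newly added polygon P_k
    weights = [ratio ** (k - 1 - i) for i in range(k - 1)]  # P_{k-1} largest
    total = sum(weights)
    for i in range(k - 1):
        probs[i] = 0.5 * weights[i] / total
    return probs

print(mutation_probabilities(3, 10))  # ~ [0.167, 0.333, 0.5]
```

With three of ten polygons placed, the newest polygon keeps probability 0.5 and each older layer receives half the share of the layer above it.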
One thing worth noting is that ProHC does not incorporate an exclusion operato The purpose of the exclusion operation is to remove redundant polygons. ProHC-EM ha a similar mechanism that plays the role of the exclusion operator. Suppose that the fina solution to the (i − 1)th problem is Si−1, and its objective function value is fi−1. One move forward to the i-th problem by stacking the i-th polygon on top of the existing ones. Afte a period of optimization, one finds a solution Si to the i-th problem. If the objective func tion value of Si is worse than fi−1, the i-th polygon is reinitialized, and the optimizatio process is repeated. In this way, one can ensure that each newly added polygon is no redundant and contributes to the improvement of the objective function value.

The progressive learning strategy has two advantages. First, it can circumvent the challenges posed by the image reconstruction problem. In the problem sequence, solutions to the former problems can serve as partial solutions to later problems. Therefore, relatively good starting points can be obtained for the later problems by first optimizing the former problems. The former problems involve only a small number of polygons and have fewer control parameters, so they are much easier to solve. Solving the problems one after another provides a high-quality initial solution for the original complex problem and makes the optimization process much easier. The second advantage is that the progressive learning strategy can reduce the evaluation time of the objective function. To calculate the objective function value of a candidate solution, one needs to draw the polygons on a blank canvas and then compute the element-wise differences between the generated image and the source image. Since the problems in the problem sequence encode fewer polygons than the original problem, it is less time-consuming to compute the reconstruction function f, and therefore the total running time of ProHC can be significantly reduced.
One thing worth noting is that ProHC does not incorporate an exclusion operator. The purpose of the exclusion operation is to remove redundant polygons. ProHC-EM has a similar mechanism that plays the role of the exclusion operator. Suppose that the final solution to the (i − 1)th problem is S i−1 , and its objective function value is f i−1 . One moves forward to the i-th problem by stacking the i-th polygon on top of the existing ones. After a period of optimization, one finds a solution S i to the i-th problem. If the objective function value of S i is worse than f i−1 , the i-th polygon is reinitialized, and the optimization process is repeated. In this way, one can ensure that each newly added polygon is not redundant and contributes to the improvement of the objective function value.
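The redundancy check described above can be sketched as follows. This is illustrative Python, not the authors' code; the function names and the retry budget are assumptions.

```python
def advance_stage(solution, f_prev, init_polygon, optimize, evaluate,
                  max_retries=10):
    """Stack a new polygon onto the current solution.

    If the optimized solution of the i-th problem is worse than f_{i-1},
    the new polygon is re-initialized and the optimization is repeated,
    so every accepted polygon improves the objective value.  The retry
    budget is a hypothetical safeguard against an infinite loop.
    """
    for _ in range(max_retries):
        candidate = solution + [init_polygon()]   # add the i-th polygon
        candidate = optimize(candidate)           # hill climbing on stage i
        f_new = evaluate(candidate)
        if f_new < f_prev:                        # polygon is not redundant
            return candidate, f_new
    return solution, f_prev                       # keep the previous solution
```

With toy stand-ins for the three callbacks (e.g., an evaluator whose loss drops by one per polygon), each call grows the solution by exactly one non-redundant polygon.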

Initialization Assisted by an Energy Map
In the progressive learning strategy, each time a new polygon is appended to the canvas, the parameters of the polygon are randomly sampled from their feasible regions. However, there exist more efficient ways to initialize the parameters. Since the goal of image reconstruction is to generate an image that matches the source image, it is reasonable to place new polygons on regions where the differences between the generated image and the source image are significant. When determining the vertices of the newly generated polygons, higher selection probabilities can be assigned to positions with large differences.
To this end, we needed to construct a matrix that recorded the sum of element-wise absolute differences across the channel dimension between the generated image and the source image. The matrix is referred to as an energy map. High energies correspond to high differences. The energy map provides useful information about which part of the generated image is dissimilar to the source image. One can use this information to guide the initialization of the newly added polygons. Motivated by this finding, we developed an energy-map-based initialization operator. Every time a new polygon is added to the canvas, the operator is adopted to initialize the vertices of the polygon.
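The energy map defined above can be computed in a few lines. This is an illustrative Python sketch on nested lists; the authors' implementation uses C++ with OpenCV.

```python
def energy_map(source, generated):
    """Energy map E: E[i][j] is the sum over color channels of the absolute
    difference between the source and generated images at pixel (i, j).
    High energies mark regions where the reconstruction is still poor."""
    return [[sum(abs(s - g) for s, g in zip(src_px, gen_px))
             for src_px, gen_px in zip(src_row, gen_row)]
            for src_row, gen_row in zip(source, generated)]
```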

Algorithm 6 Energy-map-based initialization
Input: A polygon P_i, energy map E. Output: Initialized P_i.
1: P_i.r ← rand(), P_i.g ← rand(), P_i.b ← rand(), P_i.a ← rand()
2: Compute the probability matrix Pr
3: Compute the supplemental matrix MX
4: for k = 1, . . . , n:
5:   r_1 ← rand(0, 1)
6:   Find the first element mx_{i,j} in matrix MX that has a larger value than r_1
7:   Select (i, j) as the position of the k-th vertex of P_i

The pseudo-code of the energy-map-based initialization operator is presented in Algorithm 6. Instead of randomly initializing the positions of the vertices, probabilities are assigned to pixels with respect to their corresponding energies. Specifically, the probability of selecting position (i, j) as a vertex of the polygon is calculated as follows:

Pr_{i,j} = E_{i,j} / Σ_{u,v} E_{u,v},

where E_{i,j} denotes the energy associated with pixel (i, j), which is defined as follows:

E_{i,j} = Σ_c |Y_{i,j,c} − X_{i,j,c}|,

where X is the source image, Y is the generated image, and c indexes the color channels. To sample the vertices of the new polygon, a supplemental matrix MX is first computed. The elements of MX are the cumulative probabilities of matrix Pr: the element mx_{i,j} in position (i, j) is the sum of the probabilities Pr_{u,v} over all positions (u, v) that precede or equal (i, j) in row-major order:

mx_{i,j} = Σ_{(u,v) ≤ (i,j)} Pr_{u,v}.

When sampling a new vertex, a random real value r within [0, 1] is generated. Then, one retrieves the first element mx_{i,j} in matrix MX whose value is larger than r. The coordinate (i, j) is selected as the position of the new vertex. All vertices of the new polygon are determined in the same manner. In this way, there is a higher probability that the new polygon is placed on the most critical regions. With the energy-map-based operator, ProHC-EM (ProHC with an energy map) can avoid wasting effort on low-energy regions and further increase the search efficiency. The proposed approach relates to bionics in two ways:

•	HC is a useful tool in bionics for optimizing the performance of artificial systems inspired by biological systems. The proposed progressive learning strategy can be embedded into HC to further improve its effectiveness. By mimicking the problem-solving procedures of human beings, ProHC can generate effective solutions to complex design problems, allowing researchers to create artificial systems that are more similar to biological systems in their structure, function, and behavior.
•	ProHC-EM incorporates a mutation operator that mimics the process of evolution that occurs in biological systems. An incremental change is made to the candidate solution by mutating position-related or color-related parameters of a selected polygon. Moreover, an energy-map-based initialization operator was designed to help the algorithm target the most critical regions of the canvas and place new polygons on these regions. The effect of the energy map is similar to that of the heat-sensing pits of snakes in biology. The heat-sensing pits allow snakes to target prey by detecting the infrared radiation of warm-blooded animals.
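The vertex-sampling step of Algorithm 6 (probability matrix Pr, cumulative matrix MX, and the lookup of the first entry exceeding a uniform random number) can be sketched as follows. This is illustrative Python; row-major cumulation is an assumption.

```python
import random

def sample_vertex(E, r=None):
    """Sample a vertex position (i, j) with probability proportional to E[i][j].

    Mirrors Algorithm 6: Pr is E normalized by the total energy, MX holds the
    running (cumulative) sums of Pr in row-major order, and the first entry of
    MX that exceeds a uniform random number r gives the sampled position.
    """
    flat = [e for row in E for e in row]
    total = float(sum(flat))
    mx, acc = [], 0.0
    for e in flat:                     # cumulative probabilities (matrix MX)
        acc += e / total
        mx.append(acc)
    if r is None:
        r = random.random()
    idx = next(i for i, c in enumerate(mx) if c > r)
    width = len(E[0])
    return idx // width, idx % width   # high-energy pixels are picked more often
```

This is inverse-transform sampling over the pixel grid: pixels with zero energy can never be selected, and the probability of each pixel equals its share of the total energy.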

Experimental Study
In this section, we present experiments carried out to study the performance of the hill climbing algorithm developed based on the progressive learning strategy. We first constructed a set of benchmark test cases that contained different types of images. Then, the performance of ProHC-EM was evaluated in terms of the final objective function value and running time. The effects of the progressive learning strategy and the energy-map-based initialization operator were also investigated.

Test Cases
A set of test cases with different characteristics was collected to examine the algorithm's ability to reconstruct images. The benchmark set contained eight 200 × 200 images, divided into four categories. The first category contained two famous portrait paintings, i.e., the 'Mona Lisa' and the 'Girl with a Pearl Earring'. The second category consisted of the logos of two popular web browsers (Chrome and Firefox). The images in the third category were photos of animals (i.e., a cat and a dog). The last category consisted of two famous artworks by impressionist artists, namely, 'The Starry Night' and 'The Scream'. Figure 6 depicts the test cases. It can be observed that the second category was the easiest among the four, since the images involved a small number of solid colors, and the subjects were composed of simple geometric shapes. The last category was more difficult to reconstruct due to the features of impressionist paintings, in which the essence of subjects is captured by short, thick strokes of paint, and greys and dark tones are produced by mixing complementary colors.

Performance Metric
To evaluate the quality of the reconstructed image, a performance metric that revealed the similarity between the reconstructed image and the source image needed to be specified. In the experiment, the complete percentage (CP) [44] was used as the performance metric. Specifically, this metric is defined as follows:

CP = (1 − L_best / MaxL) × 100%,

where L_best is the loss of the current best solution found, and MaxL is the difference between the source image and the blank canvas. This is computed as follows:

MaxL = Σ_{i,j,c} |X_{i,j,c} − B_{i,j,c}|,

where X is the source image, and B is the blank canvas whose pixel values equal zero. The larger the complete percentage, the better the performance of the reconstruction algorithm. Note that it was feasible to directly use the sum of element-wise absolute differences (i.e., L_best) to evaluate the algorithm performance. However, the loss function had different ranges for different test cases, so it was difficult to interpret the loss function value. In comparison, the complete percentage indicates how well the reconstructed image matches the source image: a CP value of 100% indicates that the reconstructed image is exactly the same as the source image, while 0% indicates that the reconstructed image is a blank canvas.
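Under the definitions above, the metric can be computed as follows (an illustrative Python sketch; since the blank canvas B is all zeros, MaxL reduces to the sum of all pixel values of X):

```python
def complete_percentage(l_best, source):
    """CP = (1 - L_best / MaxL) * 100, where MaxL is the loss of a blank
    (all-zero) canvas, i.e. the sum of all pixel values of the source image."""
    max_l = float(sum(v for row in source for px in row for v in px))
    return (1.0 - l_best / max_l) * 100.0
```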

Overall Performance
To show the overall effect of the proposed progressive learning strategy and the energy-map-based initialization operator, we compared ProHC-EM with the mutation-based hill climbing algorithm (denoted as HC) [44]. The termination criterion of the two algorithms was defined by the maximum number of fitness evaluations (MaxFEs). Specifically, MaxFEs was set to 5 × 10^5 for all eight test cases. For each test case, 50 polygons were used to approximate the source image, and each polygon had three vertices, namely, m = 50 and n = 3. In the progressive learning strategy, the threshold value used to determine whether to add a new polygon was empirically set to 1000. To obtain statistically reliable results, both algorithms ran 25 times for each test case. The hyperparameters of ProHC-EM were set based on the following considerations:

•
The hyperparameter n determined the number of end points of each polygon. Considering that triangles are the simplest type of polygon and can be used as building blocks for more complex polygons, n was set to 3.

•
The hyperparameter m controlled the number of polygons used to reconstruct an image. Generally speaking, the larger the m value, the higher the precision of the reconstructed image. However, setting m to a large number would significantly increase the optimization time. The goal of the experimental study was to examine the effectiveness and robustness of the new strategies. Therefore, we made a compromise by setting m to 50.

•
The hyperparameter MaxFEs was used to determine the termination criterion of the optimization algorithms. When m was set to 50 and n was set to 3, there were 500 decision variables in total. The hyperparameter MaxFEs was set to 1000 times the number of decision variables.
The experimental results of the algorithms for all eight test cases are listed in Table 1, with better results marked in bold. Table 1 provides the average CP values over 25 independent runs as well as the standard deviations. Moreover, we conducted a Wilcoxon signed-rank test to examine the differences between the numerical results of the two algorithms. The p-values of the statistical tests are also included in Table 1. From the table, it can be observed that ProHC-EM consistently outperformed HC in all the test cases. For each test case, ProHC-EM surpassed HC by approximately one percent. It is worth noting that increasing the CP value in the late optimization stage was much harder than in the early optimization stage. This was because the high-energy region was gradually divided into many small parts as the optimization progressed. To reduce the energy in these small parts, the parameters of the deployed polygons needed to be fine-tuned, which required a large number of fitness evaluations. From this point of view, the improvements brought by the progressive learning strategy and energy-map-based initialization were significant. Another observation was that ProHC-EM had lower standard deviations, indicating that ProHC-EM had more stable performance than HC. According to the p-values listed in Table 1, the CP values obtained by ProHC-EM were statistically different from those of HC at a significance level of α = 0.05. It is worth pointing out that the average CP values obtained by ProHC-EM for the second category were the highest among the four categories, while the numerical results for the fourth category were the lowest. This matched our intuition that the second category of images would be the simplest to reconstruct and that the fourth category contained the most difficult test cases. Figure 7 shows the final results of ProHC-EM and HC for the eight test cases.
As can be seen in Figure 7, although the outlines of the reconstructed images were similar to the source images, they were still sketchy. This was because only a small number of polygons (triangles) were used for reconstruction, making it very difficult to capture the details of the source images. To examine the capability of ProHC-EM to fit the images at a fine-grained level, we increased the number of polygons to 1000, and MaxFEs was set to 2 × 10^7. When m was set to 1000, there were 10,000 decision variables in total. The hyperparameter MaxFEs was set to 1000 times the number of decision variables, enlarged by an additional factor of two, because the problem complexity increased dramatically as the number of variables increased. The CP indexes achieved by ProHC-EM are reported in Table 2, and the final outputs are displayed in Figure 8. From the figure, it can be seen that more details could be encoded by the parameters of the polygons as the number limit increased.

Running Time Comparison
In this subsection, we compare the running time of ProHC-EM with that of HC. The algorithms were implemented in C++ and compiled using the Microsoft compiler. To calculate the objective function value of a candidate solution, a decoding process that transformed the parameters of the polygons into a reconstructed image was required. The decoding process was implemented using OpenCV functions. Both algorithms were executed on a workstation running Windows 10. The workstation was powered by two Intel Xeon Gold 5218R CPUs with 64 GB memory.
The experimental results of the algorithms are depicted in Figure 9. The results were averaged over 25 independent runs. From the figure, it can be seen that the average running time of ProHC-EM was much lower than that of HC: ProHC-EM reduced the running time by approximately 60%. This was attributed to the working principle of the progressive learning strategy. Starting from the blank canvas, ProHC-EM stacked the polygons sequentially as the optimization progressed. Therefore, in the early optimization stage, only a small number of polygons needed to be drawn when evaluating the objective function value of candidate solutions. This feature reduced the number of OpenCV drawing calls. Since invoking the OpenCV drawing functions was the most time-consuming step in the objective function evaluation, a significant amount of time was saved.

Effect of Mutation Range
In this subsection, we study the effect of the mutation range. The goal of the mutation operator was to find moving directions that could reduce the objective function value by making small changes to the candidate solution. A large mutation range would bring large changes to the candidate solution and cause oscillations, which would be detrimental to the convergence of the algorithm. In Algorithm 3, a random value within 10% of the parameter domain was added to the positional parameters of the polygons. To investigate the influence of the mutation range, nine different settings of the mutation range (i.e., 0.1 to 0.9) were examined on the test cases. The experimental results of ProHC-EM with different mutation ranges are listed in Table 3. It can be observed from the table that the final CP indexes dropped as the mutation range increased, indicating that a large mutation range was not suitable for fine-tuning the parameters of the polygons.
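The positional mutation with a configurable range can be sketched as follows. This is illustrative Python; whether the offset is signed and how out-of-range values are handled are assumptions, not details given in the paper.

```python
import random

def mutate_position(value, lower, upper, mutation_range=0.1):
    """Add a random offset within mutation_range * (domain width) to a
    positional parameter, then clip the result back into the feasible region.
    The default 0.1 corresponds to the 10% setting of Algorithm 3; larger
    values produce the large, oscillation-prone steps discussed above."""
    span = (upper - lower) * mutation_range
    mutated = value + random.uniform(-span, span)   # signed offset (assumption)
    return min(max(mutated, lower), upper)          # clip to [lower, upper]
```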

Effect of the Energy-Map-Based Initialization
In this subsection, we proceed to study the effect of the energy-map-based initialization operator. To investigate the pure effect of the operator, we removed the operator from ProHC-EM, and the resulting algorithm is denoted as ProHC. ProHC was compared with ProHC-EM to show the influence of the initialization operator on the final solution quality. The experimental results are tabulated in Table 4, with better results highlighted in bold. A Wilcoxon signed rank test was conducted to check whether there were significant differences between the results of ProHC and ProHC-EM. Table 4 also lists the p-values output by the statistical tests.
From the table, it can be observed that ProHC was consistently beaten by ProHC-EM in all eight test cases. The p-values for all the test cases were smaller than 0.05, indicating that there were significant differences between the results of the two algorithms. With the assistance of the energy-map-based initialization operator, the newly added polygons could be placed on high-energy regions with a high probability. ProHC-EM was capable of reducing the energy by focusing its attention on the most critical region. In this way, the search efficiency was significantly increased. According to the standard deviations listed in the table, the initialization operator could also make the performance of the algorithm more stable.
One thing worth noting is that the energy map had the same size as the source image. The elements in the energy map could be interpreted as pixel values. Therefore, we could display the energy maps as if they were grey images. To demonstrate the changes in the energy maps, Figure 10 depicts the energy maps at different stages of the optimization process when solving the first test case. The brighter the pixel, the higher the energy. It can be observed that at the initial optimization stage, the reconstructed image approximated the source image very poorly. As the optimization progressed, the energy map became darker and darker, indicating that the differences between the reconstructed image and the source image were shrinking. At the end, only small parts of the image had not been recovered by the polygons.
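Rendering an energy map as a grey image, as in Figure 10, only requires rescaling the energies to the 0-255 pixel range. The sketch below is illustrative Python; max-normalization is an assumption about how the figures were produced.

```python
def energy_to_gray(E):
    """Linearly rescale an energy map to [0, 255] so it can be displayed as a
    grey image: the brighter the pixel, the higher the energy."""
    peak = max(max(row) for row in E)
    if peak == 0:                       # perfectly reconstructed image
        return [[0 for _ in row] for row in E]
    return [[round(255 * e / peak) for e in row] for row in E]
```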

Conclusions
In this paper, we proposed a progressive learning hill climbing algorithm with an energy-map-based initialization operator to solve the image reconstruction problem. The image reconstruction problem is an interesting yet challenging problem that involves the reconstruction of images using simple geometric shapes. The problem comprises a large number of decision variables that specify the position, color, and transparency of the geometric shapes. The decision variables are highly correlated with each other. To tackle the challenges imposed by this problem, we took inspiration from methods in mathematical optimization and deep learning and developed a progressive learning strategy. The strategy transforms the original complex problem into a sequence of simpler problems. The former problems in the sequence contain fewer decision variables and are easier to solve. Solving the sequence of problems one after another provides a good initial solution to the original complex problem. Furthermore, to increase the search efficiency, an energy-map-based initialization operator was devised to provide good initial positions for the newly added geometric shapes in the progressive learning process.
Comprehensive experiments were conducted on a set of benchmark test cases to examine the effect of the progressive learning strategy and the energy-map-based initialization operator. The experimental results revealed that these processes could enhance the final solution quality and reduce the running time.
For future research, it would be interesting to design more efficient algorithms to reconstruct high-resolution images. A promising approach is to divide high-resolution images into small parts and approximate these small parts separately. The benefit of this approach is that the reconstruction processes for all the small parts can be run in parallel. Another research direction worth investigating is how to measure the differences between the reconstructed image and the source image. In our experiment, the sum of the element-wise absolute differences between the reconstructed image and the original image was used as the objective function. If the goal is to generate visually similar images, some differences that are imperceptible to humans can be omitted. In this scenario, delta-E [48] could be adopted to represent the difference between two colors. We believe that in the future, the emergence of an efficient image reconstruction algorithm will probably give rise to a new class of compression algorithms, i.e., search-based compression algorithms.
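For reference, the simplest variant of the delta-E metric mentioned above, CIE76, is the Euclidean distance between two colors in CIELAB space (illustrative Python; converting RGB pixels to L*a*b* is a separate step not shown here):

```python
import math

def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two L*a*b* colors.
    A delta-E around 2.3 is commonly cited as a just-noticeable difference."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))
```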

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.