An External Parameter Independent Novel Cost Function for Evolving Bijective Substitution-Boxes

: The property of nonlinearity has high importance for the design of strong substitution boxes. Therefore, the development of new techniques to produce substitution boxes with high values of nonlinearity is essential. Many research papers have shown that optimization algorithms are an e ﬃ cient technique to obtain good solutions. However, there is no reference in the public literature showing that a heuristic method obtains optimal nonlinearity unless seeded with optimal initial solutions. Moreover, the majority of papers with the best nonlinearity reported for pseudo-random seeding of the algorithm(s) often achieve their results with the help of some cost function(s) over the Walsh–Hadamard spectrum of the substitution. In the sense, we proposed to present, in this paper, a novel external parameter independent cost function for evolving bijective s-boxes of high nonlinearity, which is highly correlated to this property. Several heuristic approaches including GaT (genetic and tree), LSA (local search algorithm), and the Hill Climbing algorithm have been investigated to assess the performance of evolved s-boxes. A performance comparison has been done to show the advantages of our new cost function, with respect to cost functions for s-boxes like Clark’s and Picek’s cost functions.


Introduction
Today, the information shared online by users is a highly valuable resource. The integrity of such data rests in the use of cryptographic algorithms, which provide a set of algorithmic tools to maintain the consistency and security of the data. Therefore, the design of cryptographic algorithms with high level of security is a wide area of investigation from researchers tasked to information security and reliability. There exist two large groups of cryptographic algorithms, public-key algorithms and private-key algorithms, but the first group is not in the scope of this paper. We center our attention in private-key algorithms, particularly in block ciphers. Elementary, a block cipher partition the flow of information in k bit pieces, where each piece of data is matched with a different one. The Data Encryption Standard (DES) [1] and the Advanced Encryption Standard (AES) [2] are, perhaps, the most representative examples of symmetric block ciphers, although there are other good examples in the literature [3][4][5], even some that do not rely on nonlinear components-this can reduce the security against linear and differential attacks-because they want to ensure the security of statistical properties in the encryption scheme [6].
Substitution-boxes (s-boxes) are the prominent nonlinear components of modern-day block ciphers, adding confusion to encryption process. Hence, any confusion layer highly depends on the s-boxes to maintain the integrity of shared secrets. Although the primary use of s-boxes is for symmetric block ciphers [2][3][4][5], one cannot restrict the use of s-boxes only to these encryption systems. s-boxes are also applied to image encryption schematics with excellent results [7][8][9]. Therefore, the effectiveness of any symmetric encryption system using s-boxes will rest in the selection of such components.
To warrant the integrity of one s-box, it must satisfy a set of properties which assess the resistance of the s-box towards several cryptanalytic techniques [10,11]. The search for cryptographic sound s-boxes is a well-studied subject in the literature. The construction of s-boxes mainly follows approaches namely: algebraic constructions [2,12,13], pseudo-random generation, chaos-based generation [7,14,15], and heuristic methods [16][17][18]. Algebraic constructions achieve unsurpassed outcomes with regards of many security properties of s-boxes [19]. Random generation makes it possible to yield a considerable number of s-boxes in short periods. However, any random s-box lack of good values of its properties, therefore it is not suitable for practical applications. Like in random generation, chaos-based s-boxes are quickly integrated into the encryption process, particularly in images encryption, but they mainly ensure a more randomness and statistical sound. The last direction focuses on the use of heuristic methodologies to find s-boxes with strong cryptographic features. Next, we present a representative set of such research findings.

Related Work
The public literature contains an extensive survey of evolutionary computation papers related to design of s-boxes having good cryptographic properties. We present a brief resume of some important results in this area of research. In 2005, Clark et al. proposed a cost function for evolving of s-boxes coupled with simulated annealing to obtain s-boxes with nonlinearity values up to 102 [16]. Later in 2010, Tesař perform an extensive parameter tuning of Clark's cost function which, in combination with a special genetic algorithm named genetic and tree (GaT), makes it possible to obtain 8-bit s-boxes with good nonlinearity [17]. In 2013, Kazymyrov et al. presented a modified gradient descent method to obtain s-boxes with nonlinearity 104 and high algebraic resistance [20]. Ivanov et al. experiment with modified immune algorithm using three different cost functions, included the tuned version of Clark's cost function proposed by Tesař and a function over the differential spectrum of s-boxes to achieve nonlinearity 104 and differential uniformity 6 [21]. They applied genetic algorithms working in reverse mode to evolve high nonlinear bijective s-boxes of sizes from 8 × 8 up to 16 × 16 [22]. In 2014, Picek et al. investigated side-channel analysis resilience of s-boxes considering the confusion coefficient property [23]. Later, in 2016, Picek et al. presented a cost function for evolution of high nonlinear s-boxes [24]. More research was presented by Picek et al. in 2017, making use of cellular automata and genetic programming for design s-boxes with good cryptographic features [25,26]. Isa et al. presented a hybridization of heuristic methods for generating 8-bit permutations with good nonlinearity and differential uniformity results [27]. Menyachikhin (2017) presented the spectral-linear and spectral differential techniques for constructing s-boxes consisting of near-optimal security features of s-boxes [28], wherein, they used the information from linear approximation table and differential spectrum of s-boxes to improve the properties of resulting s-boxes. Lerman et al. are assisted by genetic algorithms in the generation of side-channel attacks robust s-boxes of small dimensions [29]. Martínez-Díaz works with local search algorithms to evolve s-boxes with improved side-channel resistivity having good values of nonlinearity [30]. This line of research was continued by Freyre-Echevarría in 2020, presenting a hybrid heuristic algorithm capable of produce s-boxes with high theoretical resistance to SCA attacks, as well as acceptable nonlinearity and low differential cryptanalysis performance [31]. Bolufé and Tamayo use hybrid heuristic methods and machine learning for the evolution of s-boxes, taking in count the properties of nonlinearity and transparency order [32]. Ahmad et al. propose the use of particle swarm optimization and chaotic Renyi's map to obtain 8 × 8 s-boxes with high nonlinearity scores [7].
In most of the aforementioned works, the property of nonlinearity is taken under consideration in the optimization process [7][8][9]14,16,17,[20][21][22]24,27,28,30,31,[33][34][35][36][37][38][39][40]. However, heuristic techniques cannot achieve nonlinearity values close to algebraic constructions unless they are seeded with s-boxes having optimal properties [22]. The best value of nonlinearity reported from papers that use random initial s-boxes to seed their algorithms is bounded above by 104 for 8 × 8 s-boxes [16,17,21,24]; and, in most of cases (for not being absolute), these values cannot be achieved without the assistance of cost functions. Cost functions help to describe the behavior of coefficients in the Walsh-Hadamard spectrum of s-boxes. Hence, a well descriptive cost function will undoubtedly help to improve the final nonlinearity of s-boxes. With respect to nonlinearity, the most accurate cost functions are Clark's [16] and Picek's [24] cost functions, each one with its own characteristics described in Section 3.
The rest of the paper is managed as follows. Section 2 contains several definitions with regard to s-boxes and their properties. In Section 3, we briefly describe Clark's and Picek's cost functions and present the contribution of this paper. Section 4 is dedicated to explaining the heuristic methods we have employed and the parametric configuration of the same for each different s-box dimension. Finally, in Section 5, we present and discuss our experimental results and establish some comparison with respect to the results obtained from Clark's and Picek's cost functions.

Preliminaries
An s-box S : F n 2 → F m 2 is often described as multi-input and multi-output Boolean functions consisting of m Boolean functions in n variables known as the coordinates of s-box S. However, all coordinates functions and their all linear combinations are responsible for deciding the cryptographic strength of the s-box [41].
for any λ ∈ F m 2 . The component corresponding to λ = 0 is called zero (or trivial) component (Definition 2.1 from [39]). Definition 2. One s-box S : F n 2 → F m 2 is balanced if every value x ∈ F m 2 appears exactly equal to 2 n−m times. When n = m, the s-box S is known as bijective, i.e., that each input value is uniquely mapped to one output value.
Balanced n × n s-boxes are permutations in F n 2 [13,19]. In particular, the fact that an s-box is invertible (i.e., a permutation) can be characterized by its coordinates [33]. For the sake of simplicity, we restrict ourselves to the study of bijective substitution boxes only. Definition 3. Let S : F n 2 → F m 2 be an s-box. The Walsh-Hadamard transform of S is computed as [19]: The linearity (resp. nonlinearity) is the highest (resp. lowest) of any nontrivial component function of one s-box [13,19]. The two can be described in terms of the W S transform as: This gives the following relation between linearity and nonlinearity of s-boxes.

Definition 4. (Parseval's relation)
Given any Boolean function f : F n 2 → F 2 , its Walsh-Hadamard transform satisfies that A direct result from Parseval's relation is that linearity of Boolean functions (resp. s-boxes) is lower bounded by 2 n 2 . Notice that equality can only be achieved when n is even, and such functions are known as the bent functions [42]. Bent functions have the maximum achievable nonlinearity, but they are not balanced. For the case of bijective substitution boxes the maximum achievable nonlinearity cannot be greater than the Sidelnikov-Chabaud-Vaudenay (SCV) bound [43]: which immediately resolves in The case of equality denotes the functions which are called the Almost Bent (AB) function. Notice that AB-functions only exists when n is an odd number [19]. When n is even, the maximum value of nonlinearity is achieved through power permutations over the finite field F 2 n . and equals to [8]: Definition 5. The autocorrelation function of one s-box S : F n 2 → F m 2 is defined as [19]: There exists two significant cryptographic parameters: (1) global avalanche characteristics (GAC) [44], which is related to the autocorrelation function, used to measure the level of diffusion ensured by a function; and (2) The absolute value of the autocorrelation function is called an absolute indicator of anticipated s-box. In practice, the absolute indicator is determined as: Definition 6. Let S : F n 2 → F m 2 be an s-box. For any x ∈ F n 2 , y ∈ F m 2 one can define: The multi-set ∆ S = δ(x, y), x ∈ F n * 2 , y ∈ F m 2 represents the I/O differential distribution spectrum of S and its maximum value is called differential uniformity of S, denoted δ s .
For any balanced s-box S the differential uniformity of s-box satisfies as δ s ≥ 2 [13]. The functions having δ s = 2 are called almost perfect nonlinear (APN) functions. As for nonlinearity property, the APN condition only exists for odd number of variables, and when n = 6 [45]. In the case of n even, the best-known differential uniformity value is 4 [12,19,33].

Motivation and Contribution
The outcomes of any optimization algorithm are heavily dependent on the fitness functions used to guide the optimization process. Here, we discuss about some cost functions with proven effectiveness towards nonlinearity of s-boxes obtained by heuristic methods.
In 2005, Clark et al. introduce a cost function that consider all values of the Walsh-Hadamard spectrum for Boolean functions, which can be scaled for the multi-output case (s-boxes) [16]: where X and R are real-valued parameters like R = 3 and X ∈ {−4, −3, −2, −1, 0, 1, 2, 3, 4} [16].  [24]. The histogram is represented by a vector having in the i-th component the count of coefficients associated to absolute value 4i from the Walsh-Hadamard spectrum of the s-box. Let l indicates the las component of the vector with nonzero value. The absolute value of the Walsh-Hadamard spectrum associated with l is the linearity of the s-box S, therefore the nonlinearity of S is related to this value. Then, the function PCF is given by the formula [24]: where H(S) k is the k-th component of the zero-indexed vector H(S), which represent the aforementioned histogram of frequencies, and H(S) k = 0, ∀k < 0. Multiplying by the term 2 −i , i.e., dividing 2 i as shown in (14), the authors pretend to give some ranking to the influence of one coefficient over the final value of the function. For example, the maximum absolute value of the Walsh-Hadamard spectrum is divided by 2 0 = 1, implying that it is the most influential on the final result of PCF. Altogether, Picek et al. set the value of parameter N to 10, given by the fact that if one s-box does not contain ten levels of coefficients (N < 10), then all coefficients of the spectrum are considered when PCF is calculated.
One can characterize both functions in terms of the count of coefficients in the Walsh-Hadamard spectrum taken into account to compute their value and the dependence of any external parameter with no relation with such coefficients. In the case of Clark's cost function, the whole spectrum is analyzed to get the final value of the function. However, the values of parameters X and R represent a major disadvantage, since they have to be tuned in order to achieve the best results [18]. In contrast, Picek's cost function selects a sample of the coefficients in the spectrum depending on the value of parameter N; if sufficiently large, all the values in the spectrum are taken into account. Moreover, after the study of results from [18], we notice that selection of parameter N has influence in the outcome nonlinearity of evolved s-boxes, as well the number of solution evaluations to obtain desired values of this property. We refer the readers to [16,17,24] for more information about Clark's and Picek's cost functions.

Our Contribution
The main contribution of our paper is the construction of a novel cost function for evolving the performance of high nonlinearity s-boxes without depending on in any external parameter unlike the Clark's and Picek's cost functions which are external parameter dependent [16,24]. Moreover, further contributions includes in the form of statistical analysis of correlation between Clark's, Picek's, and our function with respect to nonlinearity of s-boxes and the convergence rate of various algorithms for different s-box dimensions using our new cost function.

Experimental Setup
Optimization algorithms can be divided in two groups: exact methods and heuristic methods. While exact methods guaranty the optimal solution in a finite time, for more complex problems, like that in our paper, the notion of time exponentially growth with regard to the dimensions of the problem. Hence, the heuristic methods are suitable to challenge these problems. It is well known that heuristic methods do not guaranty to find the optimal solution, but they often achieve a good solution in a reasonable time. Most of heuristic methods are problem dependent. In counterpart, meta-heuristic algorithms establish a high-level algorithmic framework to bring the more accurate solution to the problem.
In Section 1.1, we commented on successful applications of meta-heuristic techniques to improve different parameters related to the security of s-boxes. The effectiveness of these algorithms to obtain desired nonlinearity values differ on the fitness function, the selection, variation and mutation operators and the characteristics of the selected algorithm. Nonetheless, these optimization algorithms ensure substitution boxes applicable to real life encryption systems with the advantage of randomized structure, unlike algebraic constructions, different pool of high nonlinear solutions each time the algorithm is executed and low consumption of time. Moreover, these algorithms can be seeded with optimal s-boxes instead of pseudo random substitutions, and they will manage to produce optimal or almost optimal s-boxes in terms of nonlinearity.
In our experiments, we applied three optimization methods, namely, genetic and tree algorithm (GaT) [17], local search algorithm (LSA) [24], and a hill simple climbing algorithm (HC) explained later in this section. We choose these optimization methods to obtain a high non-linear value, because their exploitation capabilities over the solution search space. The experiments conducted on this paper present the results achieved with proposed cost function for evolving bijective s-boxes of sizes ranging from 5 × 5 to 8 × 8. We also include pseudo-code of each algorithm in the appendix of this paper (see Appendices B-D).

Common Parameters
We execute different amounts of experiments according to dimensions of the analyzed s-box space. The stopping condition for each algorithm is reached when a fixed number of solution evaluations are done, which differ in relation to the size of the space as shown in Table 1. In all algorithms used, we define the fitness function as the tuple (N S , C S ). Here, N S represents the nonlinearity of the solution and C S represents the value of our new cost function. We decide not to include small 4 × 4 s-boxes since the best value of nonlinearity can be achieved without the assistance of optimization algorithms. However, for larger sizes, s-boxes having maximal nonlinearity are quite difficult to found by any evolutionary algorithm when it starts from random n-bit permutations. Thus, our goal is to increase nonlinearity of evolved s-boxes as much as possible.

Local Search Algorithm
The local search algorithm from [24] receives as input a random s-box from the space. In each iteration, the algorithm generates new solutions with given mutation operators. The mutation operator randomly decides k different positions in solution and then permutes the element at selected positions. To get the best results, Picek et al. set mutation operators with k ∈ {2, 3, 4, 5, 6, 7}.
In the LSA algorithm, each mutation operator is defined with two parameters k and l, where k is a number of positions whose elements are to be permuted, and l defines how many times that mutation operator is to be applied on the current solution (see Table 6 from [24]). The best solution is selected and set as the current solution from generated solutions.

Genetic and Tree
The genetic and tree algorithm was presented by Tesař in [17]. The algorithm is a combination of a special case of the genetic algorithm and total tree search. We note two important aspects of the method: the criterion to begin the total tree search portion of it and the stopping condition. The algorithm swap between the genetic and the tree part when an s-box with nonlinearity close to desired value is found (see Table 2), then, it executes the tree portion of the algorithm until a s-box with desired non-linearity is obtained or the algorithm depletes all solution evaluations presented in Table 1 for the corresponding s-box space. For more information about the configuration of GaT algorithm, we refer the readers to [17,24].

Hill Climbing Algorithm
We propose the use of a simple hill climbing mechanism to produce s-boxes having good nonlinearity in a small amount of solution evaluations. The algorithm receives a random permutation as input. Then, while the number of solution evaluations is not depleted, the algorithm creates a new s-box by swapping a pair of outputs on the target s-box. If the new s-box is better than the current solution of the algorithm, according to the fitness condition of the problem, it replaces the solution, becoming the best solution found by the algorithm.

Results and Discussion
The current section is entirely dedicated to the analysis of results obtained in our experiments. First, we proposed a novel cost function for evolution of high nonlinearity s-boxes. Then, we present the results achieved by the optimization algorithms described in Section 4 using this new function. Note that we also show the values of differential uniformity and absolute indicator of evolved s-boxes.

Definition of a New Cost Function
The nonlinearity of one s-box is dependent of the highest absolute value of the Walsh-Hadamard spectrum. Most of evolutionary research papers that only use the value of nonlinearity as fitness function to guide the evolutionary process are not able to reach nonlinearity values greater than 100 in the case of 8-bit permutations. This may happen, since nonlinearity only contains information about the highest score of the Walsh-Hadamard spectrum, without extracting any data of the remaining values in the spectrum. Reviewing of the definition of nonlinearity itself, it is straightforward to notice that reducing the highest absolute values of the spectrum leads to an increase of final nonlinearity of s-boxes. However, if the extreme values of the spectrum are reduced, some other values in the same must be increased, in accordance with Parseval's relation. Hence, any function exploiting the Walsh-Hadamard spectrum is tasked to make the spectrum as flat as possible.
Let C be the set of all absolute coefficients lower or equals to the SCV bound [41]. Then, we have the following: if n is even Proof. We reduce the demonstration to the case when n is odd since one must review the same conditions when n is even. First, let us demonstrate the direct implication, i.e., if the nonlinearity of S is maximal, then all the absolute values of the coefficients in its Walsh-Hadamard spectrum are contained in C.
For n odd, if the nonlinearity of S is maximal, then N S = 2 n−1 − 2 n−1 2 satisfying the equality in (4). Moreover, the result in (5) implies that equality in (4) results in L S = 2 n+1 2 . Since L S is the greatest absolute value of the coefficients in the Walsh-Hadamard spectrum of S, then L S ≥ |X| for any arbitrary coefficient X in spectrum. By definition, one has that max(C) = 2 Conversely, if all coefficient in the Walsh-Hadamard spectrum of S are contained in C, then we have 2 . However, the inequality in (5) sustain that L S ≥ 2 n+1 2 , hence, we have that 2 n+1 2 ≤ L S ≤ 2 n+1 2 , which resolves in L S = 2 n+1 2 , the minimal value of linearity achievable for bijective s-boxes with odd number of variables, i.e., the nonlinearity of S is maximal. Therefore, the proof is now complete. Proposition 2. Let x be an integer and K a finite set of positive integers such that|x| ∈ K Then the following equality holds P = i∈K (|x| − i) = 0 Proof. Let M be the size of K. One can decompose the formula of P in the multiplication of M subtractions as follows where the term i t denote the t-th element of K. If |x| ∈ K, exist some i t such that |x| = i t , i.e., |x| − i t = 0. Hence, substituting the previous subtraction in the decomposition we have P = (|x| − i 1 ) · (|x| − i 2 ) · . . . · 0 · . . . · (|x| − i M ), which results in P = 0. Therefore, the proof is complete. On the basis of propositions 1 and 2 we define our new cost function. The result from Proposition 2 warranties that any coefficient in the Walsh-Hadamard spectrum whose absolute value is contained in C does not interfere with the final result of our cost function. Hence, the calculus of C S is deduced from the coefficients with absolute values greater than the SCV bound. Moreover, the minimum value of C S is achieved when the largest absolute value in the Walsh-Hadamard spectrum equals to the greatest coefficient in C, where C S is equal to zero, implying that S has maximum nonlinearity, according to Proposition 1.

Relation to Nonlinearity
We perform a statistical analysis on the relation between nonlinearity property and our new cost function as well as Clark's and Picek's cost functions. For such an analysis, we compute the trajectory of the values of one cost function with respect to nonlinearity as follows: 1.
Set up a fixed number M of cost function upgrades.

2.
Execute hill climbing algorithm upgrading the value of the cost function regardless nonlinearity.

3.
Save the value of the cost function and the corresponding nonlinearity each time the cost function is upgraded. 4. Repeat Step 3 until there is no available upgrade on the cost function (i.e., the function was updated M times).
The two vectors obtained through this method contain information about both, the cost function and the nonlinearity each time the cost function is improved. Hence, we can calculate the correlation between the cost function and nonlinearity by means of these vectors.
To obtain a more accurate result, we repeat the procedure described above 100 times for each cost function, with M = 50 for s-boxes of dimension eight. Then, we obtain the average trajectories of the cost function and nonlinearity and proceed to calculate the Pearson's correlation coefficient between the two trajectories, i.e., the correlation between the values of the cost function and the value of nonlinearity property. For better understanding of the experiment, Figure 1 present the average trajectory of each cost function w.r.t nonlinearity.
In contradiction with the results presented in [24], the data presented in Table 3 and the curve representing Picek's cost function in Figure 1 (blue) indicates that higher values of the function are better to improve nonlinearity. However, one knows from [24] that for a fixed value of nonlinearity, the minimization of Picek's cost function will lead to maximization of the nonlinearity. Thus, Picek's cost function will not achieve good results if the value of nonlinearity of the s-box is depreciated.
The curve representing the trajectory of Clark's cost function (Figure 1-red) and the corresponding correlation in Table 3 may be helpful to give some explanation of the late convergence of Clark's cost function in [17,24]. Although there is some improvement in the final nonlinearity, one can easily see in the plot that there is no stability in the trajectory. In accordance, the correlation coefficient describes very well this phenomenon. The correlation coefficient indicates an inverse relation between Clark's cost function and nonlinearity, but not enough to ensure a fast convergence of nonlinearity to higher values as the value of Clark's cost function decreases, which agrees with the results obtained by Tesař [17] and Picek et al. [24].  Finally, one can observe in Figure 1 (green plot) and Table 3 the results of the analysis for our new cost function. As shown, our proposed cost function is extremely well correlated to nonlinearity. Moreover, Table 4 shows the correlation coefficient between the values of nonlinearity and our new cost function for each s-box dimension we analyze in this paper. Data in Table 4 suggest a strong inverse relation (almost linear) between the values of N S and C S . Hence, minimizing the results of C S will certainly improve the nonlinearity of the s-box. Next, we present the results collected with the proposed cost function.

Results with Our New Cost Function
This section is dedicated to providing and discussing the performance results obtained with our new cost function seeding the optimization algorithms with pseudo-random s-boxes. Together with the values of nonlinearity, differential uniformity, and an absolute indicator of evolved s-boxes, we provide the average number of solution evaluations for each algorithm, to obtain the best value of nonlinearity reported in each s-box space.
In all the experiments conducted on this section, the tuple (N S , C S ) determine the fitness conditions of one s-box A over other s-box B, as follows:

1.
N A > N B N A = N B and C A < C B if rule 1 is not satisfied Table 5 shows the results achieved by our new cost function for s-boxes of various sizes. The best nonlinearity values among the results equals the presented by Tesař [17] and Picek et al. [24] for the respective s-box dimension. In addition, we present, in Table 6, the average number of solution evaluations each algorithm needs to obtain the best nonlinearity values reported in Table 5.  Table 6. Convergence rate of the algorithms to best nonlinearity reported in Table 3  The values of differential uniformity and absolute indicator of evolved s-boxes are not optimal for all dimensions. Notice that all optimization algorithms maintain similar behavior towards differential uniformity and absolute indicator (GaT slightly improves AC max (S) for 8 × 8 s-boxes), due the fact that these properties were not considered in the optimization process, and any improvement on the same is result of existing relation to nonlinearity.
Since no reasoning in the convergence rate of algorithms for s-boxes of sizes lower than 8 × 8 was presented in [17,24], we cannot establish a fair comparison with the results for such s-box spaces. However, for 8 × 8 s-boxes, we can make a direct comparison with the results of LSA and GaT.
For both the local search algorithm and the genetic and tree algorithm, the obtained results are better than those presented in [24] for the best configuration of Picek's cost function. The notorious difference for both algorithms using our cost function with regard to Picek's cost function is the average solution evaluations to obtain nonlinearity 104. The local search algorithm reduces more than 22,500 solution evaluations from the best result with PCF in the result presented in Table 6. In addition, the GaT algorithm shows greater improvement, reducing more than 50,000 solution evaluations to obtain the aforementioned nonlinearity. However, the best performance is achieved by hill climbing algorithm. Notice that hill climbing only needs approximately 1000 solution evaluations to obtain s-boxes having nonlinearity value 102, equal to the best nonlinearity reached by Clark's cost function when R = 3, X = 4 and the most repeated result for Picek's cost function in its initial version, both, after nine million solution evaluations [24]. To obtain nonlinearity 104, the hill climbing algorithm reduces to half the best performance of the algorithms in [24]. Moreover, the worst performance of HC is better than the best average solution evaluations for an algorithm using Picek's cost function in [24] GaT by approximately 30,000 solution evaluations. Substitution boxes having nonlinearity value equal to 104 were obtained in less than 35,000 solution evaluations through the hill climbing algorithm. Figure 2 shows the convergence rate for each algorithm referred in Table 6. We perform larger experiments to search for s-boxes having nonlinearity greater than 104, but no interesting results were found starting from random s-boxes. Hence, other seeding methods or more complex optimization algorithms should be tested with our new cost function, in order to obtain s-boxes with higher nonlinearity values than the achieved in this paper.

Conclusions
In this paper, we presented an effective novel cost function for evolving highly nonlinear s-boxes based on the existing bounds for the values in the Walsh-Hadamard spectrum. Altogether, we removed the cost function, the intervention of external parameters unrelated to the coefficients in the spectrum that may affect the performance of the older cost functions, and increase the correlation with the non-linearity property. We also showed that our new function is capable of producing the same results as other important cost functions in a lower number of solution evaluations; therefore, it is more effective for evolution of s-boxes. However, it is still an open problem as to how to achieve s-boxes with a nonlinearity higher than presented in Table 5 for an optimization algorithm seeded with random s-boxes in a reasonable amount of solution evaluations.
Future investigations will be directed to the study of the autocorrelation and differential spectrum of s-boxes for the definition of new cost functions that help to improve other properties, such an absolute indicator and differential uniformity. We also plan to include our cost function in a trade-off multi-objective optimization related to the resistance against other attacks different from classical linear and differential attacks.

Conflicts of Interest:
The authors declare no conflict of interest.