Optimization of a Stirling Engine by Variable-Step Simplified Conjugate-Gradient Method and Neural Network Training Algorithm

: The present study develops a novel optimization method for designing a Stirling engine by combining a variable-step simplified conjugate gradient method (VSCGM) and a neural network training algorithm. As compared with existing gradient-based methods, like the conjugate gradient method (CGM) and simplified conjugate gradient method (SCGM), the VSCGM method is a further modified version presented in this study which allows the convergence speed to be greatly accelerated while the form of the objective function can still be defined flexibly. Through the automatic adjustment of the variable step size, the optimal design is reached more efficiently and accurately. Therefore, the VSCGM appears to be a potential and alternative tool in a variety of engineering applications. In this study, optimization of a low-temperature-differential gamma-type Stirling engine was attempted as a test case. The optimizer was trained by the neural network algorithm based on the training data provided from three-dimensional computational fluid dynamic (CFD) computation. The optimal design of the influential parameters of the Stirling engine is yielded efficiently. Results show that the indicated work and thermal efficiency are increased with the present approach by 102.93% and 5.24%, respectively. Robustness of the VSCGM is tested by giving different sets of initial guesses.


Introduction
The traditional conjugate gradient method is mainly based on the steepest-descent method or Newton method [1]. The steepest-descent method can reach the immediate neighboring area of the optimal point; however, the searching ability is reduced as the distance between the iterative and the optimal points is small [2]. The Newton method is also referred to as a gradient-based optimization method [3]. Compared to the steepest-descent method, the convergence speed of the Newton method is even faster. Unfortunately, if the position of the original point is too far away from the optimal point, the iteration of the Newton method may fail to achieve convergence. The traditional conjugate gradient method (CGM) combines the above two methods and thus takes advantage of them. However, the feasibility of this method is still limited because the form of the objective function should be cast into a sum of squared difference [4], and hence, it is not suitable for multi-goal optimization.
Cheng and Chang [5] proposed a simplified conjugate gradient method (SCGM). The SCGM method is referred to as a local optimization scheme that is suitable for the search for optimal parameters within a bounded range. With this method, the step sizes corresponding to the designed parameters are fixed during iteration, and the sensitivity of the objective function to the perturbations As a test case for testing the present approach, optimization of the low-temperature-differential gamma-type Stirling engine was attempted. The dataset for training the neural network was prepared based on the CFD computational results using a similar module of Cheng, Le, and Huang [21]. Given input geometrical parameters, the trained neuron network then served as the direct solution provider and outputs numerical information of the indicated power output and the thermal efficiency of the engine.
On the other hand, an objective function is defined in terms of the indicated power output and the thermal efficiency of the engine and is calculated by the direct solution provider. Meanwhile, with the help of the neural network algorithm, the VSCGM method was employed to iteratively adjust multiple parameters until the objective function is minimized and the optimum group of the parameters is obtained.

CGM Method
With the traditional conjugate gradient method (CGM), the objective function (J) to minimize is typically defined as the sum of squares of the differences where v i and v i are the iterative and the compared quantities, respectively, and I is the number of input data. The gradient of the objective function for the designed parameters, {Xj | j = 1, 2,.., k} where k is the number of designed parameters, is then expressed in terms of the sensitivity coefficients The conjugate gradient coefficient (γ j ) is expressed as the ratio of gradients of objective functions of two consecutive iteration steps: where n is the index of the iteration step. Next, the search direction is calculated as a linear combination of the conjugated gradient coefficient and gradients of the objective function.
The designed parameter is then updated as follows: In the CGM method, for different designed parameters {Xj | j = 1, 2,.., k} the corresponding step sizes {βj | j = 1, 2,.., k} are different. They are obtained by solving a set of simultaneous linear algebraic equations. Meanwhile, the sensitivity coefficient ∂vi/∂Xj with each of the designed parameters is calculated from the solution of a partial differential equation (PDE). Thus, if there are k designed parameters, one may have k PDEs to solve for the values of the k sensitivity coefficients. These sensitivity coefficients are introduced into (2) to determine the gradient of objective function ∂J/∂Xj.

SCGM Method
In the SCGM method proposed by Cheng and Chang [5], the step sizes corresponding to the designed parameters are fixed during iteration. That is The task of the sensitivity analysis is to evaluate the sensitivity of the objective function J for the designed parameter, which is represented with the gradient of objective function ∂J/∂Xj and is evaluated directly using direct numerical differentiation as The perturbations in the designed parameters ΔX j can be specified readily by trial and error. In this study, these perturbations are ranged between 0.0001 and 0.001 depending on the complexity of the problem. In the optimization process, the values are fixed generally. In this manner, it is not necessary to solve any equation for the sensitivity coefficients nor the step sizes, and hence, the tedious computation process is greatly simplified. Furthermore, the objective function is not limited to the form of the sum of the squared difference; however, one major disadvantage of this method is that the SCGM method may slow down the convergence because the step size is fixed. Under these circumstances, the present study developed a novel optimization method that is named variable-step simplified conjugate gradient method (VSCGM), which is described in the successive section.

VSCGM Method
The VSCGM method is a further modified version of the SCGM method, which can reduce the time to reach convergence while the form of the objective function can still be defined flexibly. Finding an optimum design can be accomplished by following almost the same problem-solving process of the SCGM method.
In the VSCGM method, the gradient of objective function ∂J/∂Xj is calculated by using the same direct numerical differentiation method as described with (7); however, since the distance between the iterative and the optimal points is gradually decreased in the iterations, the values of the step sizes and the perturbations of the designed parameters for the sensitivity analysis are varied. For this purpose, the perturbations in the designed parameters ΔX j (n) is determined at each iteration as where ΔG j (1) and β j (1) are the initial values of the perturbations and the step sizes, respectively, in the first iteration. It is noted that as the iterative point approaches the optimal point, the step sizes β j (1) should be reduced to its minimum such that the iteration may not jump over the optimal point. Thus, the step sizes are expressed with ,min where ( ) ,min ( ) ,min , , , , m i n and The variable step size is automatically adjusted with (9) to (11) in terms of the gain function (G j (n) ) and the ratio of search directions (R j (n) ). The gain function is calculated by using a linear interpolation between two extreme values, Gj,min and Gj,Max. The two extreme values need to be appropriately specified by the user. In this study, the minimum and maximum gain function values are assigned to be 1.0 and 3.0, respectively. Through the gain function, the influence of the ratio of search directions is introduced to the adjustment of the step sizes. Note that the step sizes can be enlarged or reduced depending on the magnitudes of the gain function and the ratio of search directions. In this way, when the iterative point is still far away from the optimal point, the step sizes can be increased to facilitate the search. On contrary, the step sizes will be decreased when the iterative point gets into the immediate neighboring area of the optimal point; however, to avoid a steep rise or steep drop in the step sizes, it is necessary to prescribe the upper and the lower bounds of the ratio of search directions (Rj,Max, Rj,min) properly.
Through the automatic adjustment of the variable step size, the optimal design is reached more efficiently and accurately. Figure 1 shows the comparison in the solution process between VSCGM and SCGM with computation flow charts, and the difference between the two methods can be seen.

CFD Module Generating Dataset of Training
Here, the three-dimensional CFD computation is done by using a numerical module similar to that of Cheng, Le, and Huang [21]. The physical model with geometrical parameters of the engine is shown in Figure 2. The space of the engine includes three chambers, namely an expansion chamber, compression chamber, and regenerator. The expansion chamber at the bottom contacts with the hightemperature energy sources, whereas the compression chamber at the top is cooled by a heat sink. The regenerator between the two chambers helps recycle the exhaust heat from the hot working fluid. The two moving parts, the piston and displacer, are connected to a flywheel. Figure 2. Geometrical parameters of a low-temperature-differential gamma-type Stirling engine.
The mathematical model in the CFD analysis is described briefly as follows: Mass equation: where is the fluid density, and u, v and w are the velocity components in the x-, y-, and z-directions. Momentum equations: Energy conservation equation: where c p is the specific heat of working fluid, Prt is turbulence Prandtl number, and μt is the eddy viscosity.
The realizable k-ε model is selected for the numerical model because it simulates accurately a wide range of boundary layer flow with a pressure gradient. The realizable k-ε model includes the following two well-known transport equations: Turbulent kinetic energy equation (k-equation): where G k and G b denote the generation of turbulence kinetic energy due to the mean velocity gradients and buoyancy, respectively. Viscous dissipation of turbulent kinetic energy equation (ε-equation): Detailed information regarding the coefficients with the above mathematical model, boundary conditions, fluid properties of the CFD module is available in [21]. The framework of the CFD module is briefly described in Table 1. Major geometric and operating parameters of a baseline engine considered in the computation are provided in Table 2. The baseline engine is used as the test case to demonstrate the performance of the present approach.  16.532 The present Stirling engine is referred to as a low-temperature-differential engine that can be applied in waste heat recovery. The engine is driven between an ambient temperature of 300 K and a heating temperature of 423 K. Both the two temperatures are constant during optimization. In a particular application, the heating temperature may reach 700 K. The heating conditions with all the test cases are already given in detail in Table 3 and more information is available in [21]. In the computation, the power output and the thermal efficiency of the engine are treated as performance indices. Stirling engines are characterized by the pressure-volume diagrams for the expansion and the compression chambers. By integrating the function of gas pressure with its volume in a cycle for each of the two chambers, it is possible to write an expression of the indicated power output produced by the engine, in Watt, as where the symbol cycle  represents the cyclic integration. Thermal efficiency (ε) is calculated by where W is indicated power output and Q is the cyclic average rate of heat transfer input. The effects of rotation speed (ω), charged pressure (Pch), phase angle (θph), displacer stroke (sd), piston diameter (Dp), the equilibrium position of the piston (xp), heating temperature (TH), and porosity (ϕ) on the power output and the thermal efficiency are evaluated. The numerical data for 55 cases are listed in Table 3. The dataset is used for training the neural network.

Neural Network
In this study, the Levenberg-Marquardt method [18] is used to update weight and bias values. The method can approach second-order training speed without having to compute the Hessian matrix [22,23]. As shown in Figure 3, in general, the neural network is defined by several neurons connected to each other. These neurons are stacked in three fully connected layers, namely, input, hidden, and output layers. That is, each neuron from one layer is connected to each neuron from the other layers. It means that the output from one neuron is used as an input to other neurons. Deep learning neural network models learn to map inputs to outputs given a training dataset from Table  3. The training of the neural network is based on the concept of supervised learning [24]. The basic structure of a linear neuron includes weight (W), bias (b), and transfer function [f(x)] [25]. When the input value (p) passes through a network that contains weights and bias values, the weights and bias values within the network are adjusted based on the difference between targets (t) and outputs (a). The formula of a linear neuron could be written as The network structure is illustrated in Figure 4. The first layer is the input layer, which has eight neurons; the second one is the hidden layer in which there are twenty-five neurons; the third one is the output layer in which there are two neurons. Each of the hidden and the output layers contains a weighting matrix, a bias matrix, and a conversion function. The eight parameters in the input layer are the prescribed values of rotation speed, charged pressure, phase angle, piston diameter, the equilibrium position of the piston, displacer stroke, heating temperature, and porosity. On the other hand, the two variables in the output layer are indicated power output (Wid) and thermal efficiency (ε). The input values (p), weights (W), biases (b), and output values (a) are expressed in matrix form as The conversion functions used in the network are To increase the amount of data used for training, a triangulation-based interpolation method was applied to generate more data based on the 55-set original data. The final training data were expanded to 3168 sets of samples in total. The data point distributions are plotted in Figure 5. Figure  5a,b displays the data of indicated power output and thermal efficiency, respectively, versus charged pressure and rotation speed. After training, the mean squared error (MSE) and regression value (R) are examined. The numerical data of the mean squared error and regression value of training are provided in Table 4. The magnitude of MSE is an average squared difference between outputs and targets. In this table, and the values of MSE in the processes of training, validation, and testing are 157.68, 29.39, and 54.95, respectively. Besides, the R-value is the correlation between outputs and targets which is used to characterize relationships among variables. R = 1.0 means a very close relationship, and R = 0 a random relationship. In this study, the values of R in the processes are all close to 1. It means that the training is acceptable with such low errors. The error histogram and regression distribution are provided in Figures 6 and 7, respectively. The bar chart plotted in Figure 6 uses three different color bars to represent the errors in the three training processes individually. It is found again that the three bars are all close to the central line which is marked in orange. It is also observed in Figure 7 that the regression data points closely agree with the fitting lines. This means that the relationship between outputs and targets are indeed rather close.   It is noticed that the CFD simulation software package itself can be used as the direct solution provider; however, the present study employs the neural network algorithm, instead of the CFD simulation software package, to serve as the direct solution provider. One major reason is that the CFD simulation software package consumes too much computation time and resources. For a typical simulation using the CFD software package on a personal computer with an Intel Core i7-7700 processor, it needs approximately 60 h to finish 10 cycles for one iteration. In a typical optimization process, the computation may exceed 2000 iterations, and hence, the time consumed in the optimization process may be longer than 120,000 h. The present study adopts a dataset involving only 55-set original data. The data are expanded to 3168-set training data by the triangulation-based interpolation method. Hence, the computation time can be reduced significantly.

Results and Discussion
The trained neuron network serves as the direct solution provider which provides numerical information of the thermal efficiency and the indicated power output of the engine for the VSCGM method. Then, the objective function is determined, and the designed parameters are updated by the VSCGM method. With the updated designed parameters, the neural network provides the indicated power output and the thermal efficiency for the VSCGM method again. The iteration continues until the objective function reaches a minimum value. The VSCGM method can be applied for a multi-goal and multi-parameter optimization. The objective function of optimization is calculated in terms of the indicated power output and the thermal efficiency as On the right-hand side of the above equation, the denominator of the fraction includes two terms. The first term represents the magnitude of indicated power output, and the second term represents the magnitude of thermal efficiency. M and N are two positive weighting coefficients that are specified by the users depending on individual applications. In the present optimization, the values of M and N are assigned to be 1.0 and 8.8, respectively.
It is important to mention that among the eight parameters, some parameters have a monotonic relationship with the performance of the engine. For example, as the value of charged pressure or heating temperature increases, so does the value of the indicated power output or thermal efficiency. Those parameters are called monotonically related parameters. The parameters can then be categorized into two groups: monotonically and non-monotonically related variables. Here in this study, the optimization task is focused on the group of non-monotonically related parameters. Thus, five non-monotonically related parameters are selected to be the designed parameters, including rotation speed, phase angle, piston diameter, displacer stroke, and porosity. Note that the designed parameters are changed around the values given in Table 2 for the baseline case. Table 5 conveys the initial values and the bounded ranges of the five designed parameters. The results of the indicated power output, thermal efficiency, and minimum objective function before and after the optimization are also provided in this table. A comparison between the initial and optimal designs shows that the indicated power output can be increased from 161.616 to 327.980 W, and the thermal efficiency from 16.532% to 17.399%. In the end, the objective function reaches a minimum value of 0.002078. The indicated power output and the thermal efficiency are elevated with this approach by 102.93% and 5.24%, respectively. The optimization can improve the performance of the engine remarkably. Meanwhile, the values of the designed parameters of the optimal design are determined within the bounded ranges efficiently. These values make up an important part of what influences designers when they make their design decisions. The increase in the performance via optimization is mainly dependent on the collected dataset (Table 3), the fixed parameters (Table 2), and the lower and the upper bounds of the designed parameters ( Table 5). The dataset shown in Table 3 illustrates a wider range for the indicated power while a narrower range for the thermal efficiency. Therefore, it is expected that increase of the indicated power output will be greater than that of thermal efficiency.
It is necessary to compare relative performance in the convergence speed between the VSCGM and SCGM methods. Both VSCGM and SCGM methods start from the baseline case and use the same trained neural network. Figure 8 shows the variations in the objective function, indicated power output, and thermal efficiency with the five-parameter optimization process. There is a rapid initial change in all three quantities with VSCGM than with SCGM. The quantities then change less rapidly, and more linearly. Eventually, the objective function approaches a minimum value. It is seen that the number of iterations needed is approximately 2700 with the SCGM method, whereas with the VSCGM it is only 1700. The rapid initial change is caused by the automatic adjustment of the step size. In general, the larger step size is produced by Equation (9) in the very initial stage such that the process can be quickly facilitated to the area adjacent to the optimal point. Then, the step size is reduced to a smaller value such that the iteration can approach the optimal point smoothly. As a result, the VSCGM method can accelerate convergence significantly.
Meanwhile, note that the approach is general; therefore, it shall include but not be limited to the optimization of the low-temperature-differential gamma-type Stirling engine. The dataset can also be prepared by experiments, not just limited to numerical computation.
It is also necessary to know whether the optimization approach can lead to a unique optimal point for different initial guesses. To test the uniqueness of the optimal design, in addition to the baseline case (Initial guess 1), three additional initial guess sets (Initial guess 2 to 4) are taken into account: Figure 9 depicts the optimization processes starting from these four initial guesses in the coordinates (Dp, θph, ϕ). It is found that even though the four initial points are separate in the coordinates, the four optimization processes still approach the same optimal point. This implies that the optimization method is robust, and the obtained optimal design is independent of the initial point for this case. It is noted that the present approach is not limited to the present group of designed parameters. When necessary, more designed parameters may be readily applied. In this study, the VSCGM method is originally presented in this paper which is a novel optimization method that can be coupled with a neural network training algorithm for optimization. It has been proven that the approach is efficient and robust. It is not limited to Stirling engine optimization. It can be readily applied to optimize other energy devices.

Conclusions
The present study develops a variable-step simplified conjugate gradient method (VSCGM) and incorporates this method with a neural network training algorithm to optimize a low-temperaturedifferential gamma-type Stirling engine. The VSCGM method is a further modified version of the existing SCGM method which introduces the concept of variable step size into the optimization process. A comparison between the VSCGM and the SCGM methods shows that the number of iterations needed is approximately 2700 with the SCGM method, whereas only 1700 with the VSCGM. The VSCGM method can accelerate convergence significantly. On the other hand, the neural network training algorithm is based on the Levenberg-Marquardt method with supervised learning. The three-dimensional CFD simulation results are used as the dataset for training the neural network.
A comparison between the initial and optimal designs shows that using this approach, the indicated power output can be elevated from 161.616 to 327.980 W, and the thermal efficiency from 16.532% to 17.399%.
Meanwhile, four different initial points are adapted to test the robustness of the approach. It is found that the approach is robust, and the obtained optimal combination of the designed parameters is independent of the initial guess for this case.