An Improved Equilibrium Optimizer Algorithm and Its Application in LSTM Neural Network

Abstract: An improved equilibrium optimizer (EO) algorithm is proposed in this paper to address premature and slow convergence. Firstly, a highly stochastic chaotic mechanism is adopted to initialize the population and expand the search range. Secondly, the capability to conduct a global search and jump out of local optima is enhanced by assigning adaptive weights and setting adaptive convergence factors. In addition, 25 classical benchmark functions are used to validate the algorithm. As revealed by the analysis of the accuracy, speed, and stability of convergence, the IEO algorithm proposed in this paper significantly outperforms other meta-heuristic algorithms. In practice, the class distribution is asymmetric because most logging data are unlabeled, and traditional classification models have difficulty in accurately predicting the location of oil layers. In this paper, the oil layers relevant to oil exploration are predicted using long short-term memory (LSTM) networks. Due to the large amount of data used, however, it is difficult to tune the parameters. For this reason, the improved equilibrium optimizer algorithm (IEO) is applied to optimize the parameters of the LSTM for improved performance, and the effective IEO-LSTM is applied to oil layer prediction. As indicated by the results, the proposed model outperforms currently popular optimization algorithms, including particle swarm optimization (PSO) and the genetic algorithm (GA), in terms of accuracy, absolute error, root mean square error and mean absolute error.


Introduction
Most of the optimization problems arising from various engineering settings are highly complex and challenging. Since traditional approaches tend to use specific solutions for specific problems, they are highly specialized but not universally applicable. Characterized by linear programming and convex optimization [1], traditional methods are restricted by various drawbacks: the problem-solving process is extremely complicated, and it is prone to local optimal solutions, making realistic problems difficult to solve. Therefore, it is currently common practice to use meta-heuristic optimization algorithms. Meta-heuristic optimization algorithms are becoming popular because they are easy to implement, can bypass local optima, and can be used in a wide range of problems covering different disciplines. Meta-heuristic algorithms solve optimization problems by mimicking biological or physical phenomena. They can be grouped into three main categories: evolution-based algorithms (EA) [2], physics-based algorithms [3], and swarm intelligence algorithms (SI) [4]. They all have excellent performance in combinatorial optimization [5], feature selection [6], image processing [7], data mining [8], and many other fields. In recent years, new meta-heuristic algorithms based on mimicking natural behavior have been proposed one after another, including particle swarm optimization (PSO) [9,10], gray wolf optimization (GWO) [11], the seagull optimization algorithm (SOA) [12], the whale optimization algorithm (WOA) [13,14], the cuckoo search algorithm (CSA) [15,16], the marine predator algorithm (MPA) [17,18], the coyote optimization algorithm (COA) [19], the carnivorous plant algorithm (CPA) [20], the transient search algorithm (TSA) [21], genetic algorithms (GA) [22] and more.
Inspired by the control volume mass balance models used to estimate both dynamic and equilibrium states, the equilibrium optimization algorithm (EO) was proposed by Faramarzi and Heidarinejad in 2019 [23]. The mass balance equation describes the concentration of non-reactive components in the control volume, which reaches the equilibrium state as the optimal result. However, the EO algorithm is disadvantaged by its slow convergence and its tendency to fall into local optima. Thus, an improved algorithm is proposed in this paper. Firstly, allowing for the instability caused by the traditional randomly initialized population, chaotic mapping is performed to initialize the population, making the distribution of search agents more comprehensive. Secondly, adaptive weights and adaptive convergence factors are introduced to improve the global search ability in the early stage and avoid falling into local optima; in addition, the local search ability in the later stage is also improved. Extensive comparative experiments are conducted to validate the improved algorithm. Herein, several popular swarm optimization algorithms are chosen for comparison, and twenty-five commonly used standard test functions are selected, including five high-dimensional unimodal functions, three high-dimensional multimodal functions, seven fixed-dimension multimodal functions, and ten composition functions. For each function, 30 experiments were performed and the experimental results recorded. According to the results obtained, IEO has a higher convergence speed and higher search accuracy on most functions.
Like other neural network models, the LSTM neural network model requires some parameters, such as the learning rate and the batch size, to be set manually. These parameters tend to have a significant impact on the topology of the network model, and the prediction performance of models trained with different parameters varies significantly. To determine appropriate model parameters, this paper applies the EO algorithm with adaptive weights based on chaos theory to optimize the internal parameters of the LSTM, and then uses the optimized neural network for oil layer prediction. Experimentally, it is proved that the improved EO algorithm can globally optimize the parameters of the LSTM, reduce randomness, and improve the prediction performance of the LSTM for the oil layer.

Description of Equilibrium Optimizer Algorithm
In the equilibrium optimization algorithm, the solutions are analogous to the particles in PSO, and the concentrations are analogous to the particle positions. The mass balance equation describes the concentration of non-reactive components in the control volume and reflects the physical process of mass entering, leaving, and being generated within the volume. It is usually described by a first-order differential equation:

V dC/dt = Q·C_eq − Q·C + G (1)

where V represents the control volume; V dC/dt denotes the rate of mass change; C indicates the concentration in the control volume; Q is the volumetric flow rate into or out of the control volume; C_eq denotes the concentration inside the control volume at the equilibrium state, in the absence of mass generation; G represents the rate of mass generation inside the control volume.
Solving the differential equation described by Equation (1) yields:

C = C_eq + (C_0 − C_eq)·F + (G/(λV))·(1 − F) (2)

F = e^(−λ(t − t_0)) (3)

where C_0 represents the initial concentration of the control volume at time t_0; F indicates the exponential term coefficient; λ refers to the flow (turnover) rate. For an optimization problem, the concentration C on the left side of the equation represents the newly generated current solution; C_0, the initial concentration of the control volume at time t_0, represents the solution obtained in the previous iteration; C_eq represents the best solution currently found by the algorithm; t is defined as a function of the iteration count, which decreases as the number of iterations increases:

t = (1 − Iter/Max_iter)^(a_2·Iter/Max_iter) (4)

Iter and Max_iter represent the current and maximum number of iterations, respectively; a_2 is a constant used to manage the exploitation ability of the function.
The mathematical model is as follows: (1) Initialization: In the initial state of the optimization process, the equilibrium concentration is unknown. Therefore, the algorithm performs random initialization within the upper and lower limits of each optimization variable:

C_i = C_min + r_i·(C_max − C_min), i = 1, 2, ..., n (5)

where C_min and C_max represent the lower and upper limit vectors of the optimized variables, respectively; r_i denotes the random number vector of individual i, whose dimension is consistent with that of the optimized space. The value of each element is a random number ranging from 0 to 1.
(2) Update of the equilibrium pool: The equilibrium state in Equation (2) is selected from the five current optimal candidate solutions, which improves the global search capability of the algorithm and helps it avoid falling into low-quality local optima. The equilibrium pool formed by these candidate solutions is expressed as follows:

C_eq,pool = {C_eq(1), C_eq(2), C_eq(3), C_eq(4), C_eq(ave)} (6)

where C_eq(1), C_eq(2), C_eq(3), C_eq(4) represent the four best solutions found so far through the iterations; C_eq(ave) refers to the average of these four solutions. Each of the five candidate solutions is selected with identical probability, 0.2.
(3) Exponential term F: To better balance the local search and global search of the algorithm, Equation (3) is improved as follows:

F = a_1·sign(r − 0.5)·(e^(−λt) − 1) (7)

where a_1 represents the weight constant coefficient of the global search; sign denotes the sign function; r and λ are random number vectors whose dimension is consistent with that of the optimized space, with each element a random number ranging from 0 to 1.

(4) Generation rate G: To enhance the local optimization capability of the algorithm, the generation rate is designed as follows:

G = G_0·F (8)
G_0 = GCP·(C_eq − λ·C), with GCP = 0.5·r_1 if r_2 ≥ GP, and GCP = 0 otherwise (9)

where r_1 and r_2 are random numbers in [0, 1] and GP is the generation probability; when GP = 0.5, the algorithm achieves a balance between global optimization and local optimization.

(5) Solution update: The solution of the algorithm is updated according to Equation (2) as follows:

C = C_eq + (C − C_eq)·F + (G/(λV))·(1 − F) (10)

where F is obtained using Equation (7), and V is taken as unit volume. The right side of Equation (10) comprises three terms: the first is the equilibrium concentration, while the second and third represent the changes in concentration.
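The update rules above can be sketched in Python for a single search agent. This is a minimal illustration; the default values a1 = 2, a2 = 1, GP = 0.5 and V = 1 follow common EO settings and are assumptions here:

```python
import numpy as np

def eo_update(C, C_pool, iter_, max_iter, a1=2.0, a2=1.0, GP=0.5, V=1.0):
    """One EO concentration update for a single search agent.

    C      : current concentration (position) vector of the agent
    C_pool : list of the five equilibrium-pool candidates
    """
    dim = C.shape[0]
    # Eq. (4): time parameter t decreases with the iteration count
    t = (1 - iter_ / max_iter) ** (a2 * iter_ / max_iter)
    # Pick one pool candidate with equal probability 0.2 each
    C_eq = C_pool[np.random.randint(len(C_pool))]
    lam = np.random.rand(dim)   # turnover rate vector
    r = np.random.rand(dim)
    # Improved exponential term F
    F = a1 * np.sign(r - 0.5) * (np.exp(-lam * t) - 1)
    # Generation rate G controlled by the generation probability GP
    r1, r2 = np.random.rand(), np.random.rand()
    GCP = 0.5 * r1 if r2 >= GP else 0.0
    G0 = GCP * (C_eq - lam * C)
    G = G0 * F
    # Concentration update (three terms of Eq. (10))
    return C_eq + (C - C_eq) * F + (G / (lam * V)) * (1 - F)
```

Repeatedly applying this update to every agent, while refreshing the equilibrium pool from the four best solutions found so far, reproduces the basic EO iteration.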
Though the equilibrium optimization algorithm (EO) performs well in optimization, it is prone to finding local optimal solutions, as a result of which its global search capability is inadequate, as is its convergence [24]. Therefore, an inertia-weight optimization algorithm based on chaos theory is proposed in this paper to improve both the global search capability and local convergence.

Improved Strategies for EO

(1) Population initialization based on chaos theory
The diversity of the initial population can have a considerable impact on the speed and accuracy of convergence for a swarm intelligence algorithm. However, the basic equilibrium optimization algorithm cannot ensure population diversity through random initialization. Featuring regularity, randomness, and ergodicity, chaotic mapping has been widely applied in the optimization of intelligent algorithms [25]. In this paper, the information in the solution space is fully extracted and captured through chaotic mapping to enhance the diversity of the initial concentrations. One of the mapping mechanisms widely used in chaos research is the logistic map, whose mathematical iterative equation is expressed as follows:

λ_(t+1) = µ·λ_t·(1 − λ_t), t = 0, 1, ..., T − 1

where λ_t represents a uniformly distributed random number in the interval [0, 1], with λ_0 ∉ {0, 0.25, 0.5, 0.75, 1}; T indicates the preset maximum number of chaotic iterations, T ≥ D; µ denotes the chaos control parameter. When µ = 4, the system is in a completely chaotic state.
The logistic map above is first used to generate a set of chaotic variables. The chaotic sequence is then adopted to map the concentration of each dimension into the interval [f_min, f_max]:

X_i^j = f_min + λ^j·(f_max − f_min)

Furthermore, the fitness function is used to calculate the fitness value corresponding to each concentration, where X_i^j represents the coordinate of the j-th dimension of the i-th search agent, and λ^j refers to the j-th element of the randomly sorted chaotic sequence λ.
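The chaotic initialization described above can be sketched as follows; the per-agent seeding scheme is an illustrative choice that keeps λ_0 away from the logistic map's fixed points:

```python
import numpy as np

def chaotic_init(n_agents, dim, f_min, f_max, mu=4.0):
    """Initialize a population with logistic chaotic sequences.

    Each value follows lambda_{t+1} = mu * lambda_t * (1 - lambda_t),
    then is mapped into the search interval [f_min, f_max].
    """
    pop = np.empty((n_agents, dim))
    for i in range(n_agents):
        # seed away from the fixed points {0, 0.25, 0.5, 0.75, 1}
        lam = 0.7 + 0.05 * np.random.rand()
        for j in range(dim):
            lam = mu * lam * (1 - lam)          # logistic map iteration
            pop[i, j] = f_min + lam * (f_max - f_min)
    return pop
```

With µ = 4 the map is fully chaotic, so the generated agents cover the search space more evenly than plain uniform sampling of a small population.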
(2) Adaptive inertia weights based on a nonlinear decreasing strategy

The inertia weight plays an essential role in optimizing the objective function. When the inertia weight is larger, the algorithm performs better in global search and can cover a larger area; conversely, when the inertia weight is smaller, the algorithm has a better capability of local search and can search the area around the optimal solution. However, the inertia weight is constant at 1 in most EO initializations, which is not an ideal value in either the early or the late stage of the algorithm. Therefore, appropriate weighting is required to improve the optimization capability of the algorithm. In the process of EO optimization, the convergence speed will be reduced if an inappropriate linear inertia weight adjustment strategy is chosen. Therefore, an approach that adaptively changes the weight with the number of iterations, in line with a nonlinear decrement strategy, is proposed in this paper. It is expressed as follows: where m and n represent tunable parameters; Max_t indicates the maximum number of iterations; t denotes the current iteration. By combining the adaptive weight with the concentration update in Equation (10), the new concentration update formula is obtained. The fluctuation range of the convergence factor is shown in Figure 1. Extensive experiments demonstrate that the effect is best when both m and n are set to 2. The large inertia weight and its rapid decline in the early stage are conducive to improving the capability of global search, while the small inertia weight and its slow decline in the later stage are favorable to enhancing the capability of local search and improving convergence speed. In the later stage of the algorithm, the updated concentration increasingly approaches the optimal concentration to reach an equilibrium state. Thus, the small adaptive weight can be used to fine-tune the concentration at this stage while improving the capability of local optimization.
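Since the exact decrement formula is not reproduced in this excerpt, the sketch below uses one hypothetical nonlinear form, w(t) = m·e^(−n·t/Max_t), chosen only because it matches the described behaviour (rapid decline early, slow decline late); it is not the paper's equation:

```python
import math

def adaptive_weight(t, max_t, m=2.0, n=2.0):
    """Illustrative nonlinear decreasing inertia weight (hypothetical form).

    w(t) = m * exp(-n * t / max_t) decays quickly in the early stage and
    slowly in the late stage; m = n = 2 follows the reported best setting.
    """
    return m * math.exp(-n * t / max_t)
```

Any monotone nonlinear decrement with this early-fast/late-slow shape serves the same purpose: strong exploration at the start, fine local exploitation near convergence.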

IEO Algorithm Process
The main process of the IEO algorithm is detailed as follows: (1) The parameters are initialized, including a_1, a_2, and GP; then the maximum number of iterations and the fitness function of the candidate concentrations are set; (2) The fitness value of each search agent is calculated according to Equation (5), and C_eq(1), C_eq(2), C_eq(3), C_eq(4), and C_eq(ave) are determined; (3) C_eq,pool is formed according to Equation (6), and the concentration of each search agent is updated using Equation (15); (4) The output result is saved when the maximum number of iterations is reached.
The pseudo-code of the IEO algorithm is shown in Algorithm 1.
Algorithm 1 Pseudo-code of the IEO algorithm.

Initialize the population using the logistic chaotic mapping
Initialize the parameters a_1, a_2, and GP; set Max_iter
While Iter < Max_iter
    Calculate the fitness of each search agent; update C_eq(1), ..., C_eq(4) and C_eq(ave)
    Construct the equilibrium pool C_eq,pool
    Compute t (Equation (4)) and the adaptive weight w
    For i = 1 : number of particles (n)
        Randomly choose one candidate from the equilibrium pool (vector)
        Generate random vectors of λ, r
        Compute F and G; update the concentration of agent i (Equation (15))
    End For
    Iter = Iter + 1
End While
Return the best solution found
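The pseudo-code can be turned into a compact, self-contained Python sketch; the greedy acceptance of improved candidates and the exponential weight decrement below are illustrative assumptions rather than the paper's exact procedure:

```python
import numpy as np

def ieo(fitness, dim, c_min, c_max, n=30, max_iter=200,
        a1=2.0, a2=1.0, GP=0.5, mu=4.0):
    """Minimal IEO sketch: chaotic initialization + weighted EO updates.

    `fitness` is minimized. The inertia weight uses an illustrative
    exponential decrement (the paper's exact formula is not reproduced).
    """
    # --- logistic-map chaotic initialization ---
    pop = np.empty((n, dim))
    lam = 0.7
    for i in range(n):
        for j in range(dim):
            lam = mu * lam * (1 - lam)
            pop[i, j] = c_min + lam * (c_max - c_min)
    fit = np.array([fitness(p) for p in pop])

    for it in range(1, max_iter + 1):
        # equilibrium pool: four best solutions plus their average
        idx = np.argsort(fit)[:4]
        pool = [pop[k].copy() for k in idx]
        pool.append(np.mean(pool, axis=0))
        t = (1 - it / max_iter) ** (a2 * it / max_iter)
        w = 2.0 * np.exp(-2.0 * it / max_iter)   # illustrative adaptive weight
        for i in range(n):
            C_eq = pool[np.random.randint(5)]
            lam_v = np.random.rand(dim)
            r = np.random.rand(dim)
            F = a1 * np.sign(r - 0.5) * (np.exp(-lam_v * t) - 1)
            GCP = 0.5 * np.random.rand() if np.random.rand() >= GP else 0.0
            G = GCP * (C_eq - lam_v * pop[i]) * F
            cand = C_eq + w * (pop[i] - C_eq) * F + (G / lam_v) * (1 - F)
            cand = np.clip(cand, c_min, c_max)
            f = fitness(cand)
            if f < fit[i]:                       # greedy acceptance (assumption)
                pop[i], fit[i] = cand, f
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

On a simple sphere function this sketch converges steadily toward the global optimum, illustrating the interplay of the pool, the exponential term F, and the decreasing weight.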

Numerical Optimization Experiments
To validate the IEO algorithm, it is compared with six well-known meta-heuristic algorithms: the equilibrium optimizer (EO), the whale optimization algorithm (WOA), the marine predator algorithm (MPA), grey wolf optimization (GWO), particle swarm optimization (PSO), and the genetic algorithm (GA). Furthermore, a total of 25 standard test functions are used to test its performance. In the following, the benchmark functions, experimental settings, and simulation results are detailed.

Experimental Settings and Algorithm Parameters
The experimental environment is Windows 10 (64-bit), MATLAB R2012a, and an Intel Core i5-3210M CPU (2.50 GHz) with 8 GB of RAM. Since the results of each algorithm are randomized during operation, each comparison algorithm is run independently 30 times on every test function to ensure the objectivity of the comparison. The population size is 30, and the maximum number of iterations is set to 1000. The average and standard deviation are then calculated for the collected data.

Benchmark Test Functions
These 25 standard test functions and their parameter settings have been widely adopted to verify meta-heuristics. In general, it is difficult for one algorithm to fit all test functions; therefore, the 25 selected test functions are of different types. In this way, the experimental results can better reflect the optimization capability of the algorithm.
The five high-dimensional unimodal test functions (F_1(x)~F_5(x)) have a single global optimum and no local optima. Therefore, these functions can be applied to test the algorithm's local search capability and convergence speed. In contrast to the high-dimensional unimodal functions, the three high-dimensional multimodal test functions (F_6(x)~F_8(x)) have multiple local optima, making them more difficult for an algorithm to solve than unimodal functions. Thus, these functions can be used to verify the algorithm's global search capability. The seven fixed-dimension multimodal functions (F_9(x)~F_15(x)) have multiple local extrema, but their dimensionality is lower than that of the high-dimensional multimodal functions, so the number of local extrema is relatively small. Similar to the high-dimensional multimodal test functions, the fixed-dimension multimodal test functions can also be used to assess the algorithm's performance in global search. To test its comprehensive capability, we added 10 composition functions from CEC2017 [26]. In Table 1, Dim indicates the dimension of the standard test function, f_min represents its theoretical optimal value, Range means the range of the search space, and N means the number of basic functions.
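For illustration, one function of each of the first two kinds can be written down directly (the exact functions in Table 1 are not reproduced in this excerpt, so these are representative examples):

```python
import numpy as np

def sphere(x):
    """Unimodal benchmark: single global optimum f(0) = 0, no local optima."""
    return np.sum(x ** 2)

def rastrigin(x):
    """Multimodal benchmark: many local optima, global optimum f(0) = 0."""
    return 10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x))
```

Unimodal functions such as the sphere reward fast, precise exploitation, while multimodal functions such as Rastrigin punish algorithms that cannot escape local optima.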

Experimental Results and Analysis
In this section, comparative numerical optimization experiments are conducted on EO, WOA, MPA, GWO, PSO, GA, and IEO, respectively. The convergence curve is presented in Figure A1, and the stability curve of each group optimization algorithm is presented in Figure A3. The test results of the unimodal function, high-dimensional multimodal function, fixed-dimensional multimodal function and composition functions are listed in Table A1, where Ave and Std represent the average solution in 30 independent experiments and the standard deviation of the results in 30 runs, respectively.
As shown in Figure A1, the convergence speed of IEO on F_1(x)~F_5(x) is significantly higher than that of the other algorithms, and it achieves higher convergence accuracy. Even on functions where convergence is relatively slow, the IEO convergence value proves to be the best, suggesting that IEO is less prone to the local optimum. In general, when the high-dimensional unimodal functions are tested, the IEO algorithm demonstrates superior and consistent performance, implying its effectiveness in handling high-dimensional spaces. On F_6(x)~F_8(x), the convergence speed of IEO is the highest and the convergence accuracy is improved, suggesting that IEO has a better ability to jump out of local optima than the other algorithms. As can be seen from the average (Ave) and standard deviation (Std), IEO performs best and is the most stable. Overall, the performance of the IEO algorithm on the high-dimensional multimodal test functions shows higher stability than the other algorithms, which evidences its strong global search capability. On F_9(x)~F_15(x), for the first five functions IEO exhibits not only the fastest convergence speed but also higher convergence accuracy, while on F_15(x), despite its slow convergence, IEO jumped out of the local optimum in the later stage and found an optimum closer to the theoretical value. In general, the capability of IEO to escape local optima is better than that of the other algorithms on the fixed-dimension multimodal functions; thus, the algorithm is verified as effective and stable. On F_16(x)~F_25(x), the convergence speed of IEO is overall faster than that of EO and holds certain advantages over the other algorithms. Among them, F_16(x), F_18(x), F_21(x) and F_23(x) converge the fastest and obtain the optimal value.
As shown in Table A1, on F_1(x)~F_5(x) the IEO algorithm outperforms the other algorithms. On these five test functions, IEO produces better results than the other algorithms in both the average value and the standard deviation, and its convergence accuracy is significantly improved. Moreover, in four of these functions, IEO achieved the ideal optimal value of 0 every time. The variance of the IEO algorithm is considerably smaller than that of the other algorithms, which indicates its stability. On the high-dimensional multimodal test functions F_6(x)~F_8(x), IEO outperforms the other algorithms to a significant extent. Compared with the other algorithms, IEO performs more consistently, with the smallest average and standard deviation; notably, it achieves the theoretical optimal value of 0 on F_6(x) and F_8(x). On the fixed-dimension multimodal functions (F_9(x)~F_15(x)), a data comparison of all algorithms is performed. According to the comparison of the mean and standard deviation in the table, IEO performs better in mean and standard deviation on F_9(x), F_12(x), and F_13(x), while MPA performs best on F_10(x), followed by IEO. On F_11(x), the average and standard deviation of IEO are not as satisfactory as those of MPA, GWO, PSO and GA. On F_14(x), EO achieves the best average and standard deviation, followed by IEO. On F_15(x), IEO and MPA jointly achieve the best average and standard deviation. As a whole, IEO shows stronger global search capabilities on the fixed-dimension multimodal functions.
On the composition functions, F_16(x), F_18(x), F_21(x) and F_23(x) all achieved the minimum value. On the remaining functions, IEO performed better than EO, indicating that the IEO improvement strategy was successful. This shows that IEO offers a significant improvement over EO in solving composition functions, holds a clear advantage over the other algorithms, and ranks second on most of the functions. Figure A3 shows box plots of the test results for the selected functions. In the experiment, box plots are used to indicate the distribution of the results of the 30 independent tests on each function, demonstrating the stability of IEO and the other algorithms in a more intuitive way. IEO exhibits outstanding stability on all functions, which allows it to outperform the other algorithms; more specifically, its bottom edge, top edge, and median value for the same function are superior to those of the other algorithms.
In general, the IEO optimization algorithm outperforms other algorithms in optimization speed, accuracy and stability.

LSTM Model Optimization by Improved IEO Algorithm
Time series data are everywhere, and prediction is an eternal topic for human beings. Gooijer and Hyndman highlight research published by the International Institute of Forecasters (IIF) and key publications in other journals [27], ranging from the early traditional autoregressive integrated moving average (ARIMA) linear model to various nonlinear models (e.g., the SETAR model). Due to increased computational power and electronic storage capacity, nonlinear forecasting models have grown in a spectacular way and received a great deal of attention during the last decades. The neural network (NN) model is a prominent example of such nonlinear and nonparametric models, which make no assumptions about the parametric form of the functional relationship between the variables. Several studies have applied NNs: Caraka and Chen used vector autoregression (VAR) to improve NNs for space-time pollution data forecasting [28], and then used a VAR-NN-PSO model to predict PM2.5 in air [29]. Suhartono and Prastyo applied the Generalized Space-Time Autoregressive (GSTAR) model to FFNNs for forecasting oil production [30]. In recent years, variants of recurrent neural networks (RNN) have been frequently employed for forecasting tasks and have shown promising results, mainly because RNNs can persist information about previous time steps and use that information when processing the current time step. Due to vanishing gradients, however, a plain RNN is unable to persist long-term dependencies. The long short-term memory (LSTM) network is a type of RNN that was introduced to persist long-term dependencies [31]. Comparing LSTM with a random forest, a standard deep net, and a simple logistic regression on a large-scale financial market prediction task, it was proved that LSTM performed better than the others and is a method naturally suited to this field [32].
Comparing LSTM with the Prophet model proposed by Facebook, the results showed that LSTM performs better on minimum air temperature [33]. Chih-Hung and Yu-Feng proposed a new forecasting framework with the LSTM model to forecast Bitcoin daily prices, and the results revealed its possible applicability to the prediction of various cryptocurrencies [34]. Helmini and Jihan adopted a special LSTM variant with peephole connections for sales forecasting tasks and proved that both the initial LSTM and the improved LSTM outperform two machine learning models (extreme gradient boosting (XGB) and the random forest regressor (RFR)) [35]. To enhance the performance of the LSTM model, Gers and Schmidhuber proposed a novel adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times [36], while Karim and Majumdar added an attention model to detect the regions of the input sequence that contribute to the class label through the context vector [37]. Saeed and Li proposed a hybrid model called BLSTM, which constructs quality intervals based on the high-quality principle and presents excellent prediction performance [38]. Venskus and Treigys described two methods, LSTM prediction region learning and wild bootstrapping for the estimation of prediction regions, and the expected results were achieved when they were applied to abnormal marine traffic detection [39]. To strengthen the early diagnosis of cardiovascular disease, Zheng and Chen presented an arrhythmia classification method that combines a 2-D CNN and LSTM and uses ECG images as the input data for the model [40]. Currently, the most mainstream methods for time series forecasting are LSTM and its variants.

Basic Model of LSTM
The LSTM model introduces a gating unit into its network topology to control the degree to which current information affects previous information, thus endowing the model with a longer "memory function". This makes it suitable for solving long-term nonlinear sequence forecasting problems. The LSTM network structure consists of input gates, forget gates, output gates, and unit states. The basic structure of the network is illustrated in Figure 2. LSTM is a purpose-built recurrent neural network: not only does the design of the "gate" structure avoid the vanishing and exploding gradients of traditional recurrent neural networks, it is also effective in learning long-term dependencies. Therefore, the LSTM model, with its memory function, shows a clear advantage in dealing with the prediction and classification of time series.
Let the input sequence be (x_1, x_2, ..., x_T), and the hidden layer states be (h_1, h_2, ..., h_T). Then, at time t:

f_t = σ(U_f·x_t + W_f·h_(t−1) + b_f)
i_t = σ(U_i·x_t + W_i·h_(t−1) + b_i)
o_t = σ(U_o·x_t + W_o·h_(t−1) + b_o)
C̃_t = tanh(U_c·x_t + W_c·h_(t−1) + b_c)
C_t = f_t ⊙ C_(t−1) + i_t ⊙ C̃_t
h_t = o_t ⊙ tanh(C_t)

where x_t represents the input vector of the LSTM cell; h_t indicates the cell output vector; f_t, i_t, and o_t refer to the forget gate, input gate, and output gate, respectively; C_t denotes the state of the cell unit; t stands for the time step; σ and tanh represent the activation functions; W and b denote the weight and deviation matrices, respectively; U stands for the input weight matrices; and ⊙ denotes element-wise multiplication.
The key to LSTM lies in the cell state C_t. It maintains the memory of the unit state at time t and adjusts it through the forget gate f_t and input gate i_t. The purpose of the forget gate is to make the cell remember or forget its previous state C_(t−1). The input gate allows incoming signals to update the unit state or blocks them. The output gate controls the output of the unit state C_t and transmits it to the next cell. The internal structure of the LSTM unit comprises multiple perceptrons. In general, backpropagation is the most commonly used training method.
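A single LSTM cell step can be sketched directly from these gate descriptions (a minimal NumPy illustration, with U holding the input weight matrices and W the recurrent weight matrices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, U, W, b):
    """One LSTM cell step.

    U, W, b are dicts keyed by 'f', 'i', 'o', 'c' holding the input
    weights, recurrent weights, and biases of each gate.
    """
    f_t = sigmoid(U['f'] @ x_t + W['f'] @ h_prev + b['f'])   # forget gate
    i_t = sigmoid(U['i'] @ x_t + W['i'] @ h_prev + b['i'])   # input gate
    o_t = sigmoid(U['o'] @ x_t + W['o'] @ h_prev + b['o'])   # output gate
    c_tilde = np.tanh(U['c'] @ x_t + W['c'] @ h_prev + b['c'])
    c_t = f_t * c_prev + i_t * c_tilde    # remember / forget the old state
    h_t = o_t * np.tanh(c_t)              # gated output
    return h_t, c_t
```

Iterating this step over (x_1, ..., x_T), carrying h_t and C_t forward, produces the hidden state sequence that downstream layers consume.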
The LSTM neural network framework applied in this paper involves one input layer, three hidden layers (64 neurons in the first layer, 16 neurons in the second layer, and 50 neurons in the third, fully connected layer), and one output layer (two output neurons). The Adam optimizer is used, the learning rate is 0.01, training runs for 30 epochs, and the batch size is 256.

LSTM Based on Improved EO
As mentioned above, the parameters of the LSTM-based model are set manually, which affects the outcome of the prediction made using the model. Thus, an improved EO-LSTM model is proposed in this paper: the relevant parameters of the LSTM are optimized through the excellent parameter optimization capability of the above-mentioned IEO algorithm to improve the prediction performance of the LSTM. In order to ensure the objectivity of the experiment, the LSTM model structure is fixed (the number of nodes in each layer is not modified). The modeling process of the IEO-LSTM model is detailed as follows.
(1) The relevant parameters are initialized. The parameters of the IEO algorithm are initialized, including the population size, the fitness function, and the free parameter assignments. The parameters of the LSTM algorithm are initialized by setting the time window, the initial learning rate, the number of initial iterations, and the batch size. In this paper, the error is treated as the fitness function, which is expressed as follows:

E(f; D) = (1/m)·Σ_(i=1)^(m) I(f(x_i) ≠ y_i)

where D represents the training set, m denotes the number of samples in the training set, f(x_i) indicates the predicted sample label, and y_i means the original sample label. (2) The LSTM parameters are set as required to form the corresponding particles. The particle structure is (alpha, num_epochs, batch_size), where alpha represents the LSTM learning rate, num_epochs indicates the number of iterations, and batch_size represents the batch size. The particles mentioned above are the objects of IEO optimization. (3) The concentration is updated according to Equation (15). According to the newly obtained concentration, the fitness value is calculated, and then the individual optimal concentration and the global optimal concentration of the particles are updated. (4) If the number of iterations reaches the maximum, that is, 30, the LSTM model trained on the optimal particle outputs the prediction value; otherwise, return to step (3) for continued iteration.
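Steps (1) and (2) can be sketched as follows; the error-rate form of the fitness and the particle value ranges shown here are illustrative assumptions:

```python
import random

def fitness(model_predict, data):
    """Classification error rate on the training set (assumed form of
    the paper's error fitness): fraction of mislabelled samples."""
    wrong = sum(1 for x, y in data if model_predict(x) != y)
    return wrong / len(data)

def random_particle():
    """A particle encodes (alpha, num_epochs, batch_size).

    The ranges below are hypothetical placeholders, not the paper's."""
    return (10 ** random.uniform(-4, -1),   # learning rate alpha
            random.randint(10, 100),        # num_epochs
            random.choice([64, 128, 256]))  # batch_size
```

Each particle's fitness is obtained by training an LSTM with its three encoded hyperparameters and measuring the resulting error on the training data; IEO then updates the particles exactly as it updates concentrations.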

Design of Oil Layer Prediction System
In respect of oil logging, reservoir prediction is regarded as crucial for improving the accuracy of oil layer prediction. Herein, we apply IEO-LSTM to oil logging and verify the effectiveness of this algorithm using oil data provided by three oil fields. Figure 3 shows the block diagram of the oil layer prediction system based on IEO-LSTM. The oil layer prediction mainly includes the following processes:

(1) Data acquisition and preprocessing

In practice, the logging data are classified into two categories after collection. On the one hand, the labeled data obtained through core sampling and laboratory analysis are generally used as the training set; on the other hand, the unlabeled logging data are treated as the test set. The data preprocessing mainly includes denoising, normalization, etc.
Besides, since each attribute has a different dimension and value range, the data need to be normalized first, so that the sample data range within [0, 1]. The normalized logging data are then used for training and testing. The formula for sample normalization is expressed as follows:

x' = (x − x_min) / (x_max − x_min)

where x ∈ [x_min, x_max]; x_min represents the minimum value of the attribute of the data sample, and x_max refers to the maximum value of the attribute of the data sample.
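The min-max normalization described above can be sketched as follows; the function name is illustrative, and the constant-attribute fallback (mapping a degenerate column to 0) is an assumption, since the formula is undefined when x_min = x_max.

```python
def min_max_normalize(column):
    """Scale a list of attribute values into [0, 1] via (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Constant attribute: the formula is undefined, so map everything to 0
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]
```

Applied column by column, this brings every reduced logging attribute onto a common [0, 1] scale before training.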
(2) Selection of sample set and attribute reduction
The sample set selected for training is supposed not only to be complete and comprehensive but also to be closely related to the evaluation of the oil layer. Moreover, different logging attributes contribute to the prediction of the oil layer to different degrees. In general, there are dozens of condition attributes in logging data, but not all of them play a decisive role, which makes it necessary to perform attribute reduction. In this paper, the discretization algorithm based on the inflection point is applied first, and then the reduction method based on attribute dependence is applied to reduce the logging attributes [41].

(3) IEO-LSTM modeling
First of all, the LSTM model is established, the training set after attribute reduction is inputted, and the IEO algorithm is used to find the optimal combination of parameters for the LSTM model, thus obtaining a high-precision IEO-LSTM prediction model.

(4) Prediction
The trained IEO-LSTM model is used to perform prediction and output the results. Note that before the test set is inputted into the IEO-LSTM model, the reduced test set must be obtained according to the sample attribute reduction result and then normalized according to the normalization principle applied to the training samples.
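The point above — that the test set must reuse the training set's normalization, not its own — can be sketched as follows; the function name is illustrative.

```python
def apply_training_normalization(test_column, x_min, x_max):
    """Normalize a test-set attribute with the min/max learned on the TRAINING set.

    Reusing the training scale keeps both sets comparable; test values outside
    the training range may therefore fall slightly outside [0, 1].
    """
    span = x_max - x_min
    if span == 0:
        return [0.0 for _ in test_column]
    return [(x - x_min) / span for x in test_column]
```

Recomputing min/max on the test set instead would put the two sets on different scales and silently degrade the model's predictions.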
The pseudo-code of the oil layer prediction system based on IEO-LSTM is shown in Algorithm 3.

Practical Application
To verify the effectiveness and stability of the proposed IEO-LSTM model, we selected logging data from three actual oil wells for the experiment. The prediction results are then compared and evaluated against those of the LSTM, PSO-LSTM [42], GA-LSTM [43], and EO-LSTM models.
The logging data for the three selected actual wells are recorded as o1, o2, and o3. Table 2 shows the reduction results obtained after the attribute reduction process for the three wells. It can be seen from the table that there are various redundant condition attributes in the original data. After attribute reduction is performed through a rough set, the significant attributes can be selected, which is conducive to reducing the complexity of the LSTM model. The data set division of the three wells is shown in Table 3, which gives the distribution ranges of the oil layers and dry layers in the training set and test set.

Table 4 lists the value range of each attribute after o1 attribute reduction. Among them, GR, DT, SP, LLD, LLS, DEN, and K represent natural gamma, acoustic time difference, spontaneous potential, deep lateral resistivity, shallow lateral resistivity, compensated density, and potassium, respectively. Besides, Table 5 shows the value range of each attribute after o2 attribute reduction. Among them, AC, RT, and RXO denote acoustic time difference, true formation resistivity, and flushed zone resistivity, respectively. Table 6 lists the value range of each attribute after o3 attribute reduction, where NG and RI represent neutron gamma and invaded zone resistivity, respectively.

The logging curve is obtained after the normalization of the reduced attributes, as shown in Figure 4. The horizontal axis and the vertical axis indicate the depth and the standardized value, respectively.
In order to conduct the comparative experiments, the LSTM model, PSO-LSTM model, GA-LSTM model, and EO-LSTM model are established for comparison against the IEO-LSTM model. Then, these optimized classification models are applied to oil layer prediction using the test set.
In addition to the prediction accuracy on the test set, the following performance indicators are defined to assess the performance of the prediction model:

RMSE = sqrt( (1/n) ∑_{i=1}^{n} (f(x_i) − y_i)^2 )

MAE = (1/n) ∑_{i=1}^{n} |f(x_i) − y_i|

where f(x_i) and y_i represent the predicted output value and the expected output value, respectively, and n is the number of test samples. RMSE is applied to measure the deviation between the predicted value and the true value, while MAE is the average of the absolute errors. The smaller the RMSE and MAE, the better the performance of the algorithm model. Therefore, RMSE is treated as a criterion for evaluating the accuracy of each algorithm model, and MAE is often used to characterize the actual prediction error because it better reflects the real magnitude of the error in the predicted values.
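The two indicators above can be computed as follows; this is a minimal standard-library sketch, with function names chosen for illustration.

```python
import math

def rmse(preds, targets):
    """Root mean square error between predicted and expected outputs."""
    n = len(targets)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, targets)) / n)

def mae(preds, targets):
    """Mean absolute error between predicted and expected outputs."""
    n = len(targets)
    return sum(abs(p - y) for p, y in zip(preds, targets)) / n
```

Because RMSE squares each residual before averaging, it penalizes large individual errors more heavily than MAE, which is why the two are reported together.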
For a comprehensive experimental comparison, each model is trained 30 times, and the average value of each performance indicator is calculated. Table 7 shows the prediction ability of each model on the test sets of the three wells (o1, o2, o3). On the whole, the prediction accuracy of IEO-LSTM is the most satisfactory. Compared with the other models, it performs better on each index, and its accuracy is about 5% higher than that of the original LSTM model on average.

Conclusions
In this paper, an improved equilibrium optimizer algorithm is proposed, which applies chaos theory to initialize the population and relies on adaptive weights and adaptive convergence factors to enhance the capability of global and local search. Firstly, 30 independent experiments are conducted on 25 benchmark functions using six popular swarm intelligence algorithms. According to the experimental results, IEO achieves a significant improvement in convergence speed and convergence accuracy compared to the traditional EO. Compared with PSO, GA, MPA, WOA, and GWO, IEO also performs better in convergence speed, accuracy, and stability.
The poor robustness of LSTM is attributed to the randomized generation of such parameters as the learning rate, the number of iterations, and the batch size. To improve the prediction performance of LSTM, IEO and LSTM are combined, with the aforementioned three parameters globally optimized.
Finally, IEO-LSTM is applied to well-logging oil layer prediction for validation. The data set includes the logging data collected from three different wells. The prediction accuracy of the proposed model is shown to be significantly higher than that of LSTM, and its performance is also better than that of PSO-LSTM, GA-LSTM, and EO-LSTM. When the number of iterations is 200, IEO-LSTM is superior to the other classification models in terms of classification accuracy, root mean square error, and mean absolute error. In comparison with the actual oil layer distribution, the prediction of IEO-LSTM shows an average classification error of about 5%.
Moreover, this method can also be applied to solve generic optimization problems encountered in other fields, which suggests a promising prospect for practical application. However, LSTM is a special form of recurrent neural network that was developed to address a critical defect of RNN, and it still has its own shortcomings: it only alleviates the gradient problem of RNN to some extent and remains difficult to train on data of excessively large magnitude, and it is time-consuming and computationally intensive during optimization. These two problems will be addressed and practically applied in our subsequent research.