Artificial Neural Networks Based Optimization Techniques: A Review

In the last few years, intensive research has been done to enhance artificial intelligence (AI) using optimization techniques. In this paper, we present an extensive review of artificial neural network (ANN) based optimization techniques, covering some of the well-known optimization algorithms, e.g., the genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), and backtracking search algorithm (BSA), as well as some recently developed techniques, e.g., the lightning search algorithm (LSA) and whale optimization algorithm (WOA), and many more. The entire set of such techniques is classified as population-based algorithms, where the initial population is randomly created and the input parameters are initialized within a specified range so that they can provide optimal solutions. This paper emphasizes enhancing neural networks via optimization algorithms by manipulating their tuned or training parameters to obtain the best network structure and thus solve problems effectively. The paper includes results for improving ANN performance with the PSO, GA, ABC, and BSA optimization techniques, respectively, used to search for optimal parameters, e.g., the number of neurons in the hidden layers and the learning rate. The obtained neural network is used for solving energy management problems in the virtual power plant system.


Introduction
Artificial intelligence (AI) helps computers, or computer-based inanimate objects, to think or act as humans do. AI research focuses on how the human brain thinks, learns, decides, and works to solve problems. AI is a vast field that aims to create intelligent machines [1]. Machine learning (ML) is a branch of AI that recognizes and learns different data set patterns [2]. By definition, ML is an AI application that allows systems to learn automatically and improve with experience, without being explicitly programmed [3,4]. Common algorithms used in ML include neural networks and support vector machines.
Supporting AI, ML, and DL with optimization techniques has gained importance in the last few years. There is a lot of ongoing research using optimization to enhance or boost performance by finding the optimal parameter values that help in architecture design. In [24], a fuzzy logic controller design improvement for PV inverters utilizes differential search optimization to find the optimal membership function patterns, which raises the fuzzy controller to a higher level of accuracy. In [25], an ML approach optimizes the support vector machine model parameters and simultaneously locates the best feature subset. Allowing the optimization technique to do the job is the smartest way to improve almost any AI or ML performance [26]. Pre-setting must be handled carefully to guarantee optimal results for almost any application. In DL, and particularly in deep neural networks (DNNs) and ANNs, more hidden layers, more neurons, and more complex activation functions yield better outcomes, but at the cost of more time and greater network complexity [27,28]. Finding the optimum parameter values by trial and error is therefore time-consuming and practically infeasible. From another point of view, an ANN with a human-estimated parameter setup could deliver outcomes, but how can one confirm that these are the best outcomes the ANN can achieve? For these reasons, optimization algorithms can solve these issues, and this review delivers a detailed analysis of various examples of ANN-based optimization techniques. For instance, in [29][30][31], optimization techniques optimize ANN parameters to solve problems in different electricity and communications fields by finding the optimal parameters for the optimum ANN structure.
The rest of this paper is organized as follows: Section 2 presents the materials and methods used. Section 3 addresses the challenges and motivations for ANN-based optimization, while Section 4 presents a review of optimization algorithms and Section 5 addresses neural network structure types. Section 6 is a complete overview of neural networks enhanced by optimization algorithms, Section 7 covers applications of artificial neural network-based optimization algorithms, Section 8 covers artificial neural network training based on optimized parameters, and finally, Section 9 presents the conclusions and future work.

Materials and Methods
A literature survey was done to present, identify, analyze, classify, and review the distinguished ANN-based optimization techniques for various applications and controller enhancement. For this comprehensive review, the survey went through voluminous publisher databases, such as the IEEE Xplore library, Web of Science, Elsevier Scopus, and MDPI open access, where search queries were put into practice to ensure all selected articles meet the essential quality measures: novelty, originality, high impact, and high h-index. Following the guidelines in [32][33][34], an in-depth review was conducted using various keywords to find significant journals within the scope of the research, covering multiple types of neural networks, such as ANN, RNN, CNN, GNN, and many more, and different kinds of optimization techniques, such as PSO, EA, CSA, and many more. In this section, the enhancement of the architecture of the neural network is further evaluated. Different neural network structures are presented, selecting and validating the superiority of using various optimization techniques to search for optimal ANN parameters and comparing the performance [35] to designate the parameters that yield the best comparative performance of the ANN controllers.
Most of the included articles address the focus of this research review, that is, how to boost the performance of neural networks using optimization algorithms by modifying the neural network structure. The screening comprised three stages. Firstly, duplicate articles were excluded, leaving about 433 articles, which were examined in the next stage, where the significant papers were reviewed by looking at their title, keywords, and abstract. This step resulted in 306 documents for additional investigation. The third stage was the eligibility step, in which the full texts of the papers were studied and 219 were counted as eligible for review. Only meaningful and suitable literature has been considered, by evaluating each article's relevant content against the key topics of attention of the review. Accordingly, the related papers were designated based on the number of citations and research interest. This review methodology process comprises several stages, and the PRISMA guidelines according to [36,37] were followed. Figure 1 shows the methodology for utilizing optimization to find optimal parameters of neural networks. A schematic diagram of the review selection process, evaluation, and quality control of the database using the PRISMA guidelines is shown in Figure 2.

Challenges and Motivations for ANN-Based Optimization
Neural networks can study large volumes of data with complex features and extract different patterns in a relatively short amount of time. Therefore, they are useful for many industrial applications, such as predicting certain behaviors, detecting anomalies or errors in data, and detecting certain images, sounds, or pictures. They can use self-learning to produce the best output from an unlimited number of provided inputs [36]. The neural network modeling approach is very flexible and quick to solve problems, and it does not rely on physics-based algorithms to build models. Neural networks are easy to modify based on operator experience and to merge with the ANN structure model. They are good for solving complex non-linear relationships, as their inputs are stored across the network itself rather than in a database row; for that reason, the loss of some data will not disturb the network's operation. The following points list the main motivations associated with the use of ANNs:


•	Ability to accept unlimited inputs and outputs; this unique advantage makes ANNs more important and popular than other AI methods and makes them suitable for both small and huge dataset analysis.
•	Ability to learn and model non-linear and complex relationships; ANNs can handle various real-life applications in different fields that are complicated and non-linear, which is a very significant advantage.
•	Ability to train without complete information; the data may still produce output, and the performance depends on the importance of the missing data.
•	Distinct from other deep learning prediction techniques, ANNs do not enforce any restrictions on the input variables, such as how the data need to be distributed.
•	Ability to create ML: ANNs can learn from events and make wise decisions by commenting on similar events.
•	Multi-processing capability; ANNs can assure numerical efficiency with their power of performing several duties simultaneously.
•	Ability to tolerate faults, whereby ANNs can produce output results even if some cells are corrupted.
•	Ability to generalize; as soon as ANNs learn the initial input relations, they can infer unknown relationships in anonymous data, thus making the model generalized and allowing it to predict unknown data.
•	Use of distributed memory; during ANN learning, an essential process is adjusting the samples and teaching the network according to the desired output by showing these samples to the network. The network succeeds in direct proportion to the selected instances, and if the event cannot be shown to the network in all its features, the network may yield false outputs.
However, although artificial neural networks are considered among the best general problem-solving algorithms, they are very much stochastic: model weights are reorganized in every iteration by the backpropagation of the error signal. ANN performance is good, yet several disadvantages and challenges face ANNs in assuring the proper network structure, duration, and best tuning parameters, requiring trial and error and explanations that must rely on an expert user. The following points express the main challenges for ANNs:
•	Mysterious network behavior: after an ANN produces an analytical result, there is no explanation of why or how these outputs were selected and others rejected, which may make the network untrusted.
•	Appropriate network architecture design: ANNs have no exact law to determine the best structure design, and a proper network structure must be achieved by experience and trial and error.
•	Obscure training duration: the network reduces the error on the samples to a certain level to allow training to complete, so it is not known in advance when the optimum results will be produced.
•	Hardware dependence: ANNs need powerful processors suited to their structure. This drawback is called equipment dependence, since the realization of the whole approach relies on the hardware.
•	Gradual corruption: the process slows down over time and suffers relative degradation; network problems do not degrade the network immediately.
•	Difficulty showing the problem to the network: ANNs work on numerical data, so problems must be translated into numerical values before being introduced to the ANN. The chosen display mechanism depends on the researcher's ability and directly influences the network's performance.
Artificial neural network applications, which have increased dramatically around the world since the middle of the last century, are developing very fast. At present, alongside growing computer capabilities, the advantages of ANNs have been examined, as have the problems users have encountered. However, it is very important not to neglect the disadvantages of ANNs, which, as a developing branch of science, should be eliminated one after another while the advantages of ANNs grow progressively. That is, ANNs are on the way to becoming an increasingly important and indispensable part of our lives. Enhancing ANNs using optimization methods can eliminate some of their disadvantages by picking the best network structure with the proper optimization techniques. The challenge is finding a system coding that enables appropriate tuning of the neural structure, including the best number of neurons, hidden layers, weights, and biases, along with self-shaping architectures and multi-stage objective functions.
It is very important to select and adjust the most suitable neural network parameters for any given application, as there are many possibilities. However, not every neural network can act perfectly in all applications. Some types are more practical in particular applications; for example, CNNs are good for images and videos, while RNNs are good for text and classification problems, so the networks need to be studied and adjusted, and the problems need to be compared and contrasted. To enhance a neural network with optimization, it is important to select the right optimizer for the neural network parameters to obtain the best outputs.
Like other AI algorithms, neural networks can deal with non-linear and complicated problems with a high volume of data. The superiority of neural networks over other AI algorithms lies in their effectiveness with many inputs and outputs. While fuzzy or adaptive neuro-fuzzy inference system (ANFIS) techniques can accept many inputs, they are limited in the number of outputs they can support. Neural networks do not have this limitation, which makes them work better for classification and regression studies.

Review of Optimization Algorithms
An optimization algorithm is an essential tool for selecting the best solution from the set of all possible solutions when analyzing, classifying, or improving existing systems or data. All optimization problems have at least one objective function, and often more. The target is to determine the optimal solution that fulfills the complementarity conditions. Optimization problems are found in numerous scientific areas, such as medicine, engineering, business, and many more. Optimization algorithms are classified into various types: deterministic optimization and global optimization [23,37], continuous optimization [21,38], multi-objective optimization [39][40][41], etc. Overall, each optimization method is designed to serve specific targets. For instance, some optimization problems, such as those with discrete or integer variables, are hard for local algorithms to solve, whereas they are easy for global algorithms. Global optimization algorithms can be categorized as either evolutionary algorithms or deterministic algorithms.
There are hundreds of common optimization techniques in the relevant scientific code archives. The challenge is knowing which technique best suits a particular optimization problem, because some techniques use derivatives while others do not. Conventional methods normally use the first-order derivative of the objective function, while others use the second derivative. The search type is either a direct or a stochastic search for the targeted objective function that yields the function's maximum or minimum output.
Normally, the most popular kind of optimization problem facing neural networks involves continuous function optimization, where the function takes numeric input values and estimates a numeric output value. The more information that is available about the target function, the more accurate the achieved optimization will be. Moreover, for a differentiable function, the derivative can be determined at any sample point in the input search space. Optimization algorithms are, in general, categorized into two groups: deterministic and heuristic algorithms. Deterministic techniques exploit their analytical capabilities, while heuristic techniques are more flexible and efficient, obtaining solutions quickly at the cost of weaker guarantees of finding the global solution. Global optimization algorithms are used to find the global minimum or maximum in complex problems; this is harder than local optimization, but such algorithms handle bound constraints and do not require derivatives. Local and global optimization form a matching set for solving linear, non-linear, quadratic, and least-squares problems, constrained or unconstrained, dense or sparse, forward or reverse communication, continuous, mixed-integer, or integer [42]. Optimization techniques can also be classified according to their underlying principle into biology-based and physics-based algorithms. The first category comprises biology-based algorithms such as the genetic algorithm (GA), harmony search algorithm (HSA), particle swarm optimization (PSO), bacteria foraging optimization (BFO), cuckoo search algorithm (CSA), bee colony algorithm (BCA), ant colony optimization (ACO), firefly algorithm (FA) [43], backtracking search algorithm (BSA), lightning search algorithm (LSA), etc. The second category comprises physics-based algorithms such as simulated annealing (SA), the gravitational search algorithm (GSA), the chaotic optimization algorithm (COA), etc. [44,45]. In this review, some of the most popular optimization algorithms are explained.
The particle swarm optimization algorithm is one of the most popular evolutionary optimization algorithms [46]. The PSO algorithm's principle depends on the velocity and position of particles [47]. In several studies, the PSO algorithm was utilized to automatically design an ANN by improving the synaptic weights, architecture, and transfer function of each neuron [48][49][50][51][52]. Nevertheless, PSO has some drawbacks: it is vulnerable to becoming stuck in local minima, and incorrect selection of its control parameters results in a bad solution. In [48], an ANN-based PSO method was used to predict thermal properties from molecular structure.
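To make the velocity-and-position principle concrete, the following is a minimal sketch of the canonical PSO update rules; the sphere objective, swarm size, and coefficient values are illustrative assumptions and are not taken from any of the reviewed studies.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal canonical PSO: each particle keeps a velocity, remembers its
    personal best, and is attracted toward the swarm's global best."""
    rng = np.random.default_rng(0)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))              # velocities
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity update: inertia + cognitive pull + social pull
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# toy usage: minimize the sphere function
best, val = pso(lambda p: float(np.sum(p**2)), dim=3)
```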
Another popular algorithm is the gravitational search algorithm, a physics-based optimization algorithm inspired by Newton's laws of motion and gravity [49]. The GSA optimization method has been used in several applications to find the best solution for short-term training of feedforward ANNs and to improve their performance [53][54][55][56][57]. In [50], the authors addressed an ANN-based GSA optimization approach to enhance kidney image quality classification for a biomedical application. The study in [51] presented a GSA optimization-based ANN to solve geotechnical engineering issues, improving geogrid-reinforced soil structures.
One optimization algorithm is the neural network algorithm (NNA), which is inspired by the functioning of biological nervous systems and artificial neural networks [52]. The NNA has recently been used in machine learning, as an intelligent controller, in biodiversity assessment, in intelligent feature recognition, and for uncertain data streams; it provides a way of learning features, predicting highly nonlinear functions, and discovering useful hidden representations of the input, because it does not require mathematical models and achieves good prediction for ANNs [58][59][60][61]. However, NNA controllers require massive data and long training and learning times. In [53], an artificial bee colony (ABC) and an NNA were used for intelligent feature recognition in STEP-NC-compliant manufacturing to adjust geometric and topological information. The study in [54] addressed biodiversity assessment based on AI and the NNA.
Another powerful optimizer is the BSA, which generates a trial population and takes partial advantage of its experiences from previous generations. Crossover is applied to the trial population, and the initial trial populations are obtained from mutation. The described benefits of BSA lie in its search exploration process, which has the advantage of using both mutation and crossover strategies. However, it has some limitations: computation is time-consuming because of the dual-population algorithm, only one parameter controls the amplitude of the search-direction matrix in the mutation phase, and the crossover is complex [55][56][57]. In [29][30], BSA was applied in a fuzzy logic speed controller optimization approach for induction motor drives. Deterministic global optimization, in numerical optimization, helps search for the global solutions of optimization problems [42].
The lightning search algorithm was first proposed by Shareef and his colleagues [58]. Afterwards, Ali upgraded it with quantum mechanics theories to generate a quantum-inspired LSA (QLSA) [59]. The LSA optimization approach has been utilized in numerous applications [30,60,61]. The study in [30] described an LSA-based ANN home energy management scheduling controller for residential demand response strategies. The study in [62] proposed a neural network-based LSA to optimize the feedforward learning process on datasets. In [63], the authors addressed finding the optimal Kp and Ki values of an LSA-based PI voltage controller and implementing it in a dSPACE controller. Table 1 lists the advantages and disadvantages of the most popular nature-inspired optimization techniques. However, not all optimization algorithms and their variants provide superior solutions to specific problems. Even though some of the optimization techniques are efficient, they still need further improvement to enhance their performance. Besides, how to speed up the convergence of an algorithm is still a very challenging question, so new nature-inspired optimization techniques must be continuously developed to advance the field of computational intelligence and heuristic optimization [60,61,64-73]. Table 1. Advantages and disadvantages of the most popular nature-inspired optimization techniques.
PSO [46]
Advantages: The capability of solving complex problems in different application domains.
Disadvantages: Easily gets trapped in local minima; improper selection of control parameters leads to a poor solution.

GA [63]
Advantages: Does not require derivative information; suitable for a large number of variables.
Disadvantages: No guarantee of finding the global minimum; long time for convergence; hard to fine-tune all the parameters, such as mutation rate and crossover parameters, which is often done by just trial and error.

BA
Advantages: Obtains good results when dealing with lower-dimensional optimization problems.
Disadvantages: Abrupt switching to the exploitation stage by quickly varying wavelength and pulse emission rate; difficult to solve high-dimensional optimization problems.

ABC [64]
Advantages: Strong robustness; fast convergence and flexibility.
Disadvantages: Premature convergence in the later search period; accuracy that in some cases cannot meet the optimal solution.

LSA [58]
Advantages: Easy to adjust to the problem; usually provides reasonably good performance.
Disadvantages: Premature convergence to a local minimum.

BSA [73]
Advantages: Suitable for the search exploration process; has the advantage of using the mutation and crossover strategies.
Disadvantages: Time-consuming computation because of the dual-population algorithm; only one parameter controls the amplitude of the search-direction matrix in the mutation phase; the crossover is complex.

GSA [67]
Advantages: Faster solution convergence.
Disadvantages: Easily gets trapped in local minima; weakness in its strategy to diversify the algorithm's population.

FA [68]
Advantages: Easy to implement; capable of automatic subdivision and dealing with multimodality.
Disadvantages: Gets trapped in several local minima.

SA
Advantages: Performs local searches.
Disadvantages: Does not memorize the history of better situations, and may end up missing them.

Neural Networks in Deep Learning
Deep learning (DL) is a subset of ML which is based on learning data representations, as opposed to task-specific algorithms. It is inspired by the function and structure of the brain, in the form known as artificial neural networks. The approach utilizes a hierarchy of concepts that assists a computer in building knowledge from experience. This technique does not require knowledge to be provided through human input, as it is gathered automatically. The hierarchy of concepts facilitates breaking complex concepts into simpler ones across several layers [8]. DL techniques use several layers of abstraction to learn when there is more than one processing layer. This approach has found use especially in visual object and speech recognition, as well as genomics and medicine. DL implements a backpropagation approach to detect patterns in complex datasets by considering how the internal parameters should be altered to move from one representation layer to the next. Deep convolutional and recurrent nets have facilitated breakthroughs in image and audio processing as well as text and speech detection, respectively [16,69].
Neural networks have different implementations with slight variations, including RNNs, ANNs, and CNNs [15,16]. Due to their feature engineering and decision boundaries, such novel NN approaches are preferred over classical machine learning by those studying self-driving vehicles, unmanned drones, and other complex deep learning problems [17]. The decision boundary is what classifies a data point as belonging to one of two classes, positive or negative; for this reason, if the data are not separable, neural networks will not be a good choice in deep learning. Feature engineering, on the other hand, is composed of two steps, feature selection and feature extraction, which together make up model building. Multi-layer ANNs are composed of neurons arranged similarly to those in the human brain. Each neuron is connected to other neurons with certain coefficients, and during training, information is distributed to these connection points so that the network structure and functioning can be learned [18].

Neural Networks Structure Types
In deep learning, many neural network types use different principles to determine rules for various applications and form the foundation of most pre-trained models. The most well-known neural networks are the ANN [70], CNN [71], and RNN [72]. In addition, many neural networks have been developed with unique structures to serve different applications. For example, radial basis function neural networks, modular neural networks, multilayer perceptron neural networks, and sequence-to-sequence models use their unique strengths to fit some applications better than other networks. Beyond the DNN [73], another deep learning NN is the so-called graph neural network (GNN), designed for graph data classification problems [74,75]. LSTM recurrent neural network models are excellent for text classification problems. An ANN based on simultaneous optimization techniques was used to model theophylline tablet formulations in [76]. A generative neural network has been used in adjoint electromagnetic simulations [23].

Artificial Neural Networks
An ANN is a cluster of multiple perceptrons or neurons at each layer; when the input data flow in the forward direction, it is called a feed-forward neural network [15,77]. The basic structure of an ANN consists of three kinds of layers: the input layer, the hidden layers, and the output layer. The input layer receives the input data, the hidden layers process the data, and the output layer provides the outcome. Each layer in the neural network attempts to learn specific weights, which are set at the end of the learning process. The ANN approach is good for solving image data, text data, and tabular data problems. The advantage of an ANN is its ability to deal with nonlinear functions and to learn weights that map any input to the output for any data. The activation functions provide nonlinear properties to the ANN, which helps the net learn any complex relation between input and output data; this is known as universal approximation. Many researchers adopt ANNs to solve complex relations, for example, the coexistence of cellular and WiFi networks in an unlicensed spectrum [78].
Another example is the feed-forward probabilistic neural network (PNN) in [79] and the knowledge-based neural network described in [80,81]. In [82] this approach was used for modeling a solar field in direct steam generation parabolic troughs. ANNs are used as optimizers in many research projects to solve bundling problems; for example, in [83] an ANN was used to optimize a flight trajectory for rockets, and in [84] an ANN optimized the design of microwave circuits. Model-aided wireless AI embeds expert knowledge in a DNN to solve wireless system optimization by finding the best ANN architecture [22]. ANNs are also used to optimize and control thin film growth processes [85], and a sampling method was proposed for the optimal design of ANN models [86]. A feedforward neural network optimization was applied to synthesize fault tolerance [87]. ANNs, together with the Xinanjiang model, were employed to explore nonlinear transformations [88]. Optimized artificial neural network models for predicting chlorophyll dynamics were developed to decrease the cost of aquatic environmental in-situ monitoring and increase bloom forecasting accuracy [89]. A crude oil distillation system problem was solved using an ANN by optimizing heat integration [90]. An ANN solved the optimization and extraction of anthocyanins in black rice using orthogonal arrays [91], and ANNs solve optimization problems in traffic light timing controllers [92]. An ANN was also used as an optimizer applied to wave energy converters (WECs) to predict overtopping rates as part of the sustainable optimization of coastal or harbor defense structures, constructing a predictive model for their conversion [1]. The architecture of the artificial neural network is shown in Figure 3. Each neuron's output is an activation function applied to the sum of all weighted inputs, while the neuron's input is the weighted sum including the bias, as shown in Figure 4. The bias is a constant used to adjust the output along with the weighted sum of the inputs to the neuron, while the activation functions are the powerhouse of neural networks [93][94][95][96][97][98][99]. The neural network weights are updated in the back-propagation process using the gradients; in a neural network with many hidden layers, the gradient may vanish or explode during backward propagation [100,101].
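As a concrete illustration of the layered computation just described (a weighted sum plus bias passed through an activation, as in Figures 3 and 4), here is a minimal NumPy sketch of a feed-forward pass; the 2-4-1 layout, tanh activation, and random parameters are illustrative assumptions.

```python
import numpy as np

def dense_forward(x, weights, biases):
    """Forward pass of a simple fully connected ANN: each layer computes
    an activation of (weighted sum of inputs + bias), layer by layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)           # hidden layers: nonlinear activation
    return a @ weights[-1] + biases[-1]  # linear output layer

# hypothetical 2-4-1 network with random parameters
rng = np.random.default_rng(0)
W = [rng.standard_normal((2, 4)), rng.standard_normal((4, 1))]
b = [np.zeros(4), np.zeros(1)]
y = dense_forward(np.array([0.5, -1.0]), W, b)
```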

Recurrent Neural Network
The recurrent neural network architecture belongs to the neural network family, though it displays differences compared to the ANN: a looping constraint on the hidden layer turns the network into an RNN [15]. The feedback loop ensures that the output of the previous step is fed back into the input of each neuron at the next step, as shown in Figure 5. RNNs are normally used to solve problems associated with text data, time-series data, and audio data. Because the same parameters are used across the different time steps, known as parameter sharing, fewer parameters need to be trained [102]. This saves computational time, although the gradient, which is computed at the last step and back-propagated from the last time step to the first, may vanish through the neurons of the RNN. The error at each time step is calculated, allowing the weights to be updated. The Elman neural network (ENN) has similar conceptual properties to RNNs and uses standard back-propagation, known as the Elman backpropagation algorithm (EBP). RNNs are used in many applications for real-world problem-solving, as in [38].
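The looping constraint and parameter sharing can be illustrated with a minimal Elman-style recurrence; the layer sizes, tanh activation, and random parameters below are illustrative assumptions.

```python
import numpy as np

def elman_step(x_t, h_prev, Wx, Wh, b):
    """One Elman/simple-RNN step: the hidden state loops back as an extra
    input, so the same parameters are shared across all time steps."""
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)

rng = np.random.default_rng(0)
Wx = rng.standard_normal((3, 5))   # input-to-hidden weights
Wh = rng.standard_normal((5, 5))   # hidden-to-hidden (feedback) weights
b = np.zeros(5)
h = np.zeros(5)
for x_t in rng.standard_normal((10, 3)):   # a length-10 input sequence
    h = elman_step(x_t, h, Wx, Wh, b)      # same weights reused each step
```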

Convolution Neural Network
A convolutional neural network is a member of the neural network family comparable to a multilayer perceptron (MLP) [93]. The CNN has hidden layers called convolutional layers, as well as other, non-convolutional layers [15,103]. The basic concept of the CNN structure is that convolutional layers take a weighted input, transform it through the neurons' activation function, and pass the result to the next convolutional layer via the convolution operation, as shown in Figure 6. Each convolutional layer specifies a number of filters used to detect patterns in specific object shapes, for example circles, squares, corners, eyes, feathers, etc. These filters help extract the right and relevant features from the input data. The CNN is the most widely used type of neural network for analyzing images. However, image analysis is but one use of CNNs, which can also be used for other data analysis problems such as classification. Most generally, the CNN is a critical neural network specializing in picking out patterns and making sense of them; this pattern detection is what makes CNNs so useful for image analysis. CNN models are used across different applications and domains, especially in image and video processing projects. CNNs are applied to image problems such as multiple-image-based depth estimation, or estimating depths at edges; basically, they are used to classify the edges in backgrounds or reflections [5]. CNNs have been used to detect wildfire smoke images [104] and for forest fire smoke recognition [105].
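The sliding-filter operation of a convolutional layer can be sketched as follows; the 3x3 vertical-edge kernel, the random input, and the ReLU are illustrative assumptions, and real CNN libraries implement this operation far more efficiently.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (strictly, cross-correlation, as in most CNN
    libraries): the filter slides over the input to detect one pattern."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge_filter = np.array([[1., 0., -1.]] * 3)   # responds to vertical edges
image = np.random.default_rng(0).random((8, 8))
feature_map = np.maximum(conv2d(image, edge_filter), 0)   # ReLU activation
```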

Overview of Neural Networks Enhanced by Optimization Algorithms
The optimization technique aims to improve applications by finding the minimal error, minimal cost, and maximum performance and efficiency. Optimization techniques can be categorized into two principal ideologies: physics-based and biology-based [44]. Physics-based algorithms include, for example, the chaotic optimization algorithm (COA), simulated annealing (SA), the gravitational search algorithm, etc., while biology-based algorithms include the genetic algorithm (GA), particle swarm optimization, bacterial foraging optimization, the harmony search algorithm, the cuckoo search algorithm, ant colony optimization, the dolphin swarm algorithm (DSA), the bee colony algorithm, the firefly algorithm, the LSA, the backtracking search algorithm, etc. [46]. In this review, some common optimization algorithms that enhance the performance of neural networks are discussed in detail in the following subsections.

Artificial Neural Networks Based Particle Swarm Optimization
The PSO method was first discovered by Eberhart and Kennedy in 1995, inspired by the movement of organisms such as flocking birds and schooling fish [106]. PSO uses a velocity vector to update each particle's current position in the swarm [107]. The PSO-based neural network is used extensively compared to the other algorithms and has been applied by many researchers in different applications. For example, it is used to solve the mathematical problem of predicting the uniaxial compressive strength of rock samples from different states in Malaysia [108]. This combination of ANN-based PSO is also used for detecting trip purposes from smartphone-based travel surveys of GPS data [109]. ANN-based PSO has been used smartly to improve the prediction performance model for Wi-Fi indoor localization strategies by reducing the maximum location error, with astonishing results [110]. In [111], PSO was used to design a dynamic modular neural network based on adaptive PSO to solve a problem related to a subnetwork output. Optimization enhances many algorithms and applications to solve complex linear and nonlinear problems; for example, an efficient PSO-based ANN was utilized for the nonlinear mathematical model of Troesch's problem, where PSO obtained a unique numerical solution by optimizing the weights of the final network [112]. Network weight optimization is very popular, whether for the initial weights or the entire network's weights. For example, in [72], optimization is used to find the best weights in a self-adaptive parameters and strategy-based PSO (SPS-PSO) algorithm to optimize feedforward NN (FNN) design. Again, weight optimization using ANN-based PSO solves a non-linear channel equalization problem in [113], and in [49], PSO weight optimization automatically designs an ANN methodology. Using PSO to search for hyperparameters is also widely discussed and tested in many studies, and the outcomes improve many applications. For example, a CNN-based PSO optimized the hyperparameters linearly to decrease the CNN weights in the final network [114]. PSO optimization also boosts neural networks by searching for the optimal hyperparameters for network architecture design in [115]. A PSO-based deep NN was used to optimize the number of hidden layer nodes for digital modulation recognition applications. Another study [116] discusses optimizing the number of hidden layer nodes for global solar irradiance prediction over extremely short time intervals with hybrid backpropagation neural networks based on PSO optimization. Table 2 presents some examples of PSO research for neural network architectures focusing on weights and hidden-layer neuron optimization problems. The combination of PSO optimization and neural networks is the most common pairing of optimization algorithms and AI and is used in much application software and many controllers. There is much ongoing research on this combination; for example, a PSO-based ANN was used to enhance software reliability forecasting [121], while in [122], one was used for data-based fault-tolerant control. PSO assists different types of neural networks in different ways. For example, a PSO-based BP neural network used to solve big-data mining problems associated with financial risk management in the Internet of Things (IoT) constructs a nonlinear parallel optimization model [3].
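To show the pattern shared by many of the studies above, the following is a minimal sketch of PSO searching over two ANN hyperparameters, the number of hidden neurons and the learning rate; it assumes scikit-learn's MLPRegressor and a synthetic dataset, and the bounds, swarm size, and coefficient values are illustrative, not taken from any cited paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# toy data standing in for a real application dataset
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (300, 4))
y = np.sin(X).sum(axis=1)
Xtr, Xva, ytr, yva = train_test_split(X, y, random_state=0)

def fitness(p):
    """Validation MSE of an MLP whose hyperparameters come from particle p:
    p[0] -> number of hidden neurons, p[1] -> learning rate (log10 scale)."""
    net = MLPRegressor(hidden_layer_sizes=(int(round(p[0])),),
                       learning_rate_init=10 ** p[1],
                       max_iter=300, random_state=0)
    net.fit(Xtr, ytr)
    return np.mean((net.predict(Xva) - yva) ** 2)

# PSO over the 2-D hyperparameter space: neurons in [2, 50], lr in [1e-4, 1e-1]
lo, hi = np.array([2.0, -4.0]), np.array([50.0, -1.0])
x = rng.uniform(lo, hi, (10, 2))
v = np.zeros_like(x)
pb, pbv = x.copy(), np.array([fitness(p) for p in x])
gb = pb[pbv.argmin()].copy()
for _ in range(15):
    r1, r2 = rng.random((2, *x.shape))
    v = 0.7 * v + 1.5 * r1 * (pb - x) + 1.5 * r2 * (gb - x)
    x = np.clip(x + v, lo, hi)
    vals = np.array([fitness(p) for p in x])
    better = vals < pbv
    pb[better], pbv[better] = x[better], vals[better]
    gb = pb[pbv.argmin()].copy()
print("best hidden neurons:", int(round(gb[0])), "best lr:", 10 ** gb[1])
```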
Some applications have been done on a giant scale; for example, Kambara reactor desulfurization used a combination of ANN-based optimization techniques and a simulated annealing algorithm with PSO (SAPSO) to determine optimal structural parameters, such as the number of hidden layers, neurons, and activation functions, in training to solve desulfurization model performance problems [117]. In [118], the issue of ship motion attitude prediction was solved by using the adaptive dynamic PSO (ADPSO) algorithm and bidirectional long short-term memory (BiLSTM), searching for the hyperparameters of the BiLSTM neural networks. In [119], interval type-2 fuzzy neural networks (IT2FNNs) based on PSO and a big bang big crunch (BBBC) functional for parameter optimization were used for a Takagi-Sugeno-Kang type problem. Sadik and his co-workers successfully used a hybrid PSO-ANN algorithm for indoor and outdoor track cycling wireless sensor localization, where the algorithm improved the distance estimation accuracy of mobile nodes [29].
PSO optimization with AI saves lives in many biomedical applications and supports many smart applications in hospitals, clinics, and therapy by assisting smart diagnoses or smart robots. Some applications in related areas can be highlighted; for example, in [123], a hybrid ANN-PSO is used for predicting airblast overpressure by estimating quarry blasting and influential parameters at four granite quarry sites in Malaysia. In [124], ANN-PSO is used to manage groundwater resources to solve the groundwater management problems of France's Dore river basin, whereas in Western Australia, short-term traffic flow predictors based on intelligent swarm PSO-based ANNs were used for forecasting traffic flow conditions on a section of freeway [125]. In [126], a functional-link-based neural fuzzy network (FLNFN) based on hybrid cooperative PSO and a cultural algorithm was proposed for solving problems related to orthogonal polynomials and linearly independent functions in the functional expansion of functional link neural networks. In [127], PSO enhanced with a periodic mutation strategy (PMS) and neural networks with a mutation application strategy and diversity variety were used for solving problems of an airfoil in transonic flow. A photovoltaic thermal nanofluid-based collector system used an ANN and PSO to solve a complex non-linear relationship between input and output parameters [128]. Conversely, some researchers have used a neural network to improve the PSO search performance [129][130][131]. Improved PSOs revolve around feed-forward ANNs, as in [31], which presents a unique evolutionary ANN algorithm called IPSONet. In [132], a neural network with a fuzzy algorithm and PSO is used as a brain-computer interface classifier for wheelchair commands, where PSO with a cross-mutated-based ANN (FPSOCM-ANN) performs the optimization. A PSO combined with an ANN for data classification, the opposition-based PSO neural network (OPSONN) algorithm, was used for NN training to solve data classification problems [133]. Taguchi PSO solves high-dimensional global numerical optimization problems in ANN design concerning the tensile strength of steel bars [131]. A nonlinear neural network predictive control strategy based on tent-map chaotic PSO (TCPSO) was used to achieve nonlinear optimization with improved convergence and high accuracy [129]. The ANN is the most common neural network and PSO is the most common optimization method; for that reason, they have been combined and compared in some cases with other AI or optimization techniques. For example, ANNs were trained with hybrid PSO and cuckoo search (PSO-SC) algorithms, adopting feedforward neural networks (FNNs) to solve algorithm performance problems [130]. Table 3 presents studies involving PSO for neural network design and application enhancement.

Artificial Neural Networks-Based Genetic Algorithms
Holland first introduced the genetic algorithm concept in 1975. It is a stochastic global adaptive search optimization technique based on the mechanisms of natural selection [134]. The GA solves optimization problems by applying a series of crossover, mutation, and fitness evaluations to multiple chromosomes. The algorithm is initialized with a population containing several chromosomes, each of which represents a candidate solution of the problem and is evaluated by an objective function [87]. Many researchers use the GA for different applications. Some such research concerns renewable energy applications, such as maximum power point tracking for PV and wind systems, to improve distribution systems' reliability and power quality [135]. Ongoing studies focus more on the GA for enhancing the ANN than on other neural networks in comparison. For example, the GA is used for outline capturing using rational functions and an ANN to solve energy management applications such as scheduling and economic dispatch [136]. It is also used for solving reliability problems of structural laminated composite materials [137]. In [138], it solves bankruptcy prediction problems, while the same combination is used to solve circular tubes with functionally graded thickness problems under multiple-objective crashworthiness optimization [139].
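A minimal real-coded GA loop showing the selection, crossover, mutation, and fitness-evaluation cycle described above; the tournament selection, uniform crossover, Gaussian mutation, and sphere objective are illustrative assumptions rather than the setup of any cited study.

```python
import numpy as np

def ga(objective, dim, pop_size=40, gens=100, mut_rate=0.1, bounds=(-5, 5)):
    """Minimal real-coded GA: tournament selection, uniform crossover,
    Gaussian mutation, and fitness evaluation each generation."""
    rng = np.random.default_rng(0)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (pop_size, dim))
    for _ in range(gens):
        fit = np.array([objective(c) for c in pop])
        def tournament():
            i, j = rng.integers(pop_size, size=2)
            return pop[i] if fit[i] < fit[j] else pop[j]
        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            mask = rng.random(dim) < 0.5            # uniform crossover
            child = np.where(mask, p1, p2)
            mutate = rng.random(dim) < mut_rate     # Gaussian mutation
            child = np.clip(child + mutate * rng.normal(0, 0.5, dim), lo, hi)
            children.append(child)
        pop = np.array(children)
    fit = np.array([objective(c) for c in pop])
    return pop[fit.argmin()], fit.min()

best, val = ga(lambda c: float(np.sum(c**2)), dim=3)   # toy usage
```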
ANN-based GA is applied in many ways; some relate to optimizing the ANN structure design. For example, in [140], an ANN used a GA to optimize parameters, determining the number of hidden neurons, the bias values, and the connection weights between nodes to solve time series forecasting problems. It is also used for weight optimization of a pre-specified neural network applied to a mobile ad-hoc network [141]. GA-based ANNs are used for solving many issues, such as producing spectra for prediction, parameter fitting, inverse design, and performance, to design network architectures and select optimal hyperparameters [142]. In the same way, a GA is used in a heat transfer study in [143] to determine suitable parameters for maximum weight reduction. A GA is also used to select optimal network parameters for a deep-NN model architecture to model prospective university students' admission [144].
Much research merges these two smart concepts for many applications and classification problems. In [145], an ANN hybridized with a GA was used to optimize lipase production from Penicillium roqueforti ATCC 10110 in solid-state fermentation. A multi-layer ANN united with a GA was employed to solve problems of pectinase-assisted extraction of cashew apple juice [146]. Some studies address parametric study problems of the transcritical power cycle and regenerator by selecting objective functions for parametric optimization [147]. In [148], a computational fluid dynamics problem of nanofluid flow in flat tubes was solved using multi-objective ANN optimization and the non-dominated sorting GA (NSGA). Other examples include decoupling capacitor placement on a power delivery network and analog circuit design space exploration for automated sizing of integrated circuits [149].
In some cases, the neural network works with more than one optimization technique, for either comparison or combination reasons. For example, the GA and PSO work together on an ANN to find the best values of the rational functions' parameters for optimizing surface roughness [12]. In [150,151], the GA is used with an Adadelta DNN (GA-ADNN) to predict comprehensive pantograph and catenary monitor status models. Table 4 presents different studies involving the GA for neural network design and application enhancement. In [152], a hybrid PSO with GA for ANN training for short-term load forecasting and GA optimization was used to solve power grid investment risk problems by optimizing the weight and threshold of the BP neural network, while in [153] a GA was applied to three neural networks, a multilayer perceptron (MLP), a radial basis function neural network (RBFNN), and a GA-derived generalized regression neural network (GRNN), to discover the optimal weights for predicting groundwater salinity.

Artificial Neural Networks-Based Artificial Bee Colony
Many optimization techniques are used to optimize neural networks by finding the values of the linkage weights, either alone or together with the biases and the neurons in hidden layers. Many researchers have considered the ABC for boosting neural network performance, either by optimizing the hyperparameters or by merging it in some way to enhance the neural network or its applications. An example of improving the ANN is an efficient model based on the ABC optimization algorithm with neural networks [154]. The ABC algorithm uses an alternative learning scheme to optimize the neuron connection weights in the design of ANN structures used for electric load forecasting, obtaining an optimized set of neuron connection weights [155]. In [156], intrusion detection for cloud computing uses ANNs with an ABC and fuzzy logic to identify normal and abnormal network traffic packets by optimizing the values of the linkage weights and biases. Deep neural networks are good for classification problems, and some studies use the ABC algorithm with DNNs; for example, in [157], the ABC algorithm searches for the hybridization parameters of a DNN structure comprising autoencoder layers cascaded to a softmax classification layer.
A modular neural network model based on the ABC algorithm was also presented for electric load forecasting with synaptic weight optimization [158]. On the other hand, some research merges neural networks with the ABC to solve specific problems; for example, one study combined a swarm-inspired algorithm with an ANN to protect against dual attacks, using the ANN as a deep learning algorithm together with the swarm-based ABC optimization technique [8]. Table 5 lists studies involving the ABC for neural network design and application enhancement.
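For reference, a minimal sketch of the ABC search cycle (employed, onlooker, and scout phases); the sphere objective and parameter values are illustrative assumptions. To train ANN weights, as in the studies above, the objective would instead decode each food-source vector into network weights and return the training error.

```python
import numpy as np

def abc(objective, dim, n_food=20, iters=100, limit=20, bounds=(-5, 5)):
    """Minimal artificial bee colony: employed bees perturb food sources,
    onlookers favor good sources, scouts replace exhausted ones."""
    rng = np.random.default_rng(0)
    lo, hi = bounds
    food = rng.uniform(lo, hi, (n_food, dim))
    fit = np.array([objective(f) for f in food])
    trials = np.zeros(n_food)

    def try_neighbor(i):
        k, j = rng.integers(n_food), rng.integers(dim)
        cand = food[i].copy()
        cand[j] += rng.uniform(-1, 1) * (food[i, j] - food[k, j])
        cand = np.clip(cand, lo, hi)
        c = objective(cand)
        if c < fit[i]:
            food[i], fit[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_food):                        # employed bee phase
            try_neighbor(i)
        probs = fit.max() - fit + 1e-12                # lower cost -> higher prob
        probs /= probs.sum()
        for i in rng.choice(n_food, n_food, p=probs):  # onlooker phase
            try_neighbor(i)
        for i in np.where(trials > limit)[0]:          # scout phase
            food[i] = rng.uniform(lo, hi, dim)
            fit[i], trials[i] = objective(food[i]), 0
    return food[fit.argmin()], fit.min()

best, val = abc(lambda f: float(np.sum(f**2)), dim=3)   # toy usage
```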

Artificial Neural Networks Based Evolutionary Algorithm
Most of the research discussing neural network-based evolutionary algorithms improves the neural networks' design, either to reduce the training time or to solve problems encountered by ANNs [159]. This combination is used in many applications to solve different problems. One example is the adaptive co-optimization of ANNs using EAs for global radiation forecasting with hybrid ANN models, which predicted monthly radiation from typical weather and geographic data; the adaptive EAs were utilized to train the neural networks and improve prediction performance [160]. Another study used multiverse optimization, a new natural EA, together with an ANN to develop advanced detection approaches for intrusion detection systems [161]. The combined effort between ANNs and EAs is reported in many research studies, yet only the most significant research is considered in this review. For example, in a correlation analysis of the training process, self-organization combined with a genetic EA is applied to boost the performance and efficiency of built neural network structures [162], while another study evaluated a model-based optimization process for high voltage alternating current systems [163]. Some studies use EA optimization for ANN weight optimization; this unique combination is used in mobile communications to solve weight optimization problems in ANN optimal modeling by applying a framework for predicting received signal strength [164]. In the chemistry field, the EA introduces chemical reaction optimization (CRO), used as a global optimization technique to replace BP in training neural networks [165], for better performance and a shorter training process. An EA-based optimization technique, based on training, approximates the solution of fractional differential equations [166]. Table 6 presents research involving EAs for neural network structure design and application enhancement.

Artificial Neural Networks-Based Backtracking Search Algorithm
The BSA optimization technique is an evolutionary computation technique that produces a trial population using two new crossover and mutation operators, proposed by [62]. BSA dominates the search for the best value among the populations and searches within the space boundary, providing very robust exploration and exploitation capabilities [62]. Considerable research has proven it to be one of the most powerful optimization techniques [62]. Numerous researchers use BSA widely in modern applications, such as estimating the state of charge of lithium-ion batteries by improving a backpropagation neural network (BPNN) through optimization of the number of hidden layer neurons and the learning rate [167]. BSA improved a neural network with random weights (NNRWs) by combining BSA with the NNRWs to optimize the hidden layer parameters of a single-layer feed-forward network (SLFN), with the NNRWs used to derive the output layer weights [168]. In [169], a modified BSA (MBSA) was improved by learning and niching together with ANN training, and in [170] an ANN prediction method based on adaptive BSA was used for optimizing the connection weight matrix of the echo state network reservoir. These studies involving BSA for neural network design and application enhancement, toward the best neural network structure, boost the performance level and reduce time-consuming network setup. Table 7 presents different research projects involving BSA for design and application enhancement.
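A simplified sketch of the BSA mechanics described above, with the historical population, the single mutation-amplitude parameter F, and the crossover map; details such as the memory-refresh rule are condensed, and the objective and parameter values are illustrative assumptions.

```python
import numpy as np

def bsa(objective, dim, pop_size=30, iters=100, bounds=(-5, 5), mixrate=1.0):
    """Simplified backtracking search algorithm: a historical (old) population
    steers the mutation, then a crossover map mixes mutant and parent genes."""
    rng = np.random.default_rng(0)
    lo, hi = bounds
    P = rng.uniform(lo, hi, (pop_size, dim))
    oldP = rng.uniform(lo, hi, (pop_size, dim))
    fit = np.array([objective(p) for p in P])
    for _ in range(iters):
        if rng.random() < rng.random():          # occasionally refresh memory
            oldP = P.copy()
        oldP = oldP[rng.permutation(pop_size)]   # shuffle historical population
        F = 3.0 * rng.standard_normal()          # single mutation-amplitude parameter
        mutant = P + F * (oldP - P)              # mutation phase
        mask = rng.random((pop_size, dim)) < (mixrate * rng.random())
        mask[np.arange(pop_size), rng.integers(dim, size=pop_size)] = True
        trial = np.clip(np.where(mask, mutant, P), lo, hi)   # crossover phase
        tfit = np.array([objective(t) for t in trial])
        better = tfit < fit                      # greedy selection
        P[better], fit[better] = trial[better], tfit[better]
    return P[fit.argmin()], fit.min()

best, val = bsa(lambda p: float(np.sum(p**2)), dim=3)   # toy usage
```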

Artificial Neural Networks Based Other Optimization Search Algorithms
Neural network-based optimization algorithms are a hot topic in the research field. Many algorithms have been studied in the past ten years, and their combination has become very attractive because of the incredible outcomes of such merging or enhancement. As a result, many studies have been conducted across different applications; in this section, some significant research is investigated to highlight the importance of enhancing neural networks with optimization. A short-term wind speed forecasting prediction problem was solved by an ANN hybridized with crisscross optimization in [172]. An ANN also solves the reliability-based design problem of double-loop reliability-based optimization approaches [173]. Deterministic global optimization and an ANN were used to obtain the convex and concave envelopes of the nonlinear activation function in [4]. A graph neural network called RouteNet solves complex relationships between topology, routing, and input traffic to produce accurate estimates in [81].
This section focuses on mixing different types of neural networks with other optimization algorithm techniques. For example, FNN training employs a symbiotic organisms search (SOS) algorithm to solve UCI machine learning repository problems [84]. An ANN model using the teaching-learning-based optimization algorithm (TLBO) estimates energy consumption in Turkey [174]. ANNs with ant colony optimization (ACO) assess residential buildings' performance by training the NN with ACO instead of the BP algorithm [175]. In [176], social spider optimization was used to improve the training phase of an ANN with multilayer perceptrons in the context of Parkinson's disease recognition. For dynamic optimization problems (DOPs), a neural network-based information transfer method (NNIT) was used to solve issues associated with environmental changes in [177]. An automated optimization-oriented strategy for designing high power amplifiers uses DNNs with a deep learning regression network and electromagnetic-based Thompson sampling efficient multi-objective optimization (TSEMO) [178]. Another continuous optimization approach based on deep RNNs uses metaheuristic algorithms to address difficult optimization problems involving the noise-to-signal ratio [40]. A neural network for numerous learning problems uses a backpropagation (BP) method known as correntropy-based conjugate gradient BP (CCG-BP) in [179]. A DNN-based secure precoding scheme, the deep AN scheme, solves artificial noise scheme problems in multiple-input single-output (MISO) wiretap channels [180]. A deep CNN (DCNN) structure modeling approach for reconstruction enhancement and decreased online prediction time in ANNs is used for anthropomorphic manipulators in [181]. In [182], three DNNs, a deep multilayer perceptron (DMLP), a long short-term memory (LSTM) neural network, and a CNN, were used to build prediction-based portfolio optimization models for the Chinese stock market; this combination achieved optimal prediction without optimization in comparison with the other studies. A hybrid method for electricity price forecasting combines an ANN and the artificial cooperative search algorithm (ACS) for the combination of mutual information and neural networks in [183]. Table 8 presents research involving various optimization techniques based on neural network design and application enhancement. The following examples enhance neural networks by optimizing their weight connections. For example, for the prediction of time series, parameter-free simplified swarm optimization (SSO) adjusts the weights in the ANN model [184]. ANN-based biogeography-based optimization (BBO) also solved electrical energy forecasting problems for long-term forecasting of India's sector-wise electrical energy demand [185]. An enhanced ANN with a shuffled complex evolutionary global optimization algorithm with principal component analysis-University of California Irvine (SP-UCI) was used for the weight training of a feedforward ANN [186]. Another example of weight linkage optimization is a metaheuristic, the bird mating optimizer (BMO), used to train feedforward ANNs in [21]. Also, a quantum-based algorithm was used to design an ANN with few connections and high classification performance by simultaneously optimizing the network structure and the connection weights [187].
Neural network training with a weighting mechanism-based optimization algorithm was used to resolve some algorithms' undesirable convergence behavior and improve Adam and AMSGrad [188]. A unified automated model generation algorithm uses optimization to automatically determine the type and topology of the mapping structure in a knowledge-based neural network model to force some weights of the mapping neural networks to zeros while leaving other weights nonzeros optimized in [88]. An Elman neural network was used to train the connection weights between the layers based on a whale optimization algorithm (WOA) to solve the problem of falling into local best solutions [189]. Another optimization of connection weights in neural networks using the WOA for training ANN and verified by comparisons with BP algorithm other evolutionary techniques was described in [190]. An evolutionary nonlinear adaptive filter approach via cat swarm functional link ANN (CS-FLANN) was employed for solving unwanted noise problems by picking the optimum weights of NN filters in [191]. Cat swarm optimization (CSO) was also used to train the ANN for structure design by simultaneously optimizing the connection weights [192]. A calibration method was done to improve the robot positional accuracy of industrial manipulators using a teaching-learning-based optimization (TLBO) method to optimize the weights and bias in ANN in [193]. ANNs based sparse optimization simultaneously estimates the weights and model structure of an ANN in [194]. Table 9 lists optimizationbased neural network weights optimization enhancements. To resolve the undesirable convergence behavior by weighting mechanism-based first-order gradient descent optimization

Table 9. Optimization-based neural network weight optimization enhancements.

Network [Ref] | Optimizer | Purpose | Application
NN [188] | Weighting mechanism-based first-order gradient descent | To resolve undesirable convergence behavior (improving Adam and AMSGrad) | Effective neural network training
Knowledge-based NN [88] | l1 optimizer | To force some weights of the NNs to zeros while leaving other weights as non-zeros | Unified automated parametric modeling algorithm
Elman NN [189] | WOA | To train the connection weights between the layers | Network soft-sensor model of conversion velocity in a polymerization process
ANN [190] | WOA | To optimize connection weights (controlling parameters, weights, and biases) | ANN structure design
FLANN [191] | CSO | To select the optimum weights of the neural network filter | Gaussian noise removal from tomography images
ANN [192] | CSO & OBD | To optimize the connection weights | ANN structure design
ANN [193] | TLBO | To optimize the weights and biases of the NN | Robot manipulator calibration

The following studies overview some examples of optimization enhancement of various neural network parameters (hidden layers, learning rate, neurons, and weights). For example, a hybrid lightning search algorithm (LSA)-based ANN can predict the optimal ON/OFF status of home appliances for home energy management by tuning the learning-rate value and the number of nodes in the hidden layers [30]. Also, in [195], FNNs are trained with artificial fish swarm optimization (AFSA) to replace the BP process in the ANN. A multi-objective DNN approach was used for learning the connecting structure of DNNs, particularly the layer-wise structure learning method, in [80]. The diffGrad optimization technique for CNNs, based on the difference between the present and the immediate past gradient, addresses a problem with basic stochastic gradient descent (SGD) in [103]. Bayesian optimization (BayesOpt), a machine-learning-based global optimization technique, was used to solve a simple objective function problem in CNNs [196]. A systematic quantitative and qualitative analysis with guidelines was provided for CNN-based Ben's spiker algorithm [197]. In [198], the microcanonical optimization algorithm (MOA), a variant of simulated annealing, is used to select the best hyperparameter architecture for a CNN. DNNs with stochastic optimization acceleration update the network parameters to solve PID controller problems in [199]. A hybrid neuro-fuzzy network based on differential biogeography-based optimization (DBBO) for online population classification in earthquakes searches for the best parameters of the main network and the subnetwork [200]. An adaptive memetic algorithm with rank-based mutation (AMARM) is used to design ANN architectures by simultaneously fine-tuning the number of hidden neurons and the connection weights in [201].
For ANN-based path loss prediction in wireless communication networks, a multilayer perceptron (MLP) neural network generates low-dimensional environmental features and eliminates redundant information among similar environment types [202]. Table 10 overviews various optimization-based enhancements of neural network parameters (hidden layers, learning rate, neurons).
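As an illustration of the diffGrad idea cited above [103], the sketch below scales an Adam-style step by a friction coefficient computed from the difference between the previous and current gradient. The test function and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a diffGrad-style update: an Adam step scaled by a
# "friction" coefficient derived from the difference between the previous
# and current gradient, as described for diffGrad [103]. The test function
# and hyperparameters are illustrative assumptions.
def diffgrad(grad_fn, theta, lr=0.01, beta1=0.9, beta2=0.999,
             eps=1e-8, steps=2000):
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    g_prev = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)          # bias correction, as in Adam
        v_hat = v / (1 - beta2 ** t)
        xi = 1.0 / (1.0 + np.exp(-np.abs(g_prev - g)))  # friction coefficient
        theta = theta - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
        g_prev = g
    return theta

# Usage on a toy quadratic with minimum at [1, -2].
grad = lambda x: 2.0 * (x - np.array([1.0, -2.0]))
print(diffgrad(grad, np.zeros(2)))
```

When successive gradients barely change, the friction coefficient shrinks the step, damping the oscillation that plain SGD and Adam can exhibit near a minimum.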

Optimization Search Algorithm-Based Artificial Neural Networks
In this subsection, neural networks act as optimizers for optimization techniques, tuning algorithm parameters. For example, in one study, the fitness function value of a pressurized water reactor core pattern optimization was optimized using a grey wolf optimizer (GWO)-based ANN to find the best configuration of fuel assemblies [203], while in [204], parameter prediction using an ANN as a tool for resistance spot welding (RSW) parameter optimization was described, addressing the sensitivity of exact measurement for aluminum alloys. A deep-learning-accelerated topology optimization (TO) study learned a cross-sectional image of an interior permanent magnet motor, represented in RGB, and trained a CNN to infer the torque properties, decreasing the computational cost of TO [11]. Table 11 presents studies involving neural networks for improving optimization technique design and application enhancement.
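The common pattern in these studies is surrogate-assisted optimization: a network is fitted to samples of an expensive objective, and the cheap surrogate is then searched in its place. The sketch below illustrates that pattern; the "expensive" objective, network size, and training settings are stand-ins rather than any cited study's setup.

```python
import numpy as np

# Sketch of surrogate-assisted optimization in the spirit of [11,203,204]:
# a small network is fitted to samples of an expensive objective, then the
# cheap surrogate is searched instead of the true function. The "expensive"
# objective here is a stand-in; all sizes and rates are assumptions.
rng = np.random.default_rng(1)

def expensive_objective(x):           # stand-in for a costly simulation
    return np.sin(3 * x) + 0.5 * x ** 2

# Sample the expensive function sparsely.
X = rng.uniform(-2, 2, (40, 1))
y = expensive_objective(X)

# One-hidden-layer surrogate trained by plain gradient descent.
W1 = rng.normal(0, 1, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 1, (16, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y                    # MSE gradient w.r.t. pred, up to 2/N
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

def surrogate(x):
    return np.tanh(x @ W1 + b1) @ W2 + b2

# Dense search on the cheap surrogate instead of the expensive function.
cand = np.linspace(-2, 2, 2001).reshape(-1, 1)
best = cand[surrogate(cand).argmin()]
print("surrogate minimum near x =", best[0],
      "true value there:", expensive_objective(best)[0])
```

Once trained, each surrogate evaluation is a few matrix products, so the outer optimizer can afford thousands of candidate evaluations that would be infeasible against the true simulation.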

Applications of Artificial Neural Network-Based Optimization Algorithms
Previous research on optimal scheduling controllers was developed for energy management, reliable power generation, cost minimization, and carbon emission calculation [64]. A binary BSA (BBSA) and binary PSO (BPSO) are utilized to search for optimal binary schedules [120,205]. These techniques offer powerful optimization capability, a strong search exploration process, and faster convergence to the solution than other conventional optimization techniques, and they overcome local minima traps. Besides, developing enhanced ANN-based BBSA and ANN-based BPSO schedule controllers ensures the best performance across different load conditions [206,207]. The ANN serves as a prediction technique to find the best weight values for neural nets designed for efficient system operation. In this paper, these nets produce the optimum ON/OFF status by training on input and output data patterns obtained from the scheduling controllers [206,208]. This section presents the implementation of the ANN-based optimization algorithms ANN-PSO, ANN-GA, ANN-ABC, and ANN-BSA, respectively, to search for the optimal number of nodes in hidden layers 1 and 2 as well as the best value of the learning rate. The algorithms apply limits, e.g., the maximum and minimum number of nodes in each hidden layer and of the learning rate. The output data is arranged as a binary schedule (25 × 24) obtained from the scheduling controller. The input data includes six inputs: solar irradiance, wind speed, energy price, battery status, grid status, and diesel fuel status; refer to [206,208]. In all the ANN-based algorithms, the number of iterations is set to 100, and the population size to 20. Table 12 presents a brief list describing the data and the limits of the aforementioned algorithm techniques. The mean absolute error (MAE) is the objective function that enhances the ANN performance by decreasing the error, as expressed in the general flow chart of the optimization of the ANN's optimal parameters shown in Figure 7. All the inputs and outputs of the ANN-based optimization algorithm training for the virtual power plant system in [208] can be expressed by Equations (1) and (2). Deep feed-forward ANN structures have been adopted in this study, with trainlm as the network training function; trainlm updates weight and bias values according to Levenberg-Marquardt optimization and is considered the fastest backpropagation algorithm in the Matlab toolbox, although it requires more memory than other algorithms. Two hidden layers with the sigmoid activation function are used, and the optimization algorithms are set to search for the number of nodes in both hidden layers as well as the optimal value of the learning rate. This optimization process uses random trial values in ANN training based on the aforementioned input and output data; the optimal trial is the one with the minimum MAE. The trials for each optimization algorithm were run separately, and each optimization takes days to come up with the best parameters. All these algorithms observe the search limits for the set of trials presented in Table 12. The algorithms use random ANN trial parameters as the initial step; each iteration then includes ANN training for 10,000 epochs to evaluate the objective function. As Figure 8 shows, the sizes of the input and output layers are fixed by the data [209].
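A minimal sketch of this search loop is given below, with scikit-learn's MLPRegressor as a stand-in for Matlab's trainlm network: a population-based optimizer proposes (nodes in hidden layer 1, nodes in hidden layer 2, learning rate), each trial trains an ANN, and the MAE objective scores it. The synthetic data, bounds, and scaled-down population and iteration counts are illustrative assumptions, and plain random search stands in for PSO/GA/ABC/BSA.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

# Hedged sketch of the hyperparameter search described above. The paper
# uses Matlab's trainlm over 10,000 epochs with population 20 and 100
# iterations; the data, bounds, and scaled-down counts here are assumptions.
rng = np.random.default_rng(0)
X = rng.random((200, 6))                        # six inputs, as in the study
y = (X.sum(axis=1) > 3).astype(float)           # stand-in binary schedule

BOUNDS = {"n1": (2, 30), "n2": (2, 30), "lr": (1e-4, 1e-1)}

def trial_mae(n1, n2, lr):
    """Train one candidate two-hidden-layer ANN and return its MAE."""
    net = MLPRegressor(hidden_layer_sizes=(n1, n2), activation="logistic",
                       learning_rate_init=lr, max_iter=500, random_state=0)
    net.fit(X, y)
    return mean_absolute_error(y, net.predict(X))

# Generic population loop (random search stands in for PSO/GA/ABC/BSA).
pop_size, iterations, best = 5, 10, (np.inf, None)
for _ in range(iterations):
    for _ in range(pop_size):
        n1 = int(rng.integers(*BOUNDS["n1"]))
        n2 = int(rng.integers(*BOUNDS["n2"]))
        lr = 10 ** rng.uniform(np.log10(BOUNDS["lr"][0]),
                               np.log10(BOUNDS["lr"][1]))
        mae = trial_mae(n1, n2, lr)
        if mae < best[0]:
            best = (mae, (n1, n2, lr))

print("best MAE:", best[0], "with (n1, n2, lr) =", best[1])
```

Each candidate evaluation is a full network training run, which is why the real searches take days: the outer optimizer's cost is dominated by the inner 10,000-epoch trainings.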
The duration of each ANN training run is unpredictable and can be long or short depending on the trial points of the ANN training. Training can show good or bad performance from the very early stages, but this is not conclusive, because the behavior sometimes changes, improving or stalling in the middle or at the end of training.
Figure 8. ANN architecture based on the PSO, GA, ABC, and BSA algorithms, using input and output data obtained from [208].

In this study, the BSA objective of enhancing the ANN structure toward optimal parameters was the best among the tested techniques, minimizing the MAE to 0.0062 [210]. The GA objective was 0.0080, not far from the BSA result. The MAE values of the PSO and the ABC were higher, at 0.0144 and 0.0172, respectively, as shown in Figure 9. BSA's main mechanism is its crossover, which consists of two parts: the first part generates the binary map matrix, and the second part combines the population X(i,j) with the trial population to obtain an updated map(i,j); this part also applies the boundary-control mechanism to the trial population (see the sketch below). As presented, enhancing the neural network can help the system enormously; the enhanced ANN proved overwhelmingly impressive, or at least competitive, and its training and testing are as important as the optimal design of the ANN structure. This study also introduces a practical way of solving such optimization tasks with the neural network.
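A minimal sketch of this crossover mechanism, following the standard backtracking search algorithm formulation with illustrative population size, bounds, and mixrate, is shown below.

```python
import numpy as np

# Sketch of BSA's two-part crossover: a binary map matrix decides, gene by
# gene, whether a trial individual keeps the parent value or takes the
# mutant value, and boundary control repairs out-of-range genes. The
# population size, dimension, bounds, and mixrate are illustrative.
rng = np.random.default_rng(0)
N, D = 20, 3                       # population size, problem dimension
low, high = np.array([2, 2, 1e-4]), np.array([30, 30, 1e-1])

P = low + rng.random((N, D)) * (high - low)       # current population
oldP = rng.permutation(P)                          # historical population

# Mutation: move relative to the historical population.
F = 3.0 * rng.standard_normal()
mutant = P + F * (oldP - P)

# Part 1: generate the binary map matrix.
mixrate = 1.0
map_ = np.ones((N, D), dtype=bool)
if rng.random() < rng.random():
    for i in range(N):
        u = rng.permutation(D)
        k = int(np.ceil(mixrate * rng.random() * D))
        map_[i, u[:k]] = False                    # these genes take mutant values
else:
    for i in range(N):
        map_[i, rng.integers(D)] = False          # single-gene mutation

# Part 2: combine parent and mutant, then apply boundary control.
T = np.where(map_, P, mutant)                     # trial population
out = (T < low) | (T > high)
T[out] = (low + rng.random((N, D)) * (high - low))[out]

print("trial population in bounds:", bool(((T >= low) & (T <= high)).all()))
```

The map-based mixing is what lets BSA exploit its memory of a past population while still bounding how many genes change per individual, which is often credited for its balance of exploration and exploitation.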

Artificial Neural Network Training Based on Optimized Parameters
When the optimized parameters are applied in ANN training using the input and output data for each optimization technique separately, the training process yields a net for each optimization algorithm. The obtained net is the centerpiece: it is the intelligent controller that can replace an ordinary controller and map unexpected non-linear inputs to sound decisions. The enhanced ANN saves training time because its parameters are chosen wisely by the optimization algorithms, and the results are better than manually selected parameters regardless of which optimization technique is used [208][209][210][211]. The pseudocode of ANN training based on the optimal parameters is given below; its outcome is a net for each of ANN-PSO, ANN-GA, ANN-ABC, and ANN-BSA [212][213][214]. Since the net output is a 0/1 hourly pattern, this net can be called an intelligent binary controller [120,206,208,211].
The pseudocode for ANN training based on the optimized parameters obtained from the optimization algorithms is outlined below. The optimal enhanced net of ANN-BSA in a Matlab Simulink block is shown in Figure 10, involving six inputs and twenty-five binary outputs on an hourly basis to manage the distributed generators throughout the virtual power plant system. The net block is generated after training completes by using Equation (3), Matlab's Simulink block generation command:

Net = gensim(net, 1)    (3)

Table 13 presents the ANN training based on PSO, GA, ABC, and BSA using the optimized parameters. The generated ANN net module is an AI controller; it could be implemented on cheap microchips and used as a smart device to control huge systems in a very effective and cheap way.
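In place of that pseudocode, the following is a minimal sketch of the final step: one training run with the optimizer-selected parameters, whose thresholded 0/1 outputs act as the intelligent binary controller. The data, the scikit-learn stand-in for trainlm, and the "optimal" (n1, n2, lr) values are illustrative assumptions, not the paper's results.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hedged sketch of the final training step: one run with the
# optimizer-selected parameters, whose thresholded 0/1 outputs act as the
# intelligent binary controller. Data and the "optimal" (n1, n2, lr)
# values are illustrative stand-ins.
rng = np.random.default_rng(0)
X = rng.random((200, 6))                      # six inputs (irradiance, ...)
y = (X.sum(axis=1) > 3).astype(float)         # stand-in ON/OFF schedule

n1, n2, lr = 12, 8, 0.01                      # assume these won the search
net = MLPRegressor(hidden_layer_sizes=(n1, n2), activation="logistic",
                   learning_rate_init=lr, max_iter=2000, random_state=0)
net.fit(X, y)

def binary_controller(inputs):
    """Round the net output to the 0/1 ON/OFF decision."""
    return (net.predict(np.atleast_2d(inputs)) >= 0.5).astype(int)

print(binary_controller(X[:5]), y[:5].astype(int))
```

The trained object is small and fixed, so, as the text notes, the same mapping could be exported (e.g., via gensim in Matlab) and deployed on inexpensive embedded hardware.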
The following figures present the training performance and regressions of the deep ANN after applying each optimization algorithm's optimal parameters. This study makes a fair comparison of each optimization technique's ability to find the best parameters to serve the system in the best way. These hybrid techniques can save enormous trial-and-error time during training and find the required best parameters, using smaller nets to save valuable time during training and testing. Any of the optimization algorithms used can give better results than manual parameter tuning; yet, some techniques find the best fitness faster and more efficiently than others. ANN-BSA, in Figure 11, shows the best training performance of 6.3695 × 10^−7 at 2317 epochs and a regression (R) of 1, the best value training can achieve. The other optimization techniques, trained for 10,000 epochs on their optimal parameters, obtained results that are close. In Figure 12, ANN-GA shows the best training performance of 5.4579 × 10^−6 and a regression (R) of 0.99999, very close to unity. Figures 13 and 14 show best training performances of 3.9938 × 10^−6 and 2.5178 × 10^−5 and regressions of 0.99999 and 0.99995 for ANN-PSO and ANN-ABC, respectively [210].
A fair comparison was made on Bus 1 of the IEEE 14-bus test system for virtual power plants, utilizing the optimized ANN nets based on half-hour binary patterns to manage each distributed generation (DG) unit in the system. The binary ANN-BPSO, ANN-BABC, ANN-BGA, and ANN-BBSA nets are controllers with binary outputs (0 or 1) that switch each DG ON or OFF based on the inputs. Figure 15 shows that all the algorithms saved a large amount of power; all the saved power was achieved by sharing new distributed resources that inject power to the loads instead of supplying power from the utility grid [212]. Most of the optimized nets have done an excellent job, although some nets are better than others according to their objectives, as can be seen in the total power over the 24 h period.
Much research has been conducted on ANN enhancement, presenting extraordinary results compared to using the same ANN without enhancement; the difference arises from the involvement of optimization techniques in finding the best parameters. The applied approach has been evaluated against other research discussing similar issues. Table 14 presents a comparison of the proposed technique with other works enhancing neural networks by finding the optimal number of nodes in the hidden layers and the learning rate. Table 15 shows an overview of neural-network-based optimization techniques for the optimal number of nodes in hidden layers and the learning rate; it indicates that ANN-based optimization techniques have gained momentum in the last five years. This enhancement has become essential in most AI applications: it is used with ANFIS and fuzzy systems to optimize the best membership function shapes, with PI controllers to select the best parameters, and in many ML tasks to improve classification or regression.

Conclusions and Future Work
This review includes extensive research on ANNs' importance, advantages, and types of utilization in a series of applications, as well as neural network enhancement based on optimization for network architecture design, training, and testing. The literature shows that optimization for AI in general, and neural networks specifically, has been a hot topic during the past ten years and has increased year by year up to 2021. The review has covered neural network enhancements achieved by optimizing parameters such as weights, initial weights, biases and learning rates, numbers of hidden layers, numbers of nodes in hidden layers, and activation functions. On the other hand, enhancement can also be obtained by modifying the neural network's regular algorithms, for example, replacing the feed-forward or back-propagation procedures for tuning network weights based on the error rate per epoch. This review covers a test case study of ANN-based optimization algorithm techniques to provide a quick example of ANN improvement. As presented, the hybrid techniques ANN-PSO, ANN-GA, ANN-ABC, and ANN-BSA are compared fairly on their objectives, regressions, training performance, training time, and an application in microgrid energy management. Each technique can economize the time needed for selecting parameters and for training. The enhanced ANN nets were tested on distributed energy resources in the form of an energy management system, and the results show that virtual power plants save a reasonable amount of supplied power. From this review, the emphasis on enhancing the neural network by optimization algorithms that search for the best ANN parameters and training parameters to achieve the best network structure has been confirmed by the comparison tables as well as by the testing results for improving the ANN performance with the PSO, GA, ABC, and BSA optimization techniques. This review has also shown that neural network optimization is currently a very active topic and can improve neural networks, and perhaps other AI or ML techniques, by searching for optimal parameters to solve problems quickly and efficiently. This review and the case study also include several important and targeted recommendations for the further development of ANN-based optimization methods, such as:

• Generally, ANN intelligent methods are associated with powerful optimization tools, such as the PSO, ABC, BSA, and GA techniques, in various engineering applications, such as electromagnetism, signal processing, pattern recognition and classification, and robotics. Nevertheless, they have problems with consistency and cost. Thus, future research should be conducted on appropriate optimization method selection and on finding the system's optimal values, such as cost-effective components with high accuracy.
• Conventional NN technologies create issues because, for example, the human brain is highly complex, non-linear, and sensitive [214]. Therefore, additional investigation is needed on human brain monitoring optimization to obtain high accuracy and low timing loss under high-risk, complex situations and to achieve high reliability, modularity, efficiency, and performance; further investigation of proper optimization selection for such systems is needed.
• Despite the benefits of optimization algorithms in reducing technical loss, error, and cost, their use in ANNs has been very limited. Only computational intelligence optimization algorithms have made significant progress toward optimizing controller design and cost. As a result, advanced optimization algorithms will be better choices for ANN design.
• Enhancement of ANN parameters with optimization could benefit from new algorithms that save more time adjusting the ANN toward optimal architectures by avoiding trial and error or random selection. In this way, the optimal solution can be reached with a smaller network, a more straightforward calculation method, and less time.
Evolutionary algorithms improve neural networks' design by reducing training time or solving problems using the ANN method. For quick tracking, smaller steady-state errors, and high performance, ANN techniques can be used for robotic sensing and control monitoring and to achieve bidirectional power management. However, real-time data integrity, operation-time reduction, expensive processing equipment, and the need for good parameter selection and manual tuning are all disadvantages. As a result, more research on selecting proper optimization methods for enhancing neural network structure design is required.
DL methods are evolving fast toward higher performance, and there are adequate review articles about the progressing algorithms in particular application domains. Future work could consider other DL methods such as denoising autoencoders, deep belief networks, and long short-term memory. Further study and review can enhance or hybridize ML with optimization techniques, random forests, Markov chain Monte Carlo, or support vector machines. Future work can also consider many optimizations to improve AI and ML and to boost their performance [215][216][217]. Future studies can consider DL from another perspective, for example, continuous or online optimization.

Conflicts of Interest:
The authors declare no conflict of interest.