Architecture Optimization of a Non-Linear Autoregressive Neural Networks for Mackey-Glass Time Series Prediction Using Discrete Mycorrhiza Optimization Algorithm

Recurrent Neural Networks (RNNs) are primarily used for applications with time series and sequential data and are increasingly deployed on embedded devices. However, one of their drawbacks is their high computational cost and significant memory requirements, so computer equipment with large processing capacity and memory is needed. In this article, we experiment with Nonlinear Autoregressive Neural Networks (NARNNs), a type of RNN, and use the Discrete Mycorrhizal Optimization Algorithm (DMOA) to optimize the NARNN architecture. We used the Mackey-Glass (MG) chaotic time series to test the proposed approach and obtained very good results. In addition, comparisons were made with other methods that used the MG series and other types of neural networks, such as Backpropagation and ANFIS, also obtaining good results. The proposed algorithm can be applied to robots, microsystems, sensors, devices, MEMS, microfluidics, piezoelectricity, motors, biosensors, 3D printing, etc.


Introduction
Optimization is not limited to applied mathematics, engineering, medicine, economics, computer science, operations research, or any other single science; it has become a fundamental tool in all fields, where constantly developed new algorithms and theoretical methods have allowed it to evolve in all directions, with a particular focus on artificial intelligence: deep learning, machine learning, computer vision, fuzzy logic systems, and quantum computing [1,2].
Optimization has grown steadily over the past 50 years. Modern society not only lives in a highly competitive environment, but is also forced to plan for growth in a sustainable manner and be concerned about resource conservation. Therefore, it is essential to optimally plan, design, operate and manage resources and assets. The first approach is to optimize each operation separately. However, the current trend is toward an integrated approach: synthesis and design, design and control, production planning, scheduling and control [3].
Theoretically, optimization has evolved to provide general solutions to linear, nonlinear, unbounded, and constrained network optimization problems. These optimization problems are called mathematical programming problems and are divided into two different categories: linear and nonlinear programming problems. Biologically derived genetic algorithms and simulated annealing are two equally powerful methods that have emerged in recent years. The development of computer technology has provided users with a variety of optimization codes with varying degrees of rigor and complexity. It is also possible to extend the capabilities of an existing method by integrating the features of two or more optimization methods to achieve more efficient optimization methodologies [4]; optimization methods that can solve specific problems are still being developed.
Humans do not start their thinking from scratch every second. As we read, we understand each word based on our understanding of the previous words. We never start thinking from scratch; our thoughts have permanence. A traditional ANN cannot do this, and it seems like a major shortcoming. For example, imagine that you want to classify what kind of event is happening at each point in a movie. It is not clear how a traditional ANN could use its reasoning about earlier events in the movie to inform later events; RNNs address this problem. They are networks with loops in them, which allow information to persist.
An RNN is a type of artificial neural network that uses sequential or time series data. These deep learning algorithms are commonly used for ordinal or temporal problems such as language translation, natural language processing (NLP) [33,34], speech recognition, and image captioning [35]. They are distinguished by their "memory" because they take information from previous inputs to influence the current input and output. While traditional deep neural networks assume that inputs and outputs are independent of each other, the output of recurrent neural networks depends on previous elements within the sequence.
NARNNs are a type of RNN with memory and feedback capabilities. The output at each point is based on the result of the dynamic synthesis of the system before the current time. This has great advantages for modeling and simulating dynamic changes in time series [36]. A typical NARNN mainly consists of an input layer, a hidden layer, an output layer, and an input delay function, the basic structure of which is shown in Figure 1. In Figure 1, y(t) is the output of the NARNN, 1..19 represents the delay order, w is the joint weight, and b is the threshold of the NARNN. The model of NARNN networks can be expressed as in Equation (2), where d is the delay order and f is a nonlinear function; the future values depend only on the previous d values of the output signal.
From the equation, it can be seen that the value of y(t) is determined by the values of y(t − 1), . . . , y(t − d), which indicates that based on the continuity of data development, the model uses past values to estimate the current value [37,38].
The prediction method of the NARNN model adopts the recursive prediction method. The main purpose of this prediction method is to reproduce the predicted value one step ahead.
The future values of the time series y(t) are predicted only from the past values of this series. This type of prediction is called Nonlinear Autoregression (NAR) and can be written as Equation (2):

y(t) = f(y(t − 1), y(t − 2), ..., y(t − d))   (2)

This model can be used to predict financial instruments, but it does not use additional sequences [39].
Predicting a sequence of values in a time series is also known as multi-pass forecasting. Closed-loop networks can perform multi-step forecasting. When external feedback is missing, closed-loop networks can still make predictions using internal feedback. In NARNN prediction, the future values of a time series are predicted only from the past values of that series.
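The closed-loop (recursive) prediction described above can be sketched as follows. This is an illustrative stand-in: any trained one-step predictor can play the role of `f`; here a hypothetical lag-averaging function replaces an actual trained NARNN.

```python
import numpy as np

def nar_forecast(f, history, d, steps):
    """Closed-loop (recursive) NAR forecasting: predict one step ahead,
    feed the prediction back as internal feedback, and repeat.

    f       -- trained mapping from the last d values to the next value
               (a trained NARNN would play this role)
    history -- observed past values of the series, length >= d
    d       -- delay order
    steps   -- number of future values to predict
    """
    buf = list(history[-d:])          # sliding window of the last d values
    preds = []
    for _ in range(steps):
        y_next = f(np.array(buf))     # one-step-ahead prediction
        preds.append(y_next)
        buf = buf[1:] + [y_next]      # internal feedback: reuse the prediction
    return preds

# Toy stand-in for a trained network: a simple average of the lags
# (illustrative only -- not the paper's trained NARNN).
toy_model = lambda lags: float(np.mean(lags))

print(nar_forecast(toy_model, [1.0, 2.0, 3.0, 4.0], d=2, steps=3))
```

Each predicted value re-enters the delay buffer, which is exactly why closed-loop networks can forecast multiple steps without external feedback.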

Looking at Figure 2, NARNN represents the entire neural network. Figure 3, "Unrolled", represents the individual layers, or time steps, of the NARNN network. Each layer corresponds to a single piece of data [40,41].

Discrete Mycorrhiza Optimization Algorithm
Most of the world's plant species are associated with mycorrhizal fungi in natu this association involves the interaction of fungal hyphae on plant roots. Hyphae exte from the roots into the soil, where they absorb nutrients and transport them through mycelium to the colonized roots [48]. Some hyphae connect host plants in what is kno as a Mycorrhizal Network (MN). The MN is subway and is difficult to understand. A result, plant and ecosystem ecologists have largely overlooked the role of MNs in pl community and ecosystem dynamics [49]. Predicting a sequence of values in a time series is also known as multi-pass forecasting. Closed-loop networks can perform multi-step forecasting. When external feedback is missing, closed-loop networks can still make predictions using internal feedback. In NARNN prediction, the future values of a time series are predicted only from the past values of that series.
The current literature provides a history of very extensive research on the use of NARNNs in the following areas:
• The use of NARNNs in medical devices such as continuous glucose monitors and drug delivery pumps, which are often combined with closed-loop systems to treat chronic diseases, for error detection and correction due to their predictive capabilities [42].
• The use of NARNNs for Chinese e-commerce sales forecasting to develop purchasing and inventory strategies for EC companies [43], to support management decisions [44], to study the effects of air pollution on respiratory morbidity and mortality [45], to model the relationship between time series in the economy [46], to model and forecast the prevalence of COVID-19 in Egypt [47], etc.

Discrete Mycorrhiza Optimization Algorithm
Most of the world's plant species are associated with mycorrhizal fungi in nature; this association involves the interaction of fungal hyphae on plant roots. Hyphae extend from the roots into the soil, where they absorb nutrients and transport them through the mycelium to the colonized roots [48]. Some hyphae connect host plants in what is known as a Mycorrhizal Network (MN). The MN is underground and difficult to study. As a result, plant and ecosystem ecologists have largely overlooked the role of MNs in plant community and ecosystem dynamics [49].
It is clear that MNs are widespread and provide nutrition to many plant species. This has important implications for plant competition for soil nutrients, seedling establishment, plant succession, and plant community and ecosystem dynamics [50].
Plant mycorrhizal associations have large-scale consequences throughout the ecosystem [51,52]. The origins of plant-fungal symbiosis are ancient and have been proposed as a mechanism to facilitate soil colonization by plants 400 Mya [53,54]. Mycorrhizal symbiosis is a many-to-many relationship: plants tend to form symbioses with a diverse set of fungal species and, similarly, fungal species tend to be able to colonize plants of different species [55].
In Figure 4 we can see that, through the MN, resources are exchanged: carbon (CO2) flows from plants to fungi, and water, phosphorus, nitrogen, and other nutrients flow from fungi to plants, in addition to an exchange of information through chemical signals when the habitat is threatened by fire, floods, pests, or predators. It should be noted that this exchange of resources can occur between plants of the same species or of different species. Figure 5 shows the symbiosis between plants and the fungal network: how carbon in the form of sugars flows from the plants to the MN, and how the MN fixes nutrients in the roots of the plants.
The novel optimization algorithm DMOA is inspired by the nature of the Mycorrhizal Network (MN) and plant roots. Through the intimate interaction between these two organisms (plant roots and the network of MN fungi), a symbiosis is generated, and it has been discovered that in this relationship [56-60]:
1. There is communication between plants, which may or may not be of the same species, through a network of fungi (MN).
2. There is an exchange of resources between plants through the fungal network (MN).
3. There is defensive behavior against predators, which can be insects or animals, for the survival of the whole habitat (plants and fungi).
4. The colonization of a forest through a fungal network (MN) thrives much more than a forest where there is no exchange of information and resources.
The DMOA algorithm was launched and published in 2022 [61]. Figure 6 describes the flowchart of the DMOA algorithm: we initialize the parameters, such as dimensions, epochs, number of iterations, etc., and we also initialize the two populations of plants and mycorrhizae; with these populations we find the best fitness of plants and mycorrhizae, and with these results we apply the biological operators. The first operator is represented by the Lotka-Volterra System of Discrete Equations (LVSDE) Cooperative Model [62], whose result influences the other two models represented by the LVSDE, Defense and Competitive [63,64]. At each iteration we evaluate the fitness to determine whether it is better than the previous one; if so, we update it along with the populations, and if not, we continue with the next iteration and the calculation with the biological operators. If the stop condition is fulfilled, we obtain the last solution before evaluation and the algorithm ends.
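The flow in Figure 6 can be summarized in a minimal sketch (not the authors' reference implementation). The three LVSDE biological operators are abstracted here as a hypothetical random perturbation step, so only the overall initialize-perturb-evaluate-update loop matches the flowchart.

```python
import random

def dmoa(fitness, dim, pop_size=20, iters=100):
    """Skeleton of the DMOA flow: two populations (plants, mycorrhizae),
    biological operators applied each iteration, best solution kept."""
    plants = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    mycorrhizae = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    best = min(plants + mycorrhizae, key=fitness)
    best_fit = fitness(best)

    def lv_operator(x):
        # Placeholder for the LVSDE Cooperative/Defense/Competitive models:
        # a simple random-walk perturbation stands in for them here.
        return [xi + random.gauss(0, 0.1) for xi in x]

    for _ in range(iters):
        plants = [lv_operator(p) for p in plants]
        mycorrhizae = [lv_operator(m) for m in mycorrhizae]
        candidate = min(plants + mycorrhizae, key=fitness)
        cand_fit = fitness(candidate)
        if cand_fit < best_fit:          # keep the better solution, update it
            best, best_fit = candidate, cand_fit
    return best, best_fit

# Minimize the sphere function as a toy objective.
sol, fit = dmoa(lambda x: sum(xi * xi for xi in x), dim=3)
print(fit >= 0.0, len(sol))
```

In the paper, the fitness function is the RMSE of a candidate NARNN architecture rather than a toy objective.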


Proposed Method
The proposed method is to use the Discrete Mycorrhiza Optimization Algorithm (DMOA) to optimize the architecture of the Nonlinear Autoregressive Neural Network (NARNN), and as input data we use the Mackey-Glass chaotic time series. In Figure 7 and Algorithm 1 we can find the DMOA-NARNN flowchart and the DMOA-NARNN pseudocode, respectively. The DMOA algorithm is explained in Figure 6 in the previous section; in this flowchart we include the optimization of the NARNN, evaluating its results by means of the RMSE, until we find the minimum error of that architecture through the iterations and the populations of the DMOA algorithm (Algorithm 1).
Initialize a population of n plants and mycorrhiza with random solutions
Find the best solution fit in the initial population
while stop condition not met
    for i = 1:n (for the n plants and mycorrhiza population)
        Evaluate new solutions
        if the error is smaller, update the best solution
    end for
    Find the current best NARNN-Architecture solution
end while
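The "evaluate new solutions" step scores each candidate architecture by its one-step-ahead RMSE. As an illustration of turning a candidate into a fitness value, the sketch below uses a linear least-squares predictor on the d lagged values as a stand-in for training an actual NARNN candidate (an assumption, not the paper's training procedure).

```python
import numpy as np

def lag_matrix(series, d):
    """Build (X, y) pairs where each row of X holds the previous d values."""
    X = np.array([series[i:i + d] for i in range(len(series) - d)])
    y = np.array(series[d:])
    return X, y

def fitness(series, d):
    """RMSE of a one-step-ahead predictor for delay order d.
    A linear least-squares fit (with bias term) stands in here for
    training a NARNN candidate architecture."""
    X, y = lag_matrix(series, d)
    A = np.c_[X, np.ones(len(X))]               # add a bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)   # fit the predictor
    pred = A @ w
    return float(np.sqrt(np.mean((y - pred) ** 2)))

# A sampled sinusoid obeys an exact order-2 linear recurrence, so the
# fitness should improve sharply from d=1 to d=2.
series = list(np.sin(np.linspace(0, 20, 200)))
print(fitness(series, 2) < fitness(series, 1))
```

In the full method, DMOA proposes the architecture (layer sizes and delay order) and this fitness value drives the update of the best solution.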
Difference equations often describe the evolution of a particular phenomenon over time. For example, if a given population has discrete generations, the size of the (n + 1)th generation, x(n + 1), is a function of the nth generation, x(n). This relationship is expressed by Equation (3):

x(n + 1) = f(x(n))   (3)

We can look at this issue from another perspective: starting from the point x0, one can generate the sequence in Equation (4):

x0, f(x0), f(f(x0)), f(f(f(x0))), ...   (4)

f(x0) is called the first iterate of x0 under f. Discrete models driven by difference equations are more suitable than continuous models when reproductive generations last only one breeding season (no overlapping generations) [65,66].
An example would be a population that reproduces seasonally, that is, once a year. If we wanted to determine how the population size changes over many years, we could collect data to estimate the population size at the same time each year (say, shortly after the breeding season ends). We know that between the times at which we estimate population size, some individuals will die and that during the breeding season many new individuals will be born, but we ignore changes in population size from day to day, or week to week, and look only at how population size changes from year to year. Thus, when we build a mathematical model of this population, it is reasonable that the model only predicts the population size for each year shortly after the breeding season. In this case, the underlying variable, time, is represented in the mathematical model as increasing in discrete one-year increments.
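The discrete-generation idea can be made concrete by iterating a difference equation. The logistic map used below is a standard textbook example of seasonal population growth, not one of the paper's LVSDE models.

```python
def iterate(f, x0, n):
    """Generate the orbit x0, f(x0), f(f(x0)), ... of a difference equation,
    i.e., the population size in successive breeding seasons."""
    orbit = [x0]
    for _ in range(n):
        orbit.append(f(orbit[-1]))   # next generation from the current one
    return orbit

# Discrete logistic growth x(n+1) = r*x(n)*(1 - x(n)) with r = 3.2:
# each value is the (scaled) population just after the breeding season.
print(iterate(lambda x: 3.2 * x * (1 - x), 0.5, 4))
```

Each element of the orbit corresponds to one yearly census, matching the one-year increments of the underlying time variable.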
Discrete Equations (5) and (6) describe the Cooperative Model (Resource Exchange) for both species, where the parameters a, b, d, e, g, and h are positive constants, and x_i and y_i represent the initial population conditions for both species and are positive real numbers [72].
The biological operators are represented by the LVSDE. The mathematical description of Discrete Equations (7) and (8) is the Defense Model (Predator-Prey), where the parameters a, b, d, and g are positive constants, and x_i and y_i represent the initial population conditions for both species and are positive real numbers [73,74].
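As an illustration of the defense (predator-prey) dynamics, the sketch below iterates one common discrete predator-prey form. The paper's exact Equations (7) and (8) are not reproduced in this text, so the update rules and constants here are assumptions that only preserve the roles of the parameters.

```python
def defense_step(x, y, a, b, d, g):
    """One iteration of an illustrative discrete predator-prey system.
    x = prey population, y = predator population; a, b, d, g are
    positive constants (assumed form, not the paper's Equations (7)-(8))."""
    x_next = x + x * (a - b * y)      # prey grows, reduced by predation
    y_next = y + y * (d * x - g)      # predator grows with prey, decays alone
    return x_next, y_next

x, y = 1.0, 0.5
for _ in range(3):
    x, y = defense_step(x, y, a=0.1, b=0.2, d=0.1, g=0.05)
print(round(x, 4), round(y, 4))
```

Both populations remain positive real numbers, consistent with the initial-condition requirement stated above.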
Discrete Equations (9) and (10) describe the Competitive Model (Colonization) for two species, where the parameters a, b, d, e, g, and h are positive constants, and x_i and y_i are the populations of each species, respectively, and are positive real numbers. Each parameter of the above equations is described in Table 1 [74]. The metric for measuring error is the RMSE (Root Mean Square Error), or root mean square deviation, one of the most commonly used measures for evaluating the quality of predictions. It shows how far predictions fall from the measured true values using the Euclidean distance, Equation (11), where n is the number of data points, y_i is the ith measurement, and ŷ_i is the corresponding prediction [80,81].
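Equation (11) can be implemented directly:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error, Equation (11): the Euclidean distance between
    measured values y_i and predictions, scaled by the number of points n."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # sqrt(1/3)
```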

Mackey-Glass
Chaotic and random time series are both disordered and unpredictable. In extreme cases, the data are so mixed up that consecutive values seem unrelated to each other. Such disorder would normally eliminate the ability to predict future values from past data.
The Mackey-Glass chaotic time series, Equation (12), is a nonlinear time-delay differential equation that is widely used in the modeling of natural phenomena and for comparisons between different forecasting techniques and regression models [82-84]:

dx/dt = b·x(t − τ) / (1 + x(t − τ)^10) − a·x(t)   (12)

where a = 0.1, b = 0.2, and τ = 17 are real numbers and t is the time. With this setting the series produces chaotic behavior, and we can compare the forecasting performance of DMOA-NARNN with other models in the literature.
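A minimal sketch of generating the series with a coarse Euler discretization of Equation (12); the step size dt = 1.0 and the constant initial history x0 = 1.2 are assumptions for illustration, and finer steps give a smoother series.

```python
def mackey_glass(n, a=0.1, b=0.2, tau=17, dt=1.0, x0=1.2):
    """Euler integration of the Mackey-Glass delay differential equation
    dx/dt = b*x(t-tau)/(1 + x(t-tau)**10) - a*x(t), with a = 0.1, b = 0.2,
    tau = 17 as in the text. Returns n samples of the series."""
    hist = [x0] * (int(tau / dt) + 1)   # constant history before t = 0
    series = []
    for _ in range(n):
        x, x_tau = hist[-1], hist[0]    # current and delayed values
        x_next = x + dt * (b * x_tau / (1 + x_tau ** 10) - a * x)
        hist = hist[1:] + [x_next]      # slide the delay buffer
        series.append(x_next)
    return series

s = mackey_glass(1000)
print(min(s) > 0, len(s))  # True 1000
```

Samples generated this way (700, 1000, or 1500 points) are what the experiments split into training, validation, and prediction sets.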

Results
This section shows the results of the experiments performed in the research, involving the non-optimized and optimized results of the method; Table 2 lists the non-optimized results. In Figures 8-13, the y axes represent the input values (Validation-Training) and the output values of the samples (Prediction-Error), the x axis represents the number of samples in time, Name is the name of the experiment, Samples is the total number of samples in the experiment, Training is the number of samples for training, Error is the minimum error obtained in the experiment, and HL represents the number of neurons in the hidden layers. Figure 8 shows the behavior of the data for 1000 samples of the NARNN403, obtaining an RMSE of 0.2307, with the reference data at the top of the figure. Figures 9 and 10 show the data behavior for 1000 samples of the NARNN404 and NARNN405, obtaining RMSEs of 0.167 and 0.2488, respectively, with the reference data at the top of each figure. Table 3 lists the optimized results; Figures 11-13 show the data behavior for 700, 700, and 1000 samples of the NARNN053, NARNN302, and NARNN303, obtaining RMSEs of 0.0044, 0.0023, and 0.0033, respectively, with the reference data at the top of each figure.
As for the complexity of the DMOA algorithm, it is a linear-order algorithm that uses the discrete Lotka-Volterra Equations (5)-(10); in the search for the global minimum it performs iterations, and in each cycle it compares the best previous local minimum with the current lowest minimum and updates the value when the latter is better. As for the times, Table 3 shows Tt, which represents the total time (seconds) of the experiment, and T (seconds), the time at which the DMOA algorithm found the lowest local minimum. In terms of efficiency, the algorithm took 1235 s, about 21 min, to find the lowest minimum of 0.0023, which seems to us a short time compared with the method of [22], which took up to three and a half hours, and the method of [21], whose experiments took up to 81 h to find the lowest minimum; the method of [23] does not report the times of its experiments. Table 4 shows 30 experiments with eight non-optimized NARNN architectures. Each column represents the total number of samples and the number of training samples used for each architecture (700 × 300), and at the end of the table we can find the total sum, mean, and standard deviation for each column. Table 5 shows 30 experiments with eight optimized NARNN architectures; each column represents the total number of samples and the number of training samples used by each architecture (700 × 300), and at the end of the table we can find the total sum, mean, and standard deviation for each column.

Hypothesis Test
Equation (13) represents the hypothesis test statistic, Equation (14) the null hypothesis, and Equation (15) the alternative hypothesis, with which comparisons were made between the non-optimized and optimized experiments of the method proposed here.
where x̄1 is the mean of sample 1, x̄2 the mean of sample 2, σ1 the standard deviation of sample 1, σ2 the standard deviation of sample 2, n1 the number of data in sample 1, and n2 the number of data in sample 2, with H0: µ1 − µ2 = D0 and H1: µ1 − µ2 ≠ D0. The significance level is α = 0.05, and the confidence level is 1 − α = 1 − 0.05 = 0.95, or 95%. Since the p-value is less than 0.01, the null hypothesis is rejected. Tables 6 and 7 show the results of the hypothesis testing performed on the non-optimized and optimized methods shown above; of the eight different architectures, the test results show that the optimized NARNNs were better in six, and the non-optimized NARNNs were better in two. In Table 6, N and Name represent the number and name of the experiment, respectively, Error is the minimum error found, HL are the hidden layers of the neural network (1, 2, 3), and N is the number of neurons in each HL. In Table 7, Samples is the total number of samples, T is the training samples, V is the validation samples, P represents the prediction, and p-value represents the results of the hypothesis test. Table 8 shows the comparison with other methods that performed experimentation with the chaotic Mackey-Glass time series, and it can be seen from the table that the lowest error belongs to the optimized NARNN-302. In Table 8, case number 1, the method is the Optimization of the Fuzzy Integrators in Ensembles of ANFIS Model for Time Series Prediction [21], where the authors use the Mackey-Glass chaotic time series with genetic optimization of Type-1 Fuzzy Logic System (T1FLS) and Interval Type-2 Fuzzy Logic System (IT2FLS) integrators in an ensemble of ANFIS models and evaluate the results through the Root Mean Square Error (RMSE). ANFIS is a hybrid model, a neural network implementation of a TSK (Takagi-Sugeno-Kang) fuzzy inference system.
ANFIS applies a hybrid algorithm which integrates BP (Backpropagation) and LSE (least square estimation) algorithms, and thus it has a fast learning speed.
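The two-sample comparison of Equations (13)-(15) can be sketched with the normal approximation; the sample numbers below are illustrative stand-ins (chosen to resemble 30 runs per architecture), not the paper's data.

```python
import math

def z_test(x1, x2, s1, s2, n1, n2, d0=0.0):
    """Two-sample z statistic and two-sided p-value for
    H0: mu1 - mu2 = D0 versus H1: mu1 - mu2 != D0 (normal approximation)."""
    z = (x1 - x2 - d0) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return z, p

# Illustrative means/deviations for non-optimized vs. optimized errors.
z, p = z_test(x1=0.20, x2=0.01, s1=0.05, s2=0.005, n1=30, n2=30)
print(p < 0.05)  # True: reject H0 at alpha = 0.05
```

With a significance level of α = 0.05, a p-value below α rejects the null hypothesis that the two mean errors are equal.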

Comparison with Other Methods
Case number 2 refers to the method using Particle Swarm Optimization of ensemble neural networks with fuzzy aggregation for time series prediction of the Mexican Stock Exchange [22]. In this case, the authors propose an ensemble neural network model with type-2 fuzzy logic for the integration of responses; in addition, the particle swarm optimization method determines the number of modules of the ensemble neural network, the number of layers and number of neurons per layer, and thus the best architecture of the ensemble neural network is obtained. Once this architecture is obtained, the results of the modules with type-1 and type-2 fuzzy logic systems are added, the inputs to the fuzzy system are the responses according to the number of modules of the network, and this is the number of inputs of the fuzzy system.
Case number 3 refers to the Application of Interval Type-2 Fuzzy Neural Networks (IT2FNN) in non-linear identification and time series prediction (MG) [23]. The authors propose IT2FNN models that combine the uncertainty management advantage of type-2 fuzzy sets with the learning capabilities of neural networks. One of the main ideas of this approach is that the proposed IT2FNN architectures can obtain similar or better outputs than type-2 interval fuzzy systems using the Karnik and Mendel (KM) algorithm, but with lower computational cost, which is one of the main disadvantages of KM mentioned in many papers in the literature. Cases 4 and 5 have already been explained earlier in this article.
By briefly describing the techniques of the different methods above, we can observe the complexity of their designs, which use optimization algorithms such as PSO and GAs as optimizers, robust ensemble neural networks, T1FLS, and IT2FLS, in comparison with our method, which uses the DMOA optimization algorithm and NARNNs, neural networks with short memory that, according to the results, are made precisely for the prediction of time series. In future work we plan to perform experiments with RNN LSTM networks, which have short- and long-term memory.

Discussion of Results
The use of metaheuristics in the optimization of methods is a constant in artificial intelligence research, and in this work the DMOA algorithm was used to optimize the architecture of the NARNN neural network using the MG chaotic series as input data. We performed experiments with 39 different optimized architectures and, without optimization, with 10 different architectures. When we performed the optimization we found an extremely fast algorithm that found the right architecture with very satisfactory results. Of the 39 different optimized architectures, the one that gave us the best results was number 31 (narAll303) in Table 3, a NARNN with three hidden layers of 6, 7, and 5 neurons, respectively. With this architecture we performed 3000 experiments with a total time of 5353 s, and at second 1235 we obtained the best result of 0.0023 (error). Of the 10 experiments without optimization, with architecture number 4 (narAll404) in Table 2, a NARNN with two hidden layers of 9 and 2 neurons, respectively, we also performed 3000 experiments and obtained the best result of 0.1670 (error). We performed eight hypothesis tests under equal conditions with these results and found that in five tests the NARNN architectures optimized with the DMOA algorithm were better, and in three tests the non-optimized architectures were better, as shown in Table 6. We also performed error comparisons with three other different methods, in which DMOA-NARNN was better, as shown in Table 8.

Conclusions
A total of 49 different architectures were designed, of which 10 were non-optimized and 39 were optimized by the DMOA algorithm; 30,000 experiments were performed with the non-optimized architectures, and approximately 110,000 experiments were performed with the optimized architectures. A total of 700, 1000, and 1500 samples were generated with the MG chaotic time series, of which between 300 and 1000 were used for training, between 300 and 900 were used for validation in different combinations, and between 300 and 900 points were generated as prediction points, as can be seen in Tables 2 and 3. The NARNN architectures were designed with two and three hidden layers, with neurons in the range of 2-9, and the graphs of the most representative results of the non-optimized and optimized NARNNs are presented in Figures 8-13.
The optimization of the NARNN with the DMOA algorithm obtained good results: better than without optimizing the network and better than the other methods with which it was compared. Although not all of the optimized architectures were better in the hypothesis test (only five of them were), the error results were much better, as can be seen in Table 7. In the comparison with other methods, the results were also better, as demonstrated in Table 8. We were also able to verify that the DMOA optimization algorithm is fast and efficient, which was really the motivation for this research. We wish to continue investigating the efficiency of the algorithm in the optimization of architectures with other types of neural networks, as well as in Type-1 and Type-2 Fuzzy Logic Systems, and to do the same with the Continuous Mycorrhiza Optimization Algorithm (CMOA). In addition, the proposed algorithm can be applied to robots, microsystems, sensors, devices, MEMS, microfluidics, piezoelectricity, motors, biosensors, 3D printing, etc.
We also intend to conduct further research and experimentation with the DMOA method and other time series. We will also consider the DMOA and the LSTM (Long Short-Term Memory) Neural Regression Network for Mackey-Glass time series, weather and financial forecasting, and we are interested in hybridizing the method with Interval Type-2 Fuzzy Logic System (IT2FLS), and Generalized Type-2 Fuzzy Logic System (GT2FLS).