Using Artificial Neural Network with Prey Predator Algorithm for Prediction of the COVID-19: The Case of Brazil and Mexico

The worldwide spread of the COVID-19 epidemic has led to investigations of various aspects of the disease, including the estimation of expected cases, which helps in identifying the resources needed to deal with cases caused by the pandemic. In this study, we use artificial neural networks (ANNs) to predict the number of COVID-19 cases in Brazil and Mexico in the upcoming days. The prey predator algorithm (PPA), a metaheuristic algorithm, is used to train the models. The performance of the proposed ANN models is analyzed using the root mean squared error (RMSE) function and the correlation coefficient (R). It is demonstrated that the ANN models achieve high performance in predicting the number of infections (active cases), recoveries, and deaths in Brazil and Mexico. The simulation results of the ANN models show well-predicted values. The prediction error percentages of ANNs trained with metaheuristic algorithms are significantly lower than those of traditional monolithic neural networks. The study shows the expected daily numbers of infections, recoveries, and deaths that Brazil and Mexico will reach at the beginning of 2021.


Introduction
The COVID-19 pandemic was initially reported in China and quickly affected the entire world, leading the World Health Organization to declare a public health emergency of international concern [1][2][3]. COVID-19 is the third outbreak of coronavirus-induced respiratory disease in the past two decades, after severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) [4,5]. Since its first appearance in Wuhan, China, COVID-19 has caused thousands of deaths and overwhelmed health care systems worldwide. In fighting the COVID-19 pandemic, timely and effective screening and identification of infected individuals is of utmost importance, because they can spread the virus to other individuals if not identified and isolated [6].
COVID-19 is usually transmitted from human to human due to unrestricted movement [7]. In Brazil [8], the first two cases were reported on 25 and 29 February 2020. Until 4 March 2020, the total number of cases was only 3. However, from 5 March 2020 onwards, the number of COVID-19 infected patients increased dramatically, and by 21 March, the disease had spread to all parts of Brazil. In Mexico [9], the first cases of COVID-19 infection were reported on 28 February 2020.
Researchers from different fields, including mathematics, chemistry, statistics, computer science, and health care, have been analyzing COVID-19 to understand its behavior and to find a cure. However, all efforts have been in vain so far. Further, people affected by COVID-19 have shown continuously changing symptoms. Patients show cough, fever, and fatigue at the early stage, followed by pneumonia [8] and severe acute respiratory syndrome [9] in the later stages.
For the artificial intelligence community to address this problem, data is the core requirement for constructing a framework [10]. Due to the limited availability of data about COVID-19 infected patients, it is hard to design adequate identification and analysis models. Very few details are available about patients who tested positive for COVID-19, because people hesitate to go to the doctor in order to avoid possible quarantine. Besides, some infected people do not show any symptoms [11] of COVID-19, which makes it impossible to identify the actual number of positive cases in huge populations. The longer it takes to diagnose an infected person, the greater the risk he/she poses to others. Therefore, the availability of large amounts of data is crucial to the analysis and behavior prediction of COVID-19. Several articles exploring the effects of COVID-19 have been published from computational, mathematical, and statistical perspectives. Among the many mathematical models used to study the dynamics of the COVID-19 virus, Susceptible-Infectious-Recovered (SIR) is a widely used model that estimates an epidemic's growth by means of a time-dependent system of differential equations. Ebola and AIDS have been explored and analyzed using different variants of SIR models [12,13]. In this connection, a generalized SEIR epidemiological model [14] has been adapted to estimate the growth of SARS-CoV-2 in Italy.
A stochastic approach is ensured using Particle Swarm Optimization (PSO) as a solver to fit the model parameters. Using the same SEIR model, Berger et al. [15] explored the effects of testing and quarantine policies in the United States. The authors showed that increasing the number of conducted tests and applying selective quarantine can reduce the coronavirus's negative impact on the economy and lessen the burden on hospitals. Roda et al. [16] have recently demonstrated that the SIR model outperforms the SEIR model when confirmed positive cases are used as input data, with model selection performed using the Akaike Information Criterion. In another effort, Weissman et al. [17] proposed the COVID-19 Hospital Impact Model for Epidemics (CHIME), an SIR model, to predict when hospital capacity would be exhausted and ventilators would be outnumbered, focusing on three hospitals in the Philadelphia region. CHIME estimated the time at which the current resources would fail to deal with the surge caused by COVID-19 patients.
The reported COVID-19 figures form a series of observations arranged in time. Methods for time-series prediction originate in the statistics field and include machine learning-based methods, meta-predictors, and structure-based methods [18][19][20]. Artificial neural networks (ANNs) are widely used for time series predictions [21]. One of the main advantages of ANN techniques is that they can be fed raw data and automatically find the required representation [22]. Judged on factors such as performance, accuracy, latency, speed, convergence, and size, ANNs provide reliable results. Brazil and Mexico are among the countries with the largest number of infections globally, estimated in millions. Additionally, the number of infections in Brazil and Mexico is believed to be greater than the declared one due to the spread of the disease in poor communities. This work uses ANNs to predict the COVID-19 time series in Brazil and Mexico [23,24]. Moreover, we use the prey predator algorithm (PPA) to improve the performance of the ANN models by determining the optimum values of the ANN parameters [25,26]. Note that there are many analyses of time series based on time-averaged observables and the time series dynamics [27]. The rest of the article is organized as follows. In Section 2, we present the structures of our ANN models, followed by a description of PPA. In Section 3, we present and analyze the results. In Section 4, we draw conclusions.

Structure
ANNs have played a critical role in addressing many modern world problems [28][29][30]. An ANN consists of an input layer, one or more hidden layers, and an output layer. The processing elements, or neurons, in an ANN produce output based on their respective predefined activation functions.
The Multilayer Perceptron Neural Network (MLPNN) is the type of feed-forward ANN employed in our current work. We constructed an MLPNN with one hidden layer of ten neurons, where the sigmoid is used as the activation function in the hidden neurons (see Equation (1)). Note that neurons in one layer are connected to neurons in another layer through connection weights. In this study, the input weights link the input layer with the hidden layer, and the output weights link the hidden layer with the output layer. Moreover, we have used a hyperbolic tangent transfer function in the output neurons, expressed in Equation (2). Note that Equation (2) gives output in the range from −1 to +1.
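$$ y_i = \frac{1}{1 + \exp\left(-\sum_{k} w_{ki} x_k\right)}, \qquad (1) $$
where $x_k$ is the input value at input neuron $k$, $w_{ki}$ is the weight connecting node $k$ from the input layer with node $i$ from the hidden layer, and $y_i$ represents the value of the hidden neuron $i$.
$$ \hat{y}_j = \tanh\left(\sum_{i} w^{*}_{ij} y_i\right), \qquad (2) $$
where $w^{*}_{ij}$ is the output weight between the hidden neuron $i$ and the output neuron $j$, and $\hat{y}_j$ is the output value of output neuron $j$.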
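The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a network without bias terms (the text names only input weights and output weights as parameters), and the function names `sigmoid` and `mlp_forward`, the random weight range, and the sample input value are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid, assumed here for the hidden neurons."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, w_in, w_out):
    """Forward pass of a one-hidden-layer MLP without bias terms (an assumption).

    x     : list of input values, one per input neuron
    w_in  : w_in[k][i] connects input neuron k to hidden neuron i
    w_out : w_out[i][j] connects hidden neuron i to output neuron j
    """
    n_hidden = len(w_in[0])
    n_out = len(w_out[0])
    # Hidden layer: sigmoid of the weighted sum of the inputs.
    hidden = [
        sigmoid(sum(w_in[k][i] * x[k] for k in range(len(x))))
        for i in range(n_hidden)
    ]
    # Output layer: hyperbolic tangent of the weighted sum of hidden values,
    # so every output lies between -1 and +1.
    return [
        math.tanh(sum(w_out[i][j] * hidden[i] for i in range(n_hidden)))
        for j in range(n_out)
    ]


# Hypothetical 1-10-3 network with random weights (illustration only).
rng = random.Random(0)
w_in = [[rng.uniform(-1.0, 1.0) for _ in range(10)] for _ in range(1)]
w_out = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(10)]
outputs = mlp_forward([0.5], w_in, w_out)
print(outputs)  # three values, each between -1 and +1
```

Under these assumptions, a network with one input, ten hidden, and three output neurons has 1 × 10 + 10 × 3 = 40 weights to be tuned. Because the outputs are bounded by the hyperbolic tangent, target values would have to be scaled into that range before training; how the authors scaled their data is not stated here.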
There are two stages in proposing an ANN model. In the first stage, we define the MLPNN architecture, which depends on the data to be represented. For the data sets used in this work (the Brazil data set and the Mexico data set), the MLPNN architectures of the two models (the Brazil ANN model and the Mexico ANN model) are the same: one input neuron, ten hidden neurons, and three output neurons. We thus propose two ANN models, one for Brazil and one for Mexico. In each model, the three outputs are the numbers of infections, recoveries, and deaths, so the output layer has three neurons for the prediction task. Figure 1 shows the structure of the MLPNN that we have used in this work.
The second stage is the training process of the ANN, which compares the actual outputs with the desired outputs. The aim of training is to obtain the smallest possible difference between the observed outputs of the data set (training data) and the corresponding values of the ANN model. So, to build an ANN model, the ANN is passed through the training phase, in which PPA adjusts the weights to minimize this difference, measured by the RMSE. Note that the values of the parameters of an MLPNN depend on the following:
(1) The optimization algorithm employed to determine the optimal parameters. In this study, we have used PPA to determine the optimal parameters (input weights and output weights).
(2) The data set (training data) used in the training procedure. This study has used two different data sets (the Brazil data set and the Mexico data set). Because the data sets differ, the optimal parameters (input weights and output weights) of the Brazil MLPNN (BMLPNN) model are not the same as the optimal parameters of the Mexico MLPNN (MMLPNN) model.
The training procedure of a PPA-MLPNN is highlighted in Figure 2. Performance assessment and performance enhancement are based on the root mean squared error (RMSE) and the correlation coefficient (R). Equations (3) and (4) can be used to compute RMSE and R.
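As an illustration of the two performance measures, the following Python sketch computes RMSE and R under the assumption that they follow the standard definitions: the square root of the mean squared difference between observed and predicted values, and the Pearson correlation coefficient. The function names and the sample values are hypothetical and are not taken from the data sets of this study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical observed values and model outputs (illustration only).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(actual, predicted), pearson_r(actual, predicted))
```

A small RMSE indicates that predictions are close to the observations in absolute terms, while R close to 1 indicates that they rise and fall together; the two measures can disagree, which is one reason to report both.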

Prey Predator Algorithm (PPA)
PPA is considered one of the most effective optimization techniques for finding an optimal solution to an optimization problem [25,31]. It can be applied to continuous, combinatorial, and constrained optimization problems. Moreover, it can deal with highly nonlinear and multi-modal optimization problems naturally and efficiently [26,32,33]. Training an MLPNN is an optimization problem, and we used PPA to determine the best MLPNN models by finding the optimal values of the models' weights. As mentioned before, the values of the weights are adjusted to minimize the RMSE.

The development of PPA is inspired by the interaction between animals in a prey-predator relationship [26]. The basic idea of PPA is how a predator hunts its prey while the prey tries to escape and hide. The solutions in the search space of PPA are termed prey and predator.
In the current article, PPA is employed to optimize the survival value of the solutions, identifying the smallest performance value according to the RMSE function. Among the target preys, the best prey is the one that successfully escapes and finds a hideout [25]. Such prey is called the best prey in the search space and has the highest survival value. In each iteration, the predator hunts the weaker prey, which tries to follow other prey to escape the predator's attack and find a safe hiding place. The movement of a predator or prey is determined by the direction in which it moves and by the distance to be covered in that direction. Equation (5) gives the direction of a solution in the search space, where different values of the parameter $v$ determine the jump size for the solution $x_i$.
A direction is considered optimum if moving along it increases the survival value of a solution. A further issue in updating a solution is the step length, for which a minimum step length $\lambda_{\min}$ and a maximum step length $\lambda_{\max}$ are used, where $\lambda_{\min} < \lambda_{\max}$. Equations (6)-(9), as discussed in [25,26], describe predator and prey movement. The movement of prey is given in Equation (6) when the follow-up probability is met.
Alternatively, the movement of prey is determined by Equation (7).
The best prey and predator's movements are, respectively, given by Equations (8) and (9).
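The roles described above (predator, best prey, ordinary prey, follow-up probability, and the two step lengths) can be illustrated with a deliberately simplified Python sketch. It is not a transcription of Equations (5)-(9): the update rules below are simple stand-ins, only one predator and one best prey are used, and all names and default values (`ppa_minimize`, `follow_prob`, `lam_min`, `lam_max`, `n_local_dirs`, the initial range of −5 to 5) are hypothetical. The objective is minimized, so a lower objective value corresponds to a higher survival value.

```python
import math
import random


def ppa_minimize(objective, dim, pop_size=20, iters=200, lam_min=0.05,
                 lam_max=1.0, follow_prob=0.7, n_local_dirs=5, seed=0):
    """Simplified prey-predator search (illustrative stand-in update rules)."""
    rng = random.Random(seed)

    def rand_unit():
        # Random direction of unit length.
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        return [c / norm for c in v]

    def step(x, d, length):
        return [xi + length * di for xi, di in zip(x, d)]

    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(iters):
        pop.sort(key=objective)              # best solution first
        best, predator = pop[0], pop[-1]     # worst solution acts as predator
        history.append(objective(best))
        # Best prey: small local search, accepted only if it improves.
        cand = best
        for _ in range(n_local_dirs):
            trial = step(best, rand_unit(), lam_min * rng.random())
            if objective(trial) < objective(cand):
                cand = trial
        new_pop = [cand]
        # Ordinary prey: follow a better prey, or run away from the predator.
        for i in range(1, pop_size - 1):
            x = pop[i]
            if rng.random() < follow_prob:
                target = pop[rng.randrange(0, i)]   # any prey ranked better
                toward = [t - xi for t, xi in zip(target, x)]
                x_new = step(step(x, toward, rng.random()),
                             rand_unit(), lam_min * rng.random())
            else:
                d = rand_unit()
                away = [xi - pi for xi, pi in zip(x, predator)]
                if sum(a * b for a, b in zip(d, away)) < 0:
                    d = [-c for c in d]             # point away from predator
                x_new = step(x, d, lam_max * rng.random())
            new_pop.append(x_new)
        # Predator: chases the weakest prey with a large random component.
        chase = [w - p for w, p in zip(pop[-2], predator)]
        new_pop.append(step(step(predator, chase, rng.random()),
                            rand_unit(), lam_max * rng.random()))
        pop = new_pop
    pop.sort(key=objective)
    history.append(objective(pop[0]))
    return pop[0], history[-1], history


def sphere(x):
    """Toy objective standing in for the RMSE of a network."""
    return sum(c * c for c in x)


best_x, best_val, history = ppa_minimize(sphere, dim=2)
print(best_val)
```

In the setting of this article, the toy objective would be replaced by a function that maps a vector of MLPNN weights to the RMSE on the training data. Because the best prey only accepts improving moves, the best objective value in this sketch never gets worse from one iteration to the next.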

COVID-19 Data Set
In this work, we have proposed two ANN models to predict the number of COVID-19 infected (active cases), recoveries, and total deaths in Brazil and Mexico. Using these models aids in making decisions and building overall strategies to reduce human losses in these countries. In addition, the models could be applied to countries similar to Brazil and Mexico in terms of the number of cases. We offer a quantitative overview of the COVID-19 status in these countries, where the covered durations with starting and ending dates are given in Table 1.

Results and Discussion
We used ANN models with one input neuron, ten hidden neurons, and three output neurons in the current work. Note that the input neuron accepts the "requested data", whereas the three output neurons predict the infected (active cases), recovered, and total deaths. PPA is used to determine the optimal parameters of the two models, one for Brazil and the other for Mexico. The ANN is trained via PPA for 20 trials of 1000 iterations each, with a population size of 50. The number of predators in PPA is 8, the number of best preys is 4, and the number of local search directions is 1; the best values obtained are reported. The RMSE values we obtained were less than 0.05 in all our optimal models.
To build a prediction model, one data set, called the training data, is used to fit the model, and another data set, called the testing data, is used to compare the real results with the expected results. Figure 3 highlights the training data (in red) and the ANN-based predicted data (in blue) of COVID-19 infected people in Brazil, where the training data covers the duration from 13 March 2020 to 2 August 2020, whereas the ANN-based predicted infections cover 13 March 2020 to 17 December 2020. The results showed that the number of infected cases would increase at a rate of 9 × 10^5 daily by the end of 2020.
Figure 5 shows that the number of death cases would increase by the end of 2020, at a daily rate of 6.5 × 10^5. Note that the training data (deaths data) covers the duration from 10 April 2020 to 2 August 2020, whereas the ANN-based expected deaths cover the period from 10 April 2020 to 15 December 2020.
Moreover, we have used the correlation coefficient between the testing data, which covers the duration from 4 August 2020 to 22 November 2020, and the corresponding ANN values, as shown in Figure 6. Additionally, the nonlinear regression equations of the testing data are illustrated in Figure 6. The nonlinear regression equations help to understand the relation between the observed values and the expected values. The results show that the expected number of infections, recoveries, and deaths is much higher than announced, due to the insufficient number of tests and the spread of the disease among indigenous and poor communities.
For Mexico, the expected recoveries and daily deaths decrease towards the end of 2020, as shown in Figures 8 and 9. Note that Figure 8 shows the training data (in red) and the ANN-based predicted data (in blue) of recovered cases from COVID-19 in Mexico, where the training data covers the duration from 13 May 2020 to 3 August 2020, whereas the ANN-based expected number of recovered cases covers 13 May 2020 to 17 January 2021.
Figure 9 shows the training data (in red) and the ANN-based predicted data (in blue) of deaths from COVID-19 in Mexico, where the training data covers the duration from 7 April 2020 to 3 August 2020, whereas the ANN-based expected number of deaths covers the period from 7 April 2020 to 12 December 2020.
We have used the correlation coefficient between the testing data, which covers the duration from 4 August 2020 to 22 November 2020, and the corresponding ANN values, as shown in Figure 10. The linear regression equations of the testing data are also illustrated in Figure 10. The linear regression equations help to understand the relation between the observed values and the expected values. The results show that the ANN values for infections, recoveries, and deaths are much higher than the announced cases, due to the insufficient number of tests as well as the spread of the disease among poor communities. For example, the correlation coefficient of the recovered number between the expected data and the announced data is 0.045.

Conclusions
In this study, we have proposed artificial neural network (ANN) prediction models for Brazil and Mexico using a multilayer perceptron neural network (MLPNN) in conjunction with a prey predator algorithm (PPA). The two proposed models are PPA-BMLPNN and PPA-MMLPNN. These models are employed as an artificial intelligence forecasting technique for COVID-19 in Brazil and Mexico. PPA was used to determine the optimal parameter values of the ANN models. Note that the parameters of the MLPNNs are the input weights and output weights. The proposed models have high performance in predicting the number of infected (active), total deaths, and recovered cases in Brazil and Mexico. The results could also be of interest to countries similar in population and case numbers to Brazil and Mexico. The results show that

• At the beginning of 2021, the average active cases of COVID-19 in Brazil will go to 9 × 10^5, with 1.5 × 10^5 recovered cases per day, and more than 6 × 10^5 total deaths.
• At the beginning of 2021, the average active cases of COVID-19 in Mexico will go to 1.4 × 10^5, with less than 0.2 × 10^5 recovered per day, and less than 0.9 × 10^5 total deaths.