Optimization Method for Forecasting Confirmed Cases of COVID-19 in China.

In December 2019, a novel coronavirus, called COVID-19, was discovered in Wuhan, China, and has since spread to other cities in China as well as to 24 other countries. The number of confirmed cases is increasing daily and reached 34,598 on 8 February 2020. In the current study, we present a new forecasting model to estimate and forecast the number of confirmed cases of COVID-19 in the upcoming ten days based on the previously confirmed cases recorded in China. The proposed model is an improved adaptive neuro-fuzzy inference system (ANFIS) that uses a flower pollination algorithm (FPA) enhanced by the salp swarm algorithm (SSA). SSA is employed to help FPA avoid its main drawback (i.e., getting trapped at local optima). The main idea of the proposed model, called FPASSA-ANFIS, is to improve the performance of ANFIS by determining its parameters using FPASSA. The FPASSA-ANFIS model is evaluated using the World Health Organization (WHO) official data of the COVID-19 outbreak to forecast the confirmed cases of the upcoming ten days. Moreover, the FPASSA-ANFIS model is compared to several existing models, and it showed better performance in terms of Mean Absolute Percentage Error (MAPE), Root Mean Squared Relative Error (RMSRE), coefficient of determination ($R^2$), and computing time. Furthermore, we tested the proposed model using two different datasets of weekly influenza confirmed cases in two countries, namely the USA and China. The outcomes also showed good performance.


Introduction
A large family of viruses, called coronaviruses, are severe pathogens for human beings, which cause respiratory, hepatic, gastrointestinal, and neurologic diseases. They are distributed among humans, birds, livestock, mice, bats, and other wild animals [1][2][3]. The outbreaks of two previous coronaviruses, SARS-CoV and MERS-CoV in 2003 and 2012, respectively, have confirmed the transmission from animal to animal, and human to human [4]. In December 2019, the World Health Organization (WHO) received notifications from China of many cases of respiratory illness that were linked to people who had visited a seafood market in Wuhan [5]. Currently, Wuhan city suffers from the spread of a novel coronavirus, called COVID-19 (previously, it was called 2019-nCoV). In [6], the authors concluded that COVID-19 likely originated in bats, because it is more similar to two the infection of the SARS epidemic. They concluded that the reproduction numbers for two different communities, Hong Kong and Toronto, were 1.2 and 1.32, respectively. Ong et al. [20] proposed a monitoring and forecasting model for influenza A (H1N1-2009). Furthermore, Nah et al. [21] proposed a probability-based model to predict the spread of MERS.
The Adaptive Neuro-Fuzzy Inference System (ANFIS) [22] is widely applied in time series prediction and forecasting problems, and it has shown good performance in many existing applications. It offers flexibility in capturing nonlinearity in time series data, and it combines the properties of both artificial neural networks (ANN) and fuzzy logic systems. It has been applied in various forecasting applications. For example, in [23], a stock price forecasting model was proposed using ANFIS and empirical mode decomposition. Chen et al. [24] proposed a TAIEX time series forecasting model based on a hybrid of ANFIS and ordered weighted averaging (OWA). In [25], another time series forecasting method was presented for electricity prices based on ANFIS. Svalina et al. [26] proposed an ANFIS-based forecasting model for the close price indices of a stock market for five days. Ekici and Aksoy [27] presented an ANFIS-based building energy consumption forecasting model. Moreover, ANFIS has also been applied to forecast electricity loads [28]. Kumar et al. [29] proposed an ANFIS-based model to forecast return products. Ho and Tsai [30] applied ANFIS to forecast product development performance. However, estimating the ANFIS parameters remains a challenge. Therefore, in previous studies, some individual swarm intelligence (SI) methods have been applied to tune the ANFIS parameters to enhance time series forecasting, because these parameters have a significant effect on the performance of ANFIS. The SI methods include particle swarm optimization (PSO) [31,32], social-spider optimization [33], the sine-cosine algorithm (SCA) [34], and the multi-verse optimizer (MVO) [35]. For example, in [34] the SCA algorithm was applied to improve the ANFIS model to forecast oil consumption in three countries, namely, Canada, Germany, and Japan. In the same context, in [35], the MVO algorithm was used to enhance the ANFIS model to forecast oil consumption in two countries.
In addition, in [36] the PSO was used with ANFIS to predict biochar yield. However, individual SI algorithms may get stuck at local optima. Therefore, one solution is to apply hybrid SI algorithms to avoid this problem. In [37], a hybrid of two SI algorithms, namely GA and SSA, was presented to improve the ANFIS model. The resulting model, called GA-SSA-ANFIS, was applied to forecast crude oil prices for long-term time series data. However, the previously mentioned methods suffer from limitations that can degrade the quality of the forecasting output, such as slow convergence and a poor balance between the exploration and exploitation phases. This motivated us to propose an alternative forecasting method based on the hybridization concept. This concept avoids the limitations of traditional SI techniques by combining the strengths of different techniques, which produces new SI techniques that are better than the traditional ones.
In the current study, we propose an improved ANFIS model based on a modified flower pollination algorithm (FPA) using the salp swarm algorithm (SSA). The FPA is an optimization algorithm proposed by Yang [38], which was inspired by the flower pollination process of flowering plants. The FPA has been employed in various optimization applications, for example to estimate solar PV parameters [39,40], solve sudoku puzzles [41], select features [42], design antennas [43], and in other applications [44][45][46][47]. Moreover, SSA is also an optimization algorithm, proposed by Mirjalili et al. [48] and inspired by the behavior of salp chains. In recent years, the SSA has been utilized to solve different optimization problems, such as feature selection [49,50], data classification [51], image segmentation [52], and others [53,54].
The proposed method, called FPASSA, is a hybrid of FPA and SSA in which the SSA is applied as a local search method for FPA. The proposed FPASSA starts by receiving the historical COVID-19 dataset. A set of solutions is then generated, where each solution represents a set of values for the parameters of the ANFIS model. The quality of each solution is calculated using the fitness value, and the solution with the best fitness value is chosen as the best solution. Next, the probability of each solution is computed, and the current solution is updated using either the global or the local strategy of FPA. In the case of the local strategy, the operators of SSA or FPA are used according to the probability of the fitness value for each solution. The process of updating the solutions is repeated until the stop condition is reached, and the best parameter configuration is used to forecast the number of confirmed cases of COVID-19.
The main contributions of the current study are as follows:
1. We propose an efficient forecasting model to forecast the confirmed cases of COVID-19 in China for the upcoming ten days based on previously confirmed cases.
2. An improved ANFIS model is proposed using a modified FPA algorithm, using SSA.
3. We compare the proposed model with the original ANFIS and existing modified ANFIS models, such as PSO, GA, ABC, and FPA.
The rest of this study is organized as follows. The preliminaries of ANFIS, FPA, and SSA are described in Section 2. Section 3 presents the proposed FPASSA, and Section 4 presents the experimental setup and results. We conclude this study in Section 5.

Adaptive Neuro-Fuzzy Inference System (ANFIS)
The principles of the ANFIS are given in this section. The ANFIS model links fuzzy logic and neural networks [22]. It generates a mapping between the input and output by applying IF-THEN rules (it is also called the Takagi-Sugeno inference model). Figure 1 illustrates the ANFIS model, where $x$ and $y$ define the inputs to Layer 1 and $O_{1i}$ is the output of node $i$, computed as follows:

$$O_{1i} = \mu_{A_i}(x), \quad i = 1, 2, \qquad O_{1i} = \mu_{B_{i-2}}(y), \quad i = 3, 4 \tag{1}$$

$$\mu(x) = e^{-\left(\frac{x - \rho_i}{\alpha_i}\right)^2} \tag{2}$$

where $\mu$ denotes the generalized Gaussian membership function, $A_i$ and $B_i$ define the membership values of $\mu$, and $\alpha_i$ and $\rho_i$ denote the premise parameter set. The output of Layer 2 (also known as the firing strength of a rule) is calculated as follows:

$$O_{2i} = \omega_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \tag{3}$$

Meanwhile, the output of Layer 3 (also known as the normalized firing strength) is calculated as follows:

$$O_{3i} = \bar{\omega}_i = \frac{\omega_i}{\omega_1 + \omega_2} \tag{4}$$

The output of Layer 4 (also known as an adaptive node) is calculated as follows:

$$O_{4i} = \bar{\omega}_i f_i = \bar{\omega}_i\,(p_i x + q_i y + r_i) \tag{5}$$

where $r_i$, $q_i$, and $p_i$ define the consequent parameters of node $i$. Layer 5 contains only one node; its output is computed as:

$$O_5 = \sum_i \bar{\omega}_i f_i \tag{6}$$
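The five layers described above can be traced in a few lines of code. The following is a minimal sketch for a two-input, two-rule Takagi-Sugeno system, assuming a Gaussian membership function of the form exp(-((v - ρ)/α)²); the function names and all parameter values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def gaussian_mf(v, rho, alpha):
    """Gaussian membership degree with centre rho and width alpha (assumed form)."""
    return math.exp(-(((v - rho) / alpha) ** 2))


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input Takagi-Sugeno ANFIS.

    premise:    {"A": [(rho, alpha), ...], "B": [(rho, alpha), ...]}
    consequent: [(p, q, r), ...], one triple per rule
    """
    # Layer 1: membership degree of each input in each fuzzy set
    mu_a = [gaussian_mf(x, rho, alpha) for rho, alpha in premise["A"]]
    mu_b = [gaussian_mf(y, rho, alpha) for rho, alpha in premise["B"]]
    # Layer 2: firing strength of each rule
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: normalized strength times the linear consequent of each rule
    o4 = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: single output node summing all rule contributions
    return sum(o4), w_bar


# Hypothetical parameters: both rules fire equally at x = y = 1
premise = {"A": [(0.0, 1.0), (2.0, 1.0)], "B": [(0.0, 1.0), (2.0, 1.0)]}
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
output, weights = anfis_forward(1.0, 1.0, premise, consequent)
print(output, weights)
```

With these values the two rules have equal firing strength, so the output is the plain average of the two rule consequents (2 and 3).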

Flower Pollination Algorithm (FPA)
The Flower Pollination Algorithm is an optimization method proposed by Yang [38]. It simulates the transfer of pollen between flowers by pollinators in nature. This algorithm utilizes two types of pollination (i.e., self-pollination and cross-pollination). In self-pollination, pollination occurs without pollinators, whereas in cross-pollination, pollen is moved between different plants. In more detail, self-pollination can be regarded as local pollination, while cross-pollination can be regarded as global pollination.
The global pollination or cross-pollination can be mathematically formulated as follows:

$$x_i^{t+1} = x_i^t + L\,(x_i^t - F^*) \tag{7}$$

where $x_i^t$ defines the pollen $i$ at iteration $t$, $L$ denotes the pollination's strength or the step size, and $F^*$ is the target position or best solution. In some cases, insects can fly long distances with steps of different lengths; therefore, a Lévy flight distribution is applied to simulate this movement:

$$L \sim \frac{\lambda\,\Gamma(\lambda)\,\sin(\pi\lambda/2)}{\pi}\,\frac{1}{s^{1+\lambda}}, \quad s \gg s_0 > 0 \tag{8}$$

where $\lambda = 1.5$ and $\Gamma(\lambda)$ denotes the gamma function. This distribution is valid for large steps $s > 0$. The self-pollination or local pollination can be mathematically formulated as follows:

$$x_i^{t+1} = x_i^t + \epsilon\,(x_j^t - x_k^t) \tag{9}$$

where $x_j^t$ and $x_k^t$ represent pollens from different flowers of the same plant, and $\epsilon$ is a random number in the range [0, 1]. The process of pollination can be done using cross-pollination or self-pollination. Therefore, a random variable $p$, in the range [0, 1], is used to switch between the two.
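The two pollination moves can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the Lévy steps are drawn with Mantegna's algorithm (a common sampler, not named in the text), the switch probability `p = 0.8` is a hypothetical default, and all function names are made up for the example.

```python
import math
import random


def levy_step(lam=1.5, rng=random):
    """Draw one Levy-distributed step using Mantegna's algorithm (assumed sampler)."""
    sigma = (
        math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
        / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))
    ) ** (1 / lam)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0) or 1e-12  # guard against a zero denominator
    return u / abs(v) ** (1 / lam)


def global_pollination(x_i, best, lam=1.5, rng=random):
    """Cross-pollination: Levy-scaled move relative to the best solution."""
    return [xi + levy_step(lam, rng) * (xi - b) for xi, b in zip(x_i, best)]


def local_pollination(x_i, x_j, x_k, rng=random):
    """Self-pollination: move along the difference of two other pollens."""
    eps = rng.random()  # random number in [0, 1)
    return [xi + eps * (xj - xk) for xi, xj, xk in zip(x_i, x_j, x_k)]


def fpa_update(population, i, best, p=0.8, rng=random):
    """Choose global or local pollination for solution i using the switch p."""
    if rng.random() < p:
        return global_pollination(population[i], best, rng=rng)
    j, k = rng.sample(range(len(population)), 2)
    return local_pollination(population[i], population[j], population[k], rng)


rng = random.Random(0)
population = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
candidate = fpa_update(population, 0, best=[1.0, 1.0], rng=rng)
print(candidate)
```

A pollen that already sits on the best solution is left unchanged by the global move, because the difference term is zero.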

Salp Swarm Algorithm (SSA)
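SSA is an optimization technique introduced by Mirjalili et al. [48]. It simulates the behavior of salps in nature, which is called the salp chain. The mathematical model of SSA begins by splitting its population into a leader group and a followers group. The leader is the front salp, whereas the followers are the other salps. The search space is defined in $n$ dimensions with $n$ variables. Equation (10) updates the leader's position:

$$x_j^1 = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \geq 0.5 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0.5 \end{cases} \tag{10}$$

where $x_j^1$ denotes the leader's position in the $j$-th dimension, $F_j$ is the target position, and $ub_j$ and $lb_j$ represent the max and min bounds, respectively. $c_2$ and $c_3$ denote random numbers in [0, 1]; because $c_3$ is drawn from [0, 1], the switch between the two branches is taken at 0.5, as in common implementations of SSA. $c_1$ is an important parameter; it balances the exploration and exploitation phases. It is computed as follows:

$$c_1 = 2e^{-\left(\frac{4t}{t_{max}}\right)^2} \tag{11}$$

where $t$ is the current iteration number and $t_{max}$ is the maximum number of iterations. Then, the followers' positions are updated as follows:

$$x_j^i = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right) \tag{12}$$

where $x_j^i$ defines the position of the $i$-th follower in the $j$-th dimension, with $i > 1$.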
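The leader and follower updates can be sketched as below. This is a minimal illustration with hypothetical function names; it assumes the branch on c3 is taken at 0.5 (the usual choice when c3 is drawn from [0, 1]) and clips the leader to the bounds, which is an added safeguard not stated in the text.

```python
import math
import random


def c1_coefficient(t, t_max):
    """Coefficient that shrinks from 2 toward 0 as the iteration count grows."""
    return 2.0 * math.exp(-((4.0 * t / t_max) ** 2))


def update_leader(food, lb, ub, t, t_max, rng=random):
    """Move the leader around the target position, dimension by dimension."""
    c1 = c1_coefficient(t, t_max)
    new_position = []
    for f_j, lb_j, ub_j in zip(food, lb, ub):
        c2, c3 = rng.random(), rng.random()
        step = c1 * ((ub_j - lb_j) * c2 + lb_j)
        x_j = f_j + step if c3 >= 0.5 else f_j - step  # assumed 0.5 threshold
        new_position.append(min(max(x_j, lb_j), ub_j))  # assumed bound clipping
    return new_position


def update_follower(x_i, x_prev):
    """Each follower moves to the midpoint between itself and the salp ahead."""
    return [0.5 * (a + b) for a, b in zip(x_i, x_prev)]


rng = random.Random(0)
leader = update_leader([0.5, 0.5], lb=[0.0, 0.0], ub=[1.0, 1.0], t=10, t_max=100, rng=rng)
follower = update_follower([0.0, 2.0], [2.0, 4.0])
print(leader, follower)
```

Early iterations give a large c1 and therefore wide moves around the target, while late iterations keep the leader close to it.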

The Proposed Method
This section explains the proposed FPASSA-ANFIS method. It is a time series method for forecasting the confirmed cases of COVID-19, as given in Figure 2. The FPASSA-ANFIS utilizes the improved FPA to train the ANFIS model by optimizing its parameters. The FPASSA-ANFIS contains five layers, as in the classic ANFIS model. Layer 1 contains the input variables (the historical COVID-19 confirmed cases), whereas Layer 5 produces the forecasted values. In the learning phase, the FPASSA is used to select the best weights between Layer 4 and Layer 5.
The FPASSA-ANFIS starts by formatting the input data in a time series form. In our case, the autocorrelation function (ACF) was considered. ACF is one of the methods applied to find patterns in the data; it presents information about the correlation between points separated by various time lags. Therefore, in this paper, the variables with an ACF greater than 0.2 are considered, i.e., 5 lags.
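The lag selection step can be illustrated with a small sketch. It assumes the standard sample autocorrelation estimator (the text does not say which estimator was used), the function names are hypothetical, and the series is a placeholder ramp rather than WHO data.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (standard estimator)."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((v - mean) ** 2 for v in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


def select_lags(series, max_lag, threshold=0.2):
    """Keep every lag whose autocorrelation exceeds the threshold."""
    return [k for k in range(1, max_lag + 1) if autocorrelation(series, k) > threshold]


# Placeholder series: a steadily increasing ramp, not real case counts
series = [float(v) for v in range(1, 21)]
lags = select_lags(series, max_lag=8)
print(lags)
```

On real data the number of retained lags depends on the series; the text reports that the 0.2 threshold left 5 lags for the COVID-19 dataset.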
Besides, the training data contains 75% of the dataset, whereas the testing data contains the remaining 25%. The number of clusters is defined by the fuzzy c-means (FCM) method to construct the ANFIS model.
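Turning the series into 5-lag samples and splitting them 75%/25% can be sketched as follows. The text does not say whether the split is chronological, so keeping the time order is an assumption here; the function names are hypothetical and the series holds placeholder values only.

```python
def make_lagged_samples(series, n_lags=5):
    """Pair each value with the n_lags values that precede it."""
    samples = []
    for t in range(n_lags, len(series)):
        samples.append((series[t - n_lags:t], series[t]))
    return samples


def chronological_split(samples, train_fraction=0.75):
    """Use the earliest samples for training and the latest for testing (assumed)."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# 29 placeholder values standing in for a daily series
series = list(range(1, 30))
samples = make_lagged_samples(series)
train, test = chronological_split(samples)
print(len(samples), len(train), len(test))
```

Each sample is an (inputs, target) pair, so the ANFIS inputs are the five previous observations and the target is the next one.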
The parameters of the ANFIS model are prepared by the FPASSA algorithm. In the training phase, the calculation error (as in Equation (13)) between the real data and the predicted data is used to evaluate the parameters' quality:

$$MSE = \frac{1}{N_s}\sum_{i=1}^{N_s}\left(T_i - P_i\right)^2 \tag{13}$$

where $T$ is the real data, $P$ is the predicted data, and $N_s$ is the sample length. Smaller values of the objective function indicate better ANFIS parameters.
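A sketch of how a candidate parameter vector might be scored is given below. It assumes the objective is the mean squared error between real and predicted values; `fitness`, `predict`, and the toy linear predictor are hypothetical stand-ins for the ANFIS forward pass.

```python
def mse_objective(real, predicted):
    """Mean squared error between real and predicted values (assumed objective)."""
    if len(real) != len(predicted):
        raise ValueError("real and predicted must have the same length")
    n_s = len(real)
    return sum((t - p) ** 2 for t, p in zip(real, predicted)) / n_s


def fitness(params, samples, predict):
    """Score one candidate parameter vector on a list of (inputs, target) pairs."""
    real = [target for _, target in samples]
    predicted = [predict(params, inputs) for inputs, _ in samples]
    return mse_objective(real, predicted)


def linear_predict(params, inputs):
    """Toy stand-in for the ANFIS forward pass: a weighted sum of the inputs."""
    return sum(w * v for w, v in zip(params, inputs))


samples = [([1.0, 2.0], 3.0), ([2.0, 3.0], 5.0)]
score_good = fitness([1.0, 1.0], samples, linear_predict)
score_bad = fitness([0.0, 1.0], samples, linear_predict)
print(score_good, score_bad)
```

The optimizer only needs this scalar score, so any model that maps parameters and inputs to a prediction can be plugged in through `predict`.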
On the other hand, the updating phase of the followers' positions in the SSA algorithm is applied to improve the global pollination phase in the FPA algorithm. In this improvement, a random variable ($r$) is used to switch between both phases. If $r > 0.5$, then the operators of the SSA are used; otherwise, the operators of the FPA are used. In general, the FPASSA starts by constructing the population ($X$); afterward, the objective function is calculated for each solution. The solution with the lowest error value is saved to the next iteration. This sequence is repeated until the stop condition is met, which in this paper is the maximum number of iterations. Then the best solution is passed to train the parameters of the ANFIS model.
After finishing the training phase, the testing phase is started with the best solution to compute the final output. The performance of the proposed method is evaluated by comparing the real data with the predicted data using the performance measures. Finally, the FPASSA produces a forecasted value for the confirmed cases of COVID-19 in China for the next day. The steps of the proposed FPASSA are presented in Algorithm 1.

Algorithm 1 Proposed FPASSA algorithm
Input: Historical COVID-19 dataset, size of population N, total number of iterations t max .
Divide the data into training and testing sets.
Use the fuzzy c-means method to determine the number of membership functions.
Construct the ANFIS network.
Set the initial value for N solutions (X).
Repeat until the maximum number of iterations t max is reached:
    Compute the objective function for each solution and keep the solution with the lowest error as the best solution.
    Update each solution using the operators of FPA or SSA.
Return the best solution that represents the best configuration for ANFIS.
Apply the testing set to the best ANFIS model.
Forecast the COVID-19 confirmed cases for the next ten days.
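The optimization loop at the heart of the algorithm can be sketched as below. This is a hedged illustration, not the authors' code: it minimizes a generic objective (a sphere function stands in for the ANFIS training error), it uses a random number r > 0.5 to pick the SSA operators inside the local branch, and the switch probability `p`, the Mantegna Lévy sampler, the bound clipping, and the greedy replacement rule are all assumptions.

```python
import math
import random


def levy(lam, rng):
    """Levy step via Mantegna's algorithm (assumed sampler)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    v = rng.gauss(0.0, 1.0) or 1e-12
    return rng.gauss(0.0, sigma) / abs(v) ** (1 / lam)


def fpassa(objective, dim, lb, ub, n=25, t_max=100, p=0.8, lam=1.5, seed=0):
    """Minimize objective over [lb, ub]^dim with a hybrid FPA/SSA search."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    fit = [objective(x) for x in pop]
    b = min(range(n), key=fit.__getitem__)
    best, best_fit = pop[b][:], fit[b]
    for t in range(1, t_max + 1):
        c1 = 2.0 * math.exp(-((4.0 * t / t_max) ** 2))
        for i in range(n):
            x = pop[i]
            if rng.random() < p:
                # FPA global pollination with Levy steps
                cand = [xi + levy(lam, rng) * (xi - bi) for xi, bi in zip(x, best)]
            elif rng.random() > 0.5:
                if i == 0:
                    # SSA leader update around the best solution
                    cand = []
                    for bj in best:
                        step = c1 * ((ub - lb) * rng.random() + lb)
                        cand.append(bj + step if rng.random() >= 0.5 else bj - step)
                else:
                    # SSA follower update: midpoint with the previous salp
                    cand = [0.5 * (xi + xp) for xi, xp in zip(x, pop[i - 1])]
            else:
                # FPA local pollination
                j, k = rng.sample(range(n), 2)
                eps = rng.random()
                cand = [xi + eps * (xj - xk) for xi, xj, xk in zip(x, pop[j], pop[k])]
            cand = [min(max(v, lb), ub) for v in cand]  # keep inside the bounds
            f = objective(cand)
            if f < fit[i]:  # greedy replacement (assumed)
                pop[i], fit[i] = cand, f
                if f < best_fit:
                    best, best_fit = cand[:], f
    return best, best_fit


def sphere(x):
    """Stand-in objective; in the paper's setting this would be the ANFIS error."""
    return sum(v * v for v in x)


best, best_fit = fpassa(sphere, dim=3, lb=-5.0, ub=5.0)
print(best, best_fit)
```

The defaults of 25 solutions and 100 iterations mirror the common settings reported for the experiments; replacing `sphere` with a function that builds an ANFIS from the candidate vector and returns its training error would connect the loop to the forecasting task.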

Experiment
This section presents the description of the used dataset, the performance measures, the parameter setting for all methods, the experiment results, and discussions.

Datasets Description
The main dataset of this study is the COVID-19 dataset. It was collected from the WHO website (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/). It contains the daily confirmed cases in China from 21 January 2020 to 18 February 2020, as shown in Table 1. We used 75% of the dataset to train the model, while the rest is used to test it.
Moreover, we evaluated the performance of the proposed method using two datasets of weekly influenza confirmed cases. The first one, called DS1, was collected from the Centers for Disease Control and Prevention (CDC) (https://www.cdc.gov/flu/weekly/). It starts from week number 40 in 2015 and continues until week number 6 in 2020. The second one, called DS2, was collected from the WHO website (https://www.who.int/influenza). It contains the data of weekly influenza confirmed cases in China from week number 1 in 2016 to week number 8 in 2020.

Performance Measures
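The quality of the proposed method is evaluated using the following performance metrics, where $Y_p$ and $Y$ are the predicted and original values, respectively, $N_s$ represents the sample size of the data, and $\bar{Y}$ represents the average of $Y$:
• Root Mean Square Error (RMSE):
$$RMSE = \sqrt{\frac{1}{N_s}\sum_{i=1}^{N_s}\left(Y_{p,i} - Y_i\right)^2} \tag{14}$$
• Mean Absolute Error (MAE):
$$MAE = \frac{1}{N_s}\sum_{i=1}^{N_s}\left|Y_{p,i} - Y_i\right| \tag{15}$$
• Mean Absolute Percentage Error (MAPE):
$$MAPE = \frac{1}{N_s}\sum_{i=1}^{N_s}\left|\frac{Y_{p,i} - Y_i}{Y_i}\right| \tag{16}$$
• Root Mean Squared Relative Error (RMSRE):
$$RMSRE = \sqrt{\frac{1}{N_s}\sum_{i=1}^{N_s}\left(\frac{Y_{p,i} - Y_i}{Y_i}\right)^2} \tag{17}$$
• Coefficient of Determination ($R^2$):
$$R^2 = 1 - \frac{\sum_{i=1}^{N_s}\left(Y_i - Y_{p,i}\right)^2}{\sum_{i=1}^{N_s}\left(Y_i - \bar{Y}\right)^2} \tag{18}$$
Lower values of RMSE, MAE, MAPE, and RMSRE indicate a better method, whereas a higher value of $R^2$ indicates a better correlation between the predicted and original values.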
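The five measures can be computed with a few helper functions. This is a minimal sketch assuming the standard definitions, with relative errors taken with respect to the original values and MAPE returned as a fraction rather than a percentage; the sample numbers are made up for illustration.

```python
import math


def rmse(y, yp):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(y, yp)) / len(y))


def mae(y, yp):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(y, yp)) / len(y)


def mape(y, yp):
    """Mean absolute percentage error, as a fraction of the original values."""
    return sum(abs((p - a) / a) for a, p in zip(y, yp)) / len(y)


def rmsre(y, yp):
    """Root mean squared relative error."""
    return math.sqrt(sum(((p - a) / a) ** 2 for a, p in zip(y, yp)) / len(y))


def r2(y, yp):
    """Coefficient of determination."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, yp))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred),
      rmsre(y_true, y_pred), r2(y_true, y_pred))
```

The relative measures divide by the original values, so they are undefined when a series contains zeros.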

Parameter Settings
This paper aims to assess the ability of the FPASSA to forecast the COVID-19 by comparing its performance with other methods, namely the ANFIS and the trained ANFIS models using PSO, GA, ABC, FPA, and FPASSA. The parameters' setting for these models is listed in Table 2.
The common parameters are shared by all algorithms: the population size is set to 25, and 100 iterations are applied. Besides, each algorithm is run 30 independent times for a fair comparison. These parameter values were chosen because they produced good behavior in previous experiments, such as [34,35,55,56].
Table 2. Parameters' setting.

Algorithm
Parameters Setting

Performance of FPASSA to Forecast DS1 and DS2
In this section, the performance of the proposed FPASSA in predicting DS1 and DS2 is discussed. It can be concluded from Table 3 that FPASSA outperformed the compared methods in all measures, whereas FPA is ranked second. The results of DS2 indicate that FPASSA is ranked first in terms of RMSE, MAPE, $R^2$, and CPU time, whereas PSO is ranked second, followed by FPA, GA, and then ABC. These results indicate that the proposed method can optimize the parameters of the ANFIS model effectively and produce good results in terms of the performance measures. Comparison results between the proposed FPASSA and other models in forecasting COVID-19 are given in Table 4. It can be concluded that FPASSA outperforms the other models. For example, by analyzing the results of RMSE, MAE, MAPE, RMSRE, and CPU time (s), it can be observed that FPASSA achieves the smallest value among the compared algorithms, which indicates the high quality of FPASSA. Meanwhile, FPA takes the second rank and provides better results than the rest of the methods.
Moreover, the value of $R^2$, which is nearly 0.97, indicates a high correlation between the prediction obtained by the proposed FPASSA method and the original COVID-19 data. This can also be noticed from Figure 3, which depicts the training of the algorithms using the historical data of COVID-19 as well as their forecasting values for ten days.
Table 5 depicts the forecasting values for the confirmed cases of COVID-19 in China from 19/2/2020 to 28/2/2020. From these results, it can be noticed that the outbreak will reach its highest level on 28/2/2020. The average percentage of the increase over the forecasted period is 10%, the highest percentage is 12% on 28/2/2020, and the lowest percentage is 8.7% on 19/2/2020. From the previous results, it can be concluded that the proposed FPASSA-ANFIS has a high ability to forecast the COVID-19 dataset. These results avoid the limitations of traditional ANFIS because of the combination with the modified FPA method. Moreover, the operators of SSA are combined with the local strategy of FPA to enhance their exploitation ability. However, the computational time of the proposed FPASSA method still requires improvement.

Conclusions
This paper proposed a modified version of the flower pollination algorithm (FPA) using the salp swarm algorithm (SSA). This modified version, called FPASSA, is applied to improve the performance of the ANFIS by determining the optimal values for its parameters. The developed FPASSA-ANFIS model is applied as a forecasting technique for a novel coronavirus, called COVID-19, that was discovered in Wuhan, China, in December 2019. The proposed FPASSA-ANFIS model has a high ability to predict the number of confirmed cases within ten days. Besides, FPASSA-ANFIS outperforms other forecasting models in terms of RMSE, MAE, MAPE, RMSRE, and $R^2$. Furthermore, two datasets of weekly influenza confirmed cases in the USA and China were used to evaluate the proposed method, and the evaluation outcomes showed its good performance. Given the promising results obtained by the proposed FPASSA-ANFIS, it can be applied in different forecasting applications.