Design of Type-3 Fuzzy Systems and Ensemble Neural Networks for COVID-19 Time Series Prediction Using a Firefly Algorithm

Abstract: In this work, data on confirmed COVID-19 cases are used as a dataset for time series prediction. We propose the design of ensemble neural networks (ENNs) and type-3 fuzzy inference systems (FISs) for predicting COVID-19 data. The outputs of the ENN modules are combined using weights provided by the type-3 FIS, and the ENN is also designed using the firefly algorithm (FA) optimization technique. The proposed method, called ENNT3FL-FA, is applied to the COVID-19 confirmed-case data of 12 countries. COVID-19 data have been shown to form a complex time series because of their unstable behavior in certain periods. For example, it is unknown when a new wave will occur or how it will affect each country, since the increase in cases depends on many factors. The proposed method mainly seeks the number of modules of the ENN and the best possible parameters, such as the lower scale and lower lag of the type-3 FIS. Each module of the ENN produces an individual prediction. Each prediction error is an input to the type-3 FIS, whose outputs provide a weight for each prediction, from which the final prediction is calculated. The type-3 fuzzy weighted average (FWA) integration method is compared with the type-2 FWA to verify its ability to predict future confirmed cases using two data periods. The results show that the proposed method gives better predictions of 20 future days for most of the countries in this study, especially when the number of data points increases. For Germany, India, Italy, Mexico, Poland, Spain, the United Kingdom, and the United States of America, the proposed ENNT3FL-FA achieves, on average, a better performance in predicting future days for both data periods. The proposed method proves to be more stable when predicting complex time series such as the one used in this study. Intelligence techniques and their combination in the proposed method are recommended for time series with many data points.


Introduction
The COVID-19 pandemic has had worldwide effects since its origin. The number of confirmed cases worldwide has increased, especially during the different waves that each country has experienced. Studies of this disease and its repercussions have not ceased since it emerged, leading to publications focused on which people are at risk of developing severe COVID-19 [1,2], the consequences observed in people who were infected, and the prediction of future cases in order to formulate measures that mitigate the increase in cases [3]. The risk of developing severe COVID-19 in people with other illnesses or in pregnant women has been studied [4,5], and the use of drugs and their effects have also been widely studied [1,6]. A significant number of models have been successfully applied to the prediction and detection of COVID-19 cases. Among them are models based on statistics and neural networks applied to time series and human body images [7,8]. In this work, a combination of three techniques is developed: ensemble neural networks, type-3 fuzzy logic, and a firefly algorithm. Each of these intelligence techniques has been successfully applied in different applications, and they are now combined to predict confirmed COVID-19 cases.
The main contribution of this work is a hybrid approach that combines neural networks in an ensemble with interval type-3 fuzzy systems that aggregate the outputs of the ensemble. In addition, the firefly algorithm is used to automate the design of the ensemble and the interval type-3 fuzzy system. No previous work in the literature combines ensembles of neural networks with interval type-3 fuzzy systems and optimizes them with the firefly algorithm, which indicates the novelty of the proposed ENNT3FL-FA. Finally, the application of the hybrid approach to COVID-19 prediction is also important because of the real-world implications of the pandemic.
The structure of this paper is as follows. A literature review is presented in Section 2. Brief descriptions of the techniques applied to develop the proposed method are presented in Section 3. The description of the proposed method is shown in Section 4. The experimental results achieved are explained in Section 5. The statistical tests are presented in Section 6. The conclusions are presented in Section 7.

Literature Review
Some of the models used for predicting COVID-19 cases are statistical models. In [8], a global prediction of confirmed and recovered cases was proposed using different models based on statistical methodology. The authors used a model selection criterion, where the results are better than those achieved with the standard Gaussian autoregressive time series model. In [9], a Bose-Einstein (BE)-based statistical model for predictions of 14 days was proposed using COVID-19 confirmed case data from New York and DKI Jakarta (Indonesia). The authors wanted to provide the necessary information to decide on the social restrictions that should be used to contain the pandemic. Statistical models such as the generalized Waring regression model were proposed in [10] to predict confirmed COVID-19 cases in Senegal. The authors concluded that this model could consider other factors that affect the number of COVID-19 cases, achieving better results than other count regression models.
Intelligence techniques have also proven to play an important role in facing the COVID-19 pandemic. In [11], a comparison of three predictive models based on artificial neural networks was proposed to predict 10 days of confirmed COVID-19 cases and deaths, using information from 03/11/2020 to 01/23/2021. The achieved results show the effectiveness of this intelligence technique in predicting COVID-19 cases. A method using recurrent and convolutional neural networks to predict confirmed COVID-19 cases was proposed in [12]. The model was applied to the seven most affected states in India, and the results demonstrated that hybridization with convolutional neural networks has a better performance than other proposed hybridizations. In [13], an improvement to the grey wolf optimizer to accelerate convergence by using gradient information was proposed to predict COVID-19 cases in the United States of America. The authors applied a Gaussian walk and Lévy flight to improve the exploration and exploitation to avoid trapping in local optima. An improvement of a long short-term memory network was proposed in [14], where the authors show the behavior of the proposed method with a chaotic time series of COVID-19. They demonstrated the effectiveness of their proposed method for COVID-19 cases in Vietnam. Ensemble neural networks (ENNs), an improvement of conventional neural networks, are a useful technique applied to predict future information based on a data period learned by the modules of the ENNs [15][16][17][18]. This improvement of conventional neural networks has also been applied to analyze images to detect COVID-19 infections. In [19], an ensemble deep neural network architecture to detect COVID-19 infections from chest computed tomography (CT) images was proposed. The authors proposed different architectures and developed a localhost application to perform a diagnosis automatically. 
In [20], a hybrid system was proposed, where a deep neural network ensemble using pre-trained models (VGG, Xception, and ResNet) and using a genetic algorithm to combine an ensemble architecture was used to perform the classification of clustered images of lung lobes. In [21], an ensemble deep learning model was also proposed to predict COVID-19 infection using chest X-ray images. The authors applied different deep learning models, which indicated that a final combination of an ensemble model can provide better results than individual models. Most previously mentioned works focused on predicting a few future days only in a specific country or in a global prediction.
In a previous work [22], ensemble neural networks were implemented to predict COVID-19 cases, and the ENN architectures were designed using a firefly algorithm. To obtain a final prediction, a comparison among the average, type-1, and type-2 FWA integration methods was performed. The achieved results showed a better performance using the type-2 FWA. Type-3 fuzzy logic has been applied to complex problems where it is combined with other techniques, such as learning algorithms for optimization techniques [23,24]. In [25], a deep learning model with type-3 fuzzy logic to produce forecasting of renewable energies was proposed. For this reason, in this work, a type-3 FWA is proposed to compare with a type-2 FWA and to predict 20 future days (from 03/30/2022 to 04/18/2022). The proposed method is tested using confirmed cases from 12 countries: Brazil, China, France, Germany, India, Iran, Mexico, Italy, Spain, Poland, the United States of America, and the United Kingdom. Different from the previous literature, the main contribution presented in this work is the use of type-3 fuzzy logic to combine responses produced by ENNs. In addition, the type-3 fuzzy inference systems (FISs), as well as the ENNs, are optimally designed by using a firefly algorithm, which has not been previously presented in the literature.

Intelligence Techniques
The intelligence techniques applied to develop the proposed method are briefly described in this section.

Artificial Neural Networks
Human beings have many abilities that distinguish them from other living beings, among them the ability to learn and to recognize. These tasks are performed by the human brain, which learns and recognizes complex patterns. The process can be described as follows: information is collected by the human senses (for example, sight) and sent to previously trained neurons, which react to the information by producing activations. The next step consists of recognizing shapes, for example, letters or numbers [26,27]. These processes are mimicked by an artificial intelligence technique called artificial neural networks (ANNs). This technique is based on the biological neural networks that compose the human brain and tries to make decisions as a human would. To mimic the human brain, ANNs are composed of layers (input, hidden, and output layers) of interconnected units called neurons. ANNs can model complex problems by adjusting parameters such as the weights, which store knowledge during the learning process [28][29][30]. Figure 1 illustrates an ANN in which the information is propagated up to the last layer, because the neurons of each layer (input or hidden) are fully connected to the neurons of the next layer [31,32].
An improvement on an individual ANN is to create an ensemble neural network. Ensemble neural networks (ENNs) are composed of several artificial neural networks that learn the same information, creating individual experts in the same task. Each neural network provides results that can differ from those of the others [17]. A final result is obtained by combining the answers provided by the neural networks using an integration technique. In this work, each ANN of this kind produces an individual prediction, and the individual predictions are then combined by an integration unit to obtain a final prediction [16,18]. In Figure 2, an example of an ENN with three neural networks (modules) is shown.

Type-3 Fuzzy Logic
In [33], L.A. Zadeh proposed another useful intelligence technique applied to model complex problems: type-1 fuzzy logic (FL), in which the membership grade is a crisp number in [0, 1]; thus, an element partially belongs to a set with a given membership grade. Type-2 FL was proposed in [34], where the membership grade of an element is no longer a crisp number between 0 and 1 as in type-1 FL. In type-2 FL [35][36][37], the membership function (MF) of an element is defined by a fuzzy set (FS) in [0, 1]. A type-2 FS can be defined as:

$$\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\} \quad (1)$$

where $X$ represents the domain of the fuzzy variable. In this case, there is a primary and a secondary membership. The first one is defined by $J_x \subseteq [0, 1]$, and the second is a type-1 FS defined by $\mu_{\tilde{A}}(x, u)$. The footprint of uncertainty (FOU) is an uncertainty region. If $\mu_{\tilde{A}}(x, u) = 1, \forall u \in J_x \subseteq [0, 1]$, there is an interval type-2 MF as in Figure 3, where there is a uniform shading of the FOU with its upper MF $\overline{\mu}_{\tilde{A}}(x)$ and lower MF $\underline{\mu}_{\tilde{A}}(x)$ [38]. An interval type-2 FS is defined as:

$$\tilde{A} = \{((x, u), 1) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\} \quad (2)$$

Axioms 2022, 11, 410

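A type-3 fuzzy set (T3 FS) [39,40], denoted as $A^{(3)}$, is represented by a trivariate function, called the MF of $A^{(3)}$, in the Cartesian product $X \times [0, 1] \times [0, 1]$, where $X$ is the universe of the primary variable $x$ of $A^{(3)}$. The MF of $A^{(3)}$ is denoted by $\mu_{A^{(3)}}(x, u, v)$ and is a type-3 MF of the T3 FS (Equation (3)):

$$A^{(3)} = \{(x, u, v, \mu_{A^{(3)}}(x, u, v)) \mid x \in X, u \in U \subseteq [0, 1], v \in V \subseteq [0, 1]\} \quad (3)$$

where $U$ is the universe of the secondary variable $u$ and $V$ is the universe of the tertiary variable $v$.

A Gaussian interval type-3 MF, $\tilde{\mu}(x, u)$ = ScaleGaussScaleGaussIT3MF, with a Gaussian FOU has the parameters $[\sigma, m]$ for the upper membership function (UMF) and, for the lower membership function (LMF), $\lambda$ (lower scale) and $\ell$ (lower lag), which together form the domain of uncertainty DOU = $[\underline{u}(x), \overline{u}(x)]$. This membership function is represented as:

$$\tilde{\mu}(x, u) = \text{ScaleGaussScaleGaussIT3MF}(x, \{\{[\sigma, m]\}, \lambda, \ell\}) \quad (4)$$

The vertical cuts $A^{(3)}(x)$ characterize the FOU; these are IT2 FSs with a Gaussian IT2 MF, with parameters $[\sigma, m]$ for the UMF and, for the LMF, $\lambda$ (lower scale) and $\ell$ (lower lag). This interval type-3 MF is described as:

$$\overline{u}(x) = \exp\left[-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2\right] \quad (5)$$

$$\underline{u}(x) = \lambda \cdot \exp\left[-\frac{1}{2}\left(\frac{x - m}{\sigma^*}\right)^2\right] \quad (6)$$

$$\sigma^* = \sigma \sqrt{\frac{\ln(\ell)}{\ln(\varepsilon)}} \quad (7)$$

where $\varepsilon$ is an epsilon. If $\ell = 0$, then $\sigma^* = \sigma$. Then, $\overline{u}(x)$ and $\underline{u}(x)$ are the upper and lower limits of the DOU. The range, $\delta(u)$, and radius, $\sigma_u$, of the FOU are:

$$\delta(u) = \overline{u}(x) - \underline{u}(x) \quad (8)$$

$$\sigma_u = \frac{\delta(u)}{2\sqrt{3}} + \varepsilon \quad (9)$$

The mean, $u(x)$, of the IT3 MF $\tilde{\mu}(x, u)$ is defined by Equation (10):

$$u(x) = \exp\left[-\frac{1}{2}\left(\frac{x - m}{\rho}\right)^2\right] \quad (10)$$

where $\rho = (\sigma + \sigma^*)/2$. Then, the vertical cuts with the IT2 MF, $\tilde{\mu}(x, u) = [\underline{\mu}(x, u), \overline{\mu}(x, u)]$, are described by Equations (11) and (12):

$$\overline{\mu}(x, u) = \exp\left[-\frac{1}{2}\left(\frac{u - u(x)}{\sigma_u}\right)^2\right] \quad (11)$$

$$\underline{\mu}(x, u) = \lambda \cdot \exp\left[-\frac{1}{2}\left(\frac{u - u(x)}{\sigma_u^*}\right)^2\right] \quad (12)$$

where $\sigma_u^* = \sigma_u \sqrt{\ln(\ell)/\ln(\varepsilon)}$. If $\ell = 0$, then $\sigma_u^* = \sigma_u$. Then, $\overline{\mu}(x, u)$ and $\underline{\mu}(x, u)$ are the UMF and LMF of the vertical cuts with the IT2 FS of the secondary IT2 MF of the IT3 FS [41]. A visualization of this IT3 MF is shown in Figure 4.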
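To illustrate how the lower scale and lower lag shape the domain of uncertainty, the following Python sketch evaluates an upper and a lower Gaussian bound for a primary input x. It is only a sketch under stated assumptions: it assumes that the LMF is a Gaussian scaled by the lower scale, with a width equal to sigma multiplied by sqrt(ln(lag) / ln(eps)), and that the epsilon is the machine epsilon. The function names `lagged_sigma` and `dou_bounds` are hypothetical and are not taken from the authors' implementation.

```python
import math
import sys

# Assumption: the "epsilon" of the text is taken to be the machine epsilon.
EPS = sys.float_info.epsilon


def lagged_sigma(sigma: float, lag: float) -> float:
    """Width of the lower Gaussian; equals sigma when the lag is 0."""
    if not 0.0 <= lag < 1.0:
        raise ValueError("lag is assumed to lie in [0, 1)")
    if lag == 0.0:
        return sigma
    # Both logarithms are negative, so the ratio under the root is positive.
    return sigma * math.sqrt(math.log(lag) / math.log(EPS))


def dou_bounds(x: float, sigma: float, mean: float, scale: float, lag: float):
    """Return (lower, upper) bounds of the domain of uncertainty at x."""
    upper = math.exp(-0.5 * ((x - mean) / sigma) ** 2)
    sigma_star = lagged_sigma(sigma, lag)
    lower = scale * math.exp(-0.5 * ((x - mean) / sigma_star) ** 2)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for x in (0.3, 0.5, 0.7):
        print(x, dou_bounds(x, sigma=0.2, mean=0.5, scale=0.8, lag=0.3))
```

Under these assumptions, the lower bound peaks at the lower scale when x equals the mean, while a larger lag narrows the lower Gaussian and therefore widens the region between the two bounds away from the mean.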

Firefly Algorithm
The firefly algorithm (FA) was proposed by Xin-She Yang in [42]. The algorithm is based on the behavior and flashing of fireflies. Its development rests on three principles: (1) Fireflies are unisex, so any firefly can be attracted to any other firefly. (2) The attractiveness of a firefly is determined by its brightness. For any pair of fireflies, the less bright one moves toward the brighter one; if both have the same brightness, the movement is random. (3) The brightness of each firefly is established by the fitness function. In [43], the variation of attractiveness β with the distance r is proposed, calculated by:

$$\beta(r) = \beta_0 e^{-\gamma r^2} \quad (13)$$

where $\beta_0$ is the attractiveness at r = 0 and $\gamma$ is the light absorption coefficient. The movement in the next iteration of a firefly i toward a brighter firefly j is calculated by:

$$x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_{ij}^2}\left(x_j^t - x_i^t\right) + \alpha_t \epsilon_i^t \quad (14)$$

where $x_i^t$ represents firefly i in iteration t. The attraction toward a mate in a group of fireflies is represented by the term $\beta_0 e^{-\gamma r_{ij}^2}(x_j^t - x_i^t)$, $\epsilon_i^t$ is a vector of random numbers, and $\alpha_t$ is the randomization parameter. Starting from the initial randomness scaling factor $\alpha_0$, this parameter is given by:

$$\alpha_t = \alpha_0 \delta^t \quad (15)$$

where δ is a value in [0, 1]. In this work, a random array is implemented to avoid local minima; this allows the fireflies to keep moving and avoids stagnation.
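The movement rule described above can be sketched in a few lines of Python. This is a minimal illustration of a standard firefly algorithm for minimization, not the authors' implementation: the absorption coefficient `gamma`, the uniform random step in [-0.5, 0.5], the clipping to the search bounds, and the function name `firefly_minimize` are all assumptions. The default settings (10 fireflies, alpha 0.01, beta 1, delta 0.97, 30 iterations) follow the values reported later in this paper.

```python
import math
import random


def firefly_minimize(f, dim, bounds, n_fireflies=10, alpha0=0.01, beta0=1.0,
                     gamma=1.0, delta=0.97, max_iter=30, seed=0):
    """Minimize f over a box; a lower cost is treated as a brighter firefly."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [f(p) for p in pop]
    for t in range(max_iter):
        alpha = alpha0 * delta ** t  # randomness shrinks over the iterations
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pop[i] = [
                        min(hi, max(lo, xi + beta * (xj - xi)
                                    + alpha * rng.uniform(-0.5, 0.5)))
                        for xi, xj in zip(pop[i], pop[j])
                    ]
                    cost[i] = f(pop[i])
    best = min(range(n_fireflies), key=cost.__getitem__)
    return pop[best], cost[best]


def sphere(p):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in p)


if __name__ == "__main__":
    position, value = firefly_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
    print(position, value)
```

In this sketch the brightest firefly never moves, so the best cost in the population cannot get worse from one iteration to the next. In the proposed method, each firefly would encode an ENN architecture and FIS parameters rather than a point of a toy function.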

Proposed Method
The implementation of the proposed method and dataset are described in this section.

Proposed Method Description
The proposed method designs ENNs for a time series prediction, where each individual neural network provides a prediction for the testing set and 20 future days. The set of predictions is combined using a type-3 FIS to establish a weight for each prediction, and thus, obtain a final prediction. An FA is applied to design the architecture and parameters of the ENNs and the FIS. The proposed ENNT3FL-FA is shown in Figure 5. The design of each ENN consists of mainly finding the number of modules (ANNs); this task is developed by the FA, which can search from 1 to "m" modules, and each prediction is combined using a type-3 FIS. The firefly algorithm also optimizes the type-3 fuzzy inference system.

Description of the ENN
To build the ENN, three kinds of ANNs can be chosen: function fitting [44], feedforward [45,46], and cascade-forward [47,48] neural networks. Each ENN can be designed using from 1 to "m" neural networks (modules), and this value is established using the FA. In the learning phase, a backpropagation algorithm widely applied to time series predictions is used: the Levenberg-Marquardt (LM) algorithm, with three feedback delays [22,49]. The prediction error of module k, where k = {1, 2, 3, . . . , m}, is given by:

$$MSE_k = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_{ki}\right)^2 \quad (16)$$

where $y_i$ is the real value at time i, $\hat{y}_{ki}$ is the prediction provided by neural network k at the same time, and N is the number of data points.
The ENN design consists mainly of the number (m) and the type of the ANNs that form the ENN. For each neural network, the goal error, the number of hidden layers, and the number of neurons are also sought by the firefly algorithm.
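As a small illustration of the two ingredients just described, the sketch below builds training samples from three delayed values and computes the per-module error as a mean squared error. It rests on assumptions: "three feedback delays" is read here as predicting each value from the three preceding ones, and the helper names `lagged_samples` and `mse` are hypothetical.

```python
def lagged_samples(series, delays=3):
    """Pair each value with the `delays` values that precede it (assumed reading
    of 'three feedback delays')."""
    return [(series[i - delays:i], series[i]) for i in range(delays, len(series))]


def mse(actual, predicted):
    """Mean squared error between real values and one module's predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical cumulative case counts, for illustration only.
    cases = [10, 12, 15, 19, 24, 30]
    for inputs, target in lagged_samples(cases):
        print(inputs, "->", target)
    print(mse([19, 24, 30], [18, 25, 33]))
```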

Description of the Type-3 FWA Integration
The prediction error of each ANN is used to obtain a weight for its prediction, and the weighted predictions produce the final prediction. The weights are obtained using a Sugeno type-3 FIS model. The design of this FIS is established in two parts. The first part depends on the number of neural networks in the ENN, where the value of "m" (modules or ANNs) determines the number of inputs and outputs of the FIS. The ranges of each fuzzy input variable, and the sigma and mean of its three Gaussian type-3 MFs ("low", "medium", and "high"), depend on the individual prediction error (MSE). The second part is determined by the FA, which establishes the values of the lower scale and lower lag of the Gaussian type-3 membership functions. In Figure 6, an example of the Sugeno type-3 FIS model is presented.
As was previously mentioned, the ranges of the input variables depend on the prediction error (MSE) achieved by each module (normalized to a value between 0 and 1). This means that a type-3 FIS is designed for each ENN. The minimal ($R_{min}$) and maximal ($R_{max}$) values given by Equations (17) and (18) define the range of the input variables.
An example of a type-3 fuzzy input variable is shown in Figure 7. As previously mentioned, a Gaussian type-3 membership function has four parameters: sigma ($\sigma$), mean ($m$), lower scale ($\lambda$), and lower lag ($\ell$). The difference between $R_{min}$ and $R_{max}$ is calculated by Equation (19) to determine the sigma and mean values. The sigma value and the means of the three membership functions are given by Equations (20)-(23). The constant values of the fuzzy output variables are established by the FA.
Another essential element of an FIS is its fuzzy if-then rules, which allow combining all the predictions based on their errors. In this work, the total number of rules is calculated by Equation (24), where each fuzzy variable has three membership functions.

$$\text{Total number of rules} = 3^{m} \quad (24)$$

where m is the number of fuzzy input variables. An example with an ENN with three modules (m = 3) is shown in Table 1. The type-3 FIS has as inputs the MSE values of each module and, as outputs, the weights corresponding to each prediction. These weights are used with the corresponding predictions to calculate a final prediction using the equation:

$$\text{Final prediction} = \frac{w_1\hat{y}_1 + w_2\hat{y}_2 + \cdots + w_m\hat{y}_m}{w_1 + w_2 + \cdots + w_m} \quad (25)$$

where $w_1$ is the weight for the prediction obtained by ANN #1, $w_2$ is the weight for the prediction obtained by ANN #2, and so on up to $w_m$, which is the weight of the prediction obtained by ANN m; $\hat{y}_1$ is the prediction of ANN #1, $\hat{y}_2$ is the prediction of ANN #2, and so on up to $\hat{y}_m$, which is the prediction of ANN (module) m. The values used by the firefly algorithm as the search space to design the ENN and FIS are shown in Table 2. The search space for the ensemble neural network is established based on previous works applied to time series predictions [15,22]. For each experiment, the firefly algorithm is configured with parameters based on [22,50]. The settings for the FA are 10 fireflies, an alpha (α) of 0.01, a beta (β) of 1, a delta (δ) of 0.97, and a maximum of 30 iterations. In Figure 8, the flowchart of the optimization technique is illustrated.
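The rule count and the integration step can be illustrated with a short Python sketch. It assumes that every combination of the three labels across the m inputs forms one rule antecedent, and that the final prediction is the weighted sum of the module predictions normalized by the sum of the weights. The names `rule_antecedents` and `weighted_average`, and all numeric values, are hypothetical; in the proposed method the weights would come from the type-3 FIS.

```python
from itertools import product

LABELS = ("low", "medium", "high")  # the three MFs of each fuzzy input variable


def rule_antecedents(m):
    """All label combinations over m inputs (assumed: one rule per combination)."""
    return list(product(LABELS, repeat=m))


def weighted_average(weights, predictions):
    """Combine module predictions with their weights (assumed normalized form)."""
    if len(weights) != len(predictions):
        raise ValueError("one weight per prediction is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * p for w, p in zip(weights, predictions)) / total


if __name__ == "__main__":
    print(len(rule_antecedents(3)))  # number of antecedents for three modules
    # Hypothetical weights and one-day predictions of three modules.
    print(weighted_average([0.5, 0.3, 0.2], [100.0, 110.0, 120.0]))
```

A module with a small error would be expected to receive a larger weight, so its prediction dominates the combined value.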
The objective of the firefly algorithm is to minimize the error of the final prediction achieved by the ENN, and the objective function is given by:

$$f = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - P_i\right)^2 \quad (26)$$

where $Y_i$ is the real value at time i, $P_i$ is the final prediction of the ENN at the same time, and N is the number of data points.

Dataset Description
The information of the worldwide confirmed cases dataset is from the Humanitarian Data Exchange [51]. In this work, two data periods are used to test the performance of the proposed method. Each one is divided into three parts: the testing, training, and validation set. The first data period is from 22 January 2020 to 29 March 2022, and this period consists of 798 days. The second data period consists of 158 days, from 22 January 2020 to 27 June 2020; the results achieved with this second data period are compared with a previous work [22]. For both data periods, 12 countries are analyzed: Brazil, China, France, Germany, India, Iran, Mexico, Italy, Spain, Poland, the United States of America, and the United Kingdom. In Figures 9 and 10, the information by country is shown for the first and second data period, respectively.
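The lengths of the two data periods can be checked with a few lines of Python. The sketch assumes that both the first and the last day of each period are counted; the helper name `inclusive_days` is hypothetical.

```python
from datetime import date


def inclusive_days(start, end):
    """Number of days in a period, counting both the first and the last day."""
    return (end - start).days + 1


first_period = inclusive_days(date(2020, 1, 22), date(2022, 3, 29))
second_period = inclusive_days(date(2020, 1, 22), date(2020, 6, 27))
print(first_period, second_period)  # prints: 798 158
```

Both values agree with the 798 and 158 days stated for the first and second data periods.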

Experimental Results
This section presents the results achieved by the proposed method applied to predicting the confirmed cases of 12 countries. The experimental results are achieved using a testing set with 30% of the information (black points on the graphs), leaving 70% to be divided into two sets: training and validation (80/20). Comparisons with a type-2 FWA presented by [22] are performed. As previously mentioned, the information of the confirmed cases of 12 countries is used, with 30 runs performed for each country. An individual prediction of the 20 future days is achieved by each module of the ENN (pink points on the graphs). These predictions are integrated using the weights provided by the FIS, and finally, a final prediction is obtained using Equation (25). This section only shows the results of three countries (Brazil, China, and France) of the first data period, but in Section 4.1 the results of the 12 countries are shown for both data periods.
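The partition described above can be sketched as follows. The text does not state how the points are assigned to each set, so this example assumes a chronological split (the last 30% for testing, and the last 20% of the remainder for validation); the function name and the rounding rule are also assumptions.

```python
from typing import List, Sequence, Tuple


def split_series(
    series: Sequence[float], test_frac: float = 0.30, val_frac: float = 0.20
) -> Tuple[List[float], List[float], List[float]]:
    """Split a time series into training, validation, and testing sets.

    Assumption: the split is chronological. The testing set takes the last
    `test_frac` of the points; the remaining points are divided into
    training and validation using `val_frac` for validation.
    """
    n = len(series)
    n_test = round(n * test_frac)
    n_rest = n - n_test
    n_val = round(n_rest * val_frac)
    n_train = n_rest - n_val
    train = list(series[:n_train])
    val = list(series[n_train:n_rest])
    test = list(series[n_rest:])
    return train, val, test


# Placeholder series with as many points as the first data period (798 days).
series = [float(i) for i in range(798)]
train, val, test = split_series(series)
print(len(train), len(val), len(test))
```

The set sizes printed here follow only from the assumed rounding; the exact sizes used in the experiments are not reported in this section.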
The ENN architecture that provides the best future prediction for Brazil is shown in Table 3. This architecture has three modules using different types of ANNs. The first and second modules use two hidden layers, and the third module uses only one. The final prediction is obtained using a type-3 FWA. Figure 11 shows the individual predictions for the 20 future days and the testing set achieved with the architecture in Table 3 for Brazil. Figure 11b,c show module #2 and module #3, respectively, where the prediction of the 20 future days (pink points) tends to decrease; the final integration (Figure 11d) nevertheless behaves better because the first module produces a good prediction.
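The effect of the integration on the final prediction can be illustrated with a small sketch. It assumes that the integration is a normalized weighted average with one scalar weight per module; in the proposed method the weights are produced by the type-3 FIS from the prediction errors, whereas here they are fixed hypothetical values, as are the module predictions.

```python
from typing import List, Sequence


def integrate_predictions(
    module_predictions: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-module forecasts with a normalized weighted average.

    Assumption: one scalar weight per module (hypothetical values here).
    """
    if len(module_predictions) != len(weights) or len(weights) == 0:
        raise ValueError("one weight per module is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    horizon = len(module_predictions[0])
    if any(len(p) != horizon for p in module_predictions):
        raise ValueError("all modules must predict the same horizon")
    return [
        sum(w * p[t] for w, p in zip(weights, module_predictions)) / total
        for t in range(horizon)
    ]


# Hypothetical forecasts: module 1 follows the trend, modules 2 and 3 decrease.
preds = [[1.00, 1.02, 1.04], [1.00, 0.95, 0.90], [1.00, 0.96, 0.92]]
weights = [0.8, 0.1, 0.1]  # hypothetical weights favoring module 1
final = integrate_predictions(preds, weights)
print(final)
```

When the module with the better prediction receives the largest weight, the integrated forecast stays close to it even if the other modules drift downward.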
A zoom-in of the best final prediction for Brazil is shown in Figure 12, where the prediction closely follows the real behavior. After day #17, the prediction tends to decrease. The type-3 fuzzy input variables generated by the FA are shown in Figure 13 for module #1 (Figure 13a), module #2 (Figure 13b), and module #3 (Figure 13c).
Figure 14 shows the average convergence of both integration techniques over the 30 runs for Brazil. The type-3 FWA converges better than the type-2 FWA (testing set). The results achieved with both techniques for the testing set are shown in Table 4, where a better average for Brazil is achieved using the proposed type-3 FWA (indicated in bold in the table).
Figure 15 shows the average predictions of the 20 future days for Brazil. Both the type-2 and type-3 FWA achieved predictions close to the real cases up to the eighth day (day #806, 6 April 2022). After this day, however, the type-3 FWA integration tends to decrease more than the type-2 FWA, which caused the type-2 FWA to achieve a better performance on average.
In Table 5, the results of the future prediction for Brazil are shown. These results show that the type-2 FWA has on average a better performance in a real prediction (indicated in bold in the table).
The ENN architecture that provides the best future prediction for China is shown in Table 6. This architecture also has three modules, but each module has only one hidden layer. The final prediction is obtained using the type-3 FWA. Figure 16 shows the individual predictions for the 20 future days and the testing set achieved with the architecture in Table 6 for China. The prediction of the 20 future days (pink points) tends to decrease for module #1 (Figure 16a) and module #2 (Figure 16b), but the prediction of module #3 (Figure 16c) leads to a good final integration (Figure 16d).
A zoom-in of the best final prediction for China is shown in Figure 17, where the prediction closely follows the real behavior. After day #16, the prediction tends to increase. The type-3 fuzzy input variables generated by the FA are shown in Figure 18 for module #1 (Figure 18a), module #2 (Figure 18b), and module #3 (Figure 18c).
Figure 19 shows the average convergence of both integration techniques over the 30 runs for China. The type-3 FWA again converges better than the type-2 FWA (testing set).
In Table 7, the results achieved with both techniques for the testing set are shown, where a better average for China is achieved using the proposed type-3 FWA (indicated in bold in the table).
Figure 20 shows the average predictions of the 20 future days for China. The type-3 FWA achieved predictions closer to the real cases up to the sixteenth day (day #814, 14 April 2022). After this day, the type-3 FWA integration shows a slight increase.
In Table 8, the results of the future prediction for China are shown. These results show that the type-3 FWA has on average a better performance in a real prediction (indicated in bold in the table).
In Table 9, the ENN architecture that provides the best future prediction for France is shown.
This architecture has three modules and also uses one hidden layer per module. The final prediction is obtained using the type-3 FWA. In Figure 21, the individual predictions for the 20 future days and the testing set achieved with the architecture shown in Table 9 for France are shown. All the modules achieved a good prediction of the 20 future days (pink points), which allows us to have an excellent final prediction (Figure 21d).
A zoom-in of the best final prediction for France is shown in Figure 22, where the prediction is very similar to the real behavior. The type-3 fuzzy input variables generated by the FA are shown in Figure 23 for module #1 (Figure 23a), module #2 (Figure 23b), and module #3 (Figure 23c).
In Figure 24, the average convergence obtained by both integration techniques of the 30 runs for France is shown, where the behavior with the type-2 FWA has better convergence than the type-3 FWA (testing set).
Table 9 (partial; only these entries are recoverable): 6.14 × 10^−6, 4.13 × 10^−6, Cascade, 30, 1.34 × 10^−5, Fitnet, 24, 9.51 × 10^−6.
Figure 22. Best future prediction (France).
The results achieved with both techniques for the testing set are shown in Table 10, where a better average for France is achieved using the type-2 FWA (indicated in bold in the table).
Table 10. First data period (testing prediction, France).

Integration   Best            Avg             Worst
Type-2 FWA    6.09 × 10^−6    6.25 × 10^−6    6.80 × 10^−6
Type-3 FWA    6.14 × 10^−6    6.36 × 10^−6    6.81 × 10^−6

The predictions (averages) for the next 20 days for France are shown in Figure 25. The type-3 FWA achieved predictions closer to the real cases up to the seventh day (day #805, 5 April 2022). After this day, the type-3 FWA integration tends to increase, but not as much as the type-2 FWA.
In Table 11, the results of the future prediction for France are shown. These results show that the type-3 FWA has on average a better performance in a real prediction (indicated in bold in the table).

Summary of Results
The results obtained with the type-3 FWA are presented and compared with the type-2 FWA in this section. The tests were performed using two data periods of confirmed cases of 12 countries. For the testing set of the first data period, the results achieved (MSE) are shown in Table 12. As the results show, the type-2 FWA has on average a better performance for the testing prediction for most countries. In general, for the future days, the type-3 FWA had on average a better performance based on the results shown in Table 13 (except for Brazil).
The results achieved (MSE) for the second data period are shown in Tables 14 and 15. The results achieved with the type-2 FWA were previously presented by [22]. As the results show in Table 15, the testing set with the type-2 FWA has on average a better performance for most countries (eight countries). However, for the future days, most countries (eight countries) had on average a better performance using the type-3 FWA integration.
The results achieved (averages) for the second data period for the testing phase and the next 20 days are graphically shown in Figures 28 and 29.

Statistical Comparison
The Shapiro-Wilk test was performed for each country and both data periods (testing and future prediction) to determine whether the results are normally distributed. Because the normality assumption was not satisfied, a non-parametric test must be applied. Mann-Whitney tests, in which the median is used, were performed to compare both integration techniques [52]. The achieved results are presented in this section. A significance level of 0.05 is used for the comparisons. The null hypothesis states that there is no difference between the two techniques, and it is rejected if the p-value is smaller than 0.05. The lower median for each country is underlined in each table.
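The decision rule described above can be sketched in a few lines. This is an illustration only: it implements a two-sided Mann-Whitney U test with the tie-corrected normal approximation and no continuity correction, which may differ from the software used to obtain the reported p-values, and it does not cover the Shapiro-Wilk step. The function name and the sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p-value). Assumption: no continuity correction is applied.
    """
    n1, n2 = len(a), len(b)
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p_value


# Hypothetical error values for two techniques, for illustration only.
technique_a = [1.0, 2.0, 3.0, 4.0, 5.0]
technique_b = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p = mann_whitney_u(technique_a, technique_b)
reject_null = p < 0.05  # significance level used in this work
print(u_stat, p, reject_null)
```

For the 30 runs per country used in this work, the normal approximation is generally considered adequate; for very small samples an exact test would be preferable.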
The Mann-Whitney results for each country of the first data period for the testing days are shown in Table 16. As the results show (indicated in bold in the table), there is a significant difference in only two countries (France and Iran), in favor of the type-2 FWA. Table 17 shows the results for each country of the first data period for the future days. As the results show (indicated in bold in the table), there is a significant difference in only four countries. The difference favors the type-2 FWA for Brazil and the United Kingdom, and the type-3 FWA for China and Germany. However, the median achieved by the type-3 FWA for other countries was also lower than that achieved by the type-2 FWA. Table 18 shows the results for each country of the second data period for the testing days. As the results show (indicated in bold in the table), there is a significant difference in only three countries (China, Iran, and the United Kingdom), in favor of the type-2 FWA. In Table 19, the results for each country of the second data period for the future days are shown. As the results show, no significant difference exists for any country. However, the median values achieved by the type-3 FWA for seven countries were lower than those achieved by the type-2 FWA.

Conclusions
This paper used the information on confirmed COVID-19 cases to predict the cases of 20 future days. The proposed ENNT3FL-FA combines ENNs, type-3 fuzzy logic, and a firefly algorithm, a combination that had not been proposed previously. The firefly algorithm allows us to design optimal ENN architectures and fuzzy inference systems to predict confirmed COVID-19 cases, where each individual prediction obtains a weight from the Sugeno type-3 fuzzy inference system generated by the firefly algorithm to produce a final prediction. The proposed method was tested using the information of 12 countries in two data periods. The first data period contains information from 798 days, and the second one only 158 days. The second period was used to compare these results with a previous work, where integration techniques such as the average, type-1, and type-2 FWA were compared, and it was found that better results were achieved with the type-2 FWA. For this reason, in this work, the proposed method was compared with the type-2 FWA. Based on the average values, the proposed ENNT3FL-FA achieved a better performance on the prediction of future days for both data periods in countries such as Germany, India, Italy, Mexico, Poland, Spain, the United Kingdom, and the United States of America. However, the Mann-Whitney tests showed that, for the testing prediction of the first data period (798 days), there is a significant difference only for the France and Iran predictions, where the best median values are achieved with the type-2 FWA. A similar situation occurred for the second data period (158 days), also in the testing period, where the type-2 FWA has better median values for Iran and the United Kingdom.
For the future prediction, in the case of the first data period, there is a significant difference for the Brazil and United Kingdom predictions, where the best median values are achieved with the type-2 FWA, and only Germany obtained a significant difference with the type-3 FWA. In the second data period, the comparison between the type-2 FWA and the type-3 FWA does not show significant differences. However, it is important to mention that, for both data periods in the future prediction, more countries have lower median values with the proposed ENNT3FL-FA. With the results achieved, we conclude that ENNT3FL-FA provides a better performance in predicting future days and simulates the real behavior for the data period with more information, although a statistically significant difference is achieved in only a few countries. The main challenge of the proposed ENNT3FL-FA is to achieve a better performance with the testing prediction; this would allow for achieving a better future prediction. In future works, other type-3 membership functions will be implemented to evaluate their performance with another data period of COVID-19 and other infectious diseases such as influenza and monkeypox. We will also consider implementing other optimization methods to compare results and performance.