Deep Assessment Methodology Using Fractional Calculus on Mathematical Modeling and Prediction of Gross Domestic Product per Capita of Countries

Abstract: In this study, a new approach for time series modeling and prediction, the "deep assessment methodology," is proposed and its performance is reported on modeling and prediction of Gross Domestic Product (GDP) per capita for upcoming years. The proposed methodology expresses a function as a finite summation of its previous values and derivatives, combining fractional calculus and the Least Squares Method to find the unknown coefficients. The dataset of GDP per capita used in this study includes nine countries (Brazil, China, India, Italy, Japan, the UK, the USA, Spain and Turkey) and the European Union. The modeling performance of the proposed model is compared with the Polynomial model and the Fractional model, and the prediction performance is compared to a special type of neural network used for time series, Long Short-Term Memory (LSTM). Results show that the Deep Assessment Methodology yields promising modeling and prediction results for GDP per capita. The proposed method outperforms the Polynomial model and the Fractional model by 1.538% and 1.899% average error rates, respectively. We also show that the Deep Assessment Method (DAM) is superior to plain LSTM on prediction of upcoming GDP per capita values by 1.21% average error.
Evaluation of multivariable and multifunctional problems and analysis of time windows, randomness, noise and error changes are left to future work.


Introduction
In the last quarter-century, data exchange, not only from person to person but also from machine to machine, has increased tremendously. Developments in technology and informatics, in parallel with the development of data science, have led companies, institutions, universities and, especially, countries to give priority to evaluating produced data and predicting what may be forthcoming. The modeling of technical, economic and social events and data has been of interest to scientists for many years [1][2][3][4]. Many authors have investigated the modeling and prediction of events, options, choices and data. In particular, there is huge research interest in finding relations between telecommunication, economic growth and financial development [5][6][7][8][9][10][11][12]. One approach to modeling a physical phenomenon or a mathematical study is to model the dependent variable as satisfying a differential equation with respect to the independent variable. However, the integer-order differential equations proposed for mathematical economics or data modeling cannot describe processes with memory and non-locality, because integer-order derivatives have the property of locality. On the other hand, fractional calculus is a branch of mathematics that focuses on fractional-order differential and integral operators and can be used to address the limitations of integer-order differential models. Using fractional calculus, or converting an integer-order differential equation into a non-integer-order one, leads to an essential advantage: the memory property of the fractional-order derivative. This is crucial for models related to economics, which, in general, deal with the past and the effect of the past and present on the future [12,13]. The memory capability of the fractional differential approach is the foundation of our motivation.
Fractional calculus (FC) first arose in 1695 as a question posed to Wilhelm Leibniz (1646-1716) by the French mathematician Marquis de L'Hopital (1661-1704) [11]. The main question of interest was what would happen if the order of a derivative were a real number instead of an integer. Since then, the idea of FC has been developed by many mathematicians and researchers throughout the eighteenth and nineteenth centuries. Today, there exist several definitions of the fractional-order derivative, including the Grünwald-Letnikov, Riemann-Liouville, Weyl, Riesz and Caputo representations. The fractional approach is used in many studies because the fractional derivative represents the intermediate states between two known states. For example, the zeroth-order derivative of a function is the function itself, while the first-order derivative is its ordinary derivative; between these known states, there are infinitely many intermediate states [11]. The use of semi-derivatives and semi-integrals in mass and heat transfer became an important milestone in the field of fractional calculus, since it employed the mathematical definitions in physical phenomena [12,13]. In the last decade, fractional operators, which explain the events, situations or modes between two different stages, or phenomena with memory, have provided more accurate models in many branches of science and engineering, including chemistry, biology, biomedical devices, nanotechnology, diffusion, diffraction and economics [12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31]. In References [25][26][27][28][29][30][31], the modeling and comparison of countries and trends in terms of economics and its parameters are implemented. In References [25,26], economic processes with memory are discussed and modeled using fractional calculus. Studies with purposes similar to ours, such as modeling or prediction, also exist.
In these studies, fractional calculus is employed to model the given dataset and to predict forthcoming values. In Reference [28], the orthogonal distance fitting method is used; that study minimizes the sum of the orthogonal distances of the data points in order to obtain an optimized continuous curve representing them. In Reference [32], one-parameter fractional linear prediction is studied using a memory of two, three or four samples, without increasing the number of predictor coefficients defined in the study. In Reference [33], a generalized formulation of optimal low-order linear prediction using the fractional calculus approach is developed with restricted memory. All these studies focus on modeling or prediction of a phenomenon with fractional calculus. In our previous studies as well, FC-based methods for modeling were introduced: children's physical growth, operators' subscriber numbers and GDP per capita were modeled and compared with other modeling approaches, such as Fractional Model-1 and Polynomial models [34][35][36]. According to the results, the proposed fractional models performed better than the Linear and Polynomial models [34][35][36]. However, our previous works do not take into account the previous values of the dataset at any time instant; their purpose is to model the dataset with minimum error and in a faster way than classical methods such as Polynomial and Linear Regression.
In this study, we extend our prior works by predicting the next incoming values as well as modeling the data itself. We introduce a new mathematical model, namely "Deep Assessment," based on a fractional differential equation, for modeling and prediction using the properties of fractional calculus. Differently from the literature and our earlier studies mentioned above, this model can be used for prediction as well as modeling. The proposed approach is built on a fractional-order differential equation, and the corresponding Laplace transform properties are utilized. Here, the modeling is implemented with mathematical tools similar to those developed in the previous study [4], but with a different approach in which a finite number of previous values and their derivatives are taken into account. The prediction is then obtained by assuming that a value at a specific time can be expressed as the summation of the previous values weighted by unknown coefficients and that the function to be modeled is continuous and differentiable. In this way, the proposed method takes previous values and the variation rates between different time samples (derivatives) of the dataset into account while modeling the data itself and predicting upcoming values. Combining the previous values with the variations weighted by the unknown coefficients leads to calling the method "deep assessment." In this study, we assessed the proposed method by modeling, testing and predicting the GDP per capita of the following countries and the European Union: Brazil, China, India, Italy, Japan, the UK, the USA, Spain and Turkey. GDP per capita is a measure of a country's total economic output divided by its population and, in general, is a reasonable measurement of a country's living quality and standards [37].
Therefore, modeling GDP per capita is crucial, and predicting it is essential not only for researchers but also for companies, investors, manufacturers and institutions. To assess the performance of Deep Assessment in modeling, we compare the proposed model with Polynomial Regression and Fractional Model-1 [34]. Similarly, for prediction, we compare the model with Long Short-Term Memory (LSTM), a special type of neural network used in time series problems.
The structure of the study is as follows. Section 2 explains the formulation of the problem. Section 3, Our Approach, is devoted to explaining how modeling, simulation, testing and prediction are obtained. Section 4 presents the results and Section 5 concludes the study.

Formulation of the Problem
In this section, the mathematical foundation of the proposed method is given. Before going into the mathematical manipulations, it is better to explain the approach and the main steps for the formulation. The study aims to model and then, to predict GDP per capita data at any time t by using the previous GDP per capita values of the countries. Here, we assume that countries' historical data and the change of these data over time create an eco-genetics for the forthcoming. In other words, mathematically, GDP per capita at a time t is assumed to be the summation of both its previous values and the changes in time with unknown constant coefficients. In the second stage, we express a function for the GDP per capita as a series expansion by using Taylor expansion of a continuous and bounded function. Then, the differential equation obtained from this series expansion is defined. After that, the unknown constant coefficients are found by the least-squares method. The method aims to minimize the error between the proposed GDP per capita function and the dataset.
First, it is reasonable to approximate a function g(x) as the finite summation of its previous values weighted with unknown coefficients α_k plus the summation of the derivatives of those previous values weighted with unknown coefficients β_k because, intuitively, the recent value of data is generally related to and correlated with its previous values and their rates of change. The purpose is to find the upcoming values of any dataset with minimum error by employing the previously inherited features of the dataset. As a starting point, an arbitrary function is assumed to be approximately the finite summation of the previous values and the change rates weighted with constant coefficients. This presupposition for modeling the function itself and predicting future values is made in order to use the heritability of fractional calculus [6,28,34]:

g(x) ≅ Σ_{k=1}^{l} [ α_k g(x − k) + β_k g′(x − k) ].    (1)

Here, g′ is the first derivative of g(x − k) with respect to x. After assuming Equation (1), the function g(x) can be expanded as a summation of polynomials with unknown constant coefficients a_n, as given in Equation (2); here, g(x) is assumed to be a continuous and differentiable function.
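As a quick numeric illustration of assumption (1), the sketch below fits the weights α_k and β_k by least squares for a smooth synthetic signal, approximating the derivative with a central difference. The test signal, the lag l = 3 and the difference step are illustrative choices of this sketch, not taken from the paper.

```python
import numpy as np

# Numeric illustration of assumption (1): a sample signal g(x) is approximated
# as a weighted sum of its l previous values and their derivatives. The test
# signal, the lag l = 3 and the central-difference derivative are illustrative
# assumptions, not choices made in the paper.
def fit_lag_weights(g, l, xs):
    dg = lambda x: g(x + 0.5) - g(x - 0.5)           # central difference, h = 0.5
    A = np.column_stack(
        [g(xs - k) for k in range(1, l + 1)]         # previous values (alpha_k)
        + [dg(xs - k) for k in range(1, l + 1)])     # previous slopes (beta_k)
    coeffs, *_ = np.linalg.lstsq(A, g(xs), rcond=None)
    return coeffs, A @ coeffs                        # fitted weights, reconstruction

g = lambda x: np.sin(0.3 * x) + 0.01 * x             # smooth synthetic signal
xs = np.arange(4.0, 30.0)
coeffs, approx = fit_lag_weights(g, l=3, xs=xs)
print(np.max(np.abs(approx - g(xs))))                # near zero for this signal
```

For this signal, an exact set of weights exists (sinusoids and linear terms each satisfy a low-order lag recurrence), so the least-squares residual is at machine precision; for real data such as GDP per capita, the fit is only approximate.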
After combining α_k a_n as a_kn and β_k a_n as b_kn and approximating Equation (4), Equation (5) is obtained; here, the upper limit of the summation is truncated from ∞ to M. After truncation, the first derivative of g(x) is taken, as given in Equation (6).
The expression given in Equation (7) is the definition of Caputo's fractional derivative [11], which is employed throughout the study. For n = 1 and 0 < γ < 1, it reads

D^γ g(x) = [1 / Γ(1 − γ)] ∫_0^x g′(τ) (x − τ)^(−γ) dτ.    (7)
In Equation (7), Γ(1 − γ) is the Gamma function and the fractional derivative of order γ is taken with respect to x; in general, Caputo's definition involves the n-th derivative g^(n), again with respect to x, and in our study n = 1 is assumed, so the fractional order spans between 0 and 1. At this point, two expansions have been made to express g(x) approximately. The first expresses the function as the finite summation of its previous values; the second expresses g(x) as a summation of polynomials, known as the Taylor expansion, assuming that g(x) is continuous and differentiable.
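Caputo's definition with n = 1 can be sanity-checked numerically. The sketch below compares a midpoint-rule quadrature of Equation (7) with the known closed form for g(x) = x, whose Caputo derivative of order γ is x^(1−γ)/Γ(2−γ); the quadrature scheme and step count are illustrative choices.

```python
import math

# Sanity check of Caputo's derivative (Equation (7)) for n = 1:
#   D^gamma g(x) = 1/Gamma(1-gamma) * integral_0^x g'(t) (x - t)^(-gamma) dt.
# For g(x) = x, the closed form is x^(1-gamma) / Gamma(2-gamma). The midpoint
# rule below is an illustrative choice that sidesteps the integrable
# singularity at t = x.
def caputo_first_order(dg, x, gamma, steps=100_000):
    h = x / steps
    total = sum(dg((i + 0.5) * h) * (x - (i + 0.5) * h) ** (-gamma)
                for i in range(steps))
    return total * h / math.gamma(1.0 - gamma)

x, gamma = 2.0, 0.4
numeric = caputo_first_order(lambda t: 1.0, x, gamma)     # g(x) = x, so g'(t) = 1
exact = x ** (1.0 - gamma) / math.gamma(2.0 - gamma)
print(abs(numeric - exact))                               # small discretization error
```

For γ = 0 the operator returns the function itself and for γ = 1 the ordinary derivative, matching the intermediate-state interpretation above.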
The mathematical background is now sufficient to proceed with the proposed methodology. As a summary, three important tools were introduced above. First, a function is expressed as the summation of its previous samples. Second, the Taylor expansion of a continuous and differentiable function is defined. Third, the Caputo definition of the fractional derivative is given. Now, the Deep Assessment Methodology can be expressed using fractional calculus for modeling and prediction. In addition to the above, it is assumed that the fractional derivative of f(x) of order γ is equal to the right-hand side of Equation (8). After this assumption, the unknown f(x) that satisfies the fractional differential equation below and models the discrete dataset must be found.
where f(x) stands for the GDP per capita of the countries and x corresponds to time. Note that allowing the order of the derivative on the left-hand side of Equation (6) to be non-integer gives a more general model [28]. This generalization is employed in the Deep Assessment Methodology for f(x), which stands for the GDP per capita.
Here, the motivation is to find the a_kn and b_kn given in Equation (8). To find the unknowns, the differential equation needs to be solved. The strategy is as follows: first, the Laplace transform of Equation (8) is taken, which reduces the differential equation to an algebraic equation; then, using inverse Laplace transform properties, the final form of f(x) is obtained as Equation (9) [11].
To obtain the numerical calculation, the infinite summation of polynomials is approximated as a finite summation given in Equation (10).
Here, f(0), a_kn and b_kn are unknown coefficients that need to be determined. The properties of the Laplace transform (L) used to obtain Equations (9) and (10) are given below [11].
where L stands for the Laplace transform and L[f(x)] = F(s).

Modeling with Deep Assessment
In this part, the methodology for modeling the problem is given in detail. To predict the upcoming years, the problem is divided into four regions, as given in Figure 1. The dataset spans Regions 1, 2 and 3; note that there is no data for Region 4, where the prediction is aimed. Region 1 is called the "before modeling region" and consists of historical data. Each term (x − k)^(n+γ−1) and each derivative coming from the previous values of GDP per capita, for different values of k and multiplied by different weights as given in Equation (10), contributes to the recent data. For modeling, this historical data is employed directly to model the data located in Region 2. Regions 2 and 3 are named the modeling and testing regions, respectively. In the modeling region, the GDP per capita is modeled and the unknown coefficients are found. Note that the approach uses the previous l values (P_{i−1}, P_{i−2}, . . . , P_{i−l} and the corresponding f(i − 1), f(i − 2), . . . , f(i − l)) for an arbitrary P_i located in Region 2. The third region consists of the data used to test the upcoming predictions. Finally, Region 4 is called the "prediction region," where the aim is to find the GDP per capita values for times at which the actual values are not yet known and to implement the prediction. The region division is required because the parameters given in the previous section (Equation (10)), such as M, l and γ, need to be found before the prediction. In Region 2, the modeling is done to find the optimum values of the coefficients a_kn and of M, l and γ in Equation (10). To model the data, the Least Squares Method is employed, as explained later in this section. This achieves the first purpose of the study, the modeling of the data using the fractional approach. The second purpose is then to predict the values of GDP per capita for the upcoming, unknown years.
In order to find the optimum M, l and γ values for the prediction, Region 3, namely the testing region, is needed. In this region, there is an iterative procedure where the real discrete data is still known. For instance, in Region 3, it is required to find f(m_1 + 1). Using the proposed method, employing fractional calculus and the Least Squares Method, f(m_1 + 1) is obtained with minimum error by optimizing the M, l and γ values for f(m_1 + 1) itself. Then, f(m_1 + 1) is included in the dataset for the next test, which is done for f(m_1 + 2). This continues up to f(m). Finally, with the optimized M, l and γ, the predicted f(m_x) is found in Region 4.
To model the known data, an f(x) representing the data optimally should be obtained. In other words, the unknowns a_kn, b_kn and f(0) in Equation (10) or Equation (11) should be determined. For this, the Least Squares Method is employed.
In Equation (12), the square of the total error, T², is given. The main purpose of the modeling region is to minimize T² with a gradient-based approach, which requires the partial derivatives of T² with respect to each unknown coefficient to vanish, where r = 1, 2, 3, . . . , l and t = 1, 2, 3, . . . , M index the unknowns. Carrying out this procedure for each a_rt, and likewise for each b_rt, leads to a system of linear algebraic equations (SLAE), as given in (13).

The matrices [A], [B] and [C] of this system are shown in the equations below.
To find f(x), the continuous curve modeling the data with minimum error, the optimum fractional order γ is sought in (0, 1). Then, with the optimum fractional order γ, the unknown coefficients are determined. In this study, the GDP per capita of Brazil, China, the European Union, India, Italy, Japan, the UK, the USA, Spain and Turkey from 1960 until 2018 is used [38]. The dataset is shown in Tables A1 and A2. Among these years, 2018 lies in Region 3 and serves as the test point for predicting the next years. Here, the A matrix consists of the set of matrices given in Equation (14).
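A hedged sketch of the least-squares fitting step follows. The exact basis of Equation (10) is not reproduced here; following the text's mention of the terms (x − k)^(n+γ−1), an assumed form f(x) ≈ f(0) + Σ_k Σ_n a_kn (x − k)^(n+γ−1) is fitted on synthetic data. Solving with np.linalg.lstsq minimizes the same error as the SLAE of Equation (13) but is numerically more stable than forming the normal equations.

```python
import numpy as np

# Hedged sketch of the least-squares step. The basis below is an assumed form
# built from the terms (x - k)^(n + gamma - 1) mentioned in the text:
#   f(x) ~ f(0) + sum_{k=1..l} sum_{n=1..M} a_kn (x - k)^(n + gamma - 1).
# np.linalg.lstsq minimizes the same squared error as the SLAE of Eq. (13).
def fit_coefficients(x, y, l, M, gamma):
    cols = [np.ones_like(x)]                          # the f(0) term
    for k in range(1, l + 1):
        for n in range(1, M + 1):
            cols.append((x - k) ** (n + gamma - 1.0)) # requires x > k here
    A = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs, A @ coeffs                         # coefficients, model values

x = np.arange(5.0, 40.0)                              # x > l, so all bases are real
y = 100.0 * np.exp(0.05 * x)                          # synthetic GDP-like series
coeffs, model = fit_coefficients(x, y, l=3, M=6, gamma=0.5)
print(np.max(np.abs(model - y) / y))                  # small relative misfit
```

In the actual method, γ itself is also scanned over (0, 1) and the value giving the smallest error is kept.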

Prediction with Deep Assessment
To find the optimized values of the unknowns for the prediction, the testing region (the third region) is required. The predictions obtained in the test region (m_1 < i < m) are also given in Table 1. For testing, the data up to m_1 = 58 are taken into consideration. The value f(m_1 + 1) is found from the obtained model. This value is then kept and the next step is started for f(m_1 + 2). These operations are repeated until the last value of the test zone; in our case, m = 59. The last region is called the "prediction region." Here, using Regions 1, 2 and 3, the prediction for the upcoming years is obtained. After the modeling and testing regions have been processed, the unknowns in Equation (11) have already been found in an optimal manner. After testing, Region 4 is started. In this region, the first prediction f(m + 1) is found using the coefficients and unknowns determined in the testing region. After that, the first predicted value f(m + 1) is included in Region 3 (testing) for the consecutive prediction f(m + 2). This procedure is repeated up to f(m_x).
The prediction results for 2019 are given in Table 2. For example, as of the end of 2019 (f(m + 1)), the GDP per capita values of Brazil, China, the European Union, India, Italy, Japan, the UK, the USA, Spain and Turkey are expected to be as listed. In Figure 2, the algorithm for prediction with DAM is illustrated. The first step of the algorithm is to initialize the parameters (l, M, x_1, x_2, . . . , x_m and P_1, P_2, . . . , P_m). Then, the counter variable N, which counts the number of prediction steps, is introduced; the total number of required prediction steps is denoted n_0. As an initial value, the fractional order γ is assigned 0 and is incremented by 0.01 in each loop to find the optimized value. For each value of γ between 0 and 1, the matrix A given in Equation (14) is created and the unknown coefficients given in Equation (10) are calculated. After that, using the actual data in Regions 1 and 2, the modeling of the data between P_l and P_m is carried out for Region 2. Then, the error defined in Equation (12) is calculated. The error value is compared to the previously obtained values; if it is smaller than the previous one, the corresponding fractional-order value is memorized. At the end of Loop II, the optimal value of the fractional order, which coincides with the optimal modeling, is found and the corresponding coefficients given in Equation (10) are determined. Then, the prediction for the next forthcoming value is made with Equation (10). After that, all the procedures starting from the increment of N are repeated, so that the previously predicted value is added to the initial data for the next prediction step. This process is repeated up to the termination of Loop I. Finally, all n_0 predictions are obtained.
Keep in mind that, for the parameters l and M, there are two loops running from 1 to L_0 and from 1 to M_0, respectively, searching for the optimum parameter values that minimize the error in the testing region. Here, L_0 and M_0 are pre-defined constants.
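The prediction loop of Figure 2 can be sketched as follows. The inner loop scans γ over (0, 1) in increments of 0.01, refits the model on all data seen so far and keeps the γ with the smallest in-sample error; the outer loop then predicts one step ahead and appends the prediction to the data. The model inside (fit_predict) is a deliberately simplified, γ-parameterized power basis standing in for the DAM fit, purely illustrative.

```python
import numpy as np

# Sketch of the prediction algorithm of Figure 2. Loop II searches the
# fractional order gamma; Loop I produces n0 consecutive predictions, each
# predicted value joining the data for the next step. fit_predict is an
# illustrative stand-in for the DAM model fit, not the paper's basis.
def fit_predict(x, y, x_next, gamma):
    basis = lambda t: np.column_stack(
        [np.ones_like(t), t ** gamma, t ** (1.0 + gamma)])
    c, *_ = np.linalg.lstsq(basis(x), y, rcond=None)
    return basis(x) @ c, (basis(np.array([x_next])) @ c)[0]

def rolling_predict(x, y, n0):
    x, y = list(x), list(y)
    preds = []
    for _ in range(n0):                               # Loop I: prediction steps
        xa, ya = np.array(x), np.array(y)
        best_err, best_pred = np.inf, None
        for gamma in np.arange(0.01, 1.0, 0.01):      # Loop II: gamma search
            model, pred = fit_predict(xa, ya, x[-1] + 1.0, gamma)
            err = np.sum((model - ya) ** 2)           # in-sample squared error
            if err < best_err:
                best_err, best_pred = err, pred
        preds.append(best_pred)
        x.append(x[-1] + 1.0)
        y.append(best_pred)                           # prediction joins the data
    return preds

x = np.arange(1.0, 30.0)
y = 50.0 + 3.0 * x ** 0.7                             # synthetic series
preds = rolling_predict(x, y, n0=2)
print(preds)
```

Because the synthetic series lies exactly in the basis at γ = 0.7, the scan locks onto that order and the two-step predictions are essentially exact.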


Long Short-Term Memory
In our study, we compare the modeling with the polynomial curve fitting method, and in the prediction, we compare Deep Assessment with the LSTM method. Conventional neural networks are insufficient for modeling the content of temporal data. Recurrent neural networks (RNNs) model the sequential structure of data by feeding themselves with the output of the previous time step. LSTMs are a special type of RNN that operates over sequences and is used in time series analysis [39]. An LSTM cell has four gates: input, forget, output and gate. With these gates, LSTMs optionally inherit information from previous time steps. The forget gate (f), input gate (i) and output gate (o) are sigmoid functions (σ) and take values between 0 and 1. Gate g has a hyperbolic tangent (tanh) activation and lies between −1 and 1. The gate and forward propagation equations are listed below as Equations (17)-(22). Here, c^l_t and h^l_t refer to the cell state and hidden state of layer l at time step t, respectively. Each gate takes input from the previous time step (h^l_{t−1}) and the previous layer (h^{l−1}_t) and has its own set of learnable parameters, the W's and b's.
c^l_t = f ⊙ c^l_{t−1} + i ⊙ g    (22)

Here, ⊙ is the Hadamard product. Each LSTM neuron in a network may consist of one or more cells. In every time step, every cell updates its own cell state, c^l_t. Equation (22) describes how these cells are updated with the forget gate and the input gate: the f gate decides how much of the previous cell state the cell should remember, while the i gate decides how much it should consider the new input from the previous layer. The LSTM neuron then updates its internal hidden state by multiplying the output gate by the squashed (tanh) version of c^l_t. An LSTM neuron passes only its hidden state to the next LSTM neuron; gate o and c^l_t are used internally in the computation of forward time steps [40]. To forecast time series and compare our proposed approach to neural networks, we employed a stacked LSTM model with 2 layers of LSTMs (each having 50 hidden units) and a linear prediction layer. The LSTM model is trained with the Adam optimizer [40].
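The gate equations described above can be collected into a single forward step. The sketch below is a generic LSTM cell, not the paper's trained model; the weight shapes, the stacking of all four gates into one matrix and the initialization are illustrative assumptions.

```python
import numpy as np

# One forward step of a generic LSTM cell implementing the gate equations
# described above: i, f, o are sigmoids, g is a tanh gate and the cell-state
# update uses Hadamard (elementwise) products. Weight shapes and random
# initialization are illustrative, not the paper's trained model.
def lstm_step(x, h_prev, c_prev, W, U, b):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = W @ x + U @ h_prev + b                 # pre-activations of all four gates
    H = h_prev.size
    i = sigmoid(z[0:H])                        # input gate,     in (0, 1)
    f = sigmoid(z[H:2 * H])                    # forget gate,    in (0, 1)
    o = sigmoid(z[2 * H:3 * H])                # output gate,    in (0, 1)
    g = np.tanh(z[3 * H:4 * H])                # candidate gate, in (-1, 1)
    c = f * c_prev + i * g                     # cell update, Eq. (22)
    h = o * np.tanh(c)                         # new hidden state
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3                                    # hidden size, input size
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h)                                       # every entry lies in (-1, 1)
```

Since h = o ⊙ tanh(c) with o in (0, 1), the hidden state is always bounded in (−1, 1), which is what makes stacking layers stable.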

Numerical Results
In this section, we report the modeling and prediction performance of the Deep Assessment Methodology. Further, we compare the proposed method to other modeling and prediction approaches, namely the Polynomial model, Fractional Model-1 [34,35] and LSTM. Results are reported with the Mean Absolute Percentage Error (MAPE) metric, calculated as

MAPE = (100/k) Σ_{i=1}^{k} |v(i) − ṽ(i)| / v(i),

where k is the total number of samples, v(i) is the actual value and ṽ(i) is the predicted value for the i-th sample.
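The metric can be implemented in a few lines:

```python
# MAPE as defined above: the mean of absolute percentage errors over k samples.
def mape(actual, predicted):
    assert len(actual) == len(predicted) and all(v != 0 for v in actual)
    k = len(actual)
    return 100.0 / k * sum(abs((v - p) / v) for v, p in zip(actual, predicted))

print(mape([100.0, 100.0], [75.0, 150.0]))  # -> 37.5
```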
Before presenting the results, it is important to highlight that for modeling, M_0 and l_0 are taken as 20 and 10, respectively, whereas for prediction, M_0 and l_0 are taken as 8 and 25, respectively. The number of predictions, n_0, is equal to 1.

Modeling Results
In this part, we compare the modeling performance of the Polynomial, Fractional Model-1 and Deep Assessment models.
To achieve the modeling, the value of l needs to be investigated. For the modeling of the GDP per capita of each country, the number l of previous years' data used in the algorithm differs after optimization; in order to make a fair evaluation, the l value is fixed at 10 for all countries. Modeling results for the Deep Assessment, Polynomial and Fractional Model-1 models are shown in Table 1, with the optimized M values in the last column. The Deep Assessment model has a 4.308% average MAPE and outperforms Polynomial and Fractional Model-1 by 1.538% and 1.899% average error rates. All three methods model the US best, with 0.81%, 1.01% and 1.06% error. Further, in the case of Italy, Fractional Model-1 uses a fractional-order value of 1 and produces 8.81% MAPE, equal to the Polynomial method, as expected, since a fractional order of 1 reduces the model to the Polynomial method. However, DAM finds a fractional order of 0.39, decreasing the error to 4.70% and justifying the advantage of employing fractional calculus and the previous values of the data itself.
The GDP per capita data and the Deep Assessment, Polynomial and Fractional Model-1 modeling results are shown in Figure 3 for each country. One can conclude that when the data points have high variance, all models produce high error rates, as in Turkey and Italy. For Japan and Brazil, DAM (Deep Assessment Method) and the Polynomial model produce similar results. It can also be seen from Figure 3 that both Deep Assessment and Fractional Model-1 have a low bias compared to the Polynomial model and overfit the dataset less. This is possible because of the memory property of the proposed approach. Except for Brazil, India and the EU, the proposed method yields results superior to the other models.


Prediction Results
In this section, we compare the prediction accuracy of the Deep Assessment and deep learning models. As in modeling, the GDP per capita dataset is used to assess the performance of the proposed method. Table 2 illustrates the optimized γ, l and M values and the corresponding performance of DAM and LSTM; column 6 reports the performance of DAM, while column 7 represents LSTM. The table shows that the Deep Assessment Methodology predicts GDP per capita with an average 0.29% error, predicting all countries with less than 1% error. The best-predicted country is Spain, while the UK's prediction is the least accurate, with 0.91% error. On the other hand, LSTM yields 1.51% error on average; for both DAM and LSTM, the UK yields the highest error. Table 2 demonstrates that, in the implemented setting, DAM outperforms LSTM by 1.21% average error and produces fair results. Table 3 reports the predicted GDP per capita for the year 2019 for both the DAM and LSTM methods. For Brazil, China, India, Turkey, the UK and the US, the predictions obtained by the two models are similar; on the other hand, Italy and Spain yield different results.

Conclusions
In this study, a model called "Deep Assessment" is introduced, which employs fractional calculus to model discrete data as the summation of previous values and derivatives. Differently from the literature and our previous work, the proposed approach also predicts the incoming values of the discrete data in addition to modeling it. The method is evaluated on modeling and predicting GDP per capita, using a dataset covering the period 1960-2018 for nine countries (Brazil, China, India, Italy, Japan, the UK, the USA, Spain and Turkey) and the European Union. Using the fractional differential equation and the summation of previous values for modeling GDP per capita at a specific time instant brings non-locality, memory and generalization of the problem for different fractional orders. In the experiments, GDP per capita is first modeled. The Deep Assessment model has a 4.308% average MAPE and outperforms Polynomial and Fractional Model-1 by 1.538% and 1.899% average error rates for modeling. For prediction, LSTM, a special type of neural network, is used to assess the performance of the model. In the selected test region, it is shown that Deep Assessment is superior to LSTM by 1.21% average error. The results illustrate that the proposed method yields promising results and demonstrates the benefits of combining fractional calculus and differential equations. Evaluation of multivariable and multifunctional problems and analysis of time windows, randomness, noise and error changes are left to future work.
Author Contributions: The contribution of each author is listed as follows. E.K. has contributed to supervision, conceptualization, investigation, methodology, and administration. V.T. plays an important role in resources, supervision, and validation. K.K. supported conceptualization, writing, and editing. N.Ö.Ö. was the key person about visualization, investigation, administration, validation, and writing. E.E. has contributed to validation, visualization, writing, and editing. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript or in the decision to publish the results.