Degree Approximation-Based Fuzzy Partitioning Algorithm and Applications in Wheat Production Prediction

Prediction modelling has recently become important in data analysis. In this paper, we propose a novel algorithm that analyzes past crop yield data and predicts future yields using regression-based approximation of fuzzy time series data. A framework-based algorithm, which we named DAbFP (data algorithm for degree approximation-based fuzzy partitioning), is proposed to forecast wheat yield production with fuzzy time series data. Specifically, the time series data were fuzzified by the simple maximum-based generalized mean function. Different cases for prediction values were evaluated based on two-set interval-based partitioning to obtain accurate results. The novelty of the method lies in its ability to approximate a fuzzy relation for forecasting with lower complexity and higher accuracy in linear, quadratic, and cubic order than existing methods. This lower complexity, as compared to dynamic data approximation, makes it easier to find a suitable defuzzification process and obtain accurate predicted values. The proposed algorithm is compared with the latest existing frameworks in terms of mean square error (MSE) and average forecasting error rate (AFER).


Introduction
Time series whose observations are imprecise or uncertain are called fuzzy time series, a term originally defined by Song and Chissom [1,2]. The observations obtained from the time series are transformed into fuzzy sets. There is a need for data available in numerous forms accumulated over time. Forecasting is suitable for circumstances where the uncertainty linked to the outcome is tangible. Time series analysis is an essential mechanism for forecasting the unknown on the basis of its past history. The two significant methods that fit this category are time series and regression. Modern approaches to time series forecasting rest on the idea that history repeats itself. A time series includes the values of a variable recorded in the past as well as its present value. Discovering patterns in these values and inferring future events from the established patterns is the chief focus of time series analysis. Solutions to various practical problems related to finance, economics, marketing, and business, as well as economic and sales forecasting, information systems forecasting, stock market prediction, the number of outpatient visits, etc., can be determined using time series.
The idea for this exploratory work came from an extensive study of work previously done in the niche field of predictive modelling using fuzzy logic. In an agrarian country with a primarily tropical climate, a tropical plant like wheat presents itself as a very lucrative and justifiable topic. To put this into perspective, Asia on its own harvests and consumes more than three quarters of the global wheat production. If economists are to be believed, this dominance by Asia in the global wheat market leads to a reduction in poverty in the region. With an improvement in production and quality of yield, wheat becomes more accessible to people from all walks of life at a lower price, which in turn pushes farmers to invest in sophisticated and valued crops. These crops bring additional income and prosperity to the farmers' families and improve consumer food products. The sustainable computing and management of natural resources has therefore become an imperative field of study [3]. The assessment and forecasting of wheat production certainly require much effort [4].
Various experimental results have been published in which time series forecasting was applied to different datasets. Forecasting and prediction help in addressing decision-making problems. Askar [5] proposed an autoregressive moving average model to predict wheat crop yield. Sachin [6,7] worked specifically on predicting rice yield for inventory management using a fuzzy time series model. Narendra [8] proposed a model for short-term agricultural production forecasting. Eǧrioǧlu et al. [9,10] and Wangren et al. [11], on the other hand, used generalized equal-length intervals to improve forecasting. The former used a genetic approach.
In contrast to the methods discussed above, the method proposed in this study focuses on diverse and finer levels of partitions of the fuzzy series data. Using this degree approximation-based fuzzy partitioning, a higher prediction accuracy was observed. The method of fuzzy partitioning involved the creation of newly generated fuzzy sets based on the underlying data. The wheat time series consisted of dynamic data whose feature values change as a function of time.
In partitioning, elements that are more similar to each other form members of one set, whereas dissimilar elements form different fuzzy sets. Prediction was done under a fuzzy environment characterized by ambiguity, uncertainty, and inaccuracy. The fuzzy intervals were divided based on the frequency of the time series data. Later, historical time series data analysis was performed by computing higher-order fuzzy logical relations based on the universe of discourse. The novelty of this paper is explained below.
The proposed method used 9-interval and 11-interval time series fuzzy partitioning for wheat production prediction. Based on the interval-based fuzzy partition, degree approximation was applied for real-time wheat produce forecasting. The defuzzified outputs obtained from the approximations were evaluated for error and compared with four existing methods. Fuzzy partitioning was chosen over a regression model because relationships become more complex when dealing with time series data. In our case, the wheat dataset was dynamic as a function of time, and the use of regression would not produce compact sets. Fuzzy partitioning was a better approach because it uses degrees of membership rather than a strict rule as in the case of regression. Because the relationship in our time series dataset was not sufficient to apply regression directly, fuzzy partitioning was a better choice. The method of fuzzy partitioning was also closer to human observation behavior than a linear regression model. Furthermore, the new method for forecasting wheat production with a fuzzy time series, using degree approximation as a fuzzy relation for forecasting, provided lower complexity in the linear order. Such simplicity was extended to quadratic and cubic polynomial approximation, which minimized the time needed to generate relational equations based on complex min-max composition operations, as well as the repeated trial and error in the defuzzification process that might be required to achieve better accuracy, as used in [6][7][8][9][12] and by Singh [13]. Two-set partitioning with lower and higher approximation performed over regression analysis finally helped in selecting a best-fit line that represents the average across all points in the graph [14].
The rest of the paper is organized as follows: Section 2 provides the literature overview about the use and progress of time series-based fuzzy partition for prediction problems. Section 3 gives the complete explanation of the proposed framework for the algorithm formulated. Section 3.1 gives a diagram workflow representation of the framework followed by a numerical example explaining the methodology in brief. In Section 3.2, a detailed explanation of the proposed methodology, which we named data algorithm for degree approximation-based fuzzy partitioning (DAbFP), is given with intermediate results. The fuzzy logic relation (FLR) for different intervals is calculated using the wheat yield dataset for different years. Thereafter, average forecasting error rate (AFER) and mean square error (MSE) formulas are also mentioned. Section 4 lists experiments using different degree polynomials and calculating the AFER and MSE for the corresponding polynomials with their respective plots. The proposed algorithm is compared with the existing methods in terms of AFER and MSE with respect to other algorithms. Finally, Section 5 depicts the final conclusions about the method and its implications over the wheat dataset and also emphasizes its future application and scope.

Literature Review
Fuzzy time series forecasting is used to examine information that is neither explicit nor precise. Researchers have developed fuzzy time series concepts and definitions to deal with imprecise and vague information systems in which decisions or predictions must be made. The concept was proposed by Song and Chissom, who also described a special dynamic forecasting process with linguistic values [1,2,15]. Fuzzy forecasting to predict links in social networks was described by the authors of [16]. Later, the authors of [12] formally defined a fuzzy time series model, described in Section 2.2.
Qiu, Liu, and Li [17] proposed a particle swarm optimization technique for similar forecasting. Primarily, time series data from the University of Alabama [18] were used. An average autocorrelation function was framed to give high forecasting accuracy. To analyze time series using computed fuzzy logical relations of higher order, Garg et al. [19], Son [20], Hunrag [21,22], Hwang and Chen [23], Lee, Wang, and Chen [24], Chu and Kim [25], and Sheta [26] developed extensive fuzzy and decision-based forecasting methods to improve forecasting accuracy, each with minor variations. Lee [27] proposed a fuzzy candlestick [28] pattern to store financial expertise. To handle highly complex matrix computations, a multivariate heuristic model was designed and implemented in [29]. A method for determining intervals of varying length was given by Hiemstra [30]. The number of repetitions of fuzzy relationships was used to determine the weights in fuzzy time series data in [31][32][33]. Regular increasing monotone (RIM) quantifiers were used by Garg et al. [34,35] to design a priority matrix.
Several distinguished and related works have been carried out by Klir et al. [36] and Dostal [37], with some native approaches for prediction. The use of optimization techniques in the commercial and public sectors was also demonstrated by Dostal [38]. Li et al. [39] linked fuzzy logic to chaos theory. Peters [40,41] extended it to fractal market analysis in capital markets. Trippi [42] related fuzzy logic to chaos and non-linear dynamics in financial markets. Altroc [43] applied neuro-fuzzy methods to business and finance. Hamam et al. [44] evaluated the quality of experience of haptic-based applications using fuzzy logic. Alreshoodi [45] presented an empirical study based on a fuzzy logic method to measure the QoS/QoE correlation for layered video streaming. Doctor et al. [46] proposed an embedded agent-based method for realizing ambient intelligence. Wang et al. [47] generated fuzzy rules by learning from examples. In [48], a high-order approximation for forecasting tourism demand in Turkey using fuzzy time series data and an artificial neural network was proposed. Another new approach using type-2 fuzzy logic and fractal theory was given by Castillo and Melin [49]. An experimental study was done to establish the length of intervals in fuzzy time series [50]. A non-linear optimization with polynomial time series is another work, presented by the authors of [51]. Forecasting models based on an event discretization function were also put forward.
In this paper, the dataset used for forecasting wheat production is taken from [52]. Son et al. [53] established a fuzzy clustering method for weather forecasting. A neuro-fuzzy system has also been designed and evaluated for insurance forecasting [54]. In [54], the authors used an ensemble learning technique with limited fuzzy weights. An adaptive neuro-fuzzy [55] framework was applied to wheat production forecasting in [56].
In [57], a different dataset was used for wheat production forecasting based on soil properties. Some soil properties, such as shear strength, were predicted in [58]. Similarly, an adaptive fuzzy rule-based technique with automatic parameter updating was used to model financial time series in [59]. A systematic approach for detecting structural breaks in time series was discussed in [60], combining the fuzzy transform (F-transform) with methods of fuzzy natural logic; the F-transform is used to calculate the slope of the time series. The problem of verifying the separability of fuzzy binary relations was addressed in [61], which provides necessary conditions and an efficient algorithm for checking them.

Mathematical Preliminary
This section presents the preliminaries needed to understand time series forecasting problems.
Definition 1 [62]. Let F(t) be the group of all possible values of a fuzzy time series at time t and F(t − 1) the group of all possible values at time t − 1, and let Z be a fuzzy relation between F(t) and F(t − 1), where Z is the union of all fuzzy relations:
$$Z(t, t-1) = \bigcup_{i,j} Z_{ij}(t, t-1).$$
Then a first-order time-invariant series model is expressed as
$$F(t) = F(t-1) \circ Z(t, t-1),$$
where $\circ$ denotes the composition operator.
Definition 2 [62]. Let U be the universe of discourse, $U = \{u_1, u_2, u_3, \ldots, u_n\}$, with U a finite set. A fuzzy set F of U can be expressed as
$$F = \mu_F(u_1)/u_1 + \mu_F(u_2)/u_2 + \cdots + \mu_F(u_n)/u_n,$$
where $\mu_F(u_i) \in [0, 1]$ is the membership degree of $u_i$ in F, "+" is the operator ∪, and "/" is a separator.
Definition 3 [63]. Assume that F(t) is a fuzzy time series and Z(t, t − 1) is a first-order model of the time series. If Z(t, t − 1) = Z(t − 1, t − 2) for every t, then F(t) is called a time-invariant fuzzy time series; otherwise, it is time-variant.
Definition 4 [64]. Given F(t) as the time series data D, with Ft(I) as a fuzzy set, the defuzzified value Fd is defined as the z-value with the highest membership degree.
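Definition 5 [64]. Given F(t) as the time series data D, with Ft(I) as a fuzzy set, a quasi-arithmetic mean for the fuzzified output is
$$M_f(x_1, \ldots, x_n) = f^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} f(x_i)\right),$$
where f is a continuous, strictly monotonic function and $x_1, \ldots, x_n$ are the fuzzified values.
Definition 6 [65]. An autoregressive (AR) model of a given order r is defined as
$$W_t = \rho_1 W_{t-1} + \rho_2 W_{t-2} + \cdots + \rho_r W_{t-r} + \varepsilon_t,$$
where $W_{t-1}, \ldots, W_{t-r}$ are independent variables, $\rho_1, \ldots, \rho_r$ are model parameters, and $\varepsilon_t$ is an error term.
Substituting values for the ρ parameters yields the prediction.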
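As a small illustration of Definition 6, the sketch below computes a one-step prediction from an AR model of order r, that is, a weighted sum of the r most recent observations with weights ρ1, ..., ρr. The function name `ar_predict` and the sample numbers are hypothetical and are not taken from the wheat dataset; the error term is left out because only the point prediction is of interest here.

```python
def ar_predict(history, rho):
    """One-step AR(r) prediction: rho[0]*W_{t-1} + ... + rho[r-1]*W_{t-r}."""
    r = len(rho)
    if len(history) < r:
        raise ValueError("need at least r past observations")
    # history[-1] is W_{t-1}, history[-2] is W_{t-2}, and so on.
    lags = history[-1:-r - 1:-1]
    return sum(p * w for p, w in zip(rho, lags))


# Hypothetical observations and parameters (illustration only).
past = [10.0, 12.0, 11.0]
coeffs = [0.5, 0.3]  # rho_1, rho_2
print(ar_predict(past, coeffs))  # 0.5 * 11.0 + 0.3 * 12.0
```

In practice the ρ parameters would be estimated from the data rather than chosen by hand.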

The Need of This Framework
Recent studies on wheat production forecasts have been conducted in [56][57][58]. In the latter, an artificial neural network combined with fuzzy systems was used to forecast with a 5-degree polynomial in only two periods. In another work, ensemble learning with limited fuzzy weights was used. While the former used another artificial neural network to forecast production based on energy inputs, a further decision-making analysis was carried out in [66]. Several case-based prediction procedures were presented by the authors of [67][68][69][70]. The above-stated method provided predictions using support vector machines based on soil properties. A similar prediction was performed in [7][8][9], where the data were not partitioned and fuzzified as a time series.
The method proposed in this paper takes the yield data as a time series, fuzzifies it into diverse partitions, and gives precise predictions. The precision comes from the 9- or 11-level linguistic partition carried out over a large time series scale. Our method outperforms four existing methods in terms of MSE and AFER. Hence, a consolidated framework to perform predictions over multiple and diverse linguistic partitions is needed.

The Workflow Diagram
In this section, an overview of the proposed framework with simulation steps is given in Figure 1. Table 1 gives the linguistic fuzzy set partitioning while Table 2 gives the frequency distribution over 9 interval partitioning.

Table 1. Linguistic fuzzy set partitioning.
Fuzzy set    Linguistic value
F1           very meagre produce
F2           meagre produce
F3           better than poor produce
F4           not so quality produce
F5           average production
F6           superior produce
F7           very superior produce
F8           very very superior produce
F9           tremendous produce

DAbFP Algorithm
The proposed algorithm is performed on the source dataset in the following steps:
Step 1: Let D denote the source dataset variable. Using Definition 2, the universe of discourse U is defined as $U = [D_{min} - x, D_{max} + y]$, where $D_{min}$ and $D_{max}$ are the smallest and largest values in D and x, y ∈ R+, with R the set of real numbers.
Step 2: Partition the dataset D into four suitable frequency groups and apply the subsequent forecasting steps to each group.
Step 3: Using the partitioned data, denoted D_new, we define the fuzzy sets F1, F2, ..., F7, linguistically mapped over the universe of discourse U, where q1, q2, ..., q7 are fixed-length intervals. Every partition obtained in the frequency-based partitioning is represented by F(I), where I indicates the interval in which its value lies. The value of the outcome increases with I. This taxonomy gives researchers an intuitive view of the data. For instance, when operating on 9 partitions, every interval can be represented by one of the fuzzy partitions F1 to F9. Hence, an increase in the suffix I corresponds to a greater harvest in the production of wheat.
Step 4: Subsequently, the fuzzy logic relationships (FLRs) are established for the given group of values. This can be illustrated with a particular instance.
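The partitioning and fuzzification steps can be sketched as follows. This is a minimal illustration under simplifying assumptions: the universe of discourse is taken as a closed range [low, high] split into equal-length intervals, and each observation is mapped to the index I of the partition F(I) that contains it. The frequency-based grouping of Step 2 is not reproduced, and the function name `fuzzify` and the bounds 1000 and 1900 are hypothetical.

```python
def fuzzify(value, low, high, n_intervals):
    """Return the 1-based index I of the partition F(I) containing value.

    Assumes equal-length intervals over [low, high]; the upper bound
    is assigned to the last partition.
    """
    if not low <= value <= high:
        raise ValueError("value lies outside the universe of discourse")
    width = (high - low) / n_intervals
    index = int((value - low) / width)
    return min(index, n_intervals - 1) + 1


# Hypothetical universe of discourse [1000, 1900] with 9 partitions.
for produce in (1000, 1250, 1899, 1900):
    print(produce, "-> F%d" % fuzzify(produce, 1000, 1900, 9))
```

Because the index grows with the value, a larger suffix I corresponds to a larger harvest, which matches the taxonomy used for the fuzzy partitions.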
Step 5: The mean of the middle values of the fuzzy partitions on the right-hand side of the fuzzy logic relation (FLR) is calculated. This calculation is performed for degree approximation. For instance, consider the 2nd-order FLR F4 <- F2, F7. If P and Q are the centers of intervals F2 and F7, respectively, then
$$S = \frac{P + Q}{2},$$
where R denotes the center of fuzzy partition F4. Likewise, for a 3rd-order FLR F <- F2, F7, F3, where P, Q, and R are the centers of intervals F2, F7, and F3, respectively,
$$S = \frac{P + Q + R}{3}.$$
Here, S is the mean fuzzy value for a particular forecast year. It is used as a variable in the linear regression model for defuzzification. From this, the forecast value is calculated, where n is the total number of values and value(Fi(d)) is the fuzzy value at degree d. Following the steps of the proposed algorithm, Tables 3-5 give the intermediate results. The concluding part of the devised algorithm is explained with the results presented in Tables 6-9.
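The degree approximation of Step 5, the mean of the interval centers on the right-hand side of an FLR, can be sketched as below. The sketch assumes equal-length intervals over a hypothetical universe of discourse [1000, 1900] with 9 partitions; the function names are illustrative and not part of the original algorithm description.

```python
def interval_center(index, low, high, n_intervals):
    """Center of the 1-based partition F(index) for equal-length intervals."""
    width = (high - low) / n_intervals
    return low + (index - 0.5) * width


def degree_approximation(rhs_indices, low, high, n_intervals):
    """Mean S of the centers of the partitions on the right-hand side of an FLR."""
    centers = [interval_center(i, low, high, n_intervals) for i in rhs_indices]
    return sum(centers) / len(centers)


# 2nd-order FLR  F4 <- F2, F7  and 3rd-order FLR  F <- F2, F7, F3.
print(degree_approximation([2, 7], 1000, 1900, 9))
print(degree_approximation([2, 7, 3], 1000, 1900, 9))
```

The value S obtained this way is the input that the regression step uses for defuzzification.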
Step 6: After degree approximation based on the fuzzy logic relation, defuzzification is performed using regression analysis. On plotting the points, we select a best-fit line that represents the average across all points in the graph. Thereafter, the equation of the line is estimated, which can be linear or a polynomial of higher degree (2, 3, 4, 5, or 6). In the subsequent section, we use two error measures to assess the outcome:
$$\mathrm{AFER} = \frac{1}{n}\sum_{i=1}^{n}\frac{|X_i - Y_i|}{X_i} \times 100\%,$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(X_i - Y_i)^2.$$
Here, $X_i$ is the actual production value, $Y_i$ is the predicted value, and n is the number of forecast values.
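The two error measures can be computed as in the following sketch, which assumes the usual definitions: AFER is the mean of |X_i − Y_i| / X_i expressed as a percentage, and MSE is the mean of (X_i − Y_i)². The sample values are hypothetical and serve only to show the calculation.

```python
def afer(actual, predicted):
    """Average forecasting error rate, in percent."""
    n = len(actual)
    return sum(abs(x - y) / x for x, y in zip(actual, predicted)) / n * 100.0


def mse(actual, predicted):
    """Mean square error."""
    n = len(actual)
    return sum((x - y) ** 2 for x, y in zip(actual, predicted)) / n


# Hypothetical actual and predicted production values.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(afer(actual, predicted), mse(actual, predicted))
```

AFER is scale-free, whereas MSE is expressed in squared production units, so the two measures are not directly comparable in magnitude.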
In year 1984, Produce = ? (Assume this is the value to be forecast, and let F be the partition that contains it.)
Hence, the above logical relationships assist the forecast for a specific year by using the values obtained for earlier years and establishing a relationship among them. The fuzzy logical relationship of order 3 is F <- F2, F7, F3, where F is the forecast partition for produce in year 1984. An appropriate defuzzification procedure can now be applied to these values to forecast the harvest in year 1984 (as discussed in Step 5), provided that appropriate calculations are done for the fuzzy sets that correspond to the fuzzy partitions of the previous years.
Using the formulation stated above, the results shown in Tables 3-9 are calculated for 9 and 11 partitions, each with order-two and order-three FLRs.
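The construction of higher-order fuzzy logical relationships can be sketched as follows: for each year, the partition of that year forms the left-hand side and the partitions of the preceding years form the right-hand side. The label sequence below is hypothetical (it only reuses the partitions F2, F7, F3 from the 1984 example), and `build_flrs` is an illustrative name.

```python
def build_flrs(labels, order):
    """Return (lhs, rhs) pairs for FLRs of the given order.

    labels is the chronological list of partition indices; rhs holds
    the `order` preceding indices, oldest first.
    """
    return [
        (labels[t], tuple(labels[t - order:t]))
        for t in range(order, len(labels))
    ]


# Hypothetical partitions for 1981-1984: F2, F7, F3, F5.
labels = [2, 7, 3, 5]
print(build_flrs(labels, 3))  # order-three FLR for 1984
print(build_flrs(labels, 2))  # order-two FLRs for 1983 and 1984
```

When forecasting, the left-hand side of the last relationship is unknown, and the right-hand side alone is passed to the degree approximation step.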

Linear Polynomial
A linear polynomial relation is defined as
$$Y = aX + b,$$
where a and b are the fitted coefficients. Here, the variable Y gives the predicted output for a year, and the input variable X is fed to the equation in the form (14, 15, 16, ..., which relates to years 1981, 1982, 1983, 1984, ...). By means of Figure 2a,b, one can calculate the yearly predicted results and then estimate the AFER and MSE, as given in Table 10.
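A best-fit line of the form Y = aX + b can be estimated by ordinary least squares, as in the sketch below. Using ordinary least squares is an assumption about how the fit is obtained; the function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (a, b) for Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical points lying exactly on Y = 2X + 1.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Substituting the index of a forecast year for X then gives the defuzzified prediction for that year.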

Quadratic Polynomial
A quadratic polynomial relation is defined as
$$Y = aX^2 + bX + c,$$
where a, b, and c are the fitted coefficients. Here, the variable Y gives the predicted output for a year, and the input variable X is fed to the equation in the form (1, 2, 3, 4, ..., which relates to years 1981, 1982, 1983, 1984, ...). By means of Figure 2c,d, one can calculate the yearly predicted results and then estimate the corresponding AFER and MSE, as given in Table 11.

Cubic Polynomial
The cubic polynomial relation is given as
$$Y = aX^3 + bX^2 + cX + d,$$
where a, b, c, and d are the fitted coefficients. Here, the variable Y gives the predicted output for each year, and the input variable X is fed to the equation in the form (1, 2, 3, 4, ..., which relates to years 1981, 1982, 1983, 1984, ...). By means of Figure 2e,f, one can calculate the yearly predicted results and then estimate the corresponding AFER and MSE, as given in Table 12.
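Fits of degree one, two, and three can all be obtained with the same least-squares procedure. The sketch below solves the normal equations with Gaussian elimination using only the standard library; this is one possible way to obtain the coefficients, not necessarily the procedure used to produce the reported tables, and the function names and data are hypothetical. Coefficients are returned in increasing order of power.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., cd] of c0 + c1*x + ... + cd*x**d."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x**(i+j)), b[i] = sum(y * x**i).
    A = [[float(sum(x ** (i + j) for x in xs)) for j in range(m)] for i in range(m)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical year indices 1..5 with values on Y = X**2 + 1.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 + 1 for x in xs]
for degree in (1, 2, 3):
    fitted = polyfit(xs, ys, degree)
    print(degree, [round(c, 6) for c in fitted], round(polyval(fitted, 6), 6))
```

Comparing the AFER and MSE of the fitted values across degrees then indicates which polynomial order suits a given partitioning.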

Results
From the above analysis, we computed the mean MSE and AFER values for the final predicted values, with the degree approximation computed accordingly. The proposed algorithm is initially compared with a baseline method, that of Song and Chissom [1,2], on benchmark data for forecasting the enrollments of the University of Alabama. Its better MSE and AFER values mark it as a probable candidate for predicting future wheat production, as shown in Table 13. For further performance analysis, the proposed method is compared with existing methods in Tables 14 and 15 for both 9 and 11 intervals. The proposed algorithm outperforms the existing ones in terms of MSE and AFER, thereby proving to be a good fit for wheat produce prediction. The MSE and AFER of the proposed algorithm come out to be 362,119.88 and 5,107,713.738 for the 3rd-degree and 2nd-degree polynomial in the 9th interval, as compared to the MSE of 36,559.88 and AFER of 11.92547975 for the 3rd-degree polynomial of Yalaz et al. [64]. Similarly, the values of MSE and AFER are compared in Tables 13 and 14 for the 9th-interval 2nd-degree polynomial. The proposed algorithm also performs better in the 11th interval.
In Figure 3, the FLR 3rd-degree MSE is generally higher than the FLR 2nd-degree MSE, except in the case of polynomial degree 3. We can infer that the linear FLR 2nd-degree polynomial has the lowest MSE among all the cases for the 9th interval. This makes it convenient to identify which case is the best among all others. As can be inferred from the graph, a total of 10 cases were examined for the 9th interval. We also worked on 8 cases in the 11th-interval partitioning, as shown in Figure 4.

Conclusions
Various researchers in the past have explored the field of prediction modelling using fuzzy logic, and further research in this field is still needed. In this paper, we proposed a novel algorithm using fuzzy linear regression to forecast wheat production. The results demonstrated the efficiency of the suggested method. Further studies will focus on reducing the computation time of this method using GPUs, examining other wheat problems, and exploring advanced methods [70][71][72][73][74][75][76][77][78][79][80][81][82].

Conflicts of Interest:
The authors declare no conflict of interest.