A Novel Approach to Generate Type-1 Fuzzy Triangular and Trapezoidal Membership Functions to Improve the Classification Accuracy

Abstract: Fuzzy logic is an approach that reflects human thinking and decision making by handling uncertainty and vagueness using fuzzy membership functions (MFs). When a human is engaged in the design of a fuzzy system, symmetric properties are naturally preferred. Fuzzy c-means (FCM) is a clustering algorithm that partitions a dataset to produce a membership matrix and cluster centers, from which type-1 fuzzy membership functions can be generated. However, the fuzzy c-means algorithm is limited to producing a single membership function type, the Gaussian MF. Generating multiple types of fuzzy membership functions is important because it can provide more efficient solutions to a problem. Therefore, an approach to generate multiple types of type-1 fuzzy membership functions through fuzzy c-means is required to improve results on classification datasets. To overcome this limitation of the fuzzy c-means algorithm, an approach for generating type-1 fuzzy triangular and trapezoidal membership functions through fuzzy c-means is considered in this study. The approach is evaluated in terms of classification accuracy on four datasets: iris, banknote authentication, blood transfusion, and Haberman’s survival. The proposed approach of generating MFs using FCM produces asymmetric MFs, whose results are compared with those of the symmetric MFs produced by grid partitioning (GP). The results show that the proposed approach of generating type-1 fuzzy membership functions through fuzzy c-means is effective and can be adopted.


Introduction
For a human, many tasks seem very simple and straightforward, such as carrying a backpack, eating food, and travelling from one place to another. The same tasks can be very challenging for machines or computers. The human ability to handle ambiguous and uncertain data makes such tasks simple to perform, but computers lack this ability. To replicate human behavior in machines, we need to design them to deal with the ambiguity and uncertainty of knowledge. This is precisely what fuzzy logic accomplishes [1]. Fuzzy logic systems are designed in a way that enables them to handle uncertain information, which makes complicated tasks simpler for computers.
From the dawn of modern science until the end of the 19th century, uncertainty was commonly regarded as undesirable, and the usual idea was to ignore it. This attitude toward uncertainty gradually changed at the start of the 20th century with the emergence of statistical mechanics. Lotfi Zadeh, the founder of fuzzy sets and fuzzy logic, introduced the idea in 1965 [2]. Zadeh proposed a set whose boundary is not definite or sharp, in contrast with the classical concept of a set, known as the crisp set, whose boundary is definite and precise.
With the emergence of fuzzy logic, methods for generating fuzzy MFs became vital, because the MF is the key component in converting crisp inputs into fuzzy inputs. A few methods of generating MFs are presented in [3,4]. A fuzzy MF is a curve that describes how each point in the input space is mapped to a membership degree between 0 and 1 [5]. Because the membership function is such an important component of fuzzy logic, selecting the type of MF used in the system being designed is critical. The most popular and widely used fuzzy MFs are the Gaussian, triangular, and trapezoidal MFs.
Fuzzy MFs can be generated in one of two ways: the expert knowledge approach, where experts provide the parameters for the generation of MFs, or the data-centric approach, where MFs are automatically generated from data [6]. The generation of MFs through the expert knowledge approach suffers from drawbacks such as lack of accuracy [7], the time required [8], and the unavailability of experts at all times [9]. The data-centric approach reduces these drawbacks. Therefore, most research focuses on the data-centric approach, because it entails the automatic generation of MFs by learning from historical data. The study in [10] briefly reviews the previous approaches used to generate fuzzy MFs. The study in [11] and its extension in [12] briefly described the issue of MF generation through FCM. They approximated the MFs with a heuristic approach but, unfortunately, their approach cannot always produce the accurate shape of triangular and trapezoidal MFs. Moreover, this limitation affects the accuracy and precision of the results. Hence, there remains a need for an approach that generates correct linear type-1 triangular and trapezoidal membership functions. Therefore, in this research we propose to generate type-1 fuzzy triangular and trapezoidal membership functions using FCM. The objective of this paper is to describe the generation of type-1 fuzzy triangular and trapezoidal MFs using FCM and to present a comparative analysis that demonstrates the accuracy of the proposed approach.
The majority of prior research has only focused on generating Gaussian MFs with FCM or utilizing Gaussian MFs generated by FCM in their solutions. There is still a need to do research to improve the FCM-based MF generating method so that it can produce triangular and trapezoidal MFs. The approach ensures flexibility in MF type selection, hence enhancing FIS effectiveness by exploiting the benefits that each of the three MF types can offer.
In many, if not most, studies involving fuzzy sets and fuzzy systems, symmetry is and has always been present. When a human is engaged in the design of a fuzzy system, symmetric properties are naturally preferred. Fuzzy MFs and linguistic terms are the most typical examples: they are usually developed symmetrically and distributed evenly across the universe of discourse. From fuzzy measurements to fuzzy control, the same may be said of many other aspects of fuzzy theory and applications. However, this is frequently not the case when features are generated automatically.
The remainder of the paper is arranged as follows: the background of fuzzy sets and classification is explained in Section 2, and the proposed approach and methodology are presented in Section 3. Section 4 presents the experiments and simulation results, and Section 5 provides the conclusion and future directions of the research.
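To make the notion of symmetric, evenly distributed MFs concrete, the following minimal sketch builds a uniform partition of triangular MFs over a universe of discourse, which is one common form of grid partitioning. The function name `grid_partition` and its parameters are hypothetical; the paper does not describe the GP implementation it compares against, so this is an illustration under that assumption only.

```python
def grid_partition(lo, hi, n_sets):
    """Return n_sets evenly spaced triangular MFs (a, b, c) covering [lo, hi].

    Illustrative assumption: peaks sit on a uniform grid and each triangle
    reaches zero at the peaks of its neighbours, so every MF is symmetric.
    """
    if n_sets < 2:
        raise ValueError("need at least two fuzzy sets")
    step = (hi - lo) / (n_sets - 1)
    return [
        (lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
        for i in range(n_sets)
    ]


if __name__ == "__main__":
    for a, b, c in grid_partition(0.0, 10.0, 3):
        print(a, b, c)
```

Every triangle produced this way has equal left and right widths, regardless of how the data are distributed, which is the property that data-driven generation does not guarantee.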

Type-1 Fuzzy Set and Logic System
A type-1 fuzzy set (T1FS) F can be described as follows. Assuming X is the universe of discourse, a collection of objects x, a fuzzy set F is defined as
$$F = \{(x, \mu_F(x)) \mid x \in X\} \tag{1}$$
The notation µF(x) in Equation (1) is an MF, which gives the membership degree of each element in X. The value of the membership degree lies in the range from 0 to 1.
A type-1 fuzzy logic system (T1FLS) is used to map crisp inputs to crisp outputs. A T1FLS contains four key components: the fuzzifier, the fuzzy rule base, the fuzzy inference engine, and the defuzzifier. The T1FLS block diagram is shown in Figure 1. During the fuzzification process, the crisp values of the inputs are transformed into fuzzy values [13]. The fuzzy input is passed to the fuzzy inference engine, which is known as the brain of the T1FLS and makes the decisions. The fuzzy knowledge base is the collection of fuzzy if-then rules on which the decisions in the fuzzy inference engine are based [14]. Finally, in the defuzzification phase, the fuzzy output is converted into a crisp output.
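The four stages can be illustrated with a very small single-input system. This is a sketch under stated assumptions rather than the system used in the paper: the input variable, the fuzzy sets in `INPUT_SETS`, the rule consequents in `RULES`, and the weighted-average defuzzifier (a zero-order Sugeno-style choice) are all hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership degree of x for floor points a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for one input, "temperature" in degrees Celsius.
INPUT_SETS = {
    "cold": (-10.0, 0.0, 20.0),
    "hot": (10.0, 30.0, 40.0),
}

# Hypothetical rule base: IF temperature is <set> THEN heater power is <value>.
RULES = {"cold": 100.0, "hot": 0.0}


def infer(temperature):
    """Map a crisp temperature to a crisp heater power."""
    # Fuzzifier: crisp input -> membership degree in every input set.
    degrees = {name: tri(temperature, *p) for name, p in INPUT_SETS.items()}
    # Inference engine: each rule fires with the degree of its antecedent.
    total = sum(degrees.values())
    if total == 0.0:
        raise ValueError("input lies outside every fuzzy set")
    # Defuzzifier: weighted average of the rule outputs gives a crisp value.
    return sum(degrees[name] * RULES[name] for name in RULES) / total


if __name__ == "__main__":
    print(infer(15.0))
```

At 15 degrees both sets are active to the same degree, so the crisp output falls halfway between the two rule consequents.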

Fuzzy Membership Functions
A fuzzy membership function (MF) is a curve that defines how each value in the input space is mapped to a membership degree between 0 and 1. In other words, the MF of a set is the link between the elements of the set and their degree of belonging [6]. Three popular and widely used fuzzy MFs, namely the (i) triangular, (ii) trapezoidal, and (iii) Gaussian MFs, are shown in Figure 2. The triangular MF has three parameters, where a and c represent the floor values and b represents the peak value. The trapezoidal MF is generated from four parameters, a, b, c, and d, where a < b < c < d. Lastly, the Gaussian MF is generated from the center c and the width w.
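The three shapes can be written directly from the parameters just described. The sketch below uses the usual piecewise-linear forms for the triangular and trapezoidal MFs. For the Gaussian MF it assumes that the width w plays the role of a standard deviation, which the text does not state explicitly.

```python
import math


def triangular(x, a, b, c):
    """Triangular MF with floor points a and c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal MF with floor points a, d and plateau [b, c] (a < b < c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, c, w):
    """Gaussian MF with center c; w is assumed to be the standard deviation."""
    return math.exp(-0.5 * ((x - c) / w) ** 2)


if __name__ == "__main__":
    print(triangular(2.5, 0.0, 5.0, 10.0))
    print(trapezoidal(1.0, 0.0, 2.0, 4.0, 6.0))
    print(gaussian(1.0, 1.0, 2.0))
```

The triangular and trapezoidal forms have a bounded support and can be asymmetric when the left and right widths differ, whereas the Gaussian form is always symmetric about its center.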

Related Work and Study
The generation of MFs using clustering analysis is presented in [15]. An improved density-based clustering algorithm was used to generate triangular MFs. However, that work was not based on the FCM algorithm. A circuit to generate MFs was presented in [3]. The MF generator was circuit-based and did not depend on the data. Moreover, it was limited to the generation of a type-1 Gaussian MF and was not based on FCM. The authors of [16] proposed evolutionary fuzzy rules for ordinal binary classification. The rough set approach was used, and the K-means clustering algorithm was applied to generate fuzzy rules. However, the work did not focus on the generation of triangular and trapezoidal MFs through FCM. The authors of [17] investigated the stability analysis of polynomial fuzzy model-based control systems, where two sets of MFs were generated for the polynomial fuzzy model and the controller. However, MFs of a general form were introduced for a nonlinear FIS and were not generated through data clustering. Studies such as [11] and the extended work presented in [18] have laid a good foundation for the generation of MFs through FCM. Unfortunately, the method was unable to generate the correct shape of MFs every time [10]. This limitation may also affect the accuracy and precision of the results. Hence, there remains a need for a method that can produce correct type-1 fuzzy triangular and trapezoidal MF shapes regardless of the type of underlying dataset.

Proposed Methodology
The proposed method to generate Type-1 fuzzy triangular and trapezoidal membership functions is presented in Figure 3.

Data
Classification datasets were used to implement and validate the proposed approach for the generation of type-1 fuzzy triangular and trapezoidal membership functions. Four datasets were used: the iris dataset [19], the banknote authentication dataset [20], the blood transfusion service center dataset [21], and the Haberman survival dataset [22].

Iris Dataset
The iris dataset is one of the most popular and widely used datasets for addressing classification problems [19]. It consists of 150 instances, each having four attributes and one class attribute. The attributes are sepal length, sepal width, petal length, and petal width in cm, which are used to predict the class of iris plant: setosa, versicolor, or virginica.

Banknote Authentication Dataset
Photos of genuine and forged banknote-like specimens were used to create the banknote authentication dataset in [20]. An industrial camera, which is typically used for print inspection, was utilized for digitization. The final photos have a resolution of 400 × 400 pixels. Owing to the object lens and the distance to the inspected object, grayscale photographs with a resolution of roughly 660 dpi were obtained. A wavelet transform tool was used to extract features from the photos. The dataset was created by Volker Lohweg and was donated in August 2012. It contains 1372 instances with four attributes and one class attribute, which are used to check the authenticity of banknotes. The four attributes (variance, skewness, and curtosis of the wavelet-transformed photo, and entropy of the photo) are continuous, whereas the class attribute is an integer that indicates authenticity.

Blood Transfusion Dataset
The blood transfusion dataset was gathered from the blood transfusion service center in the Taiwanese city of Hsin-Chu [21]. The goal of the data was to show how the marketing model worked. Each of the 748 blood donor instances in the dataset has four attributes: R (recency, the number of months since the last donation), F (frequency, the total number of donations), M (monetary, the total blood donated in c.c.), and T (time, the number of months since the first donation). A fifth attribute, the class attribute, is a binary variable that indicates whether the person donated blood in March 2007. Prof. I-Cheng Yeh generated and donated the dataset in October 2008.

Haberman's Survival Dataset
Haberman's survival dataset includes cases from a study on the survival of patients who had undergone breast cancer surgery at the University of Chicago's Billings Hospital between 1958 and 1970. Tjen-Sien Lim donated the dataset. It contains 306 instances, each with three attributes, namely the patient's age at the time of surgery, the patient's year of surgery, and the number of positive axillary nodes detected, as well as one binary class attribute that indicates the patient's survival status.

FCM Algorithm
The fuzzy c-means (FCM) algorithm is used to divide the data into fuzzy clusters and to calculate the U-matrix and the cluster centers, which are then used to approximate the triangular and trapezoidal MFs. The FCM algorithm works according to the following steps:
i. Fix the number of clusters c (2 ≤ c ≤ n) and select a value for the fuzziness parameter m.
ii. Initialize the partition matrix (fuzzy membership matrix) Uij.
iii. Calculate the cluster centers (fuzzy centers) at each step.
iv. Update the partition matrix (membership matrix).
Here m is the fuzziness parameter and ε is the termination threshold. The flow of the FCM algorithm is presented in Figure 4.
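The steps above can be sketched for one-dimensional data as follows. The update rules are the standard FCM formulas (centers as means weighted by the memberships raised to the power m, memberships from relative distances to the centers), which are assumed here because the equations themselves are not reproduced in the text. The function name `fcm_1d`, its default arguments, and the stopping test on the largest change in the partition matrix are illustrative choices.

```python
import random


def fcm_1d(data, c, m=2.0, max_iter=100, eps=1e-5, seed=0):
    """Fuzzy c-means on 1-D data.

    Returns (centers, U), where U[i][j] is the degree of point i in cluster j.
    """
    rng = random.Random(seed)
    n = len(data)
    # Step ii: random partition matrix whose rows each sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = [0.0] * c
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Step iii: each center is the mean of the data weighted by u_ij^m.
        for j in range(c):
            weights = [U[i][j] ** m for i in range(n)]
            centers[j] = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        # Step iv: update memberships from the distances to the centers.
        new_U = []
        for x in data:
            dists = [abs(x - v) for v in centers]
            if 0.0 in dists:
                # The point coincides with a center: share full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_U.append([h / sum(hits) for h in hits])
            else:
                new_U.append(
                    [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
                )
        # Stop when the partition matrix changes by less than the threshold.
        change = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if change < eps:
            break
    return centers, U


if __name__ == "__main__":
    centers, U = fcm_1d([0.9, 1.0, 1.1, 4.9, 5.0, 5.1], c=2)
    print(sorted(centers))
```

Each row of U sums to 1, so a point that lies between two centers splits its membership between them instead of being assigned to a single cluster.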

Approximating Type-1 Fuzzy Triangular MF
The data are passed through FCM, which produces the fuzzy membership matrix (U-matrix) and the cluster centers. The U-matrix and the cluster centers are then passed to the next step, where the parametric values for generating the type-1 fuzzy triangular MF are calculated, as shown in step 5 of Algorithm 1. With these parametric values, the proposed type-1 fuzzy triangular MF is constructed. The process is explained in detail in Algorithm 1.

Use parametric values a < b < c to generate triangular MF
Once the U-matrix and cluster centers are calculated, the parametric values for the triangular MF are approximated using a formula set. In this formula set, the floor values of the triangular MF are represented by a and c, and the peak value by b. α and γ are values obtained from the U-matrix and cluster centers, while β is a constant whose value is determined through experiments and simulation.
Once the values of a, b, and c are calculated, they are used as the parameters for generating the type-1 fuzzy triangular MF. Figure 5 shows the triangular MF representation.
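The idea of deriving (a, b, c) from the FCM output can be illustrated with a simple stand-in heuristic. This is explicitly not the paper's formula set, whose coefficients α, β, and γ are not reproduced here. The sketch takes the cluster center as the peak b and extends each side so that the triangle crosses a chosen membership level at the outermost points of the cluster. The function name `triangular_params` and the parameter `cut` are hypothetical.

```python
def triangular_params(data, U, centers, cut=0.5):
    """Return one (a, b, c) tuple per cluster using an alpha-cut heuristic.

    NOTE: stand-in heuristic, not the paper's formula set. `cut` is a
    hypothetical parameter and is unrelated to the paper's constant beta.
    Requires 0 <= cut < 1.
    """
    params = []
    for j, b in enumerate(centers):
        # Points that belong to cluster j with degree >= cut.
        members = [x for x, row in zip(data, U) if row[j] >= cut]
        lo = min(members + [b])
        hi = max(members + [b])
        # Guard against zero-width sides so that a < b < c always holds.
        left = max(b - lo, 1e-9)
        right = max(hi - b, 1e-9)
        # Extend each side so the triangle equals `cut` at the extreme member.
        a = b - left / (1.0 - cut)
        c = b + right / (1.0 - cut)
        params.append((a, b, c))
    return params


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
    U = [[0.95, 0.05], [0.99, 0.01], [0.90, 0.10],
         [0.10, 0.90], [0.01, 0.99], [0.05, 0.95]]
    print(triangular_params(data, U, centers=[2.0, 8.0]))
```

Because the left and right widths are computed separately from the data, the resulting triangles become asymmetric whenever the cluster is skewed around its center.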

Approximating Type-1 Fuzzy Trapezoidal MF
Similar to the triangular MF approximation, the data are passed through FCM to obtain the membership matrix (U-matrix) and the cluster centers. The U-matrix and cluster centers are then converted into a trapezoidal fuzzy set through a mathematical formulation, which yields the parametric values for constructing the fuzzy trapezoidal MF. The process is explained in Algorithm 2.

Algorithm 2: Trapezoidal Membership Function Approximation
1. Choose (e, m, iter, ε, a, b, c, d
The floor values of the trapezoidal MF are represented by a and d, and its peak values by b and c. α and γ are values obtained from the U-matrix and cluster centers, whereas β and a second coefficient are constants whose values are determined through experiments and simulation.
Once the values of a, b, c, and d are calculated, they are used as the parameters for generating the trapezoidal MF. Figure 6 shows the type-1 fuzzy trapezoidal MF representation.
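As with the triangular case, the mapping from FCM output to (a, b, c, d) can be illustrated with a stand-in heuristic that is not the paper's formulation. The sketch uses two membership levels: points above the higher level define the plateau [b, c], and points above the lower level define the floor points a and d. The function name `trapezoidal_params` and the parameters `core_cut` and `support_cut` are hypothetical.

```python
def trapezoidal_params(data, U, centers, core_cut=0.9, support_cut=0.5):
    """Return one (a, b, c, d) tuple per cluster using two alpha-cuts.

    NOTE: stand-in heuristic, not the paper's formulation. Both cut levels
    are hypothetical and unrelated to the constants used in the paper.
    Requires core_cut >= support_cut, which gives a <= b <= c <= d.
    """
    params = []
    for j, center in enumerate(centers):
        # High-membership points form the plateau, the rest of the cluster the floor.
        core = [x for x, row in zip(data, U) if row[j] >= core_cut] + [center]
        support = [x for x, row in zip(data, U) if row[j] >= support_cut] + [center]
        params.append((min(support), min(core), max(core), max(support)))
    return params


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
    U = [[0.70, 0.30], [0.95, 0.05], [0.95, 0.05], [0.70, 0.30],
         [0.30, 0.70], [0.05, 0.95], [0.05, 0.95], [0.30, 0.70]]
    print(trapezoidal_params(data, U, centers=[2.5, 7.5]))
```

The inequalities are not strict in this sketch, so a cluster with a single high-membership point degenerates toward a triangle; a practical implementation would need to handle that case.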

Performance Measure
To check the accuracy, each dataset was divided into five folds [23,24]. Four folds were used for training, while the fifth fold was used as the testing dataset to evaluate the classification results, and the predicted classes were matched against those in the original dataset. The percentage accuracy was calculated and used as the performance measure. For comparative analysis, the results of the proposed type-1 fuzzy triangular and trapezoidal MFs were compared with the FCM-based Gaussian MF and the grid partitioning (GP)-based triangular and Gaussian MFs.
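The evaluation protocol can be sketched as a generic five-fold loop that reports the percentage accuracy of each fold. The interleaved fold assignment and the nearest-class-mean classifier below are assumptions made only to keep the example self-contained: the paper does not state how the folds were formed, and the classifier stands in for the FIS built from the generated MFs. All function names are hypothetical.

```python
def k_fold_accuracy(samples, labels, train, predict, k=5):
    """Return the percentage accuracy obtained on each of k folds."""
    n = len(samples)
    accuracies = []
    for fold in range(k):
        # Assumed split: sample i goes to fold i % k.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(100.0 * correct / len(test_idx))
    return accuracies


def train_class_means(xs, ys):
    """Stand-in classifier: remember the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_nearest_mean(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


if __name__ == "__main__":
    samples = [0.1 * i for i in range(10)] + [5.0 + 0.1 * i for i in range(10)]
    labels = [0] * 10 + [1] * 10
    print(k_fold_accuracy(samples, labels, train_class_means, predict_nearest_mean))
```

Passing `train` and `predict` as arguments keeps the loop independent of the classifier, so the same protocol can be applied to each MF generation method being compared.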

Results and Discussion
To validate the approach, the proposed algorithms were tested on the classification datasets to calculate the accuracy of predicting the class to which each instance belongs. The results for each dataset are presented in this section.

Iris Dataset
The iris dataset was divided into five folds. FIS based on proposed type-1 fuzzy triangular and trapezoidal MFs as well as FCM-based Gaussian and GP-based triangular and Gaussian MFs were developed and tested against the test dataset, and the classification accuracy was calculated for each fold. The results of the test are presented in Table 1 and Figure 7. It is evident from the results that the proposed type-1 FCM-triangular MF outperforms other comparative methods for MF generation such as FCM-based Gaussian and GP-based triangular and Gaussian MFs.

Banknote Authentication Dataset
The banknote authentication dataset was divided into five folds. FIS based on proposed type-1 fuzzy triangular and trapezoidal MFs as well as FCM-based Gaussian and GP-based triangular and Gaussian MFs were developed and tested against the test dataset, and the classification accuracy was calculated for each fold. The results of the test presented in Table 2 and Figure 8 show that the proposed methods represented by the FCM triangular and FCM trapezoidal MFs outperform the FCM-based Gaussian and GP-based triangular and Gaussian MFs.

Blood Transfusion Dataset
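The blood transfusion dataset was divided into five folds. FIS based on proposed type-1 fuzzy triangular and trapezoidal MFs as well as FCM-based Gaussian and GP-based triangular and Gaussian MFs were developed and tested against the test dataset, and the classification accuracy was calculated for each fold. The results of the test presented in Table 3 and Figure 9 show that the proposed methods represented by the FCM triangular and FCM trapezoidal MFs outperform FCM-based Gaussian and GP-based triangular and Gaussian MFs.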

Haberman's Survival Dataset
Haberman's survival dataset was divided into five folds. FIS based on proposed type-1 fuzzy triangular and trapezoidal MFs as well as FCM-based Gaussian and GP-based triangular and Gaussian MFs were developed and tested against the test dataset, and the classification accuracy was calculated for each fold. The results of the test presented in Table 4 and Figure 10 show that the proposed methods represented by the FCM triangular and FCM trapezoidal MFs outperform FCM-based Gaussian and GP-based triangular and Gaussian MFs.
It can be seen from Table 5 that the proposed type-1 fuzzy triangular and trapezoidal MFs outperform FCM-based Gaussian as well as GP-based triangular and Gaussian MFs.

Conclusions and Future Work
An approach for generating type-1 fuzzy triangular and trapezoidal MFs using FCM is proposed in this study. The proposed approach was validated on classification datasets, and the results show that it is effective and outperforms the FCM-based Gaussian MF as well as the grid-partitioning-based triangular and Gaussian MFs. The approach can be useful in the data science field, particularly for prediction problems; it is applicable to real-world problems involving prediction, classification, and regression. In the future, the approach will be used to forecast electricity demand and pricing. We also plan to extend it to generate type-2 and interval type-2 fuzzy triangular and trapezoidal MFs.