Fuzzy Heuristics and Decision Tree for Classification of Statistical Feature-Based Control Chart Patterns

Abstract: Monitoring manufacturing process variation remains challenging, especially within a rapid and automated manufacturing environment. Problematic and unstable processes may produce distinct time series patterns that can be associated with assignable causes for diagnostic purposes. Various machine learning classification techniques, such as artificial neural networks (ANN), classification and regression trees (CART), and fuzzy inference systems, have been proposed to enhance the capability of the traditional Shewhart control chart for process monitoring and diagnosis. ANN classifiers are often opaque to the user, with limited interpretability of the classification procedure. In contrast, fuzzy inference systems and CART are more transparent, and their internal steps are more comprehensible to users. There have been limited works comparing these two techniques in the control chart pattern recognition (CCPR) domain. As such, the aim of this paper is to demonstrate the development of fuzzy heuristics and CART techniques for CCPR and to compare their classification performance. The results show that the heuristic Mamdani fuzzy classifier performed well in classification accuracy (95.76%), although slightly lower than the CART classifier (98.58%). This study opens opportunities for deeper investigation and provides a useful revisit to promote more studies into explainable artificial intelligence (XAI).


Introduction
Statistical process control charts are commonly used to detect process variations in manufacturing processes [1]. The Shewhart X-bar control chart, introduced in the 1920s, remains one of the most widely implemented statistical process control tools [2]. Time series plots from unstable processes produce unnatural patterns such as trend up, trend down, sudden shift up and down, cyclic, stratification, and systematic patterns, as shown in Figure 1. Most of the time, normal patterns indicate a statistically in-control process. However, as time goes on, the manufacturing process may experience tool wear, operator fatigue, seasonal effects, failure of machine parts, fluctuation in power supply, and loose fixtures, among others. For example, a sudden shift pattern could be attributed to failures in machine parts, and a cyclic pattern could be attributed to seasonal changes such as fluctuation in temperature [3,4]. Identification and classification of these patterns, complemented with process knowledge, can be linked to a set of assignable causes for diagnostic purposes. As such, the ability to classify such pattern classes is invaluable for focusing the diagnosis efforts.
Advances in computing and artificial intelligence (AI) have enabled these patterns to be automatically classified. Some of the popular soft computing methods used for control chart pattern recognition (CCPR) are artificial neural network (ANN), support vector machine (SVM), fuzzy inference system (FIS), decision tree, and hybrids of these techniques [5][6][7][8]. ANN offers flexibility and learning capability, and is capable of handling noisy data. However, ANN is seen as a black box and provides limited internal comprehensibility to the user. It is difficult to interpret how a CCPR decision has been reached by an ANN-based classifier. Besides, it requires a relatively large amount of training data. SVM, which uses the principle of structural risk minimization, is an attractive alternative for classification problems with a small sample size. However, SVM requires parameter optimization and needs to be linked with optimization tools. Also, it is challenging to select an appropriate kernel function in designing the SVM classifier [9]. An integrated fuzzy SVM classifier for CCPR was proposed in [10], where a genetic algorithm was used to simultaneously optimize the input feature subsets and the parameters of the classifier. Sugumaran and Ramachandran [11] reported an application of a decision tree for feature selection and for generation of a rule set for a fuzzy classifier for fault diagnosis of roller bearings. Recently, Zan et al. [4,12] reported a potential application of convolutional neural network (CNN) and information fusion for CCPR. However, CNN remains opaque, especially among new researchers who need to understand the classification logic prior to exploring more complex and advanced techniques. Furthermore, there is growing demand for transparency in automated decision-making systems, and the trend is towards explainable and trustworthy AI, especially for diagnostic purposes such as CCPR.
One of the comprehensible recognition approaches is the fuzzy inference system (FIS). Zarandi et al. [7] investigated a Mamdani FIS with Western Electric [13] run rules, but their study did not include classification of the control chart patterns. They used run-rule zone tests as the input membership functions. Khajehzadeh and Asady [8] used the Sugeno FIS with a subtractive clustering technique to classify control chart patterns. The second approach with interpretable steps is classification by using a decision tree. Pham and Wani [14] used a decision tree as a recognizer for identification of six basic types of control chart patterns. Gauri and Chakraborty [5,15] also proposed classification by a decision tree algorithm. Bag et al. [16] compared CART and the quick unbiased efficient statistical tree (QUEST). They reported that the CART algorithm was more effective than the QUEST algorithm.
In the development of CCPR, input data can be represented either as raw data or as a minimal feature set. However, it is not practical to use raw data directly for classification by either fuzzy classifiers or decision trees [14]. The most common approaches adopted by researchers are shape features, statistical features, or hybrids of both. Each type of feature set has its own merits and demerits, considering the different methods used in their extraction and selection. Pham and Wani [14] were the first to use nine shape features to recognize six basic control chart patterns. Gauri and Chakraborty [15] used a CART-based decision tree with seven shape features. Hassan et al. [17,18] proposed a set of six statistical features, selected from ten candidate features, for CCPR using an ANN recognizer. Al-Assif [19] used features based on wavelet denoising for an ANN recognizer. Cheng et al. [20] proposed features based on correlation analysis, with ANN as the recognizer. Khormali and Addeh [21] developed six new features based on a type-2 fuzzy c-means approach, with support vector machine (SVM) as the recognizer. Masood and Hassan [22] investigated a feature-based ANN scheme for recognizing bivariate correlated patterns. Bag et al. [16] reported a study using CART to recognize different types of CCPs with shape features. Their results indicate that the performance of the feature-based CCP recognition approach is promising. Hassan et al. [17] argued that the significant improvement in CCPR classification performance when using features rather than raw data is due to dimensionality reduction and a more compact classifier.
This study investigated two soft computing methods, namely fuzzy heuristics based on the Mamdani fuzzy inference system, and the Classification and Regression Tree (CART). Based on our review, there is a lack of studies comparing these two methods, particularly with statistical features as the input representation for control chart pattern recognition. The purpose of this paper is to develop CCPR classifiers using these two methods and to compare their performance in classifying X-bar control chart patterns. These classification methods were chosen since they provide transparency and a comprehensible decision-making process, require relatively little training data, and potentially result in high recognition accuracy. Having comprehensible logic is desirable for creating a trustworthy decision-making system. It also facilitates acceptance among practitioners, especially when predicted patterns can be associated with diagnostic and preventive actions. The rest of the paper is organized as follows: Section 2 discusses the overall methodology, Section 3 covers the development of the heuristic Mamdani fuzzy classifier, Section 4 explains the development of the decision tree classifier, Section 5 presents the results and discussion, and finally, Section 6 concludes the paper.

Methodology
The methodology of the study involves sample patterns generation, statistical feature extraction, classifier design and development, and finally, performance evaluation.

Sample Patterns Generation
It is not economically feasible to collect a sufficiently balanced amount of data on catastrophically unstable processes from real-life situations. Thus, it is an accepted practice for researchers in this area to use mathematical pattern generators to mimic real-life process deterioration [5,17]. Furthermore, it is not possible to know exactly what type of pattern a real process has generated without a thorough diagnosis of that process. The data sets can be generated from the standard pattern equations given in Table 1, with variability introduced within the specified parameter ranges. A noise magnitude of 1/3σ was used in the pattern generators, and the parameters were randomly varied within the specified ranges. We adopted the approach of Swift [23] for sample data generation to provide the various types of patterns as plotted on a Shewhart X-bar chart. The notation µ represents the process mean for a stable process, N represents the standardized normal distribution, and γ denotes the related pattern parameters. In this study, each sampled data stream has 20 time series subgroup data points with a sample size of five. It was assumed that all the sampled streams demonstrated fully developed patterns in the observation window before being recognized. Each data set has a total of 800 sample patterns, where each pattern type comprises 100 patterns.
Table 1. Equations for control chart pattern sample generation [5,17].
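The generation step can be sketched in Python as follows. This is only an illustrative sketch: the equations of Table 1 are not reproduced in this text, so the pattern forms below are the ones commonly used in the CCPR literature, and every parameter range (gradient, shift magnitude and position, amplitude, period, systematic departure, stratification factor) is a hypothetical placeholder rather than the value used in this study. Only the window of 20 points, the 1/3σ noise magnitude, and the 8 × 100 layout of a data set come from the description above.

```python
import math
import random

PATTERNS = ("normal", "stratification", "systematic", "cyclic",
            "trend_up", "trend_down", "shift_up", "shift_down")


def generate_pattern(kind, n=20, mu=0.0, sigma=1.0, rng=None):
    """Return one simulated stream of n points for the given pattern type."""
    if kind not in PATTERNS:
        raise ValueError(f"unknown pattern type: {kind}")
    rng = rng or random.Random()

    # Noise magnitude of sigma/3, as stated in the text.
    noise_sd = sigma / 3.0
    if kind == "stratification":
        noise_sd *= rng.uniform(0.2, 0.4)        # hypothetical: reduced spread

    # Hypothetical parameter ranges, drawn once per stream.
    gradient = rng.uniform(0.05, 0.10) * sigma   # trend slope per point
    shift = rng.uniform(1.5, 2.5) * sigma        # shift magnitude
    shift_at = rng.randint(5, 15)                # first shifted point
    amplitude = rng.uniform(1.5, 2.5) * sigma    # cyclic amplitude
    period = 8                                   # cyclic period in points
    swing = rng.uniform(1.0, 3.0) * sigma        # systematic departure

    stream = []
    for t in range(1, n + 1):
        y = mu + rng.gauss(0.0, noise_sd)
        if kind == "systematic":
            y += swing * (-1) ** t
        elif kind == "cyclic":
            y += amplitude * math.sin(2.0 * math.pi * t / period)
        elif kind == "trend_up":
            y += gradient * t
        elif kind == "trend_down":
            y -= gradient * t
        elif kind == "shift_up" and t >= shift_at:
            y += shift
        elif kind == "shift_down" and t >= shift_at:
            y -= shift
        stream.append(y)
    return stream


def generate_data_set(per_type=100, seed=0):
    """One data set: per_type streams for each of the eight pattern types."""
    rng = random.Random(seed)
    return [(kind, generate_pattern(kind, rng=rng))
            for kind in PATTERNS for _ in range(per_type)]
```

Calling `generate_data_set()` with the defaults yields 800 labelled streams, matching the size of one data set described above.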

Features Extraction
Statistical features extracted from raw data were used as the input data representation to achieve dimensionality reduction [9,24]. Ten statistical features were extracted from each of the sampled data streams. The candidate features were MEAN, standard deviation (SD), skewness, mean square value (MSV), cumulative sum (CUSUM), autocorrelation, range, MEDIAN, kurtosis and SLOPE. Some of the statistical features are described below [18].

• Skewness: The symmetry of the shape of the distribution. The skewness is estimated from the data points X_1 to X_n, where X_i is an individual value, µ is the mean, s is the sample standard deviation, and n is the number of points (window size).
• Mean square value (MSV): The mean of the squared values, where X_i are the individual values and n is the number of points (window size).
• CUSUM: The cumulative sum of values. The last value of the CUSUM statistic is taken as the feature in this study. The upper and lower CUSUM statistics, C_i^+ and C_i^−, are both initialized to zero.
• Autocorrelation: Exists when later data are dependent on previous data.
• Kurtosis: Measures the peakedness of a distribution. A constant of 3 is subtracted so that the standard normal distribution gives k = 0.
• SLOPE: The slope m of a first-order (straight-line) fit, y = mx + C, where C is the y-intercept. The slope m is used as a feature in this study.
Expressions for the more common features are intentionally excluded from the above list. Each feature takes different numerical values for each pattern, so the features were normalized to the same range, [−10, 10], to give appropriate scaling and visibility before being presented to the classifiers.
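A minimal Python sketch of the feature extraction and normalization is given below. The formulas are standard textbook forms and are assumptions, since the exact expressions of this study are not reproduced in this text: skewness and kurtosis use the n·s³ and n·s⁴ denominators, autocorrelation is taken at lag 1, the CUSUM feature is the last value of an upper tabular CUSUM with a hypothetical target of 0 and slack `k = 0.5`, and normalization is assumed to be min-max scaling to [−10, 10].

```python
import statistics


def extract_features(x, target=0.0, k=0.5):
    """Compute ten candidate statistical features from one data stream x."""
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)              # sample standard deviation (must be > 0)
    dev = [xi - mean for xi in x]

    # Upper CUSUM started at zero; target and slack k are hypothetical choices.
    cusum = 0.0
    for xi in x:
        cusum = max(0.0, xi - (target + k) + cusum)

    # Least-squares slope of x against the time index t = 1..n.
    t_mean = (n + 1) / 2.0
    slope = (sum((t - t_mean) * d for t, d in zip(range(1, n + 1), dev))
             / sum((t - t_mean) ** 2 for t in range(1, n + 1)))

    return {
        "MEAN": mean,
        "SD": sd,
        "skewness": sum(d ** 3 for d in dev) / (n * sd ** 3),
        "MSV": sum(xi * xi for xi in x) / n,
        "CUSUM": cusum,
        "autocorrelation": (sum(dev[i] * dev[i + 1] for i in range(n - 1))
                            / sum(d * d for d in dev)),
        "range": max(x) - min(x),
        "MEDIAN": statistics.median(x),
        "kurtosis": sum(d ** 4 for d in dev) / (n * sd ** 4) - 3.0,
        "SLOPE": slope,
    }


def scale_to_range(values, lo=-10.0, hi=10.0):
    """Min-max scale one feature across all samples to [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:                      # constant feature: map to midpoint
        return [(lo + hi) / 2.0] * len(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]
```

For example, `extract_features([1, 2, 3, 4, 5])` gives a SLOPE of 1.0 and an MSV of 11.0, and `scale_to_range([0, 5, 10])` returns `[-10.0, 0.0, 10.0]`.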

Classifier Design and Development
Two types of explainable classification methods were used in this study, namely the fuzzy inference system (FIS) and classification and regression trees (CART). A fuzzy classifier is transparent to interpretation and analysis. The development of the fuzzy classifier involves representation of data in fuzzy set format, selecting optimal fuzzy sets using similarity analysis, designing the FIS, and finally testing and performance evaluation. We implemented the Mamdani FIS using the MATLAB fuzzy toolbox since this heuristic was more appropriate for our CCPR than the Sugeno FIS. Simplified triangular membership functions were used for the inputs, and fuzzy IF-THEN rules were used for the inference engine. Since a crisp output value was required, the final fuzzy output was defuzzified using the smallest of maximum (SOM) method.
The second classification technique, CART, performs pattern classification by recursively partitioning the data space into appropriate class partitions. The partitioning can be visualized graphically as a decision tree. The CART in this study was implemented through the rpart (Recursive Partitioning and Regression Trees) library in R, an open-source programming environment [25]. Further discussion on the development of the two classifiers is provided in Sections 3 and 4, respectively.

Performance Evaluation
Six data sets were used for training and testing, where each data set consists of a total of 800 sample patterns. Overall, a total of 4800 sample patterns were used, of which 60% were used for training and 40% for testing the classifiers. The performance was evaluated in terms of classification accuracy and presented as confusion matrices. The investigated classification heuristics were validated using a published data set from Alcock [26].
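The 60/40 partition can be illustrated with the short Python sketch below. It assumes that the split is stratified by pattern type, so that each type contributes the same share to training and testing. The function name, the seed, and the shuffling strategy are hypothetical choices, not the procedure used in this study.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, train_fraction=0.6, seed=0):
    """Split (sample, label) pairs so each class keeps the same train share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    train, test = [], []
    for label, items in by_class.items():
        items = items[:]                  # copy before shuffling
        rng.shuffle(items)
        cut = round(train_fraction * len(items))
        train += [(s, label) for s in items[:cut]]
        test += [(s, label) for s in items[cut:]]
    return train, test
```

With one data set of 8 pattern types and 100 samples per type, this yields 480 training and 320 testing samples. Over six data sets that gives 2880 and 1920 samples, consistent with the totals above.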

Development of Heuristic Mamdani Fuzzy Classifier
The development of fuzzy classifier involves representation of data in fuzzy set format, selection of suitable fuzzy sets and then design of Mamdani fuzzy inference system (FIS). The general steps in the development of fuzzy classifier are shown in Figure 2.
Symmetry 2021, 13, x FOR PEER REVIEW
To determine the fuzzy sets for each feature, we adopted the simplest method of determining the maximum, median, mean and minimum values for each pattern in each input feature space. The ten initial statistical features were extracted from 600 samples for each type of pattern. Box plots were used to represent the feature space for each pattern. An example of a box plot for feature MEAN is shown in Figure 3, where the pattern types are shown on the horizontal axis. The box plots represent the median points, the 75th and 25th percentiles, and the maximum and minimum points in the feature space. The vertical axis shows the normalized values of the feature MEAN within [−10, 10]. Overlapping can be seen in some feature spaces, such as for Shift up and Trend up patterns and similarly for Shift down and Trend down patterns. The feature spaces for Normal, Cyclic and Systematic patterns are distributed around zero. This phenomenon occurs due to the nature of these patterns, where the expected mean values of the observed points center on zero.
The process of fuzzification has two steps: the first is to assign fuzzy labels, and the second is to assign numerical meaning to each label. Figure 4 shows an example of the membership functions for feature MEAN as a fuzzy input. Among the different types of membership functions, we used triangular membership functions for a preliminary analysis due to their simplicity. Three points are required to completely define a fuzzy set: the upper bound, the lower bound, and the center point. The three points were selected from the box plots for each of the features and the respective patterns, as shown in Figure 3. Each fuzzy set starts at the minimum and ends at the maximum of the feature space, and its peak is the median of the feature space. Each fuzzy set represents a pattern class. As shown in Figure 4, the feature MEAN has a universe of input with crisp values divided into five fuzzy sets. These crisp inputs are converted to fuzzy variables with membership degrees on the y-axis. For example, a MEAN value of 6.0 on the x-axis belongs to the fuzzy linguistic variable SU with a membership degree of 0.8.
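The construction of a triangular fuzzy set from the minimum, median and maximum of a feature space can be sketched as follows. The numbers in the usage example are hypothetical: the breakpoints (2.0, 7.0, 10.0) were chosen only so that a MEAN value of 6.0 reproduces the membership degree of 0.8 quoted above, and they are not read from Figure 4.

```python
import statistics


def fuzzy_set_from_samples(values):
    """Return (lower bound, peak, upper bound) = (min, median, max)."""
    return (min(values), statistics.median(values), max(values))


def triangular(x, left, peak, right):
    """Membership degree of crisp value x in a triangular fuzzy set."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)     # rising edge
    return (right - x) / (right - peak)       # falling edge


# Hypothetical "SU" fuzzy set for feature MEAN on the normalized scale.
su_set = (2.0, 7.0, 10.0)
degree = triangular(6.0, *su_set)             # (6 - 2) / (7 - 2) = 0.8
```

A crisp value that lies in the overlap of two sets receives a non-zero degree in both, which is what motivates the simplification step that follows.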
Having overlapping fuzzy sets can cause difficulty in interpretation. As shown in Figure 4, fuzzy sets SU and TU, and SD and TD, share some of the crisp values. Likewise, the crisp values for patterns NOR, CYC, STRAT and SYS are concentrated at similar values on the x-axis. The fuzzy sets were then simplified using the simplification rules given by Setnes et al. [27]. This involved deleting and merging sets. The fuzzy set reduction was based on a similarity measure (S), as in Equation (8).
If S = 1, the two sets are identical, and if S = 0, the two sets are completely different. In this study we used a threshold of S = 0.5: if S > 0.5, the two sets were merged. A similar analysis was conducted for all features, and the final feature sets after simplification are shown in Figure 5. We empirically selected eight statistical features to be used in designing the Mamdani FIS, namely MEAN, standard deviation (SD), mean square value (MSV), CUSUM, autocorrelation, range, kurtosis, and SLOPE.
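The merging decision can be sketched in Python as follows. Equation (8) is not reproduced in this text, so the sketch assumes a set-theoretic (Jaccard-type) similarity, S = |A ∩ B| / |A ∪ B|, with intersection and union taken as the pointwise minimum and maximum of the membership functions on a discretized universe. The grid resolution and the example sets are hypothetical.

```python
def triangular(x, left, peak, right):
    """Membership degree of crisp value x in a triangular fuzzy set."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def similarity(set_a, set_b, lo=-10.0, hi=10.0, steps=2001):
    """Assumed similarity: sum of min memberships over sum of max memberships."""
    inter = union = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        a = triangular(x, *set_a)
        b = triangular(x, *set_b)
        inter += min(a, b)
        union += max(a, b)
    return inter / union if union > 0.0 else 0.0


def should_merge(set_a, set_b, threshold=0.5):
    """Merge two fuzzy sets when their similarity exceeds the threshold."""
    return similarity(set_a, set_b) > threshold
```

With these assumptions, the hypothetical sets (0, 2, 4) and (0.5, 2.5, 4.5) have a similarity of about 0.62 and would be merged, while (0, 2, 4) and (1, 3, 5) have a similarity of about 0.39 and would be kept separate.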
We omitted feature MEDIAN simply because it has high similarity with feature MEAN. Feature skewness was also omitted since its fuzzy sets cover the entire universe and all of them overlap with each other. After fuzzy set simplification, the linguistic names of the fuzzy sets were relabeled as VLOW, LOW, MED, HIGH, and VHIGH, as shown in Figure 5. This relabeling is important for designing the IF-THEN rules of the fuzzy classifier. Selected fuzzy sets were finally converted into a trapezoidal shape where deemed more appropriate, specifically for feature MEAN.
The next step in fuzzy classifier design is to formulate the inference engine, that is, the fuzzy IF-THEN rules. The antecedent (IF) part is a combination of fuzzy sets, and the consequent part is the pattern class. Iteration and fine tuning of the IF-THEN rules were performed to obtain a good inference system for the recognition of eight types of control chart patterns. The smallest of maximum (SOM) defuzzification method was used for the output fuzzy sets. The SOM method selects, as the crisp output, the smallest output value at which the membership function reaches its maximum. The fuzzy IF-THEN rules are summarized in Table 2, which lists the best rule set found after several simulation iterations. The stratification pattern requires two rules to discriminate it from normal and cyclic patterns. The final graphical representation of the fuzzy rules is shown in Figure 6, where the inputs are the first eight columns and the last column is the output.
Table 2. Fuzzy heuristic IF-THEN rules.

1. If (MEAN is HIGH) and (SD is MED) and (SLOPE is VHIGH) then (Pattern is Trend up)
2. If (MEAN is HIGH) and (SLOPE is HIGH) then (Pattern is Shift up)
3. If (MEAN is LOW) and (SD is MED) and (SLOPE is LOW) then (Pattern is Trend down)
4. If (MEAN is LOW) and (SLOPE is HIGH) then (Pattern is Shift down)
5. If (MEAN is MED) and (SD is HIGH) and (MSV is MED) and (CUSUM is HIGH) and (Autocorrelation is HIGH) and (Range is HIGH) and (Kurtosis is MED) and (SLOPE is VLOW) then (Pattern is Cyclic)
6. If (MEAN is MED) and (SD is HIGH) and (MSV is HIGH) and (CUSUM is MED) and (Autocorrelation is LOW) and (Range is HIGH) and (Kurtosis is LOW) and (SLOPE is VHIGH) then (Pattern is Systematic)
7. If (MEAN is MED) and (SD is LOW) and (Autocorrelation is HIGH) and (Range is LOW) and (Kurtosis is HIGH) and (SLOPE is HIGH) then (Pattern is Stratification)
8. If (MEAN is MED) and (MSV is LOW) and (CUSUM is LOW) and (Autocorrelation is LOW) and (Range is MED) and (Kurtosis is HIGH) and (SLOPE is HIGH) then (Pattern is Normal)
9. If (MEAN is MED) and (SD is LOW) and (Range is LOW) and (Kurtosis is HIGH) and (SLOPE is HIGH) then (Pattern is Stratification)
10. If (Range is LOW) then (Pattern is Stratification)
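How such rules could be evaluated is illustrated by the following Python sketch of Mamdani inference with SOM defuzzification. It is not the MATLAB implementation used in this study. It encodes only rules 1 to 4, assumes the minimum operator for "and" and for implication and the maximum operator for aggregation, and uses hypothetical breakpoints for all input and output fuzzy sets.

```python
def triangular(x, left, peak, right):
    """Membership degree of crisp value x in a triangular fuzzy set."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical input fuzzy sets on the normalized range [-10, 10].
INPUT_SETS = {
    "MEAN": {"LOW": (-10.0, -6.0, -2.0), "HIGH": (2.0, 6.0, 10.0)},
    "SD": {"MED": (-4.0, 0.0, 4.0)},
    "SLOPE": {"LOW": (-10.0, -6.0, -2.0), "HIGH": (0.0, 3.0, 6.0),
              "VHIGH": (4.0, 8.0, 10.0)},
}

# Rules 1 to 4 of Table 2 as (antecedent, consequent) pairs.
RULES = [
    ({"MEAN": "HIGH", "SD": "MED", "SLOPE": "VHIGH"}, "Trend up"),
    ({"MEAN": "HIGH", "SLOPE": "HIGH"}, "Shift up"),
    ({"MEAN": "LOW", "SD": "MED", "SLOPE": "LOW"}, "Trend down"),
    ({"MEAN": "LOW", "SLOPE": "HIGH"}, "Shift down"),
]

# Hypothetical output sets: one narrow triangle per class, peaks at 1..4.
OUTPUT_PEAK = {"Trend up": 1.0, "Shift up": 2.0,
               "Trend down": 3.0, "Shift down": 4.0}


def classify(features, lo=0.0, hi=5.0, steps=701):
    """Return the pattern label, or None when no rule fires."""
    # Firing strength of each rule: minimum over its antecedent terms.
    strength = {}
    for antecedent, label in RULES:
        w = min(triangular(features[name], *INPUT_SETS[name][term])
                for name, term in antecedent.items())
        strength[label] = max(w, strength.get(label, 0.0))

    # Clip output sets (min), aggregate (max), then take the smallest
    # output value at which the aggregated membership is maximal (SOM).
    best_x, best_mu = None, 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        mu = max(min(w, triangular(x, OUTPUT_PEAK[lab] - 0.5,
                                   OUTPUT_PEAK[lab], OUTPUT_PEAK[lab] + 0.5))
                 for lab, w in strength.items())
        if mu > best_mu:                  # strict: keeps the smallest x
            best_x, best_mu = x, mu
    if best_x is None:
        return None
    return min(OUTPUT_PEAK, key=lambda lab: abs(OUTPUT_PEAK[lab] - best_x))
```

Under these assumed sets, `classify({"MEAN": 6.0, "SD": 0.0, "SLOPE": 3.0})` fires only rule 2 and returns "Shift up", while an input with MEAN equal to 0 fires none of the four rules and returns `None`.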

Development of Decision Tree Classifier
The decision tree (DT) in this study was generated by using the rpart routine in R [25,28]. The rpart package implements many of the ideas found in the CART work of Breiman et al. [29]. It performs binary recursive partitioning, where each parent node is split into two child branches. This process is repeated until a terminal leaf is reached. The algorithm automatically selects a 'right-sized' classification tree and the input features that have good predictive accuracy. The default splitting criterion in rpart, which is based on the Gini splitting rule, was used since it usually performs best. The stopping rule was set to prevent the model from over-fitting, with the complexity parameter set to its default value of 0.01. The data set was randomly divided into 60% training data and 40% testing data. This ensures that the classifiers are sufficiently trained and that the testing results are not biased by a small sample size. The proposed classification tree is shown in Figure 7, comprising seven nodes (oval shape) and eight leaves (square shape). The predicted pattern type is given at each leaf (terminal), as listed in Table 3. The significant features included in the decision-making are MEAN, standard deviation (SD), mean square value (MSV), and SLOPE. The insignificant features were excluded from the DT.
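The core of binary recursive partitioning with the Gini rule can be illustrated by the following Python sketch, which searches for the single best split of one node. It is a simplified illustration, not the rpart implementation: the function names and the toy data are hypothetical, candidate thresholds are taken as midpoints between consecutive distinct feature values, and pruning with the complexity parameter is omitted.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature, threshold) of the best binary split.

    rows is a list of dicts mapping feature name to value. A sample goes
    to the left child when its feature value is below the threshold.
    """
    n = len(rows)
    parent = gini(labels)
    best = None
    for feature in rows[0]:
        values = sorted(set(row[feature] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels)
                    if row[feature] < threshold]
            right = [y for row, y in zip(rows, labels)
                     if row[feature] >= threshold]
            gain = (parent
                    - len(left) / n * gini(left)
                    - len(right) / n * gini(right))
            if best is None or gain > best[0]:
                best = (gain, feature, threshold)
    return best


# Toy example with two features and two pattern classes.
rows = [{"MEAN": 1.0, "SLOPE": -5.0}, {"MEAN": 2.0, "SLOPE": -4.0},
        {"MEAN": 1.5, "SLOPE": 4.0}, {"MEAN": 2.5, "SLOPE": 5.0}]
labels = ["Trend down", "Trend down", "Trend up", "Trend up"]
gain, feature, threshold = best_split(rows, labels)
```

In this toy example the search selects SLOPE with a threshold of 0.0, which separates the two classes completely. A full tree is obtained by applying the same search recursively to each child node until a stopping rule is met.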


Results and Discussion
The performance of the fuzzy heuristics and the decision tree (DT) classifier was evaluated using six different data sets. Each data set comprised 40 samples of each type of control chart pattern. Overall, a total of 1920 (6 sets × 40 samples × 8 types) unseen samples were used in testing the classifiers. The performance of the proposed methods in terms of recognition accuracy is summarized in Table 4. The results suggest that the overall recognition accuracy (µ) of the DT classifier (98.58%) is better than that of the fuzzy classifier (95.76%). The DT classifier also gave more consistent results (σ = 0.48) than the fuzzy classifier (σ = 1.09), despite using only four statistical features compared with the eight features used by the fuzzy classifier. The fuzzy classifier seems to have more difficulty than the DT classifier in classifying trend up patterns (85.4%) and systematic patterns (86.0%). However, the fuzzy classifier performed better in classifying normal patterns (100%). The confusion matrices for the fuzzy classifier and the DT classifier are shown in Tables 5 and 6, respectively. These tables show the correct recognition rates in the diagonal positions and the misclassification rates in the off-diagonal positions. Table 5 reveals that the normal pattern is the class most likely to be confused with true trend up (2.4%), cyclic (1.6%), and systematic (7.8%) patterns. The results also indicate that systematic patterns are sometimes confused with cyclic patterns (6.2%). Both shift patterns (up and down) were perfectly classified by the fuzzy classifier, which suggests that shift patterns are among the easiest to differentiate. Trend up patterns tend to be confused with shift up patterns (12.2%), and trend down patterns tend to be confused with shift down patterns (3.7%).
Table 6 shows that the DT classifier performed well for shift down, cyclic, systematic, and stratification patterns, with 100% classification accuracy. The DT classifier has a small tendency to confuse normal patterns (stable process) with systematic patterns (3.9%). Similar to the fuzzy classifier, trend down patterns tend to be confused with shift down patterns (3.5%), and shift up patterns tend to be confused with trend up patterns (4%). Overall, as noted earlier, the DT classifier performed relatively better than the fuzzy classifier.
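The quantities discussed above (per-class recognition rate, overall accuracy, and the mean µ and spread σ across test sets) can be computed as in the following minimal sketch. The three-class confusion matrix and the per-set accuracies are illustrative placeholders, not values from Tables 4 to 6. The use of the sample standard deviation is an assumption, since the text does not state which estimator was used for σ.

```python
from statistics import mean, stdev


def recognition_rates(matrix):
    """Per-class recognition rate (%) from a confusion matrix of counts.

    Rows are true classes and columns are predicted classes, so the
    diagonal entry of each row holds the correctly classified samples.
    """
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Overall recognition accuracy (%) = correct samples / all samples."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Test-set size stated in the text: 6 sets x 40 samples x 8 pattern types.
total_samples = 6 * 40 * 8

# Illustrative 3-class confusion matrix of counts (hypothetical values).
classes = ["normal", "trend up", "shift up"]
counts = [
    [40, 0, 0],   # true normal
    [1, 34, 5],   # true trend up: 1 taken as normal, 5 taken as shift up
    [0, 0, 40],   # true shift up
]
rates = recognition_rates(counts)
accuracy = overall_accuracy(counts)

# Hypothetical accuracies of one classifier over six test sets.
per_set_accuracy = [98.1, 98.4, 99.1, 98.8, 97.9, 99.0]
mu = mean(per_set_accuracy)
sigma = stdev(per_set_accuracy)  # sample standard deviation (assumption)

print(total_samples, rates, accuracy)
```

With these placeholder counts, the trend up row yields a recognition rate of 85% and the matrix as a whole yields 95% overall accuracy, which mirrors how the diagonal and off-diagonal entries of the confusion tables are read.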
From the above, we observe that both the heuristic fuzzy classifier and the DT classifier provide useful information in deriving the final decision. In the case of the heuristic fuzzy classifier, the IF-THEN rules are simple, interpretable, and capable of classifying eight types of patterns. Its main drawback is that the formation of the input fuzzy sets and their simplification require manual tuning. The use of three points for each fuzzy set, i.e., the maximum, minimum, and median, significantly reduced the data requirements in the development step. Fuzzy classifiers by nature do not require large training data. Meanwhile, the DT method provides a simple graphical view and an understandable decision-making process. The drawback of the DT is that it requires more training data; otherwise, the recognition results tend to be poor.
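The three-point construction of a fuzzy set can be sketched as follows. The sketch assumes a triangular membership function whose feet are the minimum and maximum of the observed feature values and whose peak is the median. The text specifies only the three points, so the triangular shape, the function names, and the sample values are assumptions made for illustration.

```python
from statistics import median


def three_point_set(samples):
    """Return (a, b, c) = (minimum, median, maximum) of the feature samples."""
    return min(samples), median(samples), max(samples)


def membership(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x == b:
        return 1.0          # peak (also covers degenerate sets where b equals a or c)
    if x <= a or x >= c:
        return 0.0          # outside the support of the set
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


# Hypothetical feature values observed for one pattern class.
feature_values = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b, c = three_point_set(feature_values)

print(membership(2.0, a, b, c), membership(3.0, a, b, c), membership(6.0, a, b, c))
```

Because only three summary points are needed per set, a handful of representative samples per pattern class is enough to define it, which is consistent with the low data requirement noted above. The manual tuning mentioned in the text would then amount to adjusting a, b, and c by hand.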
The performance of the above classification methods was validated using a published data set from Alcock [26]. Recognition accuracies of 94.16% and 97.9% were obtained for the heuristic fuzzy classifier and the DT classifier, respectively. These validation results are consistent with the recognition accuracies of the proposed classifiers reported above. The proposed DT classifier performed relatively better than that of Gauri and Chakraborty [15], who reported an overall recognition accuracy of 95.46% with their seven shape features, whereas our proposed classifier achieved 98.58% with only four statistical features as input. We were unable to make a direct comparison for the proposed fuzzy classifier owing to the lack of comparable published works. Readers should be cautious in generalizing these findings, and further comparisons with similar works are recommended whenever possible.

Conclusions
As manufacturing industries move toward intelligent systems, we notice widespread adoption of AI-based systems with limited transparency in their decision-making processes. In process monitoring and diagnosis, transparency in key decisions is important to avoid misjudgments that could lead to catastrophic failures. This paper demonstrates the development of a heuristic fuzzy inference system and a DT technique for control chart pattern recognition. In both cases, the logic behind each decision is expressed as explainable IF-THEN rules. The overall recognition accuracy of the DT classifier is found to be better and more consistent (µ = 98.58%, σ = 0.48) than that of the heuristic Mamdani fuzzy classifier (µ = 95.76%, σ = 1.09). The DT classifier requires only four statistical features, while the heuristic Mamdani FIS requires eight statistical features to classify the same eight types of control chart patterns. Both methods provide explainable classification steps rather than a black box, which could be more attractive and convincing for decision makers. The efficiency of the fuzzy classifier could be further improved by implementing automatic formation of the input fuzzy sets. Investigation of adaptive neuro-fuzzy inference systems (ANFIS) could also boost its efficiency and learning ability. This study may serve as a starting point for investigating more advanced DT families such as random forest. It opens opportunities for deeper investigation and provides a useful revisit of explainable artificial intelligence.