Investigation of Fusion Features for Apple Classification in Smart Manufacturing

Smart manufacturing optimizes productivity by integrating computer control with various high-level adaptability technologies, including big data. Big data offers optimization through data analytics as a predictive solution for future planning and decision making. However, this requires accurate and reliable data as input for the analytics. Therefore, in this paper, fusion features for apple classification are investigated to distinguish between defective and non-defective apples for automatic inspection, sorting and further predictive analytics. The fusion features with a Decision Tree classifier, called Curvelet Wavelet-Gray Level Co-occurrence Matrix (CW-GLCM), are designed based on a symmetrical pattern. The CW-GLCM is tested on two apple datasets, namely NDDA and NDDAW, with a total of 1110 apple images. Each dataset consists of two classes of apples, defective and non-defective. The NDDAW dataset contains more images with low-quality regions. Experimental results show that CW-GLCM correctly classifies 98.15% of the NDDA dataset and 89.11% of the NDDAW dataset. Lower classification accuracy is observed for five existing image recognition methods, especially on the NDDAW dataset. Finally, the results show that CW-GLCM is the most accurate among all the methods, with a difference of more than 10.54% in classification accuracy.


Introduction
Smart manufacturing is the advancement of the manufacturing process through the integration of computer control and various high-level adaptability technologies to optimize productivity [1]. The huge volume, variety and velocity of data in smart manufacturing, referred to as big data, offer an opportunity not only to manage large amounts of information, but also to improve diagnostic and prognostic capabilities [2,3]. Analytics in the manufacturing process can shift from a reactionary to a predictive practice [1,4,5] by improving existing capabilities such as product defect detection and by supporting new capabilities for future planning and prediction [1,4,5]. In delivering a high-quality predictive solution for future planning, data quality is the most important big data factor [2]. An effective and accurate method is required to provide reliable information as the input for analytics models to make better decisions [6].
In this paper, we investigate the reliability of fusion features as data input to classify between defective and non-defective apples for automatic inspection and sorting processes. The defective and non-defective information can be further used as the input for an analytics model for future prediction. The analytics synthesize and analyze the trends and identify patterns based on the current production data for future planning, decision making and actions to improve apple growth and processing.
Though their method showed promising results in classifying defective oranges, the method may not work well in the presence of low-quality image regions.
To strengthen the characteristics of the images, Fahrurozi et al. [33] used several first-order and second-order edge detection techniques to extract the GLCM texture features. The first-order techniques were chosen because of their simplicity, while the second-order techniques were chosen because of their effectiveness. The limitation of their work is that the effect of the edge detection techniques was investigated on only one GLCM feature, namely energy. The selection of GLCM features is further extended in [36] for apple disease detection and classification using Particle Swarm Optimization (PSO). However, the implementation of PSO considerably increased the convergence time and computational complexity [37]. In other work, Moallem et al. [38] proposed statistical, textural and geometric features for Golden Delicious apple grading using the SVM classifier. They used the GLCM method to extract the second-order texture features, which are contrast, correlation, energy, homogeneity and entropy, whereas first-order textural features were not considered. Although their method achieved convincing results (89.20%-92.50%) for grading Golden Delicious apples, the classification rate decreases when the defective region is close to the stem-end area. Conversely, Olaniyi et al. [18] suggested a texture analysis method based on eight features from first-order statistics and second-order statistics, the latter obtained from the GLCM. The first-order features used in their work are mean, variance and standard deviation, while the second-order features are contrast, correlation, energy, homogeneity and entropy. Their method achieved more than 96.25% classification accuracy and improved on the result in [38] by utilizing the first-order statistical features. However, as the method depends solely on texture, it may have difficulty distinguishing between objects that have quite similar textures [14,33].
Recently, W. Li et al. [19] proposed the CLAHE + GLCM + ELM method, which uses contrast-limited adaptive histogram equalization (CLAHE) and GLCM with an Extreme Learning Machine (ELM) classifier to address the limitations of GLCM. CLAHE was used in their work to suppress noise and to improve the local contrast, while the ELM classifier was used to reduce the time complexity. However, their method was unable to perform well in terms of sensitivity, specificity and accuracy [19].
Other representations of features that are extensively used in image recognition are the keypoint-based features. These features describe the image by detecting keypoints in the image and locating a descriptor patch centered on each keypoint. In image recognition, the widely used keypoint-based features are Harris corner detection [39], Scale Invariant Feature Transform (SIFT) [40], Speeded Up Robust Features (SURF) [41,42] and Features from Accelerated Segment Test (FAST) [43]. Harris detection is robust in matching and has good stability and repeatability [44]. However, it is sensitive to scale changes. The SIFT detector and descriptor [40] are robust to affine distortion and illumination changes, and invariant to scale and rotation changes [40]. Although SIFT has shown high repeatability and accuracy, the SIFT descriptor has a high computational cost [45]. This issue has been addressed in the SURF detector, which is faster than SIFT without degrading the quality and is more robust to noise [46,47]. To improve on the computational time of earlier methods, Rosten et al. [43] introduced FAST. FAST is faster than both SURF and SIFT, but it is not invariant to scale [48]. Although the keypoint-based features can be applied in almost all kinds of image recognition, these features are limited: due to noise or distortion, patches from different contexts or scenes may be represented by similar descriptors, and a different context or scene can also be represented by different descriptors [49].
To overcome this limitation, dictionary-based features are used for image recognition. The dictionary-based approach utilizes keypoint patches, regular grid patches or segmentation-based patches to extract visual patterns (visual words) from the images. The images are then represented by counting the number of occurrences of each visual word in the images, and these counts are used as features to train the classifier. Among the dictionary-based features, the BOW method [15] is one of the best known. Although the BOW method is easy to implement and robust to several factors such as occlusion, clutter, non-rigid deformation and viewpoint changes, it disregards the spatial layout information of the visual words [50,51]. Disregarding this information may lead to missing spatial arrangement features in the image composition [50]. This issue was addressed by Lazebnik et al. [16] in the SPM method, in which the spatial layout information is included to improve the image representation. The spatial information is important for discriminating objects, since different objects may have the same visual appearance but a different spatial arrangement [52]. Despite its advantages, the SPM method generates a large number of redundant features [53]. To eliminate the redundancies and select the representative keypoints, Lin et al. [50] and Li et al. [54] proposed keypoint selection techniques. Similarly, Xie et al. [55] proposed a new spatial partitioning scheme that avoids feature redundancy by modifying the pyramid matching kernel.
Symmetry 2019, 11, 1194
In a more recent development, deep-learning based methods such as the CNN method have received considerable attention in computer vision [56]. The CNN method has been implemented in many fields of image recognition [10,17,57-59]. For instance, dos Santos Ferreira et al. [17] proposed a CNN method for weed detection and classification in soybean crops. The structure of the CNN method has a few limitations, and many studies have attempted to address them. One of the major issues is the fixed-size input image required by the CNN method [8]. To address this issue, He et al. [8] proposed a network structure (SPP-net) that can generate a fixed-length representation regardless of image size or scale. Another issue is the difficulty of training the neural network as the network depth increases [10,60]. To improve the training of deeper networks, He et al. [60] proposed a residual learning framework that reformulates the network layers as learning residual functions. Other main issues are that the CNN method requires a large number of training images to avoid over-fitting and is computationally expensive [7,8,10].
From the above review, texture-based features are among the methods that have been considered in existing apple recognition work. The GLCM texture-based features are seen as one of the most suitable candidates for classifying defective and non-defective apples. The GLCM method is chosen as it detects property changes on the surface of the apple skin images. However, the GLCM method is less effective in detecting significant features in low-quality image regions. In apple classification, failure to detect these features can lead to misclassification between defective and non-defective apples, which consequently reduces the classification accuracy.
Therefore, in this research, we investigate the Wavelet and Curvelet image enhancement techniques on the GLCM method to improve the detection of features in low-quality regions of apple images. Though there are many image enhancement techniques, in apple classification it is challenging to enhance the low-quality region while at the same time reducing the uneven illumination effect on images with low computational time and cost [61]. Some image enhancement techniques, such as Adaptive Histogram Equalization (AHE) and CLAHE, are unsuitable for real-time application due to their high computational time [62,63]. It is also difficult to enhance the low-quality region using traditional image enhancement techniques such as frequency-domain methods. This is because the lower frequencies are resolved better in frequency, while the higher frequencies are resolved better in time [64-66]. The traditional frequency-domain image enhancement techniques do not provide simultaneous spatial and spectral resolution.
On the other hand, the Wavelet transform image enhancement technique is capable of providing both spatial and frequency resolution [61]. The Wavelet transform ensures good frequency resolution at lower frequencies and good spatial resolution at higher frequencies [61,64]. In the proposed method, the Wavelet transform image enhancement is used to improve the texture quality of low-quality regions in the GLCM method. The Wavelet transform is one of the suitable image enhancement techniques for texture analysis [67]. However, its limitation lies in curved regions. To deal effectively with low-quality regions, we also use the Curvelet transform image enhancement technique in the proposed method, since it is better able to capture the directional edges of curves, corners and profiles [68,69]. The Curvelet transform also provides richer information in both spatial and spectral domains [70]. In the proposed method, namely CW-GLCM, the Curvelet features extracted from the Curvelet transform are fused with the GLCM features based on the Wavelet transform to produce a highly informative fusion feature.
As presented in the prior section, most of the image recognition methods discussed earlier are concerned with more general pattern recognition problems. None of the aforementioned methods focuses on the recognition and classification of images comprising low-quality regions. Based on the review, five methods, namely BOW [15], SPM [16], CNN [17], GLCM texture analysis [18] and CLAHE + GLCM + ELM [19], have been chosen to evaluate the proposed method against. They are selected due to their popularity and stability in representing the dictionary-based, deep-learning based and texture-based methods, respectively.

Proposed Method
The proposed CW-GLCM method for apple classification consists of two main phases, feature extraction and feature classification, as shown in Figure 1. The feature extraction phase concentrates on the selection of fusion features based on the GLCM method, which is the key contribution of this research. The GLCM method is chosen since it describes the adjacency relationship among pixels well [4]. The Curvelet and Wavelet transforms are introduced to enhance the apple images, especially in low-quality regions, by improving their texture information. In the feature extraction phase, the images are subjected to the Curvelet transform to obtain the Curvelet features. The images are also subjected to the Wavelet transform to obtain the Wavelet coefficients. From these Wavelet coefficients, five GLCM features, namely entropy, contrast, correlation, homogeneity and energy, are extracted. In total, six different features (Curvelet, entropy, contrast, correlation, homogeneity and energy) are fused together to form a set of fusion features. The fusion features obtained in the feature extraction phase are then passed to the feature classification phase. In the classification phase, six classifiers are evaluated to select the most suitable classifier for the proposed fusion features in classifying defective and non-defective apple images. With the proposed fusion features, the classification is expected to be more accurate than one that depends solely on GLCM features, especially in the presence of low-quality regions in the apple images. The output from the classifier can be used for data analytics and visualization to identify patterns and to learn for future decision making and actions. The two phases of the proposed CW-GLCM process flow are discussed in the following subsections.

Feature Extraction
The feature extraction phase of the proposed method comprises three major methods, which are the Curvelet, Wavelet and GLCM methods. The proposed method combines the Curvelet features with five GLCM features extracted from the Wavelet coefficients. To retain the image information, the image normalization step is skipped in the proposed method to deal with the low-quality regions on the apple skin images. This is to avoid misclassification between defective and non-defective apples.

Curvelet Transform
The main reason that the proposed method fuses the Curvelet features is to detect curves, corners and profiles in the low-quality regions of the apple images. Compared with other transforms, the Curvelet transform performs extremely well in capturing edges and other singularities along curves [71]. The Curvelet features provide more information on low-quality regions in the apple images. In the proposed method, the Curvelet transform based on wrapping of specially selected Fourier samples (FDCT-Wrap) is used because it is the fastest and a well-adapted Curvelet transform algorithm for representing edges [72-74]. The FDCT-Wrap is applied to enhance the image contrast of the low-quality region. Two consecutive regions within the low-quality regions whose pixel values differ from those of the nearby region are likely to form "edges". These edges are formed by the variation in pixel values, which allows the FDCT-Wrap in the proposed method to detect the edge information. In order to obtain the dominant features, the LL sub-band filter is applied to set the intensity elements of the FDCT-Wrap coefficients. Then, the inverse transformation is performed on the features extracted from the FDCT-Wrap coefficients to produce the Curvelet transform value. The steps of FDCT-Wrap are as follows:
Step 1. Input the original image;
Step 2. Apply the 2D Fast Fourier Transform (2DFFT) to the original image to produce a set of Fourier samples $\hat{f}[n_1, n_2]$;
Step 3. Resample the set of Fourier samples $\hat{f}[n_1, n_2]$ at each pair of scale $j$ and angle direction $l$ in the frequency domain. The scale runs from the finest to the coarsest, and the angle direction starts from the top-left corner and increases clockwise. This produces the new sampling function expressed in (1):
$$\hat{f}[n_1, n_2 - n_1 \tan\theta_l], \quad n_{1,0} \le n_1 \le n_{1,0} + L_{1,j}, \quad n_{2,0} \le n_2 < n_{2,0} + L_{2,j}, \qquad (1)$$
where $n_{1,0}$ and $n_{2,0}$ are the initial positions of the window function $u_{j,l}[n_1, n_2]$, and $L_{1,j}$ and $L_{2,j}$ are the length ($2^j$) and width ($2^{j/2}$) parameters of the support interval of the window function. The window function is defined in (2), where $w_1$ is a vertical axis but located near the horizontal axis of $w_2$;
Step 4. Multiply the new sampling function $\hat{f}[n_1, n_2 - n_1 \tan\theta_l]$ by the window function $u_{j,l}[n_1, n_2]$:
$$f_{j,l}[n_1, n_2] = \hat{f}[n_1, n_2 - n_1 \tan\theta_l]\, u_{j,l}[n_1, n_2]; \qquad (3)$$
Step 5. Apply the inverse 2DFFT to each $f_{j,l}$ obtained in the previous step to produce the Curvelet transform value.
Finally, we extract the feature vector of the Curvelet transform value and fuse it with the GLCM features calculated from the Wavelet transform in the following step to accomplish defective and non-defective apple classification.
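The frequency-domain skeleton shared by Steps 2, 4 and 5 (forward 2D Fourier transform, multiplication by a window, inverse transform) can be illustrated with a small standard-library sketch. It is not an implementation of FDCT-Wrap: the shear resampling of Step 3 and the wrapping are omitted, a naive DFT stands in for the FFT, and the function names and the binary window `dc_only` are hypothetical choices made only for illustration.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2D DFT (or inverse DFT) of a small list-of-lists image."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = []
    for k1 in range(rows):
        out_row = []
        for k2 in range(cols):
            total = 0j
            for n1 in range(rows):
                for n2 in range(cols):
                    phase = 2 * cmath.pi * (k1 * n1 / rows + k2 * n2 / cols)
                    total += img[n1][n2] * cmath.exp(sign * 1j * phase)
            # The inverse transform carries the 1 / (rows * cols) normalization.
            out_row.append(total / (rows * cols) if inverse else total)
        out.append(out_row)
    return out


def windowed_band(img, window):
    """Forward transform, element-wise window multiplication, inverse transform."""
    spectrum = dft2(img)
    masked = [[s * w for s, w in zip(s_row, w_row)]
              for s_row, w_row in zip(spectrum, window)]
    # Only the real part is kept; a non-symmetric window would leave an
    # imaginary part that this simplified sketch discards.
    return [[value.real for value in row] for row in dft2(masked, inverse=True)]


image = [[1.0, 2.0], [3.0, 4.0]]
dc_only = [[1, 0], [0, 0]]      # hypothetical window that keeps only the zero-frequency sample
all_pass = [[1, 1], [1, 1]]     # hypothetical window that keeps every sample

band = windowed_band(image, dc_only)       # every pixel becomes the image mean
restored = windowed_band(image, all_pass)  # reproduces the input image
```

In an actual Curvelet transform, one such windowed band would be produced for every scale and angle pair, with the window localized on a sheared wedge of the frequency plane rather than on a single sample.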

Wavelet Transform
To improve the texture information extracted from the GLCM method, the proposed method also modifies the existing GLCM method by extracting the GLCM features from the Daubechies 4 Wavelet coefficients. The Daubechies 4 Wavelet is chosen as it is suitable for texture classification due to its relation to multiresolution. The Wavelet coefficients enhance the visibility of the low-quality regions in the apple images, especially on the apple skin, by capturing the directional edges at different resolution levels while preserving the low- and high-frequency information. This leads the proposed method to extract better texture information from the apple images. The Wavelet transform is calculated using the wavelet function in (4), where s ∈ Z is the scale (resolution level) parameter, τ is the translation, k ∈ {h, v, d} is the orientation and ψ is a wavelet family. The orientation parameters h, v and d represent the horizontal, vertical and diagonal directions. The wavelet decomposition is achieved when s = 2^j and τ = 2^j·n, with j, n ∈ Z. The wavelet and scaling families are constructed using the wavelet function ψ(x) and the scaling function ϕ(x) as expressed in (5).
The two-dimensional Wavelet transform is constructed from a combination of high-pass and low-pass digital filter banks and down-samplers. As the images are 2D signals, the separable Discrete Wavelet Transform (DWT) is used in the configuration of the DWT structure. The rows and columns of the images are subjected to the 1D Wavelet transform separately to produce the 2D-DWT. The decomposition of the images in the 2D orthogonal wavelet representation results in four orthogonal sub-band components, which are Low-Low (LL), Low-High (LH), High-Low (HL) and High-High (HH), as presented in Figure 2. The results shown in the figure are for one-level decomposition. Every stage of the DWT requires high-pass and low-pass digital filters with two down-samplings [75]. This process is continued and decomposed into another four sub-band components, forming the two-level decomposition. The wavelet decomposition at two resolution levels is illustrated in Figure 3.

Figure 2. Image decomposition using analysis filter banks. Note that h is the low-pass filter, g is the high-pass filter and ↓2 denotes keeping one sample out of two (down-sampling).
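The separable row-then-column filtering described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the configuration used in the paper: the two-tap Haar filters are swapped in for the Daubechies 4 filters to keep the example short, periodic boundary extension is assumed, and the sub-band naming (first letter for the row filter, second for the column filter) is one common convention that may differ from the one in Figure 2. The names `dwt1d` and `dwt2d` are hypothetical.

```python
import math


def dwt1d(signal, low, high):
    """Filter a 1D signal with low/high-pass filters, keeping one sample out of two."""
    approx, detail = [], []
    for start in range(0, len(signal), 2):
        # Periodic extension handles filters that run past the end of the signal.
        window = [signal[(start + t) % len(signal)] for t in range(len(low))]
        approx.append(sum(c * x for c, x in zip(low, window)))
        detail.append(sum(c * x for c, x in zip(high, window)))
    return approx, detail


def dwt2d(image, low, high):
    """One-level separable 2D DWT: filter the rows first, then the columns."""
    row_low, row_high = [], []
    for row in image:
        a, d = dwt1d(row, low, high)
        row_low.append(a)
        row_high.append(d)

    def column_pass(matrix):
        columns = [list(col) for col in zip(*matrix)]
        pairs = [dwt1d(col, low, high) for col in columns]
        approx = [list(r) for r in zip(*[p[0] for p in pairs])]
        detail = [list(r) for r in zip(*[p[1] for p in pairs])]
        return approx, detail

    ll, lh = column_pass(row_low)    # low-pass rows: LL and LH
    hl, hh = column_pass(row_high)   # high-pass rows: HL and HH
    return ll, lh, hl, hh


s = 1 / math.sqrt(2)
haar_low, haar_high = [s, s], [s, -s]   # Haar stand-in for the Daubechies 4 filters

image = [[1.0] * 4 for _ in range(4)]   # flat 4 x 4 test image
ll, lh, hl, hh = dwt2d(image, haar_low, haar_high)     # one-level decomposition
ll2, lh2, hl2, hh2 = dwt2d(ll, haar_low, haar_high)    # second level, applied to LL
```

For a flat image all detail sub-bands are zero and the energy is carried entirely by LL, which is why the second level is computed from LL only.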

GLCM
The GLCM texture features are extracted from the GLCM computed on the Wavelet coefficients from the prior process in Section 3.1.2. The GLCM features are included in the proposed method to estimate the texture properties of the apple images. Instead of calculating the GLCM features from the GLCM coefficients, the proposed method modifies the original GLCM implementation by using the Wavelet coefficients to calculate five of the GLCM features. The features are entropy, contrast, correlation, homogeneity and energy. The entropy is a measure of the level of disorder and randomness in the images. It is the most dominant statistical feature and is widely used to measure variations between pixel intensities [71]. This is important for characterizing the texture that appears in the apple images. The contrast measures the variation and intensity contrast between neighboring pixels in the gray level. The correlation feature is also selected since it measures how correlated a pixel is to its neighbors over the whole image and determines the linear dependencies of the gray levels. The homogeneity feature measures the uniformity of regions in the images according to their gray-level differences, and the energy returns the sum of the squared elements in the GLCM. To extract the texture features from the GLCM, the matrix must be symmetric [76]. In order to get a symmetric matrix, the GLCM is transposed and added to the original GLCM. From the symmetrical GLCM, the texture features are extracted. To compute the GLCM, the spatial relationship between two pixels is established. The first one is the reference pixel, which is the pixel of interest, and the other is a neighbor pixel. This process forms the GLCM, which contains the different combinations of pixel gray values. The gray levels range from 0 to G − 1, where G is the number of gray levels. The GLCM is highly dependent on two parameters, which are the distance between the pixel pair (D) and their angular relationship (θ).
In the proposed method, the GLCM is computed with a predefined distance of one pixel (D = 1), and θ is quantized into four directions, beginning with 0°. The procedure for extracting the GLCM features from the Wavelet coefficients is as follows:
Step 1. Input the original image;
Step 2. Compute the GLCM from the Wavelet coefficients and calculate the five GLCM features. The formulations of the GLCM features are given in (6)-(10), where P is a pixel, i is the row, j is the column, n is the line of neighborhood, G represents the number of gray levels used, and µ_x, µ_y, σ_x and σ_y are the means and standard deviations obtained from P_x and P_y, respectively. P_x and P_y are the results obtained after summing the rows of P(i, j);
Step 3. Acquire the texture features according to (6), (7), (8), (9) and (10).
Finally, the Curvelet features obtained from the prior process in Section 3.1.1 are fused with the texture features obtained in Step 3 to produce highly informative fusion features. The parameter settings of the Curvelet, Wavelet and GLCM methods used in the proposed method are summarized in Table 1.
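The GLCM procedure above can be sketched as follows. Since equations (6)-(10) are not reproduced in this text, the sketch assumes the commonly used definitions of the five features (base-2 entropy, squared-difference contrast, mean- and deviation-normalized correlation, homogeneity with a 1 + |i − j| denominator, and energy as the sum of squared entries), which may differ in detail from the paper's formulations. It also assumes that the Wavelet coefficients have already been quantized to integer gray levels 0 to G − 1, and that fusion is a simple concatenation; the function names and the placeholder Curvelet values are hypothetical.

```python
import math


def glcm(image, levels, dr=0, dc=1):
    """Symmetric, normalized GLCM for the offset (dr, dc); the default is D = 1 at 0 degrees."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    # Add the transpose to the original matrix so that the result is symmetric.
    sym = [[counts[i][j] + counts[j][i] for j in range(levels)] for i in range(levels)]
    total = sum(sum(row) for row in sym)
    return [[v / total for v in row] for row in sym]


def glcm_features(p):
    """Five texture features from a normalized GLCM (standard definitions assumed)."""
    g = len(p)
    px = [sum(p[i][j] for j in range(g)) for i in range(g)]   # marginal over columns
    py = [sum(p[i][j] for i in range(g)) for j in range(g)]   # marginal over rows
    mu_x = sum(i * px[i] for i in range(g))
    mu_y = sum(j * py[j] for j in range(g))
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * px[i] for i in range(g)))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * py[j] for j in range(g)))
    cells = [(i, j, p[i][j]) for i in range(g) for j in range(g)]
    return {
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Undefined (division by zero) for a perfectly flat image.
        "correlation": sum((i - mu_x) * (j - mu_y) * v for i, j, v in cells) / (sd_x * sd_y),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
    }


def fuse(curvelet_vector, feats):
    """Concatenate Curvelet features with the five GLCM features (assumed fusion rule)."""
    order = ["entropy", "contrast", "correlation", "homogeneity", "energy"]
    return list(curvelet_vector) + [feats[name] for name in order]


# Small test matrix with four gray levels; the values are illustrative only.
quantized = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
p = glcm(quantized, levels=4)
features = glcm_features(p)
fusion_vector = fuse([0.5, 0.25], features)   # placeholder Curvelet values
```

The other directions are obtained by changing the offset, for example `dr=-1, dc=1` for the upper-right diagonal neighbor; how the per-direction results are combined is not specified here and is left open in this sketch.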

Classification
In the classification phase, the fusion features extracted in the previous phase are classified into defective and non-defective apples. One of the major challenges in classifying a large number of features is to determine a suitable classifier that is able to achieve better classification accuracy [77]. Therefore, to ensure optimal performance for the proposed features, six classifiers, namely K-Nearest Neighbors (KNN), Bayesian Network, Softmax, Decision Tree, Naïve Bayes and Support Vector Machine (SVM), are tested. These classifiers are selected based on their specific advantages. The KNN classifier is among the most widely used classifiers for image classification tasks because it is simple and easy to implement [78]. In contrast, the Bayesian Network is a classifier based on a network structure with a bias-variance trade-off. The network structure model allows the Bayesian Network to precisely capture the fine details in the data [79]. Another simple and fast classifier that can achieve accurate results in most cases is the Decision Tree classifier [80]. Furthermore, it also works well with noisy data [80]. Naïve Bayes is also an efficient and effective machine learning classifier [81-83], while the Softmax classifier is one of the most commonly used logistic regression classifiers, especially for multi-class classification [84,85]. Finally, the SVM classifier is also selected because it is a well-established technique in many image recognition tasks and has high accuracy [78,86-89]. The performance of the six classifiers is evaluated, and the best-performing classifier on the proposed fusion features is selected as the classifier for the proposed method.
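The selection step described above, in which several classifiers are scored on the same features and the most accurate one is kept, can be sketched as follows. This is only an illustration of the selection logic: it uses two deliberately simple stand-in classifiers (a 1-nearest-neighbor rule and a depth-one decision tree) instead of the six classifiers evaluated in the paper, and the feature vectors are made-up toy values, not data from the NDDA or NDDAW datasets. All function names are hypothetical.

```python
def accuracy(predict, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return hits / len(labels)


def make_1nn(train_x, train_y):
    """1-nearest-neighbor classifier using squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_x]
        return train_y[dists.index(min(dists))]
    return predict


def make_stump(train_x, train_y, feature=0):
    """Depth-one decision tree for two classes: threshold midway between class means."""
    classes = sorted(set(train_y))
    means = {}
    for label in classes:
        values = [x[feature] for x, y in zip(train_x, train_y) if y == label]
        means[label] = sum(values) / len(values)
    low_label, high_label = sorted(classes, key=lambda label: means[label])
    threshold = (means[low_label] + means[high_label]) / 2

    def predict(x):
        return high_label if x[feature] > threshold else low_label
    return predict


# Toy two-dimensional feature vectors, invented purely for illustration.
train_x = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
train_y = ["non-defective", "non-defective", "defective", "defective"]
test_x = [[0.15, 0.85], [0.85, 0.15]]
test_y = ["non-defective", "defective"]

candidates = {
    "1-NN": make_1nn(train_x, train_y),
    "stump": make_stump(train_x, train_y),
}
scores = {name: accuracy(clf, test_x, test_y) for name, clf in candidates.items()}
best = max(scores, key=scores.get)   # ties resolve to the first candidate listed
```

With real fusion features the same loop would run over all candidate classifiers, and a held-out split or cross-validation would replace the single toy test set.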

Experiments
To demonstrate the reliability of the proposed CW-GLCM method, a series of comprehensive experiments was conducted in the VLSI laboratory, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. All the experiments were conducted using MATLAB R2017b (MathWorks, Natick, MA, USA) on a computer with the following specification: Windows 10 Pro, an Intel Core i7-4770 CPU (3.4 GHz) and 8.00 GB of RAM. The details of the experiments and evaluation are described in the subsections below.

Datasets
Instead of analyzing the performance only on defective and non-defective apples, we also evaluated the method on low-quality regions. Due to the shortage of defective and non-defective image datasets that include low-quality regions, two new apple image datasets were created, namely the NDDA and NDDAW datasets. The NDDA dataset generally consists of various types of apple images, while the NDDAW dataset contains more apple images with low-quality regions on the skin. Each dataset is divided into two categories, non-defective and defective. The non-defective and defective labels were verified by agriculture practitioners from the Malaysian Agricultural Research and Development Institute (MARDI) and the Pahang Agriculture Department. The datasets were collected using vision sensors, and some of the apple images were collected via the Google search engine with several keywords such as "fresh + apple", "healthy + apple", "apple + disease", "damage + apple", "defect + apple" and "low + grade + apple + fruit". This is due to the difficulty of obtaining various types of defective apples. Most of the apple images obtained from the Google search engine are centered and clean, and the apple occupies most of the image with little or no background clutter. This is similar to the data acquisition via the vision sensor in this research, which captures apple images placed on the conveyor belt as illustrated in Figure 5. To ensure the quality of the images, the light sources are placed near the vision sensor and are uniformly distributed. This is done to reduce the effect of shadows and glare. The resolution of all the images captured using the vision sensor is 900 × 700 pixels, while the resolution of the images obtained via the Google search engine varies from 205 × 220 pixels to 552 × 512 pixels. Each of the images obtained from the Google search engine in both datasets is rescaled to 900 × 700 pixels, following the resolution of the images captured via the vision sensor.
Datasets are available online at https://github.com/AsIsmail/Apple-Dataset. The details of each dataset are described in the following subsections.

NDDA
The properties of the NDDA dataset are listed in Table 2. The non-defective apple images were collected from five apple cultivars: Red Delicious, Gala, Fuji, Honeycrisp and Granny Smith. The defective images were collected from five groups of defects: Scab, Rot, Cork Spot, Blotch and Bruise. The defective category covers external defects of varying severity that are visible to the naked eye and differ in location, region and size. Sample images from the dataset are shown in Figure 6.

NDDAW
The NDDAW dataset consists of 560 apple images, 280 non-defective and 280 defective. The overall setup of this dataset follows that of the NDDA dataset. However, NDDAW is more complex than NDDA because it contains more apple images with a low-quality region on the skin (159 images), which is intended to address the limitation of classifying low-quality image regions. The dataset also includes 248 apple images showing the stem end and 130 apple images showing the calyx. The properties and sample images of the NDDAW dataset are shown in Table 3 and Figure 7.

Evaluation Metrics
In this research, K-fold cross-validation is used to evaluate the performance of the proposed CW-GLCM method. The value of K depends strongly on the overall quantity of data, and the suggested number of folds is between 5 and 10 [90,91]. In our prior work [1], K = 10 yielded the highest classification accuracy, so 10-fold cross-validation is used in this experiment. This choice also matches most of the related work in the literature, which uses 10-fold cross-validation to evaluate performance [30,90,92,93].

For 10-fold cross-validation, the dataset is randomly partitioned into ten folds, each with virtually the same class distribution. One fold is used for validation while the remaining nine folds are used for training. This process is repeated ten times so that each fold is used exactly once as the validation set. Finally, the results of the ten experiments are averaged.

The classification performance is measured in terms of precision, recall and accuracy, defined as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, FP, TN and FN are defined in Table 4.

Table 4. Terminology and derivations of the evaluation metrics.
TP (true positive): Defective apple is correctly classified as defective apple.
TN (true negative): Non-defective apple is correctly classified as non-defective apple.
FP (false positive): Defective apple is incorrectly classified as non-defective apple.
FN (false negative): Non-defective apple is incorrectly classified as defective apple.

In each of the experiments, the computational time is also recorded.
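As a worked illustration of the three metrics, the following minimal Python sketch computes precision, recall and accuracy from the four counts, using the terminology of Table 4. The function name `classification_metrics` is hypothetical, and the experiments in this paper were run in MATLAB, so this is not the code used to produce the reported results. The example counts are the per-fold figures reported for CW-GLCM on the NDDA dataset in the comparison with existing methods (26 of 27 defective and 27 of 27 non-defective apples classified correctly).

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, accuracy) as percentages.

    The counts follow the definitions in Table 4 of this paper:
    tp: defective classified as defective
    tn: non-defective classified as non-defective
    fp: defective classified as non-defective
    fn: non-defective classified as defective
    """
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy


# Per-fold counts for CW-GLCM on NDDA, taken from the text of this paper.
precision, recall, accuracy = classification_metrics(tp=26, fp=1, tn=27, fn=0)
print(round(precision, 2), round(recall, 2), round(accuracy, 2))  # 96.3 100.0 98.15
```

With these counts the sketch reproduces the 96.30% precision, 100% recall and 98.15% accuracy quoted for CW-GLCM on NDDA.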

Performance Measure for Fusion Features
As highlighted in Section 3, the fusion features combine the Curvelet features with five GLCM features computed from the Wavelet coefficients. This forms a set of six fusion features: the Curvelet features, entropy, contrast, correlation, homogeneity and energy. To search for the best feature set, the fusion features are compared with the Curvelet features alone and with each of the GLCM features calculated from the GLCM coefficient. All of them are evaluated on the NDDA dataset using the SVM classifier, as shown in Table 5. The proposed fusion features outperformed the others with 88.89% precision, 85.71% recall and 87.04% accuracy. Although the fusion features require the longest time for training and testing, the results indicate that the Curvelet and Wavelet transforms can improve the detection of the GLCM texture features.

A graphical comparison of the proposed fusion features with contrast, correlation, energy, homogeneity, entropy and Curvelet, using the SVM classifier on the NDDA dataset, is presented in Figure 8. Figure 8a shows that the proposed fusion features obtain the highest precision, recall and accuracy of all the feature sets, with a minimum value of 85.71% on recall. However, they require the longest time for training and testing, as shown in Figure 8b. This is because the fusion features incorporate three major methods: Curvelet, Wavelet and GLCM. In addition, the normalization step is skipped in the feature extraction phase to retain the information of low-quality regions in the apple skin images, which increases the time complexity and reduces the speed. Despite this high computational time, the fusion of the Curvelet features with five GLCM features calculated from the Wavelet coefficients outperformed the other feature sets in terms of precision, recall and accuracy.

To identify the most suitable classifier, the fusion features are tested with six classifiers on the NDDA dataset: KNN, Bayesian Network, Softmax, Decision Tree, Naïve Bayes and SVM. The results for each classifier are presented in Table 6.
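To make the five GLCM descriptors named in this subsection concrete, the sketch below computes contrast, energy, homogeneity, entropy and correlation from a normalized, symmetric co-occurrence matrix in plain Python. It is an illustration under stated assumptions rather than the feature extractor used in this paper: the formulas are the common textbook definitions, the helper names `glcm` and `glcm_features` are hypothetical, a single horizontal offset is used, and a small quantized patch stands in for the Wavelet coefficients from which the paper computes its GLCM features.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Five texture descriptors of a normalized co-occurrence matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    if var_i > 0 and var_j > 0:
        cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
        correlation = cov / math.sqrt(var_i * var_j)
    else:
        correlation = float("nan")  # undefined for a constant patch
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "correlation": correlation,
    }


# Hypothetical 4 x 4 patch quantized to four gray levels.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = glcm_features(glcm(patch, levels=4))
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

In the fusion described above, five values of this kind would be concatenated with the Curvelet features to form the six-part feature set passed to the classifier.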
From Table 6, the fusion features with the Decision Tree classifier give the best performance on every measure: precision, recall, accuracy and computational time. The comparative results for the different classifiers are presented in Figure 9. Based on Figure 9, the Decision Tree classifier outperformed the others with 96.30% precision, 100% recall and 98.15% accuracy. In contrast, the Naïve Bayes classifier shows the lowest precision (59.26%) and accuracy (70.37%). This is because the Naïve Bayes classifier makes the very strong assumption that all variables are mutually independent and that each contributes towards classification; when this assumption does not hold, the classification performance degrades [80]. The lowest recall, 71.88%, is observed with the Softmax classifier. This decrease in recall is attributed to overfitting caused by its high-variance structure [84].

In terms of computational time, Naïve Bayes takes the longest for training (390.18 s) and KNN takes the longest for testing (23.55 s). The Naïve Bayes classifier is probabilistic and requires knowledge of the prior probability distribution of the classes as well as of the data to be classified, which increases its training time. The KNN classifier, in turn, is computationally intensive at test time because it stores all the training data and compares the features extracted from each test image with every training sample [78]. In contrast, the Decision Tree classifier is the fastest during both training (344.17 s) and testing (0.25 s).

The results also show that the Decision Tree classifier achieves the highest performance on all measurements. This is because the different ranges of the features in our fusion features do not affect the Decision Tree. In the SVM and the other classifiers, by contrast, each data instance is represented as a vector of real numbers [1], and this transformation may affect the classification performance. Since the data normalization step is skipped in our feature extraction stage, the Decision Tree classifier is found to be more suitable for classifying our fusion features. Therefore, the Decision Tree classifier is chosen for the proposed method.
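The claim that differing feature ranges do not affect a Decision Tree can be illustrated with a toy example. The sketch below fits a single-threshold split (a one-node tree) to a feature in its raw range and to the same feature after min-max scaling, and checks that the resulting predictions are identical. The function `best_stump` and the sample values are hypothetical; this is a minimal demonstration of the threshold-based splitting principle, not the Decision Tree configuration used in the experiments.

```python
def best_stump(values, labels):
    """Return (errors, predictions) of the best single-threshold split.

    Candidate thresholds are midpoints between consecutive distinct values,
    which is how threshold-based tree learners typically propose splits.
    """
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        for left_label in (0, 1):
            predictions = [left_label if v <= threshold else 1 - left_label
                           for v in values]
            errors = sum(p != y for p, y in zip(predictions, labels))
            if best is None or errors < best[0]:
                best = (errors, predictions)
    return best


# Hypothetical un-normalized feature values (e.g., a contrast-like descriptor)
# for six apples; label 1 = defective, 0 = non-defective.
raw = [120.0, 340.0, 560.0, 2100.0, 2500.0, 3100.0]
labels = [0, 0, 0, 1, 1, 1]

# The same feature after min-max scaling to [0, 1].
low, high = min(raw), max(raw)
scaled = [(v - low) / (high - low) for v in raw]

errors_raw, pred_raw = best_stump(raw, labels)
errors_scaled, pred_scaled = best_stump(scaled, labels)
print(pred_raw == pred_scaled, errors_raw, errors_scaled)
```

Because a threshold split depends only on the ordering of the feature values, any order-preserving rescaling leaves the partition unchanged. Distance-based or margin-based classifiers such as KNN and SVM do not share this property, which is consistent with the choice of the Decision Tree for un-normalized fusion features.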

Comparison with Existing Methods
To further evaluate the performance of the proposed method, it is compared with five existing image recognition methods: BOW [15], SPM [16], CNN [17], Texture analysis [18] and CLAHE + GLCM + ELM [19]. The average results of the 10-fold cross-validation for each method are presented in Table 7.

Table 7. Comparison of confusion matrix for BOW [15], SPM [16], CNN [17], Texture analysis [18], CLAHE + GLCM + ELM [19] and the proposed CW-GLCM method on NDDA dataset: Defective (D) and Non-defective (N).

For the 10-fold cross-validation experiment on the NDDA dataset, the 275 images from the defective class and the 275 images from the non-defective class are divided into ten equal parts. Each part consists of 27 or 28 images from each of the defective and non-defective classes. Nine parts are used for training and one for testing. This process is repeated ten times so that each fold is used exactly once as the validation set, and the results of the ten experiments are averaged. In Table 7, the classification performance is led by the proposed CW-GLCM and the SPM method, each with 98.15% classification accuracy. Both methods correctly classified all 27 images of non-defective apples, followed by CNN (26 images) and BOW and Texture analysis (23 images each), with CLAHE + GLCM + ELM the lowest (16 images). For the defective images, the proposed CW-GLCM and the SPM method correctly classified 26 out of 27 images, while CNN, BOW, CLAHE + GLCM + ELM and Texture analysis correctly classified 25, 24, 22 and 20 images, respectively.
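The fold sizes quoted above (27 or 28 images per class in each part) follow directly from dealing 275 images per class into ten folds. The sketch below is a minimal stratified partition in plain Python that reproduces those sizes. The function `stratified_folds`, the round-robin assignment and the fixed seed are assumptions made for illustration; the paper states only that the partition is random with virtually the same class distribution in every fold.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with near-equal class counts."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal indices round-robin
    return folds


# NDDA as described in the text: 275 defective (D) and 275 non-defective (N).
labels = ["D"] * 275 + ["N"] * 275
folds = stratified_folds(labels, k=10)

for number, test_fold in enumerate(folds, start=1):
    held_out = set(test_fold)
    train = [i for i in range(len(labels)) if i not in held_out]
    defective = sum(labels[i] == "D" for i in test_fold)
    print(f"fold {number}: train={len(train)}, test={len(test_fold)}, "
          f"defective in test={defective}")
```

Each fold serves once as the test set while the other nine are used for training, and the ten per-fold results are then averaged as described above.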

Overall, the proposed CW-GLCM and the SPM method outperformed the others on the NDDA dataset with a classification accuracy of 98.15%, followed by CNN with 94.44%, BOW with 87.04%, Texture analysis with 79.63% and CLAHE + GLCM + ELM with 70.37%, as presented in Table 7. The proposed CW-GLCM and the SPM method also outperformed the others with 96.30% precision and 100% recall. Among the methods, CLAHE + GLCM + ELM takes the longest time for training (1323.58 s) and is the fastest during testing (0.02 s). Its long training time is due to the computationally intensive CLAHE approach, which is usually used for image enhancement in off-line applications [63]. In contrast, CLAHE + GLCM + ELM classifies the dataset faster than the other methods because of the extremely fast learning speed of the Extreme Learning Machine (ELM) classifier used in the method [19]. It is followed by the CNN method with 0.08 s, SPM with 0.13 s, the proposed CW-GLCM with 0.25 s, Texture analysis with 1.38 s and BOW with 3.92 s. BOW requires the longest time for testing on the NDDA dataset because of the high computational cost of its vector quantization step [1]. The CNN method classifies faster because its input images were rescaled from the original 900 × 700 pixels to 227 × 227 pixels, as the CNN requires fixed-size input images [7,8]. If images of arbitrary size are supplied, the CNN method fits them to its fixed input size by either cropping or warping [7,8,94,95]. Although the proposed CW-GLCM requires a longer time to classify the dataset than the SPM method, the result is still acceptable: it is only 0.23 s slower than CLAHE + GLCM + ELM, the fastest method during testing, while correctly classifying all the non-defective apple images.

In addition, the proposed CW-GLCM misclassified only one out of 27 defective apples. These results indicate that the proposed method can effectively classify defective and non-defective apples, including apples with a low-quality region on the skin. Examples of low-quality regions on non-defective apple images can be found in bright-skinned apples and apples with yellow-white flecks, as shown in Figure 10. With other methods, these types of apple may be misclassified as defective.

Analysis of Classification Accuracy Against Low-Quality Region
Building on the analysis of the proposed method on the NDDA dataset, the results were explored further. The proposed method is tested on the NDDAW dataset, which was created specifically to include more low-quality apple image regions. This dataset contains 159 apple images with a low-quality region on the skin. The average results of the 10-fold cross-validation, compared with the other methods, are presented in Table 8.

Table 8. Confusion matrix for BOW [15], SPM [16], CNN [17], Texture analysis [18], CLAHE + GLCM + ELM [19] and proposed CW-GLCM method on NDDAW dataset: Defective (D) and Non-defective (N).

For the 10-fold cross-validation on the NDDAW dataset, the 280 images from the defective class and the 280 images from the non-defective class are divided into ten equal parts, each containing 28 images from each class. Nine parts are used for training and one for testing. This process is repeated ten times so that each fold is used exactly once as the validation set, and the results of the ten experiments are averaged. From Table 8, the classification performance is led by the proposed CW-GLCM method with 89.11% classification accuracy, followed by CNN (78.57%), BOW (78.50%), SPM (69.64%), Texture analysis (62.50%) and CLAHE + GLCM + ELM (53.36%). The proposed CW-GLCM correctly classified 26 of the 28 non-defective apples, compared with 23 for BOW, 22 for CNN, 19 for SPM, 17 for CLAHE + GLCM + ELM and 16 for Texture analysis. For the defective images, the proposed CW-GLCM method correctly classified 24 images, followed by CNN (22 images), BOW (21 images), SPM and Texture analysis (20 images each) and CLAHE + GLCM + ELM (14 images). On this dataset, the proposed CW-GLCM outperformed the others, whereas the CLAHE + GLCM + ELM method recorded the lowest classification accuracy, followed by the Texture analysis method.

Overall, the CLAHE + GLCM + ELM method recorded the lowest classification accuracy on both datasets. This is due to drawbacks of the CLAHE approach, which may sometimes produce unwanted gray-level artifacts and creates an equal density in all the histogram bins during image enhancement [62]. The SPM method, by contrast, shows an obvious difference in accuracy between the NDDA and NDDAW datasets. Although it achieved high precision, recall and accuracy on the NDDA dataset, its performance is lower on the NDDAW dataset. This is because the NDDAW dataset contains more apple images with a low-quality region than the NDDA dataset. The result shows that the SPM method is less sensitive in detecting features in the low-quality region. The proposed CW-GLCM method, on the other hand, achieved the highest classification accuracy, precision and recall on both datasets, which indicates that it is more robust in detecting features in low-quality regions. Figure 11a,b depicts the precision, recall and accuracy for NDDA and NDDAW, and the training and testing times on each dataset are presented in Figure 11c,d.

Figure 11. Comparison of BOW [15], SPM [16], CNN [17], Texture analysis [18], CLAHE + GLCM + ELM [19] and proposed CW-GLCM method.

Performance Evaluation with Different Image Resolution
To evaluate the efficiency of the proposed method at different image resolutions, a test with three resolutions (i.e., original, small and large) is conducted. The small and large images are created by rescaling the original images by factors of 0.5 and 1.5, respectively. The comparative results for precision, recall, accuracy, training time and testing time are presented in Table 9.

Table 9. Comparative results (precision, recall, accuracy, training time and testing time) of the proposed method for different resolution images.

The results show that the precision, recall and accuracy of the proposed CW-GLCM are not affected by the change in resolution. The only difference observed is in the computational time: the time taken for training and testing increases as the resolution increases. Although the training and testing times grow with image resolution, the 10-fold cross-validation experiment on the dataset shows that the proposed method is able to process 27 images within 0.25 s during testing at the original resolution (900 × 700 pixels). These results indicate that the proposed CW-GLCM method can be used in real-time systems.
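The arithmetic behind the resolution test can be sketched as follows. The snippet derives the small and large image sizes from the original 900 × 700 pixels and converts the reported testing figure (27 images in 0.25 s) into a throughput. It assumes that each rescaling factor is applied to both width and height, which the text does not state explicitly, and the helper name `rescaled_size` is hypothetical.

```python
def rescaled_size(width, height, factor):
    """Image size after scaling both dimensions by the same factor."""
    return round(width * factor), round(height * factor)


original = (900, 700)
small = rescaled_size(*original, 0.5)
large = rescaled_size(*original, 1.5)
print("small:", small, "large:", large)

# Pixel count relative to the original: work per image grows roughly with this.
for name, (w, h) in (("small", small), ("large", large)):
    ratio = (w * h) / (original[0] * original[1])
    print(f"{name}: {ratio:.2f}x the pixels of the original")

# Throughput implied by the reported testing time at the original resolution.
images, seconds = 27, 0.25
print(f"{images / seconds:.0f} images per second")
```

Under this assumption the large images carry 2.25 times as many pixels as the originals, which is consistent with the observed growth in training and testing time, while the reported testing time corresponds to roughly 108 images per second for the classification step alone.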

Discussion
Overall, the proposed CW-GLCM method outperformed the others in detecting important features in low-quality apple image regions. Its performance is at least 86.79% on every performance measure for both datasets. In contrast, lower precision, recall and accuracy are observed for the other five methods on the NDDAW dataset, with a maximum recall of 80.65% for the BOW method. The classification accuracy on the NDDAW dataset is 78.50% for BOW, 69.64% for SPM, 78.57% for CNN, 62.50% for Texture analysis and 53.36% for CLAHE + GLCM + ELM. The lower classification accuracy of the BOW, SPM, Texture analysis and CLAHE + GLCM + ELM methods is influenced by the presence of low-quality regions on the apple images in the NDDAW dataset. The reduced accuracy of the CNN method, in turn, is due to the small sample dataset used in the experiment, since the CNN deep learning method requires a large number of training images to reach the desired classification accuracy [1,7-10]. The proposed CW-GLCM, by contrast, achieves at least 86.79% precision, 91.01% recall and 89.11% accuracy on both datasets. This indicates that introducing the Curvelet features and the Wavelet coefficients into the GLCM method can improve the results even with low-quality region images in a small sample dataset. This is possible because the Curvelet and Wavelet transforms are able to enhance the apple images, especially in the low-quality region. However, detection failures can still be observed in which a defective apple is misclassified as non-defective. An example of such a false positive classification, in which a defective apple is incorrectly classified as non-defective, is shown in Figure 12. The blemish defect region may be mistaken for a stem end or calyx, which are natural parts of the apple located at its top and bottom, because of the similarities between these features.

Conclusions
The CW-GLCM method is proposed for apple classification to differentiate between defective and non-defective apples. The proposed method fuses the Curvelet and Wavelet transforms with the GLCM method to improve its ability to detect features in the low-quality regions of apple images. In apple classification, detecting these features is crucial for the classifier to distinguish defective from non-defective apples correctly. Comparative experiments were performed between the proposed CW-GLCM method and five existing methods, namely BOW [15], SPM [16], CNN [17], Texture analysis [18] and CLAHE + GLCM + ELM [19]. Experimental results show that the proposed CW-GLCM and the SPM method attained the highest classification performance on the NDDA dataset, with the same precision (96.30%), recall (100%) and accuracy (98.15%). However, the SPM method performs worse on the NDDAW dataset, with 71.43% precision, 68.97% recall and 69.64% accuracy, whereas the proposed CW-GLCM achieves 86.79% precision, 91.01% recall and 89.11% accuracy. Compared with the other methods, the proposed method presents the highest precision, recall and accuracy on both datasets. The results show that the proposed CW-GLCM method is more robust and can effectively classify defective and non-defective apples, including apple images with low-quality regions.

Future Works
This paper proposed a new fusion features method, namely CW-GLCM, for apple classification in smart manufacturing with the goal of optimizing productivity. The optimization can be performed using data analytics and visualization, which require accurate and reliable informative data as the input for the analytics model. The results show that the proposed CW-GLCM method is able to effectively classify defective and non-defective apples, including those with low-quality apple image regions. Therefore, it can be used to assist decision-making in the apple manufacturing industry. Although the proposed CW-GLCM method is superior to the other existing methods and achieves more accurate classification on both datasets, it was shown to be less effective in detecting some of the defective apples, as shown in Section 5. In the future, we will concentrate on addressing these drawbacks and will also consider increasing the diversity of the apple image dataset using data augmentation techniques for more advanced deep learning methods. However, the key challenge of data augmentation is that it is very computationally expensive to generate enough samples to train a large neural network. Furthermore, the particularities of the various defect types, severities and cultivars of the apple images in this research will be a major challenge for data augmentation. Excessive and redundant augmentation may also introduce biases into the dataset and can slow down training [96,97]. These issues will be the focus of our future work, together with an extension to multi-class classification. We intend to classify different types of defective apples for further data analytics. Based on current production data on the defect types in apple production, the data analytics process will identify and learn patterns for future planning and prediction to improve apple growth.