1. Introduction
Smart manufacturing is the advancement of manufacturing processes through the integration of computer control and various highly adaptable technologies to optimize productivity [1]. The huge volume, variety and velocity of data in smart manufacturing, referred to as big data, offers an opportunity not only to manage large amounts of information but also to improve diagnostic and prognostic capabilities [2,3]. Analytics in the manufacturing process can shift from a reactive to a predictive practice [1,4,5] by improving existing capabilities such as product defect detection and by supporting new capabilities for future planning and prediction [1,4,5]. In delivering a high-quality predictive solution for future planning, data quality is the most important big data factor [2]. An effective and accurate method is required to provide reliable information as the input for analytics models so that better decisions can be made [6].
In this paper, we investigate the reliability of fusion features as data input for classifying defective and non-defective apples in automatic inspection and sorting processes. The defective and non-defective information can be further used as the input to an analytics model for future prediction. The analytics synthesize and analyze trends and identify patterns in the current production data to support future planning, decision making and actions that improve apple growth and processing efficiency. In our prior work [1], vision-based apple classification for smart manufacturing was studied using three image recognition methods: Bag of Words (BOW), Spatial Pyramid Matching (SPM) and Convolutional Neural Network (CNN). However, the performance of the dictionary-based (BOW and SPM) and deep learning-based (CNN) methods in the prior work is limited by the number of samples, by computational power [1,7,8,9,10] and by the presence of low-quality regions, such as bright features or fleck features, on non-defective apple skin. In apple classification, detecting and extracting features in low-quality regions is important for differentiating between defective and non-defective apples. Failure to detect these features may reduce the classification accuracy.
This research focuses on detecting suitable features that are able to increase the aforementioned classification accuracy on a small sample dataset, even when the images contain low-quality regions. Small sample datasets of apples (550 and 560 images) were used in this research because of the particularities of finely detailed apple images that contain low-quality apple skin regions. In this research, it is challenging to meet the large-scale dataset requirements of deep learning methods, which are typically trained on datasets such as ImageNet. The ImageNet dataset consists of 3.2 million cleanly labeled images and aims to contain 50 million images [11]. Large-scale datasets, or big data, are normally harvested automatically from a large population of users or crowds using crawling techniques, crowdsourcing or application programming interfaces provided by social media providers [12]. In this research, by contrast, a small dataset of apples was mainly sampled by the authors as a proof of concept for classifying defective and non-defective apples. Despite apples having the highest production rate, which has steadily increased over the years as reported by the United States Department of Agriculture (USDA) Foreign Agricultural Service [13], datasets of apple images are limited. Therefore, we mainly sampled the apple dataset ourselves for this proposed concept. To suit the problem stated previously, we investigate fusion features based on the Gray Level Co-occurrence Matrix (GLCM) method. GLCM is chosen because it is one of the most reliable texture-based methods and can analyze and describe well the surface and structure properties of images. However, this method depends on the texture information of the images [14]; thus, features may not be effectively extracted from low-quality image regions. To solve this problem, we investigate the effect of the Curvelet and Wavelet transforms on the GLCM method in order to improve and enhance the detection of features in low-quality image regions. We name our new fusion-feature method CW-GLCM; it fuses Curvelet features with five GLCM features (entropy, contrast, correlation, homogeneity and energy) based on Wavelet coefficients.
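As an illustration of the five GLCM features named above, the following minimal Python sketch builds a normalized, symmetric co-occurrence matrix for a single pixel offset and computes entropy, contrast, correlation, homogeneity and energy from it. It is not the implementation used in this work: the function names, the single horizontal offset, the symmetric counting, the base-2 logarithm and the particular homogeneity and energy formulas are assumptions chosen for illustration, and the Wavelet and Curvelet steps of CW-GLCM are not shown.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Return a normalized, symmetric co-occurrence matrix for offset (dx, dy).

    `image` is a list of rows of integer gray levels in range(levels).
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image is too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Compute five texture features from a normalized GLCM `p`."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Assumed convention: report 0.0 when the texture is constant.
        "correlation": covariance / (sd_i * sd_j) if sd_i * sd_j > 0 else 0.0,
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
    }
```

For example, `glcm_features(glcm(patch, levels=4))` returns a dictionary of the five values for a patch quantized to four gray levels. Other toolkits define homogeneity and energy slightly differently, so values are comparable only within one definition.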
The CW-GLCM features are classified using a Decision Tree classifier. The proposed method introduces Curvelet features to enhance the low-quality regions of the images. In addition, the original GLCM method is modified using Wavelet coefficients to improve the detection of texture information in low-quality regions.
To test the proposed method, two apple datasets, namely NDDA and NDDAW, consisting of 1110 apple images from the defective and non-defective categories, were created. NDDA was created to evaluate in general the capability to detect various defective and non-defective apple types. NDDAW, on the other hand, was created specifically to include more images that contain low-quality apple skin regions. The performance of the proposed method is then compared with five existing methods, namely BOW [15], SPM [16], CNN [17], GLCM texture analysis [18] and CLAHE + GLCM + ELM [19], in terms of precision, recall, accuracy and computational time using 10-fold cross-validation.
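The evaluation protocol mentioned above can be sketched as follows. This is a minimal illustration, not the evaluation code of this work: the helper names `k_fold_indices` and `precision_recall_accuracy` are hypothetical, the folds are contiguous and unshuffled (an assumption, since the sampling scheme is not stated here), and the positive class label is assumed to be 1.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Assumed convention: a ratio with an empty denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy
```

In a 10-fold run, each fold in turn serves as the test set while the remaining nine folds are used for training, and the three measures are averaged over the ten test folds.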
The rest of this paper is organized as follows: Section 2 presents related work on image recognition methods. Section 3 describes the details of the proposed method. The performance of the proposed method is presented in Section 4, followed by the discussion in Section 5. Finally, the conclusions and future work are given in Section 6 and Section 7.
2. Related Works
The focus of this research is to investigate suitable features that are able to distinguish defective from non-defective apples. The classification will be used for automatic inspection and as input for further analytics. The research falls under image recognition, which in computer vision is used to detect instances of an object in a digital image. An image recognition method is divided into two main phases: feature extraction and feature classification. Feature extraction is the key step that extracts the important elements of an object, while feature classification assigns the object to classes based on similarity. In the feature extraction phase, it is important to define and select useful features for recognizing the object [20]. Features are measurable characteristics of the object, such as shape, color, texture and other representations. This research focuses on the suitable selection of features and an effective feature extraction method for distinguishing defective from non-defective apples, including apple images with low-quality regions. Because related work on apple classification is limited, we also include other related image recognition methods that suit the focus of this research in the following section.
In image recognition, texture-based features have been shown to be among the most successful approaches [21] for recognizing images. Texture features are properties representing the surface and structure of an image [22]. Texture-based methods analyze the relations among neighboring pixels or sub-regions for image representation. Widely used texture-based methods are Local Binary Patterns (LBP), GLCM, Gabor and Haar-like features [14]. LBP is a texture descriptor proposed by Ojala et al. [23]. LBP describes the texture in an image using a histogram of label values obtained by thresholding the neighborhood pixels against the center pixel [24,25]. Although the LBP operator was initially intended as a texture descriptor, its applicability has been extended, with some modification, to various recognition tasks [26]. However, one limitation of the LBP operator is its inability to capture the dominant features of large-scale structures [27,28]. Researchers have therefore considered incorporating other techniques into the LBP method to improve its capabilities. Al-Hammadi et al. [24] improved LBP detection by using the Curvelet transform, while Abdulrahman et al. [27] used Gabor Wavelets and Principal Component Analysis (PCA). Although there are many variants of local descriptors based on the LBP method [25], LBP shows weaknesses under rotation, translation and scaling of the object, even for the rotation-invariant version of LBP [28,29]. Thus, Papakostas et al. [26] introduced Moment-Based Local Binary Patterns to improve the invariance of the LBP method to rotation, translation and scaling. Their method was tested on face recognition cases. The results showed a significant improvement of the method under rotation, translation and scaling in pattern recognition problems. Although LBP and its variants, such as Classic LBP, Elliptical Local Binary Pattern (ELBP), Uniform ELBP, Local Directional Pattern (LDP), Mean-ELBP (M-ELBP) and others, have shown potential in many applications [30], these methods may not work well for defective and non-defective apple classification because of the presence of low-quality image region features on the apple skin. The LBP method captures the pattern in an image based on a circular pattern of the neighborhood. To detect low-quality regions in apple images, a more specific spatially directional method is required, so that texture can be extracted at different directions and orientations covering the low-quality image regions on the apple skin.
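The thresholding step that defines LBP can be sketched in a few lines. The following is a minimal illustration of the basic 3 x 3 operator only, not of any variant cited above; the function name `lbp_histogram`, the clockwise bit order and the use of `>=` for the comparison are assumptions made for this example.

```python
def lbp_histogram(image):
    """Return the 256-bin histogram of basic 3x3 LBP codes.

    `image` is a list of rows of gray levels; border pixels are skipped
    because they lack a full 8-pixel neighborhood.
    """
    # Neighbor offsets (row, column), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbor at least as bright as the center sets its bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

Because every neighbor is compared only with its own center pixel, the code reflects local micro-patterns, which is consistent with the limitation on large-scale structures noted above.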
Among texture-based methods, GLCM is regarded as the most efficient [31,32]. In GLCM, features are extracted from a co-occurrence matrix, and the GLCM features to be observed are selected depending on the texture data encountered and the case at hand [33]. The GLCM method produces features that describe well the relationship between neighboring pixels in a texture image. However, this method is only useful and effective for recognizing objects with texture information [14]. For this reason, Zhang et al. [34] proposed automatic lightness correction combined with GLCM, color and statistical features, intended to improve the detection of defect regions in Fuji apples. The I-RELIEF algorithm was then used in their work to select the relevant features. However, using the I-RELIEF algorithm introduced a blind selection problem and increased the complexity of the method. Alternatively, the simplest approach to extracting texture features from images is to use the statistical moments of the gray-level histogram [35]. Capizzi et al. [35] used a statistical descriptor to extract texture features from images. To overcome the limitations of the texture-based method, they also used the hue, saturation and value (HSV) color space to represent the color information of the images. Their method was tested on an orange dataset using a Radial Basis Probabilistic Neural Network (RBPNN) classifier. Although their method showed promising results in classifying defective oranges, it may not work well in the presence of low-quality image regions.
To strengthen the characteristics of the images, Fahrurozi et al. [33] used several first-order and second-order edge detection techniques to extract GLCM texture features. The first-order techniques were chosen for their simplicity and the second-order techniques for their effectiveness. A limitation of their work is that the effect of the edge detection techniques was investigated on only one GLCM feature, namely energy. The selection of GLCM features is further extended in [36] for apple disease detection and classification using Particle Swarm Optimization (PSO). However, the use of PSO considerably increased the convergence time and computational complexity [37]. In other work, Moallem et al. [38] proposed statistical, textural and geometric features for Golden Delicious apple grading using a Support Vector Machine (SVM) classifier. They used the GLCM method to extract second-order texture features, namely contrast, correlation, energy, homogeneity and entropy, whereas first-order textural features were not considered. Although their method achieved convincing results (89.20%–92.50%) for grading Golden Delicious apples, the classification rate decreases when the defective region is close to the stem-end area. In contrast, Olaniyi et al. [18] suggested a texture analysis method based on eight features drawn from first-order statistics and second-order statistics, the latter being GLCM. The first-order features used in their work are mean, variance and standard deviation, while the second-order features are contrast, correlation, energy, homogeneity and entropy. Their method achieved a classification accuracy of more than 96.25% and improved on the result in [38] by utilizing the first-order statistical features. However, as the method depends solely on texture, it may have difficulty distinguishing between objects that have quite similar textures [14,33]. Recently, W. Li et al. [19] proposed the CLAHE + GLCM + ELM method, which uses contrast-limited adaptive histogram equalization (CLAHE) and GLCM with an Extreme Learning Machine (ELM) classifier to address the limitations of GLCM. CLAHE was used in their work to suppress noise and improve local contrast, while the ELM classifier was used to reduce the time complexity. However, their method is unable to perform well in terms of sensitivity, specificity and accuracy [19].
Other feature representations extensively used in image recognition are keypoint-based features. These features describe an image by detecting keypoints in the image and locating a descriptor patch centered on each keypoint. In image recognition, the widely used keypoint-based features are Harris corner detection [39], Scale Invariant Feature Transform (SIFT) [40], Speeded Up Robust Features (SURF) [41,42] and Features from Accelerated Segment Test (FAST) [43]. Harris detection is robust in matching and offers good stability and repeatability [44]. However, it is sensitive to scale changes. The SIFT detector and descriptor [40] are robust to affine distortion and illumination changes and invariant to scale and rotation changes [40]. Although SIFT has shown high repeatability and accuracy, the SIFT descriptor has a high computational cost [45]. This issue has been addressed in the SURF detector, which is faster than SIFT without degrading quality and is more robust to noise [46,47]. To improve on the computational time of earlier methods, Rosten et al. [43] introduced FAST. FAST is faster than both SURF and SIFT, but it is not invariant to scale [48]. Although keypoint-based features can be applied in almost all kinds of image recognition, they are limited: because of noise or distortion, patches from different contexts or scenes may be represented by similar descriptors, and a given context or scene may also be represented by different descriptors [49].
To overcome this limitation, dictionary-based features are used for image recognition. The dictionary-based approach uses keypoint patches, regular grid patches or segmentation-based patches to extract visual patterns (visual words) from the images. The images are then represented by counting the number of occurrences of each visual word in the image, and these counts are used as features to train the classifier. Among dictionary-based features, the BOW method [15] is one of the best known. Although the BOW method is easy to implement and robust to several factors such as occlusion, clutter, non-rigid deformation and viewpoint changes, it disregards the spatial layout information of the visual words [50,51]. Disregarding this information may lose the spatial arrangement of features in the image composition [50]. This issue was addressed by Lazebnik et al. [16] in the SPM method, in which spatial layout information is included to improve the image representation. Spatial information is important for discriminating objects, since different objects may have the same visual appearance but a different spatial arrangement [52]. Despite its advantages, the SPM method generates a large number of redundant features [53]. To eliminate the redundancies and select representative keypoints, Lin et al. [50] and Li et al. [54] proposed keypoint selection techniques. Similarly, Xie et al. [55] proposed a new spatial partitioning scheme that avoids feature redundancy by modifying the pyramid matching kernel.
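The difference between an orderless BOW histogram and a spatially partitioned one can be illustrated with a short sketch. It assumes that a vocabulary of visual words has already been learned, uses squared Euclidean distance for word assignment, and shows only a single unweighted grid level rather than the full weighted pyramid of SPM; the function names are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to `descriptor`."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(vocabulary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs, ignoring patch positions."""
    hist = [0] * len(vocabulary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocabulary)] += 1
    return hist


def grid_histogram(points, descriptors, vocabulary, width, height, grid=2):
    """Concatenate one BOW histogram per cell of a grid x grid partition.

    `points` holds the (x, y) position of each descriptor in the image.
    """
    cells = [[0] * len(vocabulary) for _ in range(grid * grid)]
    for (x, y), descriptor in zip(points, descriptors):
        cx = min(int(x * grid / width), grid - 1)
        cy = min(int(y * grid / height), grid - 1)
        cells[cy * grid + cx][nearest_word(descriptor, vocabulary)] += 1
    return [count for cell in cells for count in cell]
```

Two images with the same patches in different positions produce identical `bow_histogram` outputs but different `grid_histogram` outputs, at the cost of a feature vector that is `grid * grid` times longer.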
In a more recent development, deep learning-based methods such as the CNN have received considerable attention in computer vision [56]. The CNN method has been applied in many fields of image recognition [10,17,57,58,59]. For instance, dos Santos Ferreira et al. [17] proposed a CNN method for weed detection and classification in soybean crops. The structure of the CNN method has several limitations, and many studies have attempted to address them. One major issue is the fixed-size input image required by the CNN method [8]. To address this issue, He et al. [8] proposed a network structure (SPP-net) that can generate a fixed-length representation regardless of image size or scale. Another issue is the difficulty of training the neural network as the network depth increases [10,60]. To improve the training of deeper networks, He et al. [60] proposed a residual learning framework that reformulates the network layers as learning residual functions. Other main issues are that the CNN method requires a large number of training images to avoid over-fitting and is computationally expensive [7,8,10].
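The idea of producing a fixed-length representation from inputs of any size can be illustrated with a small pooling sketch. This is only a simplified illustration of pyramid-style max pooling over a single 2D feature map, not the SPP-net architecture itself; the function name, the default pyramid levels and the floor/ceiling rule used for bin boundaries are assumptions.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2D feature map into sum(n * n for n in levels) values.

    `feature_map` is a list of rows; its height and width must each be at
    least max(levels) so that every bin contains at least one element.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled
```

With the default levels, a 2 x 2 map and a 3 x 5 map both yield five values, so a later fixed-size layer can accept either input.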
From the above review, texture-based features are among the methods that have been considered in existing apple recognition work. GLCM texture-based features are seen as one of the most suitable candidates for classifying defective and non-defective apples. The GLCM method is chosen because it detects changes in the surface properties of apple skin images. However, the GLCM method is less effective at detecting significant features in low-quality image regions. In apple classification, failure to detect these features can lead to misclassification between defective and non-defective apples, which consequently reduces the classification accuracy.
Therefore, in this research, we investigate the Wavelet and Curvelet image enhancement techniques applied to the GLCM method to improve the detection of features in low-quality regions of apple images. Although there are many image enhancement techniques, in apple classification it is challenging to enhance the low-quality region while at the same time reducing the uneven illumination effect on images with low computational time and cost [61]. Some image enhancement techniques, such as Adaptive Histogram Equalization (AHE) and CLAHE, are unsuitable for real-time applications because of their high computational time [62,63]. It is also difficult to enhance low-quality regions using traditional frequency-domain image enhancement techniques. This is because lower frequencies are better resolved in frequency, while higher frequencies are better resolved in time [64,65,66]. Traditional frequency-domain image enhancement techniques do not provide simultaneous spatial and spectral resolution.
On the other hand, the Wavelet transform image enhancement technique is capable of providing both spatial and frequency resolution [61]. The Wavelet transform ensures good frequency resolution at lower frequencies and good spatial resolution at higher frequencies [61,64]. In the proposed method, Wavelet transform image enhancement is used to improve the texture quality of low-quality regions in the GLCM method. The Wavelet transform is one of the suitable image enhancement techniques for texture analysis [67]. However, its limitation lies in curved regions. To deal effectively with low-quality regions, we also use the Curvelet transform image enhancement technique in the proposed method, since it is better able to capture the directional edges of curves, corners and profiles [68,69]. The Curvelet transform also provides richer information in both the spatial and spectral domains [70]. In the proposed method, named CW-GLCM, the Curvelet features extracted by the Curvelet transform are then fused with the GLCM features based on the Wavelet transform to produce a highly informative fusion feature.
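To make the notion of Wavelet coefficients concrete, the following sketch computes one level of a 2D Haar decomposition, which splits an image into a coarse approximation and three detail sub-bands. The Haar wavelet, the single decomposition level and the sub-band names are assumptions chosen for simplicity; the wavelet family and depth used in CW-GLCM are not specified in this section, and the Curvelet transform and the fusion step are not shown.

```python
def haar_dwt2(image):
    """One-level orthonormal 2D Haar transform.

    `image` is a list of rows with even height and width. Returns four
    half-size sub-bands: (approx, row_diff, col_diff, diag).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    approx, row_diff, col_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, rd_row, cd_row, dg_row = [], [], [], []
        for c in range(0, cols, 2):
            p, q = image[r][c], image[r][c + 1]
            s, t = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2)   # local average (low-pass)
            rd_row.append((p + q - s - t) / 2)  # change between the two rows
            cd_row.append((p - q + s - t) / 2)  # change between the two columns
            dg_row.append((p - q - s + t) / 2)  # diagonal change
        approx.append(a_row)
        row_diff.append(rd_row)
        col_diff.append(cd_row)
        diag.append(dg_row)
    return approx, row_diff, col_diff, diag
```

Applying the transform again to `approx` gives the next, coarser level, which is how the decomposition provides fine spatial detail at high frequencies and fine frequency detail at low frequencies.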
As presented in the review above, most of the image recognition methods discussed are concerned with more general pattern recognition problems. None of the aforementioned methods focuses on the recognition and classification of images containing low-quality regions. Based on the review, five methods, namely BOW [15], SPM [16], CNN [17], GLCM texture analysis [18] and CLAHE + GLCM + ELM [19], have been chosen for evaluation against the proposed method. They are selected for their popularity and stability in representing the dictionary-based, deep learning-based and texture-based methods, respectively.