Detection and Classification of Defective Hard Candies Based on Image Processing and Convolutional Neural Networks

Abstract: Defective hard candies are usually produced due to inadequate feeding or insufficient cooling during the candy production process. The human-based inspection strategy needs to be brought up to date with the rapid developments in the confectionery industry. In this paper, a detection and classification method for defective hard candies based on convolutional neural networks (CNNs) is proposed. First, the threshold_li method is used to distinguish between hard candy and background. Second, a segmentation algorithm based on concave point detection and ellipse fitting is used to split the adhesive hard candies. Finally, a classification model based on CNNs is constructed for defective hard candies. According to the types of defective hard candies, 2552 hard candy samples were collected; 70% were used for model training, 15% for validation, and 15% for testing. Defective hard candy classification models based on CNNs (Alexnet, Googlenet, VGG16, Resnet-18, Resnet-34, Resnet-50, MobileNetV2, and MnasNet0_5) were constructed and tested. The results show that the classification performances of these deep learning models are similar, except for MnasNet0_5 with a classification accuracy of 84.28%, and that the Resnet-50-based classification model is the best (98.71%). This research offers a theoretical reference for the intelligent classification of granular products.


Introduction
Hard candy, as a major category of confectionery, is one of the main product varieties of the Chinese food industry. However, more than 50% of the market share in the competitive landscape of the Chinese confectionery industry is occupied by foreign brands, which is mainly due to the outdated industrial structure and the uneven candy quality, such as inconsistent shapes and various types of defects. Moreover, in most Chinese candy-producing companies, the commonly employed inspection method is to have trained inspectors visually identify and manually remove the defective hard candies on the conveyor belt. This operation is time consuming and cannot ensure consistency among different operators.
Computer vision is considered one of the best alternatives for performing online and nondestructive quality inspection [1]. Many applications that utilize computer vision on food industry products have been developed, especially in the area of defect detection. External properties such as color, shape, texture, and wavelet features (or combinations of these) are extracted from images, and these features are then used to train classifiers. For example, Chao et al. proposed a multi-step hybrid identification method based on the color sorting table method (CSTM) to identify and remove various foreign bodies in the production process of tobacco packs, with an accuracy rate of 97.8% [2]. Carvalho et al. assessed the quality of macadamia kernels by using near infrared spectroscopy (NIRS) and nuclear magnetic resonance (NMR) with chemometric tools such as PCA-LDA and GA-LDA to evaluate external kernel defects [3]. Lu et al. used the random forest (RF) to detect defective apples [4]. Some researchers have used support vector machines (SVMs) to detect defective fruits and vegetables, such as potatoes, bulk raisins, and rice [5][6][7][8][9][10][11], while others have used the Hough transform (HT) [12] and the extreme learning machine (ELM) [13] to sort carrots and tomatoes. Although most defective hard candies can be easily distinguished from good ones by machine vision methods, a few defective hard candies with a similar appearance may significantly confuse these recognition algorithms, which is not conducive to achieving high-quality sales and industrial upgrading for candy-producing companies.
Besides the aforementioned algorithms, a new branch of machine learning called deep learning has achieved many state-of-the-art results in the field of image classification in recent years [14]. Deep learning refers to the use of deeper ANN architectures that combine the processes of feature extraction and classification. It encodes the composition of lower-level features into more discriminative higher-level features. Thus, deep learning can solve more complex problems with higher precision. The convolutional neural network (CNN) [15] is a basic deep learning tool, and it has been successfully used in image classification and object detection. The use of a CNN is a new and promising technique that has become more popular in the field of defect detection in agricultural products and industrial parts. Arthur et al. trained a deep residual neural network (ResNet) classifier to detect external defects on tomatoes, and they found that fine-tuning outperformed feature extraction, revealing the benefit of training additional layers when sufficient data samples are available [16]. Xu et al. proposed a feature-wise attention-based relation network (FAR-Net) for multilabel jujube defect classification, which effectively facilitated the learning of correlations between labels and improved the multilabel classification accuracy [17]. Ahmad et al. used an improved CNN algorithm to detect the apparent defects of sour lemon fruit and to grade the fruit [18]. Zhang et al. proposed a new defect detection pipeline, called Image Enhanced Mask R-CNN (IE Mask R-CNN), that includes the best combination of image enhancement and augmentation techniques for pre-processing the dataset, and a Mask R-CNN model tuned for the task of wind turbine blade (WTB) defect detection and classification [19]. Duong et al. used the resultant defect signature wavelet image (DSWI) and designed a deep convolutional neural network architecture to identify faults in bearings [20]. In addition, Zhuang et al. [21] used the CNN to classify solid wood flooring; Wan et al. [22] and Wang et al. [23] used the CNN to classify steel surface defects; and Zhou et al. [24] used the CNN to classify defective green plums. These algorithms therefore provide a good reference for research on the classification of defective hard candies.
The innovations of this study include (a) realizing the segmentation of adhesive hard candies based on concave point detection and (b) introducing CNN classification models into the defect classification of hard candies. The rest of the paper is organized as follows: Section 2 introduces the classification system and the collection of the experimental materials. Section 3 describes the segmentation methods based on concave point detection, the results of ellipse fitting, and the CNN models used for classification. Section 4 discusses the performance of the CNN models compared with several machine vision methods, and the prototype design of this classification system. Finally, Section 5 summarizes the conclusions and future work.

Classification System for Hard Candies
The hard candy acquisition equipment was composed of four components: a fixing device, a transmission device, an industrial camera, and a strip light source. Hard candy samples were collected using an MV-CA050-10GM/GC industrial camera manufactured by the HIKROBOT Technology company (Hangzhou, Zhejiang Province, China), with a resolution of 2448 × 2048 pixels. The lens was an MVL-HF0828M-6MP with an 8 mm focal length, also produced by the HIKROBOT Technology company. The strip light source was a DHK-TL6030-W produced by the Daheng Imaging company (Beijing, China), selected to reduce the impact of ambient light during the image acquisition process. The computer used for image processing and for classification model training and testing ran a Linux system and had an 8th-generation Intel Core i7 CPU with a main frequency of 2.6 GHz, 32 GB of memory, and an RTX2080Ti graphics card with 11 GB of display memory. The main structure of the acquisition equipment is shown in Figure 1.

Establish Hard Candy Dataset
The Nantong food machinery company provided about 8 kg of hard candies, including four types of hard candies, as shown in Figure 2. Compared with the traditional two-type classification of good and defective candies, the four-type classification of hard candies can help identify quality problems in the production process. For example, holey candies are caused by insufficient cooling, broken candies are caused by transportation bumps, and small candies are caused by insufficient feeding. Therefore, the classification results can be used to guide the improvement of industrial production. On the other hand, because of the large differences among the four types, the classification method could be improved according to the experimental results, which will be discussed in Section 4. During the process of image acquisition, the candy samples were manually sprinkled on a conveyor belt moving at a speed of 3 m/s. The industrial camera captured original images of the hard candies as they moved along the transport direction. A total of 126 images of mixed candies were captured, which were then divided into 2552 sub-images. After counting, there were 904 good candy samples, 907 holey (defect) candy samples, 337 broken candy samples, and 404 small candy samples. In total, 70% of the samples were used as the training set, 15% as the validation set, and 15% as the testing set. The validation set was used to find appropriate model parameters during the training phase, while the testing set was used to further evaluate the performance of the proposed models in the testing phase. In order to enrich the diversity of the samples, brightness transformation and image rotation were applied to the candy images of the training set, and 7132 hard candy images were obtained to reconstruct the experimental training samples, as shown in Table 1.
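The 70%/15%/15% partition described above can be illustrated with a minimal sketch. It assumes that the split is applied per class (stratified) and that fractional counts are rounded down, neither of which is stated in the text; the function name `stratified_split` and the fixed random seed are likewise illustrative choices, not part of the original pipeline.

```python
import random

# Per-class sample counts reported in the text (2552 samples in total).
CLASS_COUNTS = {"good": 904, "holey": 907, "broken": 337, "small": 404}


def stratified_split(counts, train=0.70, val=0.15, seed=0):
    """Split the sample ids of each class into train/val/test lists.

    Assumption: the split is done class by class and fractions are
    rounded down; the remainder of each class goes to the test set.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, n in counts.items():
        ids = [(label, i) for i in range(n)]
        rng.shuffle(ids)
        n_train = int(n * train)
        n_val = int(n * val)
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]
    return splits


splits = stratified_split(CLASS_COUNTS)
sizes = {name: len(items) for name, items in splits.items()}
print(sizes)
```

Splitting class by class keeps the proportion of good, holey, broken, and small candies roughly equal across the three subsets, which matters here because the broken and small classes are much smaller than the other two.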

Methods
The classification method mainly includes two parts: one is the detection and segmentation of adhesive hard candies, which will be discussed in Section 3.1, and the other is the classification of hard candies. The main steps involved in the classification of defective hard candies are shown in Figure 3. After the classification system starts, the industrial camera captures the original image when the hard candies reach the designated location. The segmentation method based on concave point detection is used to split the adhesive hard candies. After being preprocessed, the sub-images of the hard candies are fed into the pre-trained convolutional neural network model for classification, and the classification results of the four types of hard candies are output.

Identification of Defect Candies
Before the candy images were passed to the CNN-based model, a color channel was constructed to extract the candy mask, which was defined as follows:

$$\mathrm{channel}_{pink} = r - c_{sug}\,(g + b)$$
where $\mathrm{channel}_{pink}$ is the pink color channel; r, g, and b are the three color brightness channels, each ranging from 0 to 255; and $c_{sug}$ is the highlight coefficient of the red channel, for which a value of 0.5 was found appropriate after several experiments in this work, so that $\mathrm{channel}_{pink}$ ranges from −255 to 255. In order to process and display the results conveniently, the values of $\mathrm{channel}_{pink}$ were rescaled to the range from 0 to 255. The original candy image is shown in Figure 4a, and the heat map of the transformed pink channel is shown in Figure 4b. Figure 4c shows the histogram of the pink channel, in which there is a clear difference between the foreground and the background. The threshold_li method [25,26] gives the best threshold by minimizing the cross-entropy between the foreground and the foreground mean, and between the background and the background mean.
Taking advantage of the threshold_li method, the mask of the hard candy is easily separated from the background, as shown in Figure 4d.
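The background segmentation step can be sketched as follows. Two assumptions are made and should be read as such: first, the pink channel is taken to have the form r − c_sug·(g + b), which is consistent with the stated range of −255 to 255 at c_sug = 0.5 but is not confirmed by the surviving text; second, the threshold is computed with a plain-Python version of the iterative minimum cross-entropy update commonly associated with Li's method, whereas in practice a library routine such as `threshold_li` in scikit-image would normally be used. The pixel values in the usage lines are invented for illustration.

```python
import math

C_SUG = 0.5  # highlight coefficient reported in the text


def pink_channel(r, g, b, c_sug=C_SUG):
    """Assumed pink channel r - c_sug * (g + b), rescaled to [0, 255]."""
    raw = r - c_sug * (g + b)      # lies in [-255, 255] when c_sug = 0.5
    return (raw + 255.0) / 2.0     # linear rescale to [0, 255]


def li_threshold(values, tol=0.5, max_iter=100):
    """Iterative minimum cross-entropy threshold (illustrative version).

    Update rule: t <- (mu_fore - mu_back) / (ln mu_fore - ln mu_back),
    where the means are taken on either side of the current threshold.
    """
    t = sum(values) / len(values)  # initial guess: global mean
    for _ in range(max_iter):
        back = [v for v in values if v <= t]
        fore = [v for v in values if v > t]
        if not back or not fore:
            break
        mu_back = sum(back) / len(back)
        mu_fore = sum(fore) / len(fore)
        if mu_back <= 0:           # the logarithm needs positive means
            break
        t_new = (mu_fore - mu_back) / (math.log(mu_fore) - math.log(mu_back))
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t


# Hypothetical pixels: pinkish candy versus grey conveyor belt.
candy = [pink_channel(230, 120, 150)] * 100
belt = [pink_channel(60, 60, 60)] * 100
threshold = li_threshold(candy + belt)
mask = [v > threshold for v in candy + belt]
```

With these invented values the candy pixels map to 175.0 and the belt pixels to 127.5, and the threshold settles between the two modes, so the mask marks exactly the candy pixels.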

Segmentation of Adhesive Hard Candies
Adhesive cases were found by the procedure in Section 2.2; they could not be completely avoided and required further processing. A segmentation algorithm based on concave point detection and ellipse fitting [27] was used to split the adhesive hard candies. This procedure was composed of adhesion determination, concave point detection, contour segment grouping, and ellipse fitting, as shown in Figure 5.

Adhesion Determination
In this paper, a new discriminant method based on an area factor is proposed to determine the adhesion of hard candies, which is defined as follows:

$$\tau = \frac{A_{candy}}{A_{convex\ hull}}$$

where A refers to the area. The convex hull is the smallest convex set containing the adhesive candies. The index τ, which ranges from 0 to 1, offers a direct and general idea of the appearance of the adhesive candies: its value is smaller when there is adhesion. Figure 6 shows some typical examples of adhesive candies and their corresponding convex hulls. The red line is the boundary of the candy, and the blue line is the convex hull. A non-adhesive candy should have a larger τ value, while adhesive candies have smaller τ values. The receiver operating characteristic (ROC) curve is a good way of determining the threshold when the ground truth is fully known in the training set. When non-adhesive candies are defined as positive cases and adhesive candies are defined as negative cases, the sensitivity and the specificity are defined as follows:

$$\mathrm{sensitivity} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}}$$

$$\mathrm{specificity} = \frac{\text{true negative}}{\text{true negative} + \text{false positive}}$$

The sensitivity measures the proportion of positives that are correctly identified as such, and the specificity measures the proportion of negatives that are correctly identified as such. Figure 7 shows how these two indexes change when τ increases from 0.85 to 0.99. The red point in Figure 7 is the optimized point, so the threshold $T_\tau$ is set to 0.93 for the τ value to determine whether there is adhesion. Specifically, when the τ value is smaller than the threshold, the candy is considered an adhesive candy; otherwise, it is considered a non-adhesive candy.
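The adhesion test can be sketched in a few lines, assuming the contour is available as an ordered list of (x, y) vertices and that τ is the ratio of the contour area to the area of its convex hull. The helper names and the two test shapes are illustrative; only the threshold of 0.93 comes from the text.

```python
T_TAU = 0.93  # adhesion threshold reported in the text


def polygon_area(points):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def area_factor(contour):
    """tau = contour area / convex hull area, in (0, 1]."""
    return polygon_area(contour) / polygon_area(convex_hull(contour))


def is_adhesive(contour, threshold=T_TAU):
    """A contour is treated as adhesive when tau falls below the threshold."""
    return area_factor(contour) < threshold


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Illustrative shapes: a convex blob and an L-shaped blob with a notch.
single = [(0, 0), (10, 0), (10, 10), (0, 10)]
touching = [(0, 0), (20, 0), (20, 10), (10, 10), (10, 20), (0, 20)]
print(area_factor(single), area_factor(touching))
```

The convex shape fills its hull completely (τ = 1), while the notched shape leaves part of the hull empty (τ = 300/350), which is the property the threshold exploits.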

Concave Point Detection
An improved Curvature Scale Space (CSS) algorithm [28] was used to detect the corner points of the contour boundary of the adhesive hard candies. A corner point was defined as a local curvature maximum located on the target contour. Although some points were detected as local curvature maxima, their curvature differed little from that of the adjacent points in the Region of Support (ROS), which is defined as the span from one neighboring local curvature minimum to the next; see [28] for details. Therefore, a local curvature adaptive threshold was proposed to remove redundant corner points, which is defined as follows:

$$T(p_i) = C \times \bar{k} = C \times \frac{1}{R_1 + R_2 + 1} \sum_{j = p_i - R_1}^{p_i + R_2} |k(j)|$$

where $\bar{k}$ refers to the mean curvature of the neighborhood area; $p_i$ represents the position of the candidate corner point; $R_1$ and $R_2$ are the sizes of the ROS from $p_i$ to the closest candidate corner points before and after it, respectively; and C is a coefficient that should be greater than 1 and less than 2. Because a round corner produces a convex waveform in the absolute curvature function that is less sharp than that of a triangle, C is set to the midpoint value of 1.5 in the proposed method. Since the corner points are composed of concave points and non-concave points, an extraction method is needed. For any detected corner point $p_i$, the points $p_{i-k}$ and $p_{i+k}$, which are k pixels apart from $p_i$, are extracted and connected by a line. If the line is outside the corresponding adhesion area, the corner point $p_i$ is considered a concave point. Otherwise, the corner point $p_i$ is considered a non-concave point and is removed. Figure 8 shows the result of concave point acquisition, with the concave points marked by white dots.

Contour Segment Grouping
Since a contour segment does not necessarily correspond to a single complete target, multiple contour segments may belong to the same target. Therefore, it is necessary to divide the contour segments belonging to the same target into one group. For a contour segment $s_i$ and another contour segment $s_j$ to be placed in one group, the following requirements must be satisfied.

1. If the average distance deviation (ADD) produced by the ellipse fitted to the combined segments is smaller than that produced by either contour segment before the combination, then these contour segments can be divided into the same group. For the contour segment $s_i = \{p_k(x_k, y_k)\}_{k=1}^{n}$ (where n represents the number of pixels in the contour segment, and $p_k$ represents a pixel of the contour), supposing that the fitted contour segment generated after ellipse fitting is $s_{f,i} = \{p_{f,k}(x_{f,k}, y_{f,k})\}_{k=1}^{n}$, the ADD between $s_i$ and $s_{f,i}$ can be defined as follows:

$$ADD_{s_i} = \frac{1}{n} \sum_{k=1}^{n} \sqrt{(x_k - x_{f,k})^2 + (y_k - y_{f,k})^2}$$

The smaller the calculated ADD, the closer the real contour segment of the target is to the fitted contour segment. Therefore, the constraint can be defined as follows:

$$ADD_{s_i \cup s_j} \le ADD_{s_i}, \qquad ADD_{s_i \cup s_j} \le ADD_{s_j} \qquad (7)$$

2. If the gravity center of the ellipse fitted to the combined segments is close to the gravity centers of the ellipses fitted separately to each contour segment, then the segments can be divided into one group. Suppose that the gravity centers of the ellipses fitted to the contour segments $s_i$ and $s_j$ are $e_i$ and $e_j$, and that the gravity center of the ellipse fitted to the two contour segments together is $e_{ij}$. If d(x, y) is used to represent the Euclidean distance between two points, then the following constraints need to be met:

$$d(e_i, e_{ij}) \le t_1, \qquad d(e_j, e_{ij}) \le t_1$$

where $t_1$ is a preset distance threshold whose value is the short-axis size of the smallest ellipse fitted separately to each contour segment of the input image.

3. If the gravity centers of the two ellipses fitted separately to the contour segments $s_i$ and $s_j$ are close to each other, the segments can be divided into one group. Supposing that the gravity centers of the ellipses fitted to $s_i$ and $s_j$ are $e_i$ and $e_j$, and that d(x, y) is used to represent the Euclidean distance between two points, the following constraint then needs to be met:

$$d(e_i, e_j) \le t_2$$

where $t_2$ is a preset distance threshold whose value is two to four times $t_1$.
The result obtained by satisfying the above three conditions is shown in Figure 5d. The contour segments divided into the same group are marked with the same color in the figure for identification.
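The three grouping conditions can be combined into a single predicate, sketched below. The sketch assumes that the ellipse fits have already been computed elsewhere, so the ADD values and gravity centers are passed in as plain numbers and (x, y) tuples; the ADD helper assumes a one-to-one pairing between contour pixels and fitted points. The function names and the numeric values in the usage lines are illustrative.

```python
import math


def euclid(p, q):
    """Euclidean distance d(p, q) between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def average_distance_deviation(segment, fitted):
    """Mean distance between contour pixels and their fitted counterparts.

    Assumption: segment[k] and fitted[k] are corresponding points.
    """
    return sum(euclid(p, q) for p, q in zip(segment, fitted)) / len(segment)


def can_group(add_i, add_j, add_ij, e_i, e_j, e_ij, t1, t2):
    """Return True when segments s_i and s_j satisfy all three conditions."""
    # 1. The joint fit is no worse than either separate fit.
    cond_add = add_ij <= add_i and add_ij <= add_j
    # 2. The joint center stays close to each separate center.
    cond_joint_center = euclid(e_i, e_ij) <= t1 and euclid(e_j, e_ij) <= t1
    # 3. The two separate centers are close to each other.
    cond_pair_center = euclid(e_i, e_j) <= t2
    return cond_add and cond_joint_center and cond_pair_center


# Illustrative values: two arcs of the same candy, then a distant arc.
same_candy = can_group(1.0, 1.2, 0.8, (10, 10), (12, 10), (11, 10),
                       t1=5.0, t2=15.0)
different_candy = can_group(1.0, 1.2, 0.8, (10, 10), (60, 10), (35, 10),
                            t1=5.0, t2=15.0)
```

Requiring all three conditions at once is the conservative choice: a pair of segments is merged only when the joint ellipse fits well and none of the fitted centers drifts.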

Ellipse Fitting
In order to obtain the contour boundary of the adhesive hard candies, an ellipse fitting method [29] based on the least squares method is used to complete the adhesion segmentation, as shown in Figure 9. The blue line is the boundary of the fitted ellipse.

Classification of Defective Hard Candies
Convolutional neural networks (CNNs) are able to extract the features of images automatically, which makes images easier to analyze [30]. A typical CNN is composed of convolutional layers, rectified linear unit (ReLU) layers, pooling layers, fully connected layers, and a softmax layer. CNNs that have already learned to extract features and information from open image databases are often used as a starting point for learning new tasks. Most of the CNNs here were trained on the ImageNet database [15], and the main applications of pretrained CNNs are transfer learning, feature extraction, and classification. The CNN models adopted in this paper are widely known in the literature:

• Alexnet [14], one of the first deep networks, is made up of five convolutional layers and three fully connected layers.

• Googlenet [31], compared to Alexnet, has a much deeper network and a lower number of network parameters. It possesses 7 million parameters and contains nine inception modules, four convolutional layers, three average pooling layers, five fully connected layers, and three softmax layers.

• VGG (VGG16) [32], which was developed by the Visual Geometry Group (VGG) of the University of Oxford, is an enhanced Alexnet in which large kernel-sized filters are replaced with multiple 3 × 3 kernel-sized filters stacked one after another.

• Resnet (Resnet-18, Resnet-34, and Resnet-50) [33] is a series of deep learning models that is similar to VGG but deeper and equipped with shortcut connections. Resnet-N denotes a model whose convolutional layers and fully connected layers number N in total.

• MobileNetV2 [34] is a mobile architecture that is used for object detection in the framework called SSDLite. It is a lightweight neural network model with few parameters and strong performance.

• MnasNet0_5 [35] is a model obtained by an automated mobile neural architecture search approach, and it is faster than MobileNetV2 on object detection.
Taking the Resnet-18 convolutional neural network as an example, the classification model in Figure 3 is as shown in Figure 10.
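The last stage of such a classification model, where the softmax layer turns the network outputs into a probability for each of the four candy types, can be illustrated with a minimal sketch. The class order, the function names, and the example logits are assumptions made for illustration; they do not describe the trained networks.

```python
import math

# Assumed class order; the text names the four types but not their indices.
CLASSES = ("good", "holey", "broken", "small")


def softmax(logits):
    """Numerically stable softmax over a list of raw network outputs."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most probable class label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Hypothetical logits for one candy sub-image.
label, confidence = predict([2.0, 0.5, -1.0, 0.1])
print(label, round(confidence, 3))
```

Subtracting the largest logit before exponentiation leaves the probabilities unchanged and avoids overflow for large network outputs.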

Hard Candy Classification Test Result

Classification Performance of CNN Models
The eight classification models based on convolutional neural networks (Alexnet, Googlenet, VGG16, Resnet-18, Resnet-34, Resnet-50, MobileNetV2, and MnasNet0_5) were constructed, and the collected samples listed in Table 1 were used for each model's training, validation, and testing sets. The number of iteration steps was set to 100, and the minibatch size was set to 8. The learning rate was 0.00009, and Adam was selected as the optimizer. The trained networks are available at https://github.com/NGLS-E/Candy (accessed on 12 August 2021).
The testing results of the eight classification models are listed in Table 2. They show that the classification accuracy values of the CNN-based models were higher than 97%, except for the MnasNet0_5-based model, whose accuracy was 84.28%. Among them, the classification model based on Resnet-50 had the highest classification accuracy (98.71%). Here, the frames per second (fps) of each method is calculated from the time spent on extracting candy candidate areas and on classification for a picture containing about 30 hard candies on average. Taking the Alexnet-based model as an example, it took about 30 ms to extract candy candidate areas and about 99 ms to classify these candies. The total time spent on a single picture was therefore about 129 ms, so the fps was about 7.75 (1000/129). Considering both running time and classification accuracy, the Alexnet-based model offers the best trade-off among these models. In order to further analyze the performance of the eight CNN models, the detection accuracy of each type of defect was calculated; the confusion matrices are listed in Table 3, and the ROC-AUC curves of the eight models are shown in the Supplementary Materials as Figure S1. The main diagonal shows the average recognition rate of each type of candy for each CNN model. Through the analysis of misjudged samples, we found that defective candies were recognized as good candies when the hole of the defect was too small to inspect. Adding more samples of hard candies with small holes, or adding manually designed features, may further improve the classification accuracy of holey hard candies. On the other hand, if the hole is very small and negligible, mistakenly classifying the defective hard candy as a good one is usually acceptable for the producer or consumer.

Another group of experiments was carried out to analyze the effectiveness of feature extraction in the proposed framework by feeding the features at the layer just before the first fully connected layer to four traditional classifiers. The results are listed in Table 4, where the Enhanced k-NN model performed best at k = 4 when the value of k was tuned with distance weights. The SVM achieved the best accuracy (90.98%) among the traditional methods, but all of them performed worse than almost all of the CNN-based models in Table 2, the exception being MnasNet0_5. This may be because the high-dimensional output of the convolutional network, up to 512 dimensions, causes the curse of dimensionality for the traditional classifiers.
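The frame-rate figure quoted for the Alexnet-based model, and the per-class recognition rate read from the main diagonal of a confusion matrix, can be reproduced with a short sketch. The timing values of 30 ms and 99 ms come from the text; the confusion matrix below is invented purely to show the calculation and does not reproduce Table 3. Rows are assumed to hold the true labels and columns the predicted labels.

```python
def frames_per_second(extract_ms, classify_ms):
    """fps for one picture: 1000 ms divided by the total processing time."""
    return 1000.0 / (extract_ms + classify_ms)


def per_class_recall(confusion):
    """Recognition rate of each class: diagonal entry over its row sum."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Timings reported in the text for the Alexnet-based model.
alexnet_fps = frames_per_second(30, 99)

# Hypothetical confusion matrix (good, holey, broken, small), illustration only.
example_confusion = [
    [9, 1, 0, 0],
    [2, 8, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
]
recalls = per_class_recall(example_confusion)
print(round(alexnet_fps, 2), recalls)
```

In the invented matrix, the second row shows the failure mode discussed above: some holey candies are predicted as good, which lowers the recognition rate of the holey class only.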

Models                   Accuracy
CDNN [36]                76.73%
Enhanced k-NN [37]       74.90%
SVM [6]                  90.98%
Random forest [38]       90.33%
Resnet-18-based model    98.20%

In order to further analyze the performance of the traditional models, the detection accuracy of each type of defect was also calculated, and their confusion matrices are shown in Figure 11. Compared with the deep learning models, the traditional methods classified defective hard candies (broken and small candies) as good ones, which is unacceptable in actual production.

Prototype Design Principle and Workflow
The mechanical part of the defective hard candy intelligent sorting system was manufactured and provided by Nantong Wealth Machinery Technology Company (Nantong, China). The system can be applied to the actual production process of hard candy, as shown in Figure 12, and a working video of this system is provided in the Supplementary Materials. In actual production, the vibrating tray is used as the feeding mechanism to sprinkle the cooled hard candy discretely onto the conveyor belt. The conveyor belt transports the hard candy forward to the vision system at a speed of 2 m/s. The single-chip microcomputer counts the encoder pulses (1500 pulses per second) and triggers the camera at intervals of 500 pulses, so that the vision system transmits three candy images per second, collected by the camera, to the Jetson Xavier through the Gigabit Ethernet port, and the computer runs the deployed network model to identify the images. This requires the network model to run at more than 3 fps, a requirement met by models including the Alexnet-based model and the MnasNet0_5 model in Table 2. Comparing these two models, the Alexnet-based model was used in our system. The computer then converts the recognition results and the coordinate information into the 40 pulse states of the 40 spray valves. The status of each pulse is sent to the single-chip microcomputer through the Modbus communication protocol, and the single-chip microcomputer controls the programmable controller to open the spray valves when the defective candies reach the nozzle area. The 40 nozzles of the spray valves are located at the end of the conveyor belt, corresponding to the 40 divided areas of the conveyor belt. When the candies reach the end of the conveyor belt, the good candies fly out and fall into the good hard candy collection frame due to inertia, while the defective candies are deflected by the airflow from the upper nozzle during flight, which changes their trajectory, and finally fall into the defective hard candy collection box. In this way, the system eliminates defective hard candies and completes the sorting task.
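The timing requirement and the mapping from detections to spray valves can be sketched as follows. The encoder rate, trigger interval, and number of valves come from the text. The mapping itself is an assumption: it supposes that the 2448-pixel image width spans the belt and that the 40 valve lanes have equal width, neither of which is stated. The function names and the sample coordinates are illustrative.

```python
ENCODER_PULSES_PER_SECOND = 1500   # reported in the text
TRIGGER_INTERVAL_PULSES = 500      # reported in the text
NUM_VALVES = 40                    # reported in the text
IMAGE_WIDTH_PX = 2448              # camera resolution; belt coverage is assumed


def images_per_second(pulses_per_second=ENCODER_PULSES_PER_SECOND,
                      interval=TRIGGER_INTERVAL_PULSES):
    """Camera trigger rate, which is the minimum fps the model must reach."""
    return pulses_per_second / interval


def valve_index(x_px, width_px=IMAGE_WIDTH_PX, num_valves=NUM_VALVES):
    """Map an x coordinate across the belt to a valve lane (assumed equal lanes)."""
    index = int(x_px * num_valves / width_px)
    return min(max(index, 0), num_valves - 1)


def valve_states(defect_x_positions):
    """Build the 40-entry open/closed state list from defective candy positions."""
    states = [0] * NUM_VALVES
    for x_px in defect_x_positions:
        states[valve_index(x_px)] = 1
    return states


required_fps = images_per_second()
states = valve_states([0, 1224, 2447])   # hypothetical defect positions
print(required_fps, [i for i, s in enumerate(states) if s])
```

With three images per second to process, a model running at about 7.75 fps leaves a margin of more than a factor of two for image transfer and valve signalling.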

Figure 2. Hard candy external quality defects. (a) Good: candies without defects; (b) Holey: candies with holes or pits; (c) Broken: candies with broken contours or irregular shapes; (d) Small: candies with a smaller volume than normal.

Figure 3. Main steps involved in the classification of defective hard candies.

Figure 4. Background segmentation based on the threshold_li method. (a) Two original candy images. (b) Images in the pink channel. (c) Histograms. (d) Segmentation results based on the threshold_li method.

Figure 5. Flow chart of segmentation: (a) Original image. (b) Preprocessed image. (c) Result of concave point detection. (d) Result of contour segment grouping. (e) Result of ellipse fitting.

Figure 6. Typical adhesive candies, the corresponding convex hull, and the τ value.

Figure 7. The change in sensitivity and specificity as τ increases, used to determine the threshold of the τ value for distinguishing non-adhesive and adhesive candies.

Figure 8. Result of concave point detection.

• A convolutional layer, a set of convolutional filters that activate image features;
• A rectified linear unit (ReLU) layer, an activation function;
• A subsampling or pooling layer, a form of downsampling;
• A fully connected layer, which integrates the features extracted from the previous layers and outputs them to one dimension;
• A softmax layer, which gives the probability of each category established in the database when classification starts.

Figure 10. The Resnet-18 network as a classification model in the framework of hard candy classification.

Figure 12. Physical map of the defective hard candy intelligent sorting system, where (a) is for experiments and debugging and (b) is installed in the production line.

Table 1. Sample data distribution.

Table 2. The testing results of eight classification models.

Table 3. The confusion matrices of eight classification models for the testing set, where the types of hard candies in the first row are the predicted labels and those in the second column are the true labels.

Table 4. The testing results of different models with the features extracted before the first fully connected layer of the Resnet-18-based model.