Smart Camera for Quality Inspection and Grading of Food Products

Due to the increasing consumption of food products and demand for food quality and safety, most food processing facilities in the United States utilize machines to automate their processes, such as cleaning, inspection and grading, packing, storing, and shipping. Machine vision technology has been a proven solution for inspection and grading of food products since the late 1980s. The remaining challenges, especially for small to midsize facilities, include the system and operating costs, the demand for high-skilled workers for complicated configuration and operation and, in some cases, unsatisfactory results. This paper focuses on the development of an embedded solution with learning capability to alleviate these challenges. Three simple application cases are included to demonstrate the operation of this unique solution. Two datasets of more challenging cases were created to analyze and demonstrate the performance of our visual inspection algorithm. One dataset includes infrared images of Medjool dates with four levels of skin delamination for surface quality grading. The other consists of grayscale images of oysters with varying shape for shape quality evaluation. Our algorithm achieved a grading accuracy of 95.0% on the date dataset and 98.6% on the oyster dataset, both easily surpassing manual grading, which constantly faces the challenges of human fatigue and other distractions. Details of the design and functions of our smart camera and our simple visual inspection algorithm are discussed in this paper.


Introduction
Better product quality brings higher profit. Quality affects the costs of processing, marketing, and the pricing of food products. Quality evaluation is a labor-intensive process and constitutes a significant portion of the expense of food production. Due to decreasing labor availability and increasing labor costs, the constant hiring and training of capable workers has become a daunting task for food processing facilities. Machine vision technology has been a proven solution for inspection and grading of food products since the late 1980s. Feedback received from several small and midsize food processing facilities in the US indicated that system and operating costs, operation complexity and, in some cases, unsatisfactory results have discouraged them from embracing or upgrading to the latest developments.
Quality evaluation involves (1) grading to separate products into different quality grades and (2) sorting to detect and remove imperfect products. Human vision can perform these tasks nearly effortlessly, but its efficiency often suffers from inconsistency, fatigue, or inexperience. Our visual inspection algorithm uses evolutionary learning to automatically learn and discover salient features from training images without human intervention [21]. The proposed algorithm is able to find subtle differences in training images to construct non-intuitive features that are hard to describe, even for humans. This unique ability makes it a good candidate for the visual inspection of food products when the differentiation among grades is hard to describe. Compared to the aforementioned machine learning and sophisticated but powerful deep learning approaches, the proposed algorithm does not require a large number of images or much computational power for training [22]. It does not require extensive training to configure, and it is easier to change the algorithm to fit new products or new grading criteria in the factory. Its classification is also faster and more efficient than most machine learning and deep neural network approaches.
We created two datasets as examples of quality evaluation for specialty crops and agriculture products. One has infrared images of Medjool dates with four levels of skin delamination. The other one consists of grayscale images of oysters with varying shape quality. The challenges and related work on these two representative applications are outlined in the next two subsections.

Date Skin Quality Evaluation
Because of their unique climate patterns, Southern California and Arizona are the best areas in the U.S. to grow dates [23]. Dates are usually harvested almost all at once during a short window in August and September, regardless of their maturity. Harvested dates are graded into three or four categories according to their maturity levels [8]. Mature dates are graded based on their skin quality before packing [9].
Our literature review found little quality work that addresses date surface quality evaluation. RGB images of date fruit were used for the grading and sorting of dates, achieving 80% accuracy in the experiment [24]. A Mamdani fuzzy inference system (MFIS) was used to grade the quality of Mozafati dates [25]. A bag of features (BOF) method was used for evaluating fruit external quality [26]. Histogram and texture features were extracted from monochrome images to classify dates into soft and hard fruit using linear discriminant analysis (LDA) and an artificial neural network (ANN), obtaining 84% and 77% accuracy, respectively. A simple CNN structure was used to separate healthy and defective dates and predict the ripening stage of the healthy dates [27].
The majority of work related to date processing is for date classification or recognition. CNN was used for assisting consumers to identify the variety and origin of dates [28] and for classifying dates according to their type and maturity level for robotic harvest decisions [29]. Basic image processing techniques were used for grading date maturity in HSV (Hue, Saturation, and Value) color space [30]. Mixtures of handcrafted features, such as shape, color, and texture, were used to classify four [31] and seven [32] varieties of dates. Others used SVM [33] and Gaussian mixture models [34] to classify dates. Some of the works mentioned above focused on using handcrafted features to classify varieties of dates, not quality evaluation. The few that were developed for date quality evaluation used basic image processing techniques and handcrafted features. They did not address the challenges mentioned previously: they require an experienced worker to configure and operate the system, and their performance is only in the mid-80% range.

Oyster Shape Quality
Product quality can be determined by many criteria. For most man-made products, food or non-food, quality can be evaluated by the product's dimensions, color, shape, etc. Unlike simple, geometrically shaped man-made products, agriculture and aquaculture products are organic and naturally growing objects with irregular and inconsistent shapes or surface curvature. Their quality is usually evaluated by criteria such as color, size, surface texture, and shape [10,11]. For consumer satisfaction, shape is one factor that cannot be ignored, especially for food products. The focus of this experiment is on shape evaluation for food products, whose quality evaluation is more challenging than that of man-made products.
More than 2500 oyster processing facilities were registered and licensed in the United States to grade oysters for packing or repacking as of 22 August 2017 [35]. Sorting oysters by shape and size is an important step in getting high quality half-shell oysters to market. Restaurants prefer oysters that have a strong or thick shell, smooth shape, and deep cup filled with meat. Unfortunately, oysters are not like man-made products that are made with a uniform shape. Some are very long, and others have a depth to them, or are thin and round. Their varying shape and size are the result of growing area salinity, species, food, and tidal action. Even with the same growing conditions, oysters are never identical in shape and size.
We selected an oyster shape grading application that had the most shape variation to demonstrate the performance of our visual inspection algorithm. Currently, whole oysters are graded manually by their diameter and weight, which largely ignores consumer demand for appearance. A study conducted in 2004 showed that consumers prefer round, medium-size oysters around 2 inches in diameter [36]. Guidelines were established to describe desirable and undesirable shapes [37].
Machine vision methods were developed to estimate the volume and weight of raw oyster meat [38,39]. Few papers in the literature report research on shape grading specifically for whole oysters [40,41]. Machine learning methods have been successfully used for shape analysis [10,11], but none of them used machine learning techniques that are able to automatically learn distinct features from the images and adapt to the different grading criteria a grower prefers. As discussed previously, each oyster grower or harvester has their own set of grading rules. A visual inspection algorithm that is able to learn and mimic human grading is essential to building a versatile smart camera.
As an example, we use the proposed algorithm to grade oysters into three categories. Its grading accuracy is more than adequate for commercial use.

Visual Inspection Algorithm
We aim to develop a versatile visual inspection algorithm that is easy to configure and fast at performing grading tasks for our embedded smart camera. As discussed in the introduction, sophisticated but powerful machine learning [12][13][14][15][16] and deep learning [17][18][19][20] approaches have been successfully applied to fruit and vegetable grading. Most of them are not suitable for embedded applications because of their computational complexity. Another real challenge for CNNs is that they must be fine-tuned to find the optimal hyper-parameters in order to get the best classification result. The common practice is trial and error, which is not an easy task for people without extensive training. Our visual inspection algorithm is based on our previous evolutionary constructed features (ECO-Feature) algorithm [42,43]. The original algorithm was developed mainly for object detection with binary decisions, and it was sensitive to scale and rotation variations. The new version reported here is designed to use our new evolutionary learning method to achieve multi-class image classification and to work well under minor image distortions and geometric transformations [21,22]. It uses an evolution process to automatically select the features that are most suitable for classification. Boosting techniques are used to construct features without human intervention. Figure 1 shows the entire training process, which includes evolutionary learning to obtain candidate features and AdaBoost training for final feature construction. The evolutionary learning process selects the candidate features that contribute most to classification. The AdaBoost training selects and combines candidate features to construct the final features and trains a strong classifier for classification. Section 2.2 introduces feature composition. Section 2.3 discusses the evolutionary learning process in detail. AdaBoost training is discussed in Section 2.4.

Feature Composition
The image transforms included in our algorithm for selection were basic convolution-based image transforms. The original version included twenty-four image transforms [42,43]. To our surprise, the same handful of basic transforms was always chosen by the algorithm while achieving very good performance on six datasets with different image types (RGB, grayscale, X-ray, SAR, IR). Including a large number of transforms would require much more time for training but, based on the results from our experiments, roughly two-thirds of them were never or rarely chosen. We focused on learning efficiency in this simplified and improved version for visual inspection. Those image transforms that were rarely or never selected in our previous studies were removed from our toolbox.
For computational efficiency, we retained six simple transforms that are convolution-based and are most often selected by the evolution process. Simplicity and easy implementation made them good candidates for embedded applications. They were Gabor (two wavelengths and four orientations), Gaussian (kernel size), Laplacian (kernel size), Median Blur (kernel size), Sobel (depth, kernel size, x order, y order), and Gradient (kernel size). These image transforms provided the system with the basic tools for feature extraction and helped speed up the evolutionary learning process. Figure 2 shows two example features from the evolutionary learning process. The output of one transform is input to the subsequent transform. The output of the last transform of the feature is used for classification.
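As a minimal sketch (illustrative only, not the authors' implementation), a constructed feature can be represented as an ordered chain of convolution-based transforms applied in series, with the output of each transform feeding the next; here a toy box blur and a horizontal Sobel kernel stand in for the six transforms listed above:

```python
# Sketch: a feature is an ordered chain of convolution-based transforms.
# The toy kernels below stand in for the Gabor/Gaussian/Laplacian/median/
# Sobel/gradient transforms named in the text; names are illustrative.

def convolve2d(img, kernel):
    """Valid-mode 2-D convolution on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = sum(img[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

# Two stand-in transforms: a 3x3 box blur and a horizontal Sobel kernel.
BLUR = [[1 / 9] * 3 for _ in range(3)]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

def apply_feature(img, chain):
    """Apply each transform in series; the last output is the feature map."""
    for kernel in chain:
        img = convolve2d(img, kernel)
    return img

image = [[float((i + j) % 5) for j in range(8)] for i in range(8)]
feature_map = apply_feature(image, [BLUR, SOBEL_X])  # a 2-transform feature
```

Chaining even two such kernels already yields a response map that mixes smoothing and edge information, which is the kind of composite response the evolution process searches over.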


Feature Formation
Features for classification are formed by the evolutionary learning process. Each feature consists of a number of image transforms selected from our pool of six basic image transforms. These transforms operate in series. The number, type, and order of these transforms and their respective parameters are determined by the evolutionary learning process. As a result of the evolution process, it is very likely that any one transform could be used more than once in a constructed feature, but with different parameters. From our experiments, two to eight transforms per feature seemed to be sufficient for high classification accuracy and computational efficiency.
The evolution process starts by generating a population of phenotypes for features. A portion of the population is selected by a tournament selection method to create a generation. Each feature in the population is evaluated by a fitness function to calculate a fitness score. This feature evaluation process will be discussed in the next section.
Pairs of parent features are randomly selected to generate new features through crossover and mutation. Figure 3 shows an example of crossover and mutation. Crossover is done by taking a part of the transforms from one parent feature and combining it with a part of the transforms from the other parent feature. Mutation is done by replacing a transform or altering the parameters of the transforms. The child features generated from this process form a new generation of features and all bear some resemblance to their parents. As shown in Figure 3, they are likely to have different lengths and/or parameters. This process is carried out for several generations. In our experiments shown in Section 3, it took only around 10 to 15 generations to obtain features with high fitness scores, which, in turn, generate the best grading results. The evolution is terminated when a satisfactory fitness score is reached or the best fitness score remains stable for several iterations.
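A sketch of these two operators on list-encoded features follows (the transform names and parameter choices are illustrative assumptions, not the actual genome encoding):

```python
import random

# Sketch of crossover and mutation on features represented as lists of
# (transform_name, kernel_size) tuples. Names and parameters are
# illustrative stand-ins for the actual transform pool.

TRANSFORMS = ["gabor", "gaussian", "laplacian", "median", "sobel", "gradient"]

def crossover(parent_a, parent_b, rng):
    """Splice a prefix of one parent onto a suffix of the other."""
    cut_a = rng.randrange(1, len(parent_a))
    cut_b = rng.randrange(1, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:]

def mutate(feature, rng, rate=0.2):
    """Replace a transform or perturb its parameter with probability `rate`."""
    child = list(feature)
    for i, (name, ksize) in enumerate(child):
        if rng.random() < rate:
            if rng.random() < 0.5:
                child[i] = (rng.choice(TRANSFORMS), ksize)   # swap transform
            else:
                child[i] = (name, rng.choice([3, 5, 7]))     # new kernel size
    return child

rng = random.Random(42)
a = [("gaussian", 3), ("sobel", 3), ("laplacian", 5)]
b = [("gabor", 5), ("median", 3)]
child = mutate(crossover(a, b, rng), rng)
```

Because the cut points are chosen independently, children naturally vary in length and parameters, matching the observation above that constructed features end up with differing lengths.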

Feature Evaluation
As mentioned in the previous section, features formed by the evolution process must be evaluated to determine whether they have a high fitness score and can truly contribute to classification. Those that have a high fitness score can be included in the population pool for generating new generations. We designed a fitness function to compute a fitness score for each feature in each generation. Since the evolution process, as shown in our experiments, takes 10 to 15 generations to obtain a satisfactory result, the fitness function must be efficient in order to shorten the training time. Also, the classifier used for fitness score calculation must support multiple classes for applications that require more than just a good-or-bad decision.
The weak classifier for each feature is trained on the training images for fitness score calculation. The fitness score for each feature is simply the classification accuracy on the validation images using a simple classifier. We chose the random forest classifier for its efficiency. It was ranked as a top classifier in an experimental evaluation of 179 classifiers [44]. It is more popular than other classifiers, such as the support vector machine (SVM), because of its high computational efficiency [45]. It meets our requirements of being a multi-class classifier with high processing speed [46].
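The fitness computation can be sketched as follows; a toy nearest-centroid classifier stands in for the random forest used above, and the one-dimensional "feature responses" are made-up values:

```python
# Sketch of feature fitness scoring: train a simple classifier on the
# feature's outputs for the training set, then score accuracy on a
# validation set. A toy nearest-centroid classifier stands in for the
# random forest; all data values are illustrative.

def fit_centroids(values, labels):
    """Mean feature response per class."""
    sums, counts = {}, {}
    for v, y in zip(values, labels):
        sums[y] = sums.get(y, 0.0) + v
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def fitness(train_vals, train_labels, val_vals, val_labels):
    """Fitness = classification accuracy on the validation set."""
    centroids = fit_centroids(train_vals, train_labels)
    correct = 0
    for v, y in zip(val_vals, val_labels):
        pred = min(centroids, key=lambda c: abs(centroids[c] - v))
        correct += (pred == y)
    return correct / len(val_labels)

# Toy 1-D "feature responses" for a three-class problem.
score = fitness([1.0, 1.2, 5.0, 5.1, 9.0, 9.2], ["A", "A", "B", "B", "C", "C"],
                [1.1, 5.2, 8.8], ["A", "B", "C"])
```

A feature whose responses separate the classes well scores close to 1.0 and survives into the next generation's population pool.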


Feature Selection
The evolutionary learning process outputs one best feature at the end of each evolution run. A large number of features are collected from the evolutionary learning process as candidate features because the exact number needed to obtain a good classification result is hard to predict. We used a boosting technique to maintain high classification performance.
Freund et al. proposed a popular boosting algorithm, AdaBoost [47]. It constructs a classifier by combining a number of weak classifiers. A multi-class boosting algorithm called Stagewise Additive Modeling using a Multi-class Exponential loss function (SAMME) was later proposed [48], extending AdaBoost to the multi-class case. In our algorithm, we used SAMME to iteratively build an ensemble of multi-class classifiers. We increased the weight of each training sample that was misclassified by the associated weak classifier and decreased it if the sample was correctly classified. This reweighting strategy forces the trained strong classifier to correct its mistakes in subsequent iterations. The training result is a SAMME model that includes a number of weak classifiers and their associated weights, which represent how much each of them can be trusted for classification. The weighted sum of the outputs from these weak classifiers is used as the final classification result.
Figure 4 shows the prototype of our smart camera running the proposed visual inspection algorithm. All electronic components, the camera, and the optics reside in an IP64 enclosure (dust-tight and protected against splashing water) and are connected to other systems through two circular sealed connectors. The prototype uses the Nvidia TX1 module (Nvidia Corp., Santa Clara, CA, USA), where the complex computing, user interface, and file system all reside. An Arduino Due microcontroller is used mostly for processing encoder signals and coordinating the ejector outputs. A FLIR Chameleon3 USB color camera with 1280 × 1024 pixels running at 149 frames per second is included to capture RGB images. An Edmund Optics 1296 × 964-pixel NIR CCD camera is an excellent option for capturing near-infrared images in the 1500–1600 nm spectral range.
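Returning to the boosting procedure: one SAMME training round (weighted error, stage weight, and sample reweighting) can be sketched as below. The sample weights and class count are illustrative values, not the paper's data:

```python
import math

# Sketch of one SAMME boosting round: compute the weak classifier's
# weighted error, derive its stage weight, and reweight the samples so
# misclassified ones count more in the next round. Values are illustrative.

def samme_round(weights, correct, n_classes):
    """weights: current sample weights; correct: per-sample booleans from
    the weak classifier. Returns (stage_weight, normalized new weights)."""
    err = sum(w for w, c in zip(weights, correct) if not c) / sum(weights)
    # SAMME stage weight; reduces to standard AdaBoost when n_classes == 2.
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    new_w = [w * math.exp(alpha) if not c else w
             for w, c in zip(weights, correct)]     # boost misclassified
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = samme_round(weights, [True, True, True, False],
                                 n_classes=3)
```

The stage weight `alpha` also serves as the trust placed in this weak classifier when its votes are summed into the final classification.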

Functions
Our software architecture focuses on a user-friendly experience that does not require expert knowledge while providing the most accurate visual inspection results. The main responsibilities of the system include:
• Allowing the user to easily configure the camera settings;
• Saving camera settings for future use;
• Capturing images from real factory conditions;
• Allowing the user to label captured images;
• Preparing labeled data for training;
• Loading trained models;
• Receiving signals from the conveyor belt;
• Allowing the user to easily calibrate the sorting outputs;
• Classifying objects as they pass under the camera;
• Controlling signals to appropriately sort objects on the conveyor belt.
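For instance, the camera-settings responsibilities above could be as simple as a JSON round-trip; the field names and values below are assumptions for illustration, not the actual settings schema:

```python
import json

# Illustrative sketch of the "save camera settings for future use"
# responsibility: settings live in a plain dictionary and are written to a
# JSON file. Field names and values are hypothetical, not the real schema.

DEFAULT_SETTINGS = {
    "exposure_us": 5000,        # exposure time in microseconds (assumed)
    "gain_db": 6.0,
    "frame_rate_fps": 149,
    "roi": [0, 0, 1280, 1024],  # x, y, width, height
}

def save_settings(settings, path):
    """Persist the settings dictionary as indented JSON."""
    with open(path, "w") as f:
        json.dump(settings, f, indent=2)

def load_settings(path):
    """Restore a previously saved settings dictionary."""
    with open(path) as f:
        return json.load(f)

save_settings(DEFAULT_SETTINGS, "camera_settings.json")
restored = load_settings("camera_settings.json")
```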

Figure 5 shows the different software modules, the hardware modules they reside in, and how they relate to one another. Details of each software module are briefly outlined below.


Operation
Three different products were used to test our smart camera. As shown in Figure 6, a conveyor encoder and an air ejector system were connected to the camera. Objects were sorted into three categories, one of which passed through to the end of the belt while the other two were ejected using the air ejectors. These three simple cases were used to test the hardware in the smart camera and all software functions, including training and real-time sorting. A video of these demos is submitted with this paper as Supplementary Material. We used our camera and software to capture images for training. Matchsticks were classified as straight, crooked, or broken. Goldfish snacks were classified as straight, crooked, or broken. Pretzels were either whole, broken with two loops still intact, or broken with only one loop still intact. We captured 30 images for each of the three classes for each of the three test cases. Sample images of matchsticks, goldfish snacks, and pretzels for training are shown in Figure 7.
Our visual inspection algorithm is optimized by using the image transforms listed in Section 2.2 that focus on the most important aspects of object images. This allows us to achieve results similar to those of a desktop computer with only an embedded device. These optimizations also allow the system to be trained on the device itself, instead of offline using a desktop computer.

We ran our camera and the conveyor belt to perform live tests, from image acquisition to real-time product ejection. We ran roughly 20 samples for each class, and our smart camera was able to sort the products with 100% accuracy. Configuring the camera and the ejector timing control was also quick and easy. Products were ejected in real time at their designated locations. A video was included with the paper submission and will also be made available online.
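The encoder-driven ejection described above can be sketched as follows. This is a hedged illustration, not the actual firmware: the class names, the millimeters-per-count calibration, and the camera-to-ejector distance are all hypothetical.

```python
# Hypothetical sketch of ejector timing driven by conveyor encoder counts.
# All names and numbers are illustrative, not taken from the actual firmware.

def ejection_delay_counts(distance_mm: float, mm_per_count: float) -> int:
    """Encoder counts between the camera's trigger line and an ejector."""
    return round(distance_mm / mm_per_count)

class EjectorScheduler:
    """Queue classified objects for ejection once the belt has advanced far enough."""

    def __init__(self, delay_counts: int):
        self.delay_counts = delay_counts
        self.pending = []  # list of (fire_at_count, object_id)

    def on_classified(self, current_count: int, object_id: int) -> None:
        """Register an object the moment it is classified under the camera."""
        self.pending.append((current_count + self.delay_counts, object_id))

    def on_encoder_tick(self, current_count: int):
        """Return ids of objects whose ejection point has been reached."""
        due = [oid for fire_at, oid in self.pending if fire_at <= current_count]
        self.pending = [(f, o) for f, o in self.pending if f > current_count]
        return due
```

For example, with a hypothetical 250 mm between the camera and an ejector at 0.5 mm per encoder count, an object classified at count 100 would be ejected once the encoder reaches count 600.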

Experiments, Results, and Discussions
Besides the three simple test cases shown in the previous section, we created two datasets to test our visual inspection algorithm for specialty crops and aquaculture products. The first dataset includes infrared images of Medjool dates with four levels of skin delamination. The second dataset includes images of oysters with four different shape categories.
Training for both datasets was performed on a desktop computer. We limited training to 80 iterations. We performed training multiple times and obtained 30 features each time. As shown in later sections, it took the algorithm slightly over 60 iterations for the date dataset and fewer than 10 iterations for the oyster dataset to reach steady-state performance. After the features were learned, classification could be performed with the strong classifier. We also used an embedded system (Odroid-XU4) equipped with a 2 GHz Cortex-A15 processor to test the classification speed. The classification time depends on the number of features used. With 30 features, it took approximately 10 milliseconds per classification, or 100 frames per second.
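The strong classifier built from the learned features can be viewed as a weighted vote of weak classifiers, in the usual boosting style. The sketch below is generic and illustrative; the paper's exact weak learners and weight formulas are not shown, so the callables and alpha weights here are assumptions.

```python
# Generic sketch of a boosted "strong classifier": a weighted vote over weak
# classifiers, each built on one learned feature. The weak learners and the
# boosting weights (alphas) below are illustrative assumptions.
from collections import defaultdict

def strong_classify(x, weak_classifiers, alphas):
    """weak_classifiers: callables mapping a sample x to a class label;
    alphas: matching list of boosting weights. Returns the label with
    the largest total weighted vote."""
    votes = defaultdict(float)
    for h, alpha in zip(weak_classifiers, alphas):
        votes[h(x)] += alpha
    return max(votes, key=votes.get)
```

With 30 learned features, 30 such weak classifiers would vote on each object; the per-sample cost is dominated by evaluating the features themselves, which is what makes the roughly 10 ms per classification plausible on an embedded processor.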
The three example cases reported in Section 2.5.3, and even the two real-world applications discussed in this section, may be viewed as trivial. As discussed in the introduction, unlike general object recognition, which involves a large number of classes, visual inspection applications usually have two classes (good or bad) or a very small number of classes. Unlike outdoor robot vision applications, they usually have consistent lighting and a uniform image background. All these unique conditions make visual inspection a much more manageable task and make our simple algorithm a viable solution for embedded applications. The three example cases and the following two test cases clearly demonstrate the versatility of the proposed visual inspection algorithm.


Date Skin Quality
Dates are sorted into four grades in Southern California and Arizona, USA, according to the criteria shown in Table 1 [7]. With proper calibration and segmentation, the length of a date can be measured with high accuracy, since the contrast between the background and the fruit is fairly high. In this work, the focus was on testing our visual inspection algorithm on skin quality evaluation, not size measurement. According to Table 1, we labeled the four classes of skin quality as Large (<10% skin delamination), Extra Fancy (10%~25%), Fancy (25%~40%), and Confection (>40%) for our experiments. Figure 8a shows sample images of these four classes from the dataset. All images in the dataset were captured on the blue plastic chains used on a sorting machine, as shown in Figure 8b. The dates were singulated and aligned with the chain to allow only very minor rotation. The blue plastic background is very bright in the infrared image, whereas delaminated or dry date skin is slightly darker, and moist date flesh is the darkest.

Grade         Description
Jumbo         2.0" or longer with less than 10% skin delamination
Large         1.5"-2.0" with less than 10% skin delamination
Extra Fancy   1.5" or longer with 10%-25% skin delamination
Fancy         1.5" or longer with 25%-40% skin delamination
Mini          1.0"-1.5" with no more than 25% skin delamination
Confection    1.0" or longer with more than 40% skin delamination

We recognize that many image processing techniques can be used to extract hand-crafted features to solve this problem, with reported accuracies of around 86% [26] and 95%~98% [9]. This study was intended to demonstrate the versatility of our visual inspection algorithm and its advantages: it does not depend on hand-crafted features, requires only a very small number of training images, provides fast and accurate grading, and is easy to train and operate.
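The criteria in Table 1 can be encoded directly as a rule-based grading function, as a minimal sketch. The handling of boundary values (for example, exactly 25% delamination) and the fallback label for combinations the table does not cover are our assumptions, since the table does not specify them.

```python
# Direct encoding of the Table 1 grading criteria. Boundary handling and the
# "Cull" fallback label are assumptions; the table does not specify them.

def grade_date(length_in: float, delamination_pct: float) -> str:
    """Assign a grade from a date's length (inches) and skin delamination (%)."""
    if delamination_pct < 10:
        if length_in >= 2.0:
            return "Jumbo"
        if length_in >= 1.5:
            return "Large"
    if delamination_pct <= 25 and 1.0 <= length_in < 1.5:
        return "Mini"
    if length_in >= 1.5:
        if 10 <= delamination_pct <= 25:
            return "Extra Fancy"
        if 25 < delamination_pct <= 40:
            return "Fancy"
    if length_in >= 1.0 and delamination_pct > 40:
        return "Confection"
    return "Cull"  # not covered by the table (assumed label)
```

For instance, a 1.7" date with 30% delamination falls into Fancy, while a 1.2" date with 20% delamination is a Mini.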

Performance
The proposed visual inspection algorithm successfully graded the test images into four grades according to their skin delamination. An overall accuracy of 95% was achieved, easily surpassing the human graders' average accuracy of 75%~85% [9]. The confusion matrix of this experiment is shown in Table 2. The change in error rate during training is shown in Figure 9. The error rate dropped as training progressed, finally reaching a steady state of approximately 5% after 60 iterations of learning.

Of the 12 misclassifications out of 240 test samples, 10 occurred between neighboring classes and only two were misclassified by two grades. For example, one date from the Fancy class was misclassified as Confection, which is only one grade lower than Fancy, and three dates from the Confection class were misclassified as Fancy, which is only one grade higher than Confection. Grading borderline dates is often subjective, and such errors are acceptable, especially at such a low percentage (4.2% of dates were misclassified by one grade). Accepting those borderline samples as correct grades, our visual inspection algorithm correctly graded 238 of 240 samples, an impressive 99.2% accuracy.
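The two accuracy figures above (exact-grade and within-one-grade) follow mechanically from a confusion matrix whose classes are ordered by grade severity. A minimal sketch, with an illustrative matrix rather than the paper's Table 2:

```python
# Exact and "within one grade" accuracy from a confusion matrix whose rows are
# true grades and columns are predicted grades, ordered by severity.
# The test matrix used below is illustrative, not the paper's Table 2.
import numpy as np

def grading_accuracies(cm: np.ndarray):
    """Return (exact accuracy, within-one-grade accuracy)."""
    total = cm.sum()
    exact = np.trace(cm) / total
    # count predictions within one grade of the truth: |row - col| <= 1
    i, j = np.indices(cm.shape)
    within_one = cm[np.abs(i - j) <= 1].sum() / total
    return exact, within_one
```

Applied to the date results, the 10 neighboring-class errors would count as correct under the within-one-grade measure, reproducing the 238/240 (99.2%) figure quoted above.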

Visualization of Date Features
Since the features for classification were learned automatically by the evolutionary learning algorithm, it was unclear what features the algorithm actually used to obtain such good performance. As discussed previously, our features consist of a series of simple image transforms in which the output of one transform is the input of the next. To further analyze what features were learned, we selected two learned features and displayed their transform outputs. Because every training image is slightly different, we averaged the outputs of each image transform in a learned feature over all training images for visualization. The two selected features are shown in Figure 10. There are four rows in this figure, one for each grade as marked (Large, Extra Fancy, Fancy, Confection). The first feature, shown in Figure 10a, included six transforms: Median Blur, Laplacian, Gradient, Gaussian, Sobel, and Gabor. The second feature, shown in Figure 10b, included three transforms: Gradient, Sobel, and Gabor. Whiter pixels are those more common in the outputs across all training images.
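The averaging-and-rescaling step just described can be sketched in a few lines of NumPy. The transform chain itself is omitted; only the visualization arithmetic is shown, and the flat-response fallback value is our assumption.

```python
# Sketch of the visualization step: average one transform's output over all
# training images, then linearly rescale to 0-255 for display. The transform
# chain itself is omitted; the mid-gray fallback for a flat response is assumed.
import numpy as np

def average_response(outputs):
    """outputs: list of same-shaped 2-D arrays, one per training image."""
    return np.mean(np.stack(outputs), axis=0)

def to_uint8(img):
    """Linearly rescale an array to the 0-255 range for viewing."""
    lo, hi = img.min(), img.max()
    if hi == lo:  # flat response: map to mid-gray
        return np.full(img.shape, 128, dtype=np.uint8)
    return ((img - lo) * 255.0 / (hi - lo)).astype(np.uint8)
```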

The first feature emphasizes the texture of the dates, while the second emphasizes both shape and texture information. Shape and texture are the key information used in the infrared date classification in our experiments and, with these learned features, our model obtained near-perfect classification performance.


Oyster Shape Quality
We collected 300 oysters with varying shapes for this work. An experienced grader sorted them into three categories: Banana, Irregular, and Good. The Good oysters could be further graded into large, medium, and small based on their estimated diameter. Size can be easily measured by counting the number of pixels belonging to the oyster or by fitting a circle or an ellipse to estimate the diameter; we focused on shape grading in this experiment. Broken shells were combined with Irregulars, as both should be considered the lowest quality. Figure 11 shows an example of each of the three final categories, along with a sample of the Broken shape. We collected 50 pieces of Banana, 100 pieces of Irregular, and 150 pieces of Good shape. The original image size was 640 × 480. Images were rotated to align the major axis horizontally.

Oyster shape grading is subjective: a Banana shape can be hard to separate from Irregular, and some Broken shapes can appear very close to Good. An experienced grader selected 38 pieces of Banana shape, 75 pieces of Irregular shape, and 113 pieces of Good shape to generate a training set; these training samples are the least ambiguous ones in their categories. Our test set was also created by experienced graders. Our algorithm was trained and tested with these human-graded samples, which, by design, guides it to perform shape grading that satisfies human preference.
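The pixel-counting size estimate mentioned above can be sketched as follows: count foreground pixels in a binary oyster mask and convert the area to an equivalent circular diameter. The millimeters-per-pixel calibration factor is a made-up value; a real system would obtain it from camera calibration.

```python
# Sketch of the pixel-counting size estimate: foreground area of a binary
# mask converted to the diameter of a circle of equal area. The mm-per-pixel
# scale is an assumed calibration value, not from the paper.
import numpy as np

def equivalent_diameter_mm(mask: np.ndarray, mm_per_pixel: float) -> float:
    """mask: 2-D boolean array, True where the oyster is."""
    area_px = int(mask.sum())
    area_mm2 = area_px * mm_per_pixel ** 2
    return 2.0 * np.sqrt(area_mm2 / np.pi)  # diameter of equal-area circle
```

Fitting an ellipse to the mask contour would additionally give the major-axis orientation used to rotate the images horizontally.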

Performance
The proposed visual inspection algorithm successfully graded the test images into three grades according to their shape quality. An overall accuracy of 98.6% was achieved, easily surpassing the human graders' average accuracy of 75%~85%. The confusion matrix of this experiment is shown in Table 3. All the Banana and Irregular oysters were correctly classified; one Good oyster was misclassified as Irregular. The change in error rate during training is shown in Figure 12. The error rate dropped slowly as training went on and reached a steady state of 1.4% after 8 iterations of learning.


We also analyzed the performance of learned features during the evolutionary learning process and across multiple generations. Figure 13 shows the statistics of the fitness scores for the entire population in each iteration. The feature with the highest fitness score in each iteration was selected for classification. The maximum fitness score reached 100 (a feature providing a perfect classification result) after around seven learning iterations. Figure 14 shows how five different features evolved through generations. Our evolutionary learning process is able to learn good features through evolution: features f1 and f2 took 10 generations to reach their highest fitness scores; feature f3 was selected but stopped at around 40%; features f4 and f5 reached their highest fitness scores after six generations. Fitness scores evolve differently, but only the best features are selected for classification.
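The selection step described above can be sketched as a minimal evolutionary loop: each individual is a sequence of transform names, a fitness function scores it, the top half survives, and the rest are refilled with mutated copies. Everything here is a simplified assumption; the paper's actual mutation, crossover, and fitness details are not shown.

```python
# Minimal sketch of the evolutionary selection loop: individuals are transform
# sequences scored by a fitness function. Population size, mutation scheme,
# and the transform list are simplified assumptions, not the paper's exact setup.
import random

TRANSFORMS = ["gaussian", "median_blur", "sobel", "laplacian", "gradient", "gabor"]

def mutate(feature, rng):
    """Replace one randomly chosen transform in the sequence."""
    out = list(feature)
    out[rng.randrange(len(out))] = rng.choice(TRANSFORMS)
    return out

def evolve(fitness, pop_size=8, length=3, iterations=10, seed=0):
    """Return the best transform sequence found and its fitness score."""
    rng = random.Random(seed)
    population = [[rng.choice(TRANSFORMS) for _ in range(length)]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(iterations):
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) > best_score:
            best, best_score = scored[0], fitness(scored[0])
        # keep the top half, refill the rest with mutated copies
        survivors = scored[: pop_size // 2]
        population = survivors + [mutate(rng.choice(survivors), rng)
                                  for _ in range(pop_size - len(survivors))]
    return best, best_score
```

In the real system, the fitness of a feature would be its classification performance on the training set; the toy fitness in the test below merely exercises the loop.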

Visualization of Oyster Features
The most straightforward way to interpret the features our evolutionary learning algorithm constructs is to show the image transform outputs for each class. The output of each image transform in a feature was averaged over all training images and normalized to the range zero to 255 so it could be viewed as an image. Whiter pixels are those more common in the output images across all training images. As examples, two features learned from the oyster dataset are shown in Figure 15. There are three categories of oysters in the dataset: Good, Banana, and Irregular.

There are three rows in Figure 15, one for each grade as marked (Good, Banana, Irregular). The first feature, shown in Figure 15a, included four transforms: Sobel, Gabor, Gabor, and Gradient. The second feature, shown in Figure 15b, included three transforms: Gabor, Gradient, and Laplacian.
Both features in Figure 15 show that shape information is the most important piece of information extracted by the evolutionary learning algorithm. The first feature (Figure 15a) emphasizes both shape and texture information, while the feature in Figure 15b emphasizes the shape of the oysters more.

Conclusions
Vision-based computer systems have been around for decades, and machine vision has been a proven solution for food grading and inspection since the late 1980s. We have briefly discussed the latest developments in machine vision systems and reviewed sophisticated and more powerful machine learning and deep-learning methods that can easily perform the same visual inspection tasks with impressive results. As powerful as the popular convolutional neural network approaches are, they often must be fine-tuned to find the optimal hyper-parameters in order to obtain the best classification results.

Most facilities we have been in contact with have either old optical sensor-based systems or bulky computer-based vision systems, neither of which is flexible or user friendly. Most importantly, those systems cannot be easily adapted to the new challenges posed by the increasing demands for food quality and safety. Surprisingly, a compact smart camera that is versatile and capable of "learning" is not yet offered by others. What we have presented in this paper is a niche embedded visual inspection solution for food processing facilities. We have reported our design of an embedded vision system for visual inspection of food products and shown impressive results for three simple test cases and two real-world applications.
Our evolutionary learning process was developed for simplicity [21,22] and for visual inspection of food products. It is not only capable of automatically learning unique information from training images, but also of improving its performance through the use of boosting techniques. Its simplicity and computational efficiency make it suitable for real-time embedded vision applications. Unlike other robot vision applications, visual inspection for factory automation usually operates indoors and under controlled lighting, especially when using LED lights with a regulated voltage. This is another reason our simple visual inspection algorithm works well.
We performed training multiple times for our date and oyster datasets on a desktop computer, using our training images for up to 80 iterations to obtain 30 to 50 features. We then used the learned features on our smart camera equipped with a Cortex-A57 processor. The processing time was approximately 10 milliseconds per prediction, or 100 frames per second. Our algorithm proved efficient enough for very high frame rates, even on a small ARM processor.