U-Net-Based Foreign Object Detection Method Using Effective Image Acquisition System: A Case of Almond and Green Onion Flake Food Process

Supervised deep learning-based foreign object detection algorithms are tedious, costly, and time-consuming because they usually require large training datasets and annotations. These disadvantages often make them unsuitable for food quality evaluation and food manufacturing processes. Nevertheless, deep learning-based foreign object detection is an effective way to overcome the disadvantages of the conventional foreign object detection methods mainly used in food inspection. For example, color sorter machines cannot detect foreign objects whose color is similar to that of the food, and their performance is easily degraded by changes in illuminance. Therefore, we use a deep learning-based foreign object detection algorithm (model) to detect foreign objects. In this paper, we present a synthesis method to efficiently acquire a deep learning training dataset that can be used for food quality evaluation and food manufacturing processes. Moreover, we perform data augmentation using color jitter on the synthetic dataset and show that this approach significantly improves the illumination invariance of models trained on synthetic datasets. The F1-score of the model trained on the synthetic dataset of almonds at 360 lux illumination intensity reached 0.82, similar to the F1-score of the model trained on the real dataset. Moreover, the model trained on the real dataset combined with the synthetic dataset achieved a better F1-score than the model trained on the real dataset alone under changes in illumination. In addition, compared with the traditional approach of detecting foreign objects with color sorter machines, the model trained on the synthetic dataset has clear advantages in accuracy and efficiency. These results indicate that the synthetic dataset not only competes with the real dataset but that the two also complement each other.


Introduction
Foreign objects contained in raw materials of food (RMF) are not only disgusting to consumers but can also have a negative effect on health. With the increase in the consumption of processed food, consumer complaints about foreign objects mixed with food are also increasing. This weakens consumer satisfaction and causes various types of boycotts [1][2][3]. To tackle this problem, a large number of screening personnel are employed to manually ensure production quality. However, most of these manual inspections are slow and inefficient and have a low rate of foreign object detection [4]. To replace manual inspection, many food companies and laboratories in different countries have conducted various studies on foreign object detection using computer vision [5][6][7]. Figure 1 shows various methods for foreign object detection. The conventional foreign object detection method (FODM) is manual detection by humans during food inspection of green onion flakes (GOF), as shown in Figure 1a. In Figure 1b, computer vision technology assists humans in detecting foreign objects during food inspection of GOF. In Figure 1c, computer vision technology detects foreign objects in moving almonds. Both Figure 1b,c show inference stages, not training stages. Foreign objects come in various types such as insects, wood debris, plants, paper scraps, metal parts, and plastic scraps, as shown in Figure 1d. Computer vision technology is one of the best alternatives to the human eye [8][9][10]. Many researchers have proposed various image processing methods to detect foreign objects, but they have mainly relied on manually designed (handcrafted) features. Handcrafted features are obtained by directly extracting the features of each object in an image so that it can be classified into a certain class [11].
Feature extraction produces a set of features (e.g., color, shape, and texture features), and classification orders objects into groups based on their similarities and differences [12]. Most foreign object detection adopts color sorting machines based on computer vision [13]. The color sorter machines mainly exploit the color difference between RMF and foreign objects; they focus on the color of an object while ignoring its shape or texture. Therefore, the color sorting machines frequently fail to detect foreign objects of a similar color [14]. Moreover, their performance drops sharply with changes in illuminance [15], and the optimal parameters must be selected manually.
Recently, deep convolutional neural networks (DNN) have been in the spotlight, replacing handcrafted features. DNN [16] came to prominence in the 2012 ImageNet Large Scale Visual Recognition Challenge and became famous for their success in classifying a huge dataset with superior performance. Unlike handcrafted approaches that manually select the optimal parameters, DNN automatically optimize their parameters based on a training dataset. In previous food safety research, D. Rong [17] proposed a method to detect foreign objects in walnuts by combining two different convolutional neural networks; the study achieved a 95% foreign object detection rate on a self-collected dataset. Y. Shen [18] proposed a method to detect worms in stored grains, achieving a detection rate of 88 mAP on a self-collected dataset.
DNN with annotated training datasets show improvements on various image recognition tasks, including image classification [19,20], object detection [21,22], and semantic segmentation [23,24]. However, the performance of DNN greatly depends on the quality and quantity of the training data [25]. In food safety research, there are not enough datasets for training DNN, and many researchers use datasets collected by themselves. Moreover, when a DNN-based algorithm is applied to the detection of foreign objects, it requires thousands of different images and annotations for training. Manual annotation is a cumbersome task that requires a lot of time and effort. Figure 2 shows how the training data required for DNN-based algorithms are manually annotated. One manual way to collect annotations is to use annotation tools. Figure 2a shows annotations being manually collected on GOF using Labelme [26], which draws the outlines of objects as polygons. Manually annotating the GOF in Figure 2a requires at least 5 min of time and effort. The annotation collected using the annotation tool is shown in Figure 2b. Our proposed method, like the color sorting machines, focuses on the RMF and the background of the workbench, both of which can be easily obtained during food inspection. However, it uses DNN to consider not only color but also various features such as shape, texture, and size. To train the features of RMF, several images of RMF mixed with various objects are required [27]. However, our system does not perform multi-class classification; it performs pixel-wise binary classification with an RMF category and a category grouping all objects other than RMF. For example, if almond is selected as the RMF, the two classes are the almond category and the category of all objects except almond. The proposed method performs this pixel-wise binary classification using U-Net [28].
U-Net is an architecture originally used for medical cell image segmentation [29] and is recognized as a representative deep learning model for semantic segmentation due to its simple structure and high performance. Accordingly, research using U-Net is actively conducted in various fields such as agriculture, medicine, and engineering [30][31][32][33]. In addition, we introduce a method of generating synthetic images that trains U-Net to focus only on the features of RMF.
The synthesis method is a simple and easy approach to generating training datasets with minimal effort. Conventional synthesis methods [34][35][36] require manually generating the annotations of the training dataset. In contrast, the proposed method automatically acquires the mask of RMF using an effective image acquisition system based on illumination and the Otsu algorithm. The automatically acquired RMF mask is used as the annotation for the training dataset of the U-Net that detects RMF. As a result, the time and effort of collecting training datasets and annotations are dramatically reduced by the effective image acquisition system and the synthetic images. In summary, the proposed method improves the performance of FODM through the combination of U-Net, a synthetic dataset, and the Otsu algorithm [37], rather than by improving the DNN model alone.

Sample Preparation
We adopted almonds and GOF among various RMF to verify the performance of the proposed method. However, GOF and almonds are only examples; the proposed method can be widely applied to various RMF. Almond is a very familiar nut and is used to make bread, butter, cakes, and other desserts. GOF is widely used for seasoning food, removing unpleasant odors, and enriching taste. Recently, the consumption of GOF as a subsidiary material for instant food has continued to increase. From the standpoint of the acquired images, most almonds have similar features such as color, texture, shape, and size. Individual GOFs also have features similar to each other. However, thin and light GOFs tend to overlap each other. Overlapping GOFs have arbitrary shapes and sizes, and sometimes foreign objects are hidden beneath them. Note, however, that separating overlapping GOF and detecting hidden foreign objects are not considered in this paper, because overlapping GOF and hidden foreign objects are expected to be resolved by vibration of the workbench [38]. When the proposed method is deployed together with such vibration equipment, the overall performance will certainly be enhanced. Table 1 shows the training and test datasets. The training dataset was acquired at an illuminance intensity of 360 lux. The test datasets were acquired from the same samples at different illuminance intensities: test dataset (1) was acquired at 360 lux, the same illuminance intensity as the training dataset; test dataset (2) was acquired at 550 lux, brighter than the training dataset; and test dataset (3) was acquired at 175 lux, darker than the training dataset.

Equipment
The color image acquisition system was set up to acquire color images containing RMF and foreign objects, as shown in Figure 3. The backlight has a wide illuminating angle and high uniformity, with chip-mounted LEDs on a PCB at regular intervals (LXL300, CLF, KOR). This backlight removes shadows from objects and provides a constant background in the image. The transfer unit included X-axis and Y-axis stages for moving the imaging section. All components except the computer were fixed inside a dark chamber to block external light. A light meter (TES-1330A, TES, TW) was used to measure the intensity of illumination.

Proposed Method
Since all foreign objects cannot be collected, FODMs using DNN have been limited to frequently appearing foreign objects. However, both frequently and rarely appearing foreign objects are still foreign objects. Ideally, we want to detect every foreign object that can be found during food inspection, but collecting all of them to train a model is almost impossible. To resolve this matter, we propose a method for detecting foreign objects without collecting any foreign objects. The main idea is to focus only on the RMF and the background, both of which can be easily obtained during food inspection. When the proposed method removes the RMF and the background from the test image, only foreign objects naturally remain. Figure 4 shows the two main steps of the proposed method. The first step is training U-Net to predict the RMF; the training dataset of U-Net uses images with RMF pasted onto Food101 background scenes, as described in Section 2.3.2. The next step is the FODM through RMF prediction and background estimation. The proposed method uses deep learning as the main algorithm to detect foreign objects, so it is called deep learning-based foreign object detection (DLFOD). The steps for DLFOD are: (1) predict the mask of RMF from an unseen real image using the U-Net trained in Section 2.
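The core idea, removing the RMF and background masks so that only foreign objects remain, can be sketched as a simple mask operation. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation; `detect_foreign_objects` and the toy arrays are hypothetical names.

```python
import numpy as np

def detect_foreign_objects(rmf_mask, background_mask):
    """Pixels that belong to neither RMF nor background are foreign objects.

    rmf_mask, background_mask: boolean arrays of the same shape, standing in
    for the U-Net RMF prediction and the background estimation result.
    """
    return ~(rmf_mask | background_mask)

# Toy 1-D example: pixel 2 belongs to neither class, so it is flagged.
rmf = np.array([True, False, False, True])
bg = np.array([False, True, False, False])
print(detect_foreign_objects(rmf, bg))  # pixel 2 is foreign
```

In practice both masks would be two-dimensional and derived per frame, but the set logic is the same.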

Effective Image Acquisition System
Illumination plays an important role in improving camera performance, but it also creates shadows of objects. The RMF image required when generating a synthesis image should not contain shadows. Therefore, we propose combining the reflectance and transmittance modes of illumination for shadow removal, as shown in Figure 5b. Illumination operates in reflectance or transmittance mode depending on where it is installed. Typically, illumination installed above the observed object is in reflectance mode, as shown in Figure 5a; it emphasizes the features of the object in the image acquired by the camera, making the colors more vivid. In contrast, transmittance-mode illumination, installed under the observed object, is mainly used for observing the inside of thin objects. We combined the reflectance and transmittance modes for shadow removal. As a result, this method emphasized the features of the object while removing the shadows. In addition, the distinction between foreground and background became clear. This advantage is an important clue for easily obtaining the training dataset and annotations required for DNN. We use the Otsu algorithm to separate the RMF images acquired by the effective image acquisition system into foreground and background. In computer vision and image processing, the Otsu algorithm performs automatic image thresholding; the threshold is determined by minimizing the intra-class intensity variance or, equivalently, by maximizing the inter-class variance [39].
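The Otsu step can be illustrated in plain NumPy. This is a textbook reimplementation for clarity only (the paper's actual code is not specified and may use a library routine); `otsu_threshold` is a hypothetical name.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes inter-class variance (8-bit input)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2          # inter-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A bimodal image (dark foreground, bright backlight) separates cleanly.
gray = np.concatenate([np.full(100, 50), np.full(100, 200)]).astype(np.uint8)
print(otsu_threshold(gray))  # a value between the two modes
```

With the backlit images of the proposed system, the foreground/background histogram is strongly bimodal, which is exactly the case where Otsu works well.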

Generating Synthetic Images
To detect foreign objects using DNN, many images of RMF mixed with various foreign objects are required. In general, it takes a lot of time and effort to collect RMF with various foreign objects. To solve this problem, we generate synthetic images with only RMF pasted onto scenes from open datasets. Open datasets [40][41][42][43][44][45] are image collections spanning several categories of computer vision. We use open datasets to indirectly stand in for unspecified foreign objects. Figure 6 shows the main steps of generating a synthetic image: (1) prepare an image containing RMF acquired by the effective image acquisition system and an image randomly selected from the open dataset, (2) convert the RMF image to grayscale, (3) acquire the binarization mask of the RMF from the grayscale image using the Otsu algorithm, (4) apply a bitwise AND operation between the RMF image and the binarization mask, (5) obtain the reversed binarization mask by inverting the binarization mask, (6) apply a bitwise AND operation between the randomly selected open-dataset image and the reversed binarization mask, and (7) acquire the synthetic image by merging the two results. The binarization mask is used as the annotation for the resulting synthetic image. The proposed method thus acquires the training images and extracts the annotations of RMF without human effort.
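Steps (4)-(7) above amount to masked bitwise compositing, which might be sketched as follows. The function name `synthesize` and the toy arrays are hypothetical; the paper's exact implementation is not specified.

```python
import numpy as np

def synthesize(rmf_rgb, background_rgb, mask):
    """Paste RMF onto a background scene using a binary mask (steps 4-7).

    mask: uint8 array, 255 where RMF, 0 elsewhere (e.g., from Otsu).
    """
    mask3 = np.repeat(mask[..., None], 3, axis=2)      # broadcast mask to RGB
    foreground = np.bitwise_and(rmf_rgb, mask3)        # step 4: keep RMF pixels
    inv_mask3 = np.bitwise_not(mask3)                  # step 5: reversed mask
    scene = np.bitwise_and(background_rgb, inv_mask3)  # step 6: keep scene pixels
    return np.bitwise_or(foreground, scene)            # step 7: merge

# Toy 2x2 example: RMF pixels (value 100) land on a background of value 7.
rmf = np.full((2, 2, 3), 100, np.uint8)
bg = np.full((2, 2, 3), 7, np.uint8)
mask = np.array([[255, 0], [0, 255]], np.uint8)
print(synthesize(rmf, bg, mask)[:, :, 0])
```

The same `mask` array then doubles as the pixel-wise annotation, which is why no manual labeling is needed.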
To train a model that is robust to changes in illumination intensity, color jittering was performed by randomly adjusting the saturation, contrast, and brightness of the synthetic images.
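One plausible NumPy sketch of such color jittering follows; the paper does not specify its implementation, so the jitter ranges, the order of operations, and the name `color_jitter` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def color_jitter(img, brightness=0.3, contrast=0.3, saturation=0.3):
    """Randomly perturb brightness, contrast, and saturation of an RGB image.

    Each factor is drawn uniformly from [1 - x, 1 + x]; results are clipped
    back to the valid [0, 255] range.
    """
    img = img.astype(np.float32)
    b = rng.uniform(1 - brightness, 1 + brightness)
    img *= b                                          # brightness: global scale
    c = rng.uniform(1 - contrast, 1 + contrast)
    img = (img - img.mean()) * c + img.mean()         # contrast: spread around mean
    s = rng.uniform(1 - saturation, 1 + saturation)
    gray = img.mean(axis=2, keepdims=True)
    img = (img - gray) * s + gray                     # saturation: distance from gray
    return np.clip(img, 0, 255).astype(np.uint8)
```

Applying this per epoch exposes the model to many simulated illuminance conditions from a single 360 lux capture, which is the mechanism behind the illumination robustness reported later.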

Raw Materials of Food Prediction
We require a method that predicts the region of RMF in input images containing RMF, background, and foreign objects. In addition, the predicted output image should have the same spatial resolution as the input image. Semantic segmentation is the task of assigning a categorical annotation to every pixel in a given image, and it produces an output with the same resolution as the input. We predict the region of RMF using U-Net. U-Net has been used to detect various objects such as vehicles and medicine, but no prior work addresses the detection of RMF such as almond and GOF. However, RMF are similar to medical cells in that other RMF or foreign objects lie adjacent to each other and are symmetrical both vertically and horizontally. U-Net uses the overlap-tile technique to train on symmetric and adjacent cells. Therefore, we train on the RMF using U-Net, which enables segmentation of symmetric and adjacent objects via the overlap-tile technique and outputs an image with the same resolution as the input.
The architecture of the U-Net is shown in Figure 7. It consists of contraction and expansion paths and does not use lateral connections between them. The contraction path is made of contraction blocks. Each block applies two 3 × 3 convolutions, each followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling operation [46] with stride 2 for downsampling. The number of feature maps doubles after each block. A feature map corresponds to the activation of different parts of the image; it records where a certain kind of feature is found, and high activation means that feature was found. As the number of feature maps increases, the architecture can learn complex structures more effectively because it can find more kinds of features in the image [47]. For example, the first feature maps might look for curves, the next for combinations of curves that build circles, and the next for features extending from circles. Every block in the expansive path consists of a 2 × 2 up-convolution and two 3 × 3 convolutions, each followed by a ReLU. The expansive path ensures that the features learned while contracting the image are used to reconstruct it. At the final layer, a 1 × 1 convolution maps each 64-component feature vector to the 2 classes. In total, U-Net has 23 convolutional layers.
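The contraction-path arithmetic described above (two unpadded 3 × 3 convolutions per block, 2 × 2 max pooling, channel doubling from 64) can be traced in a few lines of Python, assuming the standard 572 × 572 U-Net input tile; `contraction_path` is a hypothetical helper, not the paper's code.

```python
def contraction_path(size=572, depth=4):
    """Trace spatial size and channel count through the U-Net contraction path.

    Each block: two unpadded 3x3 convolutions (each shrinks the map by 2),
    then 2x2 max pooling with stride 2; channels double per block.
    """
    channels, trace = 64, []
    for _ in range(depth):
        size = size - 2 - 2            # two 3x3 convolutions
        trace.append((size, channels))
        size //= 2                     # 2x2 max pool, stride 2
        channels *= 2
    size = size - 2 - 2                # bottleneck convolutions
    trace.append((size, channels))
    return trace

print(contraction_path())
# [(568, 64), (280, 128), (136, 256), (64, 512), (28, 1024)]
```

This makes concrete why unpadded convolutions shrink the output relative to the input, which is what the overlap-tile technique compensates for at inference time.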
The energy function is computed by a pixel-wise sigmoid over the final feature map combined with the cross-entropy loss function. The sigmoid layer at the end of the model produces a two-channel output and then an output image containing the result: whether each pixel is green onion flakes or not. The sigmoid used to train the model is shown in Equation (1):

σ(x) = 1 / (1 + e^(−x)),  (1)

where x is the input data. The cross-entropy used to train the model is shown in Equation (2):

E = −Σ_i t_i · log(s_i),  (2)

where t_i ∈ {0, 1}^C is the true label of each pixel and s_i ∈ [0, 1]^C is the sigmoid output.
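Equations (1) and (2) can be checked numerically with a small NumPy sketch. This uses the usual two-term binary form of the cross-entropy; the clipping constant `eps` is a numerical-stability assumption, not from the paper.

```python
import numpy as np

def sigmoid(x):
    """Equation (1): squash logits into [0, 1]."""
    return 1.0 / (1.0 + np.exp(-x))

def cross_entropy(t, s, eps=1e-12):
    """Equation (2), binary form: mean pixel-wise cross-entropy.

    t: true labels in {0, 1}; s: sigmoid outputs in [0, 1].
    """
    s = np.clip(s, eps, 1 - eps)  # avoid log(0)
    return -np.mean(t * np.log(s) + (1 - t) * np.log(1 - s))

logits = np.array([2.0, -1.0, 0.0])
labels = np.array([1.0, 0.0, 1.0])
print(cross_entropy(labels, sigmoid(logits)))
```

A perfect prediction drives the loss toward zero, while confident wrong predictions are penalized heavily, which is what pushes the pixel-wise classifier toward the annotation masks.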

Background Estimation
To leave only foreign objects in the image, we need to remove the background, which corresponds to the surface of the workbench. Figure 3c shows the workbench, which incorporates a backlight. On a workbench without a backlight, shadows appear around the RMF, as shown in Figure 8a. Although a shadow is not a foreign object, the FODM is highly likely to predict it as one. Hence, the workbench with a backlight is effective in removing shadows from the objects, as shown in Figure 8c, and provides a white background. White background pixels have high gray-level intensity [48], and the color intensity of the background is very uniform. Consequently, the minimum intensity of an image of the empty workbench is used to calculate the threshold that determines whether a pixel belongs to the background.
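The background rule described above might be sketched as follows. The safety `margin` parameter and the function name are hypothetical additions for illustration, not from the paper.

```python
import numpy as np

def background_mask(image_gray, empty_bench_gray, margin=5):
    """Classify pixels at or above the empty-workbench minimum as background.

    empty_bench_gray: grayscale image of the empty, backlit workbench.
    margin: hypothetical tolerance subtracted from the minimum intensity.
    """
    threshold = int(empty_bench_gray.min()) - margin
    return image_gray >= threshold

# Bright backlit pixels (>= threshold) are background; darker objects are not.
bench = np.array([[250, 255], [248, 252]], np.uint8)
img = np.array([[251, 100], [249, 40]], np.uint8)
print(background_mask(img, bench))
```

Because the backlight keeps the bench uniformly bright, a single global threshold derived from one empty-bench capture is enough.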

Histogram Backprojection
Most food inspection methods use color sorting machines to detect foreign objects [13,49]. The color sorter machines detect foreign objects based on the color difference between RMF and foreign objects [50]. Color-based foreign object detection mainly uses the histogram backprojection algorithm [51], so it is called histogram backprojection-based foreign object detection (HBFOD). A histogram can be used to roughly inspect the distribution of pixels in an image. Backprojection records how well the pixels of a given image fit the distribution of pixels in the histogram model. By deriving histograms of both a target image and a source image, histogram backprojection calculates the ratio histogram of the source with respect to the target [52]. The source S is determined from the object to be found, and the target T is the image to be searched. A ratio histogram R is obtained by dividing S by T:

R_i = S_i / T_i,

where i is the index of a bin. This ratio histogram R is then backprojected onto the image:

b_{x,y} = R[h(C_{x,y})],

where C_{x,y} is the pixel value at (x, y), h(C_{x,y}) is the bin corresponding to C_{x,y}, and b_{x,y} is the backprojected image.
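The ratio-histogram and backprojection equations can be sketched for a grayscale image, where the binning function h(C) is simply the pixel value itself and R is clipped to [0, 1], as is common; the names here are hypothetical.

```python
import numpy as np

def backproject(source, target, bins=256):
    """Grayscale histogram backprojection.

    Computes the ratio histogram R_i = S_i / T_i (clipped to [0, 1]) and
    then looks up b[x, y] = R[target[x, y]].
    """
    S = np.bincount(source.ravel(), minlength=bins).astype(float)
    T = np.bincount(target.ravel(), minlength=bins).astype(float)
    # Bins absent from the target get ratio 0; ratios above 1 are clipped.
    R = np.minimum(np.divide(S, T, out=np.zeros_like(S), where=T > 0), 1.0)
    return R[target]

# Pixels matching the source's intensity score 1; unmatched pixels score 0.
source = np.full((4, 4), 10, np.uint8)       # model of the object to find
target = np.array([[10, 20], [20, 10]], np.uint8)
print(backproject(source, target))
```

This color-only lookup is exactly why HBFOD struggles when a foreign object shares the RMF's color: both fall in the same bins and receive the same score.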

Metrics to Evaluate the DNN Model
We assessed the performance of the FODM with the F1-score [53], the harmonic mean of precision and recall computed from the number of foreign objects detected. The highest possible F1-score is 1.0, and the lowest is 0. Recall is the ratio of the number of correctly detected foreign objects to the number of actual foreign objects. Precision is the ratio of the number of correctly detected foreign objects to the total number of detections, including RMF falsely detected as foreign objects. A high F1-score means that precision and recall are both high: the regions of foreign objects are accurately detected, and RMF regions are not detected as foreign object regions. A low F1-score means that there is a gap between precision and recall, or that both are low; in that case, even if foreign object regions are detected, RMF regions are also frequently falsely detected as foreign objects. The mean F1-score is the average F1-score over the types of foreign objects included in the test data. A high mean F1-score means that various types of foreign objects can be detected without misrecognizing RMF regions as foreign objects, which makes the method suitable for food inspection.
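The precision/recall/F1 computation from detection counts can be written directly; the function `f1_score` and the example counts are illustrative only.

```python
def f1_score(tp, fp, fn):
    """F1 from counts of detected foreign objects.

    tp: correctly detected foreign objects
    fp: RMF or background falsely detected as foreign objects
    fn: actual foreign objects that were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(8, 2, 2))  # precision = recall = 0.8, so F1 = 0.8
```

The mean F1-score reported in the results is then simply this value averaged over the foreign object types in the test data.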

Results and Discussion
We compare the effectiveness of the proposed synthesized dataset against the human-annotated dataset. Firstly, we show that the effective image acquisition system obtains images in which the foreground (RMF) and background (workbench) can be easily distinguished. Secondly, we generate synthetic images by pasting the RMF obtained from the effective image acquisition system onto randomly selected backgrounds from Food101. Lastly, we compare the foreign object detection performance on the test dataset of a U-Net trained on the proposed synthesized dataset with automatically generated annotations against one trained on the real dataset with human-made annotations.

Effective Image Acquisition System Result
To acquire RMF images and binarization masks for synthesis image generation, we propose the effective image acquisition system with both reflectance and transmittance modes. The image acquired in reflectance mode has a shadow as shown in Figure 8a, whereas the image acquired in the proposed system has no shadow as shown in Figure 8c, and the foreground and background can be clearly distinguished. The acquired image is automatically converted to a binarization mask using the Otsu algorithm. In the binarization mask obtained based on the reflectance mode, it is difficult to distinguish the boundary between the foreground and the background due to shadows as shown in Figure 8b. On the other hand, the binarization mask obtained based on the proposed system can clearly distinguish the boundary between the foreground and the background as shown in Figure 8d. A method similar to our proposed effective image acquisition system is to acquire an object mask using a depth sensor. The Big Berkeley Instance Recognition Dataset [54] provides object masks using a depth sensor, and many researchers use it as a training dataset for semantic segmentation. However, the depth sensor is difficult to use for RMF that are attached to the background or thin. On the other hand, the proposed image acquisition system is advantageous for acquiring a mask of a thin object such as GOF.

Synthetic Image Result
We augmented the training dataset using synthetic images; the images in Figure 9 are examples of the data augmentation. To generate the synthetic images, we chose the Food101 dataset as the open dataset for synthesizing RMF. The Food101 dataset [55] presented in [41] consists of food images and related objects; it contains 101 food categories and 101,000 images. For example, Food101 samples include spinach, carrots, cucumbers, and mushrooms, which are natural objects and in some categories similar to RMF; it also includes man-made objects such as metal, glass, and paper. A synthetic image is generated by combining an RMF image with a randomly selected image from the Food101 dataset. In Figure 9a, almond and GOF are acquired by the effective image acquisition system. In Figure 9b, the acquired image is separated into the RMF region and the background using the Otsu algorithm; the separated RMF region is used as the annotation for the training dataset. Figure 9c shows synthetic images in which RMF from the training dataset are pasted onto randomly selected backgrounds from the Food101 dataset, so that the RMF are surrounded by various food-related objects. Color jittering was performed by randomly adjusting the saturation, contrast, and brightness of the synthetic images.

Evaluation of the Synthesis Images
To evaluate the performance of DLFOD across datasets, we conducted experiments using the acquired real images, the synthetic images, or both as training datasets. Table 1 shows the number of RMF and foreign objects used in the training and test datasets. The real images with human-made annotations consisted of RMF and real foreign objects from the training dataset. In contrast, the synthesized images with automatically generated annotations consist of RMF from the training dataset pasted onto randomly selected backgrounds from the Food101 dataset. The test dataset consists of real images of RMF mixed with real foreign objects, acquired from the same samples at different illuminance intensities (360, 175, and 550 lux).
We used the DLFOD introduced in Section 2.3 and initialized all the weights in the training model with values drawn randomly from a Gaussian. We trained all models for 50 epochs using SGD with momentum, with a learning rate of 0.001, momentum of 0.9, and batch size of 5; a weight decay of 0.0005 was also used. We set all the loss weights to 1.0 in our experiments and used the same random seed throughout so that the model hyperparameters remained consistent. Table 2 shows the evaluation results of the FODM performance of DLFOD according to the training datasets. The mean F1-score on test dataset (1) of DLFOD trained on the synthetic images (almonds) reached 0.82, similar to the mean F1-score of DLFOD trained on the real images (rows 1 vs. 3). We collected manual annotations of the real images (almonds) in the training dataset using the annotation tool in Figure 2a, which took about 1-2 min per sample, about 40 h in total. In contrast, the synthetic images require no human annotation, saving those 40 h of effort. The mean F1-score on test dataset (1) of DLFOD trained on the GOF synthetic images reached 0.70, lower than for almond (rows 10 vs. 12). Almonds have a distinct shape and texture compared to GOF and do not overlap each other, so it is easy for the DLFOD to accurately learn almond features from the synthetic images. Conversely, GOF is thin and easily overlaps with other GOFs, so the shape of the GOF is unclear, making it harder for the DLFOD to learn GOF features. This agrees with [56], which found that higher-level DNN layers concentrate on the shape of an object.
In Figure 10, (a) shows foreign objects with a color similar to that of GOF mixed with GOF at an illuminance intensity of 360 lux, (b) shows that some foreign objects with a shape and color similar to GOF are falsely detected, (c) shows foreign objects mixed with GOF, and (d) shows that some RMF and foreign objects are falsely detected. The mean F1-score on test dataset (2) of DLFOD trained on the synthetic images (almonds) reached 0.80, whereas DLFOD trained on the real images reached 0.78, a lower performance. In addition, the mean F1-score on test dataset (3) of DLFOD trained on the real images (almonds) was 0.74, again lower than with the synthetic images. Overall, the mean F1-scores on test datasets (2, 3) of DLFOD trained on the real images varied considerably with changes in illuminance, whereas DLFOD trained on the synthetic images, augmented with color jitter, showed relatively small performance differences across illuminance changes. Combining the real images with the synthetic images can overcome the weakness of the real training dataset to changes in illuminance. These results show that the synthetic dataset not only competes with the real dataset but that the two also complement each other. Table 3 shows the evaluation results of DLFOD trained using the proposed synthetic images and of HBFOD. Figures 11 and 12 show the foreign object detection of the models for test images acquired at the same illuminance (360 lux) as the training dataset. To emphasize the foreign object detection performance on RMF, the foreign object regions in the images are marked in red.
In Figure 11, (a) shows images of foreign objects of various colors (plastic) mixed with RMF at an illuminance intensity of 360 lux, (b) shows the foreign object detection result of DLFOD for foreign objects of various colors, and (c) shows the foreign object detection result of HBFOD for foreign objects of various colors. For both DLFOD and HBFOD, the foreign object detection result was reasonably good, and all regions of the foreign object were highlighted in red. In Figure 12, (a) shows images of foreign objects (fly eggs, plants, paper scraps) having a color similar to that of a food raw object mixed with RMF at an illuminance intensity of 360 lux, (b) shows the detection result of DLFOD, and (c) shows the foreign object detection result of HBFOD. The foreign object detection result of the DLFOD was reasonably good. On the other hand, HBFOD could not detect foreign objects similar to RMF. To evaluate the performance of DLFOD and HBFOD according to changing illumination intensity, we conducted foreign object detection experiments in various illumination intensities, and Figures 13 and 14 are examples of the experimental results. In order to emphasize the foreign object detection performance in RMF, the foreign object regions from the image are marked in red. In Figure 13, (a) shows images of foreign objects (fly) mixed with RMF at an illuminance intensity of 550 lux, (b) shows the detection result of DLFOD, and (c) shows the foreign object detection result of HBFOD. The foreign object detection result of the DLFOD was reasonably good. HBFOD could distinguish between RMF and foreign objects but only detected a part of the foreign objects. In Figure 14, (a) shows images of foreign objects (fly) mixed with RMF at an illuminance intensity of 175 lux, (b) shows the detection result of DLFOD, and (c) shows the foreign object detection result of HBFOD. DLFOD was reasonably good. 
HBFOD could distinguish between RMF and foreign objects but detected only part of the foreign objects. Additionally, HBFOD falsely detected shadows or parts of RMF as foreign objects. Figure 15 shows the foreign object detection platform, which was implemented using the proposed method to verify its applicability in food inspection. It runs in an Ubuntu 18.04 environment and is written in Python. The platform consists of a screen that outputs the image acquired by the camera and a screen that outputs only foreign objects. After the pixels are classified by the proposed method, foreign objects are highlighted with red lines and bounding boxes on the foreign-object screen to increase the visibility of the classification results.

Conclusions
We proposed a method to detect foreign objects regardless of the type of foreign objects, focusing on RMF and background detection. In particular, we proposed a method that effectively collects the training data required for RMF prediction using U-Net. From a practical standpoint, the effective image acquisition system afforded the possibility to collect training data that can detect RMF and foreign objects without manual annotation.
HBFOD extracts features from images of foreign objects and food based on color and experience. This method cannot detect foreign objects with a color similar to RMF, and its performance is easily degraded by changes in illuminance. This paper used a DNN-based foreign object detection method to solve these problems of the conventional method. As a result, DLFOD achieved higher foreign object detection performance than HBFOD, although performance differed by type of RMF, such as almond and GOF. Additionally, the DNN trained on the proposed synthetic images was robust to changes in illuminance compared to HBFOD. However, our proposed method is not suitable for general object detection, because objects must have similar viewpoints and scales and the background must be monotonous. It is well suited to food quality evaluation, where the background is monotonous and images are acquired at the same viewpoint and scale by a camera installed at a fixed location. Nevertheless, the detection of foreign objects mixed with thin and overlapping RMF such as GOF still needs to be investigated. Future work will focus on DNN using multi-waveband imaging hardware; we are convinced that acquiring image datasets with more RMF features using such hardware will further improve foreign object detection performance.