Surface Defect Detection of Fresh-Cut Cauliflowers Based on Convolutional Neural Network with Transfer Learning

A fresh-cut cauliflower surface defect detection and classification model based on a convolutional neural network with transfer learning is proposed to address the low efficiency of traditional manual inspection of fresh-cut cauliflower surface defects. A total of 4790 images of fresh-cut cauliflower were collected in four categories: healthy, diseased, browning, and mildewed. In this study, the pre-trained MobileNet model was fine-tuned to improve training speed and accuracy. The model was optimized by selecting the best combination of training hyper-parameters and adjusting the number of frozen layers, so that the parameters downloaded from ImageNet were optimally integrated with the parameters trained on our own data. Test results were compared against VGG19, InceptionV3, and NASNetMobile. Experimental results showed that on the test set the MobileNet model's loss value was 0.033, its accuracy was 99.27%, and its F1 score was 99.24% when the learning rate was set to 0.001, the dropout to 0.5, and the number of frozen layers to 80. Compared with the other models, this model had better capability and stronger robustness and was more suitable for the surface defect detection of fresh-cut cauliflower, demonstrating the feasibility of the method.


Introduction
Cauliflower (Brassica oleracea L. botrytis) is a variety of Brassica wild cabbage. It has become a popular vegetable in the world because of its rich nutritional value, high yield, and large economic benefits.
Fresh-cut fruits and vegetables, also known as semi-processed fruits and vegetables, are ready-to-eat or ready-to-use products made from fresh fruits and vegetables through a series of processing steps such as grading, cleaning, trimming, preserving, and packaging, followed by low-temperature transportation to refrigerated retail. Fresh-cut cauliflower not only retains the original freshness of the cauliflower itself, but processing also makes it clean and hygienic, meeting the contemporary demand for natural, nutritious food that fits a fast-paced lifestyle. Quality grading is therefore an important part of fresh-cut cauliflower processing. At present, the surface defect detection of fresh-cut cauliflower is largely performed manually, which is slow and imprecise. Under such conditions, the quality of fresh-cut cauliflower is hard to guarantee, which greatly restricts the scale of production, transportation, and storage in the fresh-cut cauliflower industry. Thus, recognizing and detecting fresh-cut cauliflower surface defects accurately is necessary and urgent, and would bring significant economic benefits to the development of the fresh-cut cauliflower industry.

Dataset Acquisition
In this experiment, "Xuebai" cauliflower was taken as the research subject. "Xuebai" cauliflower is a variety of cauliflower in China with uniform, firm flower bulbs and good merchantability. The image acquisition facilities included a computer, darkroom, lifting platform, CCD industrial camera, and LED light bars. The background of the cauliflower image is white, which is convenient to reduce the interference of other environmental factors in the image acquisition. The shooting angle of the camera was vertically downward, 10 cm from the cauliflower. The image acquisition facilities are shown in Figure 1. Fresh-cut cauliflower samples were placed by the experimenter in the center of the lifting platform in the darkroom and then photographed by a computer-controlled CCD industrial camera. Fresh-cut cauliflower samples can be classified into four categories: healthy, diseased, browning, and mildewed. Healthy cauliflowers had full flower bulbs and snowy white color. Diseased cauliflowers were due to bacterial infection of stems, resulting in local blackening of stems, and had lost edible value. The specific performance of browning cauliflowers was that the surface of the cauliflower had brown spots and rot symptoms. Mildewed cauliflowers were infected by gray mold disease, resulting in gray-white colonies on the surface. For data collection, 479 cauliflower samples were chosen. Image enhancement techniques were used to increase the number of datasets so that the model obtained enough information during training to improve its generalization ability [29].
The following image enhancement techniques were used: translation, inversion, rotation, saturation enhancement, and brightness adjustment. Finally, 4790 images were acquired. The composition of the original dataset, augmented dataset, training set, validation set, and test set is shown in Table 1. Part of the original images is provided in Figure 2.

Table 1. Dataset description.
Category    Original  Augmented  Training  Validation  Test
Healthy     150       1500       900       300         300
Diseased    103       1030       610       210         210
Browning    114       1140       700       220         220
Mildewed    112       1120       670       220         230
Total       479       4790       2880      950         960
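The augmentation step can be sketched with plain NumPy array operations. This is a hedged illustration rather than the authors' code: the saturation adjustment is omitted, and a wrap-around roll stands in for translation.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))  # stand-in for one cauliflower RGB image in [0, 1]

def augment(img):
    """Yield simple augmented variants: flips, rotation, brightness, translation."""
    yield np.flip(img, axis=1)              # horizontal inversion
    yield np.flip(img, axis=0)              # vertical inversion
    yield np.rot90(img, k=1, axes=(0, 1))   # 90-degree rotation
    yield np.clip(img * 1.2, 0.0, 1.0)      # brightness increase
    yield np.clip(img * 0.8, 0.0, 1.0)      # brightness decrease
    yield np.roll(img, shift=10, axis=1)    # crude translation (wrap-around)

variants = list(augment(image))
print(len(variants))  # 6 augmented copies per original image
```

With roughly ten variants per image (as in the paper's 479 → 4790 expansion), each original sample yields a family of perturbed copies that share the same class label.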

CNN and Transfer Learning
CNN is a variant of the multi-layer perceptron and is often applied to image classification in the field of deep learning. Training a CNN model usually requires a tremendous amount of data and computation, and it is difficult to achieve the desired effect if the training dataset is relatively small. Training from scratch can be time-consuming and places high demands on computer hardware. To solve these problems, we used transfer learning [30]: exploiting the similarity between data or tasks, an existing trained model is migrated to help train a new model, which is then adapted to the assigned task, for example through a fine-tuning strategy. The main idea is to adjust one or more layers of the pre-trained model to obtain better training results.

Model Establishment
Currently, the mainstream pre-trained models are VGGNet [31], ResNet50 [32], Xception [33], InceptionV3 [34], and MobileNet [35]. However, some models have vast numbers of parameters; VGG16, for example, has 138,357,544 parameters, resulting in a model size of 528 MB. Models that are too large require high computing power and large memory for training and inference, which makes it difficult for them to run on mobile and embedded devices. Given these issues, a lightweight model is needed to detect surface defects of fresh-cut cauliflower. MobileNet, proposed by the Google team in 2017, is a lightweight CNN designed for mobile and embedded devices. Compared with traditional CNNs, it greatly reduces the number of parameters and operations with only a small reduction in accuracy: on ImageNet classification, MobileNet is only 0.9% less accurate than VGG16 while being 32 times smaller, and it is 0.8% more accurate than GoogLeNet while being 1.6 times smaller. Based on this analysis, MobileNet, with its small size and good recognition performance, was selected for this experiment. The architecture of MobileNet is shown in Figure 3.
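MobileNet's savings come from replacing standard convolutions with depth-wise separable ones. Per the MobileNet paper, the ratio of multiply-adds (separable over standard) reduces to 1/N + 1/Dk², where N is the number of output channels and Dk the kernel size, which a few lines of Python can check:

```python
# Cost ratio of a depth-wise separable convolution vs. a standard convolution.
def cost_ratio(kernel_size: int, out_channels: int) -> float:
    # standard:  Dk*Dk*M*N*Df*Df
    # separable: Dk*Dk*M*Df*Df + M*N*Df*Df
    # ratio = 1/N + 1/Dk^2  (input channels M and feature map size Df cancel)
    return 1 / out_channels + 1 / kernel_size**2

# a typical MobileNet layer: 3x3 kernels, 256 output channels
r = cost_ratio(3, 256)
print(f"separable conv needs {r:.1%} of the multiply-adds")  # ≈ 11.5%
```

For 3×3 kernels the ratio is dominated by the 1/9 term, which is why MobileNet is roughly 8–9 times cheaper per layer than an equivalent standard CNN.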

The architecture of MobileNet includes 13 depth-wise separable convolutions (DSCs), and the structure of a DSC is shown in Figure 4. Each DSC contains a DepthwiseConv2D and a PointwiseConv2D, and each Conv2D is followed by Batch Normalization [36] and a ReLU. For feature extraction of the different defects on the fresh-cut cauliflower surface, we first imported the parameters from ImageNet, then removed the fully connected layer and the softmax layer, keeping only the convolutional module for extracting image features; this served as the feature extraction module of the deep transfer learning model in this research.
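The DepthwiseConv2D + PointwiseConv2D structure described above can be sketched in NumPy. This is an illustrative re-implementation rather than the paper's code; batch normalization is omitted for brevity.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """x: (H, W, M); dw_kernels: (k, k, M); pw_weights: (M, N)."""
    H, W, M = x.shape
    k = dw_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1
    # DepthwiseConv2D: one k x k filter per input channel (no channel mixing)
    dw = np.empty((Ho, Wo, M))
    for c in range(M):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * dw_kernels[:, :, c])
    # PointwiseConv2D: 1 x 1 convolution that mixes channels
    pw = dw @ pw_weights           # (Ho, Wo, N)
    return np.maximum(pw, 0.0)     # ReLU (Batch Normalization omitted)

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 3))
out = depthwise_separable_conv(x, rng.standard_normal((3, 3, 3)),
                               rng.standard_normal((3, 16)))
print(out.shape)  # (6, 6, 16)
```

The spatial filtering (depthwise) and channel mixing (pointwise) are thus factored into two cheap steps, which is the source of the cost reduction discussed earlier.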
For the classification of different defects on the fresh-cut cauliflower surface, the number of output categories in the last layer of MobileNet was changed from 1000 to 4, and a dropout layer was added after the global average pooling layer to suppress the overfitting of the model [37]. Finally, the results were output with a softmax layer. These were used as the classifier module of the deep transfer learning model in this research.
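The classifier module described above (global average pooling, dropout, and a 4-way softmax) can be sketched at inference time in NumPy. This is an illustration, not the authors' implementation; the weight shape assumes MobileNet's 1024-channel final feature maps, and dropout is inactive at inference.

```python
import numpy as np

def classifier_head(feature_maps, W, b):
    """feature_maps: (H, W, C) from the feature extractor; W: (C, 4); b: (4,)."""
    pooled = feature_maps.mean(axis=(0, 1))   # global average pooling -> (C,)
    # dropout is applied only during training; at inference it is the identity
    logits = pooled @ W + b                   # dense layer: 4 defect categories
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(2)
probs = classifier_head(rng.standard_normal((7, 7, 1024)),
                        rng.standard_normal((1024, 4)) * 0.01,
                        np.zeros(4))
print(probs.sum())  # probabilities over {healthy, diseased, browning, mildewed}
```

In Keras terms this corresponds to GlobalAveragePooling2D → Dropout → Dense(4, activation="softmax") placed on top of the frozen convolutional base.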

CNN Training
This research trained the modified MobileNet network. To obtain better training performance, a GPU was used to accelerate training [38]. To ensure that the model learned the data features sufficiently during training, the number of training epochs was set to 100. To balance memory capacity and training efficiency, the batch size was set to 32.

Evaluation Metrics
The accuracy, precision, recall, and F1 score can be calculated from the data in the confusion matrix and used as evaluation metrics to assess the performance of the deep learning architecture [39]. The calculation formulas and short descriptions of these evaluation metrics are shown in Table 2.
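These metrics can be computed directly from a confusion matrix. The sketch below uses an invented 4-class matrix sized to this paper's test split (960 images) purely for illustration, with macro-averaged precision, recall, and F1.

```python
import numpy as np

def metrics_from_confusion(cm):
    """cm[i, j] = count of samples with true class i predicted as class j.
    Returns accuracy and macro-averaged precision, recall, F1."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # per class: TP / (TP + FP)
    recall = tp / cm.sum(axis=1)      # per class: TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# hypothetical confusion matrix for {healthy, diseased, browning, mildewed},
# with row totals matching the paper's test split (300/210/220/230)
cm = np.array([[298,   1,   1,   0],
               [  2, 207,   1,   0],
               [  1,   1, 218,   0],
               [  0,   0,   1, 229]])
acc, prec, rec, f1 = metrics_from_confusion(cm)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} F1={f1:.4f}")
```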

Hyper-Parameter Optimization
The learning rate is a significant hyper-parameter in the model training process, representing the update rate of the model weights [40]. It is difficult to get the objective function to converge to its local minimum in a reasonable amount of time when the learning rate is too large or too small [41]. In this experiment, five learning rates were tested (0.1, 0.01, 0.001, 0.0001, and 0.00001), and the best one was selected experimentally.
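The effect can be illustrated on a toy objective f(w) = w² minimized by gradient descent: a one-dimensional stand-in for the training loss, not the actual model. Too large a step diverges, too small a step barely moves, and a moderate step converges.

```python
# Minimize f(w) = w^2 with gradient descent at different learning rates.
def final_w(lr, steps=100, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w   # gradient of w^2 is 2w
    return w

for lr in (1.1, 0.1, 1e-5):
    print(f"lr={lr}: w after 100 steps = {final_w(lr):.3g}")
```

At lr = 1.1 the iterate overshoots and grows without bound, at lr = 0.1 it converges to the minimum, and at lr = 1e-5 it is still near its starting point after 100 steps, mirroring the slow-convergence behavior reported below for the smallest learning rates.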

Table 2. Evaluation metrics.
Evaluation Metric  Calculation Formula                              Short Description
Accuracy           (TP + TN) / (TP + TN + FP + FN)                  The ratio of all correct predictions (both positive and negative categories) to the total number of samples.
Precision          TP / (TP + FP)                                   The ratio of correctly predicted positive samples to all samples predicted as positive.
Recall             TP / (TP + FN)                                   The ratio of correctly predicted positive samples to all actual positive samples.
F1 score           2 x Precision x Recall / (Precision + Recall)    The harmonic mean of precision and recall.

When it comes to CNN models, a trained model is prone to overfitting if it has many parameters and the training sample is small. Dropout can mitigate the risk of overfitting and achieve a regularization effect to a certain extent: the activation values of some neurons are randomly set to zero during training to reduce the dependency between neurons, thus suppressing overfitting and improving the model's generalization capability [42]. The model development process used a combination of dropout and learning rate adjustment for the optimal selection of hyper-parameters; four dropout values were tested (0, 0.25, 0.5, and 0.75).
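The dropout mechanism can be sketched as "inverted dropout" in NumPy. This is the textbook formulation commonly used by deep learning frameworks, assumed here rather than taken from the paper: surviving activations are rescaled so the expected activation is unchanged, and the layer is the identity at inference time.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of activations during training
    and rescale the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(3)
a = np.ones(10000)
dropped = dropout(a, rate=0.5, training=True, rng=rng)
print((dropped == 0).mean())   # ≈ 0.5 of the neurons are silenced
print(dropped.mean())          # ≈ 1.0, expectation preserved by rescaling
```

Randomly silencing half of the units on each step prevents any single neuron from being relied upon, which is the dependency-breaking effect described above.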

Influence of Dropout
The loss, accuracy, precision, recall, and F1 score values of the 20 test groups are shown in Table 3. From Table 3, it is apparent that the combination of learning rate and dropout had a remarkable effect on the loss, accuracy, and F1 score. On the test set, when the learning rate was 0.1, 0.01, 0.0001, or 0.00001, the loss value gradually increased as dropout increased. When the learning rate was 0.1, 0.01, 0.001, or 0.0001 and dropout was 0, the model did not achieve its highest accuracy, which indicates that an appropriate dropout value can effectively suppress overfitting. When the learning rate was 0.00001, the test set's accuracy and F1 score decreased as the dropout rate increased, while the loss value gradually increased, indicating that the model did not overfit.

Influence of Learning Rate
The curves of validation accuracy and validation loss during training are shown in Figures 5-8. Comparing the curves for the five learning rates, the speed at which the curves converged to a smooth state increased with the learning rate, and the loss values were close to 0 when the learning rate was 0.001 or 0.0001. When the learning rate was 0.1, the validation accuracy and validation loss fluctuated significantly: the validation accuracy curve was undulating, the validation loss curve lacked a clear convergence trend, and large loss values occurred at times.
When the learning rate was 0.01, the validation loss curve had a clear convergence trend and was more stable, and the validation accuracy curve fluctuated less, though significant fluctuations remained that required parameter optimization. When the learning rate was 0.001, validation accuracy and validation loss converged fastest, the validation accuracy curve was relatively smooth, the validation loss curve was the smoothest, and the loss value was closest to 0, indicating that the model has a strong fitting ability. When the learning rate was 0.0001, the validation accuracy and validation loss curves converged steadily, though convergence was relatively slow and the loss value was slightly higher. When the learning rate was 0.00001, the validation accuracy and validation loss curves converged the slowest, and neither converged completely, showing that the learning rate was set too small and that searching for the optimal parameters would take a long time. Combined with Table 3, comparing the 20 hyper-parameter combinations, the highest test accuracy of 98.85%, the highest F1 score of 98.87%, and the minimum loss value of 0.033 were achieved when the learning rate was 0.001 and the dropout was 0.5, making this the best-performing hyper-parameter combination.

Frozen Layer Optimization
Transfer learning is a method that aims at applying knowledge learned in one domain to a similar domain [43][44][45]. Through transfer learning, untrained models can acquire the parameters of trained models, thus improving learning efficiency and achieving better accuracy [46].
In this experiment, the training parameters of MobileNet were downloaded from ImageNet for the detection of surface defects in fresh-cut cauliflower. The first few layers of a deep learning model learn the most common patterns in images, which are relatively easy to transfer. As the convolutional layers become deeper, the learned content becomes more and more specific; the learned parameters may not be fully applicable to the detection of fresh-cut cauliflower surface defects, and negative transfer may even occur. Therefore, checking whether negative transfer occurs, and how to avoid it, is the problem to be solved.
To transfer the pre-trained MobileNet model to the task of fresh-cut cauliflower surface defect detection and classification, the pre-trained MobileNet model was reconstructed as a "feature extraction module + classifier module", as shown in Table 4. Layers 1 to 86 form the feature extraction module, extracting features such as color and texture. Layers 87 to 89 form the classifier module, performing feature dimensionality reduction and outputting category labels. To select the optimal frozen layer, different numbers of depth-wise separable convolution modules were frozen sequentially according to the structure of MobileNet, i.e., different numbers of layers were selected to be frozen. For example, when the frozen layer = 10, only layers 1-10 were frozen, and when the frozen layer = 86, the whole feature extraction module was frozen. The model evaluation metrics are shown in Table 5.
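The freezing bookkeeping can be sketched with plain Python objects: a schematic stand-in for the 89-layer reconstructed model with invented per-layer parameter counts, not the actual MobileNet. In Keras the equivalent is setting `layer.trainable = False` on the first layers of the base model.

```python
# Freeze the first `n` layers so their ImageNet weights stay fixed,
# and fine-tune only the remaining layers on the cauliflower images.
class Layer:
    def __init__(self, name, n_params):
        self.name, self.n_params, self.trainable = name, n_params, True

# hypothetical stand-in for the 89-layer reconstructed MobileNet
model = [Layer(f"layer_{i}", 1000) for i in range(1, 90)]

def freeze(model, n_frozen):
    for i, layer in enumerate(model):
        layer.trainable = i >= n_frozen   # layers 1..n_frozen are frozen
    return sum(l.n_params for l in model if l.trainable)

trainable = freeze(model, 80)   # the best setting found in this paper
print(trainable)  # only the last 9 layers' parameters are updated
```

Sweeping `n_frozen` over the DSC boundaries and evaluating each setting, as done for Table 5, is how the optimal cut-off of 80 was identified.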
According to Table 5, the best performance was achieved when the frozen layer = 80. The results indicate that the original parameters migrated from the MobileNet model cannot be used directly for the detection and recognition of fresh-cut cauliflower surface defects; the existing fresh-cut cauliflower image samples must be used to further adjust the original MobileNet parameters to obtain the best recognition results. In this experiment, a CNN combined with transfer learning was used to detect surface defects of fresh-cut cauliflower: MobileNet was trained with 20 hyper-parameter combinations to find the best training settings, and the best number of frozen layers was determined by freezing different numbers of convolutional layers in the model. The model obtained the best results when the dropout was 0.5, the learning rate was 0.001, and the frozen layer was 80.

Comparative Analysis of Models
To demonstrate the performance of the optimized MobileNet, we applied it and three other models (VGG19, InceptionV3, and NASNetMobile) to fresh-cut cauliflower surface defect detection. These three models have demonstrated excellent performance in competitions and in applications by other researchers, so they were used as comparison baselines in this research. Comparison results are shown in Table 6. MobileNet had a significant advantage in test set loss, accuracy, and F1 score, showing that it was more appropriate for the detection of fresh-cut cauliflower surface defects. A confusion matrix is a visualization tool that shows the prediction results of a classification task, especially in supervised learning (in unsupervised learning it is generally called a matching matrix). It uses percentages to summarize the ratio of correct and incorrect predictions, broken down by category. The confusion matrix shows where a classification model makes errors in its predictions, revealing not only the errors made but, more importantly, the types of errors that occurred. The confusion matrices of the four models are shown in Figure 9.
It can be seen from Figure 9 that diseased cauliflower was the most susceptible to error and was misclassified as other categories by VGG19, InceptionV3, and NASNetMobile, because the color characteristics of some diseased samples are not obvious. The improved MobileNet model achieved excellent classification performance in this experiment, as well as good feature extraction capability.
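A percentage (row-normalized) confusion matrix of the kind shown in Figure 9 can be built as follows; the labels here are toy values for illustration only, with classes 0-3 standing for healthy, diseased, browning, and mildewed.

```python
import numpy as np

def confusion_matrix_pct(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix: cm[i, j] = percentage of true class i
    that was predicted as class j."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 3, 3]
cm = confusion_matrix_pct(y_true, y_pred, 4)
print(cm[1])  # diseased row: one of four samples misread as browning
```

Reading along a row shows how a true class was distributed across predictions, which is exactly the "type of error" information discussed above.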

Visualization of CNN
Deep learning is often referred to as a "black box", where the inner algorithm cannot be observed, but the representation learned by the CNN can be visualized [47]. Convolutional layer feature visualization can help explain how the CNN transforms the input and shows the features learned by the convolutional layers to provide more insight into the process of the CNN analysis of fresh-cut cauliflower surface defect features. During the training stage, we presented some visualization results to obtain a better explanation of how the model works. Feature maps of fresh-cut cauliflower are shown in Figure 10.
In Figure 10, we show the feature maps at eight different depths of the CNN. In Figure 10a, the feature map is close to the input image; as the number of layers increases, the feature maps move closer to the model's output, becoming more and more abstract and less humanly interpretable.
Deep neural networks can effectively act as an information distillation pipeline (IDP), inputting the original data and repeatedly transforming it to filter out irrelevant information and amplifying and refining the useful information. From the above results, it can be seen that as the layers of the convolutional neural network deepen, the characteristics extracted by the convolutional layer become more and more abstract, with less and less visual information about the original fresh-cut cauliflower images and more and more information about the target category. Therefore, the proposed method in this paper can effectively obtain the surface information of fresh-cut cauliflower.
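The idea that early layers respond to simple visual structure can be demonstrated with a hand-written convolution: a Sobel-style filter applied to a toy image with a vertical edge yields a feature map that activates only along the edge. This is an illustration of the principle, unrelated to the actual MobileNet filters.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution of a grayscale image with one filter."""
    k = kernel.shape[0]
    H, W = img.shape
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+k, j:j+k] * kernel)
    return out

# a toy image with a vertical edge, and a vertical-edge-detecting filter
img = np.zeros((8, 8))
img[:, 4:] = 1.0
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
fmap = conv2d(img, sobel_x)
print(fmap.max(), fmap[:, 0].max())  # the edge columns respond; flat regions stay at 0
```

Stacking many such filtered-and-downsampled stages is what progressively "distills" the raw pixels into class-relevant abstractions, as described above.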

Discussion
In this experiment, the CNN combined with transfer learning was used for detecting surface defects of fresh-cut cauliflower. Twenty combinations of hyper-parameters were used to determine the best hyper-parameters for model training, and then different layers of the CNN were frozen to determine the best number of frozen layers with the help of the transfer learning method. The experimental results showed that the model achieved the best results in the test when the learning rate was 0.001, the dropout was 0.5, and the number of frozen layers was 80.
Regarding MobileNet applied to the quality detection of agricultural products, Ashwinkumar et al. proposed a model for the identification of plant leaf diseases named OMNCNN. This model operated in stages, with MobileNet as the base model in the feature extraction stage. Experimental results showed that OMNCNN achieved excellent performance, with an F1 score of 0.985 [48]. To implement plant disease leaf detection on cell phones, Liu Yang et al. applied transfer learning to two lightweight CNNs, MobileNet and InceptionV3, to obtain two crop disease classification models, and ported each to Android phones. Although the overall recognition accuracy of InceptionV3 was slightly higher on the test set, the recognition balance and model size of MobileNet were more advantageous. When both models were ported to the phone, MobileNet occupied less memory and recognized images faster, indicating that MobileNet is more suitable for plant disease recognition applications on mobile devices [49]. Aditya Rajbongshi et al. proposed a method for detecting rose diseases with MobileNet, comparing results with and without transfer learning. The MobileNetV1 model with transfer learning achieved better experimental results, reaching an accuracy of 95.63% [50]. These studies show that it is feasible to use MobileNet for fresh-cut cauliflower surface defect detection. A CNN has the advantage of a strong self-learning capability in image processing; in this respect, it is fundamentally different from traditional machine learning. A CNN obtains hierarchical feature representations by abstracting and analyzing the original input data and automatically learning layer-by-layer transformations, finally obtaining a large number of data features; this approach is more conducive to classification as well as feature visualization.
For the future commercialization and market development of fresh-cut cauliflower, combining CNN with transfer learning may be a solution for the automatic detection and classification of fresh-cut cauliflower surface defects, which can improve detection efficiency and reduce production costs.

Conclusions
This research proposed a fresh-cut cauliflower surface defect recognition model to achieve the automatic recognition of fresh-cut cauliflower in quality inspection work. Model building was based on a CNN and transfer learning; the model was optimized by selecting the optimal combination of training hyper-parameters and by adjusting the number of frozen layers, so that the parameters downloaded from ImageNet were optimally integrated with the parameters trained in our experiment. Finally, we compared the improved MobileNet with VGG19, InceptionV3, and NASNetMobile, and drew the following conclusions:

1. By comparing 20 combinations of dropout and learning rate, we found that a suitable combination of hyper-parameters can improve the model's generalization performance and accuracy while ensuring training stability. The optimal combination of hyper-parameters (dropout = 0.5, learning rate = 0.001) obtained the highest test accuracy of 98.85% and an F1 score of 98.87%.

2. Using the transfer learning method, different numbers of CNN layers were frozen in the MobileNet model; when the number of frozen layers was 80, the accuracy improved by 0.42%, the F1 score improved by 0.37%, and the model loss value was reduced by 0.008.

3. The improved MobileNet model was compared with VGG19, InceptionV3, and NASNetMobile on the fresh-cut cauliflower test set. The improved MobileNet model performed best, making it more suitable for fresh-cut cauliflower surface defect detection.
In this research, we improved the accuracy and F1 score of the model by adjusting only the training hyper-parameters and combining ImageNet parameters with our trained parameters, but the variety of datasets tested was small, so limitations remain. In future research, we plan to develop a more comprehensive dataset of fresh-cut cauliflower surface defects and to build a more streamlined model by minimizing network parameters while maintaining high accuracy. In addition, we plan to deploy the model to mobile devices while maintaining its stability and accuracy, allowing it to be used more widely and thus better support the industrialization and commercialization of fresh-cut cauliflower.