A Multi-Feature Fusion Based on Transfer Learning for Chicken Embryo Eggs Classification

Abstract: The fertility detection of Specific Pathogen Free (SPF) chicken embryo eggs in vaccine preparation is a challenging task due to the high similarity among six kinds of hatching embryos (weak, hemolytic, cracked, infected, infertile, and fertile). This paper first analyzes two classification difficulties caused by feature similarity with subtle variations among six kinds of five- to seven-day embryos, and proposes a novel multi-feature fusion method based on a Deep Convolutional Neural Network (DCNN) architecture for a small dataset. To avoid overfitting, data augmentation is employed to generate enough training images after the Region of Interest (ROI) of each original image is cropped. Then, all the augmented ROI images are fed into the pretrained AlexNet and GoogLeNet, which learn discriminative deep features by transfer learning. After the local features of Speeded Up Robust Feature (SURF) and Histogram of Oriented Gradient (HOG) are extracted, the deep features and local features are fused. Finally, a Support Vector Machine (SVM) is trained with the fused features. Experiments show that the proposed method achieves an average classification accuracy of 98.4%, and that the proposed transfer learning generalizes better and classifies more accurately for small-scale agricultural image samples.


Introduction
A Specific Pathogen Free (SPF) chicken embryo egg is a virus culture source widely used in the biological vaccine manufacturing industry [1]. To avoid cross-contamination, only normally cultivated, fertile embryo eggs may be injected with inoculated viruses. Before inoculation and culture, all infertile, weak, cracked, hemolytic, and infected chicken embryo eggs must therefore be removed from the incubator, leaving only the live, fertile embryos, to keep vaccination secure [2]. In practice, skilled inspectors periodically candle large batches of eggs in dark conditions, relying on human vision and experience [3]. This manual inspection is labor-intensive and inefficient, and it produces many detection errors because of fatigue and differences in experience among inspectors. Because embryo eggs show different characteristics on different hatching days, improving the accuracy of fertility detection and classification of embryo eggs has become a new research focus.
With the development of imaging systems, deep learning, and image-processing techniques, high performance in the identification of fertilized eggs has been obtained [4,5]; however, few studies have targeted the classification of the visually similar weak, cracked, and hemolytic embryos. Based on years of experience with manual candling of SPF embryo eggs in a biological vaccine manufacturing workshop, Figure 1 shows six categories of five- to seven-day hatching samples of SPF chicken embryo eggs, where the live fertile embryo, shown in Figure 1a, should be detected separately from the other embryos. From Figure 1a,e,f, it is visually obvious that the fertile embryo can be easily discriminated from the infected and infertile embryos because of the different contrast of the main embryo body. However, the detection of the other four categories is much more difficult. Practical manual discrimination of embryo eggs faces two great difficulties: the similarity of blood-vessel features in the embryo body among fertile, weak, hemolytic, and cracked embryos, shown in Figure 1a-d, and the similarity between the random eggshell textures of a fertile embryo and the bright cracks of a cracked embryo, shown in Figure 1a,d. Therefore, there is a great need for effective recognition methods for embryo images in industrial application.
The two challenges above arise in practical workshops and are attributed to the diversity of the eggs' natural development during incubation. The weak embryo, shown in Figure 1b, is also live and fertile, and resembles a fertile embryo. Compared with a live fertile embryo, its image shows delayed development, with a sparser blood-vessel network and a slightly brighter upper region. If the weak embryo continues to incubate, it can reduce the accuracy of the automatic injection of specific viruses and degrade the final quality of biological vaccine products. The hemolytic embryo, shown in Figure 1c, is a dying embryo that has recently been infected by a virus. As the hemolytic embryo develops, the color or texture of the embryo body gradually turns darker. At the initial stage of infection, the main body of the dying embryo resembles that of the fertile embryo, and it eventually becomes a fully infected embryo. The hemolytic embryo, like the infected embryo, also poses a risk to the entire incubator. The cracked embryo, shown in Figure 1d, has cracks or breakages in the shell that occurred during transportation and placement. If the shell shows severe breakage, most workers can quickly handle it. However, it is rather difficult for workers to find the many small cracks in a dark or dim workshop. Even during candling, a crack closely resembles the texture of the eggshell. With continued incubation, the cracked embryo eventually becomes a fully dead and infected embryo.
These practical requirements motivate a closer look at feature representation and discrimination. The existing literature on fertility detection is summarized in Table 1, which compares the most recent approaches reported in the scientific literature. From the literature published to date, it is clear that few methods can discriminate among all six categories of chicken embryos (fertile, weak, hemolytic, cracked, infected, and infertile embryos).
In chicken embryo recognition, the feature representation of the embryo image is critical. In the literature listed in Table 1, the feature representations rely either on traditional, manually crafted blood-vessel features alone, or on deep features alone from Convolutional Neural Networks (CNNs) [4], the Back Propagation Neural Network (BPNN) [3], and the Learning Vector Quantization Neural Network (LVQNN) [6]. Although the accuracy of the proposed approach is lower than that in [4] and [7] by 0.3 and 1.1 percent, respectively, those two methods cannot resolve the two challenges above. In practical situations, conventional blood-vessel feature extraction has performed well, but it is often affected by complex backgrounds involving color, texture, illumination changes, and random spots [8]. With the breakthrough of the Deep Convolutional Neural Network (DCNN) [9,10] for blood vessels, the discriminative features of embryo body images can be learned automatically, and DCNNs have become the mainstream in image classification and other fields of image processing.
With the growing popularity of deep learning and transfer learning, DCNNs have recently been widely applied to high-quality images [11-14]. DCNNs integrate image feature extraction with classification in an end-to-end pipeline, and architectures such as GoogLeNet [15], AlexNet [16], VGGNet [17], and ResNet [18] have achieved great breakthroughs in the ILSVRC (ImageNet Large Scale Visual Recognition Challenge). Discriminative hierarchical feature learning [19] has been related to multiscale feature fusion [20], the Speeded Up Robust Feature (SURF) [21], and the Histogram of Oriented Gradient (HOG) feature [22], and these features, combined with the classical GoogLeNet and AlexNet [23], make it possible to differentiate the six kinds of embryo eggs above. Generally, there are two ways of applying deep-learning models in image processing: training from scratch and transfer learning [24]. Training from scratch learns all the weights and parameters of the model during training, whereas transfer learning starts from the weights and parameters of the pretrained convolutional layers, learns the weights and parameters of the last few layers, and fine-tunes the convolutional layers. Transfer learning is suitable for smaller datasets and modest hardware, and it trains quickly [25]. Hence, applying a pretrained network with transfer learning is a natural choice for a small SPF chicken embryo dataset. To address the two difficulties in SPF embryo recognition, this paper proposes a novel approach of multi-feature fusion based on transfer learning for chicken embryo classification, shown in Figure 2.
Based on the preprocessing of the original input images and the fine-tuning of AlexNet and GoogLeNet, deep features are extracted from the two DCNNs and then fused with the local features of SURF and HOG. Finally, an SVM is trained with the fused features. The main contributions are listed as follows:
1. To resolve the issues of insufficient embryo samples and overfitting of the DCNNs during training, this paper employs data augmentation to greatly expand the dataset. Original images are preprocessed to obtain the Region of Interest (ROI), and the ROI is then augmented with image-processing techniques. Transfer learning is then applied to further prevent overfitting and to learn deep features. The verified results show that the accuracy of the proposed model is higher than that of training from scratch.
Symmetry 2019, 11, 606 5 of 16
2. Multi-feature fusion of the local SURF and HOG features with the deep features provides complementary information, giving better generalization under illumination changes and random spots on the eggshell. A comparative analysis of the classification accuracy of different DCNNs on the same embryo sample dataset is provided. The experimental results show that the accuracy of this proposal is higher than that of other popular learning methods based on the color and texture of chicken embryos.
The experimental results verify that the proposed DCNN-based model achieves an accuracy of over 98.0% on the hold-out embryo test samples, which is higher than the other DCNNs and classical recognition models.
The remainder of this paper is organized as follows. Section 2 introduces image preprocessing and ROI extraction. Section 3 describes the DCNN models, transfer learning, and feature extraction. Section 4 presents and discusses the experimental results of classifying chicken embryo images with transfer learning. Finally, Section 5 summarizes the paper.

Preprocessing
As shown in Figure 1, the SPF chicken embryo egg images used for fertility detection and classification contain irrelevant objects, such as the LED light shade and egg trays. Hence, it is necessary to preprocess the input embryo image to save time and prevent overfitting of the deep learning models. For representing the fertility of different embryos, only the number and growth status of the main blood vessels are the crucial criteria. To reduce computation, the ROI, which mainly covers the whole embryo egg, is extracted. Based on knowledge of chicken egg hatching, the embryo egg can be divided into three parts: the air cell, the embryo body, and the excretory region, shown in Figure 3. Under the industrial candling light, the air cell always appears yellow and bright, the excretory region of the chicken embryo is always dim, and the embryo body shows bright red blood vessels, indicating normal growth.
Automatic partitioning of ROIs from original images is based on a long-term investigation of the original samples shown in Table 1. Using the mean value ranges of the color channels in Table 2, the ROI can be partitioned in three steps, with the results shown in Figure 3.

(1) The air cell region is the brightest of the three parts under the candling LED light. Its green channel values are mostly over 200, whereas those of the other, uninteresting regions are mostly below 30. Therefore, the air cell region and the uninteresting region can be separated according to green channel values. Suppose that m and n represent the numbers of rows and columns of the input embryo image, respectively; a sliding window is given as Equation (1), where the variable $g_i$ represents the value of the green channel of the input image.
The sliding window array1 runs from top to bottom, and the green values of each row are assigned to array1. In array1, $g_i$ expresses the number of green values over 230, and $g_i$ is compared with the threshold $T_1$ ($T_1 = 20$). If $g_i > T_1$, array1 is used to segment off the upper uninteresting part, and the air cell area is determined.
(2) The red channel values of the excretory area are mostly less than 130, while the red values of the bottom part are mostly less than 50. According to the red channel value, the excretory area and the bottom part can be segmented. Another sliding window is given as Equation (2), where the variable $r_i$ represents the value of the red channel of the input image.
The window array2 slides from bottom to top, and the red channel values of each row are assigned to array2. Similarly, in array2, $r_i$ expresses the number of red values over 50, and $r_i$ is compared with the threshold $T_2$ ($T_2 = 40$). If $r_i > T_2$, array2 is used to segment the excretory area from the bottom uninteresting region.
(3) The left and right windows can be determined by the coordinates of the leftmost and rightmost pixels of the embryo body. The ROI of the input image can then be cropped with the four sliding windows.
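The three partitioning steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is assumed to be an RGB array, the function names `find_roi_rows` and `crop_roi` are hypothetical, the per-row counts play the role of $g_i$ and $r_i$, and the left and right bounds are taken from the columns whose red value exceeds the same level of 50, a rule the text does not specify.

```python
import numpy as np

def find_roi_rows(img, green_level=230, t1=20, red_level=50, t2=40):
    """Return (top, bottom) row indices of the egg, or None if not found.

    img is assumed to be an H x W x 3 uint8 array in RGB channel order.
    """
    red, green = img[:, :, 0], img[:, :, 1]
    # Per-row count of bright green pixels (the role of g_i in the text).
    g = (green > green_level).sum(axis=1)
    # Per-row count of sufficiently red pixels (the role of r_i in the text).
    r = (red > red_level).sum(axis=1)
    top_hits = np.flatnonzero(g > t1)     # first hit when scanning downwards
    bottom_hits = np.flatnonzero(r > t2)  # last hit equals first hit scanning upwards
    if top_hits.size == 0 or bottom_hits.size == 0:
        return None
    return int(top_hits[0]), int(bottom_hits[-1])

def crop_roi(img, green_level=230, t1=20, red_level=50, t2=40):
    """Crop the image to the ROI; return it unchanged if no ROI is found."""
    rows = find_roi_rows(img, green_level, t1, red_level, t2)
    if rows is None or rows[0] > rows[1]:
        return img
    top, bottom = rows
    # Assumption: the egg's horizontal extent is where red exceeds red_level.
    band = img[top:bottom + 1, :, 0] > red_level
    cols = np.flatnonzero(band.any(axis=0))
    left, right = int(cols[0]), int(cols[-1])
    return img[top:bottom + 1, left:right + 1]
```

The thresholds default to the values quoted in the text (230 and 20 for the green channel, 50 and 40 for the red channel), so they can be tuned per candling setup without changing the logic.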
Because the three partitions differ clearly in their feature expressions, global filtering is unsuitable for reducing the effect of random spots on the embryo. To extract deep features to the maximum extent, the ROI is divided into 4 × 4 blocks, and each block is filtered by a local median filter on the red, green, and blue channels separately. This preserves even minimal differences among blocks caused by shadows or bright-light spots, so that the features of each block are better captured.
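Block-wise median filtering of this kind might look as follows. The sketch assumes a 3 × 3 median window with edge padding, which the text does not state, and the helper names `median3x3` and `blockwise_median` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median3x3(channel):
    """3 x 3 median filter of one 2-D channel, using edge padding."""
    padded = np.pad(channel, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return np.median(windows, axis=(2, 3)).astype(channel.dtype)

def blockwise_median(roi, grid=4):
    """Filter each of grid x grid blocks independently, channel by channel.

    roi is an H x W x C array with H and W at least `grid` pixels.
    """
    out = np.empty_like(roi)
    h, w = roi.shape[:2]
    row_edges = np.linspace(0, h, grid + 1, dtype=int)
    col_edges = np.linspace(0, w, grid + 1, dtype=int)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            for ch in range(roi.shape[2]):
                # Padding happens inside the block, so pixels of
                # neighbouring blocks never influence this block.
                out[r0:r1, c0:c1, ch] = median3x3(roi[r0:r1, c0:c1, ch])
    return out
```

Filtering each block in isolation is what distinguishes this from a global median filter: a shadow or bright spot in one block cannot leak into the statistics of its neighbours.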

Transfer Learning
In this proposal, two different DCNN models, AlexNet and GoogLeNet, are combined to classify chicken embryos. These two DCNN models have learned rich feature representations from ImageNet, which can help the classification of chicken embryo images. Given the small embryo image dataset, transfer learning is applied to this new task to avoid overfitting. By stacking the two models, better deep-feature representations can be obtained [25]. The proposed method employs parameter transfer, which trains the last few layers and fine-tunes the preceding convolutional layers on the new dataset. Fine-tuning a network with transfer learning is usually much faster and easier than training a network from scratch with randomly initialized weights; it converges faster and generalizes better.
In the proposed parameter transfer learning model, the parameters of the five pretrained convolutional layers of AlexNet are kept and fine-tuned. The standard AlexNet contains three fully-connected (fc) layers (fc6, fc7, and fc8) [29]. Here, as shown in Figure 4, the original fc6-fc8 layers were removed, and two new fc layers and a Softmax layer were added. The new fc6 and fc7 layers have 2048 and 6 neurons, respectively. The weights of fc6 and fc7 were initialized from a zero-mean Gaussian distribution with a standard deviation of 0.001, following [16], and the biases were initialized with the constant 0.1. These parameters are then learned by training on the dataset samples.
With the GoogLeNet pretrained on ImageNet, only the inception5b module was changed, and one fc layer was added after it. There were thus two fc layers after the AveragePool layer, named FC_1024 and FC_6, as shown in Figure 5. FC_1024 and FC_6 have 1024 and 6 neurons, respectively. The parameters of the preceding convolutional layers were kept. The weights and biases of the new fc layers were initialized in the same way as for AlexNet. The Softmax layer was applied to compute the cross-entropy loss of the AlexNet and GoogLeNet models during training.
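To make the replaced classification heads concrete, the following NumPy sketch initializes new fully-connected layers with the stated Gaussian weights and constant biases, and evaluates a softmax cross-entropy loss. It illustrates the head only, not the fine-tuned convolutional layers. Several details are assumptions rather than statements from the text: the input widths (9216 for the standard AlexNet pool5 output, 1024 for the standard GoogLeNet average-pooling output), the ReLU between the two fc layers, and all function names.

```python
import numpy as np

# (inputs, outputs) of the new fc layers; the input widths are assumptions.
ALEXNET_HEAD = [(9216, 2048), (2048, 6)]    # new fc6, new fc7
GOOGLENET_HEAD = [(1024, 1024), (1024, 6)]  # FC_1024, FC_6

def build_head(layer_shapes, rng, std=0.001, bias=0.1):
    """Zero-mean Gaussian weights (std 0.001) and constant biases (0.1)."""
    return [
        (rng.normal(0.0, std, size=(n_in, n_out)), np.full(n_out, bias))
        for n_in, n_out in layer_shapes
    ]

def head_forward(features, layers):
    """Map a batch of pooled features to six class logits."""
    x = np.asarray(features, dtype=float)
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU between fc layers (assumed)
    return x

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

With such small initial weights the logits start out nearly identical across classes, so the initial loss sits close to ln 6, the value for a uniform guess over six categories.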

Network Training and Deep Feature Extraction
In the experiments, the two DCNN models were trained separately with parameter transfer learning to obtain two different kinds of deep features. The parameters of all the convolutional layers of AlexNet and GoogLeNet were preserved and fine-tuned. All the input images were scaled to 227 × 227 and 224 × 224 pixels to fit the required input sizes of AlexNet and GoogLeNet, respectively. With transfer learning, there is no need to train the two DCNNs for many epochs, because the training loss no longer declines after a certain number of epochs, as shown in Figures 6 and 7.
Here, AlexNet and GoogLeNet were trained for 30 and 40 epochs, corresponding to 9000 and 12,500 iterations, respectively. All hyperparameters were set empirically: the learning rates of AlexNet and GoogLeNet were initialized to 0.0001 and 0.001, respectively, and then halved every four epochs.
During training, a very small learning rate was selected for the two models, which helps suppress overfitting. The two DCNNs were trained separately on a single GTX970 Graphics Processing Unit (GPU) using Stochastic Gradient Descent (SGD), which updates the weights using the gradient of the objective function computed on each minibatch. The minibatch size of both models was set to 32 because of the limited GPU memory (4 GB), and the momentum was set to 0.9. After training, the deep features of the embryo images were obtained from the fc6 layer of AlexNet and the FC_1024 layer of GoogLeNet.
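The learning-rate schedule and the momentum update described here can be written out explicitly. The sketch below is illustrative only: it assumes zero-based epoch numbering and the classical momentum form of SGD, and the function names are hypothetical.

```python
def learning_rate(base_lr, epoch, drop_every=4, factor=0.5):
    """Step decay: multiply the rate by `factor` every `drop_every` epochs.

    `epoch` is assumed to be zero-based.
    """
    return base_lr * factor ** (epoch // drop_every)

def sgd_momentum_step(weight, velocity, grad, lr, momentum=0.9):
    """One classical momentum update; returns (new_weight, new_velocity)."""
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity

# Rates over the first nine epochs for the two initial values in the text.
alexnet_rates = [learning_rate(0.0001, e) for e in range(9)]
googlenet_rates = [learning_rate(0.001, e) for e in range(9)]
```

Under this schedule the rate stays constant for epochs 0 to 3, is halved for epochs 4 to 7, and is halved again from epoch 8 onward.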

SURF and HOG Feature Extraction
To extract the local features of ROIs with obvious illumination changes, SURF [30] is the first choice for local feature description, as it is not sensitive to image blurring and viewpoint changes. Following [21], and to avoid inconsistent feature dimensions, the 100 strongest interest points, each with a 64-dimensional descriptor, were extracted as the SURF feature descriptors of each embryo image, as shown in Figure 8.
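Keeping a fixed number of the strongest interest points is what gives every image a SURF vector of the same length (100 × 64 = 6400 values). The sketch below shows only this selection step, assuming that a SURF detector has already produced per-point response strengths and 64-dimensional descriptors. The function name is hypothetical, and zero-padding images with fewer than 100 points is an assumption the text does not mention.

```python
import numpy as np

def fixed_length_descriptor(responses, descriptors, top_k=100):
    """Keep the top_k strongest points and flatten them into one vector.

    responses:   (N,) detector response of each interest point
    descriptors: (N, D) descriptor of each point (D = 64 for SURF)
    Returns a vector of length top_k * D, zero-padded if N < top_k.
    """
    responses = np.asarray(responses, dtype=float)
    descriptors = np.asarray(descriptors, dtype=float)
    # Indices of the strongest responses, strongest first.
    order = np.argsort(-responses, kind="stable")[:top_k]
    selected = descriptors[order]
    out = np.zeros((top_k, descriptors.shape[1]))
    out[: len(selected)] = selected
    return out.ravel()
```

Sorting by response before truncation matters: taking the first 100 detected points instead would make the vector depend on detection order rather than on point strength.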
To capture the crack features of the embryo, the HOG [31] local texture feature, which responds to edges, was extracted. Because HOG is largely invariant to geometric and optical deformation, it is employed to extract suitable descriptors of a cracked embryo. The ROI of the embryo was scaled to 512 × 512 pixels to avoid excessively high-dimensional features. An HOG feature visualization of a cracked embryo is illustrated in Figure 9.
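The idea behind HOG, histograms of gradient orientation accumulated over small cells, can be illustrated with a simplified NumPy version. This is not the full HOG of [31]: it normalizes each cell on its own instead of using overlapping blocks, and the cell size of 8 pixels and 9 unsigned orientation bins are assumed values that the text does not give. The function name `simple_hog` is hypothetical.

```python
import numpy as np

def simple_hog(gray, cell=8, bins=9):
    """Simplified HOG: per-cell orientation histograms, L2-normalized per cell.

    gray is a 2-D array; trailing rows and columns that do not fill a whole
    cell are ignored.
    """
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)                 # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bin_idx = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)
    n_rows, n_cols = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, bins))
    for i in range(n_rows):
        for j in range(n_cols):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            # Magnitude-weighted vote of every pixel into its orientation bin.
            hist[i, j] = np.bincount(
                bin_idx[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=bins,
            )
    norm = np.linalg.norm(hist, axis=2, keepdims=True)
    return (hist / (norm + 1e-6)).ravel()
```

With these assumed settings, a 512 × 512 ROI yields 64 × 64 × 9 = 36,864 values, which shows why the input size has to be bounded to keep the descriptor dimension manageable.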

To find the cracked features of the embryo, the HOG [31] local texture feature at the edges was extracted. Because HOG is invariant to geometric and optical deformation, it is commonly employed to extract the descriptors of a cracked embryo. The ROI of the embryo was scaled to 512 × 512 pixels to avoid high-dimensional features. An HOG feature visualization of a cracked embryo is illustrated in Figure 9.

Datasets
The datasets included 1000 training samples and 1000 testing samples covering six categories of chicken embryo images. To mitigate overfitting, each original image was rotated, mirrored, or otherwise transformed to generate more images. The rotation angles were 90°, 180°, and 270°. In addition, horizontal flipping, vertical flipping, random Gaussian noise with a mean of 0 and a variance of 0.008, and random cropping were employed to expand the dataset, as shown in Figure 10. The augmented dataset is shown in Table 3. Specifically, ten images were generated from each original image: three from rotation, one from random noise, two from horizontal and vertical flipping, and four from random cropping. After data augmentation, the total dataset reached 10,000 images.

With the trained AlexNet and GoogLeNet, the deep features from each DCNN were extracted. For each embryo image, the local SURF and HOG features were extracted as well. These different features were fused and classified with the Support Vector Machine (SVM). The recognition accuracy rate ϵ was calculated to evaluate the performance of the proposed method, as given in Equation (3):

$$\epsilon = \frac{R}{T} \times 100\% \qquad (3)$$

where R is the number of correctly classified embryos and T is the number of test sample images.
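The augmentation scheme described in this section (three rotations, one noisy copy, two flips, and four random crops per original image) can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: images are represented as nested lists of grayscale values in [0, 1], and the function names and the `crop_fraction` parameter are assumptions introduced here, since the text does not state the crop size.

```python
import random


def rotate90(img):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def add_gaussian_noise(img, rng, mean=0.0, variance=0.008):
    """Add Gaussian noise and clip to [0, 1] (pixel range is an assumption)."""
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, p + rng.gauss(mean, sigma))) for p in row]
            for row in img]


def random_crop(img, rng, crop_h, crop_w):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


def augment(img, rng, crop_fraction=0.9):
    """Return ten variants: 3 rotations, 1 noisy, 2 flips, 4 random crops.

    crop_fraction is a hypothetical parameter; the source gives no crop size.
    """
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    crop_h = max(1, int(len(img) * crop_fraction))
    crop_w = max(1, int(len(img[0]) * crop_fraction))
    variants = [r90, r180, r270,
                add_gaussian_noise(img, rng),
                flip_horizontal(img), flip_vertical(img)]
    variants += [random_crop(img, rng, crop_h, crop_w) for _ in range(4)]
    return variants
```

In practice the crops would be resized back to the network input size before training; that step is omitted here.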

Performance Comparison
To describe the different feature fusions conveniently, the fusion of HOG and SURF features is abbreviated as HS; similarly, the AlexNet deep feature is abbreviated as Alex-DF; the GoogLeNet deep feature as Google-DF; the fusion of the AlexNet deep feature, HOG, and SURF as Alex-HS; the fusion of the GoogLeNet deep feature, HOG, and SURF as Google-HS; and the fusion of the AlexNet deep feature, GoogLeNet deep feature, HOG, and SURF as Alex-Google-HS. For comparison with a basic Multi-Layer Perceptron (MLP), a four-layer perceptron was designed, with 256 neurons in the first hidden layer, 512 in the second, 512 in the third, and 6 in the output layer. The different feature fusion methods were each verified with the same samples, and the comparison results are shown in Table 4.

From Table 4, it can be seen that the basic MLP model obtained the lowest accuracy rate of 65.0%, as the MLP lost some of the image's spatial information and was thus unable to effectively extract the blood vessel features of the chicken embryo. The critical blood vessel features carry much spatial information, so the accuracy of the MLP alone was lower than that of the traditional hand-crafted features. Similarly, the accuracy of the local features alone (HOG fused with SURF) was 74.0%, which also could not satisfy the classification needs. Neither the MLP alone nor the fusion of SURF and HOG alone could recognize the subtle features that separate highly similar chicken embryos. In experimental practice, because four types of embryo eggs (weak, fertile, hemolytic, and cracked) have very similar life characteristics, it is extremely difficult to extract all of the blood vessel features by using only HS or other hand-crafted features. In contrast, AlexNet and GoogLeNet improve the accuracy rate to more than 90%. Both deep networks were pretrained on ImageNet, so the common features of images are well preserved and expressed. They are therefore suitable for learning deeper features of specific images on small datasets and for improving classification accuracy. Deep features not only yield a much better accuracy rate on the small training dataset, but also show superior generalization ability for image classification. The performance of Google-HS is better than that of Alex-HS, which suggests that the deeper network architecture yields the better accuracy rate. The fusion of deep features and local features achieves better accuracy than a single deep feature because fusing the two combines the advantages of both. Alex-Google-HS shows the highest accuracy rate, as the two deep features have different classification capabilities. GoogLeNet employs convolution kernels of different sizes (5 × 5, 3 × 3, and 1 × 1), which is equivalent to multi-scale feature fusion. Furthermore, with the 7 × 7 convolution kernel of AlexNet, the combination of the two deep features greatly improves on the performance of each individual model.
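One simple way to picture the fusions compared above (HS, Alex-HS, Google-HS, Alex-Google-HS) is as a concatenation of the individual feature vectors before they are passed to the SVM. The sketch below assumes concatenation with per-block L2 normalization; the text does not state the exact fusion operator, so both choices, along with the function names, are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_features(*blocks):
    """Concatenate feature blocks after normalizing each one separately.

    Normalizing per block keeps a long, large-valued descriptor (e.g. a deep
    feature) from dominating a short one (e.g. SURF) in the fused vector.
    """
    fused = []
    for block in blocks:
        fused.extend(l2_normalize(block))
    return fused


# Toy vectors standing in for Alex-DF, Google-DF, HOG, and SURF descriptors.
alex_df = [3.0, 4.0]
google_df = [1.0, 0.0, 0.0]
hog = [0.0, 2.0]
surf = [5.0]

alex_google_hs = fuse_features(alex_df, google_df, hog, surf)
```

The fused vector has one entry per input feature, so its length is the sum of the block lengths (8 in this toy case); a classifier such as an SVM would then be trained on these fused vectors.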
To verify the performance of different transferred layers, the corresponding layers of the AlexNet model were trained and evaluated with the same samples, as shown in Table 5. When the first five convolution layers are transferred, the accuracy is 98.4%, which outperforms the transfer of conv1-conv5 + fc6 by 0.3%. In general, the more layers that are transferred, the better the results, provided that the labelling of the source dataset and the target dataset is highly similar [32,33]. However, the similarity between the source ImageNet dataset and the target chicken embryo dataset is not very high. From Table 5, if all layers except the last classification layer are transferred, the highest accuracy rate is 97.9%, which is lower than that obtained by transferring only the first five layers. For the classification of specific SPF embryo images, it is therefore important to transfer only the appropriate layers of the deep network.

To verify the effectiveness of transfer learning, the AlexNet and GoogLeNet models were trained both by transfer learning and from scratch. The accuracy rate and training time of the two models were compared, together with the specific loss, as shown in Table 6. The time consumption of transfer learning is significantly less than that of training from scratch. AlexNet requires less training time than GoogLeNet because the architecture of the latter is deeper. The accuracy of transfer learning in the two DCNNs is higher than that of training from scratch on our test dataset, which demonstrates the better efficiency of transfer learning.

To illustrate how well different deep networks match the dataset, other DCNNs, such as the VGG16 [17] and ResNet [18] models, were trained on the same augmented sample datasets, as shown in Table 7. VGG16, ResNet50, and ResNet101 were trained with parameter transfer learning to prevent overfitting, because the three models have many more parameters than the proposed model. After the deep feature extraction, the local features of SURF and HOG were fused with the deep features of each model. From Table 7, the proposed method outperforms the three deep networks by 1.2%, 0.8%, and 0.6%, respectively. The three models still performed well, but their classification capacity was degraded by the small-scale dataset and the large number of parameters.

The fusion of deep features and local features provided complementary information, which further improved the feature representation for embryo image classification; the specific results are shown in Table 8. Our method still has a 1.6% error rate. Most of the misclassified images are fertile, weak, and hemolytic embryos, because the blood vessels in the ROI are so similar that even highly experienced experts can hardly distinguish between the different growing statuses of these embryos. In this experiment, only 16 embryos were misclassified, and the confusion matrix is shown in Figure 11. Therefore, the proposed approach still needs to be further improved to better recognize the weak, fertile, and hemolytic embryos.
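The reported figures are mutually consistent: 16 misclassified embryos at a 1.6% error rate implies 1000 test images, of which 984 were classified correctly, giving 984 / 1000 = 98.4%. The sketch below shows how the overall accuracy rate and per-class rates could be read off a confusion matrix. The 3 × 3 matrix is a made-up toy example, not the paper's data, and the convention that rows are true classes and columns are predictions is an assumption.

```python
def accuracy_rate(confusion):
    """Overall accuracy in percent: correct predictions R over all samples T."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total


def per_class_accuracy(confusion):
    """Percent of each true class (row) that was predicted correctly."""
    return [100.0 * row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]


# Hypothetical toy matrix for three classes (rows: true, columns: predicted).
toy_confusion = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 0, 50],
]

toy_overall = accuracy_rate(toy_confusion)
toy_per_class = per_class_accuracy(toy_confusion)

# Consistency check on the figures reported in the text: R = 984, T = 1000.
reported = 100.0 * (1000 - 16) / 1000
```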

The Influence of Learning Rate on Learning Model
During training, the learning rate significantly influenced the performance and final results. For the pretrained AlexNet and GoogLeNet, the transferred layers had already been fully trained, so in this proposal only the last few new layers needed to be trained, while the previous layers only needed to be fine-tuned. The learning rate of GoogLeNet was initialized to a small value, as shown in Figure 12, and the accuracy rate curve trended upward until the learning rate reached 0.001, where the highest accuracy rate of 91.2% was obtained. As the learning rate continued to increase, the accuracy rate decreased sharply: because the pretrained GoogLeNet model was already close to the optimal solution, a larger learning rate caused the model to skip over the optimal solution, resulting in great loss and low accuracy. Similarly, for the AlexNet model, once the learning rate increased to 0.0006, the accuracy rate decreased sharply and then remained almost the same. From Figure 12, the accuracy rate of AlexNet is highest at the beginning, with a learning rate of 0.0001. To achieve better performance in chicken embryo classification, the learning rates of GoogLeNet and AlexNet should be initialized to 0.001 and 0.0001, respectively.
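The overshooting effect described above can be reproduced on a toy problem. The sketch below runs plain gradient descent on f(w) = w², starting close to the optimum at w = 0, as a stand-in for a pretrained model that is already near a good solution. The function name and the step sizes are illustrative assumptions; the step sizes have no relation to the learning rates of 0.001 and 0.0001 selected for the networks.

```python
def final_loss(lr, start=0.1, steps=20):
    """Run gradient descent on f(w) = w**2 and return the final loss.

    The starting point is near the optimum (w = 0), mimicking a pretrained
    model. Each update multiplies w by (1 - 2 * lr), so the iterate shrinks
    when 0 < lr < 1 and grows in magnitude when lr > 1.
    """
    w = start
    for _ in range(steps):
        w -= lr * 2.0 * w  # gradient of w**2 is 2w
    return w * w


small_step = final_loss(0.1)  # converges toward the optimum
large_step = final_loss(1.1)  # each update jumps past the optimum and diverges
```

With the small step the loss falls below its starting value of 0.01, whereas with the large step it ends far above it, which mirrors the sharp accuracy drop observed when the learning rate is set too high for a nearly converged model.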

Conclusions
Based on the diversity of natural development during chicken egg incubation, this paper has summarized two classification challenges caused by feature similarity among five- to seven-day SPF embryo eggs in a practical vaccine preparation workshop: the first is the similarity of blood vessels in the embryo bodies of fertile, weak, cracked, and hemolytic embryos, and the second is the similarity between the random eggshell textures of fertile embryos and the bright cracks of cracked embryos. To obtain better classification performance, a novel multi-feature fusion of deep features with SURF and HOG local features, based on transfer learning, was proposed for chicken embryo classification. With transfer learning, a DCNN generalizes well and performs well on identification and classification tasks even with a small dataset. This proposal does not need complex preprocessing and can learn features automatically. Firstly, all input embryo images are pre-processed and their ROIs are cropped, and then the sample datasets are augmented. These images are input into the pretrained DCNNs to learn deep discriminative features. Secondly, the deep features of AlexNet and GoogLeNet, as well as the local features of SURF and HOG, are fused as the final features. Finally, the SVM is trained to classify the input embryo images. The experiments show the superiority of the proposed approach: on the test dataset, the average classification accuracy rate was 98.4%, which is better than some state-of-the-art image classification methods on a small dataset. The approach can also be used by vaccine makers to automatically inspect initially incubated five- to seven-day SPF eggs.
A DCNN based on transfer learning is not only much faster and easier to train than a network with randomly initialized weights trained from scratch, but can also obtain a high level of accuracy in image classification. The original contribution of this paper is twofold: an appropriate transfer learning network architecture is proposed for image classification tasks with small datasets, and the multi-feature fusion of classic local features and deep features is verified to achieve a higher accuracy rate. Together, the generalization ability of a DCNN based on transfer learning and the advantages of multi-feature fusion make it possible to accurately classify agricultural images of high similarity with subtle variations. Our future work will focus on the image classification of nine- to eleven-day SPF hatching chicken embryos, as well as the redefinition and reconstruction of deep network architectures for small agricultural image samples.

Figure 1. Specific Pathogen Free (SPF) chicken embryo egg samples at five to seven days of hatching (captured by a 1.2-megapixel (1292 × 964) industrial color camera produced by China Daheng Group Inc., with candling by white LED light directly from the top): (a) fertile embryo, a normally hatching egg with a rich vascular net; (b) weak embryo, a late-hatching egg with a sparser vascular net; (c) hemolytic embryo, a dying egg with hemolysis that is gradually becoming an infected embryo; (d) cracked embryo, an egg with cracks or breakages in the eggshell; (e) infected embryo, a fully dead egg that is infected by viruses and has become irregular in shape and opaque; and (f) infertile embryo, an unfertilized egg, which is the most transparent.

Figure 2. The proposed architecture of multi-feature fusion with transfer learning for SPF chicken embryo egg classification.

Figure 3. The cropped regions of interest (ROI) of a chicken embryo.

Figure 4. The process of transfer learning: loading the pre-trained AlexNet model, removing its last three layers, adding three new layers (fc6, fc7, and the Softmax layer), and then fine-tuning the previous layers and training the new layers with the chicken embryo dataset.

Figure 8. Speeded Up Robust Feature (SURF) interest points of a fertile embryo.

Figure 9. Histogram of Oriented Gradient (HOG) features and their visualization for a cracked embryo.

Figure 11. Confusion matrix of the proposed Alex-Google-HS method, with an average accuracy rate of 98.4% on the test dataset.

Figure 12. The effect of the initial learning rate on the training of the AlexNet and GoogLeNet models.

Table 1. Comparison between chicken embryo image detection methods described in the academic literature and our proposed approach of uniquely discriminating the six embryo types.

Table 2. Mean value of the color channels of each part of the embryo.

Table 3. The numbers of images in the training and testing datasets.

Table 4. Recognition comparisons of different feature fusions.

Table 5. Comparison of accuracy rates for different transferred layers.

Table 6. Comparison of accuracy rate and training time for different training methods.

Table 7. Comparisons with state-of-the-art methods.

Table 8. Classification accuracy of chicken embryos.
