Automatic Zebrafish Egg Phenotype Recognition from Bright-Field Microscopic Images Using Deep Convolutional Neural Network

Zebrafish eggs are widely used in biological experiments to study the environmental and genetic influence on embryo development. Due to the high throughput of microscopic imaging, automated analysis of zebrafish egg microscopic images is highly demanded. However, machine learning algorithms for zebrafish egg image analysis suffer from the problems of small imbalanced training dataset and subtle inter-class differences. In this study, we developed an automated zebrafish egg microscopic image analysis algorithm based on deep convolutional neural network (CNN). To tackle the problem of insufficient training data, the strategies of transfer learning and data augmentation were used. We also adopted the global averaged pooling technique to overcome the subtle phenotype differences between the fertilized and unfertilized eggs. Experimental results of a five-fold cross-validation test showed that the proposed method yielded a mean classification accuracy of 95.0% and a maximum accuracy of 98.8%. The network also demonstrated higher classification accuracy and better convergence performance than conventional CNN methods. This study extends the deep learning technique to zebrafish egg phenotype classification and paves the way for automatic bright-field microscopic image analysis.


Introduction
Zebrafish embryos have gained popularity in biological research since they share 84% of genes associated with human disease [1] and they are nearly transparent under bright-field microscopes. Zebrafish egg is a special form of the embryo, and it is usually used to study the influence of environmental factors on embryo development. To evaluate the biological endpoints based on zebrafish eggs, microscopic screening is frequently performed [2]. By far, the analysis of zebrafish microscopic images is mostly performed by human operators. With the advances in image acquisition systems, the number of microscopic images is increasing rapidly, making manual assessments increasingly time-consuming. Therefore, automatic analysis of zebrafish microscopic image becomes an urgent demand [3].
To meet this stringent demand, a series of studies was conducted for computerized zebrafish microscopic image analysis [4,5]. Most techniques were based on traditional machine learning strategies, i.e., using texture filters to extract hand-crafted image features and then using classification algorithms (e.g., the supported vector machine and random forest) to conduct phenotype pattern recognition. The performances of these methods highly rely on the quality of hand-crafted image features, but the design and selection of hand-crafted features involve subjective human interventions, which limit the objectiveness and robustness of the method.
In the last decade, deep learning methods experienced dramatic development, leading to improvements in many pattern recognition applications, such as image processing, video analysis, and language recognition [6][7][8][9][10][11]. Compared to the conventional machine learning methods, deep learning overcomes the limitation of hand-crafted features by automatically optimizing the feature extraction and classification procedure. The core of deep learning for image analysis is the revolutionary development of the convolutional neural network (CNN) [12,13]. CNN was originally designed to recognize and classify object patterns in images. As of today, numerous CNN-based powerful image classification models are developed, including Alex Krizhevsky Network (AlexNet) [14], Visual Geometry Group (VGG) nets [15], and Residual Neural Network (ResNet) [16]. These methods were also applied to biological image analysis [17,18], leading to improvement of accuracy and robustness.
Despite the fast development of deep learning techniques, their applications in zebrafish egg microscopic image analysis are rare. A common data analysis task for zebrafish egg images analysis is to classify whether the egg is fertilized or not, in order to verify if the tested drug has impaired the fertilization process. To accomplish this task, there are several challenging problems to solve: • Imbalanced training dataset. In biological research, it is difficult to collect a balanced number of fertilized and unfertilized egg samples as the training dataset. The imbalanced training set will result in insufficient classification ability for the category with fewer training samples, leading to unsuccessful network training. • Small training dataset. The training of deep neural network requires no less than thousands of training samples. However, it is difficult to collect enough training data for a specific biological image analysis task. Small training sample set will lead to overfitting of the training data, hampering the generalization ability of the network. • Subtle inter-class differences. In bright-field microscopic images, fertilized and unfertilized zebrafish eggs usually demonstrate subtle inter-class differences. This challenging problem becomes a technical bottleneck for automated zebrafish egg image analysis.
To overcome these problems, this paper proposed a deep learning algorithm for automated zebrafish egg fertilization status classification from microscopic images. Dedicated data augmentation and transfer learning strategy were used to tackle the imbalanced and small training set problem. The global average pooling scheme was used to address the subtle inter-class differences. Experimental results showed that the proposed method yielded dramatic accuracy improvement compared to traditional CNN network, and the classification accuracy for zebrafish eggs could reach up to 98.8%.

Data Collection
In this study, the microscopic images of zebrafish eggs were acquired using a bright-field microscopy imaging device called ImageXpress [19]. The system automatically placed three or four embryos in a U-shaped bottom transparent well plate. The image of each plate was collected using a ×2 dry objective between 3 and 3.7 h post fertilization. Transition Metal Oxide Nanoparticles were applied to the zebrafish embryos, and some of the eggs became unfertilized due to the toxicity effect of the nanoparticles. Figure 1 shows a typical sample image of the zebrafish eggs. The eggs are to be classified into two classes, fertilized and unfertilized. The fertilized eggs contain the nucleus surrounded by dark yolk membranes, whereas the unfertilized eggs have clear yolk membranes.

Method Workflow
As illustrated in Figure 2, our automatic zebrafish egg recognition and counting method consisted of three major steps. The input image is a well plate image containing three or four eggs. For the first step, each individual egg was detected and separated as a small patch. The patch of each egg was then fed into a deep convolutional neural network to calculate the classification feature vector. Finally, a global average pooling layer was used to classify the fertilization status based on the feature vector. Details of the proposed method are explained in the following subsections.

Egg Detection
As required by the classification task, the microscopic images were pre-segmented and cropped into square patches of single eggs. This was achieved via a template matching step, which detected the center of each egg. Figure 3 demonstrates the principle of template matching. As shown in Figure  3a, the template was constructed by manually cropping K typical egg patches of N × N pixels from the training images. The K patches were reoriented into the same direction, and an averaged template was created by calculating the average image of them. Then, the average template was rotated with 30 degrees interval to generate 12 template patches of different orientations (Figure 3b). To perform

Method Workflow
As illustrated in Figure 2, our automatic zebrafish egg recognition and counting method consisted of three major steps. The input image is a well plate image containing three or four eggs. For the first step, each individual egg was detected and separated as a small patch. The patch of each egg was then fed into a deep convolutional neural network to calculate the classification feature vector. Finally, a global average pooling layer was used to classify the fertilization status based on the feature vector. Details of the proposed method are explained in the following subsections.

Method Workflow
As illustrated in Figure 2, our automatic zebrafish egg recognition and counting method consisted of three major steps. The input image is a well plate image containing three or four eggs. For the first step, each individual egg was detected and separated as a small patch. The patch of each egg was then fed into a deep convolutional neural network to calculate the classification feature vector. Finally, a global average pooling layer was used to classify the fertilization status based on the feature vector. Details of the proposed method are explained in the following subsections.

Egg Detection
As required by the classification task, the microscopic images were pre-segmented and cropped into square patches of single eggs. This was achieved via a template matching step, which detected the center of each egg. Figure 3 demonstrates the principle of template matching. As shown in Figure  3a, the template was constructed by manually cropping K typical egg patches of N × N pixels from the training images. The K patches were reoriented into the same direction, and an averaged template was created by calculating the average image of them. Then, the average template was rotated with 30 degrees interval to generate 12 template patches of different orientations (Figure 3b). To perform

Egg Detection
As required by the classification task, the microscopic images were pre-segmented and cropped into square patches of single eggs. This was achieved via a template matching step, which detected the center of each egg. Figure 3 demonstrates the principle of template matching. As shown in Figure 3a, the template was constructed by manually cropping K typical egg patches of N × N pixels from the training images. The K patches were reoriented into the same direction, and an averaged template was created by calculating the average image of them. Then, the average template was rotated with 30 degrees interval to generate 12 template patches of different orientations (Figure 3b). To perform template matching, each of the 12 templates was moved with N/10 pixels interval in both x and y directions throughout the test image. For each moved position, the mutual information between the template and its covered image area was calculated as the similarity metric. The top 20% positions with the largest mutual information were maintained as the candidate egg centers. At last, candidate centers close to each other (within N/5 pixels distance) were clustered, and the mean coordinates of clustered candidates were used as the egg center. Based on the detected egg centers, a bounding box of size N × N was used to crop the egg out of the image. In this study, we found K = 10 sufficed for our needs, and a cropping size of N = 150 pixels ensured to enclose all eggs.
template matching, each of the 12 templates was moved with N/10 pixels interval in both x and y directions throughout the test image. For each moved position, the mutual information between the template and its covered image area was calculated as the similarity metric. The top 20% positions with the largest mutual information were maintained as the candidate egg centers. At last, candidate centers close to each other (within N/5 pixels distance) were clustered, and the mean coordinates of clustered candidates were used as the egg center. Based on the detected egg centers, a bounding box of size N × N was used to crop the egg out of the image. In this study, we found K = 10 sufficed for our needs, and a cropping size of N = 150 pixels ensured to enclose all eggs.

Convolutional Feature Extraction
After the egg detection step, each cropped egg patch was fed into a convolutional neural network to extract the image features for egg classification. To train such a network, we needed to overcome the limitation of the small and imbalanced training dataset. Our study involved only a few hundreds of samples of zebrafish eggs, which were not enough for training a deep neural network. Compared to the popular ImageNet [14] dataset of over ten million sample images, the size of our datasets is at least four orders of magnitude less. When the number of weights to be trained in a neural network is far more than the number of training samples, the problem of overfitting is likely to occur, and the network will have poor generalization ability.
Another problem with our training set is that the sample numbers of different categories were seriously imbalanced. The ratio between fertilized and unfertilized eggs was almost 6:1 in our dataset. Imbalanced training data could potentially diminish the specificity of the network, making the network incompetent to recognize the relatively smaller category, i.e., the unfertilized eggs.
To overcome the limitation of the small and imbalanced training set, we used the image augmentation strategy to increase the training set size and to balance the training sample numbers of different categories. Image augmentation is the process to increase the training set by creating altered versions of the existing sample images, and it is proved to be an effective solution to prevent overfitting [14]. Typical ways of data augmentation include rotation, translation, zooming, flipping, scaling, color perturbation, and adding random noise.
In our study, image augmentation strategies were carefully chosen according to the characteristics of our datasets (as shown in Figure 4). Since the eggs were captured at the same time point, they had similar sizes. All the images were captured under the same environmental light

Convolutional Feature Extraction
After the egg detection step, each cropped egg patch was fed into a convolutional neural network to extract the image features for egg classification. To train such a network, we needed to overcome the limitation of the small and imbalanced training dataset. Our study involved only a few hundreds of samples of zebrafish eggs, which were not enough for training a deep neural network. Compared to the popular ImageNet [14] dataset of over ten million sample images, the size of our datasets is at least four orders of magnitude less. When the number of weights to be trained in a neural network is far more than the number of training samples, the problem of overfitting is likely to occur, and the network will have poor generalization ability.
Another problem with our training set is that the sample numbers of different categories were seriously imbalanced. The ratio between fertilized and unfertilized eggs was almost 6:1 in our dataset. Imbalanced training data could potentially diminish the specificity of the network, making the network incompetent to recognize the relatively smaller category, i.e., the unfertilized eggs.
To overcome the limitation of the small and imbalanced training set, we used the image augmentation strategy to increase the training set size and to balance the training sample numbers of different categories. Image augmentation is the process to increase the training set by creating altered versions of the existing sample images, and it is proved to be an effective solution to prevent overfitting [14]. Typical ways of data augmentation include rotation, translation, zooming, flipping, scaling, color perturbation, and adding random noise.
In our study, image augmentation strategies were carefully chosen according to the characteristics of our datasets (as shown in Figure 4). Since the eggs were captured at the same time point, they had similar sizes. All the images were captured under the same environmental light condition so that the grayscale level of different eggs was similar. The most possible variation of the eggs is the different orientation caused by random placement. Therefore, we used image rotation and flipping to simulate possible deviation of egg orientations. In order to improve the balance of the dataset, we augmented the unfertilized eggs more than the fertilized eggs. Each fertilized patch was rotated three times with 60 degrees interval, while each unfertilized patch was rotated 18 times with 10 degrees interval. All the rotated patches were also flipped vertically to simulate the effect of different illumination orientations of the environmental light. As a result, the ratio between fertilized and unfertilized eggs was close to 1:1 after the augmentation.
Appl. Sci. 2019, 9, x FOR PEER REVIEW 5 of 12 condition so that the grayscale level of different eggs was similar. The most possible variation of the eggs is the different orientation caused by random placement. Therefore, we used image rotation and flipping to simulate possible deviation of egg orientations. In order to improve the balance of the dataset, we augmented the unfertilized eggs more than the fertilized eggs. Each fertilized patch was rotated three times with 60 degrees interval, while each unfertilized patch was rotated 18 times with 10 degrees interval. All the rotated patches were also flipped vertically to simulate the effect of different illumination orientations of the environmental light. As a result, the ratio between fertilized and unfertilized eggs was close to 1:1 after the augmentation. Based on the augmented training dataset, a convolutional neural network was trained. The network used the architecture of VGG-16 [15], the winner of the 2014 Large Scale Visual Recognition Challenge (ILSVRC). As shown in Figure 5, this architecture consisted of 5 blocks of 13 convolutional layers. For each convolutional layer, a convolution kernel of size 3 × 3 was convolved with the layer input to produce a tensor of outputs. The output tensor of the convolutional layer was then transferred into a finite value by an activation function of Rectified Linear Unit (ReLu), i.e., F(x) = x for x > 0 and 0 otherwise. At the end of each block of the convolutional layers, there was a maxpooling layer to perform down-sampling by dividing the output feature map from each block into 2 × 2 pooling regions and computing the maximum of each region. The down-sampled feature map from each max-pooling layer was then fed into the next convolutional layers as an input. To train this deep convolutional network, the transfer learning strategy was used. The network weights pre-trained on the ImageNet dataset was adopted as the initialization. By using VGG-16 as the initial model, we were able to take advantage of deep features learned from millions of natural Based on the augmented training dataset, a convolutional neural network was trained. The network used the architecture of VGG-16 [15], the winner of the 2014 Large Scale Visual Recognition Challenge (ILSVRC). As shown in Figure 5, this architecture consisted of 5 blocks of 13 convolutional layers. For each convolutional layer, a convolution kernel of size 3 × 3 was convolved with the layer input to produce a tensor of outputs. The output tensor of the convolutional layer was then transferred into a finite value by an activation function of Rectified Linear Unit (ReLu), i.e., F(x) = x for x > 0 and 0 otherwise. At the end of each block of the convolutional layers, there was a max-pooling layer to perform down-sampling by dividing the output feature map from each block into 2 × 2 pooling regions and computing the maximum of each region. The down-sampled feature map from each max-pooling layer was then fed into the next convolutional layers as an input.
Appl. Sci. 2019, 9, x FOR PEER REVIEW 5 of 12 condition so that the grayscale level of different eggs was similar. The most possible variation of the eggs is the different orientation caused by random placement. Therefore, we used image rotation and flipping to simulate possible deviation of egg orientations. In order to improve the balance of the dataset, we augmented the unfertilized eggs more than the fertilized eggs. Each fertilized patch was rotated three times with 60 degrees interval, while each unfertilized patch was rotated 18 times with 10 degrees interval. All the rotated patches were also flipped vertically to simulate the effect of different illumination orientations of the environmental light. As a result, the ratio between fertilized and unfertilized eggs was close to 1:1 after the augmentation. Based on the augmented training dataset, a convolutional neural network was trained. The network used the architecture of VGG-16 [15], the winner of the 2014 Large Scale Visual Recognition Challenge (ILSVRC). As shown in Figure 5, this architecture consisted of 5 blocks of 13 convolutional layers. For each convolutional layer, a convolution kernel of size 3 × 3 was convolved with the layer input to produce a tensor of outputs. The output tensor of the convolutional layer was then transferred into a finite value by an activation function of Rectified Linear Unit (ReLu), i.e., F(x) = x for x > 0 and 0 otherwise. At the end of each block of the convolutional layers, there was a maxpooling layer to perform down-sampling by dividing the output feature map from each block into 2 × 2 pooling regions and computing the maximum of each region. The down-sampled feature map from each max-pooling layer was then fed into the next convolutional layers as an input. To train this deep convolutional network, the transfer learning strategy was used. The network weights pre-trained on the ImageNet dataset was adopted as the initialization. By using VGG-16 as the initial model, we were able to take advantage of deep features learned from millions of natural To train this deep convolutional network, the transfer learning strategy was used. The network weights pre-trained on the ImageNet dataset was adopted as the initialization. By using VGG-16 as the initial model, we were able to take advantage of deep features learned from millions of natural images [20]; therefore, the risk of overfitting was further reduced, and the convergence of the training was accelerated. During the training, the weights of the first three blocks of layers were frozen to retain the extracted simple features by VGG-16. Other two blocks of convolutional layers were fine-tuned with a small learning rate to make sure that the magnitude of the updates from each fine-tuning iteration stayed small.
For the bottleneck feature training phase, RMSprop optimizer was used for faster general localization. For the fine-tuning phase, Stochastic Gradient Descent (SGD) optimizer with momentum was chosen for better generalization ability. Choosing proper optimizers is crucial since it directly affects the convergence of the algorithm. Both optimizers we chose here originate from the optimizer of Gradient Descent. However, the basic Gradient Descent method calculates the gradient of the whole data set for performing only one update. Therefore, it is extremely slow and memory expensive for experiments with large datasets. Stochastic Gradient Descent (SGD) method was designed to rectify the above problems of the regular Gradient Descent method by performing a parameter update for each training example. To further improve convergence accuracy and reduce fluctuation, a momentum term was added to the SGD method. It restricts the oscillation in one direction during searching to improve the speed of the convergence. Based on the SGD with momentum method, RMSprop optimizer restricts the oscillations in the vertical direction. In this way, a larger learning rate could be adapted to have a larger searching pace in the horizontal direction to increase convergence speed. For bottleneck feature training phase, RMSprop optimizer was used for faster general localization at the beginning. While Stochastic Gradient Descent (SGD) optimizer with momentum was chosen for more precisely global minima localization.
Learning rate is one of the most important aspects of Gradient Descent because it determines the pace size for searching the global minima of the optimizing algorithm. Here, a small learning rate of 0.0001 was used to perform fine adjustments to weights without changing the overall weight structure. We used a small learning rate so that the features learned previously were not wrecked. For the training process of each data fold, we ran 50 epochs and saved the best result of model weights at the epoch when the validation loss was the least. The technique of reduced learning rate was used, i.e., the learning rate was multiplied with 0.2 when the training loss stopped reducing for 3 epochs.

Global Average Pooling Classifier
After features were extracted by the convolutional network module, a classification module based on global average pooling method was used instead of the traditional fully connected layer classifier. Conventionally, in a convolutional neural network, convolutional layers are usually followed by several fully connected layers to vectorize the feature extracted by convolutional layers and to accomplish the classification task via a softmax logistic regression layer. However, fully connected layers involve many weights to be trained, which increase the cost of computing and reduce the convergence speed of the network. On the other hand, the increment of weights will also increase model complexity, which may easily lead to overfitting. Effective techniques have been proposed to avoid overfitting, such as dropout [21,22]. Using global average pooling (GAP) instead of fully connected layers to classify different categories directly from feature maps is a revolutionary innovative improvement made to traditional convolutional neural networks [23]. Instead of adding fully connected layers on top of the feature maps from convolutional layers, GAP generates one feature map for each corresponding category to be classified, vectorize the features by global average pooling, and feed the vectors directly into the final softmax classifier, as shown in Figure 2. Compared to traditional fully connected layers, GAP had enforced the correspondences between feature maps and categories. Besides, the GAP didn't introduce extra weights to be optimized for the network, which had reduced the prone of overfitting.

Results
In this study, a total of 211 zebrafish egg microscopic images containing 638 eggs were acquired using the ImageXpress system. A human biologist with over 10 years' experience was invited to assign fertilization labels to all the eggs, resulting in 546 fertilized eggs and 92 unfertilized. The labels of human expert were used as the gold standard for method validation. The network was constructed using the Keras platform on a server with NVIDIA K4000 Graphics Processing Unit (GPU). The training process took~60 min for each training subsample set and took less than 5 s on each test image.

Zebrafish Egg Classification Accuracy
To validate the proposed method, a five-fold cross-validation scheme was used. For K-fold cross-validation, the original dataset was randomly partitioned into K equal-sized subsample sets. The training and validation processes were repeated K times. Each time, one subsample set was retained, in turn, as the validation data, while other K−1 subsample sets were used as the training data. We chose 5-fold cross-validation according to the overall size of the dataset so that there were no less than 500 eggs in each training set. The training and validation processes were repeated five times, and the accuracy of each validation subsample set was calculated. Table 1 reports the accuracy of each cross-validation fold. The accuracy was defined as Accuracy = (N TP + N TN )/N all , where N TP , N TN , N all stand for the number of true positive, true negative, and all eggs, respectively. In this study, we considered unfertilized eggs as positive samples and fertilized eggs as negative samples, respectively. As reflected in Table 1, the third fold had the highest accuracy (98.8%), and the second fold had the lowest accuracy (93.2%). Even the lowest accuracy was higher than 93%, and the mean accuracy of all folds was 95.0%, meaning that the proposed method has a quite high classification accuracy for zebrafish egg fertilization status. The standard deviation of all the folds was also small (2.2%), meaning this method performs stably over different test datasets.

Comparison between Regular Fully Connected Layers and Global Average Pooling
Our method used global averaged pooling layers instead of the regular fully connected layers to improve the classification accuracy. To verify the advantage of global averaged pooling, we calculated the classification results with regular fully connected layer to compare with the results based on global average pooling. In this experiment, the dropout technique with an experimental value of 50% dropout probability was adopted with two fully connected layers to compare with the global average pooling classifier. Fully connected layers are usually accompanied by the dropout method to promote convergence. Dropout is a regularization technique to prevent overfitting for neural network models [14,22]. Neurons are randomly selected with a given probability to be dropped out and ignored during training so that the network could learn multiple independent internal representations and improve the generalization ability.
The comparison was based on the same five-fold cross-validation dataset, as mentioned above. The highest classification accuracy of the regular fully connected layers was 97.3%, which was less than the highest accuracy of 98.8% of the global average pooling method. Moreover, we also found that the global averaged pooling method had better convergence performance for model training. Figure 6 plots the training accuracy curve and validation accuracy curve of both global averaged pooling and conventional fully connected layers. The global averaged pooling method shows more steady convergence process with less fluctuation. The global averaged pooling method also has a narrower gap between the training and validation curves, implying better generalization ability than the fully connected layers method.

Comparison between Augmented Dataset and Original Dataset
Our method used data augmentation strategy to overcome the limitation of the small and imbalanced training dataset. In this Section, we compared the performances of zebrafish egg classification with and without data augmentation. Besides the training result acquired with augmented datasets of 7864 image patches, another model was trained with the original dataset of 638 patches without augmentation. The improvement of accuracy in the case of the augmented datasets against the original datasets was significant. Validation accuracy was improved from 83.8% to 98.8% after data augmentation.
To further analyze the effect of balancing the imbalanced datasets by augmentation, we had computed the metrics of sensitivity, specificity, precision, and accuracy between the methods with and without data augmentation (as shown in Table 2). The metrics were defined as Sensitivity = NTP/(NTP+NFN), Specificity = NTN/(NTN+NFP), Precision = NTP/(NTP+NFP), where N stands for the number of samples, TP, FP, TN, FN represent true positive, false positive, true negative, false negative, respectively. From Table 2, it can be observed that data augmentation led to evident improvements in both sensitivity and accuracy, while the specificity and precision of both methods were at the same level. We also compared the convergence performance of model training between the methods with and without data augmentation. As shown in Figure 7, data augmentation led to faster convergence speed and a smaller gap between the training accuracy curve and validation accuracy curve, implying that data augmentation yielded better specificity and generalization ability.

Comparison between Augmented Dataset and Original Dataset
Our method used data augmentation strategy to overcome the limitation of the small and imbalanced training dataset. In this Section, we compared the performances of zebrafish egg classification with and without data augmentation. Besides the training result acquired with augmented datasets of 7864 image patches, another model was trained with the original dataset of 638 patches without augmentation. The improvement of accuracy in the case of the augmented datasets against the original datasets was significant. Validation accuracy was improved from 83.8% to 98.8% after data augmentation.
To further analyze the effect of balancing the imbalanced datasets by augmentation, we had computed the metrics of sensitivity, specificity, precision, and accuracy between the methods with and without data augmentation (as shown in Table 2). The metrics were defined as Sensitivity  Table 2, it can be observed that data augmentation led to evident improvements in both sensitivity and accuracy, while the specificity and precision of both methods were at the same level. We also compared the convergence performance of model training between the methods with and without data augmentation. As shown in Figure 7, data augmentation led to faster convergence speed and a smaller gap between the training accuracy curve and validation accuracy curve, implying that data augmentation yielded better specificity and generalization ability.

Comparison with Other Zebrafish Embryo Microscopic Image Analysis Studies
As we surveyed the existing studies, there was rarely any research on zebrafish egg fertilization status classification from microscopic images. The most similar study to ours is from Liu et al. [4] who used support vector machine (SVM) to classify zebrafish embryo hatching status based on handcrafted image features. It is hard to rigorously compare our method with Liu's method since the application purpose is different. As a rough comparison, their method achieved average recognition accuracy of 97.4 ± 61.0%, while our method had an average accuracy of 95.0 ± 2.2%. Although the two methods have similar accuracy, the standard deviation of our method (2.2%) is much less than theirs (61.0%), meaning that our method is considerably more stable. Moreover, our method doesn't need any hand-crafted feature; thus, the cost of algorithm design and the involvement of subjective interference of our method is much less. It is evident that our deep learning approach has better stability and objectiveness than the traditional machine learning methods based on hand-crafted features.

Discussion
In this study, exploratory research was conducted on CNN-based zebrafish egg phenotype classification from microscopic images. Due to the particularity of zebrafish egg research, we were facing the problems of the small imbalanced dataset and subtle inter-class difference. To tackle these problems, the strategies of transfer learning, data augmentation, and global averaged pooling were used.
It is known that training a deep network from scratch with random initialization is a formidable task. It requires millions of well-annotated training images, which are difficult to obtain in our study. Transfer learning is a technique to obtain deep features that an existing model has learned from tens of thousands of natural image datasets, either as an initialization or a fixed feature extractor for the task of interest. In some studies, transfer learning has been used to analyze medical images and achieved dramatic performance improvement for classification tasks of small datasets [24,25]. In this study, we used VGG-16 model previously trained on millions of natural scene images [15]. Compared to medical images like Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), brightfield microscopic images share more common image features with natural scene images; therefore, we directly used the original VGG-16 model without modifications to its network architecture. As shown in our experimental results (Table 1), a mean accuracy of 95.0% was obtained based on fivefold cross-validation, proving the effectiveness of the transfer learning.

Comparison with Other Zebrafish Embryo Microscopic Image Analysis Studies
As we surveyed the existing studies, there was rarely any research on zebrafish egg fertilization status classification from microscopic images. The most similar study to ours is from Liu et al. [4] who used support vector machine (SVM) to classify zebrafish embryo hatching status based on hand-crafted image features. It is hard to rigorously compare our method with Liu's method since the application purpose is different. As a rough comparison, their method achieved average recognition accuracy of 97.4 ± 61.0%, while our method had an average accuracy of 95.0 ± 2.2%. Although the two methods have similar accuracy, the standard deviation of our method (2.2%) is much less than theirs (61.0%), meaning that our method is considerably more stable. Moreover, our method doesn't need any hand-crafted feature; thus, the cost of algorithm design and the involvement of subjective interference of our method is much less. It is evident that our deep learning approach has better stability and objectiveness than the traditional machine learning methods based on hand-crafted features.

Discussion
In this study, exploratory research was conducted on CNN-based zebrafish egg phenotype classification from microscopic images. Due to the particularity of zebrafish egg research, we were facing the problems of the small imbalanced dataset and subtle inter-class difference. To tackle these problems, the strategies of transfer learning, data augmentation, and global averaged pooling were used.
It is known that training a deep network from scratch with random initialization is a formidable task. It requires millions of well-annotated training images, which are difficult to obtain in our study. Transfer learning is a technique to obtain deep features that an existing model has learned from tens of thousands of natural image datasets, either as an initialization or a fixed feature extractor for the task of interest. In some studies, transfer learning has been used to analyze medical images and achieved dramatic performance improvement for classification tasks of small datasets [24,25]. In this study, we used VGG-16 model previously trained on millions of natural scene images [15]. Compared to medical images like Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), bright-field microscopic images share more common image features with natural scene images; therefore, we directly used the original VGG-16 model without modifications to its network architecture. As shown in our experimental results (Table 1), a mean accuracy of 95.0% was obtained based on five-fold cross-validation, proving the effectiveness of the transfer learning.
To further address the small imbalanced dataset problem, data augmentation strategy was used in this study. The effect of data augmentation was evident. As shown in Table 2, the sensitivity and accuracy were improved dramatically after data augmentation. The model trained without data augmentation yielded quite low sensitivity (68.0%), implying that this model tended to make negative judgments (fertilized). This is because the training data without augmentation contained much less unfertilized eggs than the fertilized eggs, making the model inadequate to recognized fertilized eggs. Therefore, dedicated data augmentation is very crucial for training a network for recognizing both types of eggs.
To cope with the subtle inter-class differences, global averaged pooling was used instead of the conventional fully connected layers. Global average pooling classifier enforced the correspondences between feature maps and categories without introducing extra weights to be optimized, and thus reduced the fluctuation during the training process and promoted fast and steady convergence. As reflected from the experimental results (Figure 3), global averaged pooling not only yield improved averaged classification accuracy but also lead to faster and more stable convergence of the training and validation curves. Such an advantage is crucial for biological microscopic image classification since the genetic or biological changes usually result in quite subtle phenotype differences.
As a limitation of this study, the proposed method still used a conventional template matching scheme to locate each egg in the well-plate image. There are several state-of-the-art neural networks for fast object detection, such as Faster-RCNN, YOLO, etc. [26]. However, as we tested these models, they performed well on locating the eggs but failed to accurately distinguish between the fertilized and unfertilized eggs. Therefore, we chose to use conventional CNN structure equipped with global averaged pooling to overcome the subtle inter-class difference. In the future study, we will focus on combining the object detection networks with our network architecture so that the whole workflow (including detection and classification) can be performed with only one network.

Conclusions
This study applied the deep learning technique to classify fertilized and unfertilized zebrafish eggs from bright-field microscopic images. Transfer learning and data augmentation schemes were used to overcome the problem of the small imbalanced training dataset. Global averaged pooling was adopted to improve the classification accuracy over subtle inter-class differences. Our future research direction will focus on applying this method in daily zebrafish egg acquisition workflow so that the proposed algorithm can promote the research outcome of high throughput biological experiments.