1. Introduction
Deep learning has become widely used in image analysis. A typical deep learning model can have millions of parameters; for example, VGG16, VGG19 and ResNet50 have 138.4 million, 143.7 million and 25.6 million parameters, respectively [1,2,3]. Because deep learning models have millions of parameters, large datasets are needed to estimate those parameters during training. Researchers have spent considerable resources (time and money) collecting and annotating large datasets for deep learning.
However, in many applications it is not necessary to fully re-train a model on a large annotated dataset. Fully re-estimating a deep learning model is costly and is often infeasible under budget and resource limits. In addition, fully re-estimating the model with a small dataset may not yield good performance: for a small dataset, a simple model is often preferred over a complex one [4]. For example, when only limited observations are available, linear regression or low-degree polynomial regression (degree less than four) is often preferred over non-parametric regression models [4]. Alternatively, using pre-trained parameters in a complex model, such as a deep neural network, is also recommended [4]. Researchers have developed transfer learning to address this issue. Transfer learning allows researchers to leverage models pretrained on other datasets to analyze their own problem with a small- or medium-sized dataset [5]. A typical transfer learning method for image analysis is a two-stage approach. In Stage 1, the lower layers of the deep learning model (for example, the 13 convolutional layers of VGG16) are used with weights pretrained on a large standard dataset such as ImageNet [6] to transform the input images into features. In Stage 2, the extracted features and the ground-truth response (y) are fed into a neural network in which all parameters are estimable [7]. In this way, satisfactory performance can be obtained even with a small dataset.
We extend the two-stage approach in general. In Stage 1, a feature extraction method is used to extract features from input images. The feature extraction methods can be principal component analysis or pre-trained deep neural network models with fixed parameters. Stage 2 is a supervised learning problem (regression or classification) with extracted features from Stage 1 as inputs and response variables (continuous or categorical y) as outputs. The method in Stage 2 can be a neural network or random forest. We intend to evaluate the performance of our general two-stage approach with different Stage 1 methods and different Stage 2 methods.
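As a minimal sketch of this general two-stage approach, the pipeline below uses PCA as the Stage 1 feature extractor and LASSO as the Stage 2 learner, assuming scikit-learn is available; the "images" are random stand-ins, not real plant images.

```python
# Two-stage sketch: Stage 1 = PCA feature extraction, Stage 2 = LASSO regression.
# The data are synthetic stand-ins for vectorized grayscale images.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2500))  # 60 vectorized 50x50 grayscale "images"
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=60)  # toy response

two_stage = Pipeline([
    ("scale", StandardScaler()),       # center and scale each pixel
    ("stage1", PCA(n_components=20)),  # Stage 1: extract PC features
    ("stage2", Lasso(alpha=0.01)),     # Stage 2: supervised learner on PCs
])
two_stage.fit(X, y)
preds = two_stage.predict(X)
```

Any Stage 2 method with the same fit/predict interface (e.g., a random forest or a neural network) can be swapped into the final pipeline step.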
Our proposed two-stage machine learning strategy is general in that researchers can select appropriate Stage 1 and Stage 2 methods according to the research problem, objective and data. In our previous work [8], we proposed machine learning methods to predict continuous and binary phenotypes from plant images. Those methods fit the general framework of our proposed two-stage approach: in Stage 1, we adopted principal component analysis (PCA) to extract features, i.e., PCs; in Stage 2, we used a range of machine learning methods (random forest, partial least squares and LASSO) to predict plant phenotypes, including the number of leaves of a plant, from plant images. These methods work well for plant image phenotyping [8].
Another method that fits our proposed two-stage framework is standard deep transfer learning [9]. In Stage 1, the lower layers of a neural network with pre-trained, fixed weights are used to extract features; in Stage 2, these features are fed into the upper layers of the network. Thus, deep learning with the weights of the lower layers fixed is also an instance of our proposed general framework [9].
Image-based plant phenotyping, i.e., plant image phenotyping, is a rapidly emerging research area concerned with quantitative measurement of the structural and functional properties of plants from plant images [8]. It facilitates the non-invasive extraction of traits by analyzing a large number of plants in a relatively short period of time, and has the advantages of low cost, high throughput and being non-destructive [10]. Based on plant image phenotyping, agricultural and biological researchers can track the growth dynamics of plants and identify the timing of critical events (such as flowering) and morphological changes (such as the number of leaves, plant size and the position of each leaf), enabling them to analyze, for example, how different factors (fertilizer amount, temperature and moisture) influence plants [8]. In this article, we evaluate the performance of our general two-stage framework with different Stage 1 and Stage 2 methods for plant image phenotyping, in particular for determining the number of leaves of a plant from RGB images.
The remainder of the paper is organized as follows:
Section 2 specifies the methods and data.
Section 3 shows the results.
Section 4 presents our discussions.
Section 5 draws the conclusions.
2. Materials and Methods
The proposed method is a general two-stage approach. In Stage 1, a feature extraction method is used to extract features from the input images; we adopt principal component analysis in this paper. In our ongoing project, we are evaluating the performance of other feature extraction methods, especially pre-trained deep neural network models with predetermined, fixed parameters. As in transfer learning, the values of the neural network parameters are pre-trained on a large dataset, i.e., ImageNet. ImageNet is a large image dataset organized according to the WordNet hierarchy, in which each meaningful concept is described by multiple words or word phrases [6]. ImageNet includes 80,000 nouns, each illustrated by, on average, 1000 images, and was created to satisfy researchers’ critical need for more data to enable more general machine learning methods [6]. Pre-training neural network parameters on ImageNet and using the resulting fixed weights in deep learning have demonstrated advantages in the literature [11].
Stage 2 is a supervised learning problem (regression or classification) with the extracted features from Stage 1 as inputs and the response variable (continuous or categorical y) as output. We adopted partial least squares (PLS) and LASSO as regression methods, and partial least squares discriminant analysis (PLS-DA) and LASSO as classification methods. LASSO often shows good prediction performance for high-dimensional data through its use of the L1 penalty [4]. When model interpretation is preferred over prediction, LASSO is often used to identify the predictors impacting the response variable, assuming sparse signals [4]. With recent developments in explainable machine learning and artificial intelligence, researchers are trying to develop models that are easy to interpret instead of black-box models; when an explanation is preferred, LASSO and decision trees are often used for their good interpretability [4]. A range of methods for the visual interpretability of deep learning have been developed in the literature [12]. In our ongoing project, we are evaluating random forest and neural networks as Stage 2 methods.
The dataset used in our study is the University of Nebraska-Lincoln (UNL) Component Plant Phenotyping Dataset (UNL-CPPD) [13]. The UNL-CPPD consists of images of 13 maize plants from two side views (0 degrees and 90 degrees). Plants were imaged once per day from 2 to 28 days after planting at the UNL Lemnatec Scanalyzer 3D high-throughput phenotyping facility using an RGB camera.
The RGB images were converted to grayscale and resized to 224 × 224 pixels, the input size for deep learning models including VGG16, VGG19 and ResNet50 [1,2,3]. Each grayscale image was thus represented by a numerical matrix of 224 rows and 224 columns, which was vectorized/reshaped into a column vector of length 224 × 224 = 50,176. The data were centered and scaled, and principal components were then extracted from the centered and scaled vectors representing the images. The extracted principal components were fed into the Stage 2 machine learning methods (any appropriate supervised learning method can be used) to make predictions.
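The preprocessing just described can be sketched as follows, assuming Pillow for the grayscale conversion and resizing; the images here are random stand-ins for the plant photographs.

```python
# Preprocessing sketch: grayscale, resize to 224x224, flatten to 50,176,
# center/scale, then extract principal components (Stage 1).
import numpy as np
from PIL import Image
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def image_to_vector(img):
    """Grayscale, resize to 224 x 224, and flatten to length 50,176."""
    gray = img.convert("L").resize((224, 224))
    return np.asarray(gray, dtype=float).reshape(-1)

rng = np.random.default_rng(1)
imgs = [Image.fromarray(rng.integers(0, 256, (300, 400, 3), dtype=np.uint8))
        for _ in range(10)]  # stand-ins for RGB plant images
X = np.stack([image_to_vector(im) for im in imgs])  # shape (10, 50176)

X_std = StandardScaler().fit_transform(X)        # center and scale each pixel
pcs = PCA(n_components=5).fit_transform(X_std)   # Stage 1 features (PC scores)
```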
The phenotype leaf number is the number of leaves in a plant image; it is an integer, and we treat it as a continuous phenotype. The binary variable “leafy” was then created as leafy = 1 if the leaf number exceeded the median leaf number and leafy = 0 otherwise. We applied regression methods to predict the phenotype leaf number and classification methods to predict the binary phenotype “leafy”.
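The “leafy” label construction amounts to a one-line thresholding at the median, shown here on a toy vector of leaf counts (not the actual UNL-CPPD counts).

```python
# Binarize a leaf-count phenotype at its median to create the "leafy" label.
import numpy as np

leaf_number = np.array([2, 3, 4, 5, 6, 7, 8])  # toy counts
leafy = (leaf_number > np.median(leaf_number)).astype(int)
# median is 5, so leafy = [0, 0, 0, 0, 1, 1, 1]
```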
Five-fold cross-validation (CV) was used to evaluate performance. For the regression problem, the performance evaluation metrics are the Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Deviation (MAD), specified as

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²,   RMSE = √MSE,   MAD = (1/n) Σᵢ |yᵢ − ŷᵢ|,

where yᵢ is the true response value, ŷᵢ is the predicted response value for observation i, and n is the number of observations. For the classification problem, the performance evaluation metric is accuracy, i.e., the number of correct classifications divided by the total number of classifications.
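These metrics are straightforward to write out in NumPy; the small vectors below are toy values used only to illustrate the formulas.

```python
# MSE, RMSE, MAD for regression and accuracy for classification.
import numpy as np

def regression_metrics(y_true, y_pred):
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mse = np.mean(err ** 2)
    return {"MSE": mse, "RMSE": np.sqrt(mse), "MAD": np.mean(np.abs(err))}

def accuracy(y_true, y_pred):
    # fraction of classifications that are correct
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

m = regression_metrics([4, 5, 6], [4, 5, 8])  # errors: 0, 0, -2
# m["MSE"] = 4/3, m["RMSE"] = sqrt(4/3), m["MAD"] = 2/3
acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])    # 3 of 4 correct -> 0.75
```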
4. Discussion
Our proposed method is a general two-stage framework allowing the choice of a Stage 1 method and a Stage 2 method. When the Stage 1 and Stage 2 methods are based on the same neural network model, the framework reduces to deep transfer learning. The current report uses principal component analysis as the Stage 1 method and partial least squares and LASSO as the Stage 2 methods on the UNL-CPPD dataset. We note that the current report is of limited scope; more methods and more datasets are needed for a more thorough analysis. In our ongoing project, we are evaluating the performance of the two-stage approach with different Stage 1 methods (deep neural networks and principal component analysis) and Stage 2 methods (partial least squares, LASSO and random forest), and will compare their performance with that of deep transfer learning in the literature on multiple datasets.
Regarding Stage 1 methods for extracting features from images, two widely used methods are (1) principal component analysis and (2) pre-trained deep learning. Both work for plant image phenotyping (image regression, classification and segmentation), as shown in the literature, including two of our previous studies [8,14].
Although our two-stage method is a general framework, its most widely used instance is deep transfer learning, which has already shown great success in the literature. We want to explore other models by varying the Stage 1 and Stage 2 methods. In terms of prediction performance, we expect deep transfer learning to achieve the best results, but comparing different methods is still worthwhile. The objective of this article is to compare methods within our two-stage general framework for better prediction and interpretation, so that researchers have a better understanding and more tools when developing novel machine learning methods.