An Automatic Method for Stomatal Pore Detection and Measurement in Microscope Images of Plant Leaf Based on a Convolutional Neural Network Model

Abstract: Stomata are microscopic pores on the plant epidermis that regulate the water content and CO2 levels in leaves. Thus, they play an important role in plant growth and development. Currently, most of the common methods for the measurement of pore anatomy parameters involve manual measurement or semi-automatic analysis technology, which makes it difficult to achieve high-throughput and automated processing. This paper presents a method for the automatic segmentation and parameter calculation of stomatal pores in microscope images of plant leaves based on deep convolutional neural networks. The proposed method uses a type of convolutional neural network model (Mask R-CNN (region-based convolutional neural network)) to obtain the contour coordinates of the pore regions in microscope images of leaves. The anatomy parameters of pores are then obtained by ellipse fitting technology, and the quantitative analysis of pore parameters is implemented. Stomatal microscope image datasets for black poplar leaves were obtained using a large depth-of-field microscope observation system, the VHX-2000, from Keyence Corporation. The images used in the training, validation, and test sets were taken randomly from the datasets (562, 188, and 188 images, respectively). After 10-fold cross validation, the 188 test images were found to contain an average of 2278 pores (pore widths smaller than 0.34 µm (1.65 pixels) were considered to be closed stomata), and an average of 2201 pores were detected by our network with a detection accuracy of 96.6%; the intersection over union (IoU) of the pores was 0.82. The segmentation results of 2201 stomatal pores of black poplar leaves showed that the average measurement accuracies of the (a) pore length, (b) pore width, (c) area, (d) eccentricity, and (e) degree of stomatal opening (the ratio of the width to the maximum length of a stomatal pore) were (a) 94.66%, (b) 93.54%, (c) 90.73%, (d) 99.09%, and (e) 92.95%, respectively.
The proposed stomatal pore detection and measurement method based on the Mask R-CNN can automatically measure the anatomy parameters of pores in plants, thus helping researchers to obtain accurate stomatal pore information for leaves in an efficient and simple way.


Introduction
Stomata control the fluxes of carbon dioxide and water vapor across a leaf [1][2][3][4]. The rest of the epidermis is covered by an impervious cuticle, with limited possibilities for gas exchange [5]. The gas exchange capacity depends on stomatal density (i.e., stomatal number per unit area), stomatal size (length and width), and pore dimensions (length and aperture) [6][7][8][9]. Pore area is dynamically adjusted by changes in pore aperture, since pore length is rather rigid during the opening and closure of stomata [10,11]. Active pore aperture adjustments in response to internal (e.g., water status) and external (e.g., environmental) factors are physiologically regulated [12][13][14][15]. In contrast, pore length and stomatal density are anatomical features that are set during leaf elongation [10,16]. These stomatal features have been the focus of a wide range of studies [6][7][8][17,18]. For instance, they are often used to estimate stomatal conductance (gs) based on the equation by Brown and Escombe [19] or modified versions of this equation [17,20]. In this way, for instance, reconstructions of gs over geological time scales were made possible based on the stomatal traits of fossilized leaves [8]. Estimations of gs from stomatal anatomy features are also important when the effect of changing a single feature on gs is investigated [21,22].
At present, most methods for the measurement of stomatal pores involve manual measurement from images using image processing software, such as ImageJ [23]. This type of method requires researchers to manually label points of interest in a pore, such as its boundaries, length, and width. The disadvantages of this method are that, first, it requires manual intervention and, second, due to the huge amount of data required, only some of the data points can be used to build the model, thus making the results inaccurate. This has led to many automatic measurement methods being proposed in an attempt to handle the large data volumes associated with leaf pores and the high accuracy requirements of data analysis.
A method for measuring the parameters of stomatal pore anatomy was first proposed by Omasa and Onoe [24]. This method used Fourier transform and unsharp mask techniques to remove the noise of the original images and calculated the length and width of sunflower pores through edge detection. The disadvantages of this method are that it requires a large amount of computation and is only suitable for single-pore images. Laga et al. [25] proposed an automatic method: they first detected stomata using template matching technology and then extracted the stomatal aperture by binary segmentation. However, this method relies on a template for each plant species. Liu et al. [26] used maximally stable extremal regions (MSERs) to detect and measure grape stomata. This is a semi-automatic method because it requires the user to properly select ellipses to fit different stomata. An automatic method for the pore measurement of grape varieties was proposed by Jayakody et al. [27]. This method is based on machine learning theory and uses histogram of oriented gradients (HOG) features to construct a cascade object detector that detects the pores, and it then calculates the various associated parameters through binary image segmentation and skeletonization. Though the method is fully automated, it requires the analyzed microscope image to contain rich background features. Toda et al. [28] first used a HOG feature to detect stomata, followed by a convolutional neural network (CNN) to classify clipped individual pores as open or closed; finally, they used binary image segmentation to complete the automatic pore measurement. As the algorithm requires the range of parameters (such as area, solidity, major-axis length, and centroid coordinates) to be defined manually, it is not able to identify pores whose size or shape is not within the predefined parameter range. Bhugra et al. [29] proposed a method based on a deep learning neural network to identify and segment pores in SEM images. This method uses a single shot multi-box detector (SSD) to detect the distribution of stomata, a super-resolution convolutional neural network (SRCNN) to improve the resolution of the cropped local images, and, finally, a fully convolutional network (FCN) to segment the pores. A method for stomatal pore segmentation based on the Chan-Vese (CV) model was presented by Li et al. [30]. The authors first used a faster region-based convolutional neural network (Faster R-CNN) to detect the stomatal locations, then cropped the detected stomata to construct single-stoma pictures, and segmented these single stomata using a CV model; however, the method requires the parameters of the CV model to be manually adjusted according to the quality and stomatal shape of the image being processed. Furthermore, this method can only handle pores with a larger stomatal aperture, because it is more difficult to measure pores with a smaller degree of opening.
In this paper, we propose an automatic high-throughput method based on the mask region-based convolutional neural network (Mask R-CNN) [31] to acquire parameters of stomatal pore anatomy. The Mask R-CNN, based on the Faster R-CNN [32], is an instance segmentation model. When a new stomatal image is used as input, this model processes the image by not only providing a bounding box around the object but also providing a prediction about the category of each pixel. This method was applied to the detection of stomata based on microscope images of plant leaves, and the results were evaluated visually and in terms of several quantitative indices.

Data Acquisition
In this paper, we used one-year-old black poplar (Populus nigra) growing in its natural environment as the experimental plant. Black poplar is often used as a protective forest species and plays an important role in environmental protection. It also serves as a habitat for many animals, and its seeds can be eaten by finches. Therefore, black poplar plays an important role in ecology and the protection of the environment.
The Keyence VHX-2000 microscope observation system with a large depth-of-field [33] was used to obtain stomatal microscope images of fully focused black poplar leaves, as shown in Figure 1.

The resolution of the obtained stomatal pore images was 1600 × 1200, and the magnification was 1000×. A self-made lifting device was used to obtain leaves from each position, including the bottom, middle, and top leaves of different trees, as well as the top, middle, and base of a single leaf. To arrive at a relatively robust conclusion, the task of collecting images of different parts of poplar leaves continued for three months (July-October). The poplar leaves were fixed with a self-made leaf holder to keep them flat during the collection process. Examples of the collected images are shown in Figure 2.

Methods
The aim of this study was to develop a fully automated method for stomatal pore segmentation and quantitative measurement in plants. The overall flow of the proposed method is shown in Figure 3. The input was a stomatal microscope image. The output was the stomatal pore anatomy parameters detected by the model, including the length and width of the pore, the pore area, the elliptical eccentricity based on the pore [26], and the degree of stomatal opening (for a detailed introduction of the anatomy parameters, refer to Section 2.2.2). The method consisted of three steps: (1) construct the training, validation, and test sets and train the network model; (2) use the trained network model to obtain the mask (segmentation result) contour coordinates of each pore in the test image; and (3) obtain the parameters of the pores by the least squares fitting of ellipses according to the contour coordinates of the mask and the stomatal pore measurement model [34].

First, all acquired black poplar microscope images were randomly divided into a training set, a validation set, and a test set. The labeling tool LabelMe [35] was used to perform manual labeling under the guidance of a botanist. Examples of the labeled results are shown in Figure 4. Each microscope image included about 10-20 open stomata. The experimental network was implemented based on the Matterport model [36] and adapted to the actual requirements of the experiment (for a detailed introduction to the Mask R-CNN, refer to Section 2.2.1). The validation and test sets were used to optimize the network hyperparameters. Details of the hyperparameter adjustment process are found in Section 2.4. The second step used the trained Mask R-CNN to detect and segment the pore regions in the test set, as well as to generate, for each pore region, a binary map with the same size as that of the input image. To measure the pore anatomy parameters, we cropped the binary map to create a small map for use in the stomatal pore measurement model, thus ensuring that the pore was located in the center. The mask area on the cropped binary map is represented by the white region in Figure 3. The measurement model performed boundary extraction (the red line in Figure 3) on the mask area to obtain the contour coordinates of the mask area. The third step was ellipse fitting based on least squares [37]. The ellipse fitting (the blue line in Figure 3) of the mask area was performed according to the contour coordinates of the mask region, and the length, width, area, eccentricity, and stomatal aperture of the pores were the output (for a detailed introduction to pore measurement, refer to Section 2.2.2).
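The boundary extraction and axis measurement in steps (2) and (3) can be sketched in a few lines. The paper fits ellipses by least squares [37]; as a minimal, self-contained illustration (the function name and the moment-based shortcut are ours, not the authors'), the pore length (2a) and width (2b) of a binary pore mask can be approximated from its second-order moments, which for a uniformly filled ellipse recover the same axes as an ellipse fit:

```python
import numpy as np

def fit_pore_ellipse(mask):
    """Approximate the ellipse axes of a binary pore mask from its
    second-order moments (a moment-based stand-in for a least-squares
    ellipse fit). Returns (length 2a, width 2b) in pixels."""
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.stack([xs, ys]))   # 2x2 covariance of pixel coordinates
    evals = np.linalg.eigvalsh(cov)    # eigenvalues, ascending
    # For a uniformly filled ellipse, each variance eigenvalue equals
    # (semi-axis)^2 / 4, so the semi-axes are 2 * sqrt(eigenvalue).
    b, a = 2.0 * np.sqrt(evals)
    return 2.0 * a, 2.0 * b            # pore length and pore width

# Synthetic elliptical "pore" with semi-axes a = 40, b = 15 pixels
yy, xx = np.mgrid[0:200, 0:200]
mask = ((xx - 100) / 40.0) ** 2 + ((yy - 100) / 15.0) ** 2 <= 1.0
length, width = fit_pore_ellipse(mask)   # close to (80, 30)
```

A production pipeline would instead extract the mask contour and apply a direct least-squares ellipse fit to the boundary points, as described in Section 2.2.2.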


Architecture of the Model
In this work, a Mask R-CNN based on deep learning was used to detect and segment the stomatal pores in microscope images of black poplar. The Mask R-CNN is a two-stage instance segmentation algorithm (the first stage is a region proposal stage that judges whether a positive object exists or not, and the second stage predicts the category of each object output from the first stage and generates its mask) and is composed of the following four modules; its network architecture is shown in Figure 5.

(1) Feature Extraction: The Mask R-CNN combines a ResNet50 network with a feature pyramid network (FPN) to form a top-down multiscale feature extraction network. The use of a multiscale feature extraction network is superior to the use of the ResNet50 network in isolation. The FPN transforms the feature maps (C1-C5) extracted by the residual network into feature maps of different scales by up-sampling (P2-P5) and max-pooling (P6 is subsampled from P5 with a stride of 2 and is only used in the region proposal network). This architecture allows the network to attend to both detailed information and semantic information in the images. In the feature extraction process of a neural network, the shallower the layer, the more detailed the information in the image but the weaker its semantic information; conversely, the deeper the layer, the poorer the detailed information but the stronger the semantic information. The FPN cascades feature maps of different scales so that they share information, thereby acquiring an improved feature extraction result.
(2) Region Proposal Network (RPN): This network proposes probable target regions through classification and regression sub-branches. Anchors are used in this process. In one image, targets of different sizes cannot be predicted with cells of a single size, so the network generates anchors with different aspect ratios at each pixel of the feature map, and one feature map generates multiple anchors. The anchors slide over the feature map to generate candidate region features, which are extracted as low-dimensional features and then sent to the fully connected sub-layers; the bounding box regression and classification layers then acquire the proposed targets.
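As a toy illustration of how anchors with different aspect ratios are laid down at every feature-map cell (a simplified, single-scale sketch of our own; the Matterport implementation generates anchors per pyramid level):

```python
import numpy as np

def generate_anchors(fm_h, fm_w, stride, scale, ratios=(0.5, 1.0, 2.0)):
    """Generate one anchor per aspect ratio at every feature-map cell.

    fm_h, fm_w: feature map height and width (in cells)
    stride:     pixels per feature-map cell in the input image
    scale:      anchor side length (pixels) at ratio 1.0
    Returns an (fm_h * fm_w * len(ratios), 4) array of
    (y1, x1, y2, x2) boxes in input-image coordinates."""
    anchors = []
    for y in range(fm_h):
        for x in range(fm_w):
            cy, cx = (y + 0.5) * stride, (x + 0.5) * stride  # cell center
            for r in ratios:
                h = scale * np.sqrt(r)   # height/width ratio equals r,
                w = scale / np.sqrt(r)   # while the area stays scale^2
                anchors.append((cy - h / 2, cx - w / 2,
                                cy + h / 2, cx + w / 2))
    return np.array(anchors)

boxes = generate_anchors(fm_h=2, fm_w=2, stride=16, scale=32)
```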
(3) RoIAlign: In the head architecture of the Mask R-CNN, the input size is fixed to 7 × 7. However, the region of interest (ROI) size output by the RPN differs for each target, so RoIAlign is required for size normalization. The difference between RoIAlign and the RoIPooling structure used by the Faster R-CNN is that the former uses bilinear interpolation in pixel processing rather than the rounding operation used by the latter. This improves the accuracy of mask generation in segmentation tasks.
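The bilinear interpolation at the core of RoIAlign can be shown in isolation (a simplified sketch; the real RoIAlign samples several such fractional points per output bin and averages them):

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map at a fractional (y, x) location,
    as RoIAlign does, instead of rounding to the nearest cell (RoIPool)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    wy, wx = y - y0, x - x0  # fractional offsets inside the cell
    return ((1 - wy) * (1 - wx) * feature[y0, x0] +
            (1 - wy) * wx       * feature[y0, x1] +
            wy       * (1 - wx) * feature[y1, x0] +
            wy       * wx       * feature[y1, x1])

fm = np.arange(16, dtype=float).reshape(4, 4)  # toy feature map
v = bilinear_sample(fm, 1.5, 2.5)  # midpoint of cells (1,2),(1,3),(2,2),(2,3)
```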
(4) Mask Generation: The Mask R-CNN adds an FCN on the basis of the Faster R-CNN to generate a mask for the target, and several branches (box regression, classification, and mask generation) operate in parallel. After the feature map passes through the RPN, the ROI is sent to the mask generation branch.
The total loss function is defined as

L = L_cls + L_reg + L_mask,

where L_cls is the SoftMax loss, L_reg is the smooth L1 loss, and L_mask is the binary cross-entropy loss. L_reg is given by

L_reg = Σ_i smooth_L1(t_i − t*_i),

where the smooth function is defined as

smooth_L1(x) = 0.5x², if |x| ≤ 1; |x| − 0.5, otherwise,

t represents the box coordinates generated by the prediction, and t* represents the manually labeled box coordinates.
L_mask is defined as the average per-pixel binary cross-entropy

L_mask = −(1/N) Σ_i [s_i log s*_i + (1 − s_i) log(1 − s*_i)],

where s is the true binary mask from the manual label, s* is the predicted binary mask, and N is the number of mask pixels.
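The two loss terms above can be checked numerically with a plain NumPy sketch (illustrative only; the actual training losses are computed inside the framework):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 loss: quadratic for |x| <= 1, linear beyond."""
    ax = np.abs(x)
    return np.where(ax <= 1.0, 0.5 * ax ** 2, ax - 0.5)

def binary_cross_entropy(s, s_pred, eps=1e-7):
    """Average per-pixel binary cross-entropy between the true mask s
    and the predicted mask probabilities s_pred."""
    s_pred = np.clip(s_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(s * np.log(s_pred) + (1 - s) * np.log(1 - s_pred))
```

For example, smooth_l1 evaluates to 0.125 at x = 0.5 (the quadratic branch) and 2.5 at x = 3 (the linear branch), and the cross-entropy approaches zero when the prediction matches the mask exactly.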


Stomatal Pore Measurement
The pore measurement algorithm consists of mask contour coordinate extraction, ellipse fitting, and parameter calculation. The pore parameter measurement process is shown in Figure 6. The mask generation branch of the network generates a binary image with the same size as the input image for each stomatal pore in each test image. Each pixel in the image is used as a coordinate point. After calibration, 4.8 pixels = 1 µm. The value of each point is either false or true, where false means that the pixel is not in the mask area of the pore and true means that the pixel is in the mask area of the pore. Each picture generates n binary maps through the mask generation branch, where n is the number of detected pores in the picture.
The stomatal pore measurement model extracts the boundary coordinates of the mask in each binary image and crops the pore region from these binary images; through the least squares ellipse fitting technique [34,37], the values of the pore length and width [24] can be obtained and recorded as 2a and 2b, as shown in Figure 6. In order to further analyze the physiological characteristics of the stomatal pores, additional anatomical parameters can also be obtained, including the area, eccentricity, and degree of pore opening. The pore area represents the size of the channel for gas and moisture exchange [9], and it determines the stomatal conductance to CO2 and H2O [17]; it is defined as pi times the product of the semi-axes a and b of the fitted ellipse:

S = πab.

The stomatal aperture adjusts depending on the prevailing ambient environmental conditions [38] and is defined as the ratio of the pore width (2b) to the pore length (2a):

r = 2b/2a = b/a.

The eccentricity of the fitted ellipse can also be used to characterize the stomatal aperture [26] and is determined as follows:

e = sqrt(a² − b²)/a.
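Putting the three definitions together with the 4.8 pixels = 1 µm calibration gives the following sketch of the parameter calculation (the function name is ours, for illustration):

```python
import math

PIXELS_PER_UM = 4.8  # calibration from the text: 4.8 pixels = 1 µm

def pore_parameters(length_px, width_px):
    """Compute pore anatomy parameters from the fitted ellipse axes
    (length = 2a, width = 2b, both in pixels).

    Returns (area in µm², aperture 2b/2a, eccentricity)."""
    a = (length_px / PIXELS_PER_UM) / 2.0  # semi-major axis, µm
    b = (width_px / PIXELS_PER_UM) / 2.0   # semi-minor axis, µm
    area = math.pi * a * b                  # ellipse area S = pi*a*b
    aperture = b / a                        # degree of stomatal opening
    eccentricity = math.sqrt(1.0 - (b / a) ** 2)
    return area, aperture, eccentricity
```

For a pore fitted with a length of 48 pixels (10 µm) and a width of 24 pixels (5 µm), this yields an area of about 39.27 µm², an aperture of 0.5, and an eccentricity of about 0.866.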

Evaluation Indices
In addition to a visual assessment, several quantitative indices were calculated to evaluate the performance of the proposed method by comparing the predicted results with the ground truth; these included the precision, recall, intersection over union (IoU), and pore measurement accuracy.
Precision and recall are widely used to evaluate the performance of object detection methods and are defined as follows:

Precision = TP / (TP + FP),
Recall = TP / (TP + FN),

where true positive (TP) means that pores are correctly identified by the model within the defined pore region, false positive (FP) means the background is misidentified as a pore, and false negative (FN) means a pore is misidentified as background. The IoU index can be used to compare the similarities and differences between finite sample sets; the greater the IoU coefficient, the higher the similarity between the segmentation result and the corresponding ground truth.
The IoU index is defined as follows:

IoU = |S_pred ∩ S_gt| / |S_pred ∪ S_gt|,

where S_pred is the detection result location and S_gt is the real object location. We used the relative error to verify the effect of the stomatal pore measurement. In this paper, the relative errors between the parameters of the true mask of each pore in the image and those of the fitted ellipses of the segmentation mask were obtained. The relative error formula for each pore is as follows:

Relative error = |pred_parameter − gt_parameter| / gt_parameter × 100%,

where gt_parameter represents the anatomical parameters of the pore areas marked as ground truth after ellipse fitting (the pore length and width, area, eccentricity, and stomatal aperture) and pred_parameter represents the anatomical parameters of the mask obtained by our segmentation method after ellipse fitting.
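The evaluation indices above are straightforward to compute; a minimal sketch (function names ours), with the IoU computed directly on boolean pixel masks:

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def iou(mask_pred, mask_gt):
    """Intersection over union of two boolean pixel masks."""
    inter = np.logical_and(mask_pred, mask_gt).sum()
    union = np.logical_or(mask_pred, mask_gt).sum()
    return inter / union

def relative_error(pred, gt):
    """Relative measurement error of one anatomical parameter, in percent."""
    return abs(pred - gt) / gt * 100.0
```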

Model Parameters and Operating Environment
In order for the original network to train on our self-made dataset, we adjusted the output channels of the fully connected layer of the network from 81 to 2 (for the two categories of pore and background) prior to network training. The weights generated by Mask R-CNN training on the balloon dataset were used as the initial weights for training [36]. Compared with the training results obtained using the weights downloaded with ImageNet, these weights resulted in a faster convergence of the model parameters and an increased accuracy of results. In order to make the network perform optimally, we adjusted the hyperparameters, including the number of training epochs, the learning rate, the image size, and the batch size, according to the characteristics of the data. During the training process, we determined the number of training epochs according to the trend of the validation set accuracy and adjusted the other hyperparameters (the learning rate, image size, and batch size) according to the accuracy of the model on the test set. In the proposed method, the numbers of training epochs were (1) 40, (2) 120, and (3) 160 for the (1) head architecture, (2) depth 4+ of ResNet50, and (3) the entire network, respectively, with 100 steps in each epoch. The batch size was set to 1; images were resized to 1024 × 1024; the learning rates were (1) 0.001, (2) 0.001, and (3) 0.0001 for the three training stages, respectively; the weight decay rate was 0.0001; the learning momentum was 0.9; and the detection confidence threshold was 90%. The hyperparameter settings are shown in Table 1. In this study, the experiments were performed using the deep learning platforms Keras (2.0.8) and TensorFlow (1.10.0) in Python 3.5. The hardware support for the experiment was an Nvidia GeForce RTX 2080 Ti.
The operating system was Windows 10 with an Intel (R) Core (TM) i7-9700k, 3.6 GHz CPU, and 16 GB memory.
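For reference, the staged schedule and the Table 1 hyperparameters above can be summarized as plain data. The structure and key names below are our own, and we read the epoch counts (40/120/160) as cumulative totals, as is conventional in the widely used Keras Mask R-CNN implementation; this reading is an assumption, not something the text states explicitly.

```python
# Illustrative summary of the staged training schedule and hyperparameters;
# key names are our own, and cumulative epoch counts are an assumption.
TRAIN_STAGES = [
    {"layers": "heads", "until_epoch": 40,  "lr": 1e-3},
    {"layers": "4+",    "until_epoch": 120, "lr": 1e-3},  # ResNet50 stage 4 and up
    {"layers": "all",   "until_epoch": 160, "lr": 1e-4},  # entire network
]
COMMON = {
    "steps_per_epoch": 100,
    "batch_size": 1,
    "image_size": (1024, 1024),
    "weight_decay": 1e-4,
    "momentum": 0.9,
    "detection_confidence": 0.90,
    "num_classes": 2,  # pore + background
}

def epochs_in_stage(stages, i):
    """Epochs actually run in stage i when the epoch counts are cumulative."""
    start = stages[i - 1]["until_epoch"] if i > 0 else 0
    return stages[i]["until_epoch"] - start
```

Under this reading, the three stages run for 40, 80, and 40 epochs, respectively.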

Pore Detection and Segmentation
Examples of the detection and segmentation results of the proposed method are shown in Figure 7; as can be seen, the proposed method performed well in detecting and segmenting stomatal pores. For 10-fold cross validation, we randomly divided the dataset into 10 subsets, trained the model on nine of them, and tested it on the remaining one. Through statistical analysis of the 188 test images, after 10-fold cross validation, an average of 2201 of the average 2278 marked stomata were detected by the proposed method. To quantitatively analyze the validity and feasibility of the proposed model, the regions corresponding to stomatal pores were manually labeled using the LabelMe [35] annotation tool under the guidance of botanical experts, and the results were used as the ground truth (GT). The annotation results were compared with the measurements obtained by the proposed method: the average precision was 96.72%, the average recall rate was 96.87%, and the average IoU of the pores was 0.82. The average time to process an image was about 912 ms. The complete code for the project can be accessed at https://github.com/lijunyu159/stomatal_pore_measurement-MaskRCNN (accessed on 15 July 2020).
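The precision and recall figures above imply a matching step between predicted and ground-truth masks. A minimal sketch of such an evaluation is shown below, assuming a simple greedy best-IoU matching at a 0.5 threshold; the paper does not specify its exact matching rule, so this is an illustration rather than the authors' procedure.

```python
import numpy as np

def iou(a, b):
    """IoU of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def match_and_score(pred_masks, gt_masks, thr=0.5):
    """Greedily match each predicted mask to its best-IoU ground-truth mask,
    then report precision and recall at the given IoU threshold."""
    unused = list(range(len(gt_masks)))
    tp = 0
    for p in pred_masks:
        if not unused:
            break
        best = max(unused, key=lambda j: iou(p, gt_masks[j]))
        if iou(p, gt_masks[best]) >= thr:
            tp += 1
            unused.remove(best)
    precision = tp / len(pred_masks) if len(pred_masks) else 0.0
    recall = tp / len(gt_masks) if len(gt_masks) else 0.0
    return precision, recall
```

With one correct detection and one false positive against two ground-truth pores, this yields a precision and recall of 0.5 each.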

Pore Measurement
The ground-truth and predicted values of the black poplar stomatal pore parameters were calculated, and the corresponding relative measurement errors were obtained. The error results (averaged over the 10-fold cross validation) are shown in Table 2, and a scatterplot of the true values against the predicted values is shown in Figure 8.
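For readers who want to reproduce the parameter extraction, the sketch below approximates a pore's ellipse parameters directly from its binary mask using image moments. This is a stand-in for the paper's contour-based ellipse fitting (it exploits the fact that a uniformly filled ellipse has variance equal to one quarter of the squared semi-axis along each principal direction), and `um_per_px` is an illustrative scale factor.

```python
import numpy as np

def pore_parameters(mask: np.ndarray, um_per_px: float = 1.0):
    """Approximate ellipse parameters of a pore from a boolean mask using
    image moments (a stand-in for contour-based ellipse fitting)."""
    ys, xs = np.nonzero(mask)
    area = len(xs) * um_per_px ** 2
    cov = np.cov(np.stack([xs, ys]).astype(float))
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending
    # For a uniformly filled ellipse, variance along an axis = (semi-axis)^2 / 4.
    length = 4.0 * np.sqrt(evals[0]) * um_per_px  # major axis -> pore length
    width = 4.0 * np.sqrt(evals[1]) * um_per_px   # minor axis -> pore width
    ecc = np.sqrt(max(0.0, 1.0 - (width / length) ** 2))
    aperture = width / length                     # degree of stomatal opening
    return {"length": length, "width": width, "area": area,
            "eccentricity": ecc, "aperture": aperture}
```

On a synthetic filled ellipse with semi-axes 30 and 15 pixels, this recovers a length near 60, a width near 30, an aperture near 0.5, and an eccentricity near 0.866.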

The calculation process is as follows: (1) Each manually labeled stomatal pore is matched to the mask generated by the model that maximizes their IoU; (2) using the formulas of the 'Stomatal Pore Measurement' section and the evaluation indices proposed in Section 2.3, the anatomical parameters of the manually labeled pore region and of the model-generated mask are obtained; and (3) the relative errors between them are calculated, and the average relative error of each pore anatomical parameter is obtained by summing the per-pore relative errors and dividing by the total number of pores.
In addition, the relationship between the stomatal aperture and the measurement error was analyzed. Based on the 10-fold cross validation, there were on average 469 pores with a stomatal aperture larger than 40%, 954 pores with an aperture between 30% and 40%, 646 pores with an aperture between 20% and 30%, and 91 pores with an aperture between 10% and 20%. The relationship between the degree of stomatal opening and the average measurement error is shown in Figure 9; the accuracy of the measurement results increased with the degree of stomatal opening.
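The aperture bands used above can be reproduced with a simple histogram; the bin edges below follow the text, while the function name and return format are our own.

```python
import numpy as np

def bin_by_aperture(apertures):
    """Group pores into the aperture bands used in the analysis:
    10-20%, 20-30%, 30-40%, and >40% (aperture = width / length)."""
    edges = [0.10, 0.20, 0.30, 0.40, 1.0]
    counts, _ = np.histogram(apertures, bins=edges)
    return dict(zip(["10-20%", "20-30%", "30-40%", ">40%"], counts.tolist()))
```

Pores with an aperture below 10% fall outside all bands, mirroring the text, which reports counts only for apertures above 10%.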

Algorithm Comparison
The proposed method was compared with Li's method [30] in terms of measurement accuracy and time. Li's method used a Faster R-CNN and a CV model to detect and segment pores, and the anatomical parameters of the pores were obtained with an ellipse fitting technique. The disadvantages of this method are as follows: (1) Only a single stoma can be processed at a time during pore segmentation, resulting in a long overall processing time (about 1.58 s per stoma); (2) the CV model needs to be manually adjusted during pore segmentation; and (3) Li's method is incapable of fitting an incomplete pore at the boundary of an image; two examples of original images are shown in Figure 10a (top and bottom rows), with the corresponding failed segmentations in Figure 10b. Our method addresses these drawbacks. First, we used a single network model, the Mask R-CNN, for both pore localization and segmentation, greatly improving measurement speed. Second, the network learned its parameters from the stomatal pore characteristics of the training set, yielding a model with a good degree of generalization and eliminating the need to adjust parameters a second time. Third, the proposed method could segment and correctly fit pores at the image boundary (top and bottom rows of Figure 10a); the corresponding segmentation and ellipse fitting results are shown in Figure 10c,d, respectively.
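A back-of-envelope calculation makes the speed difference concrete. It assumes Li's reported 1.58 s per stoma scales linearly with pore count, and it reuses the averages reported earlier (2278 pores across 188 test images; 912 ms per image for our method).

```python
# Rough per-image timing comparison; assumes Li's per-stoma time
# scales linearly with the number of pores in an image.
pores_per_image = 2278 / 188                  # ~12.1 pores per test image
li_time_per_image = pores_per_image * 1.58    # seconds, Li's method
ours_time_per_image = 0.912                   # seconds (912 ms reported)
speedup = li_time_per_image / ours_time_per_image
print(round(li_time_per_image, 1), "s vs", ours_time_per_image, "s")
```

Under this linear-scaling assumption, Li's method would need roughly 19 s per image, suggesting about a 21-fold reduction in per-image processing time.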

In this experiment, we used Li's method to test our data. The comparison results are given in Table 3.


Table 3. Comparison of the mean relative error of the anatomical parameters between Li's method [30] and the method proposed in this paper.

Parameter                          Li's     Proposed
Average pore length error          16.8%    5.3%
Average pore width error           19.3%    6.5%
Average area error                 37.2%    9.27%
Average eccentricity error         1.5%     0.91%
Average stomatal aperture error    13%      7.05%

In Table 3, it can be seen that the errors for the proposed method were clearly smaller than those of Li's method.

Model Generalization Ability
To test the generalization ability of the proposed model, we tested the pores of two tree species: ginkgo and poplar. The corresponding datasets can be found in [39]. The images of the poplar were divided into (1) 60, (2) 20, and (3) 20 images, and those of the ginkgo were divided into (1) 55, (2) 18, and (3) 18 images ((1) training set, (2) validation set, and (3) test set). We first used only the poplar and ginkgo test sets to evaluate the model trained on the black poplar dataset without fine-tuning. Fine-tuning then started from the model trained on black poplar: the poplar and ginkgo training sets were used to retrain the model via transfer learning.
A comparison of the segmentation results of the model without and with fine-tuning for poplar is shown in Figure 11, and that for ginkgo is shown in Figure 12. The model did not show good generalization ability without fine-tuning because the training data, consisting only of black poplar, lacked diversity in sample features. However, the model performed well after fine-tuning with a small dataset. Table 4 shows the improvement in the generalization ability of the model achieved by fine-tuning through transfer learning. After the model was retrained with a small dataset of a different tree species, the detection precision and recall rates were greatly improved, indicating that the model has some generalization ability.
Table 5 shows that the model performed well in parameter measurement with fine-tuning.

Discussion
The results of the experiment demonstrated that the proposed method achieves high segmentation accuracy for most stomatal pores and does not require non-uniform illumination correction of the leaf microscope images. For black poplar, the pore detection precision of the proposed method was 96.72% and the recall rate was 96.87%. Examples of the stomata that our method failed to detect are shown in Figure 13. The failures occurred because the stomatal aperture was too small, improper operation of the microscope produced blurry images, or impurities such as trichomes were present in the pores.

In the first example, the stomatal aperture was too small, as shown in the first row of Figure 13. After measurement, the minimum aperture width at which the model could still segment a pore was 1.86 pixels, about 0.38 µm. Secondly, stomatal pores were blurred due to improper operation of the microscope, which prevented their successful detection, as shown in the second row of Figure 13. Lastly, because the dataset was composed of living stomata, impurities such as trichomes could be found in the pores, preventing their successful detection by the algorithm, as shown in the third row of Figure 13.
In this work, when the pores were manually labeled, we made a labeling rule that a pore at the edge of the image had to be exposed by more than 50% to be considered a pore; otherwise, it was regarded as part of the image background. As shown in Figure 14, when we manually conducted labeling (whether for the training, validation, or test sets), we considered such pores as part of the background, and they remained unlabeled. When the proposed model was used for testing, as in the first picture in the last row of Figure 14, these pores were nevertheless detected, such that the number of pores detected by the algorithm was slightly greater than the number of manually labeled pores and the precision rate of the algorithm was lower than expected. Figure 14. The case of error prediction by the model (the black area is the automatic filling of the image by the algorithm).
In addition, the proposed method incorrectly identified non-stomatal regions as pores due to the influence of guard cells and leaf background colors in the microscope images of living plant leaves, as shown in Figure 15, further reducing the precision rate of the proposed method.

Conclusions
In this work, an automatic method for the segmentation and measurement of plant stomatal pores based on the Mask R-CNN model was proposed. The method consists of three parts: segmentation based on the Mask R-CNN model, the extraction of contour coordinates from the pore segmentation results for ellipse fitting, and the calculation of pore anatomy parameters. After 10-fold cross validation by segmenting and measuring an average of 2201 pores, the average measurement accuracies of the (1) pore length, (2) pore width, (3) area, (4) eccentricity, and (5) stomatal aperture were (1) 94.66%, (2) 93.54%, (3) 90.73%, (4) 99.09%, and (5) 92.95%, respectively. The experimental results showed that the proposed method provides more accurate stomatal pore anatomy parameters than state-of-the-art stomatal segmentation methods. After fine-tuning with small datasets, the optimized model also performed well when applied to other species, which could reduce the labeling workload of researchers to some extent. In future work, we will apply the proposed method to the stomata of more plant species and further improve the generalization ability of the proposed model.
Forests 2020, 11, x FOR PEER REVIEW
