1. Introduction
Image processing approaches have been used in a wide range of applications. One challenging and interesting application is plant monitoring, where camera systems capture information about plants for further tasks, such as leaf detection and counting [1,2,3,4,5], growth monitoring [6,7,8,9], leaf classification [10], disease detection [11,12,13], stress monitoring [14,15], and phenotyping [16,17,18].
Leaf segmentation is commonly used as a preliminary process that extracts the leaf area from the image background. Once the leaf area is detected, image analysis of the leaf object can be carried out. The most common leaf segmentation techniques are thresholding-based approaches, watershed, random walker, K-Means, artificial neural networks (ANNs), and color-index-based approaches.
HSV color thresholding was used in [19] for rice plant segmentation as the initial stage of plant height measurement. Color thresholding using the CIELAB color model was used in [20] for vine leaf segmentation from real environment images. HSI color segmentation was employed in [9] to extract leafy vegetables using a Kinect sensor. In [21], thresholding using a new color model was used to measure the leaf area of lettuce plants. Otsu thresholding based on the hue component was used for leaf area measurement in [22]. A software tool for estimating leaf area was developed in [23] using RGB color thresholding to extract the leaf from the background.
The watershed algorithm has been used for rosette plant leaf segmentation [16] with a Dice score greater than 90%, cotton leaf segmentation [24] with a correct rate of 98%, and vegetable leaf segmentation [25]. The random walker algorithm was used in an interactive tool for leaf annotation [26] with a Dice score of 97%. A robust random walker algorithm was proposed in [27] for leaf segmentation under different conditions with an F-measure of 98%. The K-Means algorithm has been used to extract paddy leaf images for leaf disease detection [12], to segment tomato leaves [28] with an F1 score of 98%, and to detect defective regions in leaves [29]. Deep learning was used in [2] for leaf counting. Mask R-CNN was used in [10] for leaf segmentation, with a misclassification error of 1%, and for classification against complex backgrounds. A convolutional neural network (CNN) was used to detect and recognize leaf diseases [13]. Leaf segmentation based on color indexes, including the normalized difference index (NDI), excess green minus excess red index (ExGR), and color index of vegetation extraction (CIVE), was addressed in [30] with segmentation rates of 80.7%, 80.9%, and 81.4%, respectively.
Even though the previously described leaf segmentation techniques achieve high performance, they use a visible (RGB) camera and thus work only in the daytime. Furthermore, most of them are not real-time systems. Other cameras, such as infrared cameras, have also been employed. The NoIR camera, a standard camera without the infrared filter, which allows the infrared spectrum (around 880 nm) to reach the sensor, was used in [3,4,31,32]. A near-infrared (NIR) camera was used in [5] for leaf phenotyping. A stereo infrared camera working in the NIR spectrum (700 nm to 1400 nm) was used in [7] for leaf growth modeling. To the best of our knowledge, no prior work has used a thermal camera for leaf segmentation. The typical applications of a thermal camera in plant monitoring are plant canopy temperature measurement [33,34,35,36], plant water status monitoring [34,37], and leaf stomatal conductance measurement [14,38]. A technique related to leaf segmentation was proposed in [36], in which a visible camera is combined with a thermal camera; the visible camera was used for leaf segmentation in order to define the region for temperature measurement by the thermal camera.
The thresholding technique is a simple and effective method for leaf segmentation and is therefore suitable for implementation on an embedded device for real-time applications. However, the algorithm is sensitive to lighting changes, so its segmentation performance under different lighting conditions should be adequately evaluated.
In this paper, we present the results of an experiment on techniques for Vetiveria zizanioides leaf segmentation using three camera sensors: a visible camera (standard RGB camera), a camera without an infrared filter (NoIR camera), and a thermal camera. We evaluated the performance of several popular image segmentation techniques implemented with the three cameras. The objective was to find the best solution for a low-cost embedded camera system for real-time leaf monitoring.
The main contributions of this study are as follows:
The image acquisition system employs low-cost camera systems suitable for real-time implementation.
The evaluated image segmentation techniques are fast-computation algorithms suitable for implementation on the embedded device.
The leaf segmentation performance of camera types and segmentation techniques was compared in an outdoor environment during the day and at night.
This is the first work to investigate the feasibility of using a thermal camera for leaf segmentation.
The image dataset used for evaluation comprised real images captured from the natural outdoor environment.
The rest of the paper is organized as follows: Section 2 presents the materials and methods; Section 3 presents the results and discussion; Section 4 concludes the paper.
2. Materials and Methods
2.1. Image Data Collection
The image datasets were prepared using images captured by a multi-camera system in an outdoor environment. The multi-camera system contains three camera sensors: visible, NoIR, and thermal, as shown in Figure 1. Each camera is connected to a Raspberry Pi 3 Model B+ for image processing and data storage. The visible camera, on the right side of the figure, uses a Sony IMX219 sensor with a resolution of 8 megapixels. The NoIR camera system, on the left side, consists of a camera module with a 5-megapixel Omnivision 5647 sensor (without an infrared filter) and a pair of infrared LEDs. The thermal camera, in the center, is a Seek Thermal CompactPro with a resolution of 320 × 240 pixels, connected to the Raspberry Pi via a USB interface and mounted above both the visible and NoIR cameras. The image data were also sent to Google Drive every 5 min for easy storage and access. Images were captured continuously over a whole day (day and night). The proposed camera system is a low-cost solution: the Raspberry Pi module and the visible, NoIR, and thermal cameras cost about USD 45, USD 7, USD 28, and USD 384, respectively.
The plant used for image data collection was Vetiveria zizanioides, planted in polybags and placed in an outdoor environment (yard) during image collection. Photographs of the plants and the environment taken by the visible, NoIR, and thermal cameras are shown in Figure 2, Figure 3 and Figure 4, respectively. Due to the arrangement of the three cameras (as shown in Figure 1), each camera has a different viewpoint. Therefore, to make a fair comparison, we employed image warping to transform the images taken by the NoIR and thermal cameras to match the reference image taken by the visible camera (a sketch of such an alignment step is given at the end of this subsection). In addition, to balance the cameras’ resolutions and the computation time of the image processor, all images were resized to 600 × 400 pixels. Note that even though the resized thermal images are larger than the originals, this does not affect performance significantly, for the following reason: a thermal camera captures images based on the thermal energy emitted by objects, so object parts appear in uniform colors (as shown in Figure 4) rather than in the precise colors captured by the visible camera. Therefore, resizing an image to a higher resolution does not change the detail of objects significantly.
Figure 3a,b depict the original and transformed NoIR images, respectively. Figure 4a,b depict the original and transformed thermal images, respectively. The figures show that the viewpoints of the NoIR and thermal images are similar to that of the visible image, in the sense that the objects and their orientations in the images are almost the same. For instance, the leaves, polybags, and background objects appear in nearly the same positions in Figure 2, Figure 3b and Figure 4b. Since the objects’ appearances in the three image types (visible, NoIR, and thermal) are almost the same, the segmentation performance can be compared fairly.
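The alignment step can be illustrated with a perspective (homography) warp in OpenCV computed from manually selected corresponding points; this is a minimal sketch, where the point coordinates and file names are placeholders rather than the actual calibration used in the experiments.

```python
import cv2
import numpy as np

# Four hand-picked reference points in the visible image and their
# matches in the thermal image (placeholder coordinates).
pts_visible = np.float32([[50, 40], [560, 35], [590, 370], [30, 380]])
pts_thermal = np.float32([[62, 55], [548, 48], [570, 352], [45, 365]])

# Estimate the perspective transform mapping thermal -> visible viewpoint.
H = cv2.getPerspectiveTransform(pts_thermal, pts_visible)

thermal = cv2.imread("thermal.png")
h, w = thermal.shape[:2]

# Warp into the visible camera's viewpoint, then resize to the common
# 600 x 400 working resolution used in this study.
aligned = cv2.warpPerspective(thermal, H, (w, h))
aligned = cv2.resize(aligned, (600, 400))
cv2.imwrite("thermal_aligned.png", aligned)
```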
2.2. Image Segmentation Techniques
2.2.1. Thresholding Technique
Thresholding is a simple technique for separating objects from the background by introducing a threshold. Pixels are assigned to the foreground if their intensity is lower than the threshold; otherwise, they are set as background. About 40 thresholding algorithms were surveyed in [39]. In this work, we evaluated nine thresholding algorithms: Otsu, Multi-Otsu, Yen, Isodata, Li, Local, Minimum, Mean, and Triangle. Short explanations of the algorithms are given below.
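As a concrete starting point, all nine algorithms are available in the Scikit-image library used in this work. The following is a minimal sketch; the file name is a placeholder, the window size and offset for Local thresholding are illustrative, and the foreground convention (pixels below the threshold) follows the definition above.

```python
import numpy as np
from skimage import io, color, filters

gray = color.rgb2gray(io.imread("leaf.png"))  # placeholder file name

# Global algorithms: each returns a single threshold for the whole image.
global_methods = {
    "Otsu": filters.threshold_otsu,
    "Yen": filters.threshold_yen,
    "Isodata": filters.threshold_isodata,
    "Li": filters.threshold_li,
    "Minimum": filters.threshold_minimum,  # may fail on unimodal histograms
    "Mean": filters.threshold_mean,
    "Triangle": filters.threshold_triangle,
}
for name, method in global_methods.items():
    t = method(gray)
    mask = gray < t  # foreground = pixels below the threshold
    print(f"{name}: T = {t:.3f}, foreground = {mask.mean():.1%}")

# Multi-Otsu returns M-1 thresholds separating M classes (here M = 3).
multi = filters.threshold_multiotsu(gray, classes=3)
labels = np.digitize(gray, bins=multi)

# Local thresholding: per-pixel threshold from a w-by-w neighborhood mean.
t_local = filters.threshold_local(gray, block_size=35, offset=0.02)
mask_local = gray < t_local
```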
The cumulative distribution function $F$ is defined as

$$F(g) = \sum_{i=0}^{g} p(i)$$

where $p(g)$ is the probability mass function and $g$ is the intensity value in the image ($g = 0, \ldots, 255$). The means of the foreground and background can be expressed as functions of the threshold level $T$ as

$$\mu_f(T) = \frac{\sum_{g=0}^{T} g\,p(g)}{F(T)}, \qquad \mu_b(T) = \frac{\sum_{g=T+1}^{255} g\,p(g)}{1 - F(T)}$$

The variances of the foreground and background can be expressed as functions of the threshold level $T$:

$$\sigma_f^2(T) = \frac{\sum_{g=0}^{T} \left[g - \mu_f(T)\right]^2 p(g)}{F(T)}, \qquad \sigma_b^2(T) = \frac{\sum_{g=T+1}^{255} \left[g - \mu_b(T)\right]^2 p(g)}{1 - F(T)}$$
Otsu thresholding [40] finds an optimal threshold $T_{opt}$ by maximizing the between-class variance of the foreground and background, and is given as [39]:

$$T_{opt} = \arg\max_{T} \left\{ F(T)\,\left[1 - F(T)\right]\left[\mu_f(T) - \mu_b(T)\right]^2 \right\}$$
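To make the criterion concrete, the sketch below evaluates the between-class variance for every candidate threshold with NumPy. It is an illustrative implementation written from the formula above, not the library routine used in the experiments.

```python
import numpy as np

def otsu_threshold(gray_u8):
    """Return T maximizing the between-class variance (intensities 0..255)."""
    hist = np.bincount(gray_u8.ravel(), minlength=256)
    p = hist / hist.sum()            # probability mass function p(g)
    g = np.arange(256)
    F = np.cumsum(p)                 # cumulative distribution F(g)
    m = np.cumsum(g * p)             # cumulative first moment
    mu_total = m[-1]

    # F(T)(1 - F(T))(mu_f - mu_b)^2 simplifies to the expression below;
    # invalid splits (F = 0 or 1) are mapped to zero variance.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * F - m) ** 2 / (F * (1.0 - F))
    return int(np.argmax(np.nan_to_num(sigma_b2)))
```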
Multi-level Otsu (Multi-Otsu) thresholding [41,42] is an extension of Otsu thresholding in which $M-1$ optimal thresholds ($T_{opt,1}, \ldots, T_{opt,M-1}$) are calculated to separate $M$ classes.
Yen thresholding [43] finds an optimal threshold $T_{opt}$ by maximizing the entropic correlation, and is given as [39]:

$$T_{opt} = \arg\max_{T}\left[C_f(T) + C_b(T)\right]$$

where

$$C_f(T) = -\log \sum_{g=0}^{T}\left[\frac{p(g)}{F(T)}\right]^{2}, \qquad C_b(T) = -\log \sum_{g=T+1}^{255}\left[\frac{p(g)}{1-F(T)}\right]^{2}$$
Isodata thresholding [44] finds an optimal threshold $T_{opt}$, which is defined as [39]:

$$T_{opt} = \frac{\mu_f(T_{opt}) + \mu_b(T_{opt})}{2}$$

i.e., the threshold is iteratively refined until it equals the average of the foreground and background means.
Li thresholding [45,46] finds an optimal threshold $T_{opt}$ by minimizing the information-theoretic (cross-entropy) distance, which is given as [39]:

$$T_{opt} = \arg\min_{T} D(T)$$

where

$$D(T) = \sum_{g=1}^{T} g\,p(g)\log\frac{g}{\mu_f(T)} + \sum_{g=T+1}^{255} g\,p(g)\log\frac{g}{\mu_b(T)}$$
In Local thresholding [47], an optimal threshold is calculated for each pixel by considering the mean value of its neighbors. The optimal threshold at pixel $(x, y)$ is defined as:

$$T_{opt}(x, y) = m_{w \times w}(x, y) - C$$

where $m_{w \times w}(x, y)$ is the local mean value over a window of size $w$ centered at pixel $(x, y)$, and $C$ is a constant.
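This per-pixel rule corresponds to mean adaptive thresholding in OpenCV; a minimal sketch, where the window size $w$, constant $C$, and file name are illustrative. THRESH_BINARY_INV marks pixels darker than their local threshold as foreground, matching the convention above.

```python
import cv2

gray = cv2.imread("leaf.png", cv2.IMREAD_GRAYSCALE)  # placeholder file

# T(x, y) = mean over a w-by-w window minus C; foreground (255) where
# the pixel falls below its local threshold.
w, C = 35, 5
mask = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                             cv2.THRESH_BINARY_INV, w, C)
```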
Minimum thresholding [48] finds an optimal threshold $T_{opt}$ at the valley between the two maxima of the histogram, defined such that the following is satisfied:

$$y_{T_{opt}-1} > y_{T_{opt}} \leq y_{T_{opt}+1}$$

where $y_g$ is the number of pixels with intensity value $g$.
Mean thresholding [48] finds an optimal threshold $T_{opt}$ as the mean of the intensity values, calculated as the integer part of:

$$T_{opt} = \left\lfloor \sum_{g=0}^{255} g\,p(g) \right\rfloor$$
Triangle thresholding [49] finds an optimal threshold $T_{opt}$ in the histogram based on the Triangle method, as illustrated in Figure 5. The algorithm locates a point $A$ by maximizing the distance between the triangle line and the histogram; the optimal threshold is then obtained by adding a fixed offset $O_{fs}$ to $A$:

$$T_{opt} = A + O_{fs}$$
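The Triangle method is also available directly in OpenCV; a minimal sketch with a placeholder file name, again keeping pixels below the threshold as foreground:

```python
import cv2

gray = cv2.imread("leaf.png", cv2.IMREAD_GRAYSCALE)  # placeholder file

# OpenCV derives the Triangle threshold from the histogram internally
# and returns the chosen value alongside the binary mask.
t, mask = cv2.threshold(gray, 0, 255,
                        cv2.THRESH_BINARY_INV + cv2.THRESH_TRIANGLE)
print(f"Triangle threshold: {t}")
```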
2.2.2. K-Means Segmentation
The K-Means algorithm [50] is a machine learning technique used to cluster a dataset into K classes. The algorithm divides the dataset into classes such that the distance between classes is maximized and the distance within each class is minimized [28]. The algorithm is as follows (a code sketch is given after the list):
Set the number of classes (=K).
Initialize K cluster centers from the dataset randomly.
Calculate the distance of each pixel to each cluster center using a distance function.
Classify each pixel to the nearest cluster, i.e., the closest distance.
Recalculate the cluster center using the pixels belonging to the cluster.
Repeat Steps 3 to 5 until the cluster centers no longer change.
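The sketch below clusters pixel colors into K groups with OpenCV’s cv2.kmeans, one of the libraries used in this work. The file name, the value of K, and the rule for identifying the leaf cluster are assumptions made for illustration only.

```python
import cv2
import numpy as np

img = cv2.imread("leaf.png")                    # placeholder file name
pixels = img.reshape(-1, 3).astype(np.float32)  # one BGR sample per pixel

# Stop after 20 iterations or when centers move by less than 1.0.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
K = 3
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 5,
                                cv2.KMEANS_RANDOM_CENTERS)

# Assume the cluster whose center is greenest is the leaf class; cluster
# order is arbitrary, so a real system needs a more robust rule.
greenness = centers[:, 1] - centers[:, [0, 2]].mean(axis=1)  # G - mean(B, R)
leaf_cluster = int(np.argmax(greenness))
mask = (labels.reshape(img.shape[:2]) == leaf_cluster).astype(np.uint8) * 255
```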
2.3. Performance Measurement of Leaf Segmentation
Leaf segmentation performance was measured using the standard metrics Recall, Precision, and F1 score. The metrics are defined using true positives (TP), false negatives (FN), and false positives (FP). TP indicates that an extracted leaf pixel was correctly identified as leaf. FN indicates that a leaf pixel was not extracted. FP indicates that an extracted pixel was wrongly identified as leaf. Recall, Precision, and F1 score are defined as follows:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{16}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{17}$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{18}$$
The summation of TP and FN in (16) represents the ground-truth leaf pixels. Thus, Recall represents the portion of ground-truth leaf pixels that are present in the extracted leaf pixels of the segmented image [1]. The summation of TP and FP in (17) represents the extracted leaf pixels. Thus, Precision represents the portion of extracted leaf pixels in the segmented image that match the ground-truth leaf pixels [1]. The F1 score combines Recall and Precision and represents the harmonic mean of the two.
In addition to TP, FN, and FP, the true negative (TN) is also computed to build the confusion table composed of TP, FN, FP, and TN. TN indicates that an extracted non-leaf pixel was correctly identified as non-leaf.
According to (16)–(18), the F1 score does not consider TN, unlike the Matthews correlation coefficient or Cohen’s kappa. The F1 score was nevertheless selected for measuring performance because our objective is to extract the leaf (foreground) from the image; thus, we emphasize TP more than TN. Furthermore, the F1 score is commonly used for evaluating leaf segmentation performance, as described in Section 1.
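For reference, these metrics can be computed from a pair of binary masks in a few lines of NumPy; the sketch assumes boolean arrays of equal shape (True = leaf) with non-empty leaf regions.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Recall, Precision, and F1 score from boolean leaf masks."""
    tp = np.sum(pred & truth)    # leaf pixels correctly extracted
    fn = np.sum(~pred & truth)   # leaf pixels missed
    fp = np.sum(pred & ~truth)   # non-leaf pixels wrongly extracted
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```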
2.4. Image Dataset
The image dataset, collected as described in Section 2.1, consisted of images captured by the visible, NoIR, and thermal cameras. There was a total of 672 images, comprising:
133 images taken by the visible camera during the day (06:00–17:00 h);
275 images taken by the NoIR camera during the day and at night (06:00–05:00 h (next day)); and
264 images taken by the thermal camera during the day and at night (06:00–05:00 h (next day)).
Note that all three cameras captured the same scenes at the same time intervals, where each scene contains four or five Vetiveria zizanioides plants. Data loss during transmission from the camera modules on site to the cloud storage caused the difference in the number of images captured by the NoIR and thermal cameras. This loss stems from several problems, such as image acquisition errors in the camera modules, internet connection problems, and errors while accessing the Google Drive cloud storage. Since the visible camera only captured images during the day, the number of images it captured is about half that of the NoIR and thermal cameras.
Since the image data are time-stamped at intervals of 5 min, the image intensity over a whole day can be plotted, as shown in Figure 6. The blue, red, and green lines represent the intensity of the visible, NoIR, and thermal images, respectively. The figure clearly shows that at night the intensity of the visible image is zero (no objects captured) and that the image intensity varies during the day due to variations in sunlight (for the visible and NoIR images) and temperature (for the thermal images).
Since infrared LEDs are the light source for the NoIR camera, the camera can capture objects at night. However, due to the low power of the LEDs, only objects near the camera are captured; therefore, the image intensity is low, as shown in Figure 6. Figure 6 also shows a significant difference in image intensity between the daytime and nighttime NoIR images. This can be observed in Figure 7, which shows NoIR images at 13:00 h and 19:00 h. For the thermal images, the difference is insignificant, indicating that the temperature did not drop abruptly at night. This can be explained by observing Figure 8, which shows thermal images at 13:00 h and 19:00 h. In these images, brighter colors (orange and white) represent higher temperatures than darker colors (blue and purple). As shown in Figure 8, some background parts in the night image are orange, indicating high temperature. This condition produces images with medium intensity, as shown in Figure 6.
2.5. Ground-Truth Images
Ground-truth images were prepared manually by labeling the leaves using image editor software (Microsoft Paint). The ground truth for the visible and NoIR cameras was prepared from the visible and NoIR images, respectively. However, due to the different characteristics of the thermal camera, ground-truth images are difficult to prepare from thermal images; instead, we used the ground-truth images from the visible images. This arrangement complies with our objective of evaluating the feasibility of a thermal camera for leaf segmentation rather than leaf temperature measurement.
Examples of ground-truth images for the visible and NoIR cameras are shown in Figure 9a,b, respectively. In the figures, green represents the leaf object, while other colors represent non-leaf objects and the background.
2.6. Method of Leaf Segmentation Performance Evaluation
The segmentation algorithms were implemented on a Raspberry Pi 3 Model B+ powered by a Broadcom BCM2837B0 Cortex-A53 64-bit SoC @ 1.4 GHz, running the Raspberry Pi OS operating system. The algorithms were written in Python using the OpenCV and Scikit-image libraries.
The comparison of leaf segmentation performance is divided into five parts:
Comparison of camera types and segmentation algorithms.
Comparison of camera types only (performance is averaged or maximized over the segmentation algorithms).
Comparison of segmentation algorithms only (performance is averaged or maximized over the camera types).
Comparison of execution time.
Comparison of segmentation algorithms using well-known test images.