Next Article in Journal
Modelling Water and Pesticide Transport in Soil with MACRO 5.2: Calibration with Lysimetric Data
Next Article in Special Issue
Weedy Rice Classification Using Image Processing and a Machine Learning Approach
Previous Article in Journal
Experimental Assessment of the Elastic Properties of Exocarp–Mesocarp and Beans of Coffea arabica L. var. Castillo Using Indentation Tests
Previous Article in Special Issue
Assessing the Efficiency of Remote Sensing and Machine Learning Algorithms to Quantify Wheat Characteristics in the Nile Delta Region of Egypt
 
 
Order Article Reprints
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

An Automated Crop Growth Detection Method Using Satellite Imagery Data

1
Information Management Center, Council of Agriculture Executive Yuan, Taipei City 10014, Taiwan
2
Department of Computer Science and Information Engineering, National Ilan University, Yilan 260007, Taiwan
3
Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei City 106335, Taiwan
4
Department of Management Information Systems, National Pingtung University of Science and Technology, Pingdong 912301, Taiwan
5
Department of Biotechnology and Animal Science, National Ilan University, Yilan 260007, Taiwan
*
Author to whom correspondence should be addressed.
Agriculture 2022, 12(4), 504; https://doi.org/10.3390/agriculture12040504
Received: 17 February 2022 / Revised: 24 March 2022 / Accepted: 28 March 2022 / Published: 2 April 2022
(This article belongs to the Special Issue The Application of Machine Learning in Agriculture)

Abstract

:
This study develops an automated crop growth detection APP, with the functionality to access the cadastral data for the target field, that was to be used for a satellite-imagery-based field survey. A total of 735 ground-truth records of the cabbage cultivation areas in Yunlin were collected via the implemented APP in order to train a deep learning model to make accurate predictions of the growth stages of the cabbage from 0 to 70 days. A regression analysis was performed by the gradient boosting decision tree (GBDT) technique. The model was trained on multitemporal multispectral satellite images, which were retrieved from the ground-truth data. The experimental results show that the mean average error of the predictions is 8.17 days, and that 75% of the predictions have errors less than 11 days. Moreover, the GBDT algorithm was also adopted for the classification analysis. After planting, the cabbage growth stages can be divided into the cupping, early heading, and mature stages. For each stage, the prediction capture rate is 0.73, 0.51, and 0.74, respectively. If the days of growth of the cabbages are partitioned into two groups, the prediction capture rate for 0–40 days is 0.83, and that for 40–70 days is 0.76. Therefore, by applying appropriate data mining techniques, together with multitemporal multispectral satellite images, the proposed method can predict the growth stages of the cabbage automatically, which can assist the governmental agriculture department to make cabbage yield predictions when creating precautionary measures to deal with the imbalance between production and sales when needed.

1. Introduction

Taiwan is mainly covered by mountainous terrains and forests, and it ranks among the 20 most densely populated places in the world. Hence, there are very limited lands available for agricultural farming. Other factors, such as the aging agricultural population and climate change, make the cost of agricultural operations much higher than in other countries. Moreover, because of the frequent occurrence of typhoons and the impact of extreme weather in recent years, the income of farmers is not as stable as in other industries. In order to ensure benefits for farmers, especially in terms of income security, one of the important primary tasks of the government’s agricultural department is to accurately forecast the crop yield production in order to stabilize the production and sales of agriculture products.
Since 2008, the Agricultural Research Institute in Taiwan has started to use aerial photos and hyperspectral satellite images for crop cultivation area estimation [1,2,3]. Some of its achievements have been recognized by the government’s agriculture office, under the Council of Agriculture, Taiwan, and the methods were adopted to collect reference data. However, these agricultural surveys were conducted manually or semiautomatically. This study develops an automated crop growth detection method that integrates artificial intelligence (AI) with the geographic information platform that was built by the Council of Agriculture in order to provide a faster, more accurate, and labor-saving prediction of the crop yield production.
Cabbage is an important vegetable in Taiwan, and it is usually grown and harvested during the intercropping period in the fall and winter. Because the growth period of cabbage is short, and growing cabbage is fairly easy, cabbage is always selected as one of the intercrops. Nevertheless, the imbalance between cabbage production and sales remains a perennial problem. When overproduction occurs, the market price drops dramatically, and, on the other hand, if crops are devastated by typhoons, the cabbage prices rise substantially. Therefore, being able to accurately estimate the area of the cabbage field and predict the cabbage yield are important tasks for the government agriculture department in order to stabilize cabbage prices. The total area of cabbage fields can be estimated by the monthly number of seedlings. Currently, the Agriculture and Food Agency in Taiwan collects the numbers of seedlings every ten days, with a numerical error less than 10%. Since cabbages will be ready to harvest at around 70 days after planting, the monthly cabbage yield can also be obtained. However, the price-stabilization policy might not be efficiently implemented since the agricultural survey by the Agriculture and Food Agency does not include the cabbage cultivation locations. The Taiwan Agricultural Research Institute, on the other hand, uses satellite photos for agricultural interpretations. Basically, after 40 days of growth, the cultivation locations of cabbages can be inferred from the satellite photos and, on the basis of the calculated harvest areas, the cabbage yield can be predicted. Although the method that has been adopted by the Taiwan Agricultural Research Institute is able to capture the geographic locations, it can only be performed during the early heading stage (i.e., the mid-to-late growth stages). As a result, the cost of price stabilization has increased significantly.
In this study, we used the abovementioned ground-truth data of the collected cabbages, multitemporal multispectral satellite image information with deep learning, and GBDT for the loopback analysis, in order to train a model that can predict the full cycle of the cabbage growth days. If properly implemented, this model can help agricultural units to estimate the harvest areas of cabbage in the early stage, and to deal with the problem of imbalance in production and sales.
This paper consists of six sections. Section 1 introduces the motivation and the problem statement. Section 2 presents the background and a review of the related literature and studies, including the production and sales challenges of cabbages, studies based on multitemporal satellite images, and research on spectral satellite image analysis for feature recognition. Section 3 describes how to collect ground-truth data, and how to obtain the corresponding satellite spectral data. Moreover, the gradient boosting decision tree (GBDT) algorithm for data mining is stated. Section 4 reports the experiment results, including the assembly of the ground-truth data and the satellite imagery data, the selection of the image features, the GBDT validation results, and the programming language and software that are adopted in this paper. Section 4 and Section 5 provide the results and discussions, and the conclusions, respectively.

2. Materials and Methods

This study aims to create an AI-based model that can interpret satellite images and that can predict the growth stages, or the days of growth, of cabbage. First, ground-truth data, with accurate records of the areas and the values of the land and information on the landholders, must be gathered. The traditional process of ground-truth data collection involves the information and photographs that are collected on location, and the validation of the cadastral data. In order to make it easier to conduct surveys, this study develops an APP that can obtain the cadastral data at the time of the image acquisition. Next, agricultural professionals can interpret the growth stages of the crops according to these ground-truth data. After the interpretations of the agricultural professionals and the cadastral data labelling, these ground-truth data can be used as the learning goals of the data mining techniques on the satellite images in this study.
On the basis of the cadastral data of the grouth-truth data, the satellite images on the day of the image acquistion, and the growth stages that were interpreted by professionals, as Figure 1 shows, the APP accessed the satellite images of the full growth stages before and after the photos were taken as the raw data for interpretation and anticipation.
Data mining is a technology that is used for full data analysis, and it mainly involves the process of finding useful information in big data, the results of which are usually used to make various decisions. The cleaned data will be divided into training data and test data. Training data is used to train a model that can identify or make this decision, and the model defines its parameters, and it is then continuously trained to find the best parameters. Test data is used to validate the final trained model for final validation in order to ensure that the model is not recognized by only the training data. The overall process is shown in Figure 2.

2.1. Ground-Truth Data Collection

The development of the agricultural survey APP allows users to easily photograph ground objects without additional equipment or special photography techniques. The key point for ground-truth data collection is to take photos that can accurately reveal the location of an object (i.e., the distance between the smartphone camera and the target object) (see Figure 3.).
Although smartphone cameras have built-in GPS systems to find the coordinates of the phone, the purpose of agricultural surveys is to find the cadastral data of the crop cultivation area. Therefore, the proposed agricultural survey APP adds a reference point on the camera screen for users to set on the target object. Since GPS results may differ between the smartphones of different brands, some brands of cameras are made of non-single lenses, and the target distance can be calculated by using the function of the camera. However, if the phone has only a single lens, it is impossible to calculate the target distance directly. Therefore, in order to avoid brand differences, some adjustments were made, and the proposed APP has a function to detect variations in the GPS. As shown in Figure 4, it is only when variations in the GPS are small that the user can take photos successfully.
As depicted in Figure 5, this study assumes that the coordinates of the CCD center are (0,0,0); that the coordinates of the target object (P) are (X,Y,Z); that the coordinates of the object in the image plane is (x,y); and that the focal length is f (Equation (1)):
x 0 X 0 = y 0 Y 0 = f 0 Z 0
Moreover, the study assumes that the coordinates of the object (P) in the local coordinate system are ( X P , Y P , Z P ) ; that the coordinates of the CCD center are ( X 0 , Y 0 , Z 0 ) ; and the matrix is rotated in order to obtain the coordinates of the object (P) (Equations (2) and (3)):
[ ( X P X 0 ) ( Y P Y 0 ) ( Z P Z 0 ) ] = [ R 11 R 12 R 13 R 21 R 22 R 23 R 31 R 32 R 33 ] [ X 0 Y 0 Z 0 ]
X 0 = R 11 ( X P X 0 ) + R 21 ( Y P Y 0 ) + R 31 ( Z P Z 0 ) Y 0 = R 12 ( X P X 0 ) + R 22 ( Y P Y 0 ) + R 32 ( Z P Z 0 ) Z 0 = R 13 ( X P X 0 ) + R 23 ( Y P Y 0 ) + R 33 ( Z P Z 0 )  
The abovementioned equations can lead to the following two collinearity equations (Equation (4)):
x 0 = f R 11 ( X P X 0 ) + R 21 ( Y P Y 0 ) + R 31 ( Z P Z 0 ) R 13 ( X P X 0 ) + R 23 ( Y P Y 0 ) + R 33 ( Z P Z 0 ) y 0 = f R 12 ( X P X 0 ) + R 22 ( Y P Y 0 ) + R 32 ( Z P Z 0 ) R 13 ( X P X 0 ) + R 23 ( Y P Y 0 ) + R 33 ( Z P Z 0 )
With the coordinates of the smartphone at the time of the image acquisition and the angle of elevation, the proposed APP can find the coordinates of the CCD center. Next, the camera with a 55AE lens must be set at a height of 1.5 m in order to calculate the coordinates of the object ( X P , Y P , Z P ) .

2.2. Satellite Image Processing

According to the cadastral data of the grouth-truth data (the date of the image acquistion and the growth stages that were interpreted by the agricultural professionals) the proposed APP retrieves the needed satellite images of the full growth stages. This study uses the multispectral SkySat satellite imagery that is owned by the U.S. company, Planet, which records the red, green, blue, and near-infrared (NIR) light that is reflected off the ground.
The satellite images were postprocessed with Haralick Texture [4,5], which was introduced by Haralick in 1973. Haralick Texture can express the quantitative value of the surface material of an object, and, when it is necessary to compare the touch, texture, or pattern of different materials, this method can express the difference value in their features. Thus, it was used to extract the features of the satellite image as one of the parameters for the training. The SEaTH algorithm [6] was also used to find the most suitable threshold value for each band of the satellite spectrum as a feature to recognize cabbage.
The processed data were also classified into growers by using by differenced image classification [7,8] (shown on Figure 6) and the photographic images in order to ensure that the satellite images could effectively distinguish objects that were not otherwise apparent in the images of the single growth days. For example, on the basis of the normalized difference vegetation index (NDVI) and the spectral reflection curves of the cabbage on different growth days, several satellite images of different growth days can be selected in the same area, and the image of the growth time of the cabbage can be used as the reference point to calculate the growth days of the cabbage in the satellite images of the same area. The difference values of the different growth days for the influence characteristics of the same area can be used to highlight the differences in the different growth periods of the cabbage.

2.3. Data Mining Method: Gradient Boosting Decision Tree (GBDT)

Decision trees (DTs) are used as a basic machine learning method for classification and regression [9]. DTs with visual explainability speed up the processing time, but overfitting can easily occur. Although pruning helps to prevent overfitting, its efficacy is not significant. Boosting is a method that is used in classification to equally weight all of the training examples, such as by increasing the weight of incorrectly classified examples, and by decreasing the weight of correctly classified examples. The boosting method trains multiple classifiers in linear combinations so as to improve their performance.
The gradient boosting algorithm is an ensemble machine learning method that combines many different algorithms in the framework, and every model is created at the gradient descent direction of the loss function for the model performance evaluation [10]. Generally, the smaller values of the loss function represent the better performance of the model. Minimizing the loss function simultaneously boosts the performance of the model. Therefore, the optimal method is to decrease the gradient of the loss function so as to improve the model performance.

2.3.1. Tree Ensemble Methods

Tree ensemble methods, including GBM, GBDT, and random forests, are commonly used, and particularly for classification and regression problems. Tree ensemble methods comprise several features:
  • Tree ensemble methods do not reduce or change the input variables. It is not necessary to standardize the input variables;
  • Tree ensemble methods allow users to understand the correlations between variables;
  • Tree ensemble methods can be used in multiple research fields for quantization.
GBDT, which is the model that is based on tree ensembles, is a technique that continuously wins Kaggle and other data analysis competitions, and it has also been extensively used in both academia and industry. GBDT integrates the gradient boosting algorithm with the decision tree algorithm for deep learning. Since one single decision tree cannot satisfy practical applications, GBDT creates many CART-based decision trees, and the sum of the functions ( f k ) on each decision tree is used to reflect the result of each attribute (Equation (5)):
y ^ i = k = 1 K f k ( x i ) , f k
The definitions of the variables are as follows: K is the number of decision trees; f is the functions in the function space, ; and is all possible CART ensembles.
Generally, the way to optimize supervised learning is to train the sum of Loss + Regularization (Equations (6)–(8)) to the minimum. The loss function ( i n l ( y i , y ^ i ) ) must be trained first before regularization ( k = 1 K Ω ( f k ) ) in order to simplify the branches and the depths of the decision trees, or to adjust the weight for a second derivative:
θ = { W j | j = 1 , , n }
O b j ( θ ) = L ( θ ) + Ω ( θ )
O b j = i n l ( y i , y ^ i ) + k = 1 K Ω ( f k )
The optimization of GBDT can be achieved by using the following heuristic method (Equation (9)):
O b j = i = 1 n l ( y i , y ^ i ( t ) ) + i = 1 t Ω ( f i )
  • Determine the next step according to the computational results: train the model by using the loss function to minimize errors;
  • Cut off redundant branches: standardize the branches of the decision trees to simplify the complexity of the trees;
  • Explore the deepest decision tree: limit the expansion of the functions;
  • Balance the extensiveness of the leaf nodes: standardize the weight of the second- derivative leaf nodes.

2.3.2. Gradient Boosting Machine Learning Techniques

GBDT first fixes what has been learned, and it determines the function of a tree whenever a new tree is added. In Equation (10), t is the t-th step:
y ^ i ( 0 ) = 0 y ^ i ( 1 ) = f 1 ( x i ) = y ^ i ( 0 ) + f 1 ( x i ) y ^ i ( 2 ) = f 1 ( x i ) + f 2 ( x i ) = y ^ i ( 1 ) + f 2 ( x i ) y ^ i ( t ) = k = 1 t f k ( x i ) = y ^ i ( t 1 ) + f t ( x i )
As Equation (11) shows, the decision tree that learned in the t-th step determines the tree to add ( f t ( x i ) ) according to the minimum optimization objective:
y ^ i ( t ) = y ^ i ( t 1 ) + f t ( x i )

2.3.3. Optimization Objective

  • Reduce the loss function to the minimum (Equations (12–(14)):
i = 1 n l ( y i ,   y ^ i ( t 1 ) + f t ( x i ) )
O b j ( t ) = i = 1 n l ( y i , y ^ i ( t ) ) + i = 1 t Ω ( f i ) = i = 1 n l ( y i , y ^ i ( t 1 ) + f t ( x i ) ) + Ω ( f t ) + const
To achieve the goal of square loss = 0, l ( y i ,   y ^ i ) = ( y i   y ^ i ) 2 0
O b j ( t ) = i = 1 n ( y i ( y ^ i ( t 1 ) + f t ( x i ) ) ) 2 + Ω ( f t ) + const = i = 1 n ( 2 ( y ^ i ( t 1 ) y i ) f t ( x i ) + f t ( x i ) 2 ) + Ω ( f t ) + const
2.
Regularization for Decision Trees
According to Equation (15), the tree is defined as a set of vectors, in which γ T is the number of leaves, and 1 2 λ j = 1 T ω j 2 is the second regularization of the leaf weight:
f t ( x ) = ω q ( x ) ,   ω R T ,   q :   R d { 1 , 2 , , T } Ω ( f t ) = γ T + 1 2 λ j = 1 T ω j 2

2.3.4. K-Fold Cross Validation for Data Mining

The K-fold cross-validation method splits the training dataset into k subsamples, among which one subsample is used as the test data for the model validation, while the rest of the k-1 subsamples are used as training data. The k-fold cross-validation process is repeated k times, with each of the k subsamples used exactly once for validation. The k results from the folds are averaged or combined in order to generate a single estimation (Figure 7).
Since our dataset is recorded over consistent intervals of time, the data are arranged in chronological order for the cross validation. The growth days of the cabbages can then be predicted according to the historical data collected, as shown in Figure 8.

2.3.5. Model Validation

Validation of Regression Models: the mean absolute error (MAE) is the average absolute difference between the observed values and the calculated values. Since absolute errors in replicated measurements of the same physical quantity may differ, we averaged the absolute value of the errors in order to obtain the MAE. Compared with the mean error, the MAE is an absolute error measure that is used to prevent the positive and negative deviations from canceling one another. Therefore, the MAE can better reflect the real situation of the error of prediction (Equation (16)):
M A E = 1 n j = 1 n | y j y ^ j |
Validation of Classification Models: a confusion matrix is created on the basis of the values of the real target attributes and the predicted values in order to compute the classification metrics, including the precision, the recall and the F-measure.
Figure 9 shows the abovementioned steps and the flowchart of the methodology.

3. Results

3.1. Ground-Truth Data and Satellite Imagery Data Collection

3.1.1. Ground-Truth Data

During the period from 15 December 2018 to 11 February 2019, the field survey APP captured 735 ground-truth data records of the cabbage cultivation areas in Yunlin County, and agricultural professionals examined the records in order to interpret the cabbage growth stages, as shown in Figure 10 and Figure 11.

3.1.2. Satellite Imagery Data

The multitemporal multispectral images that contain the ground-truth data were captured by miniature satellites operated by Planet Labs. For cabbages, the days to maturity are about 70 days, but this could vary according to seasons or regions. Therefore, this study gathered the data on 0–70 days of cabbage growth for the analysis, and it collected the satellite imagery data on the basis of the cadastre during the complete cabbage growth stages, according to the ground-truth data and the professional interpretations (Figure 12, Figure 13, Figure 14 and Figure 15). The data collection was conducted from 15 October 2018 to 19 March 2019, and 5654 records were collected for 0–70 days of the cabbage growth (Figure 16).

3.2. Feature Selection

3.2.1. Spectral Feature Selection

The multispectral satellite images that are used in this study are 3 m × 3 m pixels, and each pixel can reflect all of the visible RGB lights and the invisible infrared radiation (four spectral values in total). The corresponding pixel values are obtained according to the cadastral data. Next, a data analysis is conducted in order to distinguish the correlation between 16 attributes, including the mean, the std, the max and min of all the pixel values on each cadastre, and the days of cabbage growth (Figure 17). The timeframe of 20 to 60 days of cabbage growth was selected to present the values of the spectral features, including the mean, max, and min of the infrared radiation (Table 1).

3.2.2. Vegetation Index Feature Selection

The pixel spectral values were used to calculate eight types of vegetation indices, including the NDVI [11,12,13], the infrared percentage vegetation index (IPVI) [14], the cropping management factor index (CMFI), the band ratio (BR), the square band ratio (SQBR), the vegetable index (VI), the average brightness index (ABI), and the modified soil adjusted vegetation index (MSAVI) [15,16]. A total of 16 attributes, including the mean and standard deviation of the individual vegetation index, were employed for the correlation analysis between the indices and the cabbage growth stages (i.e., the days of growth of the cabbages). The results are shown in Figure 18 and Figure 19. The period between 20 and 60 days indicates a highly positive correlation with the vegetation indices. In particular, Figure 19 shows that the vegetation indices are more correlated, in comparison with the spectrum information, to the days of growth of the cabbages. However, most of the vegetation indices are calculated on the basis of near-infrared and red-light values, which happen to be highly correlated. Therefore, only one of the three completely correlated (i.e., correlation coefficients equal to 1 or −1) vegetation indices, such as the NDVI (IPVI, MSAVI, CMFI), the BR (SQBR), or the VI, was adopted (as shown in Table 2).

3.2.3. Texture Feature Selection

The texture features of the satellite images that are taken into account include: a total of 6 image gradient [17] attributes (the mean and standard deviations of gx, gy, and gxy), and 13 attributes that are derived from the GLCM method, which makes a total of 19 attributes. However, the correlation analysis shows that all of the texture features are not correlated with the days of growth of the cabbages (Table 3). The reason may be that the resolution of the satellite imagery that was used was not high enough. Therefore, these features were not included in the model training.

3.2.4. Threshold Feature Selection

In order to increase the model prediction accuracy, the respective thresholds for the abovementioned features that were strongly correlated to the cabbage growth stages (namely, the near-infrared spectrum value and the NDVI, the BR, and the VI, which are the three vegetation indices) were used to filter the noise. The range of each of these features was defined by their induvial maximum and minimum values. For each of the four features, the 5654 samples were partitioned into three subsets according to their range values (see Table 4). Two threshold candidates were obtained for each feature. For example, one of the thresholds (2588) for the NIR can be obtained by (5785−-990)/3 + 990. GBDT was applied five times in order to train the model on the basis of the cross-validation method, and the minimum mean square error (MSE) was adopted to select the optimal threshold values (Table 5).
Once the optimal threshold value for each of the four features was determined, the additional 20 features for each image sample were defined as follows: the proportion of pixels with values above the threshold, and the mean/standard deviations of pixels with values above/below the threshold. The features are listed in Appendix A.

3.2.5. Features Defined by the Differences between Two Consecutive Satellite Images

In the analysis of the cabbage vegetative index, the NDVI curve of the cabbage in different periods, which is averaged in units of 10 days, shows a significant positive correlation between the 20th and 60th days of the cabbage-growing period (Figure 20). Compared to the growth stages of cabbage (Table 6), the NDVI values of the cabbage show a rising trend in both the mature stage and the cupping stage (Figure 21). Considering the result from the previous comparison, it can be inferred that the NDVI growth rate of the two consecutive satellite images can be used to effectively predict the growth stage of cabbages. Therefore, the individual difference values of the two consecutive images of each of the previously mentioned features were selected as the features to be used in this study, as shown in Table 7.

3.2.6. Feature Summary

In summary, this study uses a total of 54 image features, 3 near-infrared light features (the average, standard deviation, and maximum values), 3 vegetation index features (NDVI, BR, and VI), 20 features that were obtained from near-infrared light, the NDVI, the BR, and the VI after threshold filtering (proportion of pixels with values above the threshold, and mean/standard deviation values of pixels with values above/below the threshold), and 1 feature that indicates the day of the year that the image was captured. So far, there are 27 features listed above, and the individual difference value for each feature was calculated on the basis of two consecutive images, which adds up to 54 features for the data mining (see Appendix A).

3.3. Modeling and Verification

The collected ground-truth data, along with the location information, consist of a total of 735 samples, among which 220 samples were randomly selected for testing, and the remaining 515 samples were used for the cross-validation training of the regression and classification models.

3.3.1. GBDT Regression Analysis

The GBDT regression technique was applied to the testing set (with 220 samples) to predict the growth stage of the cabbage from Day 0 to Day 70. The experimental results are shown in Table 8 and Figure 22. The average error is 8.17 days, with 75% of the prediction errors within 11 days, which are summarized in Table 9 and Figure 22.

3.3.2. GBDT Classification Analysis

The GBDT classification technique was applied to the testing set (with 220 samples) to identify the growth status of the cabbage from three different stages: namely, the cupping, early heading, and mature stages. The experimental results are shown in Table 10 And Figure 23 and the confusion matrix shown in Figure 24. The classification recalls are 73, 51 and 74%, respectively. If the growth status of cabbage has been divided into two stages instead of three (namely, within 40 days and beyond 40 days), then the prediction recalls become 83 and 76%, respectively. The results are shown in Table 11 and Figure 25.

3.4. Programing Language and Libraries

This study uses Python, which is the most popular language for data analysis, data science, and machine learning. The biggest advantage of using Python is that there are plenty of libraries to support the latest technologies. Table 12 lists all the libraries or packages that were used to conduct the experiments.

4. Discussion

This paper first attempts to use multiperiod multispectral satellite image information for training a deep learning model. Cabbage is an important short-term cash crop, which often encounters an imbalance between production and sales. This study uses GBDT for the regression analysis. With the limited amount of training data (i.e., 735 ground-truth images), the trained model was able to automatically predict the days of growth of the cabbage on the basis of the satellite images of the cabbage field, with an average error of 8.17 days, and 75% of the predictions having errors less than 11 days. (Note: the cabbage’s growth cycle is between 0 and 70 days.) Moreover, this study also uses GBDT for the classification analysis. First, if the cabbage growth status is divided into three stages (the cupping, early heading, and mature stages), then the prediction recalls for these three stages are 73, 51, and 74%, respectively. Second, if the growth status is divided into two stages (within 40 days and beyond 40 days), then the prediction recalls become 83 and 76%, respectively. These results show that, with the appropriate data mining technology, multiperiod and multispectral satellite image information can be used to automatically determine the days of growth and the growth stages of cabbage. If the government’s agriculture department could make use of such results, by knowing the harvested area of cabbage, government staffs can prepare and respond early when the possibility of an imbalance between production and sales emerges.
In addition, the field survey APP that was developed in this study for collecting the ground-truth data, together with the cadastral information, have proven to come in handy. The ground-truth data for the experiments were captured with the help from colleagues at the Yunlin Irrigation Association. The collected photos were manually examined before use as training data. The mobile devices were also screened to ensure that the data obtained by their GPS and digital compass sensors, as well as the alignment information set by the APP, were accurate. The cadastral information of the image can be calculated by these data, which provide the ground-truth information for the AI training of the aerial and satellite images. The experimental result is an important opportunity for agricultural production surveys. If this APP can be adopted by agricultural administration offices, the on-site surveys of agricultural products can be more accurate. Previously, only statistical data from townships were available, but, with the help of this APP, it is possible to obtain the cadastral information. When there is an important crop that needs production guidance, the producer can be found through the cadastral information so that the related actions can be directly and effectively performed, or, in cases where the agricultural product is damaged because of natural disasters, farmers can use the APP to take pictures and to apply for assistance so that the rescue can take place more efficiently. Moreover, this APP is easy to use, which makes it a handy tool for agricultural market reporters during agricultural investigations. It can also be used by general users in order to collect updated agricultural information through mass outsourcing, which has the advantages of saving time, manpower, and costs.

5. Conclusions

The results of the proposed method have verified that data mining technology can be used to predict the growth stages or days of growth of cabbage by analyzing the multispectral satellite images. For future works, besides the further improvement of the model, it is also necessary to develop a recognition method to identify cabbage fields by multispectral satellite images. The combination of these two methods would become a highly effective solution for assisting the production and sales of cabbage.
Future research could use the Assisted Global Positioning System, which is a technology that uses cell-phone-based station signals to obtain GPS locations more accurately and quickly, and to find the location of photographed objects. In addition, it may be possible to try other crops for growth prediction, and, possibly, the methods of this study could be applied to other similar species for growth prediction.

Author Contributions

D.-C.H.: conceptualization, methodology, data curation, validation, formal analysis, visualization, writing—original draft; F.H.: formal analysis, visualization, writing—original draft; F.J.T.: software, validation, writing—original draft; T.-Y.W.: conceptualization, methodology, project administration; Y.-C.L.: investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Ministry of Science and Technology, Taiwan under Grant No MOST 110-2221-E-020-023, and MOST 108-2321-B-197-004.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This study was supported by the Ministry of Science and Technology, Taiwan, under grant No. MOST 110-2221-E-020-023, and MOST 108-2321-B-197-004.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

No.CategoryFeaturesAttributesDescriptionIf Used?
1DateMonthmonthCaptured month
2DaydateCaptured day
3Day of the yearydayThe day of the year that the image was captured.Y
8SpectrumRed-light averagemean_r
9Red-light standard deviationstd_r
10Red-light max valuemax_r
11Red-light min valuemin_r
12Green-light averagemean_g
13Green-light standard deviationstd_g
14Green-light max valuemax_g
15Green-light min valuemin_g
16Blue-light averagemean_b
17Blue-light standard deviationstd_b
18Blue-light max valuemax_b
19Blue-light min valuemin_b
20Near-infrared-light averagemean_nir Y
21Near-infrared-light standard deviationstd_nir Y
22Near-infrared-light max valuemax_nir Y
23Near-infrared-light min valuemin_nir
24TextureImage x gradient averagegx_meanRGB image x gradient
25Image x gradient standard deviationgx_stdRGB image x gradient
26Image y gradient averagegy_meanRGB image y gradient
27Image y gradient standard deviationgy_stdRGB image y gradient
28Image x and y gradients averagegxy_meanRGB image x and y gradients
29Image x and y gradients standard deviationgxy_stdRGB image x and y gradients
30Haralick textureharalick_1GLCM gives 13 features
31haralick_2
32haralick_3
33haralick_4
34haralick_5
35haralick_6
36haralick_7
37haralick_8
38haralick_9
39haralick_10
40haralick_11
41haralick_12
42haralick_13
43Vegetation IndexNormalized difference vegetation indexndviAverageY
44ndvi_stdStandard deviation
45Infrared percentage vegetation indexipviAverage
46ipvi_stdStandard deviation
47Cropping management factor indexcmfiAverage
48cmfi_stdStandard deviation
49Band ratiobrAverageY
50br_stdStandard deviation
51Square band ratiosqbrAverage
52sqbr_stdStandard deviation
53Vegetation indexviAverageY
54vi_stdStandard deviation
55Average brightness indexabiAverage
56abi_stdStandard deviation
57Modified soil adjusted vegetation indexmsaviAverage
58msavi_stdStandard deviation
59ThresholdProportion of pixels with values above the near-infrared thresholdnir_ratioThreshold 4186Y
60The mean value of pixels with values above the near-infrared thresholdnir_meanY
61The standard deviation value of pixels with values above the near-infrared thresholdnir_stdY
62The mean value of pixels with values below the near-infrared thresholdnir_un_meanY
63The standard deviation value of pixels with values below the near-infrared thresholdnir_un_stdY
64Proportion of pixels with values above the NDVI threshold ndvi_ratioThreshold 0.47Y
65The mean value of pixels with values above the NDVI threshold ndvi_meanY
66The standard deviation value of pixels with values above the NDVI thresholdndvi_stdY
67The mean value of pixels with values below the NDVI threshold ndvi_un_meanY
68The standard deviation value of pixels with values below the NDVI thresholdndvi_un_stdY
69Proportion of pixels with values above the BR thresholdvi_ratioThreshold 3.02Y
70The mean value of pixels with values above the BR threshold vi_meanY
71The standard deviation value of pixels with values above the BR thresholdvi_stdY
72The mean value of pixels with values below the BR threshold vi_un_meanY
73The standard deviation value of pixels with values below the BR thresholdvi_un_stdY
74Proportion of pixels with values above the VI thresholdabi_ratioThreshold 1034Y
75The mean value of pixels with values above the VI threshold abi_meanY
76The standard deviation value of pixels with values above the VI thresholdabi_stdY
77The mean value of pixels with values below the VI threshold abi_un_meanY
78The standard deviation value of pixels with values below the VI thresholdabi_un_stdY
79DifferenceThe difference values of all the features used.*_diffOnly calculate features that are marked Y in the last column.
*_diff is is only that column If Used? calculate, eg: abi_un_std_diff.

References

  1. Pluto-Kossakowska, J. Review on Multitemporal Classification Methods of Satellite Images for Crop and Arable Land Recognition. Agriculture 2021, 11, 999. [Google Scholar] [CrossRef]
  2. Fernández-Sellers, M.; Siesto, G.; Lozano-Tello, A.; Clemente, P.J. Finding a suitable sensing time period for crop identification using heuristic techniques with multi-temporal satellite images. Int. J. Remote Sens. 2021, 1–18. [Google Scholar] [CrossRef]
  3. Felegari, S.; Sharifi, A.; Moravej, K.; Amin, M.; Golchin, A.; Muzirafuti, A.; Tariq, A.; Zhao, N. Integration of Sentinel 1 and Sentinel 2 Satellite Images for Crop Mapping. Appl. Sci. 2021, 11, 10104. [Google Scholar] [CrossRef]
  4. Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, 610–621. [Google Scholar] [CrossRef][Green Version]
  5. Miyamoto, E.; Merryman, T. Fast Calculation of Haralick Texture Features; Human Computer Interaction Institute, Carnegie Mellon University: Pittsburgh, PA, USA, 2005. [Google Scholar]
  6. Nussbaum, S.; Niemeyer, I.; Canty, M. SEATH-a new tool for automated feature extraction in the context of object-based image analysis. In Proceedings of the 1st International Conference on Object-Based Image Analysis (OBIA), Salzburg, Austria, 4–5 July 2006. [Google Scholar]
  7. Wan, S.; Chang, S.-H.; Peng, C.-T.; Chen, Y.-K. A novel study of artificial bee colony with clustering technique on paddy rice image classification. Arab. J. Geosci. 2017, 10, 1–13. [Google Scholar] [CrossRef]
  8. Wolter, P.T.; Mladenoff, D.J.; Host, G.E.; Crow, T.R. Using multi-temporal landsat imagery. Photogramm. Eng. Remote Sens. 1995, 61, 1129–1143. [Google Scholar]
  9. Vens, C.; Struyf, J.; Schietgat, L.; Džeroski, S.; Blockeel, H. Decision trees for hierarchical multi-label classification. Mach. Learn. 2008, 73, 185–214. [Google Scholar] [CrossRef][Green Version]
  10. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
  11. Burgan, R.E. Monitoring Vegetation Greenness with Satellite Data; US Department of Agriculture, Forest Service, Intermountain Research Station: Ogden, UT, USA, 1993; Volume 297.
  12. Elvidge, C.D.; Chen, Z. Comparison of broad-band and narrow-band red and near-infrared vegetation indices. Remote Sens. Environ. 1995, 54, 38–48. [Google Scholar] [CrossRef]
  13. Rouse, J.W., Jr.; Haas, R.H.; Schell, J.; Deering, D. Monitoring the Vernal Advancement and Retrogradation (Green Wave Effect) of Natural Vegetation; NASA: Washington, DC, USA, 1973.
  14. Crippen, R.E. Calculating the vegetation index faster. Remote Sens. Environ. 1990, 34, 71–73. [Google Scholar] [CrossRef]
  15. Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309. [Google Scholar] [CrossRef]
  16. Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A modified soil adjusted vegetation index. Remote Sens. Environ. 1994, 48, 119–126. [Google Scholar] [CrossRef]
  17. Jacobs, D. Image Gradients. Available online: https://www.cs.umd.edu/~djacobs/CMSC426/ImageGradients.pdf (accessed on 10 December 2021).
Figure 1. Multitemporal satellite image analysis.
Figure 1. Multitemporal satellite image analysis.
Agriculture 12 00504 g001
Figure 2. Data mining procedure.
Figure 2. Data mining procedure.
Agriculture 12 00504 g002
Figure 3. The crucial technique for the agricultural survey APP is to measure the distance between the camera and the object.
Figure 3. The crucial technique for the agricultural survey APP is to measure the distance between the camera and the object.
Agriculture 12 00504 g003
Figure 4. While taking photos, the proposed agricultural survey APP offers a reference point on the target object, and it has a GPS variation detection function.
Figure 4. While taking photos, the proposed agricultural survey APP offers a reference point on the target object, and it has a GPS variation detection function.
Agriculture 12 00504 g004
Figure 5. Vector diagram of the CCD center and the target object when photographing.
Figure 5. Vector diagram of the CCD center and the target object when photographing.
Agriculture 12 00504 g005
Figure 6. Differential image classification flowchart.
Figure 6. Differential image classification flowchart.
Agriculture 12 00504 g006
Figure 7. K-fold model cross validation.
Figure 7. K-fold model cross validation.
Agriculture 12 00504 g007
Figure 8. Time series cross validation.
Figure 8. Time series cross validation.
Agriculture 12 00504 g008
Figure 9. Flowchart of the methodology.
Figure 9. Flowchart of the methodology.
Agriculture 12 00504 g009
Figure 10. Backstage interpretations of the field survey images by an agricultural professional.
Figure 10. Backstage interpretations of the field survey images by an agricultural professional.
Agriculture 12 00504 g010
Figure 11. Agricultural professionals interpret the growth days of cabbages according to the field survey images.
Figure 11. Agricultural professionals interpret the growth days of cabbages according to the field survey images.
Agriculture 12 00504 g011
Figure 12. Satellite images of cabbages (10 days).
Figure 12. Satellite images of cabbages (10 days).
Agriculture 12 00504 g012
Figure 13. Satellite images of cabbages (30 days).
Figure 13. Satellite images of cabbages (30 days).
Agriculture 12 00504 g013
Figure 14. Satellite images of cabbages (60 days).
Figure 14. Satellite images of cabbages (60 days).
Agriculture 12 00504 g014
Figure 15. Multitemporal satellite images of cabbages under the same parcel number.
Figure 15. Multitemporal satellite images of cabbages under the same parcel number.
Agriculture 12 00504 g015
Figure 16. Number of satellite images on days of cabbage growth.
Figure 16. Number of satellite images on days of cabbage growth.
Agriculture 12 00504 g016
Figure 17. Correlation between spectral features and cabbage growth stages.
Figure 17. Correlation between spectral features and cabbage growth stages.
Agriculture 12 00504 g017
Figure 18. Correlations between vegetation index features and cabbage growth stages.
Figure 18. Correlations between vegetation index features and cabbage growth stages.
Agriculture 12 00504 g018
Figure 19. High or even complete correlations between pairs of vegetation index features.
Figure 19. High or even complete correlations between pairs of vegetation index features.
Agriculture 12 00504 g019
Figure 20. Correlation analysis between cabbage growth stages and NDVI values.
Figure 20. Correlation analysis between cabbage growth stages and NDVI values.
Agriculture 12 00504 g020
Figure 21. Cabbage growth stages vs. NDVI values.
Figure 21. Cabbage growth stages vs. NDVI values.
Agriculture 12 00504 g021
Figure 22. Growth stage prediction results of cabbage from Day 0 to Day 70.
Figure 22. Growth stage prediction results of cabbage from Day 0 to Day 70.
Agriculture 12 00504 g022
Figure 23. Distribution chart of the growth stage prediction of cabbage.
Figure 23. Distribution chart of the growth stage prediction of cabbage.
Agriculture 12 00504 g023
Figure 24. Confusion matrix of the cabbage growth stage classification from cupping, early heading, and mature stages.
Figure 24. Confusion matrix of the cabbage growth stage classification from cupping, early heading, and mature stages.
Agriculture 12 00504 g024
Figure 25. The confusion matrix of the cabbage growth stage classification from early growth and mid-heading stages.
Figure 25. The confusion matrix of the cabbage growth stage classification from early growth and mid-heading stages.
Agriculture 12 00504 g025
Table 1. Correlation coefficients between spectral features and cabbage growth stages.
Table 1. Correlation coefficients between spectral features and cabbage growth stages.
Spectral FeaturesCorrelation Coefficient
mean_nir Mean value of NIR0.486128
std_nir Standard deviation of NIR0.393943
max_nir Maximum of NIR0.478906
Table 2. Correlation analysis between vegetation indices and cabbage growth stages.
Table 2. Correlation analysis between vegetation indices and cabbage growth stages.
Vegetation IndexCorrelation Coefficients
NDVI0.47121
IPVI0.47121
CMFI−0.47121
BR0.443567
SQBR0.458123
VI0.508448
MSAVI0.468535
Table 3. Correlations between texture features and cabbage growth stages.
Table 3. Correlations between texture features and cabbage growth stages.
Texture FeatureCorrelation Coefficients
haralick_1−0.012057
haralick_2−0.011231
haralick_30.004898
haralick_4−0.029392
haralick_50.065032
haralick_6−0.123112
haralick_7−0.029654
haralick_8−0.054594
haralick_9−0.016616
haralick_100.016184
haralick_11−0.009838
haralick_120.060944
haralick_13−0.014681
gxy_mean−0.004639
gxy_std0.022363
gx_mean−0.002482
gx_std0.005792
gy_mean0.000629
gy_std0.045106
Table 4. Important features and their threshold candidates.
Table 4. Important features and their threshold candidates.
FeaturesMinimumMaximumThreshold Candidates
(3 Partitions)
NIR99057852588, 4186
NDVI−0.090.760.19, 0.47
BR0.837.43.02, 5.21
VI−51941421034, 2588
Table 5. The optimal threshold values for each feature on the basis of cross-validation method.
Table 5. The optimal threshold values for each feature on the basis of cross-validation method.
FeaturesMinimumMaximumOptimal Threshold Value
NIR99057854186
NDVI−0.090.760.47
BR0.837.43.02
VI−51941421034
Table 6. Cabbage growth stages.
Table 6. Cabbage growth stages.
Cabbage Growth StagesDays
Cupping Stage0–40
Early Heading Stage40, 50, 60
Mature Stage70–80
Table 7. Differences in individual features between two consecutive satellite images captured at the same location.
Table 7. Differences in individual features between two consecutive satellite images captured at the same location.
Dayndvistd_nirstd_nir_diff
70.721168647.696193NaN
280.2816601019.547319371.851125
370.2706011351.375908331.828589
480.4476061576.486765225.110857
550.4540971657.49521081.008445
630.5150931578.958585−78.536626
Table 8. Growth stage prediction results of cabbage from Day 0 to Day 70.
Table 8. Growth stage prediction results of cabbage from Day 0 to Day 70.
DayCountMeanStd.Min.25%50%75%Max.
103.003.272.520.851.963.084.485.88
20161.007.269.460.022.134.598.3052.81
30322.009.128.180.083.757.0411.9643.66
40296.009.047.640.023.186.8413.2038.71
50366.008.197.690.022.215.6912.1143.20
60184.007.746.460.023.176.5710.7746.61
7091.004.523.490.191.783.577.2013.64
Table 9. Cabbage growth stage prediction summary.
Table 9. Cabbage growth stage prediction summary.
StatisticsMAE
Count1423
Mean8.17072
Std.7.746078
Min.0.017453
25%2.672483
50%5.971289
75%11.03056
Max.52.809956
Table 10. Growth status classification results of cabbage from cupping, early heading, and mature stages.
Table 10. Growth status classification results of cabbage from cupping, early heading, and mature stages.
StagesPrecisionRecallF1-ScoreSupport
Cupping Stage
(0–25 days)
0.690.730.71415
Early Heading Stage
(25–40 days)
0.500.510.50379
Mature Stage
(40–70 days)
0.790.740.77629
Avg./Total0.680.680.681423
Table 11. Growth stage classification results of cabbage from early growth and mid-heading stages.
Table 11. Growth stage classification results of cabbage from early growth and mid-heading stages.
StagesPrecisionRecallF1-ScoreSupport
Early Growth Stage
(0~40 days)
0.810.830.82794
Mid-Heading Stage
(40~70 days)
0.780.760.77629
Avg./Total0.800.800.801423
Table 12. Python libraries or packages used in this study.
Table 12. Python libraries or packages used in this study.
FunctionalityLibraries or Packages
Data ManipulationPandas
Image Preocessingcv2
Reading Satellite ImagesTifffile
Model Trainingscikit-learn
ModelLightGBM
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Hsiou, D.-C.; Huang, F.; Tey, F.J.; Wu, T.-Y.; Lee, Y.-C. An Automated Crop Growth Detection Method Using Satellite Imagery Data. Agriculture 2022, 12, 504. https://doi.org/10.3390/agriculture12040504

AMA Style

Hsiou D-C, Huang F, Tey FJ, Wu T-Y, Lee Y-C. An Automated Crop Growth Detection Method Using Satellite Imagery Data. Agriculture. 2022; 12(4):504. https://doi.org/10.3390/agriculture12040504

Chicago/Turabian Style

Hsiou, Dong-Chong, Fay Huang, Fu Jie Tey, Tin-Yu Wu, and Yi-Chuan Lee. 2022. "An Automated Crop Growth Detection Method Using Satellite Imagery Data" Agriculture 12, no. 4: 504. https://doi.org/10.3390/agriculture12040504

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop