3.1. Algorithm Verification
Initially, the data measured by the ultrasonic sensor were divided into 3 clusters representing the crop canopy, the lower leaves, and the ground, so the initial number of clusters c was 3. To ensure reliable clustering results, the threshold value ε is generally chosen between 10⁻⁴ and 10⁻⁶; in this paper it was set to 0.0001 (10⁻⁴), the loosest value in that range, to reduce the number of iterations. For the minimum number of samples θ_n, the standard deviation θ_S, and the minimum distance between two cluster centers θ_c, we tested different values and verified the accuracy of the clustering results against known sample data sets. The values that gave the best clustering results were selected as the presets: 20, 5, and 2, respectively. Corn plants from the 3-leaf stage to the 6-leaf stage were scanned with the ultrasonic sensor, and the data were clustered with the proposed algorithm. The results are shown in Table 1 and Figure 2.
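To make the role of the preset values concrete, the following is a minimal sketch and not the implementation used in this work. It runs a standard fuzzy c-means iteration on 1-D distance data with c = 3 and ε = 0.0001, then applies a simplified pass that discards clusters with fewer than θ_n points and merges centers closer than θ_c. The splitting step governed by θ_S is omitted for brevity. The fuzzifier m = 2, the quantile-based initial centers, the membership-change stopping rule, and the synthetic scan are all assumptions made for illustration.

```python
import random

def fuzzy_c_means_1d(data, c=3, m=2.0, eps=1e-4, max_iter=200):
    """Fuzzy c-means on 1-D distances; returns (centers, memberships)."""
    n = len(data)
    ordered = sorted(data)
    # Assumption: initial centers are evenly spaced quantiles of the data.
    centers = [ordered[int((i + 0.5) * n / c)] for i in range(c)]
    prev_u = None
    for _ in range(max_iter):
        # Membership update: u[i][k] is the degree to which point k belongs to cluster i.
        u = [[0.0] * n for _ in range(c)]
        for k, x in enumerate(data):
            dist = [abs(x - v) for v in centers]
            zero = [i for i in range(c) if dist[i] == 0.0]
            if zero:
                # The point coincides with one or more centers.
                for i in zero:
                    u[i][k] = 1.0 / len(zero)
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                total = sum(inv)
                for i in range(c):
                    u[i][k] = inv[i] / total
        # Center update: membership-weighted mean of the data.
        centers = [
            sum(u[i][k] ** m * data[k] for k in range(n))
            / sum(u[i][k] ** m for k in range(n))
            for i in range(c)
        ]
        # Assumed stopping rule: largest membership change below eps.
        if prev_u is not None:
            change = max(abs(u[i][k] - prev_u[i][k])
                         for i in range(c) for k in range(n))
            if change < eps:
                break
        prev_u = u
    return centers, u

def prune_and_merge(data, centers, theta_n=20, theta_c=2.0):
    """Simplified ISODATA-style pass: drop small clusters, merge close centers."""
    counts = [0] * len(centers)
    for x in data:
        nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
        counts[nearest] += 1
    kept = sorted((centers[i], counts[i])
                  for i in range(len(centers)) if counts[i] >= theta_n)
    merged = []
    for center, size in kept:
        if merged and center - merged[-1][0] < theta_c:
            prev_center, prev_size = merged[-1]
            total = prev_size + size
            merged[-1] = ((prev_center * prev_size + center * size) / total, total)
        else:
            merged.append((center, size))
    return [center for center, _ in merged]

# Synthetic scan (illustrative only): canopy, lower leaves, ground.
rng = random.Random(1)
scan = ([rng.gauss(60.0, 0.5) for _ in range(30)]
        + [rng.gauss(66.0, 0.5) for _ in range(30)]
        + [rng.gauss(80.0, 0.5) for _ in range(40)])
fcm_centers, _ = fuzzy_c_means_1d(scan)
final_centers = prune_and_merge(scan, fcm_centers)
print([round(v, 2) for v in final_centers])
```

With well-separated groups like these, the sketch should return three centers close to the three simulated distances. Real scans are noisier, which is where the split and merge thresholds matter.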
For corn plants at the 3-leaf stage, the cluster center values 63.25 and 80.17 represented the distances from the sensor to the canopy and to the ground, respectively. For corn plants at the 4-leaf stage, the cluster center values 59.81, 65.53, and 79.22 were the distances from the sensor to the canopy, to the lower leaves, and to the ground, respectively. For corn plants at the 5-leaf stage, there were 4 clusters: in addition to the clusters for the canopy and the ground, the distance from the sensor to the lower leaves was split into 2 clusters, with centers 51.21 and 66.14. For corn plants at the 6-leaf stage, there were also 4 clusters. However, the largest cluster center, 66.46, cannot represent the distance from the sensor to the ground. As the plant grows, the canopy volume increases and prevents the ultrasonic wave from reaching the ground, so the distance from the sensor to the ground cannot be measured. Instead, the distance from the sensor to the lower leaves was split into 3 clusters, with centers 43.19, 46.87, and 66.46. All test indexes were very small, indicating that the results were reliable. Fuzzy ISODATA can therefore be used to process the distance data and obtain the locations of the crop canopy and the ground, which are very important for controlling the boom height. If the crop fully covers the ground and no ground is observed, the cluster representing the ground is not obtained, but this does not affect the cluster representing the canopy.
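The interpretation of the cluster centers described above can be sketched as follows. This is an illustrative assumption, not a procedure from this work: the canopy is taken as the smallest center, and the largest center is accepted as the ground only if it lies within a tolerance of an expected sensor-to-ground distance. The function name `interpret_centers`, the parameters `expected_ground` and `tolerance`, and the canopy value 40.0 in the occluded example are hypothetical.

```python
def interpret_centers(centers, expected_ground, tolerance):
    """Return (canopy, ground); ground is None when it appears occluded.

    expected_ground and tolerance are hypothetical parameters, e.g. taken
    from the known mounting height of the sensor.
    """
    ordered = sorted(centers)
    canopy = ordered[0]  # nearest reflecting surface
    farthest = ordered[-1]
    ground_seen = len(ordered) > 1 and abs(farthest - expected_ground) <= tolerance
    return canopy, (farthest if ground_seen else None)

# 4-leaf stage centers quoted in the text: canopy, lower leaves, ground.
stage4 = interpret_centers([59.81, 65.53, 79.22], expected_ground=80.0, tolerance=3.0)

# Occluded case: 66.46 is the largest 6-leaf center quoted in the text;
# 40.0 is a placeholder canopy value, not a measured one.
occluded = interpret_centers([40.0, 43.19, 46.87, 66.46], expected_ground=80.0, tolerance=3.0)

print(stage4, occluded)
```

In the occluded case the canopy estimate is still returned, which matches the observation that a missing ground cluster does not affect the canopy cluster.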
The results calculated with fuzzy ISODATA were compared with those of other methods: k-means clustering, the mean, and the median. The results are listed in Table 2.
The values from fuzzy ISODATA are the closest to the manually measured values. Compared with the k-means clustering algorithm, fuzzy ISODATA gives better clustering results, because the number of clusters is difficult to estimate in advance for k-means clustering and the choice of initial cluster centers has a greater impact on its results. Fuzzy ISODATA, in contrast, can dynamically adjust the number and centers of the clusters, bringing the clustering results closer to the true values. The mean and the median both have large errors because they cannot exclude the points from the lower leaves and the ground. However, fuzzy ISODATA has relatively more parameters to set, and their values are not easy to determine; good initial values are required to get good clustering results.
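The bias of the mean and the median can be illustrated with a small numerical sketch. The readings below are invented for illustration and are not data from Table 2; they simply mix canopy, lower-leaf, and ground echoes in one scan.

```python
from statistics import mean, median

# Hypothetical sensor-to-target distances in one scan (illustrative only).
canopy = [59.5, 60.0, 60.5, 60.2, 59.8]
lower_leaves = [65.0, 66.0, 65.5]
ground = [79.0, 80.0, 80.5, 79.5, 80.2, 79.8]
scan = canopy + lower_leaves + ground

overall_mean = mean(scan)      # pulled toward the ground echoes
overall_median = median(scan)  # lands among the lower-leaf echoes
canopy_only = mean(canopy)     # what a canopy cluster center would estimate

print(round(overall_mean, 2), round(overall_median, 2), round(canopy_only, 2))
```

Because neither statistic can separate the three groups, both overestimate the sensor-to-canopy distance whenever ground or lower-leaf echoes are present, whereas a cluster center computed from the canopy points alone does not.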
3.3. Influence of Sensor Moving Speed on Calculation Accuracy
At the 3-leaf stage, the measurement error was larger because of the larger inclination angle of the leaves, which would make the conclusions less reliable. From the 4-leaf to the 6-leaf stage, the leaves became progressively flatter and the measurement error was smaller. Moreover, the experimental results showed that the conclusions for the 4-leaf stage were the same as those for the 5-leaf and 6-leaf stages, so we present only the results for the 4-leaf stage.
For five randomly chosen corn plants at the 4-leaf stage, the distances from the sensor to the canopy were calculated at sensor moving speeds from 0.5 km/h to 6 km/h. The clustering results are shown in Figure 4, and the detailed values are listed in Table 4.
The F test indicated that the variances of the calculated values and the manually measured values were homogeneous, so the two sets of values could be compared with a t-test. At a significance level of 0.05, t_0.05 = 2.57, and the t values at all speeds were less than t_0.05. This indicated that there were no significant differences between the values calculated using fuzzy ISODATA and the manually measured values. However, as shown in Table 4, the absolute error increased with the sensor moving speed. The reason is that the faster the sensor moved, the fewer data points it acquired, as shown in Figure 4. When the speed reached 6 km/h, there were only about 40 points. Although fewer data were collected, the distance from the sensor to the crop canopy could still be obtained with relatively good accuracy after clustering.
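The comparison described above (a variance-ratio F statistic followed by a pooled two-sample t statistic) can be sketched with the Python standard library. The two samples are hypothetical and are not values from Table 4. The critical value 2.306 is the standard two-sided 5% value for 8 degrees of freedom, which applies to this example only; the appropriate critical value depends on the degrees of freedom of the actual data.

```python
from math import sqrt
from statistics import mean, variance

def f_ratio(a, b):
    """Ratio of the larger to the smaller sample variance."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical canopy distances for five plants (illustrative only).
calculated = [59.8, 61.2, 60.5, 58.9, 60.1]
measured = [60.0, 61.0, 60.2, 59.0, 60.5]

f_value = f_ratio(calculated, measured)
t_value = pooled_t(calculated, measured)
t_critical = 2.306  # two-sided, alpha = 0.05, df = 5 + 5 - 2 = 8
no_significant_difference = abs(t_value) < t_critical

print(round(f_value, 3), round(t_value, 3), no_significant_difference)
```

The F ratio would be compared against a tabulated critical value in the same way before relying on the pooled variance.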
The distribution of the absolute error between the manually measured values and the calculated values was checked at each sensor moving speed using the MATLAB function lillietest(x). This function returns an h value and a p value. If h = 0, the null hypothesis that the input data follow a normal distribution is not rejected at the 5% significance level; equivalently, p > 0.05. The test results are shown in Table 5. They indicated that the absolute error can be regarded as normally distributed at the 5% significance level.
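For readers without MATLAB, the statistic behind a Lilliefors test can be sketched as below. This is not a replacement for lillietest(x): it computes only the test statistic D, the largest gap between the empirical distribution and a normal distribution fitted with the sample mean and standard deviation. Turning D into h and p requires Lilliefors critical-value tables, which are not reproduced here. Both samples are artificial.

```python
from statistics import NormalDist, mean, stdev

def lilliefors_statistic(x):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    xs = sorted(x)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, value in enumerate(xs, start=1):
        cdf = fitted.cdf(value)
        # Compare against the empirical CDF just after and just before this point.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Artificial samples: normal quantiles versus a strongly skewed sample.
near_normal = [NormalDist().inv_cdf((i + 0.5) / 20) for i in range(20)]
skewed = [1.0] * 18 + [50.0, 100.0]

d_normal = lilliefors_statistic(near_normal)
d_skewed = lilliefors_statistic(skewed)
print(round(d_normal, 3), round(d_skewed, 3))
```

A small D, as for the first sample, is what leads to h = 0; a large D, as for the skewed sample, leads to rejection of normality.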
Analysis of variance was applied to test the influence of the sensor moving speed on the absolute error. The results are shown in Table 6. They indicated that the sensor moving speed had a significant influence on the absolute error at the 5% significance level.
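The one-way analysis of variance can be sketched as follows, assuming each sensor moving speed forms one group of absolute errors. The error values are hypothetical and are not taken from Table 6. The function returns the F statistic and its degrees of freedom, which would then be compared with a tabulated critical value at the 5% level.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - gm) ** 2
                    for g, gm in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical absolute errors grouped by sensor speed in km/h (illustrative only).
errors_by_speed = {
    0.5: [0.4, 0.6, 0.5, 0.7, 0.5],
    3.0: [0.9, 1.1, 1.0, 1.2, 0.8],
    6.0: [1.6, 1.9, 1.7, 2.1, 1.8],
}
anova_f, df1, df2 = one_way_anova_f(list(errors_by_speed.values()))
print(round(anova_f, 2), df1, df2)
```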
3.4. Correlation between Calculated Values and Manually Measured Values
Regression analysis was used to describe the relationship between the calculated values and the manually measured values for all growth stages. The distances from the sensor to the canopy calculated using fuzzy ISODATA were plotted against the manually measured distances on an XY scatter chart, and a trend line, its equation, and R² were added to illustrate the relationship. The results are shown in Figure 5: the linear regression of calculated distances on manually measured distances had an R² of 0.88 with all growth stages included. The results were encouraging, since the calculated distances showed a strong linear correlation with the manually measured distances. The correlation increased with growth stage, consistent with the trend in the absolute error described earlier. This trend is understandable given the increasing size of the corn leaves with advancing growth. Moreover, as the corn plant grew, its leaves became flatter, which made the canopy easier for the sensor to detect and therefore increased the correlation.
The p-values of the slope and intercept of the regression line were 1.53 × 10⁻¹⁹ and 0.00448, respectively. Both are below 0.01, which indicates that the two regression coefficients are highly significant. The residual distribution was tested using the MATLAB function lillietest(x), and the result suggested that it did not follow a normal distribution. We found two outliers, with standardized residuals of 3.145 and 3.574. Both data points were from the 3-leaf growth stage. Since the leaves were strongly inclined at this stage, the ultrasonic echo may not have returned to the sensor after reflection, which would lead to erroneous measurements. Consequently, we eliminated the two outliers and performed the regression analysis once more. The new regression equation was y = 0.876x + 10.587, and the new R² value was 0.94. The p-values of the two coefficients were 1.38 × 10⁻²⁴ and 1.43 × 10⁻⁶, respectively, indicating that they are more significant than those of the original regression equation. In addition, the residuals were normally distributed after the two outliers were eliminated. The standard errors of the regression coefficients were calculated for the different growth stages. From the 3-leaf to the 6-leaf growth stage, they were 0.56, 0.39, 0.14, and 0.10 for the slope and 34.06, 23.96, 6.45, and 3.79 for the intercept, respectively. The smaller the standard error, the more precisely the coefficient is estimated; the errors decreased as the plants grew. In addition, we calculated the leverage of all points. The results showed that there were no high-leverage points, so no single point strongly influences the slope of the regression line.
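The regression diagnostics discussed above can be sketched with ordinary least squares on one predictor. This is an illustrative sketch rather than the analysis code used here: the data are hypothetical, the function name `simple_regression` is invented, and the cut-off of 3 for flagging standardized residuals is a common convention assumed for the example.

```python
from math import sqrt

def simple_regression(x, y):
    """Least-squares fit of y on x with R^2, leverage and standardized residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    # Leverage of each point: diagonal of the hat matrix for one predictor.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    s = sqrt(sse / (n - 2))  # residual standard error
    standardized = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1 - sse / sst,
        "leverage": leverage,
        "standardized_residuals": standardized,
    }

# Hypothetical manually measured (x) and calculated (y) distances.
measured = [45.0, 50.0, 55.0, 60.0, 65.0, 70.0]
calculated = [49.8, 54.6, 58.9, 63.0, 67.9, 71.7]
fit = simple_regression(measured, calculated)

# Assumed convention: flag points whose standardized residual exceeds 3 in magnitude.
outliers = [i for i, r in enumerate(fit["standardized_residuals"]) if abs(r) > 3]
print(round(fit["slope"], 3), round(fit["intercept"], 3), round(fit["r2"], 3), outliers)
```

Flagged points would be inspected, and the fit repeated without them only if there is a physical reason to distrust them, as with the 3-leaf measurements described above.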