Article

Segmentation and Angle Calculation of Rice Lodging during Harvesting by a Combine Harvester

Key Laboratory of Modern Agricultural Equipment and Technology, Ministry of Education, Jiangsu University, Zhenjiang 212013, China
*
Author to whom correspondence should be addressed.
Agriculture 2023, 13(7), 1425; https://doi.org/10.3390/agriculture13071425
Submission received: 14 June 2023 / Revised: 11 July 2023 / Accepted: 13 July 2023 / Published: 19 July 2023
(This article belongs to the Section Agricultural Technology)

Abstract

Rice lodging not only complicates harvesting but also reduces yield, so effective identification of rice lodging is of great significance. In this paper, we design a bilinear interpolation upsampling feature fusion module (BIFF) that decomposes the quadruple upsampling at the junction of the encoder and decoder into two double upsampling processes and inserts intermediate feature layers from the backbone network for feature fusion during this process. A global attention mechanism (GAM) module is added to the feature extraction network, allowing the network to focus effectively on lodging regions and thereby improving the segmentation results. The improved network achieves a mean precision of 93.55%, a mean recall of 93.65%, and an mIoU of 88.10%, and the value of the improvements is demonstrated through ablation experiments and comparison with other algorithms. In addition, an angle calculation method is designed on top of the detection algorithm: a detection head branch is added to the output to read distance information from the depth camera, and this distance information is combined with a mechanical analysis to determine the angle between the stalk and the vertical direction when the rice is upright, tilted, or lodged. Comparing the calculated angles with manually measured angles gives a final average relative error of approximately 5.364%, indicating that the lodging angle calculation method in this paper is highly accurate and has value for application in real-time harvesting scenarios.

1. Introduction

Rice lodging is one of the most common agricultural disasters [1]. It negatively affects rice production and brings great difficulties to mechanical grain harvest. Research on image segmentation of lodging in rice and angle calculation is significant. It allows for early detection and assessment of lodging, facilitating timely intervention. Accurate angle calculation provides data on the severity of lodging, aiding in resource allocation and optimizing post-harvest activities. Furthermore, studying lodging through image detection and angle calculation contributes to precision agriculture, empowering farmers to make informed decisions and enhancing overall productivity. In recent years, the rapid development of deep learning in the field of computer vision has made it possible to detect and assess lodging through visual methods.
In the study conducted by Jiang et al. in 2022 [2], the researchers analyzed the performance of different models in segmenting wheat lodging plots by calculating the lodging area. Through comparative experiments, it was observed that the SegFormer-B1 model outperformed other models, showcasing a higher prediction rate and displaying stronger generalization ability. The model, established using a mixed-stage dataset, consistently outperforms those built on single-stage datasets in terms of segmentation effectiveness. Notably, the SegFormer-B1 model, trained on the mixed-stage dataset, achieved an impressive mIoU of 89.64% and proves applicable for monitoring wheat lodging across the entire growth cycle of wheat.
In 2022, Tang et al. [3] introduced the pyramid transposed convolution network (PTCNet), a semantic segmentation model specifically designed for extracting and detecting wheat lodging in large-scale areas using high-resolution GaoFen-2 satellite images. PTCNet fused multi-scale high-level features with low-level features, resulting in enhanced segmentation accuracy and improved sensitivity in extracting wheat lodging areas. Moreover, the network incorporated four types of vegetation indices and three types of edge features, which were evaluated to determine their impact on segmentation accuracy.
In the study conducted by Yu et al. in 2023 [4], the PSPNet model was enhanced by integrating the convolutional LSTM (ConvLSTM) temporal model, incorporating the convolutional block attention module (CBAM), and employing the Tversky loss function. The impact of this improved PSPNet network on monitoring wheat lodging was examined across varying image sizes and different growth stages. The proposed model effectively exploits temporal sequence features to enhance image segmentation accuracy and accurately extract lodging areas at different growth stages.
There is relatively little research on the calculation of the lodging angle. Wen et al. [5] proposed a real-time detection method for wheat lodging based on binocular vision, which determines the angle between the stem and the vertical direction for upright, inclined, and lodged wheat. The degree of lodging is discriminated from the height of the visual point cloud over the wheat crop surface. The binocular camera captures the image parallax for wheat within the harvesting region and calculates its three-dimensional coordinates using an optical-axis-parallel model, from which the wheat stem height is derived. By analyzing the vision-detected stem height, the method determines the location and area of wheat lodging within the combine harvester's working region. Field experiments showed a 5.5 cm error in stem height detection and an algorithm run time under 2000 milliseconds, enabling accurate analysis and calculation of wheat lodging location, contour, and area. This study provides valuable information for adaptive header control in combine harvesters.
These studies highlight the advancements in using image analysis and computer vision techniques to automate the estimation of lodging angles in rice crops. By accurately determining the lodging angle, farmers and researchers can assess the impact of lodging on crop yield and implement appropriate interventions for lodging management.
While the aforementioned studies have made progress in applying deep learning to rice lodging detection, there are still some limitations.
Model robustness: Some studies may only validate their models under specific conditions, such as specific lighting, weather, or field environments. In such cases, the model’s robustness may be limited, and it may not achieve reliable lodging detection in different environments and conditions.
Real-time and practical applicability: Some research is still in the laboratory or experimental stage and has not reached the real-time and practical requirements for application. In real-field conditions, real-time performance and efficiency are crucial factors for the successful implementation of such technology.
Efforts to address these aspects and improve research in these areas will contribute to the better application of deep learning techniques in addressing rice lodging, making them more reliable and feasible in practical scenarios.
DeepLabv3+ is a popular semantic segmentation model that has been widely used in various computer vision tasks. Building on this model, we propose an improved DeepLabv3+ and design a depth-camera-based method for calculating the lodging angle. The main contributions are as follows:
(1) In this paper, a bilinear interpolation upsampling feature fusion module is designed to decompose the quadruple upsampling of the encoder and decoder connection into two double upsampling processes and insert the intermediate feature layers in the backbone network for feature fusion, making full use of the medium-sized feature layers generated in the backbone network.
(2) The GAM attention mechanism is added to the enhanced feature extraction (ASPP) part of the network, allowing the network to focus effectively on lodging regions and thus improving the segmentation results.
(3) An angle calculation method is designed on top of the detection algorithm: a detection head branch is added to the output to read distance information from the depth camera, and the distance information is combined with a mechanical analysis to determine the angle between the stalk and the vertical direction when the rice is upright, tilted, or lodged.

2. Materials and Methods

2.1. Feature Fusion Module Based on Bilinear Interpolation Upsampling Design

Bilinear interpolation is a commonly used technique in image processing for upsampling, increasing the resolution of an image, or upsampling low-resolution feature maps to a higher resolution. This interpolation method calculates the value of a new pixel by taking a weighted average of its neighboring pixels in the image. In this paper, an upsampling feature fusion module is designed based on this, and more pixel points are considered in the interpolation process to merge with the feature layers of the same size in the backbone network to improve the effectiveness of the feature layers.
The process of implementing bilinear interpolation is shown in Figure 1. First, the position of the target pixel to be generated in the high-resolution image is determined as P. This is usually performed by multiplying the pixel coordinates of the low-resolution image by a magnification factor to determine the corresponding position in the high-resolution image. Then, based on the position of the target pixel, four neighboring pixels Q11, Q12, Q21, and Q22 are determined, which are located at the four pixel positions closest to the position of the target pixel in the original low-resolution image, the coordinates of which are given in the figure. For each color channel of the target pixel, the interpolated values are calculated using the following steps: (1). In the horizontal direction, the interpolation is performed based on the distance between the target pixel position and the neighboring pixel position. This can be achieved by a weighted average of the neighboring pixel values, where the closer pixels have a higher weight, as shown in Equations (1) and (2) to obtain R1, R2.
$$f(x, y_1) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{11}) + \frac{x - x_1}{x_2 - x_1} f(Q_{21}) \tag{1}$$
$$f(x, y_2) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{12}) + \frac{x - x_1}{x_2 - x_1} f(Q_{22}) \tag{2}$$
(2). The above steps are repeated for each horizontal interpolation result in the vertical direction as in Equation (3).
$$f(x, y) \approx \frac{y_2 - y}{y_2 - y_1} f(x, y_1) + \frac{y - y_1}{y_2 - y_1} f(x, y_2) \tag{3}$$
The resulting interpolation result P is used as the value of the target pixel. The above steps are repeated for each pixel in the image until a complete high-resolution image is generated.
The benefit of using bilinear interpolation is that it allows for the smooth filling of pixel values during upsampling, avoiding jagged edges or mosaic effects. It generates intermediate pixel values by considering the information from surrounding pixels, providing more accurate image details and smooth transition effects.
In deep learning, bilinear interpolation is often used as an upsampling technique within convolutional neural networks (CNNs) to upsample low-resolution feature maps to a higher resolution for comparison or fusion with high-resolution feature maps or labels. This upsampling technique can yield more accurate results in tasks such as image segmentation.
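For reference, a minimal PyTorch sketch of the two-fold bilinear upsampling used throughout this section is shown below; the tensor shape is illustrative, and align_corners=False is an assumed (common default) setting rather than a value stated in the paper.

```python
import torch
import torch.nn.functional as F

# Illustrative low-resolution feature map: (batch, channels, height, width)
low_res = torch.randn(1, 256, 32, 32)

# Two-fold bilinear upsampling, as used in the double upsampling steps above
high_res = F.interpolate(low_res, scale_factor=2, mode="bilinear", align_corners=False)

print(high_res.shape)  # torch.Size([1, 256, 64, 64])
```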
In this paper, a bilinear interpolation upsampling feature fusion module (BIFF) is designed to decompose the quadruple upsampling into two double upsampling processes and to insert an intermediate feature layer from the backbone network for feature fusion, making full use of the mid-size feature layers generated in the backbone network and thus effectively improving the segmentation results. The module structure is shown in Figure 2. Each of the two inputs is upsampled two-fold by bilinear interpolation, and the results are combined by element-wise addition (ADD) to form the input of the next module. The formulation of the module is given in Equation (4), where $I_R$ and $I_E$ are the two inputs to the module, $I_R$ being the input feature layer obtained from the backbone network and $I_E$ the input feature layer from the non-backbone path, and $O$ is the overall output of the module.
$$O = F_{BIU}\left(F_{3 \times 3}(I_R)\right) \oplus F_{BIU}(I_E) \tag{4}$$
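The following is a minimal PyTorch sketch of a module implementing Equation (4); the channel counts, normalization, and activation choices are assumptions for illustration rather than the authors' exact implementation, and both inputs are assumed to arrive at the same spatial size with matching output channel counts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BIFF(nn.Module):
    """Sketch of the bilinear interpolation upsampling feature fusion module.

    i_r: feature layer taken from the backbone network (I_R in Equation (4)).
    i_e: feature layer from the non-backbone decoder path (I_E).
    Both are upsampled two-fold and fused by element-wise addition.
    """

    def __init__(self, in_channels_r: int, out_channels: int):
        super().__init__()
        # 3x3 convolution applied to the backbone input before upsampling
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(in_channels_r, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, i_r: torch.Tensor, i_e: torch.Tensor) -> torch.Tensor:
        # i_e is assumed to already have `out_channels` channels
        up = lambda x: F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return up(self.conv3x3(i_r)) + up(i_e)  # O = F_BIU(F_3x3(I_R)) (+) F_BIU(I_E)
```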

2.2. Global Attention Mechanism Attention Module

Attention mechanisms have proven effective since they were first proposed; common examples include SENet [6], ECA [7], and CBAM [8]. In this paper, the global attention mechanism (GAM) is used. Its advantage over earlier attention modules is that it considers the interaction between the spatial and channel dimensions, avoids the loss of cross-dimensional channel information, and lets the reinforcement operations associated with adjacent feature layers work simultaneously, reducing the loss of channel and spatial location information as much as possible.
GAM improves the submodules of the sequential channel-spatial attention mechanism of CBAM; its structure is shown in Figure 3. For an input $F_1 \in \mathbb{R}^{C \times H \times W}$, the intermediate state $F_2$ and the output $F_3$ are given in Equations (5) and (6), where $M_C$ and $M_S$ are the channel and spatial attention maps, respectively, and $\otimes$ denotes element-wise multiplication.
$$F_2 = M_C(F_1) \otimes F_1 \tag{5}$$
$$F_3 = M_S(F_2) \otimes F_2 \tag{6}$$
The channel attention part is shown in Figure 4. The input feature layer first undergoes a three-dimensional permutation so that cross-dimensional information is preserved; a multilayer perceptron is then used to amplify the cross-dimensional channel-spatial dependencies, the permutation is reversed, and the activation function output gives the weights for F1, which are multiplied with F1 to obtain F2.
The spatial attention part is shown in Figure 5. To make full use of the spatial information in the feature layer, two convolutional layers are used to fuse the spatial information. Max pooling reduces the amount of information and contributes a negative gain to the network, so the pooling operation is removed to retain the full feature maps. Although this effectively increases the number of parameters, it preserves the feature mapping, and the spatial attention weights are obtained from the convolution output.
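A compact PyTorch sketch of a GAM-style block consistent with Equations (5) and (6) and the description above is given below; the reduction ratio, 7 × 7 kernel size, and sigmoid activation are assumed values typical of GAM, not taken from this paper.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Sketch of a global attention mechanism block:
    F2 = Mc(F1) * F1 (Equation (5)), F3 = Ms(F2) * F2 (Equation (6))."""

    def __init__(self, channels: int, rate: int = 4):
        super().__init__()
        # Channel attention: permute so an MLP processes the channel dimension
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // rate),
            nn.ReLU(inplace=True),
            nn.Linear(channels // rate, channels),
        )
        # Spatial attention: two 7x7 convolutions, no pooling
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // rate, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels // rate),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // rate, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, f1: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f1.shape
        # Channel attention Mc: (B,C,H,W) -> (B,H*W,C) -> MLP -> back -> sigmoid
        x = f1.permute(0, 2, 3, 1).reshape(b, -1, c)
        x = self.channel_mlp(x).reshape(b, h, w, c).permute(0, 3, 1, 2)
        f2 = torch.sigmoid(x) * f1                    # Equation (5)
        # Spatial attention Ms applied to the intermediate state
        f3 = torch.sigmoid(self.spatial(f2)) * f2     # Equation (6)
        return f3
```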

2.3. Detailed Description of Network Architecture

The DeepLabV3+ model mainly consists of two parts, the encoder and the decoder [9]. The encoder part extracts the image features, and the decoder part is responsible for decoding these features to obtain the prediction results, i.e., classifying the category corresponding to each pixel of the input image. Its overall network structure is shown in Figure 6.
The rice field dataset images were passed into the backbone feature extraction network to obtain two initial effective feature layers [10]: a shallow feature layer with low semantic information and a deep feature layer with high semantic information. The low semantic feature layer has a larger feature layer width and height with less semantic information due to less convolutional downsampling, while the high semantic feature layer is the opposite. After obtaining these two initial effective feature layers, the layers were processed differently.
The backbone network is constructed by stacking ResNet residual blocks to form the backbone feature extraction network, adapted for DeeplabV3+ by setting the dilation rate in the 3 × 3 convolution part, which yields four feature layers of different sizes, as shown in Figure 6. The final output feature layer of the backbone network then undergoes enhanced feature extraction in the ASPP module: it is passed through one 1 × 1 convolution kernel and three 3 × 3 convolution kernels with dilation rates of 6, 12, and 18, respectively, and a global average pooling operation is performed. The five resulting feature layers are fed into the GAM attention module and then stacked, and the number of channels is adjusted with a 1 × 1 convolution to obtain the final output of the entire encoder section.
In the decoder part, the medium-sized output block3 of ResNet is passed to the BIFF module for feature fusion; the output of the first BIFF is saved as one input of the next BIFF module, whose output is in turn obtained by bilinear interpolation upsampling and fusion with block2. Block1, as a low-semantic feature layer, is passed through a 1 × 1 convolution and concatenated with the output mentioned above, and 3 × 3 convolutions are used to adjust the number of channels to the number of classes. Finally, the output is upsampled to the same size as the input image, so that each pixel of the output represents a category. Different colors are assigned according to the pixel category, the colored map is resized, and the original image and the resized map are blended to obtain the output image.
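The decoder data flow described above can be summarized with the following illustrative sketch; the module names, the 1 × 1 projection convolution, and the final upsampling factor are assumptions for illustration, and Figure 6 shows the authors' actual structure.

```python
import torch
import torch.nn.functional as F

def decoder_forward(aspp_out, block3, block2, block1, biff1, biff2, proj1x1, head):
    """Minimal sketch of the improved decoder path, not the authors' exact code.

    biff1, biff2 : BIFF modules (see the sketch in Section 2.1)
    proj1x1      : 1x1 convolution applied to the shallow layer block1
    head         : 3x3 convolutions mapping channels to the number of classes
    """
    x = biff1(block3, aspp_out)   # first BIFF: fuse mid-size backbone layer block3
    x = biff2(block2, x)          # second BIFF: fuse block2 with the saved output
    x = F.interpolate(x, size=block1.shape[2:], mode="bilinear", align_corners=False)
    x = torch.cat([proj1x1(block1), x], dim=1)   # concatenate with shallow features
    x = head(x)
    # upsample the per-pixel class map back to the input image size
    return F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)
```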

2.4. Calculation Method of Lodging Angle Based on Depth Camera

Based on the above study, the angle calculation method is designed by combining the detection algorithm: a detection head branch is added to the output to read distance information from the depth camera, and this distance information is combined with a mechanical analysis to determine the angle between the stalk and the vertical direction when the rice is upright, tilted, or lodged. First, the study scenario is set up on an unmanned harvester.
Figure 7 below shows how the depth camera is fixed to the combine harvester. The depth camera is mounted on the top left corner of the combine harvester's cab using a camera mount. The depth camera chosen for this paper is the Intel RealSense D415 [11,12], set to 640 × 480 at 30 FPS for both the depth and RGB color streams. It should be noted that the camera is tilted 25° downward to capture as much of the crop in front of the cutter as possible while keeping the combine's cutter bar and reel out of the field of view.
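As an illustration, the stream configuration described above could be set up with the pyrealsense2 SDK roughly as follows; the centre-pixel distance query is only an example of reading depth values, not part of the authors' pipeline.

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# 640 x 480 at 30 FPS for both depth and RGB, matching the settings above
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)
try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    # distance (in metres) to the pixel at the image centre
    d = depth.get_distance(320, 240)
    print(f"distance to centre pixel: {d:.2f} m")
finally:
    pipeline.stop()
```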
The regression operation of the detection algorithm gives the distance $d_2$ from the harvester to the farthest non-lodged point and $d_1$ to the nearest severely lodged point. $s_1$ is the length of the lodging area, $s_2$ and $s_3$ are the distances from the farthest and nearest points, respectively, to the line projected on the ground by the camera's X-axis, $h_1$ is the fixed height of the camera above the ground, and $h_2$ is the height of the lodging area.
In practical applications it is usually assumed that the roots of the lodged area lie in the same plane as those of the non-lodged area, as in Figure 8, so calculating the lodging angle reduces to calculating α. It follows that only $h_2$ and $s_1$ need to be determined mathematically.
As the distance d from the target to the camera can be obtained from the depth camera, the change in forward distance over a given time can be found with the time-difference method, and $h_2$ can then be calculated from the known harvester speed. Let $d_3$ be the distance from the camera to the same point after time t and v the harvester speed; the angle α is then given by Equation (7).
$$\alpha = \arccos \frac{d_1^2 + (vt)^2 - d_3^2}{2 d_1 v t} \tag{7}$$
Once α is found, $h_2$ and $s_1$ follow directly, and the lodging angle γ is given by Equation (8).
$$\gamma = 90^{\circ} - \theta = 90^{\circ} - \arctan \frac{h_2}{s_1} = 90^{\circ} - \arctan \frac{h_1 - d_1 \sin \alpha}{s_2 - d_1 \cos \alpha} \tag{8}$$
where $s_2$ is the distance from the farthest point to the line projected on the ground by the camera's X-axis; like $s_3$, it is obtained with the same method, using the change in angle after time t to derive the distance parameter.
In the process of calculating the angle, an important parameter $s_1$ is obtained. It can be used not only to calculate the lodging angle but also, as the width of the bounding rectangle of the lodging area, to calculate its area. The length of the bounding rectangle of the lodging area is given by Equation (9):
$$l = X_r - X_l \tag{9}$$
$X_r$ and $X_l$ are the X-axis coordinates of the rightmost and leftmost points in the depth camera coordinate system, respectively, and are known values. Therefore, the lodging area $S_{lodging}$ can be calculated as in Equation (10).
$$S_{lodging} = s_1 l = s_1 (X_r - X_l) \tag{10}$$
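A small Python sketch combining Equations (7)-(10) is given below; the function signature and the recovery of $h_2$ and $s_1$ from α follow the reconstruction above and should be read as an illustrative assumption rather than the authors' code.

```python
import math

def lodging_angle_and_area(d1, d3, v, t, h1, s2, x_r, x_l):
    """Illustrative implementation of Equations (7)-(10).

    d1   : distance to the nearest severely lodged point (m)
    d3   : distance to the same point after time t (m)
    v, t : harvester forward speed (m/s) and elapsed time (s)
    h1   : camera mounting height above the ground (m)
    s2   : ground-projected distance from the farthest point to the camera X-axis (m)
    x_r, x_l : X-coordinates of the rightmost/leftmost lodged points (m)
    """
    # Equation (7): angle between d1 and the direction of travel
    alpha = math.acos((d1**2 + (v * t)**2 - d3**2) / (2 * d1 * v * t))
    # Height and length of the lodging area recovered from alpha (assumed geometry)
    h2 = h1 - d1 * math.sin(alpha)
    s1 = s2 - d1 * math.cos(alpha)
    # Equation (8): lodging angle gamma, measured from the vertical
    gamma = 90.0 - math.degrees(math.atan2(h2, s1))
    # Equations (9) and (10): lodging area from the bounding rectangle
    area = s1 * (x_r - x_l)
    return gamma, area
```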

3. Results

3.1. Data Set Acquisition and Training Setup

3.1.1. Building the Dataset

To verify the feature recognition of lodging rice and the adaptive control system for the working parameters of the combine harvester header, data measurements were conducted on the combine harvester’s reel and header control functions in the Wujiang National Modern Agriculture Demonstration Zone in Jiangsu Province. Practical experiments were conducted on lodging rice under different lodging conditions. The experiment included verifying the lodging recognition effect of the image recognition system, the adjustment function, and the effect of the combine harvester header.
This study used unmanned aerial vehicles to collect rice field images [13,14] for training, and the data were collected in the Wujiang National Modern Agriculture Demonstration Zone in Suzhou City, Jiangsu Province. The main rice varieties planted were Nanjing 46 and Ningxiangjing 9. The layout of some of the experimental fields, captured by drone aerial photography, is shown in Figure 9 below.
The height of the camera installed on the combine harvester was 2.0 m, and the drone flight altitude was set to the same value for capturing images and videos. The drone selected was the DJI Mavic Air 2, which captured RGB images at a resolution of 4000 × 3000 and videos at 4000 × 3000/30 fps.
The captured videos were extracted and appropriately cropped using OpenCV [15]. According to the experimental requirements, approximately 3000 rice field images were retained as the dataset and divided into training and validation sets at a ratio of 9:1, giving 2700 training images and 300 validation images. The Labelme tool was used to label the different areas, including lodged areas, non-lodged areas, harvested areas, and background areas.
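A minimal sketch of the frame extraction and the 9:1 split described above is given below; the file paths and frame sampling interval are assumptions for illustration.

```python
import os
import random
import cv2

def extract_frames(video_path, out_dir, every_n=30):
    """Save one frame every `every_n` frames from a harvesting video."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

def split_dataset(image_names, val_ratio=0.1, seed=0):
    """Random 9:1 train/validation split of the retained images."""
    random.seed(seed)
    names = list(image_names)
    random.shuffle(names)
    n_val = int(len(names) * val_ratio)
    return names[n_val:], names[:n_val]  # (train, val)
```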

3.1.2. Experimental Platform and Training Parameter Settings

The experiments in this paper were conducted on the Windows 10 operating system with the following hardware: an Intel i7-11700K CPU, an RTX 3070 GPU with 8 GB of video memory, and 32 GB of 3200 MHz RAM. The software environment was Python 3.7, PyTorch 1.11.0, and CUDA 11.3.
The training settings were as follows: the number of epochs was 100, the batch size was 16, the optimizer was SGD, the number of data-loading threads was set to 4, and the initial learning rate was 10⁻².
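For clarity, these settings correspond roughly to the following PyTorch configuration; the momentum and weight-decay values are assumptions not stated in the paper.

```python
import torch

def build_optimizer(model: torch.nn.Module):
    # SGD with initial learning rate 1e-2, as listed above;
    # momentum and weight decay are assumed, commonly used values
    return torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4)

EPOCHS = 100       # number of training epochs
BATCH_SIZE = 16    # batch size
NUM_WORKERS = 4    # data-loading threads
```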

3.1.3. Experimental Results and Data Analysis

The performance metrics used in the experiments are precision, recall, mIoU, and mean accuracy. The results are shown in Figure 10 below.
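These metrics can be computed from a per-class confusion matrix in the standard way, as in the following sketch, which is included for reference and is not the authors' evaluation code.

```python
import numpy as np

def segmentation_metrics(conf_matrix: np.ndarray):
    """Per-class precision, recall, and IoU from a confusion matrix
    (rows: ground truth, columns: prediction), averaged over classes."""
    tp = np.diag(conf_matrix).astype(float)
    fp = conf_matrix.sum(axis=0) - tp
    fn = conf_matrix.sum(axis=1) - tp
    precision = tp / np.maximum(tp + fp, 1e-12)
    recall = tp / np.maximum(tp + fn, 1e-12)
    iou = tp / np.maximum(tp + fp + fn, 1e-12)
    return precision.mean(), recall.mean(), iou.mean()  # mPrecision, mRecall, mIoU
```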
In addition, the change in the loss function during training is an important indicator of network performance. As shown in Figure 11, the training loss decreased below 0.3 after 18 epochs and reached a minimum of about 0.069 at the 98th epoch.
The segmentation results are shown in Figure 12, where Figure 12a is the original image and Figure 12b is the segmented result: the red area is detected unharvested, lodged rice; green is unharvested rice without lodging; yellow is the harvested rice field area; and dark is the background area.
Finally, the calculated parameters are exported to the detected image to form a visual interface. This is shown in Figure 13.

3.2. Ablation Experiments and Comparison of Mainstream Algorithms

3.2.1. Ablation Experiments

To explore the effect of the BIFF module and the GAM attention module on the network, corresponding ablation experiments were designed. Each set of results was obtained with the training parameter settings above, with no changes other than the corresponding modules.
The experimental data are shown in Table 1; from top to bottom are the original DeeplabV3+ model, the BIFF-DeeplabV3+ model, and the improved DeeplabV3+ model. The mPrecision, mRecall, and mIoU of the original network are 85.72%, 81.50%, and 84.75%, respectively. Adding the BIFF module, the GAM attention, and the combination of the two each produced a clear improvement, demonstrating the feasibility of the design in this paper.

3.2.2. Comparison of Mainstream Algorithm Models

In order to further validate the effectiveness of the improved lodging region segmentation method, corresponding training and testing were conducted on other commonly used semantic segmentation networks. The experimental data are shown in Table 2; from top to bottom there are the FCN [16], UNet [17], SegNet [18], DeeplabV3+, BIFF-DeeplabV3+, and improved DeeplabV3+ models. Comparison of the trained models shows that the model proposed in this paper is more effective.

3.3. Lodging Angle Calculation Accuracy Experiment

In order to reduce the impact of weather and of the harvesting process on the lodged areas, manual angle measurements were carried out half an hour in advance. A T-shaped ruler was fixed perpendicular to the ground to ensure accurate measurement, and the actual angle was measured manually with a measuring tape, as shown in Figure 14a-c.
Five lodged paddy fields were selected, and five significantly different lodging angles were measured in each field. After the measurements were taken, the measured locations were marked so that the predicted angle at each location could be recorded during harvesting and the error calculated. The data from this experiment are recorded in Table 3 below.
The accuracy of the method used in this paper to calculate the lodging angle is thus verified, although some unavoidable situations arose during the experiments despite accounting for external uncertainties. The final calculation yields an average relative error of about 5.364%, indicating that the lodging angle calculation method in this paper has high accuracy and is of value for application in real-time harvesting scenarios.

4. Conclusions

In this paper, a bilinear interpolation upsampling feature fusion module is designed to decompose the quadruple upsampling at the junction of the encoder and decoder into two double upsampling processes and to insert intermediate feature layers from the backbone network for feature fusion, making full use of the medium-sized feature layers generated by the backbone network. The GAM attention mechanism added to the feature extraction part allows the network to focus effectively on lodging regions, thus improving the segmentation results. The improved network achieves 93.55% mean precision, 93.65% mean recall, and 88.10% mIoU, and the feasibility of the improvements is demonstrated through ablation experiments and comparison with other algorithms. In addition, an angle calculation method is designed on top of the detection algorithm: a detection head branch is added to the output to read distance information from the depth camera, and this information is combined with a mechanical analysis to determine the angle between the stalk and the vertical direction when the rice is upright, tilted, or lodged. A comparison of the calculated angles with the actual measured angles gives a final average error of approximately 5.364%, indicating that the angle calculation error in this paper is within an acceptable range and that the method has value for application in real-time harvesting scenarios.

Author Contributions

Conceptualization, X.Z. and Y.L.; methodology, X.Z.; software, X.Z.; validation, X.Z. and Y.L.; formal analysis, X.Z.; investigation, X.Z.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, X.Z.; writing—review and editing, X.Z.; visualization, X.Z.; supervision, Y.L.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Research and Development Program of Zhenjiang, China (NY2021009).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wu, W.; Ma, B.L. Understanding the trade-off between lodging resistance and seed yield, and developing some non-destructive methods for predicting crop lodging risk in canola production. Field Crops Res. 2022, 288, 108691.
  2. Jiang, S.; Hao, J.; Li, H.; Zuo, C.; Geng, X.; Sun, X. Monitoring Wheat Lodging at Various Growth Stages. Sensors 2022, 22, 6967.
  3. Tang, Z.; Sun, Y.; Wan, G.; Zhang, K.; Shi, H.; Zhao, Y.; Chen, S.; Zhang, X. Winter Wheat Lodging Area Extraction Using Deep Learning with GaoFen-2 Satellite Imagery. Remote Sens. 2022, 14, 4887.
  4. Yu, J.; Cheng, T.; Cai, N.; Zhou, X.G.; Diao, Z.; Wang, T.; Du, S.; Liang, D.; Zhang, D. Wheat Lodging Segmentation Based on Lstm_PSPNet Deep Learning Network. Drones 2023, 7, 143.
  5. Wen, J.; Yin, Y.; Zhang, Y.; Pan, Z.; Fan, Y. Detection of wheat lodging by binocular cameras during harvesting operation. Agriculture 2022, 13, 120.
  6. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE Press: New York, NY, USA, 2018; pp. 7132–7141.
  7. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
  8. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
  9. Wang, Y.; Wang, C.; Wu, H.; Chen, P. An improved Deeplabv3+ semantic segmentation algorithm with multiple loss constraints. PLoS ONE 2022, 17, e0261582.
  10. Azad, R.; Asadi-Aghbolaghi, M.; Fathy, M.; Escalera, S. Attention Deeplabv3+: Multi-level Context Attention Mechanism for Skin Lesion Segmentation. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2020; pp. 7–10.
  11. Lourenço, F.; Araujo, H. Intel RealSense SR305, D415 and L515: Experimental Evaluation and Comparison of Depth Estimation. In Proceedings of the VISIGRAPP (4: VISAPP), Virtual, 8–10 February 2021; pp. 362–369.
  12. Carfagni, M.; Furferi, R.; Governi, L.; Santarelli, C.; Servi, M.; Uccheddu, F.; Volpe, Y. Metrological and critical characterization of the Intel D415 stereo depth camera. Sensors 2019, 19, 489.
  13. Tan, L.; Lv, X.; Lian, X.; Wang, G. YOLOv4_Drone: UAV image target detection based on an improved YOLOv4 algorithm. Comput. Electr. Eng. 2021, 93, 107261.
  14. Yang, M.D.; Huang, K.S.; Kuo, Y.H.; Tsai, H.P.; Lin, L.M. Spatial and Spectral Hybrid Image Classification for Rice Lodging Assessment through UAV Imagery. Remote Sens. 2017, 9, 583.
  15. Golekar, D.; Bula, R.; Hole, R.; Katare, S.; Parab, S. Sign language recognition using Python and OpenCV. Int. Res. J. Mod. Eng. Technol. Sci. 2022, 4, 1–5.
  16. Villa, M.; Dardenne, G.; Nasan, M.; Letissier, H.; Hamitouche, C.; Stindel, E. FCN-based approach for the automatic segmentation of bone surfaces in ultrasound images. Int. J. Comput. Assist. Radiol. Surg. 2018, 13, 1707–1716.
  17. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. UNet 3+: A full-scale connected UNet for medical image segmentation. In Proceedings of the ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1055–1059.
  18. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
Figure 1. Schematic diagram of bilinear interpolation.
Figure 2. Bilinear interpolation upsampling feature fusion module.
Figure 3. Schematic diagram of GAM attention structure.
Figure 4. GAM channel attention.
Figure 5. GAM spatial attention.
Figure 6. Improved DeeplabV3+ structure.
Figure 7. Depth camera fixation on the combine harvester.
Figure 8. Lodging angle calculation model.
Figure 9. Aerial maps of the study area.
Figure 10. Training results of various evaluation standards.
Figure 11. Training loss function curve.
Figure 12. (a,b) Model generalization ability test.
Figure 13. Schematic diagram of lodging angle and area detection.
Figure 14. (a–c) Schematic diagram of the harvesting process.
Table 1. Ablation experiments.

Group | BIFF | GAM | mPrecision (%) | mRecall (%) | mIoU (%)
1 | × | × | 85.72 | 81.50 | 84.75
2 | ✓ | × | 87.35 | 85.92 | 87.67
3 | ✓ | ✓ | 93.55 | 93.65 | 88.10
Table 2. Performance comparison of mainstream semantic segmentation models.

Model | mPrecision (%) | mRecall (%) | mIoU (%) | Loss
FCN | 81.27 | 80.97 | 80.40 | 0.137
UNet | 85.38 | 84.65 | 82.72 | 0.126
SegNet | 87.08 | 88.81 | 84.28 | 0.073
DeeplabV3+ | 85.72 | 81.50 | 84.75 | 0.071
BIFF-DeeplabV3+ | 87.35 | 85.92 | 87.67 | 0.087
Improved DeeplabV3+ | 93.55 | 93.65 | 88.10 | 0.069
Table 3. Experimental accuracy data of lodging angle.

Group | Test | Angle 1 | Angle 2 | Angle 3 | Angle 4 | Angle 5
Group 1 | True angle (°) | 15.5 | 33.3 | 59.5 | 72.7 | 77.3
Group 1 | Calculated angle (°) | 17.0 | 31.3 | 58.2 | 76.4 | 78.9
Group 1 | Relative error (%) | 9.7 | 6.0 | 2.2 | 1.8 | 2.1
Group 2 | True angle (°) | 14.3 | 23.5 | 53.8 | 65.5 | 70.2
Group 2 | Calculated angle (°) | 16.2 | 25.7 | 52.5 | 63.4 | 70.8
Group 2 | Relative error (%) | 13.3 | 9.4 | 2.4 | 3.2 | 0.9
Group 3 | True angle (°) | 16.4 | 27.8 | 45.7 | 52.5 | 75.3
Group 3 | Calculated angle (°) | 14.2 | 25.2 | 42.3 | 55.3 | 72.3
Group 3 | Relative error (%) | 13.4 | 9.4 | 7.4 | 3.4 | 4.0
Group 4 | True angle (°) | 20.4 | 29.7 | 45.3 | 55.5 | 70.5
Group 4 | Calculated angle (°) | 21.2 | 25.4 | 42.5 | 53.6 | 70.9
Group 4 | Relative error (%) | 4.0 | 14.5 | 4.4 | 3.4 | 0.6
Group 5 | True angle (°) | 22.4 | 30.7 | 47.5 | 56.7 | 67.5
Group 5 | Calculated angle (°) | 20.4 | 30.0 | 45.2 | 55.7 | 68.4
Group 5 | Relative error (%) | 8.9 | 2.3 | 4.8 | 1.2 | 1.3
