Classification of Maize Lodging Extents Using Deep Learning Algorithms by UAV-Based RGB and Multispectral Images

Abstract: Lodging depresses the grain yield and quality of maize. Previous machine learning methods classified crop lodging extents through visual interpretation and manual extraction of sensitive features, which is costly, subjective and inefficient. Moreover, the accuracy of individual subdivision categories has been insufficiently analyzed for multi-grade crop lodging. In this study, a classification method for maize lodging extents was proposed based on deep learning algorithms and unmanned aerial vehicle (UAV) RGB and multispectral images. The characteristic variation of three lodging extents in RGB and multispectral images was analyzed. The VGG-16, Inception-V3 and ResNet-50 algorithms were trained and compared in terms of classification accuracy and Kappa coefficient. The results showed that the more severe the lodging, the higher the intensity values of the RGB images and the spectral reflectance of the multispectral images. The reflectance variation across lodging extents was more evident in the red edge band than in the visible bands. Classification performance using multispectral images was better than that using RGB images for all lodging extents. With RGB images, the test accuracies of the three deep learning algorithms for non-lodging were high, i.e., over 90%, but the discrimination between moderate lodging and severe lodging needed to be improved. With multispectral images, ResNet-50 achieved a test accuracy of 96.32% with a Kappa coefficient of 0.9551, which was superior to VGG-16 and Inception-V3, and its accuracy on each lodging subdivision category reached 96%. The ResNet-50 algorithm combined with multispectral images can realize accurate lodging classification to support post-stress field management and production assessment.


Introduction
According to data released by China's National Bureau of Statistics, the planting area of maize reached 43.324 million hectares in 2021, an increase of 2.059 million hectares compared with 2020. The total output of maize reached 272 million tons, making it the most productive of China's major crops. Variation in maize yield has an important impact on national food security and agricultural economic development. However, crop lodging is one of the major factors that reduce maize output. It is defined as the displacement of the above-ground stems from their upright position or the failure of root-soil attachment [1]. Lodging is generally caused by rainstorms, loose soil, high planting density and unreasonable fertilization [2][3][4]. Lodging hinders the growth of maize [5], reduces grain quality [6] and affects mechanized harvesting [7], and it is becoming an important constraint on increasing maize yield [8]. Therefore, precise and efficient classification of maize lodging extents can help agricultural departments to investigate the influence on maize growth, guide farmers to implement post-stress field management and help insurance firms to settle disputes properly [9,10].
The traditional lodging assessment methods in wide use are mainly visual inspection and manual measurement [11], which are inefficient, time-consuming and environmentally constrained [12]. Their inaccuracy and subjectivity may lead to compensation disputes between farmers and insurance companies, and they cannot meet the needs of precision agriculture. Remote sensing technology, as a new approach, has greatly promoted the development of crop lodging detection [13]. Lodging incidence in wheat, rice and barley has been detected using visible and thermal infrared images from ground-based and space-borne platforms [14][15][16][17]. In recent decades, the unmanned aerial vehicle (UAV) has been increasingly applied to lodging monitoring owing to its convenience, flexibility, low cost and high resolution [18,19]. It can obtain centimeter-level images with multiple sensors in a timely and accurate manner, which plays a powerful role in lodging detection [20]. Many studies have detected crop lodging based on a UAV system equipped with a digital camera. They discriminated lodging from non-lodging and evaluated the extents of crop lodging by analyzing color and texture features [21][22][23]. However, compared to RGB images with only three visible bands, multispectral images with red edge and infrared bands, which reflect the growth capacity of crops, can offer more information on crop lodging [12,24,25]. Both spatial and spectral information of ground targets are obtained at the same time. Therefore, the richness of lodging information differs between these two types of images, and it is worth verifying how well RGB and multispectral images discriminate lodging severity extents.
Appropriate classification methods for crop lodging extents are as important as the selection of the data source. Traditionally, machine learning algorithms such as the support vector machine (SVM) [8], decision tree [26] and nearest neighbor [5] were used to classify lodging by extracting crop morphology and spectral characteristics [21,23,27]. However, these manual approaches to feature extraction often required empirical knowledge and typically produced suboptimal results [28]. With the development of machine learning, the convolutional neural network (CNN) of deep learning has gradually become the mainstream. CNN algorithms can automatically extract image features and depict rich intrinsic information with strong nonlinear modeling ability. Xia hao et al. [29] proposed a CNN-based classification model named GL-CNN to determine the optimal growth stage of leafy vegetables. Ananda et al. [30] used the Visual Geometry Group (VGG) model to detect and classify diseases of grapes and tomatoes. CNN has been shown to be superior to traditional machine learning classification algorithms [31]. The Inception and ResNet algorithms were later proposed with better performance and can automatically extract target features from images more accurately. They have been widely used in disease detection and crop classification in intelligent agriculture [32,33]. However, there are few studies on maize lodging classification based on deep learning algorithms. The maize lodging characteristics of multiple data types need to be analyzed, and the performance difference between RGB and multispectral images needs to be compared. Previous studies have often focused on the overall classification accuracy of crop lodging, which cannot fully reflect the quality of a model. The classification performance of algorithms on subdivision categories is also worthy of attention.
The purpose of this study is to use deep learning algorithms to monitor the lodging extents of maize based on RGB or multispectral images. The lodging extents are discriminated as non-lodging, moderate lodging and severe lodging by lodging angle. The specific objectives are as follows: (1) to analyze the characteristics of obtained images with different lodging extents, (2) classify lodging extents of maize based on RGB and multispectral images through VGG-16, Inception-V3 and ResNet-50 algorithms and (3) evaluate classification performance in different lodging extents to determine the optimal algorithm.

Materials and Methods
The workflow for classifying maize lodging extents in this study is shown in Figure 1. RGB and multispectral images were acquired via UAV and were then cropped, augmented and labeled to build the datasets. The differences in each band of the RGB and multispectral images caused by maize lodging extents were analyzed. The classification results of maize lodging extents using three deep learning algorithms were compared and validated.

Study Area
The study area is located in Lishu County, Siping City, southwest Jilin Province, China (Figure 2). Its geographical coordinates are 43°02′ N-43°46′ N, 123°45′ E-124°53′ E. Lishu lies in the hinterland of the Songliao Plain and is a major grain-producing county with a maize planting area of 213,300 hectares. During the maize growth period, sunshine and precipitation are sufficient to support one harvest per year. From late August to early September 2020, strong winds and heavy rain caused crop lodging.

Data Acquisition
Maize lodging canopy images were collected with a DJI Phantom 4 Pro (DJI-Innovations, Inc., Shenzhen, China) at 12:00 (noon) on 12 September 2020. The weather was cloudless and windless. The overall weight of the UAV system is 1388 g, and the flight duration is about 30 min. In this study, the flight altitude was 30 m above the ground, and the forward and lateral overlaps were both 80%. The digital camera had three color channels (red, green and blue) with a resolution of 1 cm/pixel. The multispectral images were collected by a Parrot Sequoia camera (MicaSense, Inc., Seattle, WA, USA). It had four multispectral channels, namely green (550 nm), red (660 nm), red edge (735 nm) and near-infrared (790 nm), with a resolution of 2 cm/pixel. A global positioning system (GPS) and irradiance sensors were carried at the same time. Before and after each flight, radiometric calibration images were obtained from a calibrated reflectance panel. The field inspection was carried out after UAV data acquisition.
Lodging has a large impact on both yield and grain quality. Lodging caused a maize yield loss of approximately 0-50% at different lodging angles [3]. In general, the smaller the lodging angle, the smaller the yield loss. Lodging classification can therefore provide a basis for predicting harvest yield. According to the investigation of maize lodging in the study area, we defined three lodging extents based on crop lodging angle: non-lodging (NL) maize with a crop angle <10°, moderate lodging (ML) maize with a crop angle between 10° and 50°, and severe lodging (SL) maize with a crop angle >50° (Figure 3).
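The angle thresholds above can be written as a small lookup. The following is only an illustrative sketch: the function name `lodging_class`, the accepted range of 0 to 90 degrees and the treatment of exactly 10° and 50° as moderate lodging are assumptions, since the text does not say how boundary values were assigned.

```python
def lodging_class(angle_deg: float) -> str:
    """Map a crop lodging angle in degrees to NL, ML or SL."""
    # Assumption: valid lodging angles lie between 0 and 90 degrees.
    if not 0.0 <= angle_deg <= 90.0:
        raise ValueError("lodging angle is expected within [0, 90] degrees")
    if angle_deg < 10.0:
        return "NL"  # non-lodging: angle < 10 degrees
    if angle_deg <= 50.0:
        return "ML"  # moderate lodging: 10 to 50 degrees (inclusive by assumption)
    return "SL"      # severe lodging: angle > 50 degrees


if __name__ == "__main__":
    for angle in (5, 10, 35, 50, 72):
        print(angle, lodging_class(angle))
```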

Data Cleaning and Augmentation
RGB and multispectral images of the entire study area were generated with Agisoft Photoscan software. The RGB images were resampled to 2 cm/pixel to match the resolution of the multispectral images. Then, the images of the entire study area were cropped into small images of 300 × 300 pixels. At 2 cm/pixel, each cropped image covered 6 m × 6 m on the ground, which allowed fine-grained classification of maize lodging. Because parts of the images were not related to maize lodging, cropped images containing roads and weeds were deleted, leaving an original dataset of 1326 images. Each sample was then labeled as non-lodging, moderate lodging or severe lodging by an expert through visual interpretation (Figure 4).
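As a rough illustration of the cropping step, the sketch below lists the pixel boxes of 300 × 300 tiles and converts the tile size to ground distance at 2 cm/pixel. The helper names, the non-overlapping layout and the dropping of incomplete edge tiles are assumptions made for the example; the text does not describe how the mosaic edges were handled.

```python
def tile_boxes(width, height, tile=300):
    """Return (left, top, right, bottom) boxes of full tiles, row by row.

    Assumption: tiles do not overlap and incomplete edge tiles are dropped.
    """
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


def tile_side_m(tile_px=300, gsd_cm=2.0):
    """Ground length of one tile side in metres for a given pixel size."""
    return tile_px * gsd_cm / 100.0


if __name__ == "__main__":
    boxes = tile_boxes(900, 650)  # hypothetical mosaic size in pixels
    print(len(boxes), boxes[0])
    print(tile_side_m())          # side length in metres at 2 cm/pixel
```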

To improve the generalization ability of a model and avoid over-fitting, deep learning algorithms need abundant training images. Data augmentation can substantially improve classification accuracy [28]. Therefore, we performed data augmentation on the obtained dataset to expand the number of samples. In this study, the number of images was increased by random rotation, horizontal flipping and vertical flipping. A dataset of 5000 RGB images and 5000 multispectral images was generated by data augmentation without introducing extra labeling costs. The dataset included 1616 non-lodging samples, 1684 moderate lodging samples and 1700 severe lodging samples. The results of image augmentation, taking an RGB image as an example, are shown in Figure 5.
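The three augmentation operations can be sketched on a tiny image stored as nested lists. This is a simplified stand-in and not the authors' pipeline: rotation is restricted to multiples of 90° (the rotation angles actually used are not reported), and the function names are hypothetical.

```python
import random


def hflip(img):
    """Horizontal flip: reverse every row."""
    return [list(row[::-1]) for row in img]


def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return [list(row) for row in img[::-1]]


def rot90(img, k=1):
    """Rotate clockwise by k * 90 degrees."""
    out = [list(row) for row in img]
    for _ in range(k % 4):
        out = [list(col) for col in zip(*out[::-1])]
    return out


def augment(img, rng):
    """Apply a random rotation, then each flip with probability 0.5."""
    out = rot90(img, rng.randrange(4))
    if rng.random() < 0.5:
        out = hflip(out)
    if rng.random() < 0.5:
        out = vflip(out)
    return out


if __name__ == "__main__":
    tile = [[1, 2], [3, 4]]
    print(augment(tile, random.Random(0)))
```

Because every operation only rearranges pixels, the label of a tile is unchanged, which is why augmentation adds samples without extra labeling.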

Convolutional Neural Networks
Convolutional neural networks (CNN) have been essential to the development of deep learning. Remarkable advancements have been made on image classification [34]. CNN architecture is mainly divided into the convolution layer, pooling layer and fully connected layer (Figure 6). In the convolution layer, importance is assigned to various aspects of the whole image to establish a distinction between different objects. The weights of the convolution kernels (not directly accessible to users) are constantly updated during algorithm iterations. After the convolution, the pooling operation reduces the spatial size of the convolved features, which helps reduce the computing power required for data processing. Two pooling methods are generally used: maximum pooling and average pooling. Maximum pooling was superior in this study because it could suppress noise while reducing dimension. Convolution and pooling layers were combined to extract image features at different levels. The last layer is the fully connected layer, which identifies the extracted features and finally provides the predicted label by using a Softmax regression classifier.
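The difference between the two pooling methods can be seen on a small feature map. The sketch below uses a 2 × 2 window with stride 2, which is a common choice but an assumption here, and the feature map values are made up for illustration.

```python
def pool2x2(fmap, reduce_fn):
    """Apply a 2x2, stride-2 pooling window with the given reduction."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            reduce_fn([fmap[i][j], fmap[i][j + 1],
                       fmap[i + 1][j], fmap[i + 1][j + 1]])
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical 4x4 feature map.
    fmap = [
        [1, 3, 2, 0],
        [5, 7, 1, 1],
        [0, 2, 9, 4],
        [2, 0, 3, 8],
    ]
    print(pool2x2(fmap, max))   # keeps the strongest response per window
    print(pool2x2(fmap, mean))  # averages the responses per window
```

Both variants halve each spatial dimension; maximum pooling keeps only the strongest activation in each window, whereas average pooling blends all four.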

VGG-16
VGG-16 is a CNN algorithm proposed by the Visual Geometry Group of Oxford University [35]. It consists of thirteen convolution layers (extracting image features), five maximum pooling layers (reducing image spatial size) and three fully connected layers (classifying images into labels) (Figure 7). Compared with traditional convolutional neural networks, this algorithm uses 3 × 3 convolution kernels to replace larger ones (e.g., 5 × 5, 7 × 7). This design effectively reduces the number of model parameters and extracts the detailed features of the images more accurately. Hence, it can improve the computing speed and has good generalization performance.
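The parameter saving from small kernels can be checked with simple arithmetic. The sketch below compares stacked 3 × 3 layers with a single larger kernel covering the same receptive field (two 3 × 3 layers cover 5 × 5, three cover 7 × 7). The channel width of 64 and the omission of bias terms are assumptions chosen only for illustration.

```python
def conv_weights(kernel_h, kernel_w, channels_in, channels_out):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_h * kernel_w * channels_in * channels_out


def stacked_3x3(num_layers, channels):
    """Weights in a stack of 3x3 layers with equal input/output channels."""
    return num_layers * conv_weights(3, 3, channels, channels)


if __name__ == "__main__":
    c = 64  # hypothetical channel width
    print("two 3x3  :", stacked_3x3(2, c), "vs one 5x5:", conv_weights(5, 5, c, c))
    print("three 3x3:", stacked_3x3(3, c), "vs one 7x7:", conv_weights(7, 7, c, c))
```

In general the comparison is 18C² against 25C² weights for a 5 × 5 receptive field and 27C² against 49C² for 7 × 7, where C is the channel width.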

Inception-V3
Inception-V3 is the most representative algorithm among the Inception algorithms [36]. It uses the Inception module, which performs multiple convolution and max pooling operations in parallel to obtain a deeper feature map. Inception-V2 follows VGG net in using small convolution kernels (e.g., 1 × 1, 3 × 3) to reduce the computational cost effectively. On this basis, Inception-V3 decomposes the 3 × 3 convolution kernel into 1 × 3 and 3 × 1 convolution kernels (Figure 8). The depth and nonlinearity of the network increase, which strengthens its classification ability.
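The effect of splitting an n × n kernel into a 1 × n and an n × 1 kernel can be quantified by counting weights. This is a back-of-the-envelope sketch that assumes equal input and output channel widths and ignores biases; the function names are hypothetical.

```python
def square_kernel_weights(n, channels):
    """Weights of one n x n convolution with equal in/out channels."""
    return n * n * channels * channels


def factorized_kernel_weights(n, channels):
    """Weights of a 1 x n convolution followed by an n x 1 convolution."""
    return 2 * n * channels * channels


def savings(n):
    """Fraction of weights saved by the factorization (independent of channels)."""
    return 1.0 - (2.0 * n) / (n * n)


if __name__ == "__main__":
    c = 64  # hypothetical channel width
    print(square_kernel_weights(3, c), factorized_kernel_weights(3, c))
    print(f"saving for 3x3: {savings(3):.1%}")
```

For n = 3 the factorized pair needs 6C² weights instead of 9C², roughly one third fewer, while adding one more layer and one more activation to the network.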

ResNet-50
ResNet-50 was proposed to solve the degradation problem in neural network training, in which the performance of the algorithm decreases as the network layers deepen [37]. The residual block is the core of the ResNet network (Figure 9); it connects convolution layers across layers through skip (shortcut) connections. It can transfer the input x directly to the output as the initial result, ensuring the integrity of the information. The output is H(x) = F(x) + x, where F(x) is the residual function, which helps to transmit information to deeper layers and improve the accuracy of the algorithm.
These three algorithms were used to classify different lodging extents and test their accuracy performance. The ReLU function was used as the activation function, and a dropout layer was added to prevent the algorithms from overfitting (dropout_ratio = 0.5). The last layer (fully connected layer) was replaced by a layer with three output categories to adapt to the dataset of this study.
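The residual connection H(x) = F(x) + x described in this section can be sketched on plain vectors. In the real network F is a stack of convolution layers; here `residual_fn` is a placeholder callable supplied by the caller, so this is a toy illustration of the shortcut only.

```python
def residual_block(x, residual_fn):
    """Return H(x) = F(x) + x for a vector x and a residual function F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same shape as x for the shortcut")
    return [f + xi for f, xi in zip(fx, x)]


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]
    # If the residual branch outputs zeros, the block is an identity mapping.
    print(residual_block(x, lambda v: [0.0] * len(v)))
    # A toy residual branch that scales its input by 0.5.
    print(residual_block(x, lambda v: [0.5 * t for t in v]))
```

The first call shows why depth need not hurt: a block can fall back to passing its input through unchanged when the residual branch contributes nothing.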
To demonstrate the algorithms' validity and reliability, 70% of the samples were randomly selected without replacement as the training set, and the remaining 30% of the samples formed the test set.
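A 70/30 split without replacement can be produced by shuffling the sample indices once and cutting the list. The sketch below is a minimal version; the function name, the fixed seed and the absence of per-class stratification are assumptions, since the text does not state whether the split was stratified.

```python
import random


def split_indices(n_samples, train_frac=0.7, seed=0):
    """Shuffle indices once and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_frac)
    return indices[:cut], indices[cut:]


if __name__ == "__main__":
    train_idx, test_idx = split_indices(5000)
    print(len(train_idx), len(test_idx))
```

Because each index appears exactly once in the shuffled list, no sample can end up in both sets.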

Validation
The image classification results are evaluated by the confusion matrix, test accuracy and Kappa coefficient. The test accuracy is the ratio of the number of correctly classified samples to the total number of samples in the test set. The Kappa coefficient is a robust measure of the extent of agreement. To make the evaluation more convincing, we repeated the experiments 10 times. The test accuracy and Kappa coefficient were calculated by the following formulas and recorded as the average of the ten repetitions:

$$\text{Accuracy} = \frac{\sum_{i=1}^{n} x_{ii}}{N}$$

$$\text{Kappa} = \frac{N \sum_{i=1}^{n} x_{ii} - \sum_{i=1}^{n} \left( \sum_{j=1}^{n} x_{ij} \cdot \sum_{j=1}^{n} x_{ji} \right)}{N^{2} - \sum_{i=1}^{n} \left( \sum_{j=1}^{n} x_{ij} \cdot \sum_{j=1}^{n} x_{ji} \right)}$$

where $x_{ii}$ refers to the correctly predicted samples, $x_{ij}$ refers to the element in the i-th row and j-th column of the confusion matrix, n is the number of classes and N is the total number of samples in the test set.
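Both metrics in this section can be computed directly from a confusion matrix. The sketch below uses the standard definition of Cohen's Kappa in terms of row and column totals; the example matrix is invented for illustration and is not taken from the study's results.

```python
def accuracy_and_kappa(cm):
    """Return (accuracy, kappa) for a square confusion matrix given as lists."""
    n = len(cm)
    total = sum(sum(row) for row in cm)                 # N
    correct = sum(cm[i][i] for i in range(n))           # sum of x_ii
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    chance = sum(r * c for r, c in zip(row_sums, col_sums))
    accuracy = correct / total
    kappa = (total * correct - chance) / (total ** 2 - chance)
    return accuracy, kappa


if __name__ == "__main__":
    # Hypothetical 3-class matrix (rows: true NL, ML, SL; columns: predicted).
    cm = [
        [50, 0, 0],
        [0, 40, 10],
        [0, 5, 45],
    ]
    acc, kappa = accuracy_and_kappa(cm)
    print(f"accuracy = {acc:.4f}, kappa = {kappa:.4f}")
```

Kappa is lower than accuracy here because it discounts the agreement that would be expected by chance from the row and column totals alone.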

Research Images Analysis
To better understand the lodging features in the different types of images, all samples were used to observe how the characteristics of the maize canopy varied across lodging extents. The intensity values of the RGB images and the reflectance of the multispectral images were extracted directly with the statistics function of the ENVI 5.3 software.

RGB Images Analysis
RGB images contain intensity values in the red, green and blue color channels ranging from 0 to 255. Different intensity values of the three channels combine into different colors. The means and standard deviations of the three channels for the different lodging extents are shown in Figure 10. The intensity values of lodged maize (moderate lodging and severe lodging) were all significantly higher than those of non-lodging maize in the three bands, but those of moderate and severe lodging were relatively close. In the non-lodging area, there were gaps between maize plants along with shadows, and the soil was exposed to the aerial camera, which made the intensity values low. After lodging, the plants tilted and piled on each other, covering the soil. The intensity values increased as soil bareness decreased and plant density increased. Meanwhile, the changes of intensity values with lodging extent were consistent across the three bands; the values were lowest in the blue band and highest in the green band. In Table 1, compared with the intensity values of non-lodging maize in the three bands, those of moderate lodging increased by 37.64%, 21.68% and 27.73%, and those of severe lodging increased by 53.81%, 32.89% and 40.81%, respectively. This showed that the intensity values increased rapidly after lodging, and the increase rate of the values in the blue band was the highest.

Multispectral Images Analysis
Multispectral images show the reflectance in the green, red, red edge and near-infrared bands for the different lodging extents. The reflectance ranges from 0 to 1. The means and standard deviations of the four channels for the different lodging extents are shown in Figure 11. The spectral reflectance increased with the severity of lodging in all four bands. The reason was that lodging changed the morphological structure of the maize population. The original maize canopy was damaged and the stems were exposed. As the severity of maize lodging increased, more stems were exposed in the aerial images taken by the UAV, and the reflectance of the leaf was lower than that of the stem [24]. In Table 2, the reflectance of the red edge and near-infrared bands was significantly higher than that of the green and red bands. Compared with the reflectance of non-lodging maize in the four bands, that of moderate lodging increased by 6.45%, 12.50%, 19.51% and 13.20%, and that of severe lodging increased by 19.35%, 25.00%, 36.58% and 20.75%, respectively. This indicated that, across lodging extents, the variation of reflectance in the red edge band was more evident than that in the visible bands, and the increase rate of reflectance was also the largest in the red edge band.

Figure 11. Spectral reflectance of four bands under different lodging extents.

Lodging Classification Using RGB Images
VGG-16, Inception-V3 and ResNet-50 have pre-trained CNN models for RGB images. Their weight parameters were trained on a huge number of RGB images from the ImageNet Dataset (http://image-net.org/index, accessed on 4 March 2022). The transfer-learning method shares model features through parameter transfer. Therefore, the backbone parameters of the three CNN algorithms were initialized using the pre-trained weights, which could save training time and yield accurate results. The PyTorch framework with Python 3.6 was used to support all experiments, and a GTX 1070 6G GPU was employed to accelerate the overall process. The learning rate, batch size and number of iterations of the three algorithms were set to 0.0001, 20 and 100, respectively. The networks were trained with the Adam optimizer and the cross-entropy loss function.
The changes of classification accuracy and loss of the three algorithms during 100 iterations are shown in Figure 12. Owing to the use of pre-trained models, the initial training accuracies were all above 0.6. With the continuous optimization of the algorithms, the classification accuracy improved rapidly. Eventually, the training accuracies of VGG-16, Inception-V3 and ResNet-50 reached 86.16%, 91.89% and 94.16%, respectively. The cross-entropy loss gradually decreased, following the opposite overall trend to the accuracy curves. Both curves began to stabilize after approximately 20 iterations. The convergence of the ResNet-50 algorithm was clearly faster than that of the other two algorithms. In Table 3, the test accuracies of the three algorithms were 83.55%, 87.32% and 90.08% with Kappa coefficients of 0.7421, 0.8040 and 0.8599, respectively. No overfitting was observed in the training process. ResNet-50 obtained the best performance of the three algorithms; its test accuracy was 7.81% and 3.16% higher in relative terms (6.53 and 2.76 percentage points) than that of VGG-16 and Inception-V3, respectively. However, in addition to the overall classification accuracies of the three algorithms, the classification performance on the different categories was also worth further analysis.
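The reported improvements of ResNet-50 over the other two algorithms are relative gains rather than differences in percentage points, which can be checked from the Table 3 accuracies quoted above. The helper name below is hypothetical; the computed values agree with the reported 7.81% and 3.16% when cut to two decimals.

```python
def relative_gain(new_acc, base_acc):
    """Relative improvement of new_acc over base_acc, in percent."""
    return (new_acc - base_acc) / base_acc * 100.0


if __name__ == "__main__":
    resnet, vgg, inception = 90.08, 83.55, 87.32  # RGB test accuracies (%)
    print(f"vs VGG-16      : {resnet - vgg:.2f} points, "
          f"{relative_gain(resnet, vgg):.3f}% relative")
    print(f"vs Inception-V3: {resnet - inception:.2f} points, "
          f"{relative_gain(resnet, inception):.3f}% relative")
```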
The confusion matrices of the three algorithms are shown in Figure 13. They indicate that the performance of the three algorithms varied with lodging severity. Non-lodging was identified well by all three algorithms, with classification accuracies above 90%. The accuracies of Inception-V3 and ResNet-50 for moderate lodging were more than 10% higher than that of VGG-16. The three algorithms showed no distinct differences for severe lodging. However, the classification error between moderate lodging and severe lodging was high for all three algorithms, at roughly 10% or more, especially for VGG-16, which made it difficult to identify the lodging subdivisions effectively.
Figure 13. The confusion matrices of the three algorithms through RGB images. Types of lodging extents: NL is non-lodging, ML is moderate lodging, SL is severe lodging.
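The per-category accuracies and the moderate/severe confusion discussed above can be read off a confusion matrix as sketched below. The counts are hypothetical and are not the values in Figure 13; rows are assumed to hold the true class and columns the predicted class.

```python
LABELS = ["NL", "ML", "SL"]

def per_class_accuracy(matrix):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

def confusion_rate(matrix, a, b):
    """Share of samples from classes a and b that were assigned to the other class."""
    mixed = matrix[a][b] + matrix[b][a]
    return mixed / (sum(matrix[a]) + sum(matrix[b]))

# Hypothetical counts (rows: true class, columns: predicted class).
cm = [
    [95, 3, 2],    # true NL
    [4, 82, 14],   # true ML
    [1, 12, 87],   # true SL
]

accuracies = per_class_accuracy(cm)
ml_sl_confusion = confusion_rate(cm, 1, 2)
for label, acc in zip(LABELS, accuracies):
    print(f"{label}: {acc:.2%}")
print(f"ML/SL confusion: {ml_sl_confusion:.2%}")
```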

Lodging Classification Using Multispectral Images
For multispectral images, the backbone parameters of the three algorithms had to be randomly initialized by the Xavier initialization method [38] so that the models could be retrained. The last layer (fully connected layer) was again replaced by one with three classification categories. The software environment and hyperparameter settings were the same as those used for the RGB images.
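As an illustration of the initialization step, the sketch below draws weights for a three-category output layer with the uniform variant of Xavier initialization, in which each weight is sampled from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The choice of the uniform variant, the input width of 2048 and the random seed are assumptions made for the example and are not stated in this study.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_out x fan_in weight matrix from U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

rng = random.Random(0)  # fixed seed so the example is repeatable (assumption)

# Hypothetical fully connected head: 2048 input features, 3 lodging categories.
fan_in, fan_out = 2048, 3
head = xavier_uniform(fan_in, fan_out, rng)
limit = math.sqrt(6.0 / (fan_in + fan_out))
print(f"weight matrix: {len(head)} x {len(head[0])}, bound: {limit:.5f}")
```

Scaling the bound by the layer widths keeps the variance of the activations roughly constant from layer to layer, which helps when no pre-trained weights are available for the input bands.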
The changes in classification accuracy and loss of the three algorithms over 100 iterations using multispectral images are shown in Figure 14. In the early stage of optimization, the accuracy and loss curves oscillated because the algorithms were rapidly adjusting their parameters to meet the classification requirements. The training accuracies of the three algorithms then increased gradually and converged after 60 iterations at 92.34%, 94.70% and 98.55%, respectively. As optimization continued, the loss decreased quickly, and ResNet-50 was the first to converge. As listed in Table 4, the test accuracies of the three algorithms were 89.91%, 92.36% and 96.32%, with Kappa coefficients of 0.8318, 0.8935 and 0.9551, respectively. Again, no overfitting occurred during training. The test accuracy of ResNet-50 was 7.12% and 4.28% higher (as relative increases) than those of VGG-16 and Inception-V3, respectively.
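The gains quoted for ResNet-50 appear to be relative increases rather than differences in percentage points. The short check below, using the reported test accuracies, reproduces them to within about 0.01; the helper name `relative_gain` is introduced only for this example.

```python
def relative_gain(new, baseline):
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

# Test accuracies (%) reported for multispectral images.
vgg16, inception_v3, resnet50 = 89.91, 92.36, 96.32

gain_vs_vgg = relative_gain(resnet50, vgg16)
gain_vs_inception = relative_gain(resnet50, inception_v3)

print(f"ResNet-50 vs VGG-16: {gain_vs_vgg:.2f}% relative, "
      f"{resnet50 - vgg16:.2f} percentage points")
print(f"ResNet-50 vs Inception-V3: {gain_vs_inception:.2f}% relative, "
      f"{resnet50 - inception_v3:.2f} percentage points")
```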

The confusion matrices of the three algorithms through multispectral images are shown in Figure 15. The three algorithms still performed well in the classification of non-lodging, with accuracies higher than 92%. Compared with the RGB images, the classification of moderate lodging and severe lodging was significantly improved by multispectral images, and the accuracies of Inception-V3 and ResNet-50 were more than 90%. The classification error of ResNet-50 between moderate lodging and severe lodging was less than 5%, so it could better classify the three extents of maize lodging.
Figure 15. Confusion matrix for the three algorithms through multispectral images. NL is non-lodging, ML is moderate lodging, SL is severe lodging.
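For reference, overall accuracy and the Kappa coefficient can both be derived from a confusion matrix, as in the following sketch. It uses the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance; the counts are hypothetical and are not the values in Figure 15.

```python
def overall_accuracy(matrix):
    """Observed agreement p_o: correctly classified samples over all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def cohen_kappa(matrix):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement: product of matching row and column marginals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical counts (rows: true NL, ML, SL; columns: predicted NL, ML, SL).
cm = [
    [98, 1, 1],
    [2, 95, 3],
    [0, 4, 96],
]

accuracy = overall_accuracy(cm)
kappa = cohen_kappa(cm)
print(f"overall accuracy: {accuracy:.4f}, kappa: {kappa:.4f}")
```

Because Kappa discounts the agreement expected by chance, it is lower than the overall accuracy for the same matrix, as is also the case for the accuracy and Kappa pairs reported in Tables 3 and 4.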

Classification Results
In this study, the experimental results indicated that, for the classification of different maize lodging extents, the overall performance of the three deep learning algorithms was better with multispectral images than with RGB images, with increases of 6.42%, 5.77% and 6.93% (Figure 16). Maize lodging classification based on RGB images achieved high accuracy for non-lodging with all three algorithms, which makes RGB images suitable for the binary classification of lodging and non-lodging. Among the three deep learning algorithms, ResNet-50 was efficient and robust in classifying the different lodging extents, with the fastest convergence rate and the highest classification accuracy during training. ResNet-50 also showed the largest improvement in classification accuracy from RGB to multispectral images, indicating that it could extract the lodging features more effectively. Therefore, ResNet-50 was the optimal algorithm for classifying maize lodging extents.


Discussion
Lodging is a major cause of crop yield loss worldwide. Accurate classification of lodging extents benefits crop production monitoring and sound decision-making, and the timely and effective acquisition of experimental data plays a crucial role in it. Some researchers have used satellite data to study crop lodging [14,39]. However, satellite imagery is susceptible to clouds, has a long revisit time and offers low spatial resolution. With the development of UAV technology, remote sensing research based on UAV platforms has been highly valued and has become a hotspot [40]. The wide application of UAVs has indeed facilitated the monitoring of crop lodging. Tan et al. [23] used RGB images to grade lodging severity with an accuracy of 79.1%. Sun et al. [25] detected maize lodging from multispectral images using maximum likelihood classification (MLC), with an overall accuracy of 86.61% and a Kappa coefficient of 0.8327. Furthermore, by applying machine learning methods such as nearest neighborhood classification and Support Vector Machine (SVM), Chauhan et al. [24] and Rajapaksa et al. [41] reported wheat lodging classification using multispectral images with accuracies of 90% and 92.6%, respectively. Multispectral images thus had more potential for exploring the characteristics of crop lodging. Lodging feature extraction is also of great significance for the classification result. In the above studies, canopy texture, crop height, spectral reflectance and vegetation indices were extracted separately for analysis. This extraction process was both time-consuming and subjective, and the features extracted differed between crops, which created difficulties for further research on crop lodging.
We further realized maize lodging classification based on deep learning algorithms. Deep learning algorithms can automatically extract intrinsic features from massive data through supervised learning to classify different lodging extents. Among the three deep learning algorithms in this study, ResNet-50 performed best, with a test accuracy of 96.32% and a Kappa coefficient of 0.9551, which was significantly better than traditional machine learning algorithms. Regarding the type of images, although lodging classification using multispectral images was more accurate, the low acquisition cost of RGB images and their test accuracy of more than 80% make them more practical for smallholders to detect crop lodging. The transfer-learning method can greatly shorten the training time of the models, which facilitates more timely agricultural disaster assessment and management. In addition, in the multispectral images, the reflectance variation in the red edge band was more evident than that in the visible bands as lodging severity increased, which may be an important reason for the better lodging classification obtained with multispectral images. Using the red edge band to extract sensitive features for classifying lodging extents is worth further study.
There are still some deficiencies that need to be addressed. We divided the experimental plots into three lodging extents; a finer classification of lodging extents is necessary to meet the requirements of precision agriculture. Moreover, the models presented in this study need to be tested and validated on lodging classification for other crops. Resolving these issues can serve crop yield prediction and precise agricultural insurance claims.

Conclusions
In this study, unmanned aerial vehicles (UAVs) provided convenience for multiple types of data acquisition. The RGB and multispectral images of maize lodging canopy were tested to classify different lodging extents. The images were preprocessed by cropping, cleaning and enhancing to generate the dataset containing 5000 subimages. The experimental results indicated that the spectral reflectance increased with the increase of lodging severity on the multispectral images of maize lodging. The red edge band was the most sensitive to the change of lodging severity extents. The classification performance of the three algorithms using RGB images, although good for non-lodging with over 90% accuracy, was unsatisfactory for moderate and severe lodging. The test accuracies of VGG-16, Inception-V3 and ResNet-50 were 89.91%, 92.36% and 96.32% with Kappa coefficients of