Maize Nitrogen Grading Estimation Method Based on UAV Images and an Improved ShuffleNet Network

Abstract: Maize is a vital crop in China for both food and industry. The nitrogen content plays a crucial role in its growth and yield. Previous researchers have conducted numerous studies on the nitrogen content of single maize plants from a regression perspective; however, the partition management techniques of precision agriculture require plants to be divided by zones and classes. Therefore, in this study, the focus is shifted to the problems of plot classification and graded nitrogen estimation in maize plots, performed using various machine learning and deep learning methods.


Introduction
Maize is an important food crop and industrial raw material in China, and maintaining stable maize yields plays a vital role in national food security [1]. The nitrogen content is an important factor affecting maize growth, and insufficient nitrogen can significantly reduce the number of grains per spike [2]. Within a certain range of fertilizer applications, the number of grains per spike increases with the amount of nitrogen fertilizer applied. However, excessive nitrogen fertilization has little effect on the number of grains per spike [3]. Excessive fertilizer application can cause various problems, including increased costs, wasted resources, plant lodging, and environmental pollution. Therefore, nitrogen estimation for maize during the growing period is the basis for proper nitrogen fertilizer application, which helps improve maize yield and fertilizer utilization rates and avoids the soil, air, and water pollution caused by blind fertilizer application. Such technology can thus be applied to crop growth monitoring and nitrogen fertilizer management.
Hence, it has significant economic, social, and ecological benefits.
The traditional method for estimating the nitrogen content of maize is to indirectly assess the nitrogen status of crop leaves through soil and plant analyzer development (SPAD) measurements with the SPAD-502 chlorophyll meter. Many research results have been reported on applying chlorophyll meters to nitrogen deficit and nitrogen requirement prediction, crop growth evaluation, and water and fertilizer management measures in rice, maize, sorghum, and spinach crops [4][5][6]. Studies have shown that the SPAD readings of a chlorophyll meter at different fertility periods can indirectly reflect the chlorophyll content of crop leaves and the total plant nitrogen content, and can further guide the follow-up application of nitrogen fertilizer. However, for large farmland areas, chlorophyll meter determination consumes a great deal of labor and material resources and has several drawbacks. Therefore, there is an urgent need to develop a high-throughput, real-time nitrogen estimation method for agricultural fields.
With the rapid development of information technology and the remarkable improvement of the agricultural information level, methods for estimating crop nitrogen contents based on remote sensing technology have gradually emerged in recent years. Vigneau et al. used a tractor carrying a HySpex VNIR1600-160 (Norsk Elektro Optikk, Norway) hyperspectral camera to scan the wheat canopy and obtain spectral data from 400 to 1000 nm, establishing a quantitative model that estimated the canopy leaf nitrogen content with a coefficient of determination (R2) of 0.889 [7]. Tao et al. used a power exponential relationship model and achieved good predictions of the nitrogen content in wheat leaves, with a correlation coefficient of 0.67, which reached a highly significant level [8]. The development of UAV-based remote sensing systems has taken remote sensing and precision agriculture further. The use of UAVs for crop monitoring offers great possibilities for obtaining field data in a simple, fast, and cost-effective way compared to previous methods [9]. Liu et al. used a UHD185 hyperspectral spectrometer (450-950 nm) carried by a UAV to obtain hyperspectral images of wheat at the jointing, heading, flowering, and grain filling stages, and used the sensitive bands obtained via a correlation analysis to establish a multiple regression model and a BP neural network model, which could better estimate canopy leaf nitrogen contents. The model's R2 value reached 0.948 [10]. In addition to UAVs, the use of spaceborne data is gradually appearing on the horizon. Delloye et al. used Sentinel-2 and SPOT satellite data to estimate the canopy nitrogen threshold via the inversion of wheat canopy chlorophyll contents using artificial neural networks and other algorithms, which provided a scientific basis for rapid decision-making based on crop nitrogen fertilization requirements [11].
From the above research results, it can be seen that the data type for nitrogen detection has gradually shifted from mainly hyperspectral to multispectral data. With the development of remote sensing technology, the platforms for spectral acquisition have mainly moved to near-ground, airborne, and satellite-based systems. With the rapid development of artificial intelligence technology in recent years, smart agriculture offers an effective solution to today's agricultural sustainability challenges. For crop selection, crop management, and crop production prediction, artificial intelligence technologies have far-reaching implications [12]. The methods for nitrogen detection have also evolved from linear models to more complex mathematical models such as partial least squares regression (PLSR), support vector machines (SVM), back-propagation neural networks, and genetic algorithms (GA) for the modeling and estimation of the nitrogen content. However, the prevailing methods for estimating crop nitrogen contents aim to obtain the precise nitrogen content of a single plant; there is a relative gap in the study of plot-based nitrogen grading tasks. Precision agriculture has recently been a hot field of agricultural science research. The core of precision agriculture is the precise management of soil nutrients, and zoning management is the main means of achieving this. Specifically, zoning management treats areas with similar production potential, similar nutrient utilization, and similar environmental effects as a management unit [13]. The fertilizer dosage should be adjusted according to the soil nutrient status and crop nutrient demands of different management units to improve the soil production potential, increase the nutrient utilization rate, reduce environmental pollution, and improve crop yield and quality [14]. Scientific and reasonable management zoning can guide farmers' field water and fertilizer management and provide an economical and effective means of accurately managing farmland nutrients. Therefore, the perspective of maize nitrogen estimation was changed to a plot classification problem so that it could meet the needs of zoning management in precision agriculture.
In recent years, deep learning has also made significant contributions in the field of agriculture. Gulzar presented a fruit image classification model that leveraged deep transfer learning and the MobileNetV2 architecture. The model enabled efficient and accurate fruit classification by utilizing pretrained deep learning models in the agricultural domain [15]. Mamat et al. improved the image annotation technique for fruit classification [16]. By employing deep learning algorithms, this method achieved more precise identification and annotation of agricultural product images, enhancing the effectiveness of fruit classification. Aggarwal et al. employed a stacked ensemble approach based on artificial intelligence to predict protein subcellular localization in confocal microscopy images [17]. This method offered highly accurate predictions of protein localization, serving as a valuable tool for protein analyses in agricultural research. Dhiman et al. presented a comprehensive review of image acquisition, preprocessing, and classification methods for detecting citrus fruit diseases [18]. Through an in-depth analysis of the existing literature, they summarized image capture, preprocessing, and classification techniques, providing important guidance for disease detection and management in agriculture. In addition, Ünal et al. provided an overview of smart agriculture practices specifically in potato production, which contributed to the development of intelligent solutions for potato production [19].
In this research, machine learning (ML) and deep learning (DL) methods were used to train models on plot-specific UAV images, save the models, and test them on the test set to output the nitrogen ratings of maize plots. The performances of the ML and DL methods were compared on the test set, and the relatively optimal ShuffleNet network model was improved with large-farmland embedded device application scenarios in mind. The improved ShuffleNet network model performs well and provides a double guarantee of accuracy and efficiency for the nitrogen grading problem in maize plots.
The main contributions of this paper are as follows:
• Methods are proposed based on the most recent advances in machine learning and deep learning approaches, as these methods have proven to be remarkably accurate and effective. They can be advantageously utilized in the field of crop management support through innovative precision agriculture approaches;
• The perspective on maize nitrogen estimation is shifted to the problem of plot classification to better align with the requirements of zoning management in precision agriculture.
The rest of the paper is organized as follows. Section 2 introduces the collection and preparation process for the dataset, the selection of the experimental methods, and the model improvements. Section 3 presents a comparison and analysis of the experimental results. Finally, Section 4 concludes with a summary of the findings.

Materials and Methods
UAVs represent a low-cost alternative inspection and data analysis technology mainly used for monitoring and spraying in precision agriculture. Maturing artificial intelligence technology also has far-reaching implications for crop management and detection in precision agriculture. In this study, the two techniques were combined: maize images were collected from maize test plots, and ML and DL models were adopted to derive the maize nitrogen levels in each plot. The specific flow chart is shown in Figure 1. First, UAV images of maize farmland were acquired. Afterwards, we constructed the datasets and selected the models. The ML dataset was generated via the feature extraction of UAV images, actual measurements of maize SPAD values, and labeling, and the DL dataset was generated using RGB images from the UAV images, which corresponded to the labels one by one. The ML and DL methods were selected for classifying the maize nitrogen levels. Finally, a performance evaluation was performed for comparison and improvement between models.


Data Collection
The data for the study were obtained from the experimental maize field of the Jilin Academy of Agricultural Sciences in Gongzhuling, Jilin Province, China. The experimental maize field is located at 124°82′ E, 43°52′ N, at an altitude of 207 m above sea level. Plots were used as the minimum counting units to meet the needs of land zoning management. The trial field was divided into three large areas: the common hybrid trial area (7 rows and 96 columns, totaling 180 plots), the high-nitrogen hybrid trial area (3 rows and 60 columns, totaling 45 plots), and the high-nitrogen self-incompatible line trial area (6 rows and 60 columns, totaling 120 plots). Each area of the trial field was set up with one protection row at the top and bottom and two protection columns at the left and right. The experiment started on 8 May 2022, with 100 pounds of compound fertilizer applied per acre during maize growth, and the images were collected after 3 months of growth.
The Genie4RTK UAV, manufactured by DJI Innovation Technology, Shenzhen, China, as depicted in Figure 2a, was employed for intermittent aerial photography at an altitude of 30 m on 25 August 2022. Throughout the UAV's flight, it remained perpendicular to the ground, ensuring minimal image distortion when capturing images of the various plots. The weather conditions on the day of image acquisition were sunny, with light winds. The UAV operation commenced at 12:14, precisely during the peak solar elevation of the day, facilitating a uniform distribution of sunlight across the experimental field. Consequently, during the vertical image acquisition process, no shadows were observed across the different plots. Additionally, the UAV's preset flight path was set at a 31° angle relative to the experimental field to facilitate the creation of a UAV panoramic view of the cornfield using DJI Terra software. DJI Terra was used for stitching to generate six panoramas of UAV images of the test field (containing RGB images and five remote sensing parameters stored in TIFF format, including the LCI, GNDVI, NDRE, NDVI, and OSAVI). The RGB panorama of the test field is shown in Figure 2b.
To obtain the true label of maize nitrogen, the actual nitrogen content of the maize is also needed. The SPAD value is a relative measure of the leaf chlorophyll content, and studies have shown that leaf SPAD values are significantly and positively correlated with the total nitrogen content at different fertility stages, meaning that the SPAD value can be used to reflect the actual nitrogen content and nitrogen nutrition status of the crop [20]. A SPAD-502 chlorophyll meter was used to measure the SPAD values of each column of maize, and the mean of the measured SPAD values of the maize in each plot was then taken as the average SPAD, which represents the nitrogen status of the plot.
A total of 294 plots were selected for nitrogen measurements, consisting of 135 plots in the common hybrid trial area, 45 plots in the high-nitrogen hybrid trial area, and 114 plots in the high-nitrogen self-incompatible line trial area. Each plot was treated as an experimental subject, resulting in a total of 294 experimental subjects for this experiment. To capture the necessary data, we obtained UAV images of each experimental subject by cropping the panorama of the experimental field. Additionally, we measured the leaf SPAD of each maize plant within the plot to obtain the average SPAD value for each experimental subject.

Image Preprocessing
In the real world, collected images are often incomplete, inconsistent, and highly susceptible to noise (in this paper, we mainly address the problem of data inconsistency). Image preprocessing is necessary to facilitate the analysis and improve the accuracy [21]. The UAV panorama of the test field was generated after reading the files in DJI Terra. To obtain the UAV images of each cell in the test field, it is necessary to perform image segmentation on the panorama of the test field. Since each category within the UAV image is the same size, the RGB image was used as the base for image segmentation. The same segmentation method was then applied to the other categories to complete all segmentation tasks. The specific RGB image segmentation steps are as follows: (1) Angle correction: Since the experimental maize field is tilted 31° to the left in the RGB panorama, a manual angle correction is performed to square the experimental maize field in the visual field by rotating the RGB panorama clockwise by 31°; (2) Grayscale: The RGB image is a three-channel image, comprising R, G, and B channels.
To make the subsequent line segmentation easier, we begin by converting the RGB image to grayscale using Formula (1):

Gray = 0.299 × R + 0.587 × G + 0.114 × B (1)

(3) Remove protected rows and columns: The protected rows and columns (mentioned in Section 2.1) are not experimental objects; hence, they are manually cropped out of the RGB panorama. To obtain the UAV images of each cell, the segmentation of the RGB images was used to guide the creation of the remaining category images. In total, 294 × 6 images were obtained, which were stored in TIFF format.
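As an illustrative sketch (not the authors' implementation), the grayscale conversion of step (2) can be expressed with NumPy using the standard luminance weights of Formula (1); the toy image below is an assumption for demonstration:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array to grayscale with the standard
    luminance weights: Gray = 0.299*R + 0.587*G + 0.114*B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

# Toy 2 x 2 "panorama" patch: red, green, blue, and white pixels.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=float)
gray = to_grayscale(img)  # shape (2, 2); the white pixel maps to ~255
```

The single-channel result can then be thresholded to locate the maize rows for line segmentation.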

Data Preparation

2.3.1. Machine Learning Dataset Preparation
Since only the average SPAD per maize plot was obtained, the data needed to be converted into categorical information. Therefore, a K-means clustering analysis based on the obtained average SPAD of each plot was performed after removing outliers [22]. We set K = 3 (dividing the data into low-, medium-, and high-nitrogen intervals) to obtain three cluster centers and the category label of each maize plot; the result of the clustering analysis is shown in Figure 4. The ranges of plant nitrogen content (PNC) in our maize plots are 0.8-1.5% for low nitrogen, 1.5-2.5% for medium nitrogen, and over 2.5% for high nitrogen. The numbers of low-, medium-, and high-nitrogen samples in the dataset are 154, 94, and 46, respectively. Finally, each UAV image is matched to its corresponding label to obtain the dataset required for ML.
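A minimal one-dimensional K-means sketch of this labeling step (the study's actual clustering followed [22]; the SPAD readings below are synthetic, illustrative values):

```python
import numpy as np

def kmeans_1d(values, k=3, iters=100):
    """Minimal 1-D K-means: group per-plot average SPAD values into
    k nitrogen classes (low / medium / high)."""
    centers = np.quantile(values, np.linspace(0.1, 0.9, k))  # spread-out init
    for _ in range(iters):
        # Assign each plot to its nearest cluster center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Recompute each center as the mean of its assigned plots.
        new = np.array([values[labels == j].mean() if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Synthetic SPAD readings around three illustrative nitrogen levels.
rng = np.random.default_rng(1)
spad = np.concatenate([rng.normal(m, 1.0, 30) for m in (35.0, 45.0, 55.0)])
labels, centers = kmeans_1d(spad, k=3)
```

Because the initial centers are ordered quantiles, label 0 corresponds to the lowest-SPAD (low-nitrogen) cluster and label 2 to the highest.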


Deep Learning Dataset Preparation
Due to the limited volume of our dataset, which consists of only 294 RGB images, it failed to meet the high data demands of neural networks. Additionally, there was an imbalance between classes. Data augmentation was performed on the segmented RGB image dataset to address these challenges. Data augmentation techniques can effectively reduce the risk of overfitting and improve the accuracy and robustness of DL models [23]. In this experiment, enhancement techniques such as rotation, mirroring, Gaussian noise, luminance adjustment, and Gaussian blur were selected to expand the DL dataset from 294 to 1933 images, of which 640, 721, and 552 were low-, medium-, and high-nitrogen images, respectively. Each RGB image was matched with the corresponding label obtained in Section 2.3.1, resulting in the dataset needed for our DL model.
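The augmentation step can be sketched as follows; this NumPy version covers only the rotation, mirroring, and Gaussian-noise transforms, and the noise level is an illustrative assumption:

```python
import numpy as np

def augment(img, seed=0):
    """Produce simple augmented variants of one plot image:
    three 90-degree rotations, a horizontal mirror, and Gaussian noise."""
    rng = np.random.default_rng(seed)
    variants = [np.rot90(img, k) for k in (1, 2, 3)]   # rotations
    variants.append(img[:, ::-1])                      # horizontal mirror
    noisy = img + rng.normal(0.0, 5.0, img.shape)      # additive Gaussian noise
    variants.append(np.clip(noisy, 0, 255))
    return variants

img = np.full((4, 4, 3), 128.0)   # toy RGB patch
out = augment(img)                # five variants per input image
```

Applying several such transforms per source image is how a 294-image set can be expanded several-fold while keeping the original labels.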


Feature Extraction
For the ML method, feature extraction is an important process that involves two aspects. First, there is the statistical analysis and transformation of images to extract the required features. Second, there is the transformation and operation of group measurements of a pattern to highlight its representative features [24]. In this paper, feature extraction involves extracting, combining, and transforming the channel parameters to generate a new feature subset based on the ML dataset. Figure 5 shows the flow of the feature extraction, and Table 1 presents the features selected in this paper.
Traditional nitrogen detection methodologies often relied on single-feature analyses, which lacked the capacity to capture the multidimensional intricacies inherent in the datasets. To overcome this limitation, our research focuses on an integrated approach, where RGB, spectral index, and texture features are synergistically employed to unearth novel insights into nitrogen detection.
RGB data derived from images captured by remote sensing devices present distinct advantages for nitrogen detection [25]. RGB data inherently encapsulate information about leaf coloration, which is influenced by the chlorophyll content. By incorporating RGB features, our model effectively captures the spatial distribution and structural variations within the vegetation, enabling the better differentiation of nitrogen-rich and nitrogen-deficient regions.
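As an illustrative sketch (an assumption, not the paper's exact feature set), simple plot-level colour statistics can be derived from an RGB patch like so:

```python
import numpy as np

def rgb_features(img):
    """Plot-level colour features: per-channel means plus the green
    ratio G / (R + G + B), a crude proxy for leaf greenness."""
    means = img.reshape(-1, 3).mean(axis=0)
    return {"mean_r": float(means[0]), "mean_g": float(means[1]),
            "mean_b": float(means[2]),
            "green_ratio": float(means[1] / means.sum())}

patch = np.zeros((4, 4, 3))
patch[..., 1] = 120.0             # a purely green toy patch
feats = rgb_features(patch)
```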

Spectral indices are widely recognized for their sensitivity to specific biochemical and biophysical properties of vegetation [26]. Indices such as the normalized difference vegetation index (NDVI) serve as proxies for vegetation health and photosynthetic activity, both closely associated with nitrogen availability. Leveraging spectral indices in our feature selection process affords the ability to gauge plant vitality and consequently infer the nitrogen content.
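For instance, the NDVI is computed per pixel from the near-infrared and red bands; a NumPy sketch with illustrative reflectance values:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Per-pixel normalized difference vegetation index:
    NDVI = (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red + eps)

# Illustrative reflectance values for a small patch.
nir = np.array([[0.60, 0.80], [0.50, 0.70]])
red = np.array([[0.10, 0.20], [0.30, 0.10]])
index = ndvi(nir, red)
mean_ndvi = float(index.mean())   # a candidate plot-level feature
```

Aggregates such as the per-plot mean index can then join the feature subset fed to the classifiers.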
Texture features, rooted in the spatial arrangement and distribution of pixel intensities, offer valuable supplementary information for nitrogen detection [27]. Texture descriptors, including but not limited to Haralick features, enable the characterization of fine-grained patterns and structures within vegetation, thereby capturing subtle variations indicative of nitrogen levels. Integrating texture features augments the discriminative capacity of our model, furnishing it with the ability to discern intricate variations that might elude single-feature analyses.
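A minimal grey-level co-occurrence (GLCM) sketch of one Haralick statistic, contrast, under a single horizontal pixel offset (the quantization level and offset are illustrative assumptions):

```python
import numpy as np

def glcm_contrast(gray, levels=8):
    """Tiny GLCM sketch: count horizontally adjacent grey-level pairs,
    normalize, and return the Haralick 'contrast' statistic."""
    q = np.clip((gray / (256 / levels)).astype(int), 0, levels - 1)  # quantize
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):            # offset (0, 1)
        glcm[i, j] += 1
    p = glcm / glcm.sum()                                            # joint probabilities
    idx = np.arange(levels)
    return float((p * (idx[:, None] - idx[None, :]) ** 2).sum())

flat = np.full((8, 8), 100.0)             # uniform canopy patch
stripes = np.tile([0.0, 255.0], (8, 4))   # alternating dark/bright columns
```

A uniform patch yields zero contrast, while strongly alternating intensities yield a large value, which is what makes the statistic useful for separating canopy textures.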

Data Preprocessing
Before feeding the features into the classifier, the data must be preprocessed to account for variations in feature magnitudes. Specifically, the data need to be converted from various specifications or distributions into a standardized, dimensionless form. The goal is to accelerate the solution, enhance the model's precision, and prevent a feature with an unusually wide range of values from disproportionately affecting distance calculations. Data normalization was used for data preprocessing, as shown in Equation (2):

x′ = (x − µ)/σ (2)

where µ is the mean of the current feature and σ is the standard deviation of the current feature.
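A sketch of Equation (2) applied column-wise to a feature matrix (the toy matrix exaggerates the magnitude gap between two features):

```python
import numpy as np

def zscore(X):
    """Column-wise z-score standardization: x' = (x - mu) / sigma,
    applied independently to every feature."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Two features with very different magnitudes, as in a mixed
# RGB / spectral-index / texture feature set.
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
Xs = zscore(X)
```

After standardization, every column has zero mean and unit variance, so no single feature dominates distance-based classifiers such as KNN or SVM.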

Support Vector Machine
A support vector machine (SVM) is a powerful machine learning algorithm for solving classification problems [29]. It separates different classes of samples by finding a dividing hyperplane in the sample space while maximizing the minimum distance from the two sets of points to this hyperplane. The edge points nearest to the hyperplane are called support vectors. SVMs can be categorized into linear SVMs and nonlinear SVMs. A linear SVM is suitable for dealing with linear problems but not nonlinear ones. For nonlinear problems, kernel functions can map the data from a low-dimensional space to a high-dimensional space, in which they can be treated as linear problems. The commonly used kernel functions are the linear, polynomial, Gaussian, Laplace, and Sigmoid kernel functions. The exceptional performance of the support vector machine in handling small-sample, nonlinear, and high-dimensional datasets is the reason for selecting it as a classifier.
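A hedged sketch using scikit-learn (an assumed dependency, not named by the source): an RBF-kernel SVM on a toy two-class problem standing in for the nitrogen-level features, showing how the Gaussian kernel handles a nonlinear boundary:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated toy classes as illustrative stand-ins for features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (40, 2)),
               rng.normal(2.0, 0.5, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # Gaussian (RBF) kernel
clf.fit(X, y)
train_acc = clf.score(X, y)
```

The kernel choice and C are hyperparameters; in practice they would be tuned on a validation split rather than fixed as here.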

K-Nearest Neighbor
The K-nearest neighbor (KNN) algorithm is based on majority voting among neighbors. It has a remarkable prediction effect, is resilient to outliers, and finds extensive use in diverse application areas; it has been applied to maize pest detection with good results [30]. The KNN algorithm operates by comparing the features of test data with those in a known training set. Specifically, when the labels of the training set are known, the algorithm finds the K most similar data points in the training set and assigns the test data the most frequent class among these K points. The learnable hyperparameter is the number of neighbors, K. The KNN algorithm works well with both small and large amounts of low-dimensional data, which is consistent with our dataset, so KNN was selected as a classifier.
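The KNN voting rule described above can be sketched in Python (a hypothetical minimal implementation using squared Euclidean distance; the study does not specify its distance metric):

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's squared distance to the query with its label.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    # Majority vote over the k closest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train_x = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["low", "low", "high", "high"]
pred = knn_predict(train_x, train_y, [0.2, 0.4], k=3)  # -> "low"
```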

Decision Tree
A decision tree (DT) classifies data by a set of rules [31], providing a rule-like description of which values will be obtained under which conditions. There are two types of DTs: classification trees, generated for discrete target variables, and regression trees, generated for continuous target variables. The generation of a DT consists of three main steps: feature selection, tree generation, and pruning. DTs are suitable for both numerical and nominal data, where the outcome variable takes values from a finite set of targets, and they can extract the rules embedded in particular columns. A DT was chosen as a classifier because the discrete features extracted in this study suit it well.
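Feature selection during tree generation is typically driven by an impurity measure. As an illustrative sketch, Gini impurity is one common splitting criterion (the study does not state which criterion was used, so this is an assumption for illustration):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_i^2) over class proportions p_i.
    0 means a pure node; higher values mean a more mixed node."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has zero impurity; an even two-class mix has impurity 0.5.
pure = gini_impurity(["low", "low", "low", "low"])     # -> 0.0
mixed = gini_impurity(["low", "high"])                  # -> 0.5
```

A candidate split is scored by the support-weighted impurity of its child nodes, and the split that reduces impurity most is selected.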

Random Forest
A random forest (RF) is an ensemble algorithm [32]. First, it randomly selects different features and training samples to generate many decision trees; it then aggregates the results of these decision trees to perform the final classification. The RF approach is widely used in practice. Compared to a single decision tree, RF shows a significant improvement in accuracy and enhances the robustness of the model, thereby reducing its susceptibility to noise and overfitting. RF was selected as one of the classifiers because of its typically superior accuracy.
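The two ingredients described above, random resampling of the training data and vote aggregation across trees, can be sketched in Python (an illustrative toy; the tree-training step itself is omitted):

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw a sample of the same size with replacement (bagging);
    each tree in the forest is trained on a different such sample."""
    return [rng.choice(data) for _ in data]

def forest_predict(tree_predictions):
    """Aggregate the individual tree votes into a final class by majority."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
sample = bootstrap_sample([("x1", "low"), ("x2", "high"), ("x3", "low")], rng)
final = forest_predict(["low", "high", "low"])  # -> "low"
```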

Logistic Regression
Logistic regression (LR) applies linear regression to classification problems. It combines individual attribute features by weighted summation, converts the fitted linear values to label probabilities using a sigmoid function, and obtains the optimal coefficients by minimizing the cross-entropy cost function. The LR model is simple, and its output values are probabilistically meaningful for the linearly correlated features constructed in this study, which is precisely why LR was selected as a classifier.
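The weighted summation and sigmoid mapping described above can be sketched as follows (illustrative; in a trained model the weights would come from minimizing the cross-entropy loss):

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lr_predict_proba(weights, bias, features):
    """Weighted sum of features mapped to a class probability via the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Zero weights give the uninformative probability 0.5;
# a large positive score saturates toward 1.
p_neutral = lr_predict_proba([0.0, 0.0], 0.0, [1.0, 1.0])  # -> 0.5
p_high = lr_predict_proba([2.0], 0.0, [3.0])
```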

Deep Learning Methods
DL is a method for learning the intrinsic patterns of sample data based on artificial neural networks, which learn abstract representations of the data hierarchically, allowing complex tasks to be processed automatically [33]. Unlike traditional ML methods, the main advantage of DL is that it can automatically learn features from data without hand-crafted feature design. In addition, DL models are highly flexible and scalable, coping with various complex tasks and data types. We experimented with using DL to rank the nitrogen levels and tested its feasibility with the classical network AlexNet. Since the target application for large farmland involves mobile or embedded devices that require high efficiency and accuracy, three lightweight networks, RegNet, ShuffleNet, and EfficientNet, were employed to achieve high accuracy and fast detection. Finally, model improvement was performed on the well-performing ShuffleNet. The implementation details are provided in the following sections.

AlexNet
AlexNet is a classical convolutional neural network model in the field of DL, which won the ImageNet image recognition competition in 2012. AlexNet mainly consists of five convolutional layers and three fully connected layers, in which the ReLU function is used as the activation function and maximum pooling is used for the pooling operation. Using GPU-accelerated training, AlexNet achieved significant performance improvements, and its Top-5 error rate was reduced to 15.3% [34]. The AlexNet network structure is shown in Figure 6.

RegNet
RegNet is a neural-network-based image classification model. Its structure is designed through a novel approach, a network design paradigm that combines the benefits of manual network design and neural architecture search. Its network structure is shown in Figure 7. RegNet's design takes inspiration from the trade-off between network depth and width. Traditional network design approaches usually improve the performance of a model by increasing the network depth or width; however, this results in higher computational complexity. RegNet enhances model performance without adding computational load by dynamically adjusting the network depth and width, as demonstrated in [35].

EfficientNet
As shown in Figure 8, EfficientNet is a convolutional neural network structure that aims to balance computational efficiency and accuracy. It uses compound scaling, which jointly scales the network's depth, width, and input resolution with a fixed set of coefficients, keeping the model efficient as it grows. The authors further improved the model's performance by searching for the best combination of hyperparameters using an auto-ML method. In addition, EfficientNet uses MBConv, a grouped convolution block that combines depthwise-separable convolution and residual connections and can improve the expressiveness and generalization of the model. Due to its efficiency and portability, EfficientNet is widely used on mobile devices and embedded systems [36].
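Compound scaling can be illustrated numerically. The sketch below uses the base coefficients α = 1.2, β = 1.1, γ = 1.15 reported in the original EfficientNet paper (whether any given variant uses exactly these values is an assumption here):

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Scale depth, width and resolution jointly by a compound coefficient phi.
    FLOPs grow roughly with depth * width^2 * resolution^2, so the base
    coefficients are chosen so that one step of phi roughly doubles FLOPs."""
    depth_mult = alpha ** phi
    width_mult = beta ** phi
    res_mult = gamma ** phi
    return depth_mult, width_mult, res_mult

d, w, r = compound_scale(1)
flops_growth = d * w * w * r * r  # ~1.92, i.e. roughly 2x per step of phi
```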

ShuffleNet
ShuffleNet is an efficient convolutional neural network structure mainly used for image classification and target detection tasks [37]. ShuffleNet is characterized by a reduced number of parameters and computational complexity while maintaining accuracy. This is achieved by employing two key techniques: group convolution and channel shuffle. Figure 9 illustrates the group convolution process used to reduce the computational cost of convolving the feature maps of the input layer. Specifically, the feature maps are partitioned into groups, each convolved with a separate set of kernels. This grouping strategy enables parallel processing and parameter sharing, significantly reducing the computation and memory requirements.
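The channel shuffle operation can be sketched on a list of channel indices (a toy illustration of the reshape–transpose–flatten trick, independent of any DL framework):

```python
def channel_shuffle(channels, groups):
    """Reshape the channel list to (groups, channels_per_group),
    transpose, and flatten, so consecutive output channels come
    from different input groups."""
    n = len(channels)
    assert n % groups == 0, "channel count must be divisible by groups"
    per_group = n // groups
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Transpose then flatten: interleave one channel from each group.
    return [grouped[g][i] for i in range(per_group) for g in range(groups)]

# Six channels in two groups: [0,1,2 | 3,4,5] -> [0,3,1,4,2,5]
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
```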

As shown in Figure 10, group convolution degrades the performance of the model when all of the input feature map information needs to be considered. To address this problem, a channel shuffle module is added between two group convolutions to permute the channel order, thereby enabling the exchange of information between different groups.
For ShuffleNet-V2, the authors presented four design guidelines: balance the input and output channel sizes of 1 × 1 convolutions, be cautious with the number of groups in group convolution, avoid network fragmentation, and reduce element-wise operations [38]. The authors analyzed the shortcomings of the ShuffleNet-V1 design and incorporated improvements following these guidelines to create ShuffleNet-V2. The overall structure of ShuffleNet-V2 is shown in Figure 11a, and the module (stage) structure of ShuffleNet-V2 is shown in Figure 11b,c. The main improvement introduced in V2 is the channel split technique, which divides the input feature map into two parts along the channel dimension. The left branch is an identity mapping, while the right branch contains three successive convolutions with identical input and output channels, conforming to guideline 1. Furthermore, in V2, the group convolution previously used for the 1 × 1 convolutions is replaced with ordinary convolution, aligning with guideline 2.
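The split–transform–concatenate–shuffle flow of a ShuffleNet-V2 basic unit can be sketched on channel ids (a toy illustration; `transform` stands in for the right branch's three convolutions, and real tensors would be shuffled along the channel axis):

```python
def shufflenet_v2_unit(channels, transform):
    """Sketch of a ShuffleNet-V2 basic unit on a list of channel ids:
    channel split -> identity left branch / transformed right branch
    -> concatenate -> channel shuffle (2 groups) to mix the branches."""
    half = len(channels) // 2
    left, right = channels[:half], channels[half:]
    merged = left + [transform(c) for c in right]
    # Channel shuffle with 2 groups so the next unit sees both branches.
    per = len(merged) // 2
    grouped = [merged[:per], merged[per:]]
    return [grouped[g][i] for i in range(per) for g in range(2)]

# Mark the right branch's channels with +10 to see the interleaving.
out = shufflenet_v2_unit([0, 1, 2, 3], lambda c: c + 10)  # -> [0, 12, 1, 13]
```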

ShuffleNet-Improvement
The model accuracy and model efficiency are a pair of contradictory indicators. ShuffleNet has achieved a very high level of model efficiency; however, its accuracy lags slightly behind, and our improvement goal is to raise the model accuracy by sacrificing some efficiency to meet the engineering requirements of large-scale farmland nitrogen detection. The specific improvements are as follows: (1) The computational distribution of ShuffleNet-V2 is shown in Table 2. The table shows that the DW convolution contributes a relatively small share of the computation cost, while the majority of the computation is concentrated in the 1 × 1 convolutions. Therefore, all 3 × 3 DW convolutions were replaced with 5 × 5 DW convolutions, which improves the model accuracy without increasing the computational cost too much. (2) In deep convolutional neural networks, selecting feature channels is crucial for improving performance. However, traditional neural networks often do not consider the correlation between channels when weighting feature channels, thereby failing to fully utilize the information between them. To address this problem, Hu et al. presented a channel attention mechanism based on the squeeze-and-excitation (SE) module, called SENet, in 2018 [39]. The channel attention mechanism of SENet adaptively learns the importance of each channel, thereby strengthening the important feature channels in the network and suppressing the insignificant ones. As shown in Figure 12, in SENet, global average pooling is first performed in the squeeze operation to obtain the global average of each channel. Afterward, the excitation operation is performed using two fully connected layers that learn the weights of each channel, mapping them to c weight coefficients. These weights are finally multiplied by each channel of the input feature map to obtain the feature map adjusted by the channel attention mechanism.
The channel attention mechanism was borrowed from SENet and applied to ShuffleNet, and the kernel size of the DW convolution was increased. The overall structure of the improved network is shown in Figure 13.
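The squeeze–excitation–scale pipeline of the SE module can be sketched in plain Python on a channels × height × width nested list (an illustrative toy; `w1` and `w2` are hypothetical fully connected weights, not values from the study):

```python
import math

def se_block(feature_maps, w1, w2):
    """Squeeze-and-excitation sketch on feature_maps[c][h][w].
    Squeeze: global average pool per channel.
    Excitation: two small FC layers (ReLU, then sigmoid) producing one
    gate per channel. Scale: multiply each channel by its gate."""
    # Squeeze: one scalar per channel.
    squeezed = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
                for fmap in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: reweight each channel by its learned importance.
    return [[[v * g for v in row] for row in fmap]
            for fmap, g in zip(feature_maps, gates)]

# Two 2x2 channels; zero excitation weights give neutral gates of 0.5.
fm = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
out = se_block(fm, w1=[[1.0, 0.0]], w2=[[0.0], [0.0]])
```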

Model Evaluation Metrics
In classification problems, evaluation metrics help us assess the performance and effectiveness of a model. Four metrics, namely the accuracy (ACC), precision (P), recall (R), and F1 score (F1), are commonly used to assess the accuracy of the model [40]. Their formulas are described in Equations (3)-(6):

ACC = (TP + TN)/(TP + TN + FP + FN) (3)
P = TP/(TP + FP) (4)
R = TP/(TP + FN) (5)
F1 = 2 × P × R/(P + R) (6)

The true positive (TP) is the number of successfully predicted positive samples. The true negative (TN) is the number of samples successfully predicted as negative. The false positive (FP) is the number of negative samples incorrectly predicted as positive. The false negative (FN) is the number of positive samples incorrectly predicted as negative.
However, this study deals with a multi-classification problem that requires considering the predictive performance for the three categories of low, medium, and high nitrogen in terms of the precision (P), recall (R), and F1 score. While recalibration is not necessary for the accuracy (ACC), it is crucial to evaluate the overall performance of the identification system. Considering the impact of the imbalanced data volume on the results, the weighted-average method was used, yielding the Weighted-P, Weighted-R, and Weighted-F1, described in Equations (7)-(10) as follows [41,42]:

W_i = N_i / Σ_j N_j (7)
Weighted-P = Σ_i W_i × P_i (8)
Weighted-R = Σ_i W_i × R_i (9)
Weighted-F1 = Σ_i W_i × F1_i (10)

where W_i is the weight of category i, i.e., the proportion of samples actually belonging to that category. Since precision and recall are equally important to us, the Weighted-F1 is selected as the more balanced and comprehensive metric to measure the accuracy of our model. In addition to the above accuracy metrics, the frames per second (FPS) and model size are used to evaluate the model's efficiency. Using Python's time function, we recorded the time just before reading an image and just after the model produced its output. The difference between these two moments gives the running time; this process is repeated 100 times to obtain an average, and the FPS is the inverse of the average running time.
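The per-class and support-weighted metrics above can be sketched as follows (an illustrative implementation; the per-class counts used in the example are hypothetical):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from one class's counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def weighted_metrics(per_class):
    """per_class: list of (support, tp, fp, fn), one entry per nitrogen class.
    Weighted-P/R/F1 average the per-class scores weighted by class support."""
    total = sum(support for support, *_ in per_class)
    wp = wr = wf1 = 0.0
    for support, tp, fp, fn in per_class:
        p, r, f1 = prf(tp, fp, fn)
        weight = support / total
        wp += weight * p
        wr += weight * r
        wf1 += weight * f1
    return wp, wr, wf1

# Hypothetical counts for the low/medium/high classes.
per_class = [(10, 8, 2, 2), (10, 8, 2, 2), (10, 8, 2, 2)]
wp, wr, wf1 = weighted_metrics(per_class)
```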

Machine Learning Results
Since testing the model on a dataset divided in one specific way may not generalize well to new data, a ten-fold cross-validation method was employed. This involves randomly dividing the dataset into ten equal parts, using nine for training and one for testing, repeating this process ten times, and averaging the results. The obtained results are shown below.
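The ten-fold split can be sketched as follows (illustrative; `seed` is an assumption for reproducibility, and a library routine such as scikit-learn's KFold would typically be used in practice):

```python
import random

def kfold_indices(n_samples, k=10, seed=42):
    """Randomly partition sample indices into k folds; each fold serves
    once as the test set while the remaining k-1 folds form the training
    set. Yields (train_indices, test_indices) pairs."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# 20 samples -> 10 folds of 2; every sample appears in exactly one test fold.
splits = list(kfold_indices(20, k=10))
```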
Figure 14 shows the performances of the different ML classifiers on this dataset. The histogram shows that the SVM performs well in terms of accuracy, with values up to 79%. However, LR surpasses the SVM in terms of the precision, recall, and F1 score, with a negligible difference in accuracy. The P, R, and F1 score were considered more representative indicators than ACC, since they account for the number of instances in each category. Based on these metrics, it is concluded that the LR classifier outperforms the other models on this dataset. In addition to the classifier's accuracy, the efficiency of the classifier and the model size are also important factors that affect performance. Figure 15 compares the FPS and model size results of the five ML classifiers. The FPS comparison shows that the five classifiers have similar performance levels. Analysis of the data shows that the small difference in FPS (ranging from 39.1 to 40.0 images per second) arises because reading the image features accounts for the majority (about 90%) of the running time, while the time spent on feature operations by the classifiers is relatively small. Therefore, the FPS levels of the different classifiers are not significantly different; in real production applications, this small difference can be ignored. The model size comparison shows that the SVM has the largest model size due to its kernel function and relatively complex computation, while the rest of the models are at the level of 4-5 kB in size.

Deep Learning Model Accuracy Comparison
Here, 80% of the dataset was used as the training set and the remaining 20% as the test set. We performed training for 100 epochs, and the results on the test set are shown in Figure 16. The figure shows that the enhanced ShuffleNet network performs well on all four evaluation metrics, namely the accuracy, precision, recall, and F1 score, while achieving the lowest test loss value. This indicates that introducing the attention mechanism and increasing the convolutional kernel size of the DW convolution effectively improve the model accuracy.

Recommendation for Machine Learning Classifiers and Deep Learning Models
Combining the performances of the five classifiers in terms of accuracy and efficiency, LR was selected as the best ML classifier for the maize nitrogen ranking problem due to its higher accuracy and smaller model size.
Considering the model accuracy, efficiency, and size together, ShuffleNet-improvement has the highest model accuracy, which is sufficient for the nitrogen classification of large farmland areas. Regarding efficiency, its detection speed of 91 images/s is not as fast as AlexNet or ShuffleNet, but it is sufficient to meet the engineering needs. In terms of model size, ShuffleNet-improvement inherits ShuffleNet's small footprint, and its size of 5.87 MB is ideal for embedded devices. ShuffleNet-improvement thus optimizes the performance of ShuffleNet while maintaining its lightweight design, resulting in an optimal model for practical engineering applications. Therefore, ShuffleNet-improvement is recommended as the best DL model.


Comparison of Machine Learning Models and Deep Learning Approaches
This section compares the results of LR, the optimal ML model, and ShuffleNet-improvement, the optimal DL model. ShuffleNet-improvement achieves the highest level of accuracy in the grade division. For ShuffleNet-improvement, the accuracy, precision, recall, and F1 score are 96.8%, 97.0%, 97.1%, and 97%, respectively. For LR, they are 77.6%, 79.4%, 77.6%, and 72.6%, respectively. Compared to LR, the accuracy, precision, recall, and F1 score are increased by 19.2%, 17.6%, 19.5%, and 24.4%, respectively. Regarding efficiency, ShuffleNet-improvement achieves 91 FPS while LR only manages 39 FPS, putting ShuffleNet-improvement ahead. The model sizes of ShuffleNet-improvement and LR are 5.87 MB and 0.19 MB, respectively. While the models trained by ShuffleNet-improvement are significantly larger than those of LR, a 5.87 MB footprint is still small for embedded systems in field applications, where software sizes typically measure several gigabytes. ShuffleNet-improvement is therefore the optimal model for maize nitrogen level classification based on UAV images.
Romualdo et al. studied maize nitrogen nutrition diagnosis using artificial vision techniques, and the global classification accuracy was 94% for the optimal classifier tested on leaves grown in four different N-concentration culture environments [43]. In comparison, our optimal classifier not only has a higher classification accuracy (96.8%) but also is not limited to detection during a specific leaf growth period. In addition, our evaluation system is more robust for the multi-classification problem of nitrogen grading: by calculating the precision, recall, and F1 score, the problem of unobjective evaluation metrics caused by unbalanced sample sizes among classes is avoided.

Research Limitations and Future Research Directions
This study aimed to detect and grade the nitrogen levels in large farmland areas to guide crop production. However, certain objective conditions constrain the feasibility of our experimental protocol. Two research limitations can be identified: (1) The distance between the UAV and the crop may affect the accuracy of the model; therefore, it is recommended to maintain in actual applications the consistent flight height of 30 m used during data collection. (2) The light intensity can affect the imaging quality of the UAV, and the light intensity in an actual application scene is highly likely to differ from that in our experimental setup, making it a crucial factor limiting the accuracy of our model. As a result, it is recommended to validate the model using field datasets collected under varying light conditions in the actual application scenario.
In future research, there are two directions for extension: (1) Higher data collection heights yield higher efficiency but inevitably lower accuracy; therefore, it would be beneficial to explore the impact of flight height on maize nitrogen grading and find a balance between collection efficiency and accuracy that gives the optimal data collection height. (2) This research is based on static images; in the future, the aim is to embed our algorithms into UAVs for the real-time detection of nitrogen levels in large agricultural fields through target detection, thereby contributing to the development of unmanned farms.

Conclusions
In this paper, UAV images of maize fields were first collected and preprocessed. ML and DL methods were then tested on the UAV image dataset to demonstrate the applicability of computer vision technology to maize nitrogen grading classification. Among the five ML classifiers (SVM, KNN, DT, LR, and RF), LR was selected as the best ML classifier for the maize nitrogen classification problem due to its higher accuracy and smaller model size. Among the DL algorithms, several models were tested, and ShuffleNet was found to outperform AlexNet, EfficientNet, and RegNet. It was therefore improved by introducing the SENet attention mechanism and enlarging the DW convolution kernel size to generate the new ShuffleNet-improvement model. Overall, ShuffleNet-improvement stood out as our preferred choice among the ML and DL algorithms due to its exceptional accuracy (96.8%), precision (97%), recall (97.1%), F1 score (97%), high FPS (91 images/s), and relatively small model size (5.87 MB). Future research could explore the effect of flight height on maize nitrogen grading and develop an embedded system for real-time practical applications in conjunction with target detection technology. This study contributes to the further development of precision agriculture and provides strong support for nitrogen fertilization management during maize growth.

Figure 1. Flowchart of maize nitrogen level division using UAV images and different ML and DL algorithms.

Figure 2. (a) Physical picture of the Sprite4RTK UAV; (b) panoramic view of an RGB image of the experimental field.

(3) Remove protected rows and columns: Protected rows and columns (mentioned in Section 2.1) are not our experimental objects; hence, the RGB panorama must be cropped manually to remove them.

(4) Row and column segmentation to obtain RGB images of individual plots: Since the number of planting rows and columns is known, and most plots are uniformly planted, the dimensions of individual plots can be calculated directly from the relationship between the numbers of rows and columns and the image dimensions. However, due to overlapping and coverage of plants during segmentation, artificial fine-tuning is applied. When a row of plants extends into other rows, we slightly increase the width of the corresponding image area while decreasing the width of the adjacent area; similarly, when a column of plants extends into other columns, we slightly increase the length of the corresponding image area while decreasing the length of the adjacent area. The final RGB images of each plot are obtained, with each individual plot image having a size of 145 × 355 pixels. Due to occlusion in some images, the resolution of each plot's RGB image may fluctuate slightly. These images are stored in TIFF format, and an example is shown in Figure 3.
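The grid-based cropping in step (4) can be sketched as follows. The panorama size and row/column counts used here are assumptions chosen so the resulting blocks match the 145 × 355-pixel plot size stated above; the manual fine-tuning for overlapping plants would adjust these boxes afterwards.

```python
def plot_boxes(img_w, img_h, n_cols, n_rows):
    """Compute pixel bounding boxes for a uniform grid of plots.

    Each box is (left, top, right, bottom). Plot width and height
    follow directly from the panorama size and the known numbers of
    planting columns and rows, as described in step (4).
    """
    pw, ph = img_w / n_cols, img_h / n_rows
    boxes = []
    for r in range(n_rows):
        for c in range(n_cols):
            boxes.append((round(c * pw), round(r * ph),
                          round((c + 1) * pw), round((r + 1) * ph)))
    return boxes

# Hypothetical 1450 x 3550 panorama split into a 10 x 10 grid
# yields 145 x 355-pixel blocks, matching the plot size in the text.
boxes = plot_boxes(1450, 3550, 10, 10)
assert boxes[0] == (0, 0, 145, 355)
assert len(boxes) == 100
```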

Figure 3. An example image of a single maize plot RGB image.

Figure 4. Result of the K-means analysis of maize SPAD values, where the dashed lines are the classification boundaries of the three categories calculated from the three clustering centers.

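The classification boundaries in Figure 4 can be obtained as midpoints between adjacent sorted cluster centers. The one-dimensional sketch below uses synthetic SPAD readings; the three level means (35, 45, 55) and the plain Lloyd's k-means are illustrative assumptions, not the paper's actual data or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic SPAD readings around three illustrative nitrogen levels.
spad = np.concatenate([rng.normal(35, 1.5, 40),
                       rng.normal(45, 1.5, 40),
                       rng.normal(55, 1.5, 40)])

# Plain 1-D Lloyd's k-means with k = 3.
centers = np.array([spad.min(), spad.mean(), spad.max()])
for _ in range(50):
    labels = np.argmin(np.abs(spad[:, None] - centers[None, :]), axis=1)
    centers = np.array([spad[labels == k].mean() for k in range(3)])

# Grading boundaries: midpoints between adjacent sorted centers,
# analogous to the dashed lines in Figure 4.
c = np.sort(centers)
boundaries = (c[:-1] + c[1:]) / 2
print("centers:", c, "boundaries:", boundaries)
```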

Figure 5. Diagram of the feature extraction process.

Agronomy 2023, 13
To address this problem, Hu et al. presented a channel attention mechanism based on the squeeze-and-excitation (SE) module, called SENet, in 2018 [39].
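The SE channel attention referenced here can be sketched in a few lines of NumPy. The channel count and reduction ratio below are illustrative assumptions, not the paper's exact configuration: the "squeeze" is a per-channel global average pool, and the "excitation" is two fully connected layers (ReLU, then sigmoid) whose output rescales the channels.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-excitation applied to a feature map x of shape (C, H, W).

    Squeeze: global average pooling per channel.
    Excitation: FC + ReLU down to C/r units, FC + sigmoid back to C,
    producing per-channel attention weights that rescale the input.
    """
    z = x.mean(axis=(1, 2))                # squeeze: (C,)
    s = np.maximum(z @ w1, 0)              # FC + ReLU: (C/r,)
    s = 1 / (1 + np.exp(-(s @ w2)))        # FC + sigmoid: (C,)
    return x * s[:, None, None]            # rescale each channel

# Illustrative use: 8 channels, reduction ratio r = 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5, 5))
w1 = rng.normal(size=(8, 2))   # 8 -> 8/4 = 2
w2 = rng.normal(size=(2, 8))   # 2 -> 8
y = se_block(x, w1, w2)
assert y.shape == x.shape
```

Because the sigmoid output lies in (0, 1), the block can only attenuate channels, never amplify them, which is exactly the gating behavior the attention mechanism relies on.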

Figure 16. Accuracy performances of different DL algorithms: (a) loss values of different DL algorithms; (b) accuracy values of different DL algorithms; (c) precision values of different DL algorithms; (d) recall values of different DL algorithms; (e) F1 scores of different DL algorithms.

3.2.2. Deep Learning Running Efficiency and Model Size Comparison
According to Figure 17a, when measured by FPS, AlexNet exhibits the highest detection efficiency, capable of detecting 491 images per second.

Figure 17. (a) Computational efficiency comparison of different DL algorithms. (b) Model size comparison of different DL algorithms.

As shown in Figure 17b, comparing the five model sizes, ShuffleNet exploits its lightweight design, with a model size of only 4.95 MB. Despite the improvement, ShuffleNet-improvement is only 0.92 MB larger than the original ShuffleNet and remains far smaller than the other network models.

Table 1. Description and formulation of color and texture features of UAV images.