Lightweight Semantic Segmentation Network for Real-Time Weed Mapping Using Unmanned Aerial Vehicles

: The timely and e ﬃ cient generation of weed maps is essential for weed control tasks and precise spraying applications. Based on the general concept of site-speciﬁc weed management (SSWM), many researchers have used unmanned aerial vehicle (UAV) remote sensing technology to monitor weed distributions, which can provide decision support information for precision spraying. However, image processing is mainly conducted o ﬄ ine, as the time gap between image collection and spraying signiﬁcantly limits the applications of SSWM. In this study, we conducted real-time image processing onboard a UAV to reduce the time gap between image collection and herbicide treatment. First, we established a hardware environment for real-time image processing that integrates map visualization, ﬂight control, image collection, and real-time image processing onboard a UAV based on secondary development. Second, we exploited the proposed model design to develop a lightweight network architecture for weed mapping tasks. The proposed network architecture was evaluated and compared with mainstream semantic segmentation models. Results demonstrate that the proposed network outperform contemporary networks in terms of e ﬃ ciency with competitive accuracy. We also conducted optimization during the inference process. Precision calibration was applied to both the desktop and embedded devices and the precision was reduced from FP32 to FP16. Experimental results demonstrate that this precision calibration further improves inference speed while maintaining reasonable accuracy. Our modiﬁed network architecture achieved an accuracy of 80.9% on the testing samples and its inference speed was 4.5 fps on a Jetson TX2 module (Nvidia Corporation, Santa Clara, CA, USA), which demonstrates its potential for practical agricultural monitoring and precise spraying applications.


Introduction
In farmland, timely weed control is crucial for ensuring optimal crop production. To minimize the negative effects of herbicides, site-specific weed management (SSWM) protocols [1] are essential. For SSWM, farmers obtain orthophoto maps of their fields, formulate an application plan, and generate Appl. Sci. 2020, 10, 7132; doi:10.3390/app10207132 www.mdpi.com/journal/applsci Appl. Sci. 2020, 10, 7132 2 of 11 prescription maps. Methods for obtaining orthophoto maps include satellite imaging and UAV imaging [2]. The resulting remote sensing images are analyzed to map weed distribution information, which can provide decision-making support information for precise spraying applications [3].
Based on their high spatial resolution, UAVs have been found to be appropriate for weed mapping tasks. Ortizet et al. [4] collected UAV images and achieved good performance for weed mapping using a semi-supervised method. Castaldi et al. [5] used UAV multispectral images to classify maize and weeds. However, in previous studies, most image processing has been conducted offline, which results in a significant time gap between image collection and spraying applications. This disadvantage significantly limits the applications of SSWM because agronomy conditions may change during large time gaps. In SSWM applications, the time gaps should be less than one day so that the herbicide treatment can be applied before the agronomy conditions change. One possible solution to address this issue is to utilize real-time image processing onboard UAVs. Based on real-time data collection performed by embedded devices, UAVs can perform corresponding spraying actions immediately, which can solve the problem of time gaps in the current model of SSWM.
Although there has been no related work in the domain of SSWM research, we identified some related studies in industrial domains. One of the barriers in real-time image processing is that the image data is large and difficult to process by embedded devices, and some studies have exploited to reduce the amount of data while maintaining the monitoring objective. Barmpoutis et al. [6] used a 360-degree camera mounted on a UAV for fire detection. The use of 360-degree camera significantly reduces the amount of data, which is meaningful in real-time processing scenarios. In addition to the data catastrophe, the employment of deep learning models utilized by connecting desktops to embedded devices is a major challenge in the industrialization of artificial intelligence. Currently, most studies on employing deep learning models using embedded devices focus on reducing the model size while maintaining reasonable accuracy. Alexander et al. [7] introduced a tiny solid-state drive architecture that requires a storage space of only 2.3 MB to achieve a mean average precision of 61.3% on the VOC 2007 challenge. Sabir et al. [8] proposed a lightweight model for multiple-object detection and tracking that can integrate a deep-learning-based association metric approach with simple online and real-time tracking. Their algorithm was deployed on an NVIDIA Jetson TX module and Intel Neural Compute Stick. The results demonstrated the effectiveness of their algorithms for real-time experiments onboard UAVs. Arpit et al. [9] proposed a modified MobileNetV2 architecture for fire classification. They modified the last fully-connected layer of the MobileNetV2 architecture to fit the fire detection task, and the resulting model exhibited better performance than that of previous models. The proposed model was deployed on a Raspberry Pi 3B device and achieved a speed of 5 fps with an accuracy of 0.92 on the dataset presented in [10].
However, regarding the requirements of SSWM applications, there are some limitations to the methods proposed in the aforementioned studies. (1) Although previous studies have developed various approaches to reduce the number of model parameters, they have largely focused on classification [9] and detection models [7,8,11]. In studies on SSWM, semantic segmentation models [12] have proven to be effective at weed mapping tasks [13,14]; however, their optimization cannot be guaranteed based on the results of the aforementioned studies. One of the greatest obstacles for implementing such deep learning networks for widespread deployment on embedded devices is their high computational and memory requirements. In general, semantic segmentation models are difficult to optimize for small devices because they use 2D information that requires significant computation, making them less suitable for the research described in this paper. (2) The aforementioned studies only considered model design and optimization, while ignoring the optimization of the inference process, which may have the potential to achieve better efficiency. (3) Most previous studies have only conducted model design and validation on computers or embedded devices. However, in the context of SSWM applications, a real-time spraying machine requires modules for map visualization, flight control, image collection, image processing, and spraying. Based on the objectives of this study, it was necessary to construct a basic hardware environment to support modules for map visualization, route planning, image collection, and image processing. Therefore, the main goals of this study were to (1) develop a hardware system integrating map visualization, flight control, image collection, and image processing; (2) perform model design to achieve a compact size while maintaining reasonable prediction accuracy; and (3) exploit the optimization of the inference process, which can further improve inference speed. There are two main contributions of this work. First, we propose a lightweight network architecture for weed mapping tasks, which improves inference speed while maintaining reasonable accuracy. Second, we propose combining precision calibration with model design to improve speed, which has not been discussed in other related papers.

Hardware Development for a UAV System
The developed system can be divided into three main components: (1) a map visualization module for selecting flight areas and generating flight plans, (2) flight control module for controlling the UAV, and (3) onboard image collection and processing module. These components are illustrated in Figure 1a. To apply this system in an experiment, a user must first select an area of interest on a map. The software on the laptop then automatically generates corresponding routes. Waypoints are then uploaded to the flight control system via wireless serial communication. The flight control system controls the UAV and begins to collect images. During the flight process, newly collected images are processed directly by the image processing module. In this step, a fully convolutional network (FCN) is used to map weed distributions at the pixel level. Finally, a weed cover map is generated in real-time, which can provide decision support information for spraying machines. In this study, we adopted the M100 UAV platform (DJI Co., Ltd., Shenzhen, China). The Jetson TX2 module (Nvidia Corporation, Santa Clara, CA, USA) was selected as a control center to realize flight control, image collection, and! timage processing. The main components of the UAV can be observed in Figure 1b. The M100 UAV platform was chosen for its flight stability against airflow disturbance, and the Jetson TX2 was chosen for its GPU unit which is powerful in the inference of the deep learning models.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 11 Therefore, the main goals of this study were to (1) develop a hardware system integrating map visualization, flight control, image collection, and image processing; (2) perform model design to achieve a compact size while maintaining reasonable prediction accuracy; and (3) exploit the optimization of the inference process, which can further improve inference speed. There are two main contributions of this work. First, we propose a lightweight network architecture for weed mapping tasks, which improves inference speed while maintaining reasonable accuracy. Second, we propose combining precision calibration with model design to improve speed, which has not been discussed in other related papers.

Hardware Development for a UAV System
The developed system can be divided into three main components: (1) a map visualization module for selecting flight areas and generating flight plans, (2) flight control module for controlling the UAV, and (3) onboard image collection and processing module. These components are illustrated in Figure 1a. To apply this system in an experiment, a user must first select an area of interest on a map. The software on the laptop then automatically generates corresponding routes. Waypoints are then uploaded to the flight control system via wireless serial communication. The flight control system controls the UAV and begins to collect images. During the flight process, newly collected images are processed directly by the image processing module. In this step, a fully convolutional network (FCN) is used to map weed distributions at the pixel level. Finally, a weed cover map is generated in real-time, which can provide decision support information for spraying machines. In this study, we adopted the M100 UAV platform (DJI Co., Ltd., Shenzhen, China). The Jetson TX2 module (Nvidia Corporation, Santa Clara, CA, USA) was selected as a control center to realize flight control, image collection, and image processing. The main components of the UAV can be observed in Figure 1b. The M100 UAV platform was chosen for its flight stability against airflow disturbance, and the Jetson TX2 was chosen for its GPU unit which is powerful in the inference of the deep learning models.

Map Visualization Module
In this study, the map visualization module (see Figure 2a) was developed using the Google Maps API (Google Inc., Mountain View, CA, USA). The ground station software has two main functions: (i) selecting areas of interest on the map and automatically generating routes, and (ii) transferring flight points to the flight control system via serial communication. The main objective of the ground station software is to provide a map for a user to select an area of interest. The selection of an area of interest is a basic requirement for the subsequent route planning procedure, autonomous flight, and image acquisition performed by the UAV. The map used in this software is a mosaic of map tiles downloaded from Google Maps. Map tiles are offline resources on the laptop that do not need to be accessed via the internet. The map only provides a real-world overview of the area and does not contain specific location information, such as building locations, obstacles, and real-time weather data. Before flight planning

Map Visualization Module
In this study, the map visualization module (see Figure 2a) was developed using the Google Maps API (Google Inc., Mountain View, CA, USA). The ground station software has two main functions: (i) selecting areas of interest on the map and automatically generating routes, and (ii) transferring flight points to the flight control system via serial communication. The main objective of the ground station software is to provide a map for a user to select an area of interest. The selection of an area of interest is a basic requirement for the subsequent route planning procedure, autonomous flight, and image acquisition performed by the UAV. The map used in this software is a mosaic of map tiles downloaded from Google Maps. Map tiles are offline resources on the laptop that do not need to be accessed via the internet. The map only provides a real-world overview of the area and does not contain specific location information, such as building locations, obstacles, and real-time weather data. Before flight planning begins, the user must define the forward lap and side lap. The overlap rate, flight altitude, and resolution of the camera are then used to generate flight routes. As shown in Figure 2b, during the course of a flight mission, UAVs perform image acquisition along the planned waveform path.

Flight Control Module
In this study, the flight control system was developed using the open-source Onboard Software Development Kit provided by the producer of the M100 (DJI Co., Ltd., Shenzhen, China). The flight control system realizes secondary control of the UAV platform. Communication between the flight control system and laptop is realized using a wireless serial module. Communication between the flight control system and UAV platform is realized using the controller area network protocol [15].

Image Collection and Processing
The image processing module in the UAV has two main functions: (i) automatically directing image acquisition according to the planned routes and predefined overlaps and (ii) processing collected images in real time. The camera (see Figure 1b) used in the image acquisition module of the system is a TOP-T10X camera (Tuopu Lianchuang co., ltd., Beijing, China) [16], which is stabilized by an optimized three-axis gimbal. On the M100 UAV, image acquisition is accomplished using the "opencvsdk" library [17]. Figure 3 presents the overall workflow of the system. The software acquires the most recent frame from the camera at a resolution of 1280 × 720 pixels. Images are cropped according to the central 720 × 720 pixel area and then resized to 1000 × 1000 pixels, which was proven to be a suitable image size for weed mapping by the previous work of our team [14]. The camera can acquire images at up to 60 fps. The overall processing speed is determined by the network model and the capabilities of the GPU.

Task Assignment
During flight, the Jetson TX2 module performs several tasks. It is necessary to assign these tasks to different components onboard the Jetson TX2 module. It is equipped with a 256 core NVIDIA Pascal GPU and six core ARMv8 64-bit CPU complex. According to the designs presented in related papers, the CPU serves as the host and is responsible for task scheduling, while the GPU serves as a device and performs the computing tasks required for image processing. For semantic segmentation models, more than 80% of the operation time of most algorithms is consumed by convolution, deconvolution, and matrix operations. A GPU can handle these high-density and high-volume operations using Appl. Sci. 2020, 10, 7132 5 of 11 highly parallel multi-threading, while the CPU is responsible for the UAV flight control program and communication processing, as shown in Figure 4. collected images in real time. The camera (see Figure 1b) used in the image acquisition module of the system is a TOP-T10X camera (Tuopu Lianchuang co., ltd., Beijing, China) [16], which is stabilized by an optimized three-axis gimbal. On the M100 UAV, image acquisition is accomplished using the "opencvsdk" library [17]. Figure 3 presents the overall workflow of the system. The software acquires the most recent frame from the camera at a resolution of 1280 × 720 pixels. Images are cropped according to the central 720 × 720 pixel area and then resized to 1000 × 1000 pixels, which was proven to be a suitable image size for weed mapping by the previous work of our team [14]. The camera can acquire images at up to 60 fps. The overall processing speed is determined by the network model and the capabilities of the GPU.

Task Assignment
During flight, the Jetson TX2 module performs several tasks. It is necessary to assign these tasks to different components onboard the Jetson TX2 module. It is equipped with a 256 core NVIDIA Pascal GPU and six core ARMv8 64-bit CPU complex. According to the designs presented in related papers, the CPU serves as the host and is responsible for task scheduling, while the GPU serves as a device and performs the computing tasks required for image processing. For semantic segmentation models, more than 80% of the operation time of most algorithms is consumed by convolution, deconvolution, and matrix operations. A GPU can handle these high-density and high-volume operations using highly parallel multi-threading, while the CPU is responsible for the UAV flight control program and communication processing, as shown in Figure 4.

Model Design
Related studies have demonstrated that a FCN is appropriate for weed mapping tasks. In contrast to a traditional convolutional neural network, an FCN is an end-to-end network structure that can perform dense prediction tasks, allowing it to determine the weed densities at a target site, which can be directly used as references for accurate spraying tasks. For an FCN, input data can be images of arbitrary size and a corresponding output size can be obtained through effective inference. An FCN transforms all fully-connected layers into convolutional layers. In this form, the spatial information of an input image is preserved. By using deconvolutional operations, the feature maps screened from the final convolutional layer are up-sampled and the output image is restored to the same size as the input image. The general architecture of an FCN is presented in Figure 5.

Model Design
Related studies have demonstrated that a FCN is appropriate for weed mapping tasks. In contrast to a traditional convolutional neural network, an FCN is an end-to-end network structure that can perform dense prediction tasks, allowing it to determine the weed densities at a target site, which can be directly used as references for accurate spraying tasks. For an FCN, input data can be images of arbitrary size and a corresponding output size can be obtained through effective inference. An FCN transforms all fully-connected layers into convolutional layers. In this form, the spatial information of an input image is preserved. By using deconvolutional operations, the feature maps screened from the final convolutional layer are up-sampled and the output image is restored to the same size as the input image. The general architecture of an FCN is presented in Figure 5.
images of arbitrary size and a corresponding output size can be obtained through effective inference. An FCN transforms all fully-connected layers into convolutional layers. In this form, the spatial information of an input image is preserved. By using deconvolutional operations, the feature maps screened from the final convolutional layer are up-sampled and the output image is restored to the same size as the input image. The general architecture of an FCN is presented in Figure 5. Similar to other studies, we adopted a pre-trained network and fine-tuned it using a relevant dataset. Commonly used pre-trained networks include the ResNet [18] and VGGNet [19] models. These models tend to outperform their counterparts in terms of accuracy. However, according to the research Similar to other studies, we adopted a pre-trained network and fine-tuned it using a relevant dataset. Commonly used pre-trained networks include the ResNet [18] and VGGNet [19] models. These models tend to outperform their counterparts in terms of accuracy. However, according to the research presented in [14], these models are relatively time consuming, that is, they are inappropriate for real-time tasks. Therefore, we selected AlexNet as a pre-trained model because it is a relatively lightweight framework. As shown in Figure 6, AlexNet contains five convolutional layers and three fully connected layers. To reduce the number of model parameters, we removed the FC6 and FC7 layers. Additionally, the number of neurons in the final fully connected layer was reduced to three to match the number of classes (each pixel was classified as "rice", "weed", or "other"). Next, the last fully connected layer was transformed into a convolutional layer and a deconvolutional layer was appended to restore the feature map to the original image resolution.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 6 of 11 presented in [14], these models are relatively time consuming, that is, they are inappropriate for realtime tasks. Therefore, we selected AlexNet as a pre-trained model because it is a relatively lightweight framework. As shown in Figure 6, AlexNet contains five convolutional layers and three fully connected layers. To reduce the number of model parameters, we removed the FC6 and FC7 layers. Additionally, the number of neurons in the final fully connected layer was reduced to three to match the number of classes (each pixel was classified as "rice", "weed", or "other"). Next, the last fully connected layer was transformed into a convolutional layer and a deconvolutional layer was appended to restore the feature map to the original image resolution.

Optimization of the Inference Process
Most deep learning models train and evaluate neural networks with full 32-bit precision (FP32). However, once a model is fully trained, inference computations can use half-precision FP16 instead because gradient backpropagation is not required for inference. Using a lower precision results in a smaller model size, lower memory utilization and latency, and higher throughput. In this case, we applied FP32 for training and used FP16 for inference. However, it is worth noting that this will have an impact on recognition accuracy.

Results
In this section, we proposed a light weight network for weed mapping and compared its performance with the mainstream segmentation models. After that, the proposed network was transferred from the desktop to the embedded devices. Precision calibration was used to optimize the inference process. All models were evaluated in terms of accuracy and efficiency. For the accuracy, the metrics of overall accuracy and mean intersection over union (IoU) were applied. The overall

Optimization of the Inference Process
Most deep learning models train and evaluate neural networks with full 32-bit precision (FP32). However, once a model is fully trained, inference computations can use half-precision FP16 instead because gradient backpropagation is not required for inference. Using a lower precision results in a smaller model size, lower memory utilization and latency, and higher throughput. In this case, we applied FP32 for training and used FP16 for inference. However, it is worth noting that this will have an impact on recognition accuracy.

Results
In this section, we proposed a light weight network for weed mapping and compared its performance with the mainstream segmentation models. After that, the proposed network was transferred from the desktop to the embedded devices. Precision calibration was used to optimize the inference process. All models were evaluated in terms of accuracy and efficiency. For the accuracy, the metrics of overall accuracy and mean intersection over union (IoU) were applied. The overall accuracy is computed using the ratio of the pixels accurately classified to all pixels, and the mean intersection over union is calculated using the percent overlap between the target mask and our prediction output. For the efficiency, the frames per second (fps) was used as the metric, which represents how many images can be processed in one second.

Data Collection
In this study, experimental data were collected from a rice field located in Southern China (113.636888 N, 23.240441 E). Data collection was conducted when the weeds and crops were in their early tillering stages. The flight height was set to 6 m above the ground and the forward lap and side lap for imaging were set to 50% and 60%, respectively. A total of 1092 samples were collected and each sample was a 1000 × 1000 pixel RGB UAV image, where each image corresponds to a 30 cm × 30 cm field area. Each pixel in each image was classified as one of three classes: rice, weed, and other. To perform network training and validation, the images were manually labeled at the pixel level, as shown in Figure 7. In our dataset, 892 samples were used for training and the remaining 200 images were used for network validation. To perform network training and validation, the images were manually labeled at the pixel level, as shown in Figure 7. In our dataset, 892 samples were used for training and the remaining 200 images were used for network validation.

Model Design
As described in Section 3.2., a lightweight classification architecture (AlexNet) was selected as a baseline network. To reduce the number of network parameters, the first two fully-connected layers were removed, the number of neurons of the final fully-connected layer was set to three, and this layer was eventually transformed into a convolutional layer. A deconvolutional layer was appended to restore the feature map to the original image resolution. The training data were augmented using data enhancement (translation, rotation, and tailoring).
During the training process, the Adam optimizer was used for adaptive momentum estimation. The Adam gradient descent method is an adaptive learning rate method with different parameters that yields a fast convergence speed. The entire training process was limited to 30 epochs, where each iteration traversed all training samples. After forward computation, the sigmoid cross-entropy loss function was used as a cost function. The initial learning rate was set to 0.00001. Since the learning rate can be adjusted adaptively according to the rate of gradient descent, the convergence speed of the model was improved, as shown in Figure 8. After 30 epochs, testing on the validation dataset resulted in an overall accuracy of 91.2%. As shown in Table 1, the model operates on a GTX1060 GPU at 0.9 fps, while implementing the same network on the Jetson TX2 module results in a rate of only 0.11 fps, which is insufficient for real-time processing applications.

Model Design
As described in Section 3.2., a lightweight classification architecture (AlexNet) was selected as a baseline network. To reduce the number of network parameters, the first two fully-connected layers were removed, the number of neurons of the final fully-connected layer was set to three, and this layer was eventually transformed into a convolutional layer. A deconvolutional layer was appended to restore the feature map to the original image resolution. The training data were augmented using data enhancement (translation, rotation, and tailoring).
During the training process, the Adam optimizer was used for adaptive momentum estimation. The Adam gradient descent method is an adaptive learning rate method with different parameters that yields a fast convergence speed. The entire training process was limited to 30 epochs, where each iteration traversed all training samples. After forward computation, the sigmoid cross-entropy loss function was used as a cost function. The initial learning rate was set to 0.00001. Since the learning rate can be adjusted adaptively according to the rate of gradient descent, the convergence speed of the model was improved, as shown in Figure 8. After 30 epochs, testing on the validation dataset resulted in an overall accuracy of 91.2%. As shown in Table 1, the model operates on a GTX1060 GPU at 0.9 fps, while implementing the same network on the Jetson TX2 module results in a rate of only 0.11 fps, which is insufficient for real-time processing applications.
Our modified AlexNet-FCN was compared with the classic VGGNet, GoogLeNet, and ResNet. For the VGGNet-FCN, the skip architecture was omitted to increase speed. For the GoogLeNet, ReLU was used as the activation function, and the dropout strategy was used to overcome the problem of overfitting. For the ResNet-FCN, the ResNet-101 model was selected as a baseline architecture because it provides the best balance of accuracy and efficiency [20]. The performances of all models are summarized in Table 1. One can see that the accuracy of AlexNet-FCN is slightly lower than that of VGGNet-FCN and GoogLeNet-FCN. However, it is almost four times faster than the VGGNet-FCN and two times faster than the GoogLeNet-FCN, meaning it can meet the requirements of real-time scenarios. The accuracy of the ResNet-FCN is the highest among the three models based on its deep structure and residual architecture. However, this model is so large that the GTX 1060 GPU could not even perform the forward computation. In this case, we used the GTX 1080 TI GPU for network training and evaluation. However, the GPU resources of the GTX 1060 are greater than those of embedded devices. Therefore, it is impossible to copy the ResNet-FCN from the GTX 1080 TI GPU to the embedded Jetson TX2 module.

Model Design
As described in Section 3.2., a lightweight classification architecture (AlexNet) was selected as a baseline network. To reduce the number of network parameters, the first two fully-connected layers were removed, the number of neurons of the final fully-connected layer was set to three, and this layer was eventually transformed into a convolutional layer. A deconvolutional layer was appended to restore the feature map to the original image resolution. The training data were augmented using data enhancement (translation, rotation, and tailoring).
During the training process, the Adam optimizer was used for adaptive momentum estimation. The Adam gradient descent method is an adaptive learning rate method with different parameters that yields a fast convergence speed. The entire training process was limited to 30 epochs, where each iteration traversed all training samples. After forward computation, the sigmoid cross-entropy loss function was used as a cost function. The initial learning rate was set to 0.00001. Since the learning rate can be adjusted adaptively according to the rate of gradient descent, the convergence speed of the model was improved, as shown in Figure 8. After 30 epochs, testing on the validation dataset resulted in an overall accuracy of 91.2%. As shown in Table 1, the model operates on a GTX1060 GPU at 0.9 fps, while implementing the same network on the Jetson TX2 module results in a rate of only 0.11 fps, which is insufficient for real-time processing applications.

Optimization of the Inference Process
To reduce the model size, the data precision requirement of the network was changed from FP32 to FP16. The Jetson TX2 module uses half-precision mode to reduce the size of the model to half of its original size. Table 2 lists the results of precision calibration for the modified AlexNet-FCN. One can see that precision calibration improves inference efficiency by approximately three to four times on both the GTX 1060 (Nvidia Corporation, Santa Clara, CA, USA) and Jetson TX2 module. Although the prediction accuracy decreases slightly, the overall accuracy is acceptable for agricultural applications because it can provide rapid judgments regarding weed density.  Figure 9 presents the output images corresponding to several testing samples before and after precision calibration. In Figure 9c, one can see that our modified AlexNet-FCN can correctly distinguish the rice and weed areas. Following precision calibration, the prediction accuracy decreases by a small amount. The weed area in the third sample (blue dotted lines) was classified as the "other" category by the model following precision calibration. However, following precision calibration, our model still correctly classifies most areas in the input images, demonstrating its capability to provide decision support information for spraying machines.

Conclusions
UAV remote sensing and precision spraying are two important components of SSWM. However, the time gap between image collection and herbicide treatment significantly limits the application of

Conclusions
UAV remote sensing and precision spraying are two important components of SSWM. However, the time gap between image collection and herbicide treatment significantly limits the application of SSWM technology. This study used real-time image processing onboard a UAV with the goal of eliminating the time gap between image collection and herbicide treatment. First, we established a hardware environment for real-time image processing that integrates map visualization, flight control, image collection, and real-time image processing onboard a UAV based on secondary development. Second, we exploited the proposed model design and presented a lightweight network architecture for weed mapping tasks. The proposed network architecture was evaluated and compared with mainstream semantic segmentation models. Experimental results demonstrated that the proposed network is almost two times faster than the mainstream segmentation models with competitive accuracy. Next, we conducted optimization of the inference process. Precision calibration was applied on both desktop and embedded devices and the precision was reduced from FP32 to FP16. Experimental results demonstrated that precision calibration further improves inference speed while maintaining reasonable prediction accuracy. Our modified network architecture achieved an accuracy of 80.9% on the testing samples and the inference speed of 4.5 fps on the Jetson TX2 module, which demonstrates its potential for practical agricultural monitoring and precise spraying applications.
In the future, we plan to collect additional UAV images for model training and validation. Additionally, we plan to combine our system with variable spraying technology. The classification results of UAV images can provide decision-making information for sprayers, which can help maintain pesticide effects while reducing the use of chemicals.