Agricultural Field Boundary Delineation with Satellite Image Segmentation for High-Resolution Crop Mapping: A Case Study of Rice Paddy

Parcel-level cropland maps are an essential data source for crop yield estimation, precision agriculture, and many other agronomy applications. Here, we proposed a rice field mapping approach that combines agricultural field boundary extraction with fine-resolution satellite images and pixel-wise cropland classification with Sentinel-1 time series SAR (Synthetic Aperture Radar) imagery. The agricultural field boundaries were delineated by image segmentation using U-net-based fully convolutional network (FCN) models. Meanwhile, a simple decision-tree classifier was developed based on rice phenology traits to extract rice pixels with time series SAR imagery. Agricultural fields were then classified as rice or non-rice by majority voting from pixel-wise classification results. The evaluation indicated that SeresNet34, as the backbone of the U-net model, had the best performance in agricultural field extraction with an IoU (Intersection over Union) of 0.801 compared to the simple U-net and ResNet-based U-net. The combination of agricultural field maps with the rice pixel detection model showed promising improvement in the accuracy and resolution of rice mapping. The produced rice field map had an IoU score of 0.953, while the User's Accuracy and Producer's Accuracy of pixel-wise rice field mapping were 0.824 and 0.816, respectively. The proposed model combination scheme merely requires a simple pixel-wise cropland classification model that incorporates the agricultural field mapping results to produce high-accuracy and high-resolution cropland maps.


Introduction
Timely and accurate monitoring of cropland extent is essential for crop yield estimation, agricultural land use administration, and climate change simulations [1]. Thus, cropland mapping with remote sensing data has drawn research attention for decades. Precision agriculture has increased the demand for parcel-level cropland maps in recent years. As with other remote sensing image classification problems, cropland mapping from remote sensing imagery generally falls into pixel-based and object-based categories.
Traditional pixel-based crop classification with satellite imagery suffers from the 'salt and pepper' effect, which impairs the integrity of cropland parcels [2]. Therefore, pixel-based crop type classification can hardly fulfill the task of parcel-level cropland mapping. On the other hand, object-based cropland mapping approaches rely on segmenting the image and constructing a hierarchical network of homogeneous objects on which cropland classifications are made. Object-based approaches incorporate both the spatial and spectral structure of remote sensing data and often yield better results. As implied by previous studies, field boundary information improves the accuracy of crop type classification [3]. A critical step of object-based cropland mapping is delineating agricultural field boundaries at certain levels. Two categories of methods are commonly used for this purpose.
imagery hampers its application in cropland mapping. Another issue of optical remote sensing imagery is that optical sensors are prone to disturbance of weather conditions, e.g., cloud cover. Radar sensors, on the other hand, provide more integrated time series data owing to their all-weather capability. Using single-temporal fine-resolution satellite images to generate land parcels as the 'objects' for object-based image analysis (OBIA) on multi-temporal medium-resolution radar satellite images is expected to be a low-cost approach to producing high-resolution crop field maps. We adopted this strategy for rice field mapping that uses deep learning-based image segmentation to delineate land parcel boundaries and subsequently classifies each parcel into target crop types with a simple decision-tree classifier. In this study, we attempt to address the following issues: (1) Formulate an image segmentation scheme with appropriate data labeling process, model structure, and post-process methods, specifically for crop mapping applications; (2) Solve the prediction errors and conflicts near the border of the patches that the U-Net segmentation model uses as input; (3) Verify the feasibility of rice field mapping by incorporating a simple pixel-wise classifier with satellite image segmentation.

Study Area
The study area is a typical rice-planting region in Heilongjiang Province, the northernmost part of China. It is a 10.5 km × 17.5 km rectangular region in Wuchang County, as shown on the maps in Figure 1. The county is famous for its high-quality japonica rice, with an approximate rice-growing area of 166,000 hectares (2021). A single rice cropping system is practiced in the area. Transplanting is the predominant planting method, while very few direct seeding cases have emerged in recent years. The rice-growing season starts in early May and ends in October. Figure 2 shows a typical rice field boundary bank in the study area.

Fine-resolution remote sensing data are a requisite to extract field boundaries in our study case since rice field boundaries are commonly around one meter in width. Nonetheless, only the true-color RGB imagery is required for image segmentation purposes. We acquired the RGB composite of the study area from CNES/Airbus Pléiades satellite imagery captured in September 2018. The RGB image measures 20,992 × 35,072 pixels at 0.5 m spatial resolution.

Sentinel-1 time series images
We acquired the time series of the European Space Agency (ESA) Sentinel-1 Level-1 Ground Range Detected (GRD) data product from May to October 2018 for pixel-wise rice mapping. Previous studies suggested that VH polarization has an advantage over VV in characterizing rice growth [31,32]. The VH band of the GRD images was therefore extracted to identify rice field pixels. The data acquisition and processing were completed on the Google Earth Engine (GEE) cloud computing platform. The GRD data had already undergone several preprocessing steps and are provided as backscattering coefficients (σ°) in decibels (dB).
The Sentinel-1 two-satellite constellation has a 6-day repeat cycle. However, it should be noted that only Sentinel-1B data were accessible. Because orbits overlap in the study area, the number of available SAR images reached 27. The spatial resolution of the SAR images is 10 m.

Agricultural Field Boundary Annotation
We selected three rectangular sampling regions on the fine-resolution RGB image to sketch agricultural field boundaries by visual interpretation (see Figure 3). With the assistance of GIS software, the three sample regions were labeled into polygons representing three landcover classes: field boundary, agricultural field, and background landcover. The data labeling process involves the following steps:
(1) Sketch the boundaries of agricultural fields to generate field polygons by visual interpretation.
(2) Create buffers on both sides of the polygon edges from step (1) with a distance of 0.5 m, resulting in boundary buffers one meter wide, which is a typical width of field boundaries in our study case.
(3) Generate agricultural field polygons (excluding the field boundary bank) by erasing the boundary buffers of step (2) from the polygons of step (1).
(4) Rasterize the boundary buffers, agricultural field polygons, and the remaining area into a GeoTIFF image at the original image's pixel size, with raster values 0, 1, and 2 representing field boundary, agricultural field, and background land cover, respectively.
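The labeling steps can be illustrated with a small raster analogue. The sketch below is not the GIS workflow used in this study: it assumes hypothetical axis-aligned rectangular fields given in pixel indices and a buffer of one pixel on each side of the field edge (0.5 m at 0.5 m resolution). The function name `rasterize_fields` and its arguments are illustrative only.

```python
import numpy as np

BOUNDARY, FIELD, BACKGROUND = 0, 1, 2  # raster values from step (4)


def rasterize_fields(shape, rects, buffer_px=1):
    """Build a label raster from rectangles given as (row0, col0, row1, col1).

    Rows and columns follow slice conventions (row1 and col1 are exclusive).
    buffer_px is the assumed buffer distance on each side of a field edge.
    """
    labels = np.full(shape, BACKGROUND, dtype=np.uint8)
    b = buffer_px
    # Step (3): field interiors are the sketched polygons minus the buffers.
    for r0, c0, r1, c1 in rects:
        labels[r0 + b:r1 - b, c0 + b:c1 - b] = FIELD
    # Step (2): buffers on both sides of each edge, drawn last so that a
    # boundary always overrides a neighbouring field interior.
    for r0, c0, r1, c1 in rects:
        ring = np.zeros(shape, dtype=bool)
        ring[max(r0 - b, 0):r1 + b, max(c0 - b, 0):c1 + b] = True
        ring[r0 + b:r1 - b, c0 + b:c1 - b] = False
        labels[ring] = BOUNDARY
    return labels


# One hypothetical 6 x 6 pixel field inside a 10 x 10 pixel scene.
labels = rasterize_fields((10, 10), [(2, 2, 8, 8)])
```

In this toy case the boundary ring is two pixels (one meter) wide, which mirrors the buffer width chosen in step (2).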

Rice Field Samples
Rice is the only crop that grows in wetland conditions. The flooding signal at its early growing stage provides crucial information to identify paddy fields. We labeled 100 rice fields based on the radar backscatter value during the rice transplanting season, with the assistance of visual interpretation of the fine-resolution Pléiades satellite imagery. Half of the sample rice fields were used to determine the thresholds for a simple decision-tree classifier, and the other half were used to evaluate the final rice field mapping results.
The mean values of the VH band of the GRD data from mid-May to mid-June (the rice transplanting season) were computed on the GEE platform. Agricultural fields identified on the fine-resolution image whose SAR backscatter coefficients during that period were generally low and close to those of a water surface were labeled as rice fields.

Methodology
Agronomy 2022, 12, 2342
The general strategy of this work is to leverage agricultural field maps to improve crop mapping. Therefore, we emphasize fine-resolution satellite image segmentation designed explicitly for agricultural field mapping. Only a simple pixel-based crop mapping classifier was required with extracted field boundaries to yield a high-resolution crop field map. The workflow of this work is illustrated in Figure 4. We first implemented the data labeling processes using a fine-resolution true-color satellite image and time series SAR images. Using the labeled data, we tested three FCNs, namely a simple U-net, a ResNet34-based U-net, and a SeresNet34-based U-net, with different parameterizations. We applied a smooth prediction method to deal with prediction errors near the edges of image patches. Meanwhile, rice field pixels were identified with a decision-tree classifier based on phenological traits. The two outputs of field boundary extraction and rice field pixel identification were combined to produce a high-resolution rice field map.
The original RGB satellite image and the rasterized annotated target images form the training data for image segmentation. Several preprocessing steps were carried out before model training: (1) The original image and target images were clipped into 126 small patches of 256 × 256 pixels; (2) The original RGB image channels were transformed into the range [0, 1] using the Min-Max scaler; (3) Not all image patches contain adequate field boundary pixels, so we discarded image patches containing fewer than 100 field boundary pixels; (4) The resulting training patches underwent data augmentation by flipping (vertically and horizontally) and rotating (at 90° intervals).
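The four preprocessing steps can be sketched with NumPy as follows. This is a minimal illustration rather than the code used in the study; the function names are hypothetical, scaling is assumed to be per channel, patches are assumed to be non-overlapping, and the augmentation is assumed to produce the four 90° rotations plus one vertical and one horizontal flip of each patch.

```python
import numpy as np


def min_max_scale(img):
    """Scale each channel of an (H, W, C) image to the range [0, 1]."""
    img = img.astype(np.float32)
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    return (img - lo) / np.maximum(hi - lo, 1e-12)


def clip_patches(img, mask, size=256):
    """Cut an image and its label raster into non-overlapping square patches."""
    pairs = []
    for top in range(0, img.shape[0] - size + 1, size):
        for left in range(0, img.shape[1] - size + 1, size):
            window = (slice(top, top + size), slice(left, left + size))
            pairs.append((img[window], mask[window]))
    return pairs


def has_enough_boundary(mask, boundary_value=0, min_pixels=100):
    """Keep a patch only if it has at least min_pixels boundary pixels."""
    return int((mask == boundary_value).sum()) >= min_pixels


def augment(img, mask):
    """Return rotations by 0/90/180/270 degrees plus vertical and horizontal flips."""
    out = [(np.rot90(img, k), np.rot90(mask, k)) for k in range(4)]
    out.append((np.flipud(img), np.flipud(mask)))
    out.append((np.fliplr(img), np.fliplr(mask)))
    return out
```

A typical use would scale the image once, clip it together with the label raster, filter the pairs with `has_enough_boundary`, and then expand each remaining pair with `augment`.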

U-net architecture-based CNN
The U-net is a convolutional neural network initially developed for medical image segmentation. It is an improvement on the FCN and has shown its superiority when training data are scarce [20]. The U-net is a multi-scale encoder-decoder CNN with skip connections. It has a symmetric architecture that consists of two parts: the contracting path (encoder) on the left and the expansive path (decoder) on the right (Figure 5). The encoder follows the general convolutional process, which compresses the spatial information and extracts feature information while reducing the height and width of the input image. The decoder consists of transposed 2D convolutional layers that upscale the encoded features and spatial information to a higher-resolution pixel space to achieve a dense classification.
Many state-of-the-art CNN models have been developed in recent years for computer vision tasks, e.g., VGGNet, ResNet, DenseNet, EfficientNet, and InceptionNet. We tested the original U-net ('simple U-net' hereafter) by Ronneberger, Fischer, and Brox [20], as well as ResNet34 and SeresNet34 as backbone networks in the U-net architecture, since those are reported to be effective in satellite image segmentation tasks [21,33,34]. Each backbone network block of the simple U-net consisted of two convolution layers, a dropout layer, and a max-pooling layer. We refer to He et al. [35] and Hu et al. [36] for details of ResNet and SeresNet, respectively. Pre-trained weights on the ImageNet dataset [37] were adopted for the backbone networks for faster and better convergence on a small training set. The parameters used to train the networks are listed in Table 1. The image segmentation tasks were conducted using Python 3.7, TensorFlow 2.0, and the open-source Python library Segmentation Models (https://github.com/qubvel/segmentation_models (accessed on 12 August 2022)) on a server with Nvidia V100 graphics cards.

Smooth Predictions for Image Patches
The original satellite image was cropped into small patches (256 × 256 pixels) to feed into the trained segmentation models at the prediction phase. The segmentation models make predictions solely on those small local windows rather than on the whole study area. As a result, prediction errors and conflicts near the borders of the patches were not negligible. We applied a smooth blending strategy for predicted image patches to solve this issue. First, for each image patch, the eight transformations of the dihedral group D_4 were used, i.e., the patch itself, its three 90-degree rotations, and the mirrored versions of those four. As a result, each patch has an eight-fold augmented prediction before the predictions are merged.
When making predictions on the whole study area, the original satellite image was cropped into patches with 50% overlap to eliminate the border effects. All the predictions for each patch were spatially merged by weighting pixels. The basic idea of merging the overlapping regions is that the closer a pixel is to the patch center, the more weight it receives from that patch's prediction. The weights are computed by interpolating with a 2-D Gaussian function. The final prediction label L is defined by a soft voting strategy with the following equations.
Equation (1) is the voting rule for each pixel, where w_i is the weight of the ith prediction and p_{i,c} is the ith predicted probability of class c. w_i is a function of the pixel location relative to the patch center at (0, 0). The weight w_i(x, y) is computed by Equation (2), where (x, y) denotes the pixel's coordinates, ranging from (−127, −127) to (128, 128), and k is the number of predictions. Depending on the pixel location on the original image, k can be 8, 16, or 32.
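One way to read the soft-voting rule is L = argmax_c Σ_{i=1..k} w_i · p_{i,c}, with w_i taken from a 2-D Gaussian centred on the patch. The sketch below illustrates this reading under stated assumptions: the Gaussian width `sigma` is not given in the text and is chosen arbitrarily, the image height and width are assumed to be multiples of half the patch size, and `predict_fn` is a hypothetical stand-in for the trained network that maps an (H, W, 3) patch to (H, W, n_classes) probabilities. The eight D_4 predictions of a patch share the same weight map here, so they are averaged before weighting, and the weighted sum is normalized by the total weight, neither of which changes the argmax.

```python
import numpy as np


def gaussian_weights(size=256, sigma=64.0):
    """2-D Gaussian weights on coordinates -(size/2 - 1) .. size/2."""
    coords = np.arange(size) - (size // 2 - 1)  # -127 .. 128 for size=256
    xx, yy = np.meshgrid(coords, coords)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))


def d4_predict(patch, predict_fn):
    """Average predictions over the 8 transforms of D_4, undoing each one."""
    probs = []
    for mirrored in (False, True):
        base = patch[:, ::-1] if mirrored else patch
        for k in range(4):
            out = predict_fn(np.rot90(base, k, axes=(0, 1)))
            out = np.rot90(out, -k, axes=(0, 1))  # undo the rotation
            probs.append(out[:, ::-1] if mirrored else out)  # undo the mirror
    return np.mean(probs, axis=0)


def blend_predict(image, predict_fn, n_classes, size=256, sigma=64.0):
    """Predict on 50%-overlapping patches and merge them by soft voting."""
    height, width = image.shape[:2]
    step = size // 2
    weights = gaussian_weights(size, sigma)[..., None]
    acc = np.zeros((height, width, n_classes))
    norm = np.zeros((height, width, 1))
    for top in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            window = (slice(top, top + size), slice(left, left + size))
            acc[window] += weights * d4_predict(image[window], predict_fn)
            norm[window] += weights
    return np.argmax(acc / norm, axis=-1)
```

With this tiling, an interior pixel is covered by four patches and therefore receives 4 × 8 = 32 predictions, an edge pixel 16, and a corner pixel 8, which matches the possible values of k stated above.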
Following the above procedures, the segmentation results of the image patches were blended to produce a full-size segmentation image. The resulting image then underwent two processing steps for later use in crop mapping: (1) Vectorize the segmentation result while keeping the topology of fields and boundaries; connected pixels of the same class form an individual polygon. (2) Delete the boundaries from the map and keep only the agricultural field and background categories for crop mapping. At this point, the boundary class is redundant since the agricultural fields have been extracted.
The output vector data contain a large number of objects (polygons) of agricultural fields and background land parcels, while the field boundary bank areas were excluded.
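The idea that connected pixels of the same class form one object can be illustrated on the raster itself, before any vectorization. The following sketch is an assumption-laden stand-in for the GIS vectorization step: it labels 4-connected groups of field pixels with a breadth-first search, and the function name `label_fields` and the class value 1 for fields are illustrative.

```python
from collections import deque

import numpy as np


def label_fields(class_map, field_value=1):
    """Assign one integer id to each 4-connected group of field pixels."""
    height, width = class_map.shape
    ids = np.zeros((height, width), dtype=np.int32)  # 0 means "not a field"
    next_id = 0
    for r in range(height):
        for c in range(width):
            if class_map[r, c] != field_value or ids[r, c]:
                continue
            next_id += 1
            ids[r, c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and class_map[ny, nx] == field_value
                            and not ids[ny, nx]):
                        ids[ny, nx] = next_id
                        queue.append((ny, nx))
    return ids, next_id
```

Because two fields are separated only by boundary pixels, a gap in a predicted boundary lets the search flow from one field into its neighbour and the two receive the same id.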

Rice Field Identification
Due to the unique wetland condition of rice fields during the vegetative phase, rice fields show a distinctive temporal profile of radar signal compared to other landcover types. A few days before and after transplanting, the radar signal of rice fields is dominated by the water surface and is at its lowest level. The development of the rice canopy during the vegetative phase leads to a continuous increase in radar backscatter, reaching a maximum at the heading stage [38]. Based on this phenological characteristic of the radar signal, we devised a simple decision-tree classifier to discriminate between rice and non-rice pixels using Sentinel-1 SAR images. The time series of radar backscatter from Sentinel-1's VH polarization was used to detect flooding signals at the transplanting stage and peak signals at the heading stage. Figure 6 illustrates the temporal profile of the SAR backscatter of the sample rice fields from the training set. Based on the local cropping calendar, the time window was set to 10 May to 1 June for the flooding signal and 20 August to 10 September for the peak signal. Two thresholds were fixed according to the mean pixel value (VH band) of sample rice fields from multiple SAR images during the two windows. For the flooding signal, the threshold was the upper quartile of the sample rice pixels, i.e., −25.43 dB. For the peak signal, the threshold was the lower quartile, i.e., −13.83 dB.
$$\text{rice pixel} = \begin{cases} \text{flooding signal} \le -25.43\ \text{dB} \\ \text{peak signal} \ge -13.83\ \text{dB} \end{cases} \qquad (3)$$
This simple decision-tree classifier (Equation (3)) generated a map of rice and non-rice pixels. The final step of producing a more precise rice field map is to combine the pixel-based rice map with the field boundary map resulting from satellite image segmentation. A field from the image segmentation is classified as rice or non-rice by majority voting from the pixels within its spatial extent.
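The two-threshold rule and the field-level majority vote can be sketched together. This is a minimal illustration under several assumptions: the flooding and peak signals are taken as the per-pixel mean VH backscatter over each time window, the rice mask has already been resampled to the grid of the field raster, a field counts as rice when more than half of its pixels are rice, and all function and variable names are hypothetical.

```python
import numpy as np

FLOOD_MAX_DB = -25.43  # flooding-signal threshold from the text
PEAK_MIN_DB = -13.83   # peak-signal threshold from the text


def classify_rice_pixels(vh_flood, vh_peak):
    """Apply both threshold tests to (T, H, W) stacks of VH backscatter in dB.

    vh_flood holds the images of the flooding window, vh_peak those of the
    peak window; the mean over time is an assumed definition of each signal.
    """
    flooding_signal = np.mean(vh_flood, axis=0)
    peak_signal = np.mean(vh_peak, axis=0)
    return (flooding_signal <= FLOOD_MAX_DB) & (peak_signal >= PEAK_MIN_DB)


def majority_vote(rice_mask, field_ids):
    """Label each field (id > 0) as rice if most of its pixels are rice."""
    votes = {}
    for fid in np.unique(field_ids[field_ids > 0]):
        share = rice_mask[field_ids == fid].mean()
        votes[int(fid)] = bool(share > 0.5)
    return votes
```

The vote assigns a single label to every pixel of a field, so isolated misclassified pixels inside a parcel no longer appear in the final map.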

Evaluation Metric
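We used Intersection over Union (IoU) as the key metric to evaluate the image segmentation and rice field mapping results. IoU is the ratio of the overlap area to the combined area of prediction and ground truth, ranging from 0 to 1 (Equation (4)). It is equivalent to the Jaccard coefficient and is commonly used to evaluate image segmentation tasks. For the evaluation of rice field mapping in this study, IoU is a better metric than the User's Accuracy and Producer's Accuracy since IoU evaluates classification accuracy and spatial coherency simultaneously between test rice fields and the prediction.
$$\mathrm{IoU} = \frac{|P \cap G|}{|P \cup G|} \qquad (4)$$
where P and G denote the predicted and ground-truth areas, respectively.
Meanwhile, User's Accuracy (UA, precision), Producer's Accuracy (PA, recall), and F1 were used to evaluate pixel-wise rice identification and agricultural boundary extraction. The three metrics for class C are defined as:
$$\mathrm{UA}_C = \frac{TP_C}{TP_C + FP_C}, \qquad \mathrm{PA}_C = \frac{TP_C}{TP_C + FN_C}, \qquad \mathrm{F1}_C = \frac{2 \cdot \mathrm{UA}_C \cdot \mathrm{PA}_C}{\mathrm{UA}_C + \mathrm{PA}_C}$$
where TP_C, FP_C, and FN_C are the numbers of true positive, false positive, and false negative pixels for class C.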
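For a single class, the four metrics can be computed from two boolean masks using the standard pixel-count definitions (IoU as intersection over union, UA as precision, PA as recall, F1 as their harmonic mean). The sketch below is illustrative; the function name is hypothetical, and it returns NaN for a metric whose denominator is zero.

```python
import numpy as np


def binary_metrics(pred, truth):
    """Return IoU, UA (precision), PA (recall), and F1 for one class."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # predicted and true
    fp = int(np.sum(pred & ~truth))   # predicted but not true
    fn = int(np.sum(~pred & truth))   # true but missed

    def ratio(num, den):
        return num / den if den else float("nan")

    ua = ratio(tp, tp + fp)
    pa = ratio(tp, tp + fn)
    return {
        "iou": ratio(tp, tp + fp + fn),
        "ua": ua,
        "pa": pa,
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

As a quick arithmetic check of the F1 definition, a UA of 0.824 and a PA of 0.816 give 2 × 0.824 × 0.816 / (0.824 + 0.816) ≈ 0.820.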

Satellite Image Segmentation Results
Evaluation on the test images shows that the U-net with SeresNet34 had the best performance, with an IoU of 0.801 and an F1 of 0.782 on boundary detection. The ResNet34-based U-net had an IoU of 0.755 and an F1 of 0.757 on boundary detection, while the corresponding metrics for the simple U-net were 0.687 and 0.758 (Table 2). Figure 7 compares the performance of the three image segmentation models. Predictions on a test image show that the simple U-net (A) and the ResNet34-based U-net (B) made considerable errors by classifying agricultural fields as background land cover. The SeresNet34-based U-net (C), on the other hand, overcame this problem and achieved a higher IoU score. Nonetheless, all three models performed comparably in detecting field boundaries, which are the most critical prediction target of our study. Figure 8 shows the satellite image segmentation map, which illustrates a good match between the extracted agricultural field boundaries and the ground truth.

Rice Field Mapping Results
The simple decision-tree classifier produced a rice pixel map at 10 m spatial resolution. However, the mixed pixel problem and the 'salt and pepper' effect were evident. According to the model evaluation on test data, SeresNet34 had the best overall performance for land parcel delineation and was therefore used to produce the rice field map. The comparison of the rice field map produced by the decision-tree classifier alone with the map produced by combining it with the image segmentation map (Figure 9) shows that the combination substantially reduced the 'salt and pepper' effect and improved spatial consistency. The rice field map was assessed with the 50 rice field polygons of the test set. The IoU score reached 0.953. Pixel-wise rice field mapping was evaluated with User's Accuracy (precision), Producer's Accuracy (recall), and F1, which were 0.824, 0.816, and 0.820, respectively (Table 3). Figure 10 compares the two rice field mapping results on some sample rice fields from the test set. The pixel-wise rice mapping was unable to distinguish the field boundaries, and the mixed pixel problem occurred on and around the field boundaries. Combining the two mapping results improves the rice mapping precision and mitigates the mixed pixel problem by excluding boundary pixels.

Discussion
Satellite image segmentation and crop mapping are two important areas of remote sensing application research. This study combined the two techniques to apply an OBIA to pixel-wise classification for rice field mapping. Our strategy was to combine objects generated from fine-resolution image segmentation with a crop map produced by a simple pixel-wise classification model, thereby making use of multi-source remote sensing data. Several issues and implications of this methodology should be highlighted.
The satellite image segmentation process was designed for agricultural field boundary extraction. The delineation of individual agricultural fields relies on the connectivity of field boundaries. Therefore, the field boundary is the foremost target for segmentation performance among the three classification targets (field, field boundary, and background). Experiments with the three backbone networks yielded similar F1 scores for field boundary extraction, although the IoU scores varied for all three categories. A likely reason is that field boundaries form the most prominent features for the CNN model to extract during training, compared to the other two categories, which comprise most of the pixels. A higher IoU score would contribute to a more precise agricultural field map.
We noted that, across many trials with different model hyperparameters, the evaluations showed only slight improvement in the IoU score. A significant factor that hampered model performance was the limited training set, since the training data had to be annotated manually, which is laborious. A larger training set is expected to improve the image segmentation results. It should also be noted that the acquisition date of the fine-resolution image has a substantial effect on the visibility of field boundaries. The image acquisition date should be chosen before the planting date or in the late phase of crop growth so that field boundaries contrast visually with the background.
Agricultural fields were identified based on closed field boundaries. Some of the predicted field boundaries were discontinuous, which led to multiple actual fields being merged on the field map. The impact of the delineated field boundaries on crop mapping should therefore be noted. The agricultural land use of this study area is dominated by rice growing, and the field boundaries on the ground at our study level were mainly built for agronomic rather than cadastral purposes. Consequently, the merging of several neighboring fields would have an acceptable impact on rice field mapping. However, this impact should be further investigated in a study area with complex agricultural land use.
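The effect of a discontinuous boundary can be illustrated with a minimal connected-component sketch. It assumes a binary boundary mask and 4-connectivity, both of which are simplifications chosen for illustration (the segmentation in this study has three classes, and the actual polygon extraction procedure is not shown here); the function name and the toy rasters are hypothetical.

```python
from collections import deque


def label_fields(boundary):
    """Label 4-connected regions of non-boundary pixels.

    boundary: 2-D list, 1 = predicted boundary pixel, 0 = field interior.
    Returns (labels, count); labels is 0 on boundary pixels and
    1..count inside the fields.
    """
    rows, cols = len(boundary), len(boundary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if boundary[r][c] or labels[r][c]:
                continue  # boundary pixel, or already assigned to a field
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one field
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not boundary[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# A closed boundary separates two fields ...
closed_boundary = [[0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0]]
# ... whereas a one-pixel gap merges them into a single object.
broken_boundary = [[0, 0, 1, 0, 0],
                   [0, 0, 0, 0, 0],
                   [0, 0, 1, 0, 0]]

_, fields_closed = label_fields(closed_boundary)
_, fields_broken = label_fields(broken_boundary)
```

In this toy case the closed boundary yields two fields and the broken one yields a single merged field, which mirrors the merging behaviour described above.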
The pixel-wise classification model for rice field detection was based on rice phenology traits. The simple decision-tree model had a mediocre performance in precision and recall scores. Combining its result with agricultural field maps nevertheless generated a precise and high-resolution rice field map. Many more sophisticated pixel-wise crop mapping methods have been developed using time series remote sensing data, e.g., Deep Neural Network [39][40][41][42], support vector machines [43][44][45], and Random Forest [46,47]. These pixel-wise crop mapping studies reported good overall results (0.85 or above in F1 score). However, mixed pixels and the 'salt and pepper' effect were inevitable in their resulting crop maps, as with other pixel-wise classification tasks. On the other hand, those sophisticated pixel-wise crop mapping methods could likewise be combined with the agricultural field map of this study to yield precise crop maps. Nonetheless, the necessity of developing such complex pixel-wise crop mapping models is debatable, since the OBIA on the classified pixels was based on majority voting.
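The majority-voting step can be sketched as follows. This is a minimal illustration, assuming that the field map and the pixel-wise rice map have already been resampled to a common grid and that a tie counts as non-rice; both assumptions, as well as the function names and toy rasters, are hypothetical and not taken from the study.

```python
from collections import defaultdict


def classify_fields_by_majority(field_labels, rice_pixels):
    """Assign each field the majority class of the pixels it contains.

    field_labels: 2-D list of field ids, 0 = not inside any field.
    rice_pixels:  2-D list of 0 (non-rice) / 1 (rice) pixel predictions.
    Returns {field_id: True if more than half of its pixels are rice}.
    """
    total = defaultdict(int)
    rice = defaultdict(int)
    for label_row, rice_row in zip(field_labels, rice_pixels):
        for field_id, is_rice in zip(label_row, rice_row):
            if field_id == 0:
                continue  # boundary or background pixel, excluded from voting
            total[field_id] += 1
            if is_rice:
                rice[field_id] += 1
    # Assumption: ties are resolved as non-rice (strict majority required).
    return {fid: rice[fid] * 2 > total[fid] for fid in total}


def rasterize_field_classes(field_labels, field_is_rice):
    """Write each field's voted class back onto the field-map grid."""
    return [[1 if fid and field_is_rice[fid] else 0 for fid in row]
            for row in field_labels]


# Toy example: field 1 has 3 of 4 rice pixels, field 2 has 1 of 4.
toy_fields = [[1, 1, 0, 2, 2],
              [1, 1, 0, 2, 2]]
toy_rice = [[1, 0, 1, 0, 0],
            [1, 1, 0, 1, 0]]
toy_votes = classify_fields_by_majority(toy_fields, toy_rice)
toy_field_map = rasterize_field_classes(toy_fields, toy_votes)
```

In the toy example, the isolated misclassified pixels inside each field are overruled by the vote, and the rice pixel on the boundary column is dropped, which is the mechanism by which the 'salt and pepper' effect and boundary mixed pixels are suppressed.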

Conclusions
This study proposed a crop mapping framework that combines agricultural field boundary extraction with fine-resolution satellite images and pixel-wise crop detection with time series SAR imagery. We solved the prediction errors and conflicts near the border of the patches with a smooth blending strategy from multiple weighted predictions. An OBIA on the pixel-wise crop maps was employed based on majority voting to produce high-resolution crop maps.
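The idea of blending overlapping patch predictions with weights can be sketched in one dimension. The triangular window, the patch size, and the stride below are assumptions made for illustration only, and the `predict` callback is a hypothetical stand-in for the segmentation model; the weighting function actually used in this study is not specified in this section.

```python
def triangular_window(size):
    """Weights that peak at the patch centre and decrease towards its edges."""
    centre = (size - 1) / 2
    return [1.0 - abs(i - centre) / (centre + 1) for i in range(size)]


def blend_patches(length, patch_size, stride, predict):
    """Blend overlapping patch predictions along one row of pixels.

    predict(start, end) is a hypothetical callback returning one predicted
    value per pixel in [start, end). Each prediction is weighted by the
    window, so values near patch edges contribute less than central ones.
    """
    if (length - patch_size) % stride != 0:
        raise ValueError("patches must tile the row exactly in this sketch")
    window = triangular_window(patch_size)
    weighted_sum = [0.0] * length
    weight_total = [0.0] * length
    for start in range(0, length - patch_size + 1, stride):
        patch_prediction = predict(start, start + patch_size)
        for offset, value in enumerate(patch_prediction):
            weighted_sum[start + offset] += window[offset] * value
            weight_total[start + offset] += window[offset]
    # Normalize by the accumulated weights to obtain a weighted average.
    return [s / w for s, w in zip(weighted_sum, weight_total)]


# Three overlapping patches that disagree (hypothetical outputs 0.0, 0.5, 1.0):
# the blended row changes gradually instead of jumping at patch borders.
blended = blend_patches(
    length=8, patch_size=4, stride=2,
    predict=lambda start, end: [start / 4] * (end - start),
)

# If all patches agree, blending leaves the prediction unchanged.
constant = blend_patches(8, 4, 2, lambda start, end: [0.7] * (end - start))
```

The same accumulation of weighted sums and weight totals extends to two-dimensional patches by using a two-dimensional window.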
Several conclusions can be drawn from the experiment results: (1) SeresNet34 as the backbone of the U-net model had the best performance in agricultural field extraction compared to a simple U-net and ResNet-based U-net; (2) A combination of agricultural field maps with a rice pixel detection model showed promising improvement in the accuracy and resolution of rice mapping. The proposed model combination scheme only requires a simple pixel-wise crop detection model incorporating an OBIA to produce high-precision and high-resolution crop maps. This would potentially lower the cost of producing high-resolution crop maps.

Data Availability Statement: Sentinel-1 remote sensing data are available in publicly accessible repositories. Other data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.