Farmland Parcel Mapping in Mountain Areas Using Time-Series SAR Data and VHR Optical Images

: Accurate, timely, and reliable farmland mapping is a prerequisite for agricultural management and environmental assessment in mountainous areas. However, in these areas, high spatial heterogeneity and diversiﬁed planting structures together generate various small farmland parcels with irregular shapes that are difﬁcult to accurately delineate. In addition, the absence of optical data caused by the cloudy and rainy climate impedes the use of time-series optical data to distinguish farmland from other land use types. Automatic delineation of farmland parcels in mountain areas is still a very difﬁcult task. This paper proposes an innovative precise farmland parcel extraction approach supported by very high resolution(VHR) optical image and time series synthetic aperture radar(SAR) data. Firstly, Google satellite imagery with a spatial resolution of 0.55 m was used for delineating the boundaries of ground parcel objects in mountainous areas by a hierarchical extraction scheme. This scheme divides farmland into four types based on the morphological features presented in optical imagery, and designs different extraction models to produce each farmland type, respectively. The potential farmland parcel distribution map is then obtained by the layered recombination of these four farmland types. Subsequently, the time proﬁle of each parcel in this map was constructed by ﬁve radar variables from the Sentinel-1A dataset, and the time-series classiﬁcation method was used to distinguish farmland parcels from other types. An experiment was carried out in the north of Guiyang City, Guizhou Province, Southwest China. The result shows that, the producer’s accuracy of farmland parcels obtained by the hierarchical scheme is increased by 7.39% to 96.38% compared with that without this scheme, and the time-series classiﬁcation method produces an accuracy of 80.83% to further obtain the ﬁnal overall accuracy of 96.05% for the farmland parcel maps, showing a good performance. In addition, through visual inspection, this method has a better suppression effect on background noise in mountainous areas, and the extracted farmland parcels are closer to the actual distribution of the ground farmland.


Introduction
The spatial distribution information of farmland is essential for promoting agricultural production and policy making, and has been widely used in various agricultural applications, such as yield estimation, ecological environment assessment, and food risk analysis [1][2][3]. In mountainous areas, due to the impact of population, environment, and soil fertility, smallholder farms are the most common form of agriculture and provide food for the majority of the population [4]. The farmland spatial distribution map of smallholder farms plays a critical role in mountainous agricultural management and food security. However, smallholde farmers often cause uncertainties in the use of farmland and varies from year to year because of various factors (such as economic shocks and land use policies) [5]. Therefore, the spatial distribution maps of farmland are not available in most mountainous areas, which seriously hinders the improvement of mountainous agricultural production [4]. The traditional statistical methods of agricultural information that almost rely on field surveys are severely restricted in terms of economic efficiency, representativeness, and reliability. Remote sensing technology can continuously and quickly obtain ground information over large areas at low cost, which provides the possibility of systematically mapping farmland distribution at regional and global scales.
With the popularity of accessible satellite data, many studies have achieved certain results in the use of remote sensing data to extract farmland spatial distribution information [6][7][8][9][10][11]. Moreover, with the increase of the spatial resolution, the basic unit of farmland mapping is also converted from pixel to object [12]. Compared with the pixel-oriented method, the object-oriented method have obvious advantages in extracting ground information from high-resolution remote sensing images [13][14][15]. However, precise farmland spatial distribution mapping in complex scenes is still a quite challenging task, which mainly includes the following two aspects: 1. The high heterogeneity in mountainous areas produces a host of small farmland parcels with irregular shapes. These parcels with multiple crops construct a complex planting structure, which leads to the difficulty of obtaining accurate boundaries of the ground objects. 2. Universal cloud coverage in mountainous areas, caused by rainy and hot weather during the same period, results in the absence of opportune optical data, which makes it hardly to distinguish farmland parcels from other parcel types.
Regarding the first issue, recent remote sensing satellite sensors have provided new possibilities for monitoring small farmland parcels in mountainous areas [16]. For example, the very high resolution (VHR) images, with a spatial resolution of 1 m or finer, can more clearly describe the morphological features of image objects, and easily distinguish the boundaries and textures of farmland parcels, which are hard to obtain from high to medium resolution satellite images. However, the substantial content and complex form of VHR images in mountainous areas also bring a new challenge to the mapping of farmland spatial distribution.
In recent years, deep learning technology has made great progress in the field of computer vision. In particular, the convolutional neural networks (CNNs), with the ability to learn and process a hierarchy of spatial features, have shown a remarkable ability in image classification and semantic segmentation [17][18][19]. Recently, CNNs have been applied to remote sensing tasks, such as land cover/land use(LCLU) classification [20][21][22][23][24][25][26], object detection [27][28][29], and scene classification [30,31]. The CNNs are able to learn high-level abstract features from the raw pixels of satellite images, and achieve remarkable performance in VHR images classification [32]. Various CNN architectures have been investigated for LCLU classification in simple remote sensing scenes. However, for complex scenes, these investigations, which focus on sample optimization or model improvement, always intend to use only one model or method to achieve the extraction of all target image objects in VHR remote sensing images. With the increasing complexity of the model structure adjustment, it also brings a burden to the production of a large number of samples, and the results achieved so far are still relatively limited [33].
To deal with complex scenes, a hierarchical method based on a divided and stratified strategy is introduced into satellite image information extraction. Wu and Li [34] made use of such a strategy for crop acreage estimation in complex agricultural landscapes. Their method significantly improves the estimation accuracy and, at the same time, reduces the field sampling cost. Fenske et al. [35] proposed a 3-level hierarchical classification method to obtain heathland habitats, and their study showed that the hierarchical classification approach can distinguish ambiguous target objects from VHR images in complex landscapes. Haest et al. [36] also found that a hierarchical classification is beneficial to the classification of remote sensing images. Therefore, under the guidance of the divided and stratified strategy, the farmland extraction method is able to obtain precise parcel objects from VHR optical imagery without a host of model improvements and sample making tasks.
Regarding the second issue, it is worth noting that there are many farmland-like parcels that are similar to the visual features of farmland in VHR images (such as idle farmland, wasteland, and grassland). Especially with the hierarchical method, these farmland-like parcels are often extracted as farmland by CNNs. Past studies indicate that most crops exhibit a unique phenological pattern. Therefore, time-series satellite datasets covering the entire growing season of crops can be used to develop features for mapping crop distribution [1]. Nevertheless, for mountainous areas, cloudy and rainy weather leads to a lack of optical data, and it is impossible to collect enough cloud-free optical images to construct the time-series features of crops to remove these farmland-like parcels [37].
Instead of optical sensors, Synthetic Aperture Radar (SAR) is an active imaging technology that transmits and receives electromagnetic waves in the microwave range and can penetrate clouds. With the advantages of all-weather data acquisition, SAR has shown great potential for remote sensing monitoring in cloudy and rainy areas [38,39]. The SAR backscattering intensity is affected by multiple factors such as vegetation structure and ground surface properties, mainly including direct backscattering from the vegetation canopy, ground backscattering weakened by the vegetation canopy, and backscattering from the interaction between vegetation and the ground [40]. In particular, the structure of the vegetation canopy plays an important role in the influence factors of SAR backscattering intensity changes [41]. Therefore, due to the substantial changes in crop canopy structure at different growth stages, SAR time series data shows a great potential to map the farmland distribution [42][43][44]. However, in the mountainous areas, the complex agricultural landscapes, composed of multiple crops where inter-cropping is common, have brought a new challenge to time series classification. The Recurrent Neural Networks (RNNs) [45] are a mature machine learning technology, which has shown good performance in different fields such as speech recognition, signal processing, and natural language processing [46][47][48]. In the remote sensing field, recent studies have shown great potential for crop classification using RNN in satellite image time series [49][50][51]. Unlike convolutional neural networks, RNNs can explicitly manage the temporal data dependencies, and are suitable for mining time series features of multitemporal datasets [52], such as time series SAR data for farmland.
In this paper, we propose a precise farmland parcel extraction method in complex scenes (i.e., mountains of Southwest China) by incorporating VHR optical images and time series SAR data. VHR Google satellite optical imagery with a spatial resolution of 0.55 m was used to delineate the precise parcel object distribution map, and Sentinel-1A (S1A) SAR datasets provided the backscattering features to construct the temporal behaviors of parcel objects to further distinguish farmland parcels from other types. Furthermore, a CNN-based hierarchical extraction scheme was developed for parcel extraction in complex scenes, and an LSTM-based classification method was used to learn time-series SAR features of these parcels. This study is designed to solve a host of difficulties, such as landscape fragmentation, complex planting structures, and cloudy and rainy weather in mountainous areas, and achieved the following objectives: • By using CNN technology, a hierarchical extraction scheme based on VHR optical images was developed to obtain an accurate image object distribution map of complex scenes such as mountainous areas.

•
Combined with the LSTM networks, the potential of multiple variables of time-series SAR for farmland parcel identification is fully explored in cloudy and rainy areas.
We conducted an experiment in the north of Guiyang City, Guizhou Province, Southwestern China. This experiment focuses on developing an automatic method for precise farmland parcel mapping in complex landscapes, and the results validate the effectiveness and feasibility of this method.

Study Site
To evaluate the proposed method, a typical karst hilly region in Southwest China was selected as the experimental area. As illustrated in Figure 1, this study area is located in the north of Guiyang City, Guizhou Province, China, with an area of about 794 km 2 (center latitude and longitude is 27 • 9 59 N, 106 • 42 24 E). In this region of hilly lands, the terrain is low in the north and high in the south, with an altitude ranging from 600 to 1800 m. This area is mainly covered by farmland, woodland, grassland, buildings, and water, which together constitutes a complex and diverse agricultural landscape. Due to the mild climate, abundant precipitation and great variations in elevation, it is suitable for the growth of various agricultural crops, such as rice, rape, soybeans, and corn, and some scattered farmlands grow various vegetables such as cucumber, pepper, and cabbage. According to local planting practices, corn is planted from April to May and harvested from September to October, while rice and soybean are planted from May to June and harvested from September to October. As the main winter crop in this region, rape is planted from November to October and harvested from April to May. In addition, various vegetables are planted all year round. In this area, cloudy and rainy weather throughout the year made it difficult to obtain sufficient cloudless optical image data. Therefore, how the limited optical data and time series SAR data can be used for farmland mapping in mountainous areas is an urgent problem to be solved.

Guizhou Province
Guiyang Study area

Remote Sensing Data
It is hard to obtain fully covered VHR optical images for a cloudy and rainy study area. Therefore, in this study, Google satellite images with a spatial resolution of 0.55 m was downloaded and employed for delineating the boundaries of ground objects ( Figure 1). This image has three spectral bands that provide detailed spatial and spectral information and clearly describe the land cover information. S1A SAR of European Space Agency (ESA) provides C-band radar images at a center frequency of 5.405 GHz. The interferometric wide swath (IW) is the default acquisition mode over land, with an available dual polarization (VV + VH and HH + HV) operation mode. The geometric resolution is 5 m by 20 m with a large swath width of 250 km. The dataset used in this study consists of a time series of 30 S1A images (Table 1) between January 2019 and December 2019, with a time interval of 12 days. All the radar imagery we used was acquired in TOPS mode with dual-polarization (VV + VH) from the Copernicus Open Access Hub as Single Look Complex products collected in ascending passes.

Field Sampling Data
Reference data is necessary to generate a land cover map, especially a farmland distribution map. For classifier calibration and output validation, a field survey was conducted in May 2019 to collect sampling points, which present a detailed description of the land cover and locations, using GPS-enabled tablets. We subsequently delineated the reference parcels for which the sampling points were collected. The boundaries of these parcels were delineated by manual visual interpretation using the Google satellite imagery mentioned in Section 2.2 as a reference, and the land cover type of each parcel is consistent with its internal sampling points. Additional samples of underrepresented classes are also supplemented by this image. We finally obtained 913 sample parcel objects (Table 2), including farmland (such as rice, rape, soybeans, corn, and vegetables) and non-farmland (such as woodland, grassland, urban areas, and water.).

Method
In this study, we propose a method for precise farmland parcel mapping in mountainous areas. The proposed method architecture is elaborated in this section, as shown in Figure 2, which mainly includes three parts: 1. hierarchical extraction schemes using VHR optical images, consisting of a farmland parcel classification system and CNN-based farmland parcel extraction; 2. time-series SAR features extraction from the S1A dataset; 3. farmland/non-farmland classification using time series SAR data, consisting of parcel-level time-series feature construction and LSTM-based classification.

Farmland Classification System Based on Geographical Divisions
Due to the distinguishing topographic conditions in the mountainous areas, the farmland objects present different spatial morphological characteristics, which also leads to the complexity and difficulty of state-of-the-art farmland parcel extraction methods based on deep learning technology. Therefore, a hierarchical farmland classification system was designed for this study, by the visual characteristics of the farmland presented in VHR images. This classification system consists of two levels (Table 3), divides the mountain area into three main geographic sub-regions, and produces four main farmland types distributed in different sub-regions, respectively. Subsequently, we separately conducted the four types of farmland extraction using VHR optical images from Google Earth, which is shown in detail in Section 3.1.2. Table 3. The farmland classification system.

Level 1 Geographic Area Level 2 Farmland Type Features
Plain area Regular farmland Regular shape, clear boundaries, uniform internal texture, and neat spatial distribution.
Hillside area Terraced farmland Small and narrow shape, clear boundaries, uniform internal texture, and dense spatial distribution.
Slope farmland Various shapes, fuzzy boundaries, rough but obvious texture, and mixed with trees and grass.
Forest area Woodland farmland Various shapes, fuzzy boundaries, grass-like texture, and scattered between woodland and grassland.
In the first level, the entire study area was divided into three geographic sub-regions based on topographical characteristics, namely, plain areas, hillside areas, and forest areas. Due to the relatively consistent geographic conditions, the farmland distributed in each geographical sub-region has similar morphological features on VHR images. The second level aims to divide farmland into four farmland types in different geographical sub-regions, namely regular farmland in plain areas, terraced farmland and slope farmland in hillside areas, and woodland farmland in forest areas, as shown in Figure 3. Among them, woodland farmland represents the farmland with a low degree of reclamation among forests. This carelessly planted farmland, which is scattered among the forest and grass, is very difficult to identify. Therefore, this paper classifies the woodland farmland type as a separate category for extraction. This hierarchical extraction method allows the complex mountain landscape to be gradually divided into relatively uniform geographic units (farmland parcels), in which finer farmland can be distinguished. The applicability of this farmland extraction method can also be evaluated in different regions. In addition, this method can flexibly use a single-level map (such as slope farmland) or combine maps from multiple levels (such as regular farmland and terraced farmland) to meet other specific needs.

Parcel-Stratified Extraction Based on CNNs
The CNN was used as a classifier because of its representation learning ability, which was used to identify precise target objects from VHR images. However, the performance of existing models is hindered by the complex visual features of the farmland, as well as the substantial VHR image content, which has led to a host of sample production and complex model improvements tasks. Based on the analysis of the visual characteristics of the four farmland types mentioned in Section 3.1.1, a farmland parcel-stratified extraction process was designed, as follows: 1. The deep learning algorithm was selected according to the unique features of target objects. In this study, the Richer Convolutional Features(RCF) model [53] was used for the extraction of parcels with clear boundaries (regular farmland and terraced farmland), and the D_LinkNet model [54] was used to obtain texture-based parcels (slope farmland and woodland farmland). Since we focus on the effect of this hierarchical extraction scheme in complex landscapes, other studies on the improvement of deep learning models for the characteristics of target objects are not discussed in this paper. 2. CNN training requires a relatively large amount of labelled data. The training data for four farmland types were cropped from the VHR satellite images, where the sampled data are not available, and manually delineated through visual-interpretation. The samples of regular farmland and terraced farmland were used to train an edge model, while the samples of slope farmland and woodland farmland were used to train two different texture models, respectively. Through the iterative training method, four farmland types parcel extraction models were respectively generated based on these samples. 3. Due to the higher similarity between regular farmland and terraced farmland, these farmlands were first extracted together. We then used these results as a mask to extract slope farmland in the remaining area. All the obtained parcels were subsequently used together as a mask, and the woodland farmland was finally obtained to ensure that as much farmland as possible was obtained. Finally, through sample addition and iterative training methods, the missing and wrong areas were supplemented and corrected to improve the final map.
To facilitate complete farmland parcel delineations over the study area, all farmland-like parcels were considered in this parcel map, which will be optimized in subsequent farmland/non-farmland classification method.

Time-Series SAR Feature Extraction
In this study, the S1A dataset was used to generate time-series SAR features. The role of linear polarization backscatter intensity and polarimetric variables were investigated for their effects, and five variables were obtained from each SAR images, consisting of two linear polarizations parameters (VV/VH) and three decomposition parameters (entropy, anisotropy, and alpha angle). The linear polarization intensity value indicates the change in the total energy reflected by the ground. Polarization decomposition obtains the polarization parameters that can characterize the structural features and scattering mechanism of ground objects by deconstructing the polarized SAR signal. The H-A-α decomposition method provided by the Snap platform, derived from the Cloude-Pottier decomposition method [55], was used to process each S1A image and generate three parameters: entropy (H), anisotropy (A), and alpha angle (α). Entropy represents the randomness of scattering, with a range from 0 to 1. Anisotropy reveals the relative importance of the secondary scattering mechanism, and this value is a measure of the intensity of scattering, ranging from 0 to 1. The angle alpha indicates the main scattering mechanism, with a range of 0 to 90 • , such as surface scattering (α lower than 40 • ), volume or dipole scattering (α about 45 • ), and double rebound scattering (α more than 50 • ) [56,57]. We firstly obtained the backscatter intensity value (dB) for both polarizations (VV, VH), mainly including five steps: Apply-Orbit-File, Calibration, Multilooking, Speckle-Filter, and Terrain-Correction. In order to further analyze the impact of the variables related to the scattering mechanism, we subsequently calculated the polarization decomposition parameters, which mainly include the following seven steps: Apply-Orbit-File, Calibration, Polarimetric-Matrices, Multilooking, Terrain-Correction, Polarimetric-Speckle-Filter, and Polarimetric-Decomposition. In this process, high-precision orbit data and 30 m SRTM-1sec DEM data provide precise geographic location information so that the final result has a high geometric accuracy and can be matched with the Google image mentioned in Figure 1 without manual correction. All this preprocessing of the S1A SAR data was completed in ESA snap software.

Farmland/Non-Farmland Classification Using Time Series SAR Data
With the obtained results, which include farmland-like parcels, in Section 3.1, the next step in this methodology was to construct temporal features to distinguish farmland parcels from other types. These features (VV, VH, alpha angle, anisotropy, and entropy) were generated from the processed S1A dataset. Hence, for the automatic mapping of precise farmland parcels, we need a reliable method to identify the types of farmland/non-farmland parcels by investigating the farmland throughout the year.

Time Series Features Construction
Recent studies have shown that the linear SAR backscatter value of farmland has an obvious change pattern, which is related to the type of plant [58]. In addition, due to the ability to characterize the structural characteristics and scattering mechanism of ground targets, there are also obvious change patterns in polarization decomposition parameters, which are affected by the plant structure and growth [59]. Therefore, this study investigates the farmland classification using the linear backscattering intensity and polarization decomposition parameters. For each SAR image, the mean value of pixels inside a parcel is considered as the feature value of that parcel. We finally obtained five time series curves for each parcel from the S1A dataset, consisting of two linear polarizations (VV, VH) and three H-A-α decomposition parameters (H, A, and α). This method of calculating the mean value of pixels also helps to remove the speckle effect of the SAR data. The reference data (Section 2.3) is also processed in this way to generate training and validation samples .

Classification Model Based on LSTM
RNNs can only have short-term memory due to the problem of gradient disappearance or explosion. As one of the best performing RNN units, long-term memory (LSTM) can learn long-term dependence and express the characteristics of sequence changes, which is widely used in time series analysis [60]. Hence, we choose the LSTM unit as the recursive unit in our research. Formulas (1)- (6) formally describe the LSTM neuron [50,52].
The LSTM unit consists of two cell states-the memory c t and the hidden state h t -and three different gates-the input(i t ), the forget( f t ), and the output (o t ), which are used to control the flow of information. These three gates combine the current input x t with the hidden state h t−1 of the previous step and filter the information in this process to solve the problem of gradient disappearance/explosion. These gates are implemented by a sigmoid, which gives values between 0 and 1. A new content memory cell y t is also been used to adjust the current input, which is implemented by a hyperbolic tangent function. As shown in Equations (1) and (2), the input i t specifies how much information needs to be retained in the memory cell, and the forget gate f t decides the extent to which the existing stored content is forgotten, as shown in Equation (3). Equation (4) indicates that the memory state c t can be updated by the input (i t ), the forget ( f t ), and the last memory state c t−1 . Finally, the output gate o t determines how much current memory content can be output to the next step and impacts the new hidden state (h t ), which is exported as shown in Equations (5) and (6). The memory c t and the hidden state h t ) are both used as part of the input at the next step.
where W denote weight matrices. b o is the deviation coefficient of the model. In this article, we determined the optimal parameter values in LSTM by an experimental method, the detailed experiment as shown in Section 4.5. Eventually, a two-layer LSTM network with 18 hidden neurons was used for time-series sequence classification, as shown in Figure 4.

Accuracy Assessment
We used the overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA), and F1-score to evaluate and compare the accuracies of the farmland parcel maps, which were obtained by the hierarchical extraction scheme and the farmland/non-farmland classification method (Stage 2 and 3 in Table 4) : where TP, TN, FP, and FN represent the true positive, true negative, false positive, and false negative pixels in the parcel extraction results, and are calculated by comparing the extracted data with the reference data at each of the sampled pixels .
In addition, we further evaluated the performance of LSTM-based classifier ( Table 5, Figures 8 and 9), where TP, TN, FP, and FN represent the true positive, true negative, false positive, and false negative parcels in the classification results , and are computed by comparing the classified parcels with the sampled parcels. Since the sample dataset is relatively small, we performed a 10-fold cross-validation on the sample dataset for each classification. The 913 sampled parcels were divided into ten equal subsets, and one of them was used as the validation dataset and the others were used as the training dataset in each fold. Moreover, each subset was selected using spatial partitions, so as to make sure that training and validation datasets are taken from different areas. It is also noted that we choose each subset according to the approximate proportion of the number of categories in the field sampling data. All classification accuracies were averaged across the ten validations. Figure 5 shows the final farmland parcel distribution map of the study area. This paper final obtained 156,965 farmland parcels in total, with an average area of 0.0935 ha. The area of the parcel is mainly between 0.01 ha and 0.5 ha, accounting for 90%. This also shows the characteristics of small and broken farmland parcels in mountainous areas. The method proposed in this paper mainly includes two steps: a hierarchical extraction scheme using VHR optical imagery and a farmland classification using a time series SAR dataset. Therefore, we divided the extraction process into three stages, Stages 1, 2, and 3, to evaluate the enhancement of these methods on the mapping results, as shown in Figure 6. Stage 1 obtains farmland parcels by only an edge model (EM), which is shown in Figure 6(a2,b2) (apple dust parcels). Stage 2 refers to the use of a texture model (TM) on the basis of Stage 1 to obtain slope farmland and woodland farmland (blue parcels in Figure 6(a2,b2)). These two stages together constitute the hierarchical extraction scheme, and the result is displayed in Figure 6(a3,b3). In the final stage, the farmland classification method based on LSTM is used to eliminate non-farmland to further optimize the obtained results before. The accuracies of these stages are shown in Table 4 . Figure 6. Detailed process of the proposed method. (a1,b1) are the sub-regions of the study area; (a2,b2) show farmland parcels extracted using different models (Stage 1); (a3,b3) are the results obtained by the hierarchical scheme using VHR images (Stage 2); (a4,b4) demonstrate the ability to identify non-farmland parcels using SAR temporal features (Stage 3).

Hierarchical Extraction Results
In this part, we focus on the delineation of farmland parcels and ensure that all farmland parcels in the study area are included in the obtained results. Hence, the producer's accuracy (PA) of farmland extraction is the indicator we pay attention to in this step, because it represents the error of omission. As shown in Table 4, the overall accuracy between the two stages does not differ significantly (87.47% and 87.56%), which is due to a host of non-farmland types in the results of Stage 2. However, the hierarchical extraction scheme (Stage 2) resulted in an extremely higher PA of farmland extraction, reaching 96.38%, while the PA of Stage 1 was only 88.99%, which is conducive to the farmland type identification in the next stage.

Time Series Feature Construction Results
We established the annual time series curves of farmland, forestland, grassland, water, and urban areas from the S1A image datasets related to the ground truth pixels in the field sampled parcels, including VV (dB), VH (dB), entropy, anisotropy, and alpha angle (α). Figure 7 shows the constructed temporal profiles of the farmland class and four non-farmland classes per metric. D O Y f a r m l a n d f o r e s t l a n d g r a s s l a n d w a t e r u r b a n Figure 7. Temporal curves of SAR parameters for VV, VH, alpha, anisotropy, and entropy of five classes.
By comparing the temporal behavior of farmland and other types, we observe that farmland parcels have significantly different temporal behaviors in terms of entropy, alpha, and anisotropy. The time series curves from DOY 4 to DOY 160 and from DOY 292 to DOY 352 are especially distinguishable. However, it is difficult to distinguish farmland from woodland and grassland in the temporal curves during the middle of the year (DOY 160 to DOY 280). For VV and VH, the time behavior of farmland is very easy to distinguish from that of water and buildings. Yet the VH time series curves of farmland and grassland have similar time curves between DOY 4 and DOY 100. For the VV signal, forestland and farmland have similar time series curves throughout the year.

Farmland/Non-Farmland Classification Results
As shown in a4 and b4 of Figure 6, in Stage 3, we implemented a time series classification method based on LSTM to distinguish farmland/non-farmland types, so as to obtain an accurate farmland distribution map. Five radar parameters from the 30 S1A image datasets were used to assess the capability of time series SAR data for identifying farmland parcels. We obtained the classification accuracy about VV, VH, Entropy, anisotropy, and alpha angle time series, including UA, PA, OA, and F1-scores. Table 5 shows that an overall accuracy of 80.83% was obtained when performing 10-fold cross-validation with five variables. Farmland parcels obtained an especially high accuracy, with a producer's accuracy of 89.02%, a user's accuracy of 78.80%, and an F1 score of 0.8360. Therefore, in stage 3, non-farmland parcels are identified and removed using this LSTM-based classification method, resulting in an very high overall accuracy of 96.05% in the final farmland parcels map (Table 4). Table 5. Classification performance using different variables.

Variables
Farmland To investigate the impact of variables on classification accuracy, we subsequently compared the accuracy of classification using different variables. For a single variable, alpha has the highest accuracy among all variables, with an overall accuracy of 0.7689. The linear polarization intensity and polarization decomposition parameters represent the different characteristics of ground objects. When only using polarization decomposition variables, the overall accuracy is 0.7776, and the F1-score of the farmlands is 0.8146. In comparison, the overall accuracy and F1-score of only using linear polarization are increased by 1.42% and 1.14%, respectively, reaching 0.7918 and 0.8260. This result demonstrates the contribution of linear polarization variables to farmland extraction.

Parameter Setting in the LSTM-Based Classification Model
The number of neurons in the hidden layer and network layers are important parameters of LSTM, which have a great impact on the classification performance. Figure 8 shows how the overall accuracy of LSTM classification varies with the number of neurons in the hidden layer and the number of network layers. The number of neurons range from 2 to 80 with a step of 2, and the number of network layers range from 1 to 5 with a step of 1. Figure 8 also shows the standard deviation, which represents the stability of the model classification and is represented in the box plot in this figure.
For the current limited data, a network that is too complex or too simple can easily lead to overfitting or underfitting. Hence, after extensive training and comprehensive consideration of the accuracy and stability of classification, we set up the optimal network parameter configuration, with 18 neurons and 2 network layers. In order to get the best time series combination, we evaluated the classification accuracy of different time series, as shown in Figure 9. The growth of the OA indicates that the added period data is conducive to the farmland classification. The overall classification accuracy increases as the number of time-series images increases, reaching the highest accuracy at DOY 172. In the middle of the year (between DOY 172 and DOY 268), the OA enters a stable state before DOY 268). This change is consistent with the analysis in Section 4.3. The period between DOY 172 and DOY 268) represents the season of plant growth, during which the characteristics of crops change more obviously. However, the rainy weather in the area and the careless planting lead to difficulty in distinguishing the wasteland from the farmland, so the time series features of this period fails to improve the classification accuracy. The harvest of the main crops caused the curve to drop rapidly at DOY 292 and then gradually increased. Therefore, the relatively dry period is more conducive to identifying the types of farmland parcels, which is also the main planting season of rape, that is, the time series after DOY 292 in the previous year and before DOY 172 in the following year. . The relationship between classification performance and time series SAR data. The horizontal axis represents the time series data combination from the fourth day to that time.

Discussion
In this paper, we propose a novel method to map precise farmland distribution in mountainous areas. The method has the following characteristics: • The advantages of VHR optical data and time series SAR data are combined. VHR Google satellite optical imagery is used for obtaining basic parcels for farmland mapping, and the S1A SAR dataset provides time-series features for identifying the farmland parcels.

•
The hierarchical extraction scheme based on a divided and stratified strategy greatly reduces the difficulty of obtaining farmland parcels in complex scenes. By dividing the ground objects layer by layer, these objects are extracted at a relatively simple level, so the design and training of the model can be conducted at a lower cost. It is worth noting that the distinguishability between each extracted object class affects the extraction accuracy. Low distinguishability will cause the same object to be extracted by different models, which affects the accuracy of the results. This is also the reason why the overall accuracy of Stage 2 shown in Table 4 has not increased compared to Stage 1. However, in this step, we focus on obtaining the parcel objects containing all the farmland and identifying them through subsequent time series classification. Therefore, this problem does not affect the results of the proposed method.

•
The CNN-based parcel extraction method is different from the previous method of object segmentation, and the obtained parcel objects more closely matches the shape of the ground objects.

•
Recent studies [56,61] have shown great potential of polarization decomposition variables for crop classification. However, in this study, linear polarization variables produced a higher accuracy. This is because, in complex agricultural landscapes, such as mountainous areas, consisting of various types of crops, polarization decomposition variables that reflect the different scattering mechanisms of land covers are more suitable than linear polarization variables for the identification of high biomass crops, but do not have an advantage in the identification of agricultural landscapes with various crops. • Compared with the pixel-level method, this paper assigned the mean value of all pixels in the parcel as the feature value of this parcel, thereby reducing the influence of SAR speckle noise.
Through the experiment in the southwest mountainous areas of China, the proposed method in this paper shows excellent performance for the farmland extraction in complex scenes. However, there are still some problems in this article, which can be improved in the future.

•
In this experiment, farmland types are classified by only visual features, leading to the acquisition of other types of parcels. We should consider using other geographic element features (such as elevation, slope, and water content) to improve discrimination for more detailed classifications. As type discrimination increases, more accurate and complete extraction results can be obtained.

•
There are some extremely fragmented and small farmland parcels in mountainous areas that are hard to extract, and the temporal behavior of these parcels is also difficult to construct from the S1A dataset. Therefore, the parcel extraction model needs to be further optimized to improve the acquisition ability of small boundaries, and higher spatial resolution SAR data should be obtained to generate the time profiles of these parcels.

•
In smallholder agricultural areas, the crop temporal behaviors vary from field to field due to the different types of crops and planting time, which leads to the offset of the time series curve of farmland that the model fits, thus reducing the classification accuracy.

•
We also need to obtain more features of farmland parcels from multi-source data, so as to identify farmland in a higher dimensional space and achieve greater accuracy.

Conclusions
The complex landscape conditions and the absence of optical data in mountainous areas severely hinder the mapping of precision farmland parcels. In this study, we proposed a new automatic mapping approach for delineating farmland parcels in mountainous areas. The main purpose of this paper is to establish a multi-step extraction framework that integrates the superiority of VHR optical images, time series SAR data, and deep learning technology for obtaining precise parcel objects in complex scenes. The experiment conducted in Southwest China using sub-meter Google satellite imagery and 30 Sentinel-1A images shows the excellent performance of this method.
Through the hierarchical extraction scheme, CNNs can successfully delineate the precise boundaries of ground parcels from VHR images in mountainous areas. By the verification of visual inspection, the delineations of the obtained parcels are close to the level of human visual interpretation. The good PA of farmland parcels (>95%) also indicates the remarkable ability of this method for obtaining precise ground objects in complex scenes. These parcels, as the processing units of the time series classification, provide an important basis for mapping farmland distribution. In addition, with the LSTM-based time series classification method, we have noticed the beneficial value of SAR backscattering variables for farmland identification in cloudy and rainy areas (OA > 80%). Linear polarization variables, compared to the polarization decomposition variables, are more suitable for the farmland classification under the conditions of complex planting structures in mountainous areas. Moreover, this method converts the research object from pixels to parcels, which greatly reduces the influence of SAR data speckle noise. These study results show that the proposed method can also be used for the identification of other land types under complex landscape conditions and can be extended to any other region. However, there are still some aspects that need to be further explored.
In future research, we will explore how more discriminating classification rules can be introduced, and more targeted parcel extraction algorithms, by combining other edge and shape detection methods, will be designed. In addition, other indicators from different data sources, such as multi-band SAR data, may be useful in improving farmland identification accuracy.
Author Contributions: W.L. proposed the research methodology, prepared the data, performed and analyzed the experiments, and wrote the manuscript. J.W. helped conduct the experiments and analysis. J.L. outlined the research topic and acquired the funding. Z.W., Z.S. and Y.Z. made great contributions to manuscript reviewing and editing. J.C. and Y.S. assisted with data preparation and manuscript writing. N.X. and Y.Y. helped to conduct and validate the experiment. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare that there is no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

VHR
Very High Resolution SAR Synthetic Aperture Radar CNNs Convolutional Neural Networks RNNs Recurrent Neural Networks LSTM Long and Short-Term Memory